Puppeteer examples

In this post, you can find examples that show how to perform some common tasks with Puppeteer in Node. Let’s start with the recommended structure for your project.

Project Structure

Write your code inside an async IIFE in index.js:

// index.js
(async () => {
  // Your code goes here.
})();

Or create a new file:

// screenshot.js
module.exports = async () => {
  // Your code goes here.
};

Also, a nodemon script can be useful:

// package.json
{
  "scripts": {
    "start": "nodemon src/index.js --ignore data"
  }
}

Launch and create a new page

Launch a browser and open a new page:

const puppeteer = require("puppeteer");

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();

Puppeteer class documentation

Emulate a device

You can emulate a device with page.emulate. Puppeteer ships a list of the available devices.

const devices = require("puppeteer/DeviceDescriptors");
await page.emulate(devices["iPhone X"]);
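If the device name comes from user input, a lookup in the descriptor list can return undefined. Below is a minimal sketch of a guard around page.emulate. The emulateDevice helper is hypothetical, and it assumes you pass in the page and the devices map from the snippet above.

```javascript
// Hypothetical helper: `page` is a Puppeteer page and `devices` is the
// descriptor map shown above. Both are passed in rather than imported here.
async function emulateDevice(page, devices, name) {
  const descriptor = devices[name];
  if (!descriptor) {
    // Fail early with a readable message instead of emulating `undefined`.
    throw new Error(`Unknown device: ${name}`);
  }
  await page.emulate(descriptor);
  return descriptor;
}
```

You would call it as await emulateDevice(page, devices, "iPhone X").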

Set page viewport

Instead of emulating a device, you can set the page viewport with page.setViewport:

await page.setViewport({ width: 1920, height: 1080 });

Visit a URL

After you set the device or the viewport, you can visit a URL with page.goto:

await page.goto(url, options);
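The options argument controls when the navigation counts as finished. Here is a small sketch, assuming a hypothetical visit helper that receives the page from the caller. It uses the waitUntil and timeout options and treats a non-2xx response as an error.

```javascript
// Hypothetical helper; `page` is a Puppeteer page passed in by the caller.
async function visit(page, url, { timeout = 30000 } = {}) {
  // "networkidle2" resolves once there have been no more than two network
  // connections for at least 500 ms.
  const response = await page.goto(url, { waitUntil: "networkidle2", timeout });

  // `response` can be null (for example, when only the URL hash changes).
  if (response && !response.ok()) {
    throw new Error(`Request failed with status ${response.status()}`);
  }
  return response;
}
```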

A screenshot script

You can take screenshots of a page with page.screenshot. Below you can find a screenshot script where I put all the previous sections together:

// screenshot.js

const puppeteer = require("puppeteer");

module.exports = async ({
  url = "https://example.com",
  filename = null,
  fullPage = false,
  device = null,
  headless = true
} = {}) => {
  const browser = await puppeteer.launch({ headless });
  const page = await browser.newPage();

  if (typeof device === "string") {
    let devices = require("puppeteer/DeviceDescriptors");
    await page.emulate(devices[device]);
  } else {
    await page.setViewport({ width: 1920, height: 1080 });
  }

  await page.goto(url);
  await page.screenshot({
    path: filename
      ? `screenshots/${filename}.png`
      : `screenshots/${new Date().getTime()}.png`,
    fullPage
  });
  await browser.close();
};

The script writes to a screenshots directory, so create it before the first run:

// index.js
const fs = require("fs");
const mkdirp = require("mkdirp");

// Create directories.
const directories = ["screenshots"];
directories.forEach(dir => {
  if (!fs.existsSync(dir)) mkdirp(`./${dir}/`);
});
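If you would rather avoid the extra dependency, Node's built-in fs.mkdirSync accepts a recursive option (Node 10.12 and later). The sketch below is one possible alternative. The ensureDirectories name and the baseDir parameter are my own and not part of the original script.

```javascript
const fs = require("fs");
const path = require("path");

// Create each directory (and any missing parents) relative to `baseDir`.
// With `recursive: true`, mkdirSync does not throw if the directory exists,
// so no existsSync check is needed.
function ensureDirectories(directories, baseDir = process.cwd()) {
  return directories.map(dir => {
    const fullPath = path.join(baseDir, dir);
    fs.mkdirSync(fullPath, { recursive: true });
    return fullPath;
  });
}
```

Calling ensureDirectories(["screenshots", "pdfs"]) at the top of index.js would cover the examples in this post.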

Close the browser

Don’t forget to close the browser after you finish your work:

await browser.close();
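If something throws before that line runs, the browser process stays open. One way to guard against that is a try/finally wrapper. This is a sketch only: withBrowser is a hypothetical helper, and it takes the puppeteer module as an argument so you decide what gets launched.

```javascript
// Hypothetical helper: `puppeteer` is passed in so the caller decides which
// module (or test double) to use.
async function withBrowser(puppeteer, task, launchOptions = {}) {
  const browser = await puppeteer.launch(launchOptions);
  try {
    return await task(browser);
  } finally {
    // Runs whether `task` resolved or threw.
    await browser.close();
  }
}
```

For example: await withBrowser(puppeteer, async browser => { const page = await browser.newPage(); /* ... */ });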

Create PDFs

Besides screenshots, you can also create PDFs with page.pdf:

await page.pdf({
  path: filename ? `pdfs/${filename}.pdf` : `pdfs/${new Date().getTime()}.pdf`,
  format: "A4"
});
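The screenshot and PDF snippets build their output path the same way: use the filename if there is one, otherwise fall back to a timestamp. That logic could be pulled into a small function. The outputPath name and its parameters are hypothetical.

```javascript
const path = require("path");

// Build `<directory>/<filename>.<extension>`, falling back to the current
// timestamp when no filename is given.
function outputPath(directory, extension, filename = null) {
  const name = filename || String(Date.now());
  return path.join(directory, `${name}.${extension}`);
}
```

You could then write await page.pdf({ path: outputPath("pdfs", "pdf", filename), format: "A4" }).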

Get access to the window object

If you want access to the window or document objects (to scrape some information from the page, for example), you can use the page.evaluate method:

const imageUrls = await page.evaluate(() => {
  const images = document.querySelectorAll("article img");
  const urls = Array.from(images).map(({ src }) => ({ src }));
  return urls;
});

The function you pass to evaluate is serialized and runs inside the browser page, not in Node. As a result, it doesn’t have access to variables defined in the parent scope. Because of that, if you want to use an outside variable (a selector, for example) inside the function, you have to pass that variable as an argument to evaluate:

const imageSelector = "article img";

const imageUrls = await page.evaluate(selector => {
  const images = document.querySelectorAll(selector);
  const urls = Array.from(images).map(({ src }) => ({ src }));
  return urls;
}, imageSelector);

By the way, you can’t pass an external library (get-urls, for example) as an argument. Instead, assign the result of the evaluate to a variable and use the library outside.
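Here is a rough sketch of that split. The evaluate callback returns plain strings from the browser, and the post-processing happens in Node. The collectImageHosts helper is hypothetical, and Node's built-in URL class stands in for whatever external library you would use.

```javascript
// Hypothetical helper; `page` is a Puppeteer page passed in by the caller.
async function collectImageHosts(page, selector) {
  // Runs in the browser: only serializable values can cross back to Node.
  const urls = await page.evaluate(
    sel => Array.from(document.querySelectorAll(sel)).map(img => img.src),
    selector
  );

  // Runs in Node: any library is available here. The built-in URL class
  // stands in for an external one.
  const hosts = urls.filter(Boolean).map(src => new URL(src).hostname);
  return [...new Set(hosts)];
}
```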

Type and click

If you want to type some text inside an input, use the page.type method:

await page.type("#searchbox input", "Headless Chrome");

And if you want to click something, use the page.click method:

await page.click("#my-button");
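When a click triggers a navigation, it helps to start waiting for the navigation before you click, so you don't miss it. A minimal sketch, assuming a hypothetical submitSearch helper and a made-up #search-button selector:

```javascript
// Hypothetical helper; `page` is a Puppeteer page passed in by the caller.
// "#search-button" is a placeholder selector.
async function submitSearch(page, query) {
  await page.type("#searchbox input", query);

  // Start waiting before clicking so the navigation isn't missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click("#search-button")
  ]);
}
```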

Wait for selector to appear

After you perform an action, you can wait for a selector to appear before you proceed any further. You can do that with the page.waitForSelector method:

await page.waitForSelector("img");
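waitForSelector also takes an options object with visible and timeout, and it rejects when the timeout passes. The sketch below turns that rejection into a boolean. The appearsWithin helper is hypothetical, and it assumes the rejection is an error whose name is "TimeoutError".

```javascript
// Hypothetical helper; `page` is a Puppeteer page passed in by the caller.
async function appearsWithin(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout });
    return true;
  } catch (error) {
    // Assumption: timeouts are reported as errors named "TimeoutError".
    if (error.name === "TimeoutError") return false;
    throw error; // Anything else is a real failure.
  }
}
```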

Page events

Handle events emitted by the Page with the page.on or page.once methods:

page.once("load", () => console.log("Page loaded."));

Use on to handle every event and once to handle only the first. Additionally, you can use the page.removeListener() method to remove the listener:

const requestLogger = request => console.log(request.url());
page.on("request", requestLogger);
// later
page.removeListener("request", requestLogger);
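To see how on, once, and removeListener differ, you can try them without a browser. The sketch below uses a plain Node EventEmitter as a stand-in for the page, on the assumption that the page follows the same listener semantics.

```javascript
const { EventEmitter } = require("events");

// Stand-in for a Puppeteer page: a plain Node EventEmitter.
const emitter = new EventEmitter();
const seen = [];

const onRequest = url => seen.push(`on:${url}`);
emitter.on("request", onRequest);
emitter.once("request", url => seen.push(`once:${url}`));

emitter.emit("request", "/a"); // both listeners run
emitter.emit("request", "/b"); // only the `on` listener runs

emitter.removeListener("request", onRequest);
emitter.emit("request", "/c"); // no listeners left

console.log(seen); // [ 'on:/a', 'once:/a', 'on:/b' ]
```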

Skip requests

You can skip requests by enabling request interception with page.setRequestInterception(true). You then listen for the request event, and you call request.abort(), request.continue(), or request.respond():

await page.setRequestInterception(true);
page.on("request", request => {
  if (request.resourceType() === "image") request.abort();
  else request.continue();
});
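The same pattern extends to several resource types at once. Here is a sketch, assuming a hypothetical blockResources helper that takes the page and a list of types to abort:

```javascript
// Hypothetical helper; `page` is a Puppeteer page passed in by the caller.
async function blockResources(page, blockedTypes = ["image", "stylesheet", "font"]) {
  const blocked = new Set(blockedTypes);
  await page.setRequestInterception(true);

  page.on("request", request => {
    // Every intercepted request must be aborted, continued, or responded to.
    if (blocked.has(request.resourceType())) request.abort();
    else request.continue();
  });
}
```

Call it once before page.goto, for example await blockResources(page, ["image", "media"]).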
