Puppeteer examples

26 August, 2019

Table of contents

Project Structure
Launch and create a new page
Emulate a device
Set page viewport
Visit a URL
A screenshot script
Close the browser
Create pdfs
Get access to the window object
Type and click
Wait for selector to appear
Page events
Skip requests
Further Reading

This post will help you get started with Puppeteer in Node and learn how to perform some common tasks. Let’s start with a recommended structure for your project.

Project Structure

Write your code inside an async IIFE in the index.js:

index.js

(async () => {
  // Your code goes here.
})();

Or create a new file that you will import:

screenshot.js

module.exports = async () => {
  // Your code goes here.
};

Also, a nodemon script can be useful here to re-run your code after you make changes:

package.json

{
  "scripts": {
    "start": "nodemon src/index.js --ignore data"
  }
}

Emulate a device

You can emulate a device with page.emulate. Puppeteer ships a list of device descriptors that you can pick from by name:

const devices = require("puppeteer/DeviceDescriptors");
await page.emulate(devices["iPhone X"]);
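A device descriptor is essentially a user agent plus a viewport, so you can also pass your own object to page.emulate. Here is a minimal sketch; the descriptor values and the emulateCustom helper are made up for illustration and are not part of Puppeteer:

```javascript
// Hypothetical descriptor: the name, user agent, and viewport values are illustrative only.
const customPhone = {
  name: "Custom Phone",
  userAgent:
    "Mozilla/5.0 (Linux; Android 9) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0 Mobile Safari/537.36",
  viewport: {
    width: 360,
    height: 640,
    deviceScaleFactor: 2,
    isMobile: true,
    hasTouch: true,
    isLandscape: false,
  },
};

// Applies the descriptor to a page (any object with an emulate method works).
async function emulateCustom(page, descriptor = customPhone) {
  await page.emulate({
    userAgent: descriptor.userAgent,
    viewport: descriptor.viewport,
  });
  return descriptor.name;
}
```

Inside your async function you would call it as await emulateCustom(page).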

Set page viewport

Instead of emulating a device, you can set the page viewport with page.setViewport:

await page.setViewport({ width: 1920, height: 1080 });

Visit a URL

After you set the device or the viewport, you can visit a URL with page.goto:

await page.goto(url, options);
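The options argument lets you control when the navigation counts as finished. The sketch below uses the waitUntil and timeout options; the visit helper itself is a hypothetical name, not something from Puppeteer:

```javascript
// Hypothetical helper: visits a URL and reports the HTTP status of the main response.
async function visit(page, url) {
  const response = await page.goto(url, {
    waitUntil: "networkidle2", // finish once at most 2 connections remain for 500 ms
    timeout: 30000, // give up after 30 seconds (value is in milliseconds)
  });
  // goto can resolve with null, for example when only the URL hash changes.
  return response ? response.status() : null;
}
```

Checking the status is handy because goto does not throw on responses like 404.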

A screenshot script

You can take screenshots of a page with page.screenshot. Below you can find a screenshot script where I put all the previous sections together:

screenshot.js

const puppeteer = require("puppeteer");

module.exports = async ({
  url = "https://example.com",
  filename = null,
  fullPage = false,
  device = null,
  headless = true,
} = {}) => {
  const browser = await puppeteer.launch({ headless });
  const page = await browser.newPage();

  if (typeof device === "string") {
    let devices = require("puppeteer/DeviceDescriptors");
    await page.emulate(devices[device]);
  } else {
    await page.setViewport({ width: 1920, height: 1080 });
  }

  await page.goto(url);
  await page.screenshot({
    path: filename
      ? `screenshots/${filename}.png`
      : `screenshots/${new Date().getTime()}.png`,
    fullPage,
  });

  await browser.close();
};

You should also create the directories where you’ll store the screenshots, somewhere in your code:

index.js

const fs = require("fs");
const mkdirp = require("mkdirp");

// Create directories.
const directories = ["screenshots"];
directories.forEach((dir) => {
  if (!fs.existsSync(dir)) mkdirp.sync(`./${dir}/`);
});
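If you would rather not depend on mkdirp, Node 10.12 and later can create nested directories with the built-in fs.mkdirSync and the recursive option. A minimal sketch, where ensureDirectories is a hypothetical helper name:

```javascript
const fs = require("fs");
const path = require("path");

// Creates each directory (and any missing parents) under root.
// With recursive: true, directories that already exist are left untouched.
function ensureDirectories(directories, root = ".") {
  return directories.map((dir) => {
    const fullPath = path.join(root, dir);
    fs.mkdirSync(fullPath, { recursive: true });
    return fullPath;
  });
}
```

Calling ensureDirectories(["screenshots", "pdfs"]) before you launch the browser covers both output folders used in this post.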

Close the browser

Don’t forget to close the browser after you finish your work:

await browser.close();
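If your code throws before it reaches that line, the browser process keeps running. One way to guard against that is a try/finally wrapper. This is a sketch under assumptions: withBrowser is a hypothetical helper, and launch is any function you pass in that resolves to a browser, such as () => puppeteer.launch():

```javascript
// Hypothetical helper: launches a browser, runs the task, and always closes the browser,
// even when the task throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}
```

You would use it as await withBrowser(() => puppeteer.launch(), async (browser) => { /* your code */ }).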

Create pdfs

Besides screenshots, you can also create PDFs with page.pdf:

await page.pdf({
  path: filename
    ? `pdfs/${filename}.pdf`
    : `pdfs/${new Date().getTime()}.pdf`,
  format: "A4",
});

Get access to the window object

If you want access to the window or document objects—to scrape some information from the page, for example—you can use the page.evaluate method:

const imageUrls = await page.evaluate(() => {
  const images = document.querySelectorAll("article img");
  const urls = Array.from(images).map(({ src }) => ({ src }));
  return urls;
});

The function you pass to evaluate doesn’t behave like a closure: Puppeteer serializes it and runs it inside the page, not in your Node process. As a result, it doesn’t have access to variables defined in the parent scope. Because of that, if you want to use an outside variable (a selector, for example) inside the function, you have to pass that variable as an argument to evaluate:

const imageSelector = "article img";

const imageUrls = await page.evaluate((selector) => {
  const images = document.querySelectorAll(selector);
  const urls = Array.from(images).map(({ src }) => ({ src }));
  return urls;
}, imageSelector);
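To see why the outer variable goes missing, you can roughly mimic the round trip in plain Node. This is only a simulation of the idea (the function's source text is rebuilt somewhere else), not Puppeteer's actual code, and simulateSerialization is a made-up name:

```javascript
// Roughly mimics what happens to a function passed to evaluate:
// its source text is rebuilt in another place, without the original scope.
function simulateSerialization() {
  const selector = "article img";
  const pageFunction = () => typeof selector;

  // Rebuild the function from its source text, outside of this scope.
  const rebuilt = new Function(`return (${pageFunction.toString()});`)();

  return {
    original: pageFunction(), // the closure still sees selector
    rebuilt: rebuilt(), // selector is not defined where the rebuilt copy runs
  };
}

console.log(simulateSerialization());
// prints { original: 'string', rebuilt: 'undefined' }
```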

By the way, you can’t pass an external library (get-urls, for example) as an argument, because arguments have to be serializable. Instead, assign the result of evaluate to a variable and use the library outside, in your Node code.
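Here is a sketch of that pattern: collect plain data inside the page, then process it in Node. The built-in URL class stands in for an external library, and getLinkHostnames is a hypothetical helper:

```javascript
// Hypothetical helper: collects link hrefs in the page, then processes them in Node.
async function getLinkHostnames(page, selector = "a[href]") {
  // Only serializable values (here, an array of strings) come back from the page.
  const hrefs = await page.evaluate(
    (sel) => Array.from(document.querySelectorAll(sel)).map((a) => a.href),
    selector
  );

  // Back in Node, any library is available; the built-in URL class stands in for one.
  const hostnames = hrefs.map((href) => new URL(href).hostname);
  return [...new Set(hostnames)]; // drop duplicates, keep first-seen order
}
```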

Type and click

If you want to type some text inside an input, use the page.type method:

await page.type("#searchbox input", "Headless Chrome");

And if you want to click something, use the page.click method:

await page.click("#my-button");
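When a click triggers a page load, it helps to start waiting for the navigation before you click, so you don't miss it. A minimal sketch that combines both methods with page.waitForNavigation; the search helper and the button selector are placeholders, not taken from a real page:

```javascript
// Hypothetical helper: fills a search box, clicks submit, and waits for the navigation.
// Both selectors are placeholders; replace them with ones from your page.
async function search(page, query) {
  await page.type("#searchbox input", query);
  // Start waiting before clicking, so a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click("#searchbox button"),
  ]);
}
```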

Wait for selector to appear

After you perform an action, you can wait for a selector to appear before you proceed any further. You can do that with the page.waitForSelector method:

await page.waitForSelector("img");
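waitForSelector rejects when the selector doesn't show up in time, so you may want to turn that into a boolean. The sketch below uses the visible and timeout options; the appears helper is hypothetical, and it assumes the rejection is an error named "TimeoutError":

```javascript
// Hypothetical helper: resolves to true if the selector becomes visible in time,
// and to false if the wait times out.
async function appears(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout });
    return true;
  } catch (error) {
    // Assumption: Puppeteer's timeout errors have the name "TimeoutError".
    if (error.name === "TimeoutError") return false;
    throw error; // anything else is unexpected, so pass it on
  }
}
```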

Page events

Handle events emitted by the Page with the page.on or page.once methods:

page.once("load", () => console.log("Page loaded."));

Use on to handle every occurrence of an event and once to handle only the first one. To stop listening, pass the event name and the same function to the page.removeListener method:

const requestLogger = (request) => console.log(request.url());
page.on("request", requestLogger);
// later
page.removeListener("request", requestLogger);
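You can try the difference between these methods without a browser. In the sketch below a plain Node EventEmitter stands in for the page, on the assumption that the page follows the same on, once, and removeListener conventions; the emitted strings are made up:

```javascript
const { EventEmitter } = require("events");

// A plain EventEmitter stands in for a Puppeteer page here.
function demoListeners() {
  const page = new EventEmitter();
  const seen = [];

  const onRequest = (url) => seen.push(`on:${url}`);
  page.on("request", onRequest);
  page.once("request", (url) => seen.push(`once:${url}`));

  page.emit("request", "/a"); // both listeners fire
  page.emit("request", "/b"); // only the on listener fires

  page.removeListener("request", onRequest);
  page.emit("request", "/c"); // no listeners left

  return seen;
}

console.log(demoListeners());
// prints [ 'on:/a', 'once:/a', 'on:/b' ]
```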

Skip requests

You can skip requests by enabling request interception with page.setRequestInterception(true). You then listen for the request event and call request.abort(), request.continue(), or request.respond() for every request. An intercepted request stalls until you call one of them:

await page.setRequestInterception(true);
page.on("request", (request) => {
  if (request.resourceType() === "image") request.abort();
  else request.continue();
});
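If you want to skip more than images, a set of resource types keeps the handler short. A minimal sketch, where blockResourceTypes is a hypothetical helper and the default list is only an example:

```javascript
// Hypothetical helper: aborts requests whose resource type is in blockedTypes
// and lets every other request through.
async function blockResourceTypes(page, blockedTypes = ["image", "stylesheet", "font"]) {
  const blocked = new Set(blockedTypes);
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (blocked.has(request.resourceType())) request.abort();
    else request.continue();
  });
}
```

Call it once, before page.goto, so the handler is in place for the first request.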