Puppeteer examples
This post will help you get started with Puppeteer in Node and learn how to perform some common tasks. Let’s start with a recommended structure for your project.
Project Structure
Write your code inside an async IIFE in index.js:
(async () => {
// Your code goes here.
})();
Or put your code in a separate module that exports an async function, and import that module from index.js:
module.exports = async () => {
// Your code goes here.
};
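With the second layout, index.js only needs to call the exported function. The minimal sketch below assumes the module is saved as src/task.js, which is a hypothetical file name. runTask is also a made-up name: it awaits the task, logs any error, and sets a failing exit code instead of leaving an unhandled promise rejection:

```javascript
// Hypothetical helper: runs an async task, logs any error, and reports failure
// through the process exit code instead of an unhandled promise rejection.
const runTask = async (task) => {
  try {
    await task();
    return true;
  } catch (err) {
    console.error(err);
    process.exitCode = 1; // Node exits with status 1 once pending work finishes
    return false;
  }
};
```

With the file layout assumed above, index.js would then contain `runTask(require("./task"));`.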
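A nodemon script in package.json is also useful here. It re-runs your code every time you save a change, and --ignore data keeps it from restarting when files in the data directory change: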
{
"scripts": {
"start": "nodemon src/index.js --ignore data"
}
}
Launch and create a new page
Launch a browser and open a new page:
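const puppeteer = require("puppeteer");
// headless: false opens a visible browser window, which is handy while debugging.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();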
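Editing the code every time you want to switch between a visible and a headless browser gets tedious. One option is to read the launch options from environment variables. In the sketch below, HEADLESS and SLOW_MO are variable names chosen for this example, and launchOptions is a hypothetical helper. headless and slowMo are real puppeteer.launch options; slowMo slows each Puppeteer operation down by the given number of milliseconds:

```javascript
// Builds puppeteer.launch() options from environment variables.
// HEADLESS and SLOW_MO are hypothetical variable names used only in this sketch.
const launchOptions = (env = process.env) => ({
  // Run headless unless HEADLESS=false is set.
  headless: env.HEADLESS !== "false",
  // Delay each Puppeteer operation by SLOW_MO milliseconds (0 if unset or invalid).
  slowMo: Number(env.SLOW_MO) || 0,
});

// Usage: const browser = await puppeteer.launch(launchOptions());
```

If index.js launches the browser this way, running `HEADLESS=false npm start` opens a visible browser, and the default stays headless.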
Emulate a device
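You can emulate a device with page.emulate. Puppeteer ships with a set of predefined device descriptors, with names such as "iPhone X", that you can pass to it:
// Recent Puppeteer versions export the descriptors as KnownDevices;
// older versions used require("puppeteer/DeviceDescriptors") instead.
const { KnownDevices } = require("puppeteer");
await page.emulate(KnownDevices["iPhone X"]);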
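A misspelled device name means page.emulate receives undefined. That most likely fails with a TypeError instead of a helpful message. The sketch below looks the name up first and suggests similar ones. emulateDevice is a hypothetical helper, and the descriptor map is passed in because the place Puppeteer exports it from has changed between versions:

```javascript
// Looks up a device descriptor by name and emulates it on the page.
// `devices` is the descriptor map your Puppeteer version exposes; it is passed
// in so the helper does not depend on a particular export location.
const emulateDevice = async (page, devices, name) => {
  const descriptor = devices[name];
  if (!descriptor) {
    // Suggest names that start with the same first word, e.g. other "iPhone" models.
    const prefix = name.split(" ")[0].toLowerCase();
    const suggestions = Object.keys(devices).filter((key) =>
      key.toLowerCase().startsWith(prefix)
    );
    throw new Error(
      `Unknown device "${name}". Did you mean: ${suggestions.join(", ") || "none"}?`
    );
  }
  await page.emulate(descriptor);
  return descriptor;
};
```

With a recent Puppeteer version, a call could look like `await emulateDevice(page, require("puppeteer").KnownDevices, "iPhone X");`.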
Set page viewport
Instead of emulating a device, you can set the page viewport with page.setViewport:
await page.setViewport({ width: 1920, height: 1080 });
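setViewport also accepts a deviceScaleFactor. This is useful when you want sharper, high-density screenshots without emulating a phone. Here is a minimal sketch; setHiDpiViewport and its default sizes are hypothetical:

```javascript
// Sets a desktop viewport that renders at a higher pixel density.
// With deviceScaleFactor: 2, a 1280x800 viewport yields 2560x1600 screenshots.
const setHiDpiViewport = async (page, { width = 1280, height = 800, scale = 2 } = {}) => {
  const viewport = { width, height, deviceScaleFactor: scale };
  await page.setViewport(viewport);
  return viewport;
};
```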
Visit a URL
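After you set the device or the viewport, visit a URL with page.goto:
// waitUntil: "networkidle2" treats navigation as finished once there have been
// no more than 2 network connections for at least 500 ms.
await page.goto(url, { waitUntil: "networkidle2" });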
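page.goto doesn’t reject when the server answers with an error status such as 404 or 500. It resolves with that response instead. It can also resolve with null, for example when only the URL hash changes. The sketch below turns both cases into errors. visit is a hypothetical helper name:

```javascript
// Navigates to a URL and rejects on missing or non-2xx responses.
// page.goto() resolves with the main response (or null) and does not throw on
// HTTP error statuses, so the status is checked explicitly here.
const visit = async (page, url, { waitUntil = "networkidle2", timeout = 30000 } = {}) => {
  const response = await page.goto(url, { waitUntil, timeout });
  if (!response) throw new Error(`No response received for ${url}`);
  if (!response.ok()) throw new Error(`${url} returned HTTP ${response.status()}`);
  return response;
};
```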
A screenshot script
You can take screenshots of a page with page.screenshot. Below is a screenshot script that puts all the previous sections together:
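const puppeteer = require("puppeteer");
// Recent Puppeteer versions export the device descriptors as KnownDevices
// (older versions used require("puppeteer/DeviceDescriptors")).
const { KnownDevices } = puppeteer;

module.exports = async ({
  url = "https://example.com",
  filename = null,
  fullPage = false,
  device = null,
  headless = true,
} = {}) => {
  const browser = await puppeteer.launch({ headless });
  try {
    const page = await browser.newPage();
    if (typeof device === "string") {
      if (!KnownDevices[device]) throw new Error(`Unknown device: ${device}`);
      await page.emulate(KnownDevices[device]);
    } else {
      await page.setViewport({ width: 1920, height: 1080 });
    }
    await page.goto(url);
    await page.screenshot({
      // Use the given filename, or fall back to a timestamp.
      path: filename
        ? `screenshots/${filename}.png`
        : `screenshots/${Date.now()}.png`,
      fullPage,
    });
  } finally {
    // Close the browser even if navigation or the screenshot fails.
    await browser.close();
  }
};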
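page.screenshot captures the viewport, or the whole page with fullPage. If you only need one part of the page, one approach is to get an ElementHandle with page.$ and call its screenshot method, which clips the image to that element. The helper name and the example selector below are hypothetical:

```javascript
// Takes a screenshot of a single element instead of the whole page.
// page.$() resolves to an ElementHandle or null; ElementHandle.screenshot()
// clips the capture to that element.
const screenshotElement = async (page, selector, path) => {
  const element = await page.$(selector);
  if (!element) throw new Error(`No element matches ${selector}`);
  await element.screenshot({ path });
  return path;
};
```

For example: `await screenshotElement(page, "article", "screenshots/article.png");`.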
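You should also create the directory where the screenshots will be saved, somewhere in your code, before the script writes to it:
const fs = require("fs");
// Create the output directories. With { recursive: true }, fs.mkdirSync (Node 10.12+)
// creates any missing parents and doesn't throw if the directory already exists,
// so neither an existence check nor the mkdirp package is needed.
const directories = ["screenshots"];
directories.forEach((dir) => fs.mkdirSync(dir, { recursive: true }));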
Close the browser
Don’t forget to close the browser after you finish your work:
await browser.close();
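A try/finally block makes sure close always runs, even when something in between throws. Below is a sketch of a reusable wrapper. withBrowser is a hypothetical name, and the launch function is passed in so you can supply your own launch options:

```javascript
// Launches a browser, runs `work` with a fresh page, and always closes the
// browser afterwards, whether `work` resolves or throws.
const withBrowser = async (launch, work) => {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    return await work(page, browser);
  } finally {
    await browser.close();
  }
};
```

Usage: `const title = await withBrowser(() => puppeteer.launch(), (page) => page.goto("https://example.com").then(() => page.title()));`.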
Create PDFs
Besides screenshots, you can also create PDFs with page.pdf:
await page.pdf({
path: filename
? `pdfs/${filename}.pdf`
: `pdfs/${new Date().getTime()}.pdf`,
format: "A4",
});
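page.pdf accepts more options than format. The sketch below also prints CSS backgrounds and sets page margins. pdfOptions is a hypothetical helper and the margin values are arbitrary. format, landscape, printBackground and margin are documented page.pdf options:

```javascript
// Builds an options object for page.pdf().
const pdfOptions = (filename = null, { landscape = false } = {}) => ({
  path: filename ? `pdfs/${filename}.pdf` : `pdfs/${Date.now()}.pdf`,
  format: "A4",
  landscape,
  // Background colors and images are left out of print output unless this is true.
  printBackground: true,
  margin: { top: "1cm", right: "1cm", bottom: "1cm", left: "1cm" },
});

// Usage: await page.pdf(pdfOptions("report"));
```

As with screenshots, the pdfs directory needs to exist before you write to it.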
Get access to the window object
If you want access to the window or document objects (to scrape some information from the page, for example), use the page.evaluate method:
const imageUrls = await page.evaluate(() => {
const images = document.querySelectorAll("article img");
const urls = Array.from(images).map(({ src }) => ({ src }));
return urls;
});
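If all you need is to map over every element that matches a selector, page.$$eval is a shorter alternative. It runs document.querySelectorAll in the page and passes the matches to your function as an array. In the sketch below, getImageUrls is a hypothetical name, and it returns plain src strings instead of { src } objects:

```javascript
// Collects the src of every element matching `selector`.
// page.$$eval runs querySelectorAll(selector) in the page and passes the
// matching elements to the callback as an array.
const getImageUrls = (page, selector = "article img") =>
  page.$$eval(selector, (images) => images.map((img) => img.src));
```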
The function you pass to evaluate is serialized and executed inside the page, not in Node, so it can’t see variables defined in the surrounding Node.js scope. If you want to use an outside variable (a selector, for example) inside the function, pass it to evaluate as an extra argument:
const imageSelector = "article img";
const imageUrls = await page.evaluate((selector) => {
const images = document.querySelectorAll(selector);
const urls = Array.from(images).map(({ src }) => ({ src }));
return urls;
}, imageSelector);
You can’t pass an external library (get-urls, for example) as an argument either, because arguments must be serializable. Instead, return the raw data from evaluate and process it with the library in Node.
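The sketch below applies that pattern. The page function returns only plain strings, and the parsing and de-duplication run in Node. Node’s built-in URL class stands in for get-urls here; that substitution is made only for this example. uniqueImageUrls is a hypothetical name:

```javascript
// Returns unique absolute http(s) image URLs found on the page.
// Only serializable data (an array of strings) crosses from the page to Node;
// the filtering happens in Node, where any library could be used instead.
const uniqueImageUrls = async (page, selector = "article img") => {
  const raw = await page.evaluate(
    (sel) => Array.from(document.querySelectorAll(sel), (img) => img.src),
    selector
  );
  const seen = new Set();
  for (const value of raw) {
    try {
      const url = new URL(value);
      if (url.protocol === "http:" || url.protocol === "https:") seen.add(url.href);
    } catch {
      // Skip empty or malformed src values.
    }
  }
  return [...seen];
};
```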
Type and click
If you want to type some text inside an input, use the page.type method:
await page.type("#searchbox input", "Headless Chrome");
And if you want to click something, use the page.click method:
await page.click("#my-button");
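When a click triggers a navigation, start page.waitForNavigation together with the click so a fast navigation isn’t missed. Below is a sketch of a hypothetical search helper. The #search-button selector is an assumption about the page you are automating. The delay option of page.type adds a pause, in milliseconds, between key presses:

```javascript
// Types a query, clicks the submit button, and waits for the resulting navigation.
// waitForNavigation() is started before the click, and Promise.all waits for both.
const search = async (page, query, { input = "#searchbox input", button = "#search-button" } = {}) => {
  await page.waitForSelector(input);
  await page.type(input, query, { delay: 50 });
  await Promise.all([page.waitForNavigation(), page.click(button)]);
};
```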
Wait for selector to appear
After you perform an action, you can wait for a selector to appear before you proceed any further. You can do that with the page.waitForSelector method:
await page.waitForSelector("img");
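By default, waitForSelector rejects with a TimeoutError if the element doesn’t appear within 30 seconds. If the element is optional, you can turn that into a boolean. In the sketch below, isVisibleSoon is a hypothetical name. It checks the error’s name instead of importing the error class, because the place Puppeteer exports TimeoutError from has varied between versions:

```javascript
// Resolves to true if `selector` becomes visible within `timeout` ms, false on timeout.
// Any other error (for example, a closed page) is rethrown.
const isVisibleSoon = async (page, selector, timeout = 5000) => {
  try {
    await page.waitForSelector(selector, { visible: true, timeout });
    return true;
  } catch (err) {
    if (err && err.name === "TimeoutError") return false;
    throw err;
  }
};
```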
Page events
Handle events emitted by the Page with the page.on or page.once methods:
page.once("load", () => console.log("Page loaded."));
Use on to handle every occurrence of an event and once to handle only the first one. To remove a listener, pass the event name and the same handler function to page.removeListener():
const requestLogger = (request) => console.log(request.url());
page.on("request", requestLogger);
// later
page.removeListener("request", requestLogger);
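Two other Page events are handy while debugging. console fires for messages logged in the page, and pageerror fires for uncaught exceptions. The sketch below collects both and returns a function that removes the listeners again. captureConsole is a hypothetical name:

```javascript
// Records console output and uncaught page errors.
// ConsoleMessage exposes type() and text(); pageerror passes an Error.
const captureConsole = (page) => {
  const messages = [];
  const onConsole = (msg) => messages.push(`${msg.type()}: ${msg.text()}`);
  const onPageError = (err) => messages.push(`pageerror: ${err.message}`);
  page.on("console", onConsole);
  page.on("pageerror", onPageError);
  // Detach both listeners using the same event names and handler references.
  const stop = () => {
    page.removeListener("console", onConsole);
    page.removeListener("pageerror", onPageError);
  };
  return { messages, stop };
};
```

For example, call `const { messages, stop } = captureConsole(page);` before page.goto, and call stop() once you are done.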
Skip requests
You can skip requests by enabling request interception with page.setRequestInterception(true). Then listen for the request event and call request.abort(), request.continue(), or request.respond() for each request. Once interception is enabled, every request has to be handled this way, otherwise it stalls:
await page.setRequestInterception(true);
page.on("request", (request) => {
if (request.resourceType() === "image") request.abort();
else request.continue();
});
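To skip several kinds of resources at once, you can build the handler from a list of resource types. blockResourceTypes and its default list are hypothetical. The type names themselves (image, stylesheet, font, media, and so on) are values that request.resourceType() returns:

```javascript
// Creates a "request" handler that aborts the given resource types and lets
// everything else through. Each intercepted request is resolved exactly once.
const blockResourceTypes = (types = ["image", "stylesheet", "font", "media"]) => {
  const blocked = new Set(types);
  return (request) => {
    if (blocked.has(request.resourceType())) request.abort();
    else request.continue();
  };
};

// Usage:
// await page.setRequestInterception(true);
// page.on("request", blockResourceTypes(["image", "font"]));
```

Blocking heavy resources can speed up scraping, but screenshots taken with them blocked will look incomplete.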
Further Reading
- Puppeteer documentation.
- Web Performance Recipes With Puppeteer by Addy Osmani.