Introduction to Puppeteer

Jan 31, 2020

Puppeteer is Google's officially supported library for controlling Chrome from Node.js. You can open Chrome from Node.js, navigate to Google, search for something, and see the results. Or you can run Puppeteer in headless mode and make it run in the background.

For example, here's how you can make Chrome load Google's home page using Puppeteer and Node.js:

const puppeteer = require('puppeteer');

run().then(() => console.log('Done')).catch(error => console.log(error));

async function run() {
  // Setting `headless: false` opens up a browser
  // window so you can watch what happens.
  const browser = await puppeteer.launch({ headless: false });

  // Open a new page and navigate to google.com
  const page = await browser.newPage();
  await page.goto('https://google.com');

  // Wait 5 seconds
  await new Promise(resolve => setTimeout(resolve, 5000));

  // Close the browser and exit the script
  await browser.close();
}

The output looks like this:

Evaluating JavaScript

Puppeteer pages have a handy evaluate() function that lets you execute JavaScript in the Chrome window. The evaluate() function is the most flexible way to interact with Puppeteer, because it lets you control Chrome using browser APIs like document.querySelector().

For example, the below script searches for "JavaScript" on Google, and gets the top 10 results.

const puppeteer = require('puppeteer');

// Run in the background (headless mode)
const browser = await puppeteer.launch({ headless: true });

// Navigate to Google
const page = await browser.newPage();
await page.goto('https://google.com');

// Type "JavaScript" into the search bar
await page.evaluate(() => {
  document.querySelector('input[name="q"]').value = 'JavaScript';
});

// Click on the "Google Search" button and wait for the page to load
await Promise.all([
  page.waitForNavigation(),
  page.evaluate(() => {
    document.querySelector('input[value="Google Search"]').click();
  })
]);

// Get all the search result URLs
const links = await page.evaluate(function getUrls() {
  return Array.from(document.querySelectorAll('a cite').values()).
    map(el => el.innerHTML);
});

await browser.close();

Using Puppeteer with a Local Web Server

Because Node.js uses an event loop, it is easy to start an Express server and connect Puppeteer to your Express server in the same script. This means it is easy to test Vue apps with Puppeteer.

const express = require('express');
const puppeteer = require('puppeteer');

// Start an Express app that renders a Vue app with a counter
const app = express();
app.get('/', function(req, res) {
  res.send(`
  <html>
    <body>
      <script src="https://unpkg.com/vue/dist/vue.js"></script>

      <div id="content"></div>

      <script type="text/javascript">      
        const app = new Vue({
          data: () => ({ count: 0 }),
          template: \`
            <div>
              <div id="count">
                Count: {{count}}
              </div>
              <button v-on:click="++count">Increment</button>
            </div>
          \`
        });
        app.$mount('#content');
      </script>
    </body>
  </html>
  `);
});
const server = await app.listen(3000);

// Run in the background (headless mode)
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('http://localhost:3000');

// Load the current count
let count = await page.evaluate(() => {
  return document.querySelector('#count').innerHTML.trim();
});
count; // 'Count: 0'

// Increment the count and check that the counter was incremented
await page.evaluate(() => {
  document.querySelector('button').click();
});

count = await page.evaluate(() => {
  return document.querySelector('#count').innerHTML.trim();
});
count; // 'Count: 1'

await browser.close();
await server.close();

Did you find this tutorial useful? Say thanks by starring our repo on GitHub!

More Fundamentals Tutorials