Puppeteer VS Selenium

When it comes to automated web testing, Puppeteer and Selenium are the two names that come up most often. One of the main reasons they are so well known is their ability to drive headless browsers. So before we go any further, let’s take a quick look at what headless browsers are and what advantages they offer.

In basic terms, headless browsers are browsers that can be used to test the usability of web pages and to carry out browser interactions, just as you would with your regular browser. The only difference is that there is no Graphical User Interface (GUI), and they are usually run from the terminal.

Headless browsers:

  • greatly reduce resource usage
  • run faster than browsers with a GUI
  • are ideal for web scraping
  • can be used to monitor the performance of network applications

Now that we have covered a key capability the two tools share, we can look at each of them in turn.

Puppeteer

Puppeteer is a Node library from Google that provides a simple API to control headless Chrome. Through Puppeteer, common tasks such as typing in inputs, clicking on buttons, testing usability of web pages and even web scraping can be carried out easily.
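As a rough illustration of that API, the sketch below types into an input and clicks a button in headless Chrome. The CSS selectors and the helper names `buildFormSteps` and `runFormSteps` are made up for this example, and it assumes that the click triggers a page navigation; adapt all of them to the page you are automating.

```javascript
// Minimal sketch: drive a search form in headless Chrome with Puppeteer.
// The selectors below are placeholders, not taken from any real site.

// Describe the interactions as data so they can be inspected without a browser.
function buildFormSteps(query) {
  return [
    { action: 'type', selector: 'input[name="q"]', value: query },
    { action: 'click', selector: 'button[type="submit"]' },
  ];
}

// Replay the steps against a page and return the title of the resulting page.
async function runFormSteps(url, steps) {
  // Required lazily so buildFormSteps works even if Puppeteer is not installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    for (const step of steps) {
      if (step.action === 'type') {
        await page.type(step.selector, step.value);
      } else if (step.action === 'click') {
        // Assumption: the click submits the form and navigates to a new page.
        await Promise.all([page.waitForNavigation(), page.click(step.selector)]);
      }
    }
    return await page.title();
  } finally {
    // Always close the browser, even if a step fails.
    await browser.close();
  }
}
```

With a hypothetical URL, `runFormSteps('https://example.com/search', buildFormSteps('puppeteer'))` resolves with the title of the page reached after the form is submitted.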

Puppeteer is an official project from the Chrome team and uses the Chrome DevTools Protocol, the same protocol that Chrome DevTools itself relies on. The library supports the modern JavaScript syntax available in Google Chrome.

Setup

Installing and getting started with Puppeteer is very easy. Since Puppeteer is a Node library, it can be installed using the npm tool.

Installation can be done with the command below:

npm i puppeteer

Running the command above installs Puppeteer and also downloads a recent version of Chromium that works with the API.

The size of the Chromium download varies by operating system:

  • ~170MB for Mac
  • ~282MB for Linux
  • ~280MB for Windows

After installing Puppeteer, you can consult the official documentation for more on getting started, as well as additional code examples.

Features

While Puppeteer’s ability to launch a headless browser is the feature that has earned it some fame, it is not the only one that makes it awesome. Puppeteer has a number of other useful features, so let’s take a quick look at some of them.

Easy Automation:

While there are other tools that can be used for web automation, Puppeteer stands out. Because it targets a single browser, headless Chrome, it can carry out web automation tasks very efficiently. Puppeteer also works well with popular unit testing libraries such as Mocha and Jasmine.

Screenshot Testing:

This is a vital feature for any automated web testing task. Screenshots are important because they help keep track of the results of interactions with elements on a web page. Libraries such as Puppeteer-screenshot-tester build on Puppeteer to compare the screenshots generated while testing. Aside from screenshots, Puppeteer can also generate PDFs of the web pages being tested.
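Here is a minimal sketch of capturing both a screenshot and a PDF of a page. The `outputPaths` naming scheme and the `capture` helper are assumptions for illustration only; the Puppeteer calls used are `page.screenshot` and `page.pdf`.

```javascript
const fs = require('fs');
const path = require('path');

// Derive output file names from the page's host name (hypothetical naming scheme).
function outputPaths(url, outDir) {
  const { hostname } = new URL(url);
  const base = hostname.replace(/[^a-z0-9]+/gi, '_');
  return {
    png: path.join(outDir, `${base}.png`),
    pdf: path.join(outDir, `${base}.pdf`),
  };
}

// Save a full-page screenshot and an A4 PDF of the given URL.
async function capture(url, outDir) {
  // Required lazily so outputPaths can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const files = outputPaths(url, outDir);
  fs.mkdirSync(outDir, { recursive: true });
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly quiet so late-loading content is rendered.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: files.png, fullPage: true });
    await page.pdf({ path: files.pdf, format: 'A4' });
    return files;
  } finally {
    await browser.close();
  }
}
```

The saved PNG files can then be handed to a comparison library to detect visual changes between test runs.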

Performance Testing:

Chrome provides DevTools that can record the Performance Timeline of web pages, and Puppeteer takes advantage of this too. With Puppeteer, timeline traces of websites can be captured to diagnose performance issues. Because Puppeteer’s high-level API sits on top of the Chrome DevTools Protocol, it also gives users the ability to control service workers and test how websites are cached.
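The sketch below records a trace with `page.tracing.start` and `page.tracing.stop`, then lists the longest events in it. The helper names are hypothetical, and the summary assumes the trace file uses Chrome's trace event JSON layout: a `traceEvents` array in which complete events carry `ph: 'X'` and a `dur` in microseconds.

```javascript
const fs = require('fs');

// Record a timeline trace while loading the page, then return the parsed trace.
async function recordTrace(url, tracePath) {
  // Required lazily so slowestEvents can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.tracing.start({ path: tracePath, screenshots: true });
    await page.goto(url, { waitUntil: 'load' });
    await page.tracing.stop();
  } finally {
    await browser.close();
  }
  return JSON.parse(fs.readFileSync(tracePath, 'utf8'));
}

// Return the `limit` longest complete events as { name, ms } pairs.
// Assumption: event durations are expressed in microseconds.
function slowestEvents(trace, limit) {
  return trace.traceEvents
    .filter((event) => event.ph === 'X' && typeof event.dur === 'number')
    .sort((a, b) => b.dur - a.dur)
    .slice(0, limit)
    .map((event) => ({ name: event.name, ms: event.dur / 1000 }));
}
```

The trace file written to `tracePath` can also be loaded into the Performance panel of Chrome DevTools for a visual inspection.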

Web Scraping:

A discussion of features would not be complete without acknowledging that Puppeteer can be used for web scraping. Learning to use Puppeteer as a web scraper is quite easy; the API documentation is a good place to start.
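As a small scraping sketch, the code below collects the text and target of every link on a page using `page.$$eval`. The function names are assumptions for this example. Note that `extractLinks` is run inside the page, so it must not reference any variables from the surrounding Node script.

```javascript
// Turn a list of anchor elements into plain { text, href } objects.
// It only reads `textContent` and `href`, so it also works on plain objects.
function extractLinks(anchors) {
  return anchors
    .map((a) => ({ text: (a.textContent || '').trim(), href: a.href }))
    .filter((link) => link.href);
}

// Load the page in headless Chrome and return its links.
async function scrapeLinks(url) {
  // Required lazily so extractLinks can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // $$eval runs extractLinks in the browser on all elements matching the selector.
    return await page.$$eval('a[href]', extractLinks);
  } finally {
    await browser.close();
  }
}
```

Because the page is rendered by a real browser before the links are read, content inserted by JavaScript is picked up as well.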

Pros

  1. Works fine for visual testing.
  2. Great for end to end testing.
  3. Fast when compared to Selenium.
  4. Can take screenshots of webpages.
  5. More control over tests through Chrome.
  6. Can test offline mode.

Cons

  1. Supports only JavaScript (Node)
  2. Supports only Chrome

Selenium

Selenium is a powerful web testing framework that can automate web applications for testing purposes. It is also known for its ability to automate web-based administration tasks.

Selenium comes in two parts: Selenium WebDriver, for creating powerful browser-based automation suites and tests, and Selenium IDE, for creating quick bug-reproduction scripts.

Like Puppeteer, Selenium also supports headless browsers.
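For comparison, here is a minimal sketch of running Chrome headlessly through Selenium's Node binding, `selenium-webdriver`. It assumes Chrome and a matching ChromeDriver are available on the machine; the helper names and the window size are illustrative choices, not requirements.

```javascript
// Command-line flags passed to Chrome; the window size is an arbitrary example.
function headlessChromeArgs(width, height) {
  return ['--headless', `--window-size=${width},${height}`];
}

// Open the URL in headless Chrome and return the page title.
async function titleOf(url) {
  // Required lazily so headlessChromeArgs works without Selenium installed.
  const { Builder } = require('selenium-webdriver');
  const chrome = require('selenium-webdriver/chrome');
  const options = new chrome.Options().addArguments(...headlessChromeArgs(1280, 800));
  const driver = await new Builder()
    .forBrowser('chrome')
    .setChromeOptions(options)
    .build();
  try {
    await driver.get(url);
    return await driver.getTitle();
  } finally {
    // Quit the driver so the browser process does not linger.
    await driver.quit();
  }
}
```

The same pattern applies to the other language bindings; only the syntax for building the driver and its options differs.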

Setup

Unlike Puppeteer, Selenium is not straightforward to set up. It supports many languages and different browsers, so the setup depends on the combination you choose.

The official Selenium documentation has tutorials on how to set up the Selenium bindings for each supported language.

Aside from supporting different languages, Selenium also supports multiple browsers. Unlike Puppeteer, which downloads Chromium during installation, Selenium may require you to install a web driver for the browser of your choice.

Web drivers are available for Mozilla Firefox (geckodriver) and Google Chrome (ChromeDriver).

If you wish to use Selenium IDE too, it is available for multiple browsers, including Mozilla Firefox and Google Chrome.

Features

Its ability to work with headless browsers has helped make it arguably the most popular web automation tool, but there are other features that make it powerful.

Multi-Language Support:

This is one very important Selenium feature. With support for multiple languages, more developers can use the tool for their web automation testing tasks. While one might expect multi-language support to make it slow, Selenium still runs at a good speed because WebDriver does not require starting up a separate Selenium server.

Multi-Platform Support:

Just as Selenium is not restricted by language barriers, it is not restricted by platform barriers either. It is no secret that web applications behave differently on different platforms. Selenium lets testers run tests across the major web browsers so that users get a smooth experience whichever browser they use. Aside from browsers, Selenium can also be used to test mobile apps on Android, iOS, Windows and BlackBerry.
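A cross-browser check can be sketched as a loop over browser names, as below. This assumes that Chrome and Firefox, together with their web drivers, are installed; `titleIn`, `compareTitles` and `allAgree` are hypothetical helpers, and comparing page titles merely stands in for whatever assertions a real test suite would make.

```javascript
// Open the URL in the named browser and return the page title.
async function titleIn(browserName, url) {
  // Required lazily so allAgree can be used without Selenium installed.
  const { Builder } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser(browserName).build();
  try {
    await driver.get(url);
    return await driver.getTitle();
  } finally {
    await driver.quit();
  }
}

// Collect the title seen by each browser, one browser at a time.
async function compareTitles(url, browsers = ['chrome', 'firefox']) {
  const results = {};
  for (const name of browsers) {
    results[name] = await titleIn(name, url);
  }
  return results;
}

// True when every browser reported the same value.
function allAgree(results) {
  return new Set(Object.values(results)).size <= 1;
}
```

A test could then call `compareTitles` on a page under test and fail when `allAgree` returns false for the result.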

Recording Tool:

With Selenium IDE, it is easy to record web automation tests. Selenium IDE lets testers use its recording capability as well as its autocomplete support and command navigation. The recording tool stopped working on Firefox 55 and later versions, but there are other Firefox plugins that serve the same purpose. The ability to record tests therefore remains a major Selenium feature.

Web Scraping:

While Selenium is used for testing web applications, it also scales well as a web scraper. Selenium can be used to scrape AJAX websites and the most difficult websites to scrape, provided you can understand the HTML structure. You can check out this tutorial on using Selenium for web scraping with Python.
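To sketch how scraping an AJAX page might look, the code below waits for dynamically loaded elements to appear before reading their text. It is written in JavaScript to stay consistent with the other examples here; the helper names, the default timeout and the choice of Chrome are assumptions.

```javascript
// Trim whitespace and drop empty strings from the scraped text values.
function cleanTexts(texts) {
  return texts.map((text) => text.trim()).filter((text) => text.length > 0);
}

// Return the visible text of every element matching the CSS selector.
async function scrapeTexts(url, selector, timeoutMs = 10000) {
  // Required lazily so cleanTexts can be used without Selenium installed.
  const { Builder, By, until } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(url);
    // Explicit wait: AJAX content may arrive after the initial page load.
    await driver.wait(until.elementLocated(By.css(selector)), timeoutMs);
    const elements = await driver.findElements(By.css(selector));
    const texts = await Promise.all(elements.map((element) => element.getText()));
    return cleanTexts(texts);
  } finally {
    await driver.quit();
  }
}
```

Choosing the right selector is where understanding the page's HTML structure comes in: the wait only succeeds once an element matching it exists.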

Pros

  1. Multi-platform support.
  2. Multi-language support.
  3. Ability to record tests.
  4. Can take screenshots too.
  5. Huge community of users.

Cons

  1. Slow when compared to Puppeteer.
  2. Limited control over tests when compared to Puppeteer.

Conclusion

If you are not concerned about testing web pages on browsers other than Chrome, then you will be fine working with Puppeteer, provided you are able to work with JavaScript (Node). However, if you need to cover multiple browsers and platforms, then Selenium is the obvious choice. As for web scraping, the two tools are evenly matched, although it should be noted that Puppeteer could be faster than Selenium.

Whichever tool you choose should serve you well, so just enjoy writing your automation scripts.

About the author

Habeeb Kenny Shopeju

I love building software and am very proficient with Python and JavaScript. I'm very comfortable with the Linux terminal and interested in machine learning. In my spare time, I write prose, poetry and tech articles.