Locating Elements by CSS Selectors with Selenium

Locating and selecting elements on a web page is the key to web scraping with Selenium, and CSS selectors are one way to do it. In this article, I am going to show you how to locate and select elements from web pages using CSS selectors with the Selenium Python library. So, let’s get started.

Prerequisites:

To try out the commands and examples in this article, you must have:

1) A Linux distribution (preferably Ubuntu) installed on your computer.
2) Python 3 installed on your computer.
3) PIP 3 installed on your computer.
4) Python virtualenv package installed on your computer.
5) Mozilla Firefox or Google Chrome web browser installed on your computer.
6) Knowledge of how to install the Firefox Gecko Driver or Chrome Web Driver.

To fulfill requirements 4, 5, and 6, read my article Introduction to Selenium with Python 3 at LinuxHint.com.

You can find many articles on the other topics on LinuxHint.com. Be sure to check them out if you need any assistance.

Setting Up a Project Directory:

To keep everything organized, create a new project directory selenium-css-selector/ as follows:

$ mkdir -pv selenium-css-selector/drivers

Navigate to the selenium-css-selector/ project directory as follows:

$ cd selenium-css-selector/

Create a Python virtual environment in the project directory as follows:

$ virtualenv .venv

Activate the virtual environment as follows:

$ source .venv/bin/activate

Install the Selenium Python library using PIP3 as follows:

$ pip3 install selenium

Download and install all the required web drivers in the drivers/ directory of the project. I have explained the process of downloading and installing web drivers in my article Introduction to Selenium with Python 3. If you need any assistance, search on LinuxHint.com for that article.
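
Before moving on, it can help to confirm that the driver binary really is in the drivers/ directory and is executable. The following is a minimal sketch that uses only the Python standard library. The helper name locate_driver is my own, and the file name chromedriver assumes the default name of the downloaded binary.

```python
import os
from pathlib import Path


def locate_driver(name, drivers_dir="drivers"):
    """Return the path drivers_dir/name, or raise if it is missing or not executable."""
    path = Path(drivers_dir) / name
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download the driver first")
    if not os.access(path, os.X_OK):
        # A freshly extracted driver sometimes lacks the executable bit.
        raise PermissionError(f"{path} is not executable; try: chmod +x {path}")
    return path
```

Calling locate_driver("chromedriver") from the project directory either returns the path or raises an error that says what is wrong.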

Get CSS Selector using Chrome Developer Tool:

In this section, I am going to show you how to find the CSS selector of the web page element you want to select with Selenium using the built-in Developer Tool of the Google Chrome web browser.

To get the CSS selector using the Google Chrome web browser, open Google Chrome and visit the web site from which you want to extract data. Then, press the right mouse button (RMB) on an empty area of the page and click on Inspect to open the Chrome Developer Tool.

You can also press <Ctrl> + <Shift> + <I> to open the Chrome Developer Tool.

The Chrome Developer Tool should open.

To find the HTML representation of your desired web page element, click on the Inspect icon as marked in the screenshot below.

Then, hover over your desired web page element and press the left mouse button (LMB) to select it.

The HTML representation of the web element you have selected will be highlighted in the Elements tab of Chrome Developer Tool as you can see in the screenshot below.

To get the CSS selector of your desired element, select the element from the Elements tab of Chrome Developer Tool and right-click (RMB) on it. Then, select Copy > Copy selector as marked in the screenshot below.

I have pasted the CSS selector in a text editor. The CSS selector looks as shown in the screenshot below.

Get CSS Selector using Firefox Developer Tool:

In this section, I am going to show you how to find the CSS selector of the web page element you want to select with Selenium using the built-in Developer Tool of the Mozilla Firefox web browser.

To get the CSS selector using the Firefox web browser, open Firefox and visit the web site from which you want to extract data. Then, press the right mouse button (RMB) on an empty area of the page and click on Inspect Element (Q) to open the Firefox Developer Tool.

Firefox Developer Tool should be opened.

To find the HTML representation of your desired web page element, click on the Inspect icon as marked in the screenshot below.

Then, hover over your desired web page element and press the left mouse button (LMB) to select it.

The HTML representation of the web element you have selected will be highlighted in the Inspector tab of Firefox Developer Tool as you can see in the screenshot below.

To get the CSS selector of your desired element, select the element from the Inspector tab of Firefox Developer Tool and right-click (RMB) on it. Then, select Copy > CSS selector as marked in the screenshot below.

The CSS selector of your desired element should look something like this.

Extracting Data using CSS Selector with Selenium:

In this section, I am going to show you how to select web page elements and extract data from them using CSS selectors with the Selenium Python library.

First, create a new Python script ex00.py and type in the following lines of code.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # older Selenium releases: options.headless = True

browser = webdriver.Chrome(service=Service(executable_path="./drivers/chromedriver"), options=options)

browser.get("https://www.unixtimestamp.com/")

timestamp = browser.find_element(By.CSS_SELECTOR, 'h3.text-danger:nth-child(3)')
print('Current timestamp: %s' % (timestamp.text.split(' ')[0]))
browser.close()

Once you’re done, save the ex00.py Python script.

Lines 1-3 import all the required Selenium components.

Line 5 creates a Chrome Options object and line 6 enables headless mode for the Chrome web browser.

Line 8 creates a Chrome browser object using the chromedriver binary from the drivers/ directory of the project.

Line 10 tells the browser to load the website unixtimestamp.com.

Line 12 finds the element that has the timestamp data from the page using CSS selector and stores it in the timestamp variable.

Line 13 parses the timestamp data from the element and prints it on the console.

This is what the HTML structure of the UNIX timestamp data on unixtimestamp.com looks like.

Line 14 closes the browser.

Run the Python script ex00.py as follows:

$ python3 ex00.py

As you can see, the timestamp data is printed on the screen.
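
The split(' ')[0] step on line 13 keeps only the first space-separated token of the element text. If you want a number or a date instead of a string, a small helper can convert it. This is only a sketch: it assumes the element text starts with an integer UNIX timestamp in seconds, parse_timestamp is a hypothetical name (not part of Selenium), and the sample string is made up for illustration.

```python
from datetime import datetime, timezone


def parse_timestamp(element_text):
    """Return (seconds, UTC datetime) parsed from the first token of element_text."""
    first_token = element_text.split()[0]   # same idea as split(' ')[0] in ex00.py
    seconds = int(first_token)              # raises ValueError if the token is not an integer
    return seconds, datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up sample text standing in for timestamp.text
seconds, moment = parse_timestamp("86400 seconds since Jan 01 1970. (UTC)")
print(seconds, moment.isoformat())
```

In ex00.py, you would pass timestamp.text to the helper instead of the sample string.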

Here, I have used the browser.find_element(By, selector) method.

As we are using CSS selectors, the first parameter will be By.CSS_SELECTOR and the second parameter will be the CSS selector itself.

Instead of the browser.find_element() method, older Selenium releases also provided the browser.find_element_by_css_selector(selector) method, which only needs a CSS selector to work. The result is the same. Note that this method was deprecated in Selenium 4 and removed in Selenium 4.3, so use browser.find_element() in new scripts.

The browser.find_element() and browser.find_element_by_css_selector() methods find and select a single element from the web page. If you want to find and select multiple elements using CSS selectors, use the browser.find_elements() method or, on older Selenium releases, the browser.find_elements_by_css_selector() method.

The browser.find_elements() method takes the same arguments as the browser.find_element() method.

The browser.find_elements_by_css_selector() method takes the same argument as the browser.find_element_by_css_selector() method.
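
One practical difference between the two is worth knowing: find_element() raises a NoSuchElementException when nothing matches, while find_elements() returns an empty list. The sketch below relies on that to read an optional element without a try/except block. first_text is a hypothetical helper of my own. It passes the string "css selector", which is the value behind By.CSS_SELECTOR, so the function itself needs no Selenium import.

```python
def first_text(browser, selector, default=None):
    """Return the text of the first element matching selector, or default if none match."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR constant.
    matches = browser.find_elements("css selector", selector)
    return matches[0].text if matches else default
```

With the browser object from ex00.py, first_text(browser, 'h3.text-danger:nth-child(3)', default='n/a') would return the element text, or 'n/a' if the page layout has changed and the selector no longer matches.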

Let’s see an example of extracting a list of names using CSS selectors from random-name-generator.info with Selenium.

As you can see, the unordered list has the class name nameList. So, we can use the CSS selector .nameList li to select all the names from the web page.

Let’s go through an example of selecting multiple elements from the web page using CSS selectors.

Create a new Python script ex01.py and type in the following lines of code.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # older Selenium releases: options.headless = True

browser = webdriver.Chrome(service=Service(executable_path="./drivers/chromedriver"), options=options)

browser.get("http://random-name-generator.info/")

names = browser.find_elements(By.CSS_SELECTOR, '.nameList li')
for name in names:
    print(name.text)

browser.close()

Once you’re done, save the ex01.py Python script.

Lines 1-8 do the same setup as in the ex00.py Python script, so I am not going to explain them again.

Line 10 tells the browser to load the website random-name-generator.info.

Line 12 selects the name list using the browser.find_elements() method. This method uses the CSS selector .nameList li to find the name list. Then, the name list is stored in the names variable.

In lines 13 and 14, a for loop is used to iterate through the names list and print the names on the console.

Line 16 closes the browser.

Run the Python script ex01.py as follows:

$ python3 ex01.py

As you can see, the names are extracted from the web page and printed on the console.
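
The names list holds element objects, not strings, so it is often convenient to convert it to plain text while the browser is still open. Here is a minimal sketch, assuming each element exposes a .text attribute the way Selenium's WebElement does. element_texts is a hypothetical helper name.

```python
def element_texts(elements):
    """Return the stripped, non-empty text of each element, in the order given."""
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:                 # skip items that render as empty
            texts.append(text)
    return texts
```

In ex01.py, you could call element_texts(names) before browser.close() and then work with an ordinary list of strings.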

Instead of using the browser.find_elements() method, older Selenium releases also let you use the browser.find_elements_by_css_selector() method as before. This method only needs a CSS selector to work, and the result is the same, but it was removed in Selenium 4.3.

Basics of CSS Selectors:

You can always find the CSS selector of a web page element using the Developer Tool of the Firefox or Chrome web browser. This auto-generated CSS selector may not be what you want. At times, you may have to write your own CSS selector.

In this section, I am going to talk about the basics of CSS selectors so that you can understand what a certain CSS selector is selecting from a web page and write your custom CSS selector if needed.

If you want to select an element from the web page using the ID message, the CSS selector will be #message.

The CSS selector .green will select the elements that have the class name green.

If you want to select an element (class msg) inside another element (class container), the CSS selector will be .container .msg

The CSS selector .msg.success will select the element which has two CSS classes msg and success.

To select all the p tags, you can use the CSS selector p.

To select only the p tags inside the div tags, you can use the CSS selector div p

To select the p tags which are the direct children of the div tags, you can use the CSS selector div > p

To select all the span and p tags, you can use the CSS selector p, span

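To select the p tag that comes immediately after the div tag (its adjacent sibling), you can use the CSS selector div + p

To select all the p tags that come after the div tag and share the same parent (its following siblings), you can use the CSS selector div ~ p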
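
The four combinators (space, >, + and ~) are easy to mix up, so here is a small pure-Python emulation of what each one picks from a tiny made-up document. It is only an illustration built on the standard xml.etree.ElementTree module, not how browsers or Selenium implement selectors, and all function names are my own.

```python
import xml.etree.ElementTree as ET

# Made-up document: a div with a direct p and a nested p,
# followed by the siblings p, span, p.
DOC = ET.fromstring(
    "<body>"
    "<div><p id='a'/><section><p id='b'/></section></div>"
    "<p id='c'/><span id='s'/><p id='d'/>"
    "</body>"
)


def ids(elements):
    return [e.get("id") for e in elements]


def descendants(root, outer, inner):
    """Emulate `outer inner`: inner elements anywhere inside an outer element."""
    found = []
    for o in root.iter(outer):
        for e in o.iter(inner):
            if e is not o and e not in found:
                found.append(e)
    return found


def children(root, outer, inner):
    """Emulate `outer > inner`: inner elements that are direct children of an outer element."""
    return [e for o in root.iter(outer) for e in o.findall(inner)]


def siblings_after(root, first, second, adjacent):
    """Emulate `first + second` (adjacent=True) or `first ~ second` (adjacent=False)."""
    found = []
    for parent in root.iter():
        kids = list(parent)
        for i, kid in enumerate(kids):
            if kid.tag != first:
                continue
            later = kids[i + 1:i + 2] if adjacent else kids[i + 1:]
            for e in later:
                if e.tag == second and e not in found:
                    found.append(e)
    return found


print("div p  :", ids(descendants(DOC, "div", "p")))
print("div > p:", ids(children(DOC, "div", "p")))
print("div + p:", ids(siblings_after(DOC, "div", "p", adjacent=True)))
print("div ~ p:", ids(siblings_after(DOC, "div", "p", adjacent=False)))
```

In this document, div p matches both the direct and the nested p, div > p matches only the direct one, div + p matches only the p right after the div, and div ~ p matches every later p sibling.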

To select all the p tags that have the class name msg, you can use the CSS selector p.msg

To select all the span tags that have the class name msg, you can use the CSS selector span.msg

To select all the elements that have the attribute href, you can use the CSS selector [href]

To select the element that has the attribute name and the value of the name attribute is username, you can use the CSS selector [name="username"]

To select all the elements that have the attribute alt and the value of the alt attribute contains the whole word vscode (as one item in a space-separated list of words), you can use the CSS selector [alt~="vscode"]

To select all the elements that have the href attribute and the value of the href attribute starts with the string https, you can use the CSS selector [href^="https"]

To select all the elements that have the href attribute and the value of the href attribute ends with the string .com, you can use the CSS selector [href$=".com"]

To select all the elements that have the href attribute and the value of the href attribute contains the substring google, you can use the CSS selector [href*="google"]
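
The attribute operators differ only in how they compare the attribute value, which the following pure-Python sketch makes explicit. attribute_matches is a hypothetical function written for illustration. It compares case-sensitively and ignores the finer points of the CSS specification, so treat it as a memory aid and not as a selector engine.

```python
def attribute_matches(value, operator, expected=""):
    """Emulate how a CSS attribute selector compares an attribute value.

    value is the attribute's value, or None if the element lacks the attribute.
    """
    if value is None:
        return False
    if operator == "":        # [attr]: the attribute only has to exist
        return True
    if operator == "=":       # [attr="x"]: exact match
        return value == expected
    if operator == "~=":      # [attr~="x"]: x is one whole space-separated word
        return expected in value.split()
    if operator == "^=":      # [attr^="x"]: value starts with x
        return bool(expected) and value.startswith(expected)
    if operator == "$=":      # [attr$="x"]: value ends with x
        return bool(expected) and value.endswith(expected)
    if operator == "*=":      # [attr*="x"]: x appears anywhere in the value
        return bool(expected) and expected in value
    raise ValueError(f"unsupported operator: {operator!r}")


print(attribute_matches("vscode logo", "~=", "vscode"))     # whole word
print(attribute_matches("my-vscode-icon", "~=", "vscode"))  # only a substring
print(attribute_matches("my-vscode-icon", "*=", "vscode"))
```

The second call shows why ~= and *= are not interchangeable: my-vscode-icon contains vscode as a substring but not as a separate word.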

If you want to select the first li tag inside the ul tag, you can use the CSS selector ul li:first-child

If you want to select the first li tag inside the ul tag, you can also use the CSS selector ul li:nth-child(1)

If you want to select the last li tag inside the ul tag, you can use the CSS selector ul li:last-child

If you want to select the last li tag inside the ul tag, you can also use the CSS selector ul li:nth-last-child(1)

If you want to select the second li tag inside the ul tag starting from the beginning, you can use the CSS selector ul li:nth-child(2)

If you want to select the third li tag inside the ul tag starting from the beginning, you can use the CSS selector ul li:nth-child(3)

If you want to select the second li tag inside the ul tag starting from the end, you can use the CSS selector ul li:nth-last-child(2)

If you want to select the third li tag inside the ul tag starting from the end, you can use the CSS selector ul li:nth-last-child(3)
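
If the counting in :nth-child() and :nth-last-child() feels unfamiliar, it maps directly onto Python list indexing: positions start at 1, not 0, and :nth-last-child() counts from the end. Here is a minimal sketch, assuming every child of the ul is an li (the pseudo-classes count all sibling elements, whatever their tag). The function names and the sample list are made up.

```python
def nth_child(items, n):
    """Item that li:nth-child(n) would pick (1-based from the start), or None."""
    return items[n - 1] if 1 <= n <= len(items) else None


def nth_last_child(items, n):
    """Item that li:nth-last-child(n) would pick (1-based from the end), or None."""
    return items[-n] if 1 <= n <= len(items) else None


items = ["one", "two", "three", "four"]   # stands in for the li tags of a ul
print(nth_child(items, 1), nth_last_child(items, 1))   # first and last item
print(nth_child(items, 2), nth_last_child(items, 2))   # second from each end
```

So li:first-child corresponds to nth_child(items, 1) and li:last-child to nth_last_child(items, 1).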

These are the most common CSS selectors. You will find yourself using them in almost every Selenium project. There are many more CSS selectors. You can find a list of all of them in the w3schools.com CSS Selectors Reference.

Conclusion:

In this article, I have shown how to locate and select web page elements using CSS selectors with Selenium. I have also discussed the basics of CSS selectors. You should be able to use CSS selectors comfortably for your Selenium projects.

About the author

Shahriar Shovon

Freelancer & Linux System Administrator. Also loves Web API development with Node.js and JavaScript. I was born in Bangladesh. I am currently studying Electronics and Communication Engineering at Khulna University of Engineering & Technology (KUET), one of the demanding public engineering universities of Bangladesh.