Python Web Scraping

Python Requests Module Tutorial

Requests is a popular Apache 2.0 licensed Python module for talking to HTTP servers, such as web servers, to download content for parsing or to submit web forms automatically. With it you can make GET and POST requests, pass parameters in URLs, read response content, and add custom headers.

In this article, we’ll look at the Requests module and its basic operations through a few examples.

Installation

Requests supports Python versions 2.6-2.7 and 3.3-3.6. It is an external module, so you have to install it by running the following in your command prompt or terminal (pip install requests also works if you do not use pipenv):

$ pipenv install requests

Before we move on you need to make sure of two things:

- The Requests library is installed properly. If it is not, follow the installation guide (http://docs.python-requests.org/en/master/user/install/#install)

- The Requests library is up to date. If you are unsure, check the release notes (http://docs.python-requests.org/en/master/community/updates/#updates)
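
A quick way to confirm both points is to import the library and print its version. This is only a minimal sketch: the variable name major is chosen here for illustration, and which version counts as current is left for you to compare against the release notes linked above.

```python
import requests

# If this import fails with ImportError, the library is not installed
# in the environment that is running the script.
print("Requests version:", requests.__version__)

# The first component of the version string is the major version.
major = int(requests.__version__.split(".")[0])
print("Major version:", major)
```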

GET and POST Requests

Start off by importing requests. We will then fetch a web page with a GET request:

import requests
R_webpage = requests.get('http://www.dataversity.net/')

R_webpage is a Response object. Everything the server returned for the page, such as the status code, the headers and the body, can be read from this object.

Now, if you want to make a POST request:

import requests
R_post = requests.post('http://www.dataversity.net/', data={'key': 'value'})
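
To see what these two calls put on the wire, one option is to build the requests without sending them. The sketch below uses requests.Request and its prepare() method for that, so it runs without a network connection. The names get_req and post_req are illustrative and not part of the examples above.

```python
import requests

# Build the requests but do not send them; prepare() returns the
# exact method, URL, headers and body that would be transmitted.
get_req = requests.Request("GET", "http://www.dataversity.net/").prepare()
post_req = requests.Request(
    "POST", "http://www.dataversity.net/", data={"key": "value"}
).prepare()

# A plain GET carries no body.
print(get_req.method, get_req.url, get_req.body)

# A POST with a dict passed as data= is sent as a form-encoded body.
print(post_req.method, post_req.body)
print(post_req.headers["Content-Type"])
```

The dictionary passed as data is form-encoded into the request body, which is the same format a browser uses when it submits a simple HTML form.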

See how easy it is to make requests. Let’s move on to passing parameters in URLs:

Passing parameters in URLs

Query-string parameters do not have to be written into the URL by hand. Requests lets you supply them as a dictionary of strings through the params keyword argument and encodes them into the URL for you.
See the following example to get a clear idea:

import requests
R_par = requests.get('http://www.dataversity.net', params={'key0': 'value0', 'key1': 'value1'})
print(R_par.url)

Printing R_par.url shows the final URL, so you can check that the parameters were encoded correctly.
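
The encoding can also be inspected without contacting the server. The sketch below wraps that in a small helper, build_url, which is a hypothetical name introduced here. It also shows two behaviours of params: a list value repeats the key, and a key whose value is None is left out.

```python
import requests


def build_url(base_url, params):
    """Return the URL Requests would use for a GET with these params.

    The request is only prepared, never sent.
    """
    return requests.Request("GET", base_url, params=params).prepare().url


# Two plain string parameters.
simple = build_url("http://www.dataversity.net/", {"key0": "value0", "key1": "value1"})

# A list value produces one key=value pair per item.
repeated = build_url("http://www.dataversity.net/", {"tag": ["python", "http"]})

# A parameter set to None is dropped from the query string.
skipped = build_url("http://www.dataversity.net/", {"key0": "value0", "key1": None})

print(simple)
print(repeated)
print(skipped)
```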

Response Content

The body of the server’s response can be read as text:

import requests
R_Content = requests.get('http://www.dataversity.net')
print(R_Content.text)

Requests decodes the body it receives from the server, using the encoding it infers from the response headers, and exposes the result as a string through the text attribute.
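
Besides text, a Response object carries other fields that are worth checking before the body is parsed. The following is a minimal sketch rather than a fixed recipe: summarize and fetch_text are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import requests


def summarize(response):
    """Collect a few commonly inspected fields from a Response object."""
    return {
        "status": response.status_code,        # e.g. 200 or 404
        "ok": response.ok,                     # False for 4xx/5xx status codes
        "encoding": response.encoding,         # encoding used to build .text
        "content_type": response.headers.get("Content-Type"),
        "text_length": len(response.text),     # decoded characters
        "byte_length": len(response.content),  # raw bytes as received
    }


def fetch_text(url, timeout=10):
    """Download a page and return its decoded text.

    raise_for_status() raises requests.HTTPError on a 4xx/5xx answer,
    so an error page is not silently treated as content.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Example usage (performs a real network request):
# print(summarize(requests.get("http://www.dataversity.net", timeout=10)))
# print(fetch_text("http://www.dataversity.net")[:200])
```

Passing a timeout is a deliberate choice in this sketch: without one, a request can wait indefinitely on a server that never answers.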

Custom Headers

Custom headers can be added to a request by passing a dictionary of header names and values through the headers parameter.

import requests
R_head = requests.get('http://www.dataversity.net', headers={'key': 'value'})

Replace key and value with the header name and value you need.
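
As a more concrete illustration, the sketch below sets a User-Agent and an Accept-Language header and prepares the request through a Session without sending it, so the final header set can be inspected offline. The header values, including the my-scraper/0.1 string, are placeholders chosen for this example.

```python
import requests

session = requests.Session()

request = requests.Request(
    "GET",
    "http://www.dataversity.net/",
    headers={"User-Agent": "my-scraper/0.1", "Accept-Language": "en"},
)

# prepare_request() merges our headers with the session defaults
# (such as Accept-Encoding) without sending anything.
prepared = session.prepare_request(request)

for name, value in prepared.headers.items():
    print(name + ": " + value)

# Header lookups are case-insensitive.
print(prepared.headers["user-agent"])
```

Headers passed for a single request take precedence over the session defaults of the same name, which is why the custom User-Agent replaces the library’s own.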

Conclusion

This article gave a basic introduction to the Python Requests module and how it works. Practicing the examples above on your own, and adding, removing and substituting pieces, will give you a better feel for it. If you have made it this far, congratulations: you have learned how to make basic requests to a server, pass parameters in URLs, read and display response content, and send custom headers. These skills are very useful when you scrape web pages for information.

About the author

Talha Saif Malik

I’m a computer scientist currently pursuing my Master’s in Computer Science at COMSATS Institute of Information Technology Islamabad, which is the No. 1 ranked IT university in Pakistan. I have done research in multiple domains and I’m well aware of the WHATs and HOWs of research. My aim is to contribute to the research world as much as possible and change the world.