Interfacing with GitHub API using Python 3

GitHub as a web application is a huge and complex entity. Think about all the repositories, users, branches, commits, comments, SSH keys and third party apps that are a part of it. Moreover, there are multiple ways of communicating with it. There are desktop apps for GitHub, extensions for Visual Studio Code and Atom Editor, git cli, Android and iOS apps to name a few.

People at GitHub, and third party developers alike, can’t possibly manage all this complexity without a common interface. This common interface is what we call the GitHub API. Every GitHub utility, whether a CLI, the web UI or something else, uses this one common interface to manage resources (resources being entities like repositories, SSH keys, etc.).

In this tutorial we will learn a few basics of how one interfaces with an API, using GitHub API v3 and Python 3. The latest v4 of the GitHub API requires you to learn about GraphQL, which results in a steeper learning curve. So I will stick to version 3, which is still active and pretty popular.

How to talk to a web API

Web APIs are what enable you to use all the services offered by a web app, like GitHub, programmatically, using a language of your choice. We are going to use Python for our use case here. Technically, you can do everything you do on GitHub using the API, but we will restrict ourselves to only reading publicly accessible information.

Your Python program will be talking to an API just the same way as your browser talks to a website. That is to say, mostly via HTTPS requests. These requests will contain different ‘parts’, starting from the method of the request [GET, POST, PUT, DELETE], the URL itself, a query string, an HTTP header and a body or a payload. Most of these are optional. We will however need to provide a request method and the URL to which we are making the request.
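To make these parts concrete, here is a minimal sketch that builds a request object and prints each part without sending anything over the network. It uses the standard library's urllib instead of the requests module used in the rest of this tutorial, and the username octocat and the per_page query parameter are placeholder values picked only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values, chosen only to illustrate the parts of a request.
base_url = "https://api.github.com/users/octocat/repos"
query_string = urlencode({"per_page": 5})
headers = {"Accept": "application/vnd.github.v3+json"}

# Building the object does not send anything; it only describes the request.
request = Request(base_url + "?" + query_string, headers=headers, method="GET")

print(request.get_method())    # the request method
print(request.full_url)        # the URL, including the query string
print(request.header_items())  # the HTTP headers
print(request.data)            # the body or payload (None for this GET)
```

Only the method and the URL are strictly required here; the query string, headers and body could all have been left out.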

What these are and how they are represented in an HTTPS request is something we will see gradually as we start writing Python scripts to interact with GitHub.

An Example

Adding SSH keys to a newly created server is always a clumsy process. Let’s write a Python script that will retrieve your public SSH keys from GitHub and add them to the authorized_keys file on any Linux or Unix server where you run this script. If you don’t know how to generate or use SSH keys, here is an excellent article on how to do exactly that. I will assume that you have created and added your own public SSH keys to your GitHub account.

A very simple and naive Python implementation of the task described above is shown below:

import requests
import os

# Getting user input
unix_user = input("Enter your Unix username: ")
github_user = input("Enter your GitHub username: ")

# Making sure the .ssh directory exists
# (assumes home directories live under /home/, as on most Linux systems)
ssh_dir = '/home/'+unix_user+'/.ssh/'
os.makedirs(ssh_dir, exist_ok=True)

# Sending a request to the GitHub API and storing the response in a variable named 'response'
api_root = "https://api.github.com"
request_header = {'Accept':'application/vnd.github.v3+json'}
response = requests.get(api_root+'/users/'+github_user+'/keys', headers = request_header)
response.raise_for_status()  # Stop here if GitHub answered with an error status

# Processing the response and appending keys to the authorized_keys file
# (the 'with' block closes the file once all keys are written)
with open(ssh_dir+'authorized_keys','a') as authorized_keys_file:
    for i in response.json():
        authorized_keys_file.write(i['key']+'\n')

Let’s ignore Python file handling and miscellaneous details and look strictly at the request and response. First, we imported the requests module with import requests. This library allows us to make API calls very easily. It is also one of the best examples of an open source project done right. Here’s the official site in case you want to have a closer look at the docs.

Next we set a variable api_root.

api_root = "https://api.github.com"

This is the common prefix of all the URLs to which we will be making API calls. So instead of typing “https://api.github.com” every time we need to access https://api.github.com/users or https://api.github.com/users/<username>, we just write api_root+'/users/' or api_root+'/users/<username>', as shown in the code snippet.
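If you find yourself building many such URLs, the concatenation can be wrapped in a small helper. The sketch below is one possible way to do it; the function name user_keys_url is made up for this example, and the call to quote() is an extra precaution (not part of the original script) that escapes characters which are not safe inside a URL path.

```python
from urllib.parse import quote

api_root = "https://api.github.com"

def user_keys_url(github_user):
    """Build the URL that lists a user's public SSH keys (hypothetical helper)."""
    # quote() escapes characters that are not safe inside a URL path segment.
    return api_root + "/users/" + quote(github_user, safe="") + "/keys"

print(user_keys_url("octocat"))
```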

Next, we set the header in our HTTPS request, indicating that we expect responses from version 3 of the API and that they should be JSON formatted. GitHub respects this header information.

1.  GET Request

So now that we have our URL and (an optional) header information stored in different variables, it’s time to make the request.

response = requests.get(api_root+'/users/'+github_user+'/keys', headers = request_header)

The request is of type ‘get’ because we are reading publicly available information from GitHub. If you were writing something under your GitHub user account, you would use POST. Other methods serve other purposes; DELETE, for example, is for deleting resources like repositories.
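To see how the method changes the meaning of a request, the sketch below describes three requests related to SSH keys without sending any of them. It uses the standard library's urllib rather than requests so that nothing leaves your machine. The /user/keys paths reflect my reading of the GitHub v3 documentation and should be checked there; the key title, key text and the id 123 are invented placeholders, and the POST and DELETE requests would additionally need authentication to succeed.

```python
import json
from urllib.request import Request

api_root = "https://api.github.com"

# GET: read the public keys of a user (octocat is a placeholder username).
read_keys = Request(api_root + "/users/octocat/keys", method="GET")

# POST: create a key under the authenticated account; the data travels in the body.
payload = json.dumps({"title": "example-laptop", "key": "ssh-ed25519 EXAMPLEKEY"})
add_key = Request(api_root + "/user/keys", data=payload.encode("utf-8"), method="POST")

# DELETE: remove a key by its numeric id (123 is a made-up id).
remove_key = Request(api_root + "/user/keys/123", method="DELETE")

for request in (read_keys, add_key, remove_key):
    print(request.get_method(), request.full_url)
```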

2.  API Endpoint

The API endpoint that we are reaching out for is:

https://api.github.com/users/<username>/keys

Each GitHub resource has its own API endpoint. Your requests for GET, PUT, DELETE, etc are then made against the endpoint you supplied. Depending on the level of access you have, GitHub will then either allow you to go through with that request or deny it.
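Whether a request was allowed or denied shows up in the HTTP status code of the response, which the requests module exposes as response.status_code. The following is a rough sketch of how such a code could be interpreted; the function name describe_outcome and the wording of its categories are my own, and it only handles standard status codes.

```python
from http import HTTPStatus

def describe_outcome(status_code):
    """Give a rough, human-readable reading of an HTTP status code."""
    phrase = HTTPStatus(status_code).phrase
    if 200 <= status_code < 300:
        return "allowed (" + phrase + ")"
    if status_code in (401, 403):
        return "denied (" + phrase + ")"
    if status_code == 404:
        return "not found, or hidden from you (" + phrase + ")"
    return "other (" + phrase + ")"

for code in (200, 401, 403, 404):
    print(code, describe_outcome(code))
```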

Most organizations and users on GitHub make a large amount of information public and readable. For example, my GitHub user account has a couple of public repositories and public SSH keys that anyone can read (even without a GitHub user account). If you want more fine-grained control of your personal account, you can generate a “Personal Access Token” to read and write privileged information stored in your personal GitHub account. If you are writing a third party application, meant to be used by users other than you, then your application would require an OAuth token of the said user.

But as you can see, a lot of useful information can be accessed without creating any token.
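For the cases where you do need a token, it is sent along as one more HTTP header. The sketch below shows one way to build the headers dictionary that could be passed to requests.get(); it assumes the token is stored in an environment variable named GITHUB_TOKEN (a name picked for this example) and that the "token <value>" form of the Authorization header applies to your setup, which is worth confirming in the GitHub documentation.

```python
import os

def build_headers(token=None):
    """Return request headers, adding authorization only when a token is given."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        # Assumed header format for a Personal Access Token.
        headers["Authorization"] = "token " + token
    return headers

# GITHUB_TOKEN is a hypothetical variable name; unset means anonymous access.
request_header = build_headers(os.environ.get("GITHUB_TOKEN"))
print(sorted(request_header))
```

Keeping the token in the environment rather than in the script means it does not end up in version control by accident.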

3.  Response

The response returned from the GitHub API server is stored in the variable named response. The entire response can be read in several ways, as documented here. We explicitly asked GitHub for JSON content, so we will process the response as JSON. To do this we call the json() method of the response object, which decodes the content into native Python objects like dictionaries and lists.

You can see the keys being appended to the authorized_keys file in this for loop:

for i in response.json():
    authorized_keys_file.write(i['key']+'\n')

If you print the response.json() object, you will notice that it is a Python list with Python dictionaries as members. Each dictionary has a key named ‘key’ whose value is one of your public SSH keys. So you can append these values one by one to your authorized_keys file. And now you can easily SSH into your server from any computer that has any one of the private SSH keys corresponding to the public keys we just appended.
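The shape of that list can be explored offline. In the sketch below, json.loads() stands in for response.json(), and the sample body is invented: the key strings are fake and the 'id' field is my assumption about what else each dictionary contains. The helper new_keys is a hypothetical addition that skips keys already present in the file, so running the script twice would not append duplicates.

```python
import json

# Invented sample, shaped like the list of dictionaries described above.
sample_body = (
    '[{"id": 1, "key": "ssh-rsa EXAMPLEKEY1"},'
    ' {"id": 2, "key": "ssh-ed25519 EXAMPLEKEY2"}]'
)

def new_keys(decoded, already_present):
    """Return keys from the decoded response that are not in the file yet."""
    seen = {line.strip() for line in already_present}
    return [item["key"] for item in decoded if item["key"] not in seen]

decoded = json.loads(sample_body)                 # a list of dictionaries
existing_lines = ["ssh-rsa EXAMPLEKEY1\n"]        # pretend file contents
print(new_keys(decoded, existing_lines))
```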

Exploring Further

A lot of work with APIs involves careful inspection of the API documentation itself, more than writing lines of code. In the case of GitHub, the documentation is one of the finest in the industry. But reading up on API docs and making API calls using Python is rather uninteresting as a standalone activity.

Before you go any further, I would recommend that you come up with one task you would like to perform using Python on your GitHub account. Then try to implement it by reading only the official documentation provided by Python, its dependent libraries and GitHub. This will also help you adopt a healthier mindset where you understand what’s going on inside your code and improve it gradually over time.

About the author

Ranvir Singh

I am a tech and science writer with quite a diverse range of interests and a strong believer in the Unix philosophy. A few of the things I am passionate about include system administration, computer hardware and physics.