Python

Psycopg2 Tutorial

Python, PostgreSQL and Psycopg2 Tutorial

To develop an application beyond a simple script, it is necessary to persist data outside of memory into a database.  There are many possible choices for a database, but PostgreSQL is a robust open source platform that can easily scale to production.

Python and PostgreSQL can be interfaced to develop powerful applications quickly.  Psycopg is a PostgreSQL adapter that can be used to harness PostgreSQL through the Python based library.  This tutorial will walk through the install of Psycopg2 and some Python code to demonstrate its use.

You can install Psycopg2 through the below terminal pip command.

$ pip install psycopg2

When installing you should see the terminal output below.

Collecting psycopg2
Downloading psycopg2-2.7.3.2-cp27-cp27m-
macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10
_10_x86_64.whl (1.7MB)

100% |████████████████████████████████| 1.7MB 397kB/s
Installing collected packages: psycopg2
Successfully installed psycopg2-2.7.3.2
Bradleys-Mini:~ BradleyPatton$

To import the Psycopg2 package into your Python application you use the below line of code.

import psycopg2

In order to get some data to load into our database, I have borrowed some code from a previous tutorial on pandas. The below code will create a pandas DataFrame with historical data.  This will then be leveraged to create a table in PostgreSQL table.

def get_data(symbols, start_date, end_date):

    panel = data.DataReader(symbols, 'yahoo', start_date, end_date)
    df = panel['Close']
    df.columns = map(str.lower, df.columns)
    hd = list(df)
    print df.head()
    print hd

    return df

I will now set up some housekeeping code used to run the tutorial. These two methods will be used to call the Psycopg2 methods that we create.

def tutorial_run():
    symbols = ['SPY', 'AAPL','GOOG']
    df = get_data(symbols, '2006-01-03', '2017-12-31')

if __name__ == "__main__":
    tutorial_run()

In order to connect to the PostgreSQL database, we will need to add the below method. The Try\Except provides some error handling in the event that the local database is not running, or incorrect connection parameters are passed to the database. The connect method in the Psycopg2 library connects to the database with the parameters passed in the connection string. Your parameters for dbname, user, and password may differ. If the connection fails for some reason, the error message will be written to the console. This method returns the connection object back to our call method where it can be used for further database operations.

def connect():
    cons = "dbname='tutorial' user='postgres' host='localhost' password='password'"
    try:
        conn = psycopg2.connect(cons)
        print "Connected"
    except:
        print "I am unable to connect to the database"
    return conn

Once we have established the connection to the PostgreSQL database, we can load our data from the get_data() method into our database. Psycopg2 and pandas make this a very simple process.

The first line defines the method that pandas should use to connect to the database in order to copy the DataFrame. You will provide the same parameters as your connection method. The second line of code persists the DataFrame to the PostgreSQL database with the to_sql() method.

def create_table(table, df):
    engine = create_engine('postgresql+psycopg2://postgres:[email protected]:5432/tutorial')
    df.to_sql(table, engine, if_exists='replace')

A quick look in our PostgreSQL pgAdmin terminal shows that the code successfully loaded the DataFrame into the table “close”. Now that we have some data loaded into our database. We can use psycopg to run some queries on the data. The below method is constructed to take the connection established in our first method and run a query on our PostgreSQL database. In order to create the 4 SQL objects we need to add another import statement.

from psycopg2 import sql

In order to create dynamic SQL commands, psycopg uses string formatting to populate variables into the string using the %s and {} operators.

PostrgreSQL is case sensitive. In the get_data() method we forced our column headers to lowercase. The index was not included in this instruction. In order to pass the capital “Data” column header in the query, we need to pass it to PostgreSQL in double quotes. To do this in a string in Python, you need to send the escape character “\” before the double quotes.

We can replace the “%s” in the string using the python string formatting syntax below. This replaces the %s with our date parameter dt.

To execute the SQL query that was created. You then need to pass it to the cursor’s .execute() method. By calling the .fetchall() method, you return the results of the query. When printed to the console you can display the results.

def get_row(dt, conn):

    cr = conn.cursor()
    query = sql.SQL("SELECT aapl from close WHERE "Date" = '%s'" % dt)
    cr.execute(query)
    print cr.fetchall()

To run this function we add the below line of code to the tutorial_run() method. You should get similar results to the below.

get_row("2017-12-29",conn)

In the next method, we will utilize the string format methods to pass in multiple parameters into our query. This query will take a date and three columns. In additional to using the %s operator, we will utilize the {} operator to join string variables into a string and inject them into our query string. Our query string now uses the join below with a “,” separator to pass multiple column names into our query.

def get_cols(dt, col1, col2, col3, conn):
    cr = conn.cursor()
    query = sql.SQL("SELECT {} from close WHERE "Date" = '%s'" % dt).format(
    sql.SQL(', ').join([sql.Identifier(col1), sql.Identifier(col2), sql.Identifier(col3)]))
    cr.execute(query)
    print cr.fetchall()

In order to use our new method I will add the below line to our tutorial_run() method. You should see the results below.

get_cols("2017-12-29","aapl","spy", "goog", conn)

The next method that we write will use two {} string replacements to pull all of the data in our table with the exception of our index. This method builds on our previous method by adding a second replace bracket notation “{1}”. This time the brackets are numbered so that they are replaced in the order format notion code. Our new method joins the three column parameters with comma separator. In addition, the second parameter in the format method is the table variable. The query string is then constructed by replacing the brackets with the parameters in the format method in order. That is {0} = columns and {1} = table name.

def get_tab(table, col1, col2, col3, conn):
    cr = conn.cursor()
    query = sql.SQL("SELECT {0} from {1} ").format(
    sql.SQL(', ').join([sql.Identifier(col1), sql.Identifier(col2),
    sql.Identifier(col3)]), sql.Identifier(table))
    cr.execute(query)
    print cr.fetchall()

In order to use our new method I will add the below line to our tutorial_run() method. You should see the results below.

get_tab("close", "aapl", "spy", "goog", conn)

There are many more methods to explore in the psycopg library. This should get you started with a good understanding of psycopg functions. I have provided some more resources below in documentation pages that will allow you to more extensively explore the library.

Full Code

import psycopg2
from psycopg2 import sql
import pandas_datareader as data

def get_data(symbols, start_date, end_date):

    panel = data.DataReader(symbols, 'yahoo', start_date, end_date)
    df = panel['Close']
    df.columns = map(str.lower, df.columns)
    hd = list(df)
    print df.head()
    print hd
    return df

def connect():
    cons = "dbname='tutorial' user='postgres' host='localhost' password='password'"
    try:
        conn = psycopg2.connect(cons)
        print "Connected"
    except:
        print "I am unable to connect to the database"

    return conn

def create_table(table, df):

    engine = create_engine('postgresql+psycopg2://postgres:[email protected]:5432/tutorial')
    df.to_sql(table, engine, if_exists='replace')

def get_row(dt, conn):

    cr = conn.cursor()
    query = sql.SQL("SELECT aapl from close WHERE "Date" = '%s'" % dt)
    cr.execute(query)
    print cr.fetchall()

def get_cols(dt, col1, col2, col3, conn):
    cr = conn.cursor()
    query = sql.SQL("SELECT {} from close WHERE "Date" = '%s'" % dt).format(
    sql.SQL(', ').join([sql.Identifier(col1),
    sql.Identifier(col2), sql.Identifier(col3)]))
    cr.execute(query)
    print cr.fetchall()

def get_tab(table,col1, col2, col3, conn):
    cr = conn.cursor()
    query = sql.SQL("SELECT {0} from {1} ").format(
    sql.SQL(', ').join([sql.Identifier(col1), sql.Identifier(col2),
    sql.Identifier(col3)]), sql.Identifier(table))
    cr.execute(query)
    print cr.fetchall()

def tutorial_run():

    conn = connect()
    symbols = ['SPY', 'AAPL','GOOG']
    df = get_data(symbols, '2006-01-03', '2017-12-31')

    create_table("close", df)
    get_row("2017-12-29",conn)
    get_cols("2017-12-29","aapl","spy", "goog", conn)
    get_tab("close", "aapl", "spy", "goog", conn)

if __name__ == "__main__":
    tutorial_run()

References

initd.org/psycopg
initd.org/psycopg/docs/install.html
http://initd.org/psycopg/docs/sql.html
wiki.postgresql.org/wiki/Psycopg2_Tutorial

About the author

Brad Patton

Brad Patton is a software engineering consultant and writer who delivers high quality software systems to his clients. He received his undergraduate and graduate degree in Computer Science. Brad leverages his expertise in programming and mathematics to develop profitable algorithmic trading strategies. He shares his experience by writing about his tools and strategies. He can be found on Twitter at @BradleyPatton.