Python Pandas

Pandas read_sql

In pandas, we can create a DataFrame from SQL databases. pandas.read_sql is one of the functions that will read SQL query into an empty DataFrame. In this guide, we will see how to use this function by specifying different parameters with examples.

pandas.read_sql

pandas.read_sql() reads SQL query or a table from the Database into the pandas DataFrame. It will support only two databases. SQLAlchemy and sqlite3. All the examples that we discuss in this guide are sqlite3.

Syntax:

Let’s see the syntax and the parameters passed to this function.

pandas.read_sql(sql, con, index_col,parse_dates, columns, coerce_float, chunksize, dtype_backend, ...)
  1. sql is the SQL query or Table Name. Example: SELECT columns from table
  2. index_col (By default = None) – Existing column/s are utilized as Index for the DataFrame. If not, Index column is None and row indices are assigned starting from 0.
  3. parse_dates (By default = None): If the sql query/table holds the date type columns, we can parse the columns with specific format by specifying this parameter. It takes a dictionary such that Key is the column and Value takes the format. Example: {“Column”: {“format”: “%d/%m/%y”},…}.
  4. chunksize (By default = None) – Specifies the number of rows to include in each chunk.

Example 1

In this example, we will create a DataFrame with 5 records and store the records in the Campaign table by establishing a connection to an in−memory SQLite database using the to_sql() function.

Finally, we will read the SQL query that contains all the columns from the Campaign table into the DataFrame – result. Here, connect(‘:memory:’) is the in−memory SQLite database.

from sqlite3 import *
import pandas

camps_data = pandas.DataFrame(data=[[1, 'Webinar', 'Planned'],
                                [2, 'Conference', 'Aborted'],
                                [3, 'Trade Show', 'In Progress'],
                                [4, 'Public Relations', 'Aborted'],
                                [5, 'Conference', 'Completed']],columns=['Id', 'Type','Status'])

# Store the records of the DataFrame in the Campaign table
# by establishing the connection to an in−memory SQLite database.
conn=connect(':memory:')
camps_data.to_sql('Campaign', conn)

# Read SQL query that hold all the columns from the Campaign table into the DataFrame - result
result= pandas.read_sql('SELECT * from Campaign', conn)
print(result)

Output

Now, the result DataFrame contains all the columns since we selected all the columns (*). An index column is created for the DataFrame, and row index starts from 0.

Example 2: parse_dates Parameter

Create a DataFrame with five records related to Campaign. StartDate and EndDate are the two columns that we need to parse as date in the DataFrame. Pass the parse_dates parameter to the read_sql() function and assign the format – “%d/%m/%y” to it. {“StartDate”: {“format”: “%d/%m/%y”},”EndDate”: {“format”: “%d/%m/%y”}}.

from sqlite3 import *
import pandas

# Create DataFrame with 5 records related to Campaign
camps_data = pandas.DataFrame(data=[[1, 'Webinar', 'Planned', '25/05/23','27/05/23'],
                                [2, 'Conference', 'Aborted', '01/01/23','03/03/23'],
                                [3, 'Trade Show', 'In Progress', '22/01/22','22/02/22'],
                                [4, 'Public Relations', 'Aborted', '22/01/22','22/02/22'],
                                [5, 'Conference', 'Completed', '11/06/23','17/06/23']],columns=['Id', 'Type','Status', 'StartDate', 'EndDate'])

conn=connect(':memory:')
camps_data.to_sql('Campaign', conn)

# Read SQL query into the DataFrame - result by parsing StartDate and EndDate columns
result= pandas.read_sql('SELECT StartDate, EndDate from Campaign', conn,parse_dates={"StartDate": {"format": "%d/%m/%y"},"EndDate": {"format": "%d/%m/%y"}})
print(result)

Output

You can see the values in these two columns are in the format “YYYY-MM-DD,” where “25/05/23” represents the date/month/year.

Example 3: index_col Parameter

Let’s set the Id column as Index to the DataFrame. Set the index_col to ‘Id”.

from sqlite3 import *
import pandas

camps_data = pandas.DataFrame(data=[[1, 'Webinar', 'Planned'],
                                [2, 'Conference', 'Aborted'],
                                [3, 'Trade Show', 'In Progress'],
                                [4, 'Public Relations', 'Aborted'],
                                [5, 'Conference', 'Completed']],columns=['Id', 'Type','Status'])
conn=connect(':memory:')
camps_data.to_sql('Campaign', conn)

# Set Id column as Index
result= pandas.read_sql('SELECT * from Campaign', conn,index_col='Id')
print(result)

Output

For the above DataFrame, the Id column will be the index.

Example 4: Columns Parameter

  1. Create a database – Case_Detail in sqlite3 and create a cursor object.
  2. Create a table – Cases in the above database with three columns and execute the creation using the cursor object.
  3. Insert 5 records into the table.
  4. Read the query by specifying the connection.
  5. Get the data into the DataFrame with ‘Case_Type’, ‘Status’ columns.
import sqlite3
import pandas

# Create database - Case_Detail in sqlite3 and create cursor
conn = sqlite3.connect('Case_Detail.db')
cursor = conn.cursor()

# Create table - Cases in the above database with three columns and execute it using the cursor object.
create_table_query ="""CREATE TABLE Cases(Case_Type VARCHAR(255), Source VARCHAR(255),Status VARCHAR(255));"""
cursor.execute(create_table_query)

# Insert 5 records into the table
cursor.execute('''INSERT INTO Cases VALUES ('Computer', 'Phone', 'Closed')''')
cursor.execute('''INSERT INTO Cases VALUES ('Mechanic', 'Web', 'Open')''')
cursor.execute('''INSERT INTO Cases VALUES ('Electrical', 'Phone', 'Open')''')
cursor.execute('''INSERT INTO Cases VALUES ('Mechanic', 'Web', 'Closed')''')
cursor.execute('''INSERT INTO Cases VALUES ('Others', 'Email', 'Closed')''')

# Read the query by specifying the connection
query = pandas.read_sql('SELECT * FROM cases',conn)

# Get the data into the DataFrame with 'Case_Type','Status' columns.
result=pandas.DataFrame(query,columns=['Case_Type','Status'])
print(result)

Output

The DataFrame holds only two columns.

Conclusion

pandas.read_sql() reads an SQL query or a table from the Database into a pandas DataFrame. In this guide, we saw four examples by considering different parameters passed to this function. An n−memory SQLite database is used in the first three examples. In the Last example, we created the database with the table inside it.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain