How to Fix KeyError in Pandas

While retrieving rows or columns from a pandas DataFrame, we need to verify whether the row or column label exists or not. An invalid column name (one that does not exist in the DataFrame) or an invalid row index (referring to a non-existent row) will raise a KeyError. In this guide, we will discuss the KeyError in pandas first. Then, we will reproduce this error with an example by creating a pandas DataFrame. Finally, we will look at different ways to solve this error.

pandas – KeyError

In Python, KeyError is raised when a key is not found in a mapping such as a dictionary. In pandas, the same error is raised when the specified column label or row label does not exist in the DataFrame. Let’s discuss the error by reproducing it with examples.
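The short sketch below illustrates that pandas raises the same built-in KeyError as a plain dictionary, so both can be handled with an ordinary `except KeyError` clause. The `budgets` dictionary is only an illustrative example.

```python
import pandas

# A plain dictionary raises KeyError for a missing key
budgets = {'Marketing': 25000, 'Sales': 15000}
dict_error = None
try:
    budgets['Technical']
except KeyError as error:
    dict_error = error
print("dict KeyError:", dict_error)

# A DataFrame raises the same built-in KeyError for a missing column label
campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000]],
                                    columns=['Name', 'Location', 'Budget'])
frame_error = None
try:
    campaign_details['location']
except KeyError as error:
    frame_error = error
print("DataFrame KeyError:", frame_error)
```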

Error Scenario 1: Accessing Invalid Column Name

Create a DataFrame named campaign_details with three columns (‘Name’, ‘Location’, ‘Budget’) and three rows. Try to access the column ‘location’ (in lowercase) from it.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'])
print(campaign_details,"\n")

# Try to get the location
print(campaign_details['location'])

Output

This is the existing DataFrame.

The column ‘location’ does not exist, so a KeyError is raised. Column labels are case-sensitive, and the actual label is ‘Location’.
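Because column labels are case-sensitive, ‘location’ and ‘Location’ are different keys. If the exact capitalization is not known, one option is to search the labels case-insensitively. The find_column helper below is hypothetical (it is not part of pandas); it is a minimal sketch that assumes all column labels are strings.

```python
import pandas

def find_column(df, name):
    """Hypothetical helper: return the first column label that matches
    `name` case-insensitively, or None if there is no match.
    Assumes all column labels are strings."""
    for label in df.columns:
        if label.lower() == name.lower():
            return label
    return None

campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000],
                                     ['Sales', 'India', 15000],
                                     ['Technical', 'Italy', 20000]],
                                    columns=['Name', 'Location', 'Budget'])

label = find_column(campaign_details, 'location')
print(label)  # Location

# Use the exact label that was found to access the column
if label is not None:
    print(campaign_details[label])
```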

Solution 1: Specify the Correct Spelling

Developers can avoid this error by first listing all the column names and then using the exact label. The pandas.DataFrame.columns property returns all the column labels of the DataFrame.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'])

# Get the columns using the columns property
print(campaign_details.columns,"\n")

# Specify the correct spelling
print(campaign_details['Location'])

Output

The first output shows the existing columns of the campaign_details DataFrame. In the second output, no KeyError is raised because the column label ‘Location’ is spelled correctly.
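Instead of reading the printed list of columns by eye, the same columns property can be tested with the in operator before a column is accessed. A minimal sketch:

```python
import pandas

campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000],
                                     ['Sales', 'India', 15000],
                                     ['Technical', 'Italy', 20000]],
                                    columns=['Name', 'Location', 'Budget'])

# `in` checks whether a label is one of the column labels
for name in ['location', 'Location']:
    if name in campaign_details.columns:
        print(campaign_details[name])
    else:
        print(f"Column '{name}' not found; available: {list(campaign_details.columns)}")
```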

Solution 2: Utilizing pandas.DataFrame.get()

The pandas.DataFrame.get() function returns an item from a pandas object for a given key; for a DataFrame, the key is a column label. Unlike square-bracket indexing, this function does not raise an error if the column label does not exist; instead, it returns the default value.

Syntax

Let’s see the syntax of this function with parameters.

pandas.DataFrame.get(key, default=None)
  1. key: Specify the column label as the key parameter.
  2. default (Default = None): If the column key doesn’t exist, the value passed to this parameter is returned.
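Conceptually, get() behaves like square-bracket indexing wrapped in a try/except that falls back to the default value. The safe_get helper below is a hypothetical sketch of that idea, not the pandas source; it only assumes that a missing column label raises KeyError.

```python
import pandas

def safe_get(df, key, default=None):
    """Hypothetical helper: a rough stand-in for DataFrame.get().
    Returns df[key] if the label exists, otherwise `default`."""
    try:
        return df[key]
    except KeyError:
        return default

campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000],
                                     ['Sales', 'India', 15000],
                                     ['Technical', 'Italy', 20000]],
                                    columns=['Name', 'Location', 'Budget'])

print(safe_get(campaign_details, 'location', default="location not exist"))
print(safe_get(campaign_details, 'Location'))
```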

Example 1

Try to get the ‘location’ column with the get() function, passing “location not exist” as the default parameter.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'])

# Try to get the location with the get() function by specifying
# the default parameter
print(campaign_details.get('location', default="location not exist"))

Output

The default value is displayed because the ‘location’ column does not exist.

Example 2

Try to get the ‘location’ column with the get() function without passing the default parameter.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'])

# Try to get the location with the get() function
# without specifying the default parameter
print(campaign_details.get('location'))

Output

None is returned because the ‘location’ column does not exist and no default value was passed, so the default parameter keeps its own default value, None.
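Because get() returns None silently, a misspelled label can go unnoticed until the result is used later. A minimal sketch of checking the result explicitly (the message text is only illustrative):

```python
import pandas

campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000],
                                     ['Sales', 'India', 15000],
                                     ['Technical', 'Italy', 20000]],
                                    columns=['Name', 'Location', 'Budget'])

location = campaign_details.get('location')

# Compare with `is None`; a Series cannot be used directly in an `if`
if location is None:
    print("Column 'location' not found; check campaign_details.columns")
else:
    print(location.tolist())
```

The check uses `is None` rather than `if location:` because pandas raises a ValueError when a Series is used in a boolean context, so the explicit comparison works whether or not the column was found.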

Example 3

Get the column ‘Location’ by passing the correct label.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'])

# Pass the correct column name to the get() function
print(campaign_details.get('Location'))

Output

The column values are returned since the column label is correct.

Error Scenario 2: Accessing Invalid Row Index Label

Create a DataFrame named campaign_details with three columns (‘Name’, ‘Location’, ‘Budget’) and three rows with the index labels C1, C2, and C3. Try to access the row C4 from it.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'],index=['C1','C2','C3'])
print(campaign_details,"\n")

# Try to get the row with index label 'C4'
print(campaign_details.loc['C4'])

Output

This is the existing DataFrame.

The row index label ‘C4’ does not exist, so a KeyError is raised.
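When a row label may legitimately be absent, the KeyError raised by .loc can also be caught with try/except. A minimal sketch:

```python
import pandas

campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000],
                                     ['Sales', 'India', 15000],
                                     ['Technical', 'Italy', 20000]],
                                    columns=['Name', 'Location', 'Budget'],
                                    index=['C1', 'C2', 'C3'])

try:
    row = campaign_details.loc['C4']
except KeyError:
    # The label is missing, so fall back to None and report the valid labels
    row = None
    print("Row 'C4' not found; available labels:", list(campaign_details.index))
```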

Solution: Specify the Correct Index Label

Developers can avoid this error by first listing all the index labels and then using an existing one. The pandas.DataFrame.index property returns all the row index labels of the DataFrame. Let us now access the row C3.

import pandas

campaign_details=pandas.DataFrame([['Marketing','USA',25000],
                                   ['Sales','India',15000],
                                   ['Technical','Italy',20000]],columns=['Name','Location','Budget'],index=['C1','C2','C3'])

# Get the index labels using the index property
print(campaign_details.index,"\n")

# Get the row - C3
print(campaign_details.loc['C3'])

Output

The first output represents the existing index labels present in the campaign_details DataFrame, while in the second output, a KeyError is not raised because the row C3 exists in the campaign_details DataFrame.
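DataFrame.get() looks up columns, not rows. For rows, a similar fallback can be built from a membership test on the index property. The get_row helper below is hypothetical (it is not part of pandas); it is a minimal sketch that assumes the index labels are unique.

```python
import pandas

def get_row(df, label, default=None):
    """Hypothetical helper: return the row with index label `label`,
    or `default` if the label does not exist.
    Assumes the index labels are unique."""
    if label in df.index:
        return df.loc[label]
    return default

campaign_details = pandas.DataFrame([['Marketing', 'USA', 25000],
                                     ['Sales', 'India', 15000],
                                     ['Technical', 'Italy', 20000]],
                                    columns=['Name', 'Location', 'Budget'],
                                    index=['C1', 'C2', 'C3'])

print(get_row(campaign_details, 'C3'))
print(get_row(campaign_details, 'C4', default="row not exist"))
```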

Conclusion

We learned how to fix the KeyError in pandas in different scenarios. The common sources of this error are an incorrect column label or row index label. First, we reproduced the error, and then resolved it by listing all the column labels and row index labels and specifying the correct ones. We also used the pandas.DataFrame.get() function to return a default value instead of raising an error when an incorrect column label is specified.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain