How to Fix Can only compare identically-labeled DataFrame objects

Data analysts often compare two or more DataFrames, for example to detect redundant data and keep only one copy of it. One way to compare DataFrames is with the ‘==’ operator. If the column labels or row indices differ between the DataFrames, you will encounter the ValueError: Can only compare identically-labeled DataFrame objects.

In this guide, we will fix this error using three different solutions, each with examples. Recall that a pandas DataFrame is a 2D data structure that holds data in rows and columns.

Reproducing the Error

As mentioned above, this error occurs when the column labels or row indices of the two DataFrames do not match.

Scenario 1: With Different Column Labels

Create two pandas DataFrames, Industry1 and Industry2. Industry1 holds the ‘Type’ and ‘Budget’ columns, while Industry2 holds the ‘Type’ and ‘Amount’ columns. Try to compare the two DataFrames, which have different column labels, using the ‘==’ (equal to) operator.

import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]})
print(Industry1,"\n")

# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Amount'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Amount': [71000,34000,20000]})
print(Industry2,"\n")

# Try to compare both the DataFrames having different column labels
Industry1==Industry2

Output

You can see that an error is encountered due to a mismatch in the second column labels (‘Budget’ and ‘Amount’) in both DataFrames.
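To inspect the message without halting the script, the comparison can be wrapped in a try/except block. The following is a minimal sketch that reuses the Industry1 and Industry2 DataFrames from this scenario; the variable name error_message is only illustrative, and the exact wording of the message may vary slightly between pandas versions.

```python
import pandas

# Same DataFrames as in Scenario 1: the second column label differs
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]})
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Amount': [71000,34000,20000]})

# Catch the ValueError raised by '==' and keep its message
error_message = None
try:
    Industry1 == Industry2
except ValueError as error:
    error_message = str(error)

print(error_message)
```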

Scenario 2: With Different Row Indices

Create two pandas DataFrames with different indices. Industry1 has row labels 1,2,3, and Industry2 has row labels 0,1,2.

import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")

# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")

# Try to compare both the DataFrames having different indices
Industry1==Industry2

Output

You can see that an error is encountered due to a mismatch in the row indices (index 3 exists only in Industry1 and index 0 exists only in Industry2).
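Before comparing, it can help to check which labels actually differ. The sketch below uses Index.equals() and Index.difference() on the Scenario 2 DataFrames; the variable names (same_columns, same_index, only_in_1, only_in_2) are chosen here for illustration.

```python
import pandas

# Same DataFrames as in Scenario 2: the row indices differ
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])

# Check the column labels and the row indices separately
same_columns = Industry1.columns.equals(Industry2.columns)
same_index = Industry1.index.equals(Industry2.index)
print("Columns match:", same_columns)
print("Row indices match:", same_index)

# Row labels that appear in only one of the DataFrames
only_in_1 = Industry1.index.difference(Industry2.index)
only_in_2 = Industry2.index.difference(Industry1.index)
print("Only in Industry1:", list(only_in_1))
print("Only in Industry2:", list(only_in_2))
```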

Solution 1: Compare using pandas.DataFrame.equals()

This function checks whether two DataFrames are equal. It returns True if everything matches (column labels, row indices, and elements); otherwise, it returns False. Unlike the ‘==’ operator, it doesn’t raise an error when the labels differ, so we can use it to compare the two pandas DataFrames.

Syntax

It takes another DataFrame to be compared as a parameter.

pandas.DataFrame.equals(other_DataFrame)
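One detail worth keeping in mind: according to the pandas documentation, equals() also requires the corresponding elements to have the same dtype. The small sketch below (the DataFrames ints and floats are made up for this illustration) shows two DataFrames whose values match element-wise but for which equals() still returns False.

```python
import pandas

# Same values and labels, but int64 versus float64 columns
ints = pandas.DataFrame({'Budget': [71000, 34000]})
floats = pandas.DataFrame({'Budget': [71000.0, 34000.0]})

# Element-wise comparison treats 71000 and 71000.0 as equal
values_match = bool((ints == floats).all().all())
print(values_match)

# equals() also takes the dtypes into account
frames_equal = ints.equals(floats)
print(frames_equal)
```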

Example 1: Different Columns

Create the same DataFrames as in Scenario 1 and compare them using the equals() function.

import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]})
print(Industry1,"\n")

# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Amount'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Amount': [71000,34000,20000]})
print(Industry2,"\n")

# Compare both the DataFrames having different column labels using
# equals()
print(Industry1.equals(Industry2))

Output

No error is raised, and the output is False because the column labels are not the same.

Example 2: Different Row Indices

Create the same DataFrames as in Scenario 2 and compare them using the equals() function.

import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")

# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")

# Compare both the DataFrames having different indices
# using equals()
print(Industry1.equals(Industry2))

Output

No error is raised, and the output is False because the row indices are not the same.
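Since equals() only answers True or False, it does not say where the DataFrames differ. As a complementary sketch (not one of the three solutions in this guide), pandas.testing.assert_frame_equal() raises an AssertionError whose message describes the first difference it finds; the variable name difference_report is only illustrative.

```python
import pandas
from pandas.testing import assert_frame_equal

# Same DataFrames as in Scenario 2: the row indices differ
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])

# Capture the description of the mismatch instead of stopping the script
difference_report = None
try:
    assert_frame_equal(Industry1, Industry2)
except AssertionError as error:
    difference_report = str(error)

print(difference_report)
```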

Solution 2: Compare Using pandas.DataFrame.equals() by Ignoring the Row Indices

We can ignore the row indices of both DataFrames using the pandas.DataFrame.reset_index() function. This resets the index to the default one for the DataFrame, which is 0,1,2,…n. Pass drop=True so that the old index is discarded rather than added as a column, and keep inplace set to False (its default) so that the function returns a new DataFrame that can be passed on to equals().

Syntax

Let’s see how to use this function with the pandas.DataFrame.equals() function.

DataFrame1.reset_index(drop=True,inplace=False).equals(DataFrame2.reset_index(drop=True,inplace=False))
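The inplace setting matters here because reset_index() only returns a DataFrame when inplace is False; with inplace=True it modifies the original DataFrame and returns None, which cannot be chained into equals(). A minimal sketch of the difference, using a single-column DataFrame and illustrative variable names (new_frame, result):

```python
import pandas

Industry1 = pandas.DataFrame({'Budget': [71000,34000,20000]},index=[1,2,3])

# inplace=False (the default): a new DataFrame is returned,
# and the original keeps its index
new_frame = Industry1.reset_index(drop=True)
print(list(new_frame.index))
print(list(Industry1.index))

# inplace=True: the original is modified and None is returned
result = Industry1.reset_index(drop=True, inplace=True)
print(result)
print(list(Industry1.index))
```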

Example

Create the same DataFrames as in Scenario 2 and compare them using the equals() function while ignoring the existing indices.

import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")

# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")

# Compare both the DataFrames having different indices
# using equals() after resetting the indices
print(Industry1.reset_index(drop=True,inplace=False).equals(Industry2.reset_index(drop=True,inplace=False)))

Output

The existing indices are ignored, and default indices are used. Since the elements are the same in both DataFrames, True is returned.

Solution 3: Compare Using ‘==’ Operator by Ignoring the Row Indices

Use the pandas.DataFrame.reset_index() function as in Solution 2 and compare using the ‘==’ operator.

Syntax

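Let’s see how to use this function with the ‘==’ operator. Leave inplace at its default of False: with inplace=True, reset_index() returns None, so the expression would compare None with None instead of comparing the DataFrames.

DataFrame1.reset_index(drop=True)== DataFrame2.reset_index(drop=True)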

Example

Create the same DataFrames as in Scenario 2 and compare them using the ‘==’ operator while ignoring the existing indices.

import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")

# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")

# Compare both the DataFrames having different indices
# using '==' (inplace is left at its default, False, so that
# reset_index() returns a new DataFrame instead of None)
print(Industry1.reset_index(drop=True)== Industry2.reset_index(drop=True))

Output

The existing indices are ignored, and default indices are used. Since the elements are the same in both DataFrames, the result is a DataFrame in which every value is True.
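The ‘==’ examples so far deal with mismatched row indices. For mismatched column labels (Scenario 1), one possible approach, which goes beyond the three solutions in this guide, is to align the labels before comparing. The sketch below assumes that ‘Amount’ and ‘Budget’ are meant to be the same column; the names Industry2_renamed, comparison, and all_equal are illustrative. It also shows how .all().all() can reduce an element-wise comparison to a single True or False.

```python
import pandas

# Same DataFrames as in Scenario 1: the second column label differs
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]})
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Amount': [71000,34000,20000]})

# Assumption: 'Amount' holds the same quantity as 'Budget',
# so it is renamed to make the labels identical
Industry2_renamed = Industry2.rename(columns={'Amount': 'Budget'})

# With identical labels, '==' returns an element-wise boolean DataFrame
comparison = Industry1 == Industry2_renamed
print(comparison)

# Collapse the boolean DataFrame into a single value
all_equal = bool(comparison.all().all())
print(all_equal)
```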

Conclusion

The ValueError: Can only compare identically-labeled DataFrame objects can be avoided by comparing the DataFrames using the pandas.DataFrame.equals() function, with or without ignoring the index. We used the pandas.DataFrame.reset_index() function to replace the existing indices with the default index. We also used the ‘==’ (equal to) operator after resetting the index with pandas.DataFrame.reset_index(), with examples.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain