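In this guide, we will fix the ‘ValueError: Can only compare identically-labeled DataFrame objects’ error with three different solutions and examples. As we know, a pandas DataFrame is a 2D data structure that holds data in the form of rows and columns, and both the rows and the columns carry labels.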
Reproducing the Error
As discussed previously, this error occurs when the column labels or row indices of the two DataFrames being compared do not match.
Scenario 1: With Different Column Labels
Create two pandas DataFrames, Industry1 and Industry2. Industry1 holds the ‘Type’ & ‘Budget’ columns, while Industry2 holds the ‘Type’ & ‘Amount’ columns. Try to compare the two DataFrames, which have different column labels, using the ‘==’ (equal to) operator.
import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]})
print(Industry1,"\n")
# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Amount'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Amount': [71000,34000,20000]})
print(Industry2,"\n")
# Try to compare both the DataFrames having different column labels
Industry1==Industry2
Output
You can see that a ValueError is raised because the second column label differs between the two DataFrames (‘Budget’ in Industry1 and ‘Amount’ in Industry2).
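The mismatch can also be detected before the comparison is attempted. The following is a minimal sketch, not part of the original example: the variable names same_columns and error_message are illustrative, and Index.equals() is used to test whether the two sets of column labels match.

```python
import pandas

Industry1 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]})
Industry2 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Amount': [71000, 34000, 20000]})

# Check the column labels first; Index.equals() returns a boolean.
same_columns = Industry1.columns.equals(Industry2.columns)
print(same_columns)

# '==' raises ValueError when the labels differ, so guard the comparison
# and keep the message for inspection (illustrative variable name).
error_message = None
try:
    Industry1 == Industry2
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Checking the labels up front lets a script decide how to proceed instead of stopping with a traceback.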
Scenario 2: With Different Row Indices
Create two pandas DataFrames with different indices. Industry1 has the row labels 1, 2, 3, and Industry2 has the row labels 0, 1, 2. Try to compare them using the ‘==’ operator.
import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")
# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")
# Try to compare both the DataFrames having different indices
Industry1==Industry2
Output
You can see that a ValueError is raised due to a mismatch in the row index: index 0 exists only in Industry2, and index 3 exists only in Industry1.
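To see exactly which row labels are unmatched, the two indexes can be compared directly. This is a small sketch under the same data as Scenario 2; the names same_index and unmatched are illustrative and do not appear in the original example.

```python
import pandas

Industry1 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[1, 2, 3])
Industry2 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[0, 1, 2])

# True only when both DataFrames carry the same row labels in the same order.
same_index = Industry1.index.equals(Industry2.index)
print(same_index)

# Row labels that appear in only one of the two DataFrames.
unmatched = Industry1.index.symmetric_difference(Industry2.index)
print(list(unmatched))
```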
Solution 1: Compare using pandas.DataFrame.equals()
This function tests whether two DataFrames are equal. It returns True only if everything matches (column labels, row indices, elements, and the dtype of each column); otherwise, it returns False. Unlike the ‘==’ operator, it doesn’t raise an error when the labels differ, so we can use this function to compare the two pandas DataFrames.
Syntax
DataFrame1.equals(DataFrame2)
It takes the other DataFrame to be compared as a parameter.
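Two details of equals() are easy to miss: corresponding columns must have the same dtype, and NaN values in the same position are treated as equal. The sketch below illustrates both; the DataFrames ints, floats, with_nan_a and with_nan_b are made up for this illustration.

```python
import pandas

ints = pandas.DataFrame({'Budget': [71000, 34000, 20000]})
floats = pandas.DataFrame({'Budget': [71000.0, 34000.0, 20000.0]})

# Same values, but an integer column versus a float column,
# so equals() reports False.
print(ints.equals(floats))

with_nan_a = pandas.DataFrame({'Budget': [71000.0, float('nan')]})
with_nan_b = pandas.DataFrame({'Budget': [71000.0, float('nan')]})

# equals() treats NaN in the same location as equal ...
print(with_nan_a.equals(with_nan_b))

# ... whereas '==' compares NaN with NaN as False.
print(bool((with_nan_a == with_nan_b).all().all()))
```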
Example 1: Different Columns
Create the same DataFrames (Scenario – 1) and compare them using the equals() function.
import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]})
print(Industry1,"\n")
# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Amount'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Amount': [71000,34000,20000]})
print(Industry2,"\n")
# Compare both the DataFrames having different column labels using
# equals()
print(Industry1.equals(Industry2))
Output
No error is raised, and the output is False since the column labels are not the same.
Example 2: Different Row Indices
Create the same DataFrames (Scenario – 2) and compare them using the equals() function.
import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")
# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")
# Compare both the DataFrames having different indices
# using equals()
print(Industry1.equals(Industry2))
Output
No error is raised, and the output is False since the row indices are not the same.
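equals() only reports True or False, not the reason. When the reason matters, one option (not used elsewhere in this guide) is pandas.testing.assert_frame_equal(), which raises an AssertionError describing the first difference it finds. A minimal sketch, with report as an illustrative variable name:

```python
import pandas
from pandas.testing import assert_frame_equal

Industry1 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[1, 2, 3])
Industry2 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[0, 1, 2])

# assert_frame_equal() returns None when the DataFrames match and raises
# AssertionError with a description of the difference when they do not.
report = None
try:
    assert_frame_equal(Industry1, Industry2)
except AssertionError as err:
    report = str(err)
print(report)
```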
Solution 2: Compare Using pandas.DataFrame.equals() by Ignoring the Row Indices
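We can ignore the row indices of both DataFrames using the pandas.DataFrame.reset_index() function. This resets the index to the default index, which is 0, 1, 2, …, n-1 for a DataFrame with n rows. Set the drop parameter to True so that the old index is discarded instead of being added as a column, and leave the inplace parameter at False (the default) so that the function returns a new DataFrame that can be passed to equals().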
Syntax
Let’s see how to use this function with the pandas.DataFrame.equals() function.
DataFrame1.reset_index(drop=True,inplace=False).equals(DataFrame2.reset_index(drop=True,inplace=False))
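The value of inplace decides whether the call can be chained. The sketch below shows the difference on the Industry1 DataFrame from Scenario 2; the names renumbered, scratch and result are illustrative.

```python
import pandas

Industry1 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[1, 2, 3])

# inplace=False (the default): a new DataFrame is returned and
# the original keeps its index.
renumbered = Industry1.reset_index(drop=True)
print(list(renumbered.index))
print(list(Industry1.index))

# inplace=True: the DataFrame itself is modified and None is returned,
# so the result cannot be chained with .equals() or compared with '=='.
scratch = Industry1.copy()
result = scratch.reset_index(drop=True, inplace=True)
print(result)
print(list(scratch.index))
```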
Example
Create the same DataFrames (Scenario – 2) and compare them using the equals() function while ignoring the existing indices.
import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")
# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")
# Compare both the DataFrames after resetting their indices
# using equals()
print(Industry1.reset_index(drop=True,inplace=False).equals(
    Industry2.reset_index(drop=True,inplace=False)))
Output
The existing indices are ignored, and the default indices are used. Since the elements are the same in both DataFrames, True is returned.
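If this comparison is needed in several places, it can be wrapped in a small helper. The function name equals_ignoring_index and the Industry3 DataFrame are hypothetical and only serve this sketch. Note that the rows are then compared by position, so the approach assumes both DataFrames list their rows in the same order.

```python
import pandas


def equals_ignoring_index(left, right):
    """Return True when both DataFrames match once their row labels are discarded."""
    return left.reset_index(drop=True).equals(right.reset_index(drop=True))


Industry1 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[1, 2, 3])
Industry2 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[0, 1, 2])

# Same elements, different row labels.
print(equals_ignoring_index(Industry1, Industry2))

# Hypothetical third DataFrame with one changed value.
Industry3 = Industry2.assign(Budget=[71000, 34000, 99999])
print(equals_ignoring_index(Industry1, Industry3))
```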
Solution 3: Compare Using ‘==’ Operator by Ignoring the Row Indices
Use the pandas.DataFrame.reset_index() function as in Solution 2 and compare using the ‘==’ operator.
Syntax
Let’s see how to use this function with the ‘==’ operator.
DataFrame1.reset_index(drop=True,inplace=False)==DataFrame2.reset_index(drop=True,inplace=False)
Example
Create the same DataFrames (Scenario – 2) and compare them using the ‘==’ operator while ignoring the existing indices.
import pandas

# Create DataFrame - Industry1 with 2 columns - 'Type' & 'Budget'
Industry1 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[1,2,3])
print(Industry1,"\n")
# Create DataFrame - Industry2 with 2 columns - 'Type' & 'Budget'
Industry2 = pandas.DataFrame({'Type': ['Agriculture','Energy','Others'],'Budget': [71000,34000,20000]},index=[0,1,2])
print(Industry2,"\n")
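# Compare both the DataFrames after resetting their indices
# using '=='. inplace stays False because reset_index(inplace=True)
# returns None, which would make the comparison None == None.
print(Industry1.reset_index(drop=True,inplace=False)== Industry2.reset_index(drop=True,inplace=False))
Output
The existing indices are ignored, and the default indices are used. The ‘==’ operator compares element by element, so the output is a DataFrame of boolean values. Since the elements are the same in both DataFrames, every value is True.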
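When applied to two DataFrames, the ‘==’ operator produces one boolean per element. The following sketch shows one way to collapse that result into a single boolean and to find the rows that differ. The changed Budget value (25000) and the names elementwise, all_equal, row_ok and mismatched_rows are assumptions made for this illustration.

```python
import pandas

Industry1 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 20000]}, index=[1, 2, 3])
# Same as Scenario 2 except for the last Budget value (illustrative change).
Industry2 = pandas.DataFrame({'Type': ['Agriculture', 'Energy', 'Others'],
                              'Budget': [71000, 34000, 25000]}, index=[0, 1, 2])

# Element-wise comparison after discarding the row labels.
elementwise = Industry1.reset_index(drop=True) == Industry2.reset_index(drop=True)
print(elementwise)

# Collapse to one boolean: True only if every element matched.
all_equal = bool(elementwise.all().all())
print(all_equal)

# Positions of the rows that contain at least one mismatch.
row_ok = elementwise.all(axis=1)
mismatched_rows = row_ok[~row_ok].index.tolist()
print(mismatched_rows)
```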
Conclusion
The ‘ValueError: Can only compare identically-labeled DataFrame objects’ error can be fixed by comparing the DataFrames using the pandas.DataFrame.equals() function, with or without ignoring the index. We used the pandas.DataFrame.reset_index() function to replace the existing indices with the default index. We also used the ‘==’ (equal to) operator after resetting the index with the pandas.DataFrame.reset_index() function, with examples.