In this guide, we will discuss how to fix this error by adding a suffix to the similar column labels present in both DataFrames. We will also explore using the pandas.DataFrame.merge() function to join both DataFrames.
Reproducing the Error
As we discussed, this error will be raised when we try to join two DataFrames that have one or more columns with the same labels (similar column labels), but no suffix is specified to differentiate them during the join. Let’s see the error first.
Create two DataFrames – camps and budgets with ‘Name’ column as common in both of them. Try to join these two DataFrames using the join() function.
# Create two DataFrames
camps = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
'Type': ['Webinar','Public relations','Email']})
budgets = pandas.DataFrame({'Name': ['Java webinar','Sales','Marketing'],
'Budget': [1000,2000,1500]})
# Try to join camps and budgets
print(camps.join(budgets))
Output
You can see that error is encountered as the Name column is similar in both the DataFrames.
Solution 1: Specifying the suffix
We can specify different suffixes for the columns in both the DataFrames so that when joining the DataFrames using the join() function, the error won’t be raised. First, look at the syntax.
Syntax
Let’s see the syntax and parameters passed to this function.
- other – This specifies the DataFrame to be joined with the first DataFrame
- on (Default = None) – This optional parameter column or index label to join.
- how (Default = left) – This parameter specifies the type of join – {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}.
- lsuffix (Default = ‘’) – This parameter takes the suffix from the left DataFrame for overlapping columns.
- rsuffix (Default = ‘’) – This parameter takes the suffix from the right DataFrame for overlapping columns.
Example
Join the above DataFrames by specifying lsuffix as ‘_First’ and rsuffix as ‘_Second.’
# Create two DataFrames
camps = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
'Type': ['Webinar','Public relations','Email']})
print(camps,"\n")
budgets = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
'Budget': [1000,2000,1500]})
print(budgets,"\n")
# Join camps and budgets using join()
# by specifying lsuffix & rsuffix parameters.
print(camps.join(budgets,lsuffix='_First', rsuffix='_Second',how='outer'))
Output
Both the DataFrames are joined without any error. You can see that Name column labels are changed to ‘Name_First’ and ‘Name_Second’.
Solution 2: Using pandas.DataFrame.merge()
This function is similar to the join() function, which merges the DataFrames based on column labels. The pandas.DataFrame.merge() function will remove one duplicate column from the result.
Syntax
Let’s see the syntax and parameters passed to this function.
- right_DataFrame – This specifies the DataFrame to be joined with left_DataFrame
- on (Default = None) – This optional parameter specifies the column or index label to join.
- how (Default = inner) – This parameter specifies the type of join – {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}.
- left_on (Default = None) – You can pass the column names (List) to join on in the left_DataFrame.
- right_on (Default = None) – You can pass the Column names (List) to join on in the right_DataFrame.
Example
Join the above DataFrames with pandas.DataFrame.merge() function by specifying the parameters – how=’outer’, left_on= ‘Name’, right_on = ‘Name’.
# Create two DataFrames
camps = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
'Type': ['Webinar','Public relations','Email']})
print(camps,"\n")
budgets = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
'Budget': [1000,2000,1500]})
print(budgets,"\n")
# merge()
print(pandas.merge(camps,budgets,how='outer', left_on= 'Name', right_on = 'Name'))
Output
Both DataFrames are joined, and the ‘Name’ column is included in the result without any errors.
Conclusion
We learned how to join two DataFrames without getting ValueError: columns overlap but no suffix specified by providing two different solutions. The first solution is to provide the lsuffix and rsuffix parameters to the join() function while joining the DataFrames. The second solution utilizes the merge() function, which merges the DataFrame while returning only one colum in the result.