Python Pandas

How to Fix Columns Overlap but no Suffix Specified

There are many sources from which data will come from. In pandas DataFrame, it is possible to join data from multiple DataFrames based on the key columns. pandas.DataFrame.join() is one of the functions which will join the data from two DataFrames and return the joined DataFrame. If two DataFrames have the same column label, and you have not specified the suffix for these two column labels, ValueError: columns overlap but no suffix specified is raised.

In this guide, we will discuss how to fix this error by adding a suffix to the similar column labels present in both DataFrames. We will also explore using the pandas.DataFrame.merge() function to join both DataFrames.

Reproducing the Error

As we discussed, this error will be raised when we try to join two DataFrames that have one or more columns with the same labels (similar column labels), but no suffix is specified to differentiate them during the join. Let’s see the error first.

Create two DataFrames – camps and budgets with ‘Name’ column as common in both of them. Try to join these two DataFrames using the join() function.

import pandas

# Create two DataFrames
camps = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
                          'Type': ['Webinar','Public relations','Email']})

budgets = pandas.DataFrame({'Name': ['Java webinar','Sales','Marketing'],
                        'Budget': [1000,2000,1500]})

# Try to join camps and budgets
print(camps.join(budgets))

Output

You can see that error is encountered as the Name column is similar in both the DataFrames.

Solution 1: Specifying the suffix

We can specify different suffixes for the columns in both the DataFrames so that when joining the DataFrames using the join() function, the error won’t be raised. First, look at the syntax.

Syntax

Let’s see the syntax and parameters passed to this function.

pandas.DataFrame1.join(other, on, how, lsuffix, rsuffix,...)
  1. other – This specifies the DataFrame to be joined with the first DataFrame
  2. on (Default = None) – This optional parameter column or index label to join.
  3. how (Default = left) – This parameter specifies the type of join – {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}.
  4. lsuffix (Default = ‘’) – This parameter takes the suffix from the left DataFrame for overlapping columns.
  5. rsuffix (Default = ‘’) – This parameter takes the suffix from the right DataFrame for overlapping columns.

Example

Join the above DataFrames by specifying lsuffix as ‘_First’ and rsuffix as ‘_Second.’

import pandas

# Create two DataFrames
camps = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
                          'Type': ['Webinar','Public relations','Email']})
print(camps,"\n")

budgets = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
                        'Budget': [1000,2000,1500]})
print(budgets,"\n")


# Join camps and budgets using join()
# by specifying lsuffix & rsuffix parameters.
print(camps.join(budgets,lsuffix='_First', rsuffix='_Second',how='outer'))

Output

Both the DataFrames are joined without any error. You can see that Name column labels are changed to ‘Name_First’ and ‘Name_Second’.

Solution 2: Using pandas.DataFrame.merge()

This function is similar to the join() function, which merges the DataFrames based on column labels. The pandas.DataFrame.merge() function will remove one duplicate column from the result.

Syntax

Let’s see the syntax and parameters passed to this function.

left_DataFrame.merge(right_DataFrame, how, on, left_on, right_on,...)
  1. right_DataFrame – This specifies the DataFrame to be joined with left_DataFrame
  2. on (Default = None) – This optional parameter specifies the column or index label to join.
  3. how (Default = inner) – This parameter specifies the type of join – {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}.
  4. left_on (Default = None) – You can pass the column names (List) to join on in the left_DataFrame.
  5. right_on (Default = None) – You can pass the Column names (List) to join on in the right_DataFrame.

Example

Join the above DataFrames with pandas.DataFrame.merge() function by specifying the parameters – how=’outer’, left_on= ‘Name’, right_on = ‘Name’.

import pandas

# Create two DataFrames
camps = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
                          'Type': ['Webinar','Public relations','Email']})
print(camps,"\n")

budgets = pandas.DataFrame({'Name': ['Java webinar','Marketing','Finance'],
                        'Budget': [1000,2000,1500]})
print(budgets,"\n")

# merge()
print(pandas.merge(camps,budgets,how='outer', left_on= 'Name', right_on = 'Name'))

Output

Both DataFrames are joined, and the ‘Name’ column is included in the result without any errors.

Conclusion

We learned how to join two DataFrames without getting ValueError: columns overlap but no suffix specified by providing two different solutions. The first solution is to provide the lsuffix and rsuffix parameters to the join() function while joining the DataFrames. The second solution utilizes the merge() function, which merges the DataFrame while returning only one colum in the result.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain