Python Pandas

Pandas Merge on Multiple Columns

Pandas, a popular Python library, offers various methods and functions to work with tabular data, such as DataFrames and Series. In data analysis, merging or joining different data sources based on common columns or indexes is an important task. For this purpose, Pandas provides the “pandas.merge()” method that helps us merge multiple columns of multiple DataFrame.

This post will present a complete guide on how to combine two DataFrame based on one or more columns.

How to Merge Multiple Columns of Pandas DataFrame in Python?

The “pandas.merge()” method is used to merge the multiple columns of two Pandas DataFrame in Python.

The syntax of the “pandas.merge()” is shown below:

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, right_index=False, validate=None, left_index=False, suffixes=('_x', '_y'), sort=False, copy=None, indicator=False)

In the above syntax:

  • The “right” parameter/attribute is utilized to indicate the Pandas DataFrame object that needs to be merged.
  • The “how” parameter specifies the merge type that needs to apply on DataFrame.
  • The “on” parameter specifies the columns or index level names to join/merge on.
  • The “left_on” and “rigt_on” parameters are used to specify the columns or index level names to join/merge on in the left and right DataFrame.
  • The other parameters are not mandatory and are utilized for specific purposes.

Example 1: Merge Multiple Columns of Pandas DataFrame

The following example is used to merge multiple columns of Pandas DataFrame:

import pandas
df = pandas.DataFrame({'Name': ['Lily', 'Anna', 'Joseph', 'Tim'],
                       'Age' : [12, 22, 23, 33],
                       'Salary':['$550', '$300', '$1000', '$4000']})
df1 = pandas.DataFrame({'Name': ['Anna', 'Adam', 'Tim', 'Joseph'],
                        'Age' : [22, 42, 33, 23],
                        'Salary':['$300', '$200', '$4000', '$1000']})
print('First DataFrame:\n', df)
print('\nSecond DataFrame:\n', df1)
merged_df = pandas.merge(df,df1)
print('\n', merged_df)

In the above code:

  • The “pandas” library is imported.
  • The “DataFrame()” is used twice to create two DataFrame named “df” and “df1”.
  • The “merge()” method accepts two DataFrame as an argument and merges them by returning the common columns from both DataFrame.

Output

The multiple columns of two Pandas DataFrame have been merged successfully.

Example 2: Merged the Specified Multiple Columns of Pandas DataFrame

This example merges the specified columns of DataFrame:

import pandas
df = pandas.DataFrame({'Name': ['Lily', 'Anna', 'Joseph', 'Tim'],
                       'Age' : [12, 22, 23, 33],
                       'Salary':['$550', '$300', '$1000', '$4000']})
df1 = pandas.DataFrame({'Name': ['Anna', 'Adam', 'Tim', 'Joseph'],
                        'Age' : [22, 42, 33, 23],
                        'Salary':['$300', '$200', '$4000', '$1000']})
print('First DataFrame:\n', df)
print('\nSecond DataFrame:\n', df1)
merged_df = pandas.merge(df,df1,on=['Name','Salary'])
print('\n', merged_df)

In the above code:

  • The “pandas” library is imported, and the DataFrame is created using the “DataFrame()” method.
  • The “merge()” method is used to merge the given multiple columns by passing the “on= [col_names]” parameter value.

Output

The specified columns have been merged successfully.

Conclusion

In Python, the “pandas.merge()” method of the “pandas” module is utilized to merge the multiple columns of two Pandas DataFrame. We can merge all the columns of Pandas DataFrame or pass the specified columns using the “on=” parameter to the “pandas.merge()” method. This article delivered a comprehensive guide on merging pandas DataFrame on multiple columns using numerous examples.

About the author

Haroon Javed

Hi, I'm Haroon. I am an electronics engineer and a technical content writer. I am a tech geek who loves to help people to the best of my knowledge.