Pandas Concatenate Two DataFrames

In real-world scenarios, the information we need is often spread across several sources. To analyze it, we frequently have to combine multiple files into a single DataFrame. Pandas makes it easy to combine Series and DataFrame objects, offering set logic for the indexes as well as relational-algebra functionality for join and merge operations. Pandas also has tools to compare two Series or DataFrames and report the differences. Depending on the task, you may need to combine data in different ways. One of them is concatenation, which itself can be done in several ways.

The concat() function does the work of concatenating along an axis while applying optional set logic (union or intersection) to the indexes on the other axes. When concatenating DataFrames, we have a few choices to make, such as whether to keep the original index labels or to add keys that identify where each piece came from.

The Pandas concat() function has the following syntax:

syntax.jpg

This function offers many parameters for tailoring how the data is concatenated. You do not need to understand all of them to get started, but it helps to know that they exist and what they do in case your use case calls for them.
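As a rough sketch of that syntax, the call below writes out the most commonly used keyword arguments of pd.concat() with their default values. The DataFrames “left” and “right” and their contents are hypothetical, made up only so that the call can run.

```python
import pandas as pd

# Hypothetical sample data, used only to make the call runnable.
left = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
right = pd.DataFrame({"id": [3, 4], "name": ["Cara", "Dan"]})

# Commonly used keyword arguments, written out with their default values.
result = pd.concat(
    [left, right],            # objs: a sequence of Series/DataFrame objects
    axis=0,                   # 0 stacks rows, 1 places objects side by side
    join="outer",             # "outer" = union of the other axis, "inner" = intersection
    ignore_index=False,       # True drops the original labels and renumbers from 0
    keys=None,                # optional labels that add an outer index level
    verify_integrity=False,   # True raises if the result has duplicate index labels
    sort=False,               # True sorts the other axis when it is not aligned
)
print(result)
```

Since every argument here is left at its default, this call behaves the same as pd.concat([left, right]).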

Example 1: Concatenate Similar Columns of Two DataFrames Using Pandas Concatenate Function

The simplest example to start with is to concatenate two DataFrames that have the same columns.

Working with Pandas in Python requires importing the Pandas library first. So, we begin the practical implementation of the example code by importing the Pandas library as pd.

pandas.jpg

Once done, we are ready to work on the main script, since the Pandas features are now available to us.

We then create our base DataFrames. We need two DataFrames here since we want to concatenate them.

data.jpg

The variables “d1” and “d2” are created as shown in the given example. We used the pd.DataFrame() function to construct the DataFrames. Inside its parentheses, we defined two columns, id and name, and assigned the values for both columns. We then used the print() method to display the DataFrames d1 and d2.

The following output image shows 2 DataFrames with the same columns:

data out.jpg
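For readers who cannot see the screenshots, the setup described above might look like the following sketch. The column names id and name come from the text, but the values are hypothetical since the original data is only shown in the image.

```python
import pandas as pd

# Two DataFrames with the same columns: id and name.
# The values are hypothetical placeholders.
d1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ava", "Ben", "Cleo"]})
d2 = pd.DataFrame({"id": [4, 5, 6], "name": ["Dev", "Eli", "Fay"]})

print(d1)
print(d2)
```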

We successfully created our DataFrames. The next step is to concatenate them. For this purpose, we use the Pandas concatenate function, pd.concat(). By default, it stacks the rows of d2 beneath those of d1 and aligns the columns that share a name.

con 1.jpg

We create a variable “con_output” to store the result of calling the pd.concat() function. The function only needs the objects that you wish to concatenate, passed as a list, so here we pass [d1, d2]. If you write the list directly inside the pd.concat() call, make sure to wrap the DataFrames in square brackets “[]”. Otherwise, the call raises an error. We then invoke the print() method and pass it the “con_output” variable to display the result.

Running this program produces the concatenated DataFrame, with the matching columns combined.

con 1 out.jpg
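The concatenation step described above can be sketched as follows. The variable names d1, d2, and con_output follow the text, while the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the same columns in both DataFrames.
d1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ava", "Ben", "Cleo"]})
d2 = pd.DataFrame({"id": [4, 5, 6], "name": ["Dev", "Eli", "Fay"]})

# The DataFrames are passed together as one list.
con_output = pd.concat([d1, d2])
print(con_output)

# Each row keeps the label it had in its source DataFrame,
# so the index of the result repeats: 0, 1, 2, 0, 1, 2.
print(list(con_output.index))
```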

Because we passed no extra parameters, the DataFrames are stacked exactly as they are, so each row keeps its original index label and the labels repeat. If you want a fresh index instead, pass the ignore_index=True parameter.

ignore.jpg

As a result, the rows are renumbered from 0 up to n - 1, where n is the total number of rows. The new index values are shown in the following snapshot:

ignore out.jpg
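A minimal sketch of the ignore_index=True behavior, again using hypothetical values for d1 and d2:

```python
import pandas as pd

# Hypothetical sample data with the same columns in both DataFrames.
d1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ava", "Ben", "Cleo"]})
d2 = pd.DataFrame({"id": [4, 5, 6], "name": ["Dev", "Eli", "Fay"]})

# ignore_index=True discards the original labels and renumbers the rows.
con_output = pd.concat([d1, d2], ignore_index=True)
print(con_output)

# The index now runs from 0 to 5 without repeats.
print(list(con_output.index))
```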

Example 2: Concatenate Different Columns of Two DataFrames Using Pandas Concatenate Function with Join Parameter

So far, we concatenated the DataFrames by appending one to the other vertically. Another way to combine DataFrames is to use columns that hold matching values in each dataset, such as a shared unique id. Merging DataFrames on a shared field is called “joining”, and the columns that hold the shared data are the “join keys”. This approach is especially useful when one DataFrame serves as a “lookup table” with additional data that we want to bring into the other. It works the same way as joining tables in a relational database.
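To make the idea of a join key concrete, here is a small sketch that uses pd.merge(). Note that pd.merge() is not used in the examples of this article, which rely on the “join” parameter of pd.concat() instead. The DataFrames “people” and “cities” and their values are hypothetical.

```python
import pandas as pd

# Hypothetical main table and lookup table that share the "id" column.
people = pd.DataFrame({"id": [1, 2, 3], "name": ["Ava", "Ben", "Cleo"]})
cities = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Lima", "Cairo"]})

# "id" is the join key: rows with the same id are combined into one row.
joined = pd.merge(people, cities, on="id")
print(joined)
```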

When you concatenate several DataFrames, you can choose how to handle the other axes, that is, the ones that are not being concatenated along.

There are two options. The first is join='outer', which takes the union of the labels on the other axes. It is the default because no data is lost. The second is join='inner', which takes their intersection.

Let’s consider the following illustration:

nan.jpg

Here, we created two DataFrames with different columns. The first DataFrame “d1” has two columns, id and name, whereas the second DataFrame “d3” has two columns, city and age. We created a variable “outcome” to store the result of calling the pd.concat() function.

Between the parentheses of the pd.concat() function, we passed the DataFrames d1 and d3. The script’s final line calls the print() method to display the result.

This yields the following output:

nan out.jpg

The two DataFrames in this example are concatenated. However, since each DataFrame lacks the columns of the other, the missing cells are filled with NaN values. This happens because the default value of the “join” argument is “outer”, so all the data from both DataFrames is retained.
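The default outer behavior described above can be sketched as follows. The names d1, d3, and outcome and the column names follow the text, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: the two DataFrames share no columns.
d1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ava", "Ben", "Cleo"]})
d3 = pd.DataFrame({"city": ["Oslo", "Lima", "Cairo"], "age": [31, 27, 45]})

# join="outer" is the default, so all four columns are kept.
outcome = pd.concat([d1, d3])
print(outcome)

# Rows from d1 have NaN in city and age; rows from d3 have NaN in id and name.
print(outcome.isna().sum())
```

Because NaN is a floating-point value, numeric columns such as id and age are converted to floats in the result.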

The other accepted value for the “join” argument is “inner”.

diff.jpg

We used the “join” and “axis” arguments in this case. We set “join” to “inner” and “axis” to 1. The “axis” argument is the axis along which the DataFrames are concatenated, and it is 0 by default. With axis=1, the DataFrames are placed side by side, so the result gains columns instead of rows. By default, pd.concat() concatenates along the rows and performs an outer join on the other axis. Here, we changed both defaults, so it concatenates column-wise and performs an inner join, keeping only the row labels that appear in both DataFrames.

The output of this code is shown in the following:

diff out.jpg
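A sketch of the column-wise inner join follows. The values are hypothetical, and d3 is deliberately given one row fewer than d1 here (an assumption made for illustration) so that the effect of join='inner' is visible.

```python
import pandas as pd

# Hypothetical sample data: d1 has row labels 0, 1, 2 and d3 has only 0, 1.
d1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ava", "Ben", "Cleo"]})
d3 = pd.DataFrame({"city": ["Oslo", "Lima"], "age": [31, 27]})

# axis=1 places the DataFrames side by side.
# join="inner" keeps only the row labels found in both (0 and 1).
result = pd.concat([d1, d3], axis=1, join="inner")
print(result)

# For comparison, the default outer join keeps row 2 and fills city/age with NaN.
outer_result = pd.concat([d1, d3], axis=1)
print(outer_result)
```

If both DataFrames have exactly the same row labels, the inner and outer results are identical.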

Conclusion

This article focused on the Pandas concatenate function. We introduced pd.concat() and explained why it is needed. The syntax was given at the beginning, along with the parameters that the function accepts. We then demonstrated the concatenation of two DataFrames with practical code examples, covering both DataFrames that share the same columns and DataFrames with different columns. Learning to work with the pandas.concat() function helps you in handling and analyzing your data.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.