
Pandas Cross Join

Python is a well-suited language for data analysis, mainly because of its strong ecosystem of data-centric tools. Pandas can perform left, right, inner, and outer joins between two DataFrames (or named Series) through its “merge()” method. A cross join pairs every row of one DataFrame with every row of another. Before pandas 1.2.0 there was no built-in option for it, and newer versions accept the “how='cross'” argument. In this article, we build a cross join with the Pandas “merge()” method and a temporary key column, which works in any version, through two examples.

Pandas Merge() Method

The “pd.merge()” function has the following basic syntax (abbreviated): pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, ...)

Here, the key parameters are: “left”, the left DataFrame; “right”, the right DataFrame; “how”, the type of join to perform (“inner” by default); “on”, the column or columns used to join the two DataFrames; and “left_on” and “right_on”, which specify the join column separately for the left and right DataFrames when their names differ.
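To make these parameters concrete, here is a minimal sketch using two small made-up DataFrames. The names “left”, “right”, “id”, “name”, and “score” are illustrative assumptions and are not taken from the examples below. The sketch shows “on” with two join types, and “left_on”/“right_on” for key columns with different names.

```python
import pandas as pd

# Two small, hypothetical DataFrames that share an "id" column
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

# "on" names the shared column; "how" picks the join type
inner = pd.merge(left, right, how="inner", on="id")  # keeps ids 2 and 3 only
outer = pd.merge(left, right, how="outer", on="id")  # keeps ids 1 to 4, with NaN gaps

# When the key columns have different names, use left_on and right_on
right2 = right.rename(columns={"id": "user_id"})
renamed = pd.merge(left, right2, left_on="id", right_on="user_id")

print(inner)
print(outer)
print(renamed)
```

With “left_on”/“right_on”, both key columns (“id” and “user_id”) are kept in the result.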

In the following practical examples, we apply this method to compute a cross join between two DataFrames.

Example 1: Utilizing the Pandas pd.merge() Method to Get a Cross Join Between Two DataFrames with a Single Column

For the first illustration, we need a tool that can run Python code. Many tools support Python, and here we use the “Spyder” IDE. After installing Spyder, launch it and open a new file by clicking the “File” menu, clicking the new-file icon, or pressing “Ctrl+N”.

The new file has the “.py” extension used for Python scripts and is ready for code. As the article’s title suggests, the work is done with the “pandas” library, so the first step is to import it into the Python file with the line “import pandas as pd”. This gives access to the whole Pandas library, and the “as pd” alias means that wherever we use a Pandas function in this code, we can write “pd” instead of the full name “pandas”.

To perform a cross join, we need two Pandas DataFrames. Here you will learn how to construct a user-defined DataFrame. Pandas provides the “pd.DataFrame()” constructor for this, where “pd” refers to the imported “pandas” module. We call “pd.DataFrame()” and initialize it with a single column “num” that holds two values, “4” and “5”. The call returns a new DataFrame containing these values.

To store this DataFrame, we assign it to a variable “v1”, through which the new DataFrame is now accessible. To display it on the terminal, we use the “print()” function. We then create the second DataFrame the same way: we call “pd.DataFrame()” with one column holding the three values “r”, “s”, and “t”, store the result in a variable “v2”, and display it with “print()”.
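The two DataFrames described above can be sketched as follows. The article does not give the name of the column in “v2”, so “letter” is an assumed name chosen only for illustration.

```python
import pandas as pd

# First DataFrame: a single column "num" holding 4 and 5
v1 = pd.DataFrame({"num": [4, 5]})
print(v1)

# Second DataFrame: one column holding "r", "s", "t".
# The column name "letter" is an assumption; the article does not name it.
v2 = pd.DataFrame({"letter": ["r", "s", "t"]})
print(v2)
```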

If you are new to “Spyder”, you might be wondering how to run the code. To execute this Python file, click the “Run file” button or press the “Shift+Enter” keys. You can now see the two DataFrames we just created displayed on the terminal of the “Spyder” tool.

Now the main task begins: applying the cross join to these two DataFrames. With the merge-on-a-key approach used here, “pd.merge()” needs a column present in both DataFrames to link their rows. Our DataFrames have no such column, so we add a common “key” column with the same constant value to both: “v1['key'] = 0” and “v2['key'] = 0”. Because every row carries the same key value, every row of “v1” matches every row of “v2” when we merge on this column.

To merge them, we use the “pd.merge()” function and pass both DataFrames, “v1” and “v2”. The “on” parameter takes the name of the common column to merge on, so we write “on='key'”. We chain the “.drop()” method onto this call to remove the “key” column once the merge is done. Here, “drop()” takes two arguments: the column name “key” and “axis=1”, which tells it to drop a column rather than a row. We store the result in a variable “store” and call “print()” to see the output.
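Putting the steps of Example 1 together, a minimal runnable sketch could look like the following. As before, “letter” is an assumed column name for “v2”. Everything else follows the description above.

```python
import pandas as pd

v1 = pd.DataFrame({"num": [4, 5]})
v2 = pd.DataFrame({"letter": ["r", "s", "t"]})  # "letter" is a hypothetical column name

# Add the same constant "key" to both frames so every row matches every row
v1["key"] = 0
v2["key"] = 0

# Merge on the shared key, then drop the helper column (axis=1 means column-wise)
store = pd.merge(v1, v2, on="key").drop("key", axis=1)
print(store)
```

Note that “v1” and “v2” still carry their own “key” columns after this runs. Only the merged result has the column dropped.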

Executing the program gives us a DataFrame containing every possible combination of rows from the two DataFrames: 2 × 3 = 6 rows.
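If your installed pandas is version 1.2.0 or newer, “pd.merge()” also accepts “how='cross'”, which produces the same pairing without a helper column. The sketch below compares the two approaches, again assuming the column name “letter”. It uses “assign()” so the original DataFrames are not modified.

```python
import pandas as pd

v1 = pd.DataFrame({"num": [4, 5]})
v2 = pd.DataFrame({"letter": ["r", "s", "t"]})  # hypothetical column name

# pandas 1.2.0 and later: built-in cross join, no helper column needed
crossed = pd.merge(v1, v2, how="cross")

# Same result via the temporary-key approach; assign() returns copies,
# so v1 and v2 themselves never gain a "key" column
keyed = pd.merge(v1.assign(key=0), v2.assign(key=0), on="key").drop("key", axis=1)

print(crossed)
print(crossed.equals(keyed))
```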

Example 2: Utilizing the Pandas pd.merge() Method to Get a Cross Join Between Two DataFrames with Multiple Columns

We now work through another example of a Pandas cross join. We launch “Spyder” and open a new file by pressing “Ctrl+N”. The first requirement of the code is to import the necessary library: since we use Pandas methods, we import Pandas as pd. Then we construct our first DataFrame with the “pd.DataFrame()” method.

We initialize this DataFrame with two columns, “Color” and “Num”. The “Color” column holds three values, “Red”, “Green”, and “Blue”, while the “Num” column holds the same number of values: “101”, “110”, and “100”. We store the result of “pd.DataFrame()” in a variable “P1” and use the “print()” function to display this first DataFrame on the terminal.

With the first DataFrame created, we generate the second one. Again, we call “pd.DataFrame()”, this time with a single column “Serial” that stores four values: “C1”, “C2”, “C3”, and “C4”. We store this DataFrame in a variable “P2” and call “print()” to display it.
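A minimal sketch of the two DataFrames for Example 2, using the column names and values given above:

```python
import pandas as pd

# First DataFrame: two columns of equal length
P1 = pd.DataFrame({
    "Color": ["Red", "Green", "Blue"],
    "Num": [101, 110, 100],
})
print(P1)

# Second DataFrame: a single "Serial" column with four values
P2 = pd.DataFrame({"Serial": ["C1", "C2", "C3", "C4"]})
print(P2)
```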

Running this Python code displays the two DataFrames on the terminal.

Next, we add a “key” column to each DataFrame so that we can merge them on it. Here, we use the value “2” for both “P1['key']” and “P2['key']”. Any constant works as long as it is identical in both DataFrames. Finally, we call “pd.merge()” to merge the DataFrames on the “key” column and chain the “.drop()” method to remove the “key” column after the merge. We store the merged DataFrame in a variable “paint” and use “print()” to display the final cross-joined DataFrame.
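The steps of Example 2 can be sketched as follows. This follows the key-column approach described above, with the constant 2 as the shared key.

```python
import pandas as pd

P1 = pd.DataFrame({"Color": ["Red", "Green", "Blue"], "Num": [101, 110, 100]})
P2 = pd.DataFrame({"Serial": ["C1", "C2", "C3", "C4"]})

# Any constant works as the shared key; this example uses 2
P1["key"] = 2
P2["key"] = 2

# Each of the 3 rows in P1 pairs with each of the 4 rows in P2
paint = pd.merge(P1, P2, on="key").drop("key", axis=1)
print(paint)
```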

The result is a single cross-joined DataFrame in which each of the 3 rows of “P1” is paired with each of the 4 rows of “P2”, giving 12 rows.
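If you need this pattern repeatedly, for example on a pandas version older than 1.2.0, the key trick can be wrapped in a small function. The helper below, “cross_join”, is a hypothetical name of our own and not part of pandas. It picks a temporary column name that does not collide with existing columns and leaves the input DataFrames unchanged.

```python
import pandas as pd

def cross_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Cross join via a temporary key column (hypothetical helper, not a pandas API)."""
    # Pick a helper column name that exists in neither frame
    key = "_cross_key"
    while key in left.columns or key in right.columns:
        key += "_"
    # assign() returns copies, so the caller's DataFrames are not modified
    merged = pd.merge(left.assign(**{key: 0}), right.assign(**{key: 0}), on=key)
    return merged.drop(columns=key)

P1 = pd.DataFrame({"Color": ["Red", "Green", "Blue"], "Num": [101, 110, 100]})
P2 = pd.DataFrame({"Serial": ["C1", "C2", "C3", "C4"]})

paint = cross_join(P1, P2)
# A cross join always has len(left) * len(right) rows
assert len(paint) == len(P1) * len(P2)
print(paint)
```

If both DataFrames share any other column name, “merge()” adds its default “_x” and “_y” suffixes to those columns, just as with any other merge.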

Conclusion

Merging two DataFrames into a single cross-joined DataFrame is a simple and useful technique to learn. This article explained the concept of a cross join on Pandas DataFrames and walked through every step, from installing the required tool to producing the desired output. Through practical Python examples implemented and executed in the “Spyder” tool, we aimed to make this Pandas concept easy to understand.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.