Python Pandas

Pandas Sort by Multiple Columns

In this guide, we will learn how to sort the DataFrame by multiple columns using the pandas.DataFrame.sort_values() function. It is possible to sort the first column in ascending or descending order and the second column in ascending or descending order by controlling the order using the ascending parameter, etc. Let’s see all the parameters in this guide along, with examples.

pandas.DataFrame.sort_values

pandas.DataFrame.sort_values is used to sort the DataFrame by values along any axis (rows or columns). We can sort a DataFrame based on single or multiple columns.

Syntax

Let’s see the syntax of this function and parameters passed to it.

pandas.DataFrame.sort_values(by,  axis, ascending, inplace, kind, na_position,)

Parameters

1. The by parameter is required and takes a column name such that the records in the pandas DataFrame will be sorted based on this column values. To sort the DataFrame by multiple columns, we need to pass a list with column labels.

2. The axis parameter (default: 0 for index) specifies the axis. The by parameter can accept the column levels and/or index labels if axis=1. Accept Index levels and/or column labels if axis=0.

3. Records in the pandas DataFrame are sorted in ascending order by default if the ascending parameter is not specified. Set this parameter to False, if you want to sort the records in descending order. You can pass the list of boolean values (equal to the total number of columns passed in the by parameter).

4. DataFrame is sorted in-place if the inplace parameter is set to True. Otherwise, it is False by default.

5. Records in the pandas DataFrame are sorted based on the quicksort algorithm by default. You can also specify any of these algorithms – ‘mergesort’, ‘heapsort’, ‘stable’ to sort the records.

6. By default, the missing values (None/NaN) that exist in the pandas DataFrame are placed at the end of the DataFrame. It is possible to include all the records that hold the missing values with the na_position parameter by setting it to ‘first’.

In this entire guide, we will use this DataFrame. Make sure that you implemented the code before going to the examples.

DataFrame

Create a pandas DataFrame related to ‘campaign_data’ with 10 records and sort the records in the DataFrame based on multiple columns. The columns in this DataFrame are: [‘Campaign_id’, ‘Campaign_Name’,’ Type’, ‘Status’, ‘Budget’].

import pandas

campaign_data = [[None,'Java related','Webinar','Completed',25000],
                  [None,'Java related','Conference','Completed',5000],
                 [3,'Python Bootcamp','Conference','Planned',2000],
                  [None,'Tutorial camp','Webinar','In-Progress',1000],
                 [5,'Python Bootcamp','Webinar','Planned',2000],
                  [6,'Services','Webinar','Completed',2000],
                 [None,'Python Bootcamp','Webinar','Planned',2000],
                  [None,'Tutorial camp','Conference','In-Progress',500],
                 [9,'Tutorial camp','Webinar','Aborted',7000],
                  [None,'Services','Trade-Show','Completed',2000]]

# Create DataFrame from the campaign_data with 4 columns
df_from_campaign_data = pandas.DataFrame(campaign_data,columns=['Campaign_id','Campaign_Name','Type','Status','Budget'])

Output

Example 1: Sort by Two Columns

We are passing the ‘Campaign_Name’ and ‘Type’ columns to the by parameter through the List. The DataFrame is sorted based on the data in these two columns in ascending order by default.

# Sort the DataFrame based on 'Campaign_Name', 'Type' columns.
print(df_from_campaign_data.sort_values(by = ['Campaign_Name', 'Type']))

Output

You can see the DataFrame is sorted by these two columns. For ‘Java related.’ there are two types: ‘Conference’ and ‘Webinar,’ which are also sorted. Similarly, you can verify this for all the records in the Dataframe.

Example 2: Sort by Three Columns

We are passing ‘Status,’ ‘Type,’ and ‘Campaign_Name’ columns to the by parameter through the List. The DataFrame is sorted based on the data in these three columns in ascending order by default.

# Sort the DataFrame based on 'Status', 'Type', 'Campaign_Name' columns.
print(df_from_campaign_data.sort_values(by = ['Status', 'Type', 'Campaign_Name']))

Output

You can see the DataFrame is sorted by these three columns. First, the values are sorted in the ‘Status’ column, then the values are sorted in the ‘Type’ column based on the values in the ‘Status’ column. Finally, the values are sorted in the ‘Campaign_Name’ column based on the values in the ‘Type’ column.

Example 3: Sort by Multiple Columns in Different Order

1. Sort the DataFrame by ‘Campaign_Name’ and ‘Type.’ Set the ascending parameter to [True, False].

2. Sort the DataFrame by ‘Campaign_Name’ and ‘Type.’ Set the ascending parameter to [False, True].

# ascending parameter.
print(df_from_campaign_data.sort_values(by = ['Campaign_Name', 'Type'],ascending = [True, False]),"\n")
print(df_from_campaign_data.sort_values(by = ['Campaign_Name', 'Type'],ascending = [False, True]),"\n")

Output

1. In the first output, DataFrame is sorted such that Campaign_Name is sorted in ascending order, and Type is sorted in descending order with respect to the Campaign_Name.

2. In the second output, DataFrame is sorted such that Campaign_Name is sorted in the descending order, and Type is sorted in ascending order with respect to the Campaign_Name.

Example 4: Sort by Multiple Columns with na_position Parameter

1. Sort the DataFrame by ‘Campaign_id’ and ‘Campaign_Name’ by placing all the records with missing values at first.

2. Sort the DataFrame by ‘Campaign_id’ and ‘Campaign_Name’ by placing all the records with missing values at last.

# na_position parameter.
print(df_from_campaign_data.sort_values(by = ['Campaign_id', 'Campaign_Name'],na_position='first'),"\n")
print(df_from_campaign_data.sort_values(by = ['Campaign_id', 'Campaign_Name'],na_position='last'))

Output

1. In the first output, DataFrame is sorted based on the specified columns with NaN (missing values) records at first.

2. In the second output, DataFrame is sorted based on the specified columns with NaN (missing values) records at last.

Example 5: Sort by Multiple Columns with Kind Parameter

Sort the DataFrame by ‘Campaign_Name’ and ‘Type’ using the “mergesort” technique. Set this to the kind parameter.

# kind parameter
print(df_from_campaign_data.sort_values(by = ['Campaign_Name', 'Type'],kind='mergesort'))

Output

Conclusion

In pandas, sort_values() is a function that will sort the DataFrame based on a single column or multiple columns. All the parameters are discussed with separate examples. Also, we can visualize all the records with missing values in one place (Start/End of the DataFrame) with the na_position parameter.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain