
Pandas Sort Values

Sorting the data makes a pandas DataFrame easier to read and visualize. If the DataFrame contains missing values, we can also choose whether the records holding them appear at the start or at the end. In this guide, we will explore how to sort a pandas DataFrame using the pandas.DataFrame.sort_values function and discuss its parameters in detail with examples.

pandas.DataFrame.sort_values

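pandas.DataFrame.sort_values sorts a DataFrame by its values along either axis. With axis=0 (the default), the rows are reordered by the values in one or more columns. With axis=1, the columns are reordered by the values in one or more rows.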

Syntax:

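Let’s see the syntax of this function, with the default value of each parameter covered in this guide. Pass every parameter after by as a keyword argument, as the examples below do.

pandas.DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')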

Parameters:

1. The by parameter is required and takes the name of the column whose values determine the order of the records. To sort by multiple columns, pass a list of column names; each later column is used only to break ties in the columns before it.
2. The axis parameter (default 0, or ‘index’) specifies the axis to sort along. With axis=0, the rows are sorted and by can contain column labels and/or index levels. With axis=1 (‘columns’), the columns are sorted and by can contain index labels and/or column levels.
3. Records are sorted in ascending order by default (ascending=True). Set the ascending parameter to False to sort them in descending order. When sorting by multiple columns, ascending can also be a list of booleans, one per column.
4. The inplace parameter is False by default, so sort_values returns a new sorted DataFrame and leaves the original unchanged. If it is set to True, the DataFrame itself is sorted and the method returns None.
5. The kind parameter selects the sorting algorithm, which is ‘quicksort’ by default. You can also specify ‘mergesort’, ‘heapsort’, or ‘stable’; of these, only ‘mergesort’ and ‘stable’ are stable algorithms. For a DataFrame, this option is only applied when sorting by a single column or label.
6. By default (na_position=‘last’), records with missing values (None/NaN) in the sort column are placed at the end of the DataFrame. Set na_position to ‘first’ to place them at the beginning instead.
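None of the examples below uses the axis parameter, so here is a minimal sketch of axis=1. It assumes a small, hypothetical all-numeric DataFrame (the column names and values are made up for illustration). With axis=1, by names an index label, and the columns are reordered by the values in that row.

```python
import pandas as pd

# Hypothetical all-numeric data; column names and values are illustrative only.
df = pd.DataFrame({"b": [3, 1], "a": [1, 2], "c": [2, 0]})

# axis=1 reorders the columns; by=0 refers to the row whose index label is 0.
by_row_0 = df.sort_values(by=0, axis=1)
print(by_row_0)
print(list(by_row_0.columns))  # ['a', 'c', 'b'] because row 0 holds a=1, c=2, b=3
```

Every value in the chosen row must be comparable with the others, which is why this sketch keeps all columns numeric.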

Example 1: by Parameter

Create a pandas DataFrame from the ‘campaign_data’ list with five records, and sort it first by a single column and then by multiple columns.

import pandas

campaign_data = [['Java related','Webinar','Completed',25000],
                 ['Java related','Conference','Completed',5000],
                 ['Python Bootcamp','Webinar','Planned',2000],
                 ['Tutorial camp','Webinar','In-Progress',1000],
                 ['Services','Trade-Show','Completed',2000]]
                 
df_from_campaign_data = pandas.DataFrame(campaign_data,columns=['Campaign_Name','Type','Status','Budget'])

# Single Column
print(df_from_campaign_data.sort_values(by="Campaign_Name"),"\n")

# Multiple columns
print(df_from_campaign_data.sort_values(by=["Campaign_Name","Type"]))

Output

1. In the first output, the DataFrame is sorted by the values in the ‘Campaign_Name’ column.
2. In the second output, the DataFrame is sorted by the values in the ‘Campaign_Name’ column first and then by the ‘Type’ column.
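When several columns are passed to by, later columns only break ties in earlier ones. The minimal sketch below rebuilds the Example 1 data so it runs on its own and checks this through the resulting index labels: the two ‘Java related’ records tie on Campaign_Name, so Type decides their order.

```python
import pandas as pd

campaign_data = [['Java related', 'Webinar', 'Completed', 25000],
                 ['Java related', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pd.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

result = df.sort_values(by=['Campaign_Name', 'Type'])
# Index 1 ('Conference') now comes before index 0 ('Webinar').
print(list(result.index))  # [1, 0, 2, 4, 3]
```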

Example 2: ascending Parameter

Use a slightly modified version of the DataFrame above (the first two records differ) and sort the records by the ‘Type’ column in ascending and descending order.

import pandas

campaign_data = [['Java related','Conference','Completed',25000],
                 ['Sales camp','Conference','Completed',5000],
                 ['Python Bootcamp','Webinar','Planned',2000],
                 ['Tutorial camp','Webinar','In-Progress',1000],
                 ['Services','Trade-Show','Completed',2000]]

df_from_campaign_data = pandas.DataFrame(campaign_data,columns=['Campaign_Name','Type','Status','Budget'])

# Ascending Order
print(df_from_campaign_data.sort_values(by='Type',ascending=True),"\n")

# Descending Order
print(df_from_campaign_data.sort_values(by='Type',ascending=False))

Output

In the first output, the DataFrame is sorted in ascending order by the ‘Type’ column, while in the second output, it is sorted in descending order by the ‘Type’ column.
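When sorting by several columns, ascending also accepts a list of booleans, one per column in by. A minimal sketch with the Example 2 data: ‘Type’ in descending order, then ‘Budget’ in ascending order within each type.

```python
import pandas as pd

campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pd.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Type: descending (Webinar, Trade-Show, Conference); Budget: ascending within each Type.
result = df.sort_values(by=['Type', 'Budget'], ascending=[False, True])
print(result)
print(list(result.index))  # [3, 2, 4, 1, 0]
```

The list must have the same length as by; each boolean applies to the column at the same position.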

Example 3: inplace Parameter

Use the same DataFrame as in Example 2 and sort the records by the ‘Budget’ column, once with inplace=False and once with inplace=True.

import pandas

campaign_data = [['Java related','Conference','Completed',25000],
                 ['Sales camp','Conference','Completed',5000],
                 ['Python Bootcamp','Webinar','Planned',2000],
                 ['Tutorial camp','Webinar','In-Progress',1000],
                 ['Services','Trade-Show','Completed',2000]]

df_from_campaign_data = pandas.DataFrame(campaign_data,columns=['Campaign_Name','Type','Status','Budget'])

# inplace=False (default): sort_values returns a new sorted DataFrame,
# which is discarded here, so the original DataFrame prints unchanged
df_from_campaign_data.sort_values(by='Budget',inplace=False)
print(df_from_campaign_data,"\n")

# inplace=True: the DataFrame itself is sorted and sort_values returns None
df_from_campaign_data.sort_values(by='Budget',inplace=True)
print(df_from_campaign_data)

Output

1. In the first output, inplace is set to False, so sort_values returns a new sorted DataFrame that is never stored. The existing DataFrame is not updated and prints in its original order.
2. In the second output, inplace is set to True, so the DataFrame itself is sorted by the values in the ‘Budget’ column in ascending order.
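The difference between the two modes is easiest to see from the return values. This minimal sketch rebuilds the Example 3 data and checks that inplace=False returns a sorted copy without touching the original, while inplace=True reorders the DataFrame itself and returns None.

```python
import pandas as pd

campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pd.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# inplace=False (the default): a new sorted DataFrame is returned,
# and df keeps its original row order.
sorted_df = df.sort_values(by='Budget')
original_order = list(df.index)
print(original_order)            # [0, 1, 2, 3, 4]
print(list(sorted_df['Budget'])) # [1000, 2000, 2000, 5000, 25000]

# inplace=True: df itself is reordered and the method returns None.
returned = df.sort_values(by='Budget', inplace=True)
print(returned)                  # None
print(list(df['Budget']))        # [1000, 2000, 2000, 5000, 25000]
```

A common alternative to inplace=True is to reassign the result, for example df = df.sort_values(by='Budget'), which keeps the sorted DataFrame without relying on in-place mutation.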

Example 4: kind Parameter

Sort the records by the ‘Budget’ column with each sorting algorithm in turn by passing ‘quicksort’, ‘mergesort’, ‘heapsort’, and ‘stable’ to the kind parameter.

import pandas

campaign_data = [['Java related','Conference','Completed',25000],
                 ['Sales camp','Conference','Completed',5000],
                 ['Python Bootcamp','Webinar','Planned',2000],
                 ['Tutorial camp','Webinar','In-Progress',1000],
                 ['Services','Trade-Show','Completed',2000]]
                 
df_from_campaign_data = pandas.DataFrame(campaign_data,columns=['Campaign_Name','Type','Status','Budget'])

# quicksort
print(df_from_campaign_data.sort_values(by='Budget',kind='quicksort'),"\n")

# mergesort
print(df_from_campaign_data.sort_values(by='Budget',kind='mergesort'),"\n")

# heapsort
print(df_from_campaign_data.sort_values(by='Budget',kind='heapsort'),"\n")

# stable
print(df_from_campaign_data.sort_values(by='Budget',kind='stable'))

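Output

All four calls sort the records by the ‘Budget’ column in ascending order. The records at index 2 and 4 share a Budget of 2000. Because ‘mergesort’ and ‘stable’ are stable algorithms, they keep these two records in their original relative order (index 2 before index 4), whereas ‘quicksort’ and ‘heapsort’ do not guarantee any particular order for ties.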
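Stability only matters when records tie on the sort column. This minimal sketch rebuilds the Example 4 data and checks the index order produced by the two stable algorithms; quicksort and heapsort are left out because their order for ties is not guaranteed.

```python
import pandas as pd

campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pd.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# The records at index 2 and 4 both have a Budget of 2000.
# Stable algorithms keep tied records in their original order (2 before 4).
orders = {}
for kind in ['mergesort', 'stable']:
    orders[kind] = list(df.sort_values(by='Budget', kind=kind).index)
    print(kind, orders[kind])  # [3, 2, 4, 1, 0] for both kinds
```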

Example 5: na_position Parameter

1. Place all the records with missing values present in the Budget column at the end (na_position = ‘last’).
2. Place all the records with missing values present in the Budget column at the beginning (na_position = ‘first’).

import pandas

campaign_data = [[None,'Conference',None,25000],
                 ['Sales camp','Conference','Completed',5000],
                 ['Python Bootcamp','Webinar','Planned',None],
                 ['Tutorial camp','Webinar','In-Progress',None],
                 ['Services','Trade-Show','Completed',2000]]

df_from_campaign_data = pandas.DataFrame(campaign_data,columns=['Campaign_Name','Type','Status','Budget'])

# Records with a missing Budget are placed at the end (the default)
print(df_from_campaign_data.sort_values(by='Budget',na_position='last'),"\n")

# Records with a missing Budget are placed at the beginning
print(df_from_campaign_data.sort_values(by='Budget',na_position='first'))

Output

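The Budget column contains two missing values. In the first output, the corresponding records are placed at the end, and in the second output, they are placed at the beginning. Missing values in other columns, such as Campaign_Name and Status, do not affect the order because the sort uses only the Budget column.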
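To confirm where the missing values end up, this minimal sketch rebuilds the Example 5 DataFrame, counts the missing Budget values with isna(), and records which positions hold them after each sort.

```python
import pandas as pd

campaign_data = [[None, 'Conference', None, 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', None],
                 ['Tutorial camp', 'Webinar', 'In-Progress', None],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pd.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Number of missing values in the Budget column
missing = int(df['Budget'].isna().sum())
print(missing)  # 2

# True marks a record whose Budget is missing, listed in sorted order
first = df.sort_values(by='Budget', na_position='first')['Budget'].isna().tolist()
last = df.sort_values(by='Budget', na_position='last')['Budget'].isna().tolist()
print(first)  # [True, True, False, False, False]
print(last)   # [False, False, False, True, True]
```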

Conclusion

We discussed how to sort the values in a pandas DataFrame using pandas.DataFrame.sort_values, which sorts a DataFrame by its values along either axis (rows or columns). The parameters were discussed with code snippets and their output. Most examples use the same small DataFrame of five records and four columns, which makes the effect of each parameter easy to compare.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons.) in Information Technology; Known programming languages: Python, R, PHP, MySQL; Published 500+ articles in the computer science domain