pandas.DataFrame.sort_values
pandas.DataFrame.sort_values sorts a DataFrame by its values along either axis (rows or columns).
Syntax:
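The function has the following signature. The parameters covered in this article are described after it.
pandas.DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)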
Parameters:
1. The by parameter is required. It takes the name of the column whose values determine the order of the records. To sort by several columns, pass a list of column names.
2. The axis parameter specifies the axis to sort along and defaults to 0 (index). If axis=0, by can contain index levels and/or column labels. If axis=1, by can contain column levels and/or index labels.
3. The ascending parameter defaults to True, so records are sorted in ascending order unless you specify otherwise. Set it to False to sort the records in descending order.
4. If the inplace parameter is set to True, the DataFrame is sorted in place and the method returns None. It is False by default, in which case a new sorted DataFrame is returned and the original is left unchanged.
5. The kind parameter selects the sorting algorithm and defaults to ‘quicksort’. The other options are ‘mergesort’, ‘heapsort’, and ‘stable’. For a DataFrame, this parameter is applied only when sorting on a single column or label.
6. The na_position parameter defaults to ‘last’, which places records with missing values (None/NaN) in the sort column at the end of the DataFrame. Set it to ‘first’ to place those records at the beginning.
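None of the examples in this article uses axis=1, so here is a minimal sketch of it. The DataFrame df, its column names, and the row labels 'x' and 'y' are hypothetical and chosen only for illustration. With axis=1, by names a row label, and the columns are reordered according to the values in that row.

```python
import pandas

# Hypothetical data: two rows labelled 'x' and 'y', three columns.
df = pandas.DataFrame({"b": [3, 1], "a": [1, 2], "c": [2, 3]}, index=["x", "y"])

# axis=1 reorders the columns by the values found in row 'x' (a=1, c=2, b=3).
sorted_columns = df.sort_values(by="x", axis=1)
print(sorted_columns)
```

The original df keeps its column order because inplace is left at its default of False.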
Example 1: By Parameter
Create a pandas DataFrame from ‘campaign_data’ with five records, then sort the records by a single column and by multiple columns.
import pandas

campaign_data = [['Java related', 'Webinar', 'Completed', 25000],
                 ['Java related', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df_from_campaign_data = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Single column
print(df_from_campaign_data.sort_values(by="Campaign_Name"), "\n")

# Multiple columns: ties in 'Campaign_Name' are broken by 'Type'
print(df_from_campaign_data.sort_values(by=["Campaign_Name", "Type"]))
Output
1. In the first output, the DataFrame is sorted by the values in the ‘Campaign_Name’ column.
2. In the second output, the DataFrame is sorted by ‘Campaign_Name’ first. Records with the same ‘Campaign_Name’ (the two ‘Java related’ records) are then ordered by ‘Type’.
Example 2: Ascending Parameter
Using a similar DataFrame, sort the records by the ‘Type’ column in ascending and descending order.
campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df_from_campaign_data = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Ascending order
print(df_from_campaign_data.sort_values(by='Type', ascending=True), "\n")

# Descending order
print(df_from_campaign_data.sort_values(by='Type', ascending=False))
Output
In the first output, the DataFrame is sorted by the ‘Type’ column in ascending order. In the second output, it is sorted by the same column in descending order.
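The ascending parameter also accepts a list of booleans, one per entry in by, which allows a different direction for each column. The following is a small sketch that reuses the data from this example; the variable names df and mixed are introduced only for illustration.

```python
import pandas

campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# 'Type' ascending, and within each 'Type' the 'Budget' descending.
mixed = df.sort_values(by=['Type', 'Budget'], ascending=[True, False])
print(mixed)
```

The list passed to ascending must have the same length as the list passed to by.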
Example 3: inplace Parameter
Using the same DataFrame, sort the records by the ‘Budget’ column, first with inplace set to False and then with inplace set to True.
campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df_from_campaign_data = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# inplace=False: a sorted copy is returned (and discarded here), the original is unchanged
df_from_campaign_data.sort_values(by='Budget', inplace=False)
print(df_from_campaign_data, "\n")

# inplace=True: the DataFrame itself is sorted and the call returns None
df_from_campaign_data.sort_values(by='Budget', inplace=True)
print(df_from_campaign_data)
Output
1. In the first output, inplace is set to False, so the existing DataFrame is not updated. The sorted result is returned as a new DataFrame, which this example does not store, and the print statement shows the records in their original order.
2. In the second output, inplace is set to True, so the DataFrame itself is sorted by the values in the ‘Budget’ column in ascending order.
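The difference between the two modes is easiest to see by capturing the return values. Below is a minimal sketch using a reduced, hypothetical three-record DataFrame; the names df, sorted_copy, order_after_copy, and returned are introduced only for illustration.

```python
import pandas

# Hypothetical reduced data set with three records.
df = pandas.DataFrame({'Campaign_Name': ['Java related', 'Sales camp', 'Tutorial camp'],
                       'Budget': [25000, 5000, 1000]})

# inplace=False (the default): the sorted result is a new DataFrame.
sorted_copy = df.sort_values(by='Budget')
order_after_copy = list(df['Budget'])  # the original order is still intact

# inplace=True: df itself is reordered and the call returns None.
returned = df.sort_values(by='Budget', inplace=True)

print(sorted_copy)
print(returned)
print(df)
```

Assigning the result of the default call to a variable, as with sorted_copy, keeps the original data available and avoids accidentally working with None.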
Example 4: kind Parameter
Sort the records in the DataFrame by the ‘Budget’ column with quicksort, mergesort, heapsort, and stable, passing each algorithm in turn to the kind parameter.
campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df_from_campaign_data = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# quicksort
print(df_from_campaign_data.sort_values(by='Budget', kind='quicksort'), "\n")

# mergesort
print(df_from_campaign_data.sort_values(by='Budget', kind='mergesort'), "\n")

# heapsort
print(df_from_campaign_data.sort_values(by='Budget', kind='heapsort'), "\n")

# stable
print(df_from_campaign_data.sort_values(by='Budget', kind='stable'))
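Output
All four calls return the records in ascending order of ‘Budget’. Only ‘mergesort’ and ‘stable’ guarantee that records with equal values (here, the two records with a Budget of 2000) keep their original relative order.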
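A stable sort keeps records with equal sort keys in their original relative order. The following sketch reuses the data from this example, where two records share a Budget of 2000, and checks the resulting index order for kind='stable'. The variable names df and stable_sorted are introduced only for illustration.

```python
import pandas

campaign_data = [['Java related', 'Conference', 'Completed', 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', 2000],
                 ['Tutorial camp', 'Webinar', 'In-Progress', 1000],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Rows 2 and 4 both have Budget 2000. A stable sort keeps row 2 ahead of row 4.
stable_sorted = df.sort_values(by='Budget', kind='stable')
print(stable_sorted.index.tolist())
```

With ‘quicksort’ or ‘heapsort’ the two tied records may come out in the same order, but that order is not guaranteed.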
Example 5: na_position Parameter
1. Place all the records with missing values present in the Budget column at the end (na_position = ‘last’).
2. Place all the records with missing values present in the Budget column at the beginning (na_position = ‘first’).
campaign_data = [[None, 'Conference', None, 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', None],
                 ['Tutorial camp', 'Webinar', 'In-Progress', None],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df_from_campaign_data = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Place the records with a missing 'Budget' value at the end.
print(df_from_campaign_data.sort_values(by='Budget', na_position='last'), "\n")

# Place the records with a missing 'Budget' value at the beginning.
print(df_from_campaign_data.sort_values(by='Budget', na_position='first'))
Output
Two records have a missing value in the ‘Budget’ column. In the first output, these two records are placed last. In the second output, they are placed first.
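The na_position parameter works independently of ascending, so missing values stay at the end even when the sort direction is reversed. The sketch below reuses the data from this example; the variable names df and descending are introduced only for illustration.

```python
import pandas

campaign_data = [[None, 'Conference', None, 25000],
                 ['Sales camp', 'Conference', 'Completed', 5000],
                 ['Python Bootcamp', 'Webinar', 'Planned', None],
                 ['Tutorial camp', 'Webinar', 'In-Progress', None],
                 ['Services', 'Trade-Show', 'Completed', 2000]]
df = pandas.DataFrame(campaign_data, columns=['Campaign_Name', 'Type', 'Status', 'Budget'])

# Descending sort with the default na_position='last':
# the largest budgets come first and the missing values still come last.
descending = df.sort_values(by='Budget', ascending=False)
print(descending)
```

To put the missing values first in a descending sort, pass na_position='first' together with ascending=False.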
Conclusion
We discussed how to sort the values in a pandas DataFrame using pandas.DataFrame.sort_values, which sorts a DataFrame by values along either axis (rows or columns). The by, ascending, inplace, kind, and na_position parameters were each demonstrated with a code snippet and its output. Most examples use one DataFrame with five records and four columns so that the results are easy to compare.