PySpark desc_nulls_first() and desc_nulls_last() Functions

To sort a PySpark DataFrame column that contains nulls in descending order, and to control whether the nulls appear first or last, you can use the desc_nulls_first() and desc_nulls_last() functions.

Before discussing these functions, we will create a sample PySpark DataFrame.

Data

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,None,None),
           (4,'chandana',23,'CSS','PySpark'),
           (46,'mounika',22,None,'.NET'),
           (4,'deepika',21,'HTML',None),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,
 ['subject_id','name','age','technology1','technology2'])
 
dataframe_obj.show()

 
Output:


Now, there are 5 columns and 4 rows.

desc_nulls_first() Function

The desc_nulls_first() function sorts the values in a column in descending order and places any null values first, before the non-null values.

It can be combined with the select() method to return only the column being sorted. The sorting itself is done by orderBy(), which takes the result of desc_nulls_first() as its argument.

Syntax

dataframe_obj.select(dataframe_obj.column) \
    .orderBy(dataframe_obj.column.desc_nulls_first())

Here, dataframe_obj is the DataFrame and column is the name of the column to sort. All null values in that column are placed first.

So, our DataFrame is ready. Let’s demonstrate the desc_nulls_first() function.

Example 1

Now, we will sort the technology1 column, which contains None (null) values, in descending order using the desc_nulls_first() function.

# Sort the technology1 column in descending order with the null values first.
dataframe_obj.select(dataframe_obj.technology1) \
    .orderBy(dataframe_obj.technology1.desc_nulls_first()).show()

Output:


The column contains two null values. They are placed first, followed by HTML and CSS in descending order.

Example 2

Now, we will sort the technology2 column, which contains None (null) values, in descending order using the desc_nulls_first() function.

# Sort the technology2 column in descending order with the null values first.
dataframe_obj.select(dataframe_obj.technology2) \
    .orderBy(dataframe_obj.technology2.desc_nulls_first()).show()

Output:


The column contains two null values. They are placed first, followed by PySpark and .NET in descending order.
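Because the results above depend on how strings compare, it can help to reproduce the ordering in plain Python. This is only a mental model and not Spark's implementation: desc_nulls_first_model is a hypothetical helper, and it assumes plain ASCII strings like the ones in the sample data.

```python
def desc_nulls_first_model(values):
    """Hypothetical helper: descending order with None values placed first."""
    nulls = [v for v in values if v is None]
    non_nulls = sorted((v for v in values if v is not None), reverse=True)
    return nulls + non_nulls


# Column values copied from the sample DataFrame.
technology1 = [None, "CSS", None, "HTML"]
technology2 = [None, "PySpark", ".NET", None]

print(desc_nulls_first_model(technology1))  # [None, None, 'HTML', 'CSS']
print(desc_nulls_first_model(technology2))  # [None, None, 'PySpark', '.NET']
```

"HTML" sorts before "CSS" because "H" comes after "C", and "PySpark" sorts before ".NET" because "P" comes after "." in character order.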

desc_nulls_last() Function

The desc_nulls_last() function sorts the values in a column in descending order and places any null values last, after the non-null values.

It can be combined with the select() method to return only the column being sorted. The sorting itself is done by orderBy(), which takes the result of desc_nulls_last() as its argument.

Syntax

dataframe_obj.select(dataframe_obj.column) \
    .orderBy(dataframe_obj.column.desc_nulls_last())

Here, dataframe_obj is the DataFrame and column is the name of the column to sort. All null values in that column are placed last.

So, our DataFrame is ready. Let’s demonstrate the desc_nulls_last() function.

Example 1

Now, we will sort the technology1 column, which contains None (null) values, in descending order using the desc_nulls_last() function.

# Sort the technology1 column in descending order with the null values last.
dataframe_obj.select(dataframe_obj.technology1) \
    .orderBy(dataframe_obj.technology1.desc_nulls_last()).show()

Output:


The column contains two null values. HTML and CSS come first in descending order, and the two null values are placed last.

Example 2

Now, we will sort the technology2 column, which contains None (null) values, in descending order using the desc_nulls_last() function.

# Sort the technology2 column in descending order with the null values last.
dataframe_obj.select(dataframe_obj.technology2) \
    .orderBy(dataframe_obj.technology2.desc_nulls_last()).show()

Output:


The column contains two null values. PySpark and .NET come first in descending order, and the two null values are placed last.

Overall Code

import pyspark
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
students =[(4,'sravan',23,None,None),
           (4,'chandana',23,'CSS','PySpark'),
           (46,'mounika',22,None,'.NET'),
           (4,'deepika',21,'HTML',None),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,
 ['subject_id','name','age','technology1','technology2'])
 
dataframe_obj.show()

# Sort the technology1 column in descending order with the null values first.
dataframe_obj.select(dataframe_obj.technology1) \
    .orderBy(dataframe_obj.technology1.desc_nulls_first()).show()

# Sort the technology2 column in descending order with the null values first.
dataframe_obj.select(dataframe_obj.technology2) \
    .orderBy(dataframe_obj.technology2.desc_nulls_first()).show()

# Sort the technology1 column in descending order with the null values last.
dataframe_obj.select(dataframe_obj.technology1) \
    .orderBy(dataframe_obj.technology1.desc_nulls_last()).show()

# Sort the technology2 column in descending order with the null values last.
dataframe_obj.select(dataframe_obj.technology2) \
    .orderBy(dataframe_obj.technology2.desc_nulls_last()).show()

Conclusion

In this PySpark tutorial, we saw how to control the position of null values while sorting a DataFrame column in descending order. The desc_nulls_first() function sorts the values in descending order and places the null values first, while desc_nulls_last() sorts the values in descending order and places the null values last. You can run the complete code given in the Overall Code section.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain