Apache Spark

PySpark asc_nulls_first() and asc_nulls_last() Functions

If you want to sort the values in a column in PySpark DataFrame with having nulls in ascending order, then you can go with the asc_nulls_first() and asc_nulls_last() functions.

Before discussing these functions, we will create a sample PySpark DataFrame.

Data

 

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,None,None),
           (4,'chandana',23,'CSS','PySpark'),
           (46,'mounika',22,None,'.NET'),
           (4,'deepika',21,'HTML',None),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,['subject_id','name','age','technology1','technology2'])
 
dataframe_obj.show()

 
Output:


Now, there are 5 columns and 4 rows.

asc_nulls_first() Function

The asc_nulls_first() function sorts the values in a column in ascending order, but it will place the existing null values in a column first.

It can be used with the select() method to select the ordered columns. It is very important to use orderBy() because the main thing here is to sort.orderBy() and takes asc_nulls_first() as a parameter.

Syntax

 

dataframe_obj.select(dataframe_obj.column).orderBy(dataframe_obj.column.asc_nulls_
first())

 
Where dataframe_obj is the DataFrame and column is the column name in which the values are sorted, all the null values will be placed first.

So, our DataFrame is ready. Let’s demonstrate the asc_nulls_first() function.

Example 1

Now, we will sort the values in the technology1 column with None/Null values in ascending order using the asc_nulls_first() function.

#sort the technology1 column in ascending order and get the null values first.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.asc_
nulls_first()).show()

 
Output:


Actually, there are two null values. First, they are placed, and later, CSS and HTML are sorted in ascending order.

Example 2

Now, we will sort the values in the technology2 column that has None/Null values in ascending order using the asc_nulls_first() function.

#sort the technology2 column in ascending order and get the null values first.
Dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.asc_
nulls_first()).show()

 
Output:


Actually, there are two null values. First, they are placed, and later, .NET and PySpark are sorted in ascending order.

asc_nulls_last() Function

The asc_nulls_last() function sorts the values in a column in ascending order, but it will place the existing null values in a column last.

It can be used with the select() method to select the ordered columns. It is very important to use orderBy() because the main thing here is to sort.orderBy() as it takes asc_nulls_first() as a parameter.

Syntax

 

dataframe_obj.select(dataframe_obj.column).orderBy(dataframe_obj.column.asc_nulls_last())

 
Where dataframe_obj is the DataFrame and column is the column name in which the values are sorted, all the null values will be placed at last.

So, our DataFrame is ready. Let’s demonstrate the asc_nulls_last() function.

Example 1

Now, we will sort the values in the technology2 column that has None/Null values in ascending order using the asc_nulls_last() function.

#sort the technology1 column in ascending order and get the null values last.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.asc_
nulls_last()).show()

 
Output:


Actually, there are two null values. First, CSS and HTML are sorted in ascending order, and two null values are placed last.

Example 2

Now, we will sort the values in the technology2 column that has None/Null values in ascending order using the asc_nulls_last() function.

#sort the technology2 column in ascending order and get the null values last.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.asc_
nulls_last()).show()

 
Output:


Actually, there are two null values. First, .NET and PySpark are sorted in ascending order and two null values are placed last.

Overall Code

 

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,None,None),
           (4,'chandana',23,'CSS','PySpark'),
           (46,'mounika',22,None,'.NET'),
           (4,'deepika',21,'HTML',None),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,['subject_id','name','age','technology1','technology2'])
 
dataframe_obj.show()

#sort the technology1 column in ascending order and get the null values first.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.asc_
nulls_first()).show()

#sort the technology2 column in ascending order and get the null values first.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.asc_
nulls_first()).show()

#sort the technology1 column in ascending order and get the null values last.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.asc_
nulls_last()).show()

#sort the technology2 column in ascending order and get the null values last.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.asc_
nulls_last()).show()

 

Conclusion

By the end of this PySpark tutorial, we learned that it is possible to deal with null while sorting the values in a DataFrame using the asc_nulls_first() and asc_nulls_last() functions. The asc_nulls_first() function sorts the values in a column in ascending order, but it will place the existing null values in a column first. The asc_nulls_last() function sorts the values in a column in ascending order, but it will place the existing null values in a column last. You can run the entire code specified in the last part of the tutorial.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain