Before discussing these functions, we will create a sample PySpark DataFrame.
Data
from pyspark.sql import SparkSession
spark_app = SparkSession.builder.appName('_').getOrCreate()
students =[(4,'sravan',23,None,None),
(4,'chandana',23,'CSS','PySpark'),
(46,'mounika',22,None,'.NET'),
(4,'deepika',21,'HTML',None),
]
dataframe_obj = spark_app.createDataFrame( students,
['subject_id','name','age','technology1','technology2'])
dataframe_obj.show()
Output:
Now, there are 5 columns and 4 rows.
desc_nulls_first() Function
The desc_nulls_first() function sorts the values in a column in descending order and places any null values in that column first, before the non-null values.
It can be combined with the select() method to return only the sorted column. The sorting itself is done by orderBy(), which takes the result of desc_nulls_first() as its argument.
Syntax
orderBy(dataframe_obj.column.desc_nulls_first())
Where dataframe_obj is the DataFrame and column is the name of the column whose values are sorted. All the null values in that column are placed first.
So, our DataFrame is ready. Let’s demonstrate the desc_nulls_first() function.
Example 1
Now, we will sort the values in the technology1 column that has None/Null values in descending order using the desc_nulls_first() function.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.desc_nulls_first()).show()
Output:
The technology1 column contains two null values. They are placed first, followed by HTML and CSS in descending order.
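To make the ordering in this example concrete, here is a minimal plain-Python model of it. It is only a sketch: model_desc_nulls_first is a hypothetical helper written for illustration. It is not part of PySpark and does not reflect how Spark sorts internally.

```python
def model_desc_nulls_first(values):
    """Hypothetical helper: descending order with None values placed first."""
    # Separate the missing values from the values that can be compared.
    nulls = [v for v in values if v is None]
    non_nulls = [v for v in values if v is not None]
    # Sort only the comparable values, largest first.
    non_nulls.sort(reverse=True)
    return nulls + non_nulls


# Values of the technology1 column, in the order of the sample rows.
technology1 = [None, "CSS", None, "HTML"]
result = model_desc_nulls_first(technology1)
print(result)  # [None, None, 'HTML', 'CSS']
```

The nulls are set aside before sorting because None cannot be compared with a string in Python.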
Example 2
Now, we will sort the values in the technology2 column that has None/Null values in descending order using the desc_nulls_first() function.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.desc_nulls_first()).show()
Output:
The technology2 column also contains two null values. They are placed first, followed by PySpark and .NET in descending order.
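It may not be obvious why PySpark comes before .NET in descending order. The short check below illustrates the reason, assuming that Spark compares these strings by their binary values, which for ASCII text gives the same result as Python's own string comparison. It uses plain Python only and is a sketch, not Spark code.

```python
# Compare the first characters of the two non-null values.
first_chars = {"PySpark": ord("P"), ".NET": ord(".")}
print(first_chars)  # {'PySpark': 80, '.NET': 46}

# 'P' has a higher code point than '.', so "PySpark" is the larger string
# and comes first when the values are sorted in descending order.
ordered_desc = sorted(["PySpark", ".NET"], reverse=True)
print(ordered_desc)  # ['PySpark', '.NET']
```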
desc_nulls_last() Function
The desc_nulls_last() function sorts the values in a column in descending order and places any null values in that column last, after the non-null values.
It can be combined with the select() method to return only the sorted column. The sorting itself is done by orderBy(), which takes the result of desc_nulls_last() as its argument.
Syntax
orderBy(dataframe_obj.column.desc_nulls_last())
Where dataframe_obj is the DataFrame and column is the name of the column whose values are sorted. All the null values in that column are placed last.
So, our DataFrame is ready. Let’s demonstrate the desc_nulls_last() function.
Example 1
Now, we will sort the values in the technology1 column that has None/Null values in descending order using the desc_nulls_last() function.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.desc_nulls_last()).show()
Output:
The technology1 column contains two null values. HTML and CSS come first in descending order, and the two null values are placed last.
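The select() call in this example only narrows the result to one column. The ordering applies to whole rows. The following plain-Python sketch models that on the sample data. The helper rows_desc_nulls_last and the constant TECHNOLOGY1 are hypothetical names introduced for illustration and are not part of PySpark.

```python
students = [
    (4, "sravan", 23, None, None),
    (4, "chandana", 23, "CSS", "PySpark"),
    (46, "mounika", 22, None, ".NET"),
    (4, "deepika", 21, "HTML", None),
]

TECHNOLOGY1 = 3  # position of the technology1 field in each tuple


def rows_desc_nulls_last(rows, index):
    """Hypothetical helper: order rows by one field, descending, None last."""
    present = [row for row in rows if row[index] is not None]
    missing = [row for row in rows if row[index] is None]
    # Sort only the rows that have a value in the chosen field.
    present.sort(key=lambda row: row[index], reverse=True)
    return present + missing


ordered = rows_desc_nulls_last(students, TECHNOLOGY1)
names = [row[1] for row in ordered]
print(names)  # ['deepika', 'chandana', 'sravan', 'mounika']
```

In this model the two rows with a null technology1 keep their original relative order. Spark makes no such promise for rows that tie on the sort column, so only the position of the null group should be relied on, not the order inside it.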
Example 2
Now, we will sort the values in the technology2 column that has None/Null values in descending order using the desc_nulls_last() function.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.desc_nulls_last()).show()
Output:
The technology2 column contains two null values. PySpark and .NET come first in descending order, and the two null values are placed last.
Overall Code
from pyspark.sql import SparkSession
spark_app = SparkSession.builder.appName('_').getOrCreate()
students =[(4,'sravan',23,None,None),
(4,'chandana',23,'CSS','PySpark'),
(46,'mounika',22,None,'.NET'),
(4,'deepika',21,'HTML',None),
]
dataframe_obj = spark_app.createDataFrame( students,
['subject_id','name','age','technology1','technology2'])
dataframe_obj.show()
# Sort the technology1 column in descending order with the null values first.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.desc_nulls_first()).show()
# Sort the technology2 column in descending order with the null values first.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.desc_nulls_first()).show()
# Sort the technology1 column in descending order with the null values last.
dataframe_obj.select(dataframe_obj.technology1).orderBy(dataframe_obj.technology1.desc_nulls_last()).show()
# Sort the technology2 column in descending order with the null values last.
dataframe_obj.select(dataframe_obj.technology2).orderBy(dataframe_obj.technology2.desc_nulls_last()).show()
Conclusion
In this PySpark tutorial, we learned how to control where null values appear when sorting a DataFrame column in descending order. The desc_nulls_first() function sorts the values in descending order and places the null values first. The desc_nulls_last() function sorts the values in descending order and places the null values last. Both are passed to orderBy(), which performs the sort. You can run the entire code specified in the last part of the tutorial.