PySpark signum() Function

The signum function returns the sign of a given value. For a numeric column in a PySpark DataFrame, signum returns -1 if the value is less than 0, 0 if the value is equal to 0, and 1 otherwise.
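The piecewise rule above can be sketched in plain Python. The sign() helper below is hypothetical and only illustrates the definition; PySpark's signum() applies the same logic column-wise.

```python
def sign(x):
    """Return -1 if x < 0, 0 if x == 0, and 1 otherwise (mirrors signum)."""
    if x < 0:
        return -1
    if x == 0:
        return 0
    return 1

print(sign(-12.9), sign(0), sign(7.8))  # → -1 0 1
```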

signum() Function

The signum() function is a mathematical function used in PySpark. It is available in the pyspark.sql.functions module.

It is typically used with the select() method, which displays the computed signum values for the PySpark DataFrame.

Syntax:
dataframe_obj.select(signum(dataframe_obj.column))

Parameter:
It takes the column name as a parameter and returns the signum values for that column.

Now, we will see some examples to understand this function better.

Example 1
Let’s create a PySpark DataFrame with 3 rows and 4 numeric columns and return the signum values for one column.

import math
from pyspark.sql import SparkSession
from pyspark.sql.functions import signum

spark_app = SparkSession.builder.appName('_').getOrCreate()

# create math values
values = [(math.pi, 0, 7.8, 120),
          (math.pi / 2, 1, 0.5, 180),
          (math.pi / 3, -5, -12.9, 360)]

# assign columns by creating the PySpark DataFrame
dataframe_obj = spark_app.createDataFrame(values, ['value1', 'value2', 'value3', 'value4'])

dataframe_obj.show()

# get the signum values of the value1 column
dataframe_obj.select(signum(dataframe_obj.value1)).show()

Output:

So, for the value1 column, we returned the signum values.
3.141592653589793 is greater than 0. So, the signum is 1.
1.5707963267948966 is greater than 0. So, the signum is 1.
1.0471975511965976 is greater than 0. So, the signum is 1.

Example 2
Now, we will return the signum values for the value2 and value3 columns.

import math
from pyspark.sql import SparkSession
from pyspark.sql.functions import signum

spark_app = SparkSession.builder.appName('_').getOrCreate()

# create math values
values = [(math.pi, 0, 7.8, 120),
          (math.pi / 2, 1, 0.5, 180),
          (math.pi / 3, -5, -12.9, 360)]

# assign columns by creating the PySpark DataFrame
dataframe_obj = spark_app.createDataFrame(values, ['value1', 'value2', 'value3', 'value4'])

dataframe_obj.show()

# get the signum values of the value2 and value3 columns
dataframe_obj.select(signum(dataframe_obj.value2), signum(dataframe_obj.value3)).show()

Output:

Column – value2:

0 is equal to 0. So, the signum is 0.
1 is greater than 0. So, the signum is 1.
-5 is less than 0. So, the signum is -1.

Column – value3:

7.8 is greater than 0. So, the signum is 1.
0.5 is greater than 0. So, the signum is 1.
-12.9 is less than 0. So, the signum is -1.

Note: The signum() function returns null if you apply it to string values. It only works on numeric data.

Conclusion

In this PySpark tutorial, we discussed the signum() function. signum() is a mathematical function that can be used in PySpark. It is available in the pyspark.sql.functions module. In a DataFrame column, if the value is less than 0, then the signum returns -1. If the value is equal to 0, the signum returns 0. Otherwise, it returns 1.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain