Bitwise Functions in PySpark

Let’s discuss the bitwise operations that can be performed on the columns of a PySpark DataFrame.

bitwiseOR() Function

bitwiseOR() is a Column method that computes the bitwise OR of two columns in a PySpark DataFrame.

Operation:

1 bitwiseOR 1 => 1
1 bitwiseOR 0 => 1
0 bitwiseOR 1 => 1
0 bitwiseOR 0 => 0
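
The table above applies to each bit position independently, so the same rule extends to integers wider than one bit. As a minimal sketch in plain Python (not PySpark; the operands 12 and 10 are arbitrary illustrations, not values from this tutorial), the | operator follows the same truth table:

```python
# Plain-Python illustration of bitwise OR; the operands are arbitrary examples.
a = 0b1100  # 12
b = 0b1010  # 10

# Each result bit is 1 when at least one of the two input bits is 1.
result = a | b

print(format(a, "04b"), "OR", format(b, "04b"), "=", format(result, "04b"))
```

Writing the operands in binary makes it easy to line up each bit position against the table.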

 
Syntax:

It can be used with the select() method to return the result of the bitwise operation as a new column.

dataframe_obj.select(dataframe_obj.column1.bitwiseOR(dataframe_obj.column2))

Here, dataframe_obj is the PySpark DataFrame, and column1 and column2 are the columns to combine.

Example:

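We have a DataFrame with 4 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2']. Now, we apply the bitwiseOR() function on the m1 and m2 columns.

import pyspark
from pyspark.sql import SparkSession

# create (or reuse) a Spark session
spark_app = SparkSession.builder.appName('_').getOrCreate()

# four student rows; m1 and m2 hold the 0/1 values to combine
students = [(4, 'sravan', 23, 0, 0),
            (4, 'chandana', 23, 0, 1),
            (46, 'mounika', 22, 1, 0),
            (4, 'deepika', 21, 1, 1)]

# build the DataFrame with explicit column names
dataframe_obj = spark_app.createDataFrame(students, ['subject_id', 'name', 'age', 'm1', 'm2'])

# display the full DataFrame
dataframe_obj.show()

# perform bitwise-or operation on m1 and m2 columns
dataframe_obj.select(dataframe_obj.m1.bitwiseOR(dataframe_obj.m2)).show()

Output:

The first show() call prints the full DataFrame. The second prints a single column with the values 0, 1, 1, 1 for the four rows.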

bitwiseAND() Function

bitwiseAND() is a Column method that computes the bitwise AND of two columns in a PySpark DataFrame.

Operation:

1 bitwiseAND 1 => 1
1 bitwiseAND 0 => 0
0 bitwiseAND 1 => 0
0 bitwiseAND 0 => 0
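
Because AND keeps a bit only when it is set in both operands, it is commonly used as a mask to test whether a particular bit is set. The following is a minimal sketch in plain Python (not PySpark); the flag names and the has_flag helper are hypothetical and exist only for illustration:

```python
# Plain-Python illustration of bitwise AND used as a mask.
# FLAG_READ, FLAG_WRITE and has_flag are hypothetical names for this sketch.
FLAG_READ = 0b001
FLAG_WRITE = 0b010

permissions = 0b011  # both the READ and WRITE bits are set


def has_flag(value: int, flag: int) -> bool:
    """Return True when every bit of `flag` is also set in `value`."""
    return (value & flag) == flag


print(has_flag(permissions, FLAG_WRITE))
print(has_flag(FLAG_READ, FLAG_WRITE))
```

The parentheses around the AND expression are not strictly required in Python, but they make the intended order of evaluation explicit.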

 
Syntax:

It can be used with the select() method to return the result of the bitwise operation as a new column.

dataframe_obj.select(dataframe_obj.column1.bitwiseAND(dataframe_obj.column2))

Here, dataframe_obj is the PySpark DataFrame, and column1 and column2 are the columns to combine.

Example:

We have a DataFrame with 4 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2']. Now, we apply the bitwiseAND() function on the m1 and m2 columns.

import pyspark
from pyspark.sql import SparkSession

# create (or reuse) a Spark session
spark_app = SparkSession.builder.appName('_').getOrCreate()

# four student rows; m1 and m2 hold the 0/1 values to combine
students = [(4, 'sravan', 23, 0, 0),
            (4, 'chandana', 23, 0, 1),
            (46, 'mounika', 22, 1, 0),
            (4, 'deepika', 21, 1, 1)]

# build the DataFrame with explicit column names
dataframe_obj = spark_app.createDataFrame(students, ['subject_id', 'name', 'age', 'm1', 'm2'])

# display the full DataFrame
dataframe_obj.show()

# perform bitwise-and operation on m1 and m2 columns
dataframe_obj.select(dataframe_obj.m1.bitwiseAND(dataframe_obj.m2)).show()

 
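Output:

The first show() call prints the full DataFrame. The second prints a single column with the values 0, 0, 0, 1 for the four rows.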

bitwiseXOR() Function

bitwiseXOR() is a Column method that computes the bitwise XOR (exclusive OR) of two columns in a PySpark DataFrame.

Operation:

1 bitwiseXOR 1 => 0
1 bitwiseXOR 0 => 1
0 bitwiseXOR 1 => 1
0 bitwiseXOR 0 => 0
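
Since XOR yields 1 exactly where the two inputs differ, it can be read as a "which bits changed" operation. A minimal sketch in plain Python (not PySpark; the operands are arbitrary illustrations) shows this, along with the consequence that any value XORed with itself is 0:

```python
# Plain-Python illustration of bitwise XOR; the operands are arbitrary examples.
a = 0b1100  # 12
b = 0b1010  # 10

# A result bit is 1 only where the corresponding bits of a and b differ.
diff = a ^ b

# Identical inputs differ in no position, so the result is 0.
same = a ^ a

print(format(diff, "04b"), same)
```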

 
Syntax:
 
It can be used with the select() method to return the result of the bitwise operation as a new column.

dataframe_obj.select(dataframe_obj.column1.bitwiseXOR(dataframe_obj.column2))

Here, dataframe_obj is the PySpark DataFrame, and column1 and column2 are the columns to combine.

Example:
 
We have a DataFrame with 4 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2']. Now, we apply the bitwiseXOR() function on the m1 and m2 columns.

import pyspark
from pyspark.sql import SparkSession

# create (or reuse) a Spark session
spark_app = SparkSession.builder.appName('_').getOrCreate()

# four student rows; m1 and m2 hold the 0/1 values to combine
students = [(4, 'sravan', 23, 0, 0),
            (4, 'chandana', 23, 0, 1),
            (46, 'mounika', 22, 1, 0),
            (4, 'deepika', 21, 1, 1)]

# build the DataFrame with explicit column names
dataframe_obj = spark_app.createDataFrame(students, ['subject_id', 'name', 'age', 'm1', 'm2'])

# display the full DataFrame
dataframe_obj.show()

# perform bitwise-xor operation on m1 and m2 columns
dataframe_obj.select(dataframe_obj.m1.bitwiseXOR(dataframe_obj.m2)).show()

 
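Output:

The first show() call prints the full DataFrame. The second prints a single column with the values 0, 1, 1, 0 for the four rows.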

Conclusion

In this PySpark tutorial, we learned how to perform bitwise operations on the columns of a PySpark DataFrame. For 0/1 values, bitwiseOR() returns 1 if at least one of the two column values in a row is 1; otherwise, it returns 0. bitwiseXOR() returns 1 if the two values differ; otherwise, it returns 0. bitwiseAND() returns 1 only if both values are 1; otherwise, it returns 0. For wider integers, the same rules apply to each bit position.
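
As a quick cross-check of the three rules summarized above, the expected result columns for the four example rows can be computed in plain Python. This is a sketch outside Spark: the (m1, m2) pairs are copied from the tutorial's students data, while the variable names are assumptions made for illustration.

```python
# (m1, m2) pairs taken from the four example rows used in this tutorial.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Apply Python's own bitwise operators row by row.
expected_or = [m1 | m2 for m1, m2 in pairs]
expected_and = [m1 & m2 for m1, m2 in pairs]
expected_xor = [m1 ^ m2 for m1, m2 in pairs]

print("OR: ", expected_or)
print("AND:", expected_and)
print("XOR:", expected_xor)
```

Comparing these lists with the column that each show() call prints is a simple way to confirm that the intended operation was applied.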

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP MySQL; published 500+ articles in the computer science domain