PySpark shiftleft() and shiftright() Functions

A shift operation moves the bits of a number's binary representation by a given number of positions.

Internally, the number is treated as a binary value and its bits are shifted in one of two directions: to the left or to the right.

Moving the bits toward the left (the more significant end) is called a left shift, and moving them toward the right (the less significant end) is called a right shift.

In this PySpark tutorial, we will see how to apply a left shift and a right shift to the values in a particular column of a DataFrame.
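To make the two directions concrete, here is a minimal plain-Python sketch (no Spark required). The helper name to_bits and the 8-bit display width are assumptions chosen for illustration; Python's built-in << and >> operators perform the shifts.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Return the binary digits of a non-negative integer, zero-padded to `width`."""
    return format(value, f"0{width}b")


value = 12
print(to_bits(value))       # 00001100  (12)
print(to_bits(value << 2))  # 00110000  (48): every bit moved two places to the left
print(to_bits(value >> 2))  # 00000011  (3): every bit moved two places to the right
```

A left shift fills the vacated low positions with zeros, while a right shift discards the bits that fall off the low end.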

shiftleft() Function

The shiftleft() function is available in the pyspark.sql.functions module. It shifts the bits of each value to the left.

As long as the result fits in the column's integer type, it equals value*(2^shift).

For example, let's shift the value 12 left by 3 bits.

Solution:

12*(2^3)
=> 12*8
=> 96

So, 12 shifted left by 3 bits is 96.
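The same arithmetic can be checked in plain Python, whose << operator is also a left shift. This is only a sketch of the formula, not of Spark's implementation, and the helper name left_shift_by_formula is hypothetical. One difference to keep in mind: Python integers grow without limit, while a Spark integer column has a fixed width, so the formula is only expected to match while the result fits in the column's type.

```python
def left_shift_by_formula(value: int, shift: int) -> int:
    """Compute value * 2**shift, the arithmetic equivalent of a left shift."""
    return value * (2 ** shift)


for value, shift in [(12, 3), (78, 2), (50, 4)]:
    # The bitwise operator and the formula should agree for every pair.
    assert value << shift == left_shift_by_formula(value, shift)
    print(value, shift, value << shift)
```

For the pair (12, 3) this prints 96, matching the worked solution.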

Because shiftleft() returns a column expression, it can be used inside the select() method to display the shifted values next to the original column.

Syntax
dataframe_obj.select(dataframe_obj.column, shiftleft(dataframe_obj.column, n))

Parameters:
The shiftleft() function takes two parameters.

  1. The first parameter is the column (or column name) whose values are shifted.
  2. The second parameter is an integer giving the number of bits to shift left.

Example 1
We will create a PySpark DataFrame with 5 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2'].
Then we shift the values in the m1 and m2 columns left by 2 bits.

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,78,90),
           (4,'chandana',23,67,89),
           (46,'mounika',22,54,67),
           (4,'deepika',21,100,100),
           (46,'chandrika',22,50,50),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,['subject_id','name','age','m1','m2'])
 
dataframe_obj.show()

 
# Import the shiftleft function from the module - pyspark.sql.functions
from pyspark.sql.functions import shiftleft
 
# Perform the shiftleft operation on all values in the m1 column by 2 positions
# and in the m2 column by 2 positions
dataframe_obj.select(dataframe_obj.m1, shiftleft(dataframe_obj.m1, 2), dataframe_obj.m2, shiftleft(dataframe_obj.m2, 2)).show()

Output:

Explanation

In the m1 column:

78 => 78*(2^2) = 312
67 => 67*(2^2) = 268
54 => 54*(2^2) = 216
100 => 100*(2^2) = 400
50 => 50*(2^2) = 200

In the m2 column:

90 => 90*(2^2) = 360
89 => 89*(2^2) = 356
67 => 67*(2^2) = 268
100 => 100*(2^2) = 400
50 => 50*(2^2) = 200

Example 2
We will create a PySpark DataFrame with 5 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2'].
Then we shift the values in the m1 and m2 columns left by 4 bits.

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,78,90),
           (4,'chandana',23,67,89),
           (46,'mounika',22,54,67),
           (4,'deepika',21,100,100),
           (46,'chandrika',22,50,50),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,['subject_id','name','age','m1','m2'])
 
dataframe_obj.show()
 
 
# Import the shiftleft function from the module - pyspark.sql.functions
from pyspark.sql.functions import shiftleft
 
# Perform the shiftleft operation on all values in the m1 column by 4 positions
# and in the m2 column by 4 positions
dataframe_obj.select(dataframe_obj.m1, shiftleft(dataframe_obj.m1, 4), dataframe_obj.m2, shiftleft(dataframe_obj.m2, 4)).show()

Output:

Explanation

In the m1 column:

78 => 78*(2^4) = 1248
67 => 67*(2^4) = 1072
54 => 54*(2^4) = 864
100 => 100*(2^4) = 1600
50 => 50*(2^4) = 800

In the m2 column:

90 => 90*(2^4) = 1440
89 => 89*(2^4) = 1424
67 => 67*(2^4) = 1072
100 => 100*(2^4) = 1600
50 => 50*(2^4) = 800

shiftright() Function

The shiftright() function is available in the pyspark.sql.functions module. It shifts the bits of each value to the right.

The result equals value/(2^shift), rounded down to an integer.

For example, let's shift the value 12 right by 3 bits.

Solution:

12/(2^3)
=> 12/8
=> 1 (1.5 rounded down)

Thus, 12 shifted right by 3 bits is 1.
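A quick plain-Python check shows why 12/8 gives 1 rather than 1.5: a right shift drops the bits that fall off the right-hand end, which is the same as floor division by 2^shift. The helper name right_shift_by_formula is hypothetical, and the comments describe Python's >> operator rather than Spark internals.

```python
def right_shift_by_formula(value: int, shift: int) -> int:
    """Compute floor(value / 2**shift), the arithmetic equivalent of a right shift."""
    return value // (2 ** shift)


print(12 >> 3)   # 1: binary 1100 becomes 1, the low three bits are discarded
print(78 >> 2)   # 19: 78 / 4 = 19.5, rounded down
print(-12 >> 3)  # -2: for negative values Python rounds toward negative infinity

for value, shift in [(12, 3), (78, 2), (-12, 3)]:
    # The bitwise operator and floor division should agree for every pair.
    assert value >> shift == right_shift_by_formula(value, shift)
```

Because the discarded bits are lost, a right shift followed by a left shift of the same size does not always restore the original value.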

Because shiftright() returns a column expression, it can be used inside the select() method to display the shifted values next to the original column.

Syntax
dataframe_obj.select(dataframe_obj.column, shiftright(dataframe_obj.column, n))

Parameters:
The shiftright() function takes two parameters.

  1. The first parameter is the column (or column name) whose values are shifted.
  2. The second parameter is an integer giving the number of bits to shift right.

Example 1
We will create a PySpark DataFrame with 5 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2'].
Then we shift the values in the m1 and m2 columns right by 2 bits.

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,78,90),
           (4,'chandana',23,67,89),
           (46,'mounika',22,54,67),
           (4,'deepika',21,100,100),
           (46,'chandrika',22,50,50),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,['subject_id','name','age','m1','m2'])
 
dataframe_obj.show()
 
 
# Import the shiftright function from the module - pyspark.sql.functions
from pyspark.sql.functions import shiftright
 
# Perform the shiftright operation on all values in the m1 column by 2 positions
# and in the m2 column by 2 positions
dataframe_obj.select(dataframe_obj.m1, shiftright(dataframe_obj.m1, 2), dataframe_obj.m2, shiftright(dataframe_obj.m2, 2)).show()

Output:

Explanation

In the m1 column:

78 => 78/(2^2) = 19
67 => 67/(2^2) = 16
54 => 54/(2^2) = 13
100 => 100/(2^2) = 25
50 => 50/(2^2) = 12

In the m2 column:

90 => 90/(2^2) = 22
89 => 89/(2^2) = 22
67 => 67/(2^2) = 16
100 => 100/(2^2) = 25
50 => 50/(2^2) = 12

Example 2
We will create a PySpark DataFrame with 5 rows and 5 columns: ['subject_id', 'name', 'age', 'm1', 'm2'].
Then we shift the values in the m1 and m2 columns right by 4 bits.

import pyspark
 
from pyspark.sql import SparkSession
 
spark_app = SparkSession.builder.appName('_').getOrCreate()
 
 
students =[(4,'sravan',23,78,90),
           (4,'chandana',23,67,89),
           (46,'mounika',22,54,67),
           (4,'deepika',21,100,100),
           (46,'chandrika',22,50,50),
              ]
 
dataframe_obj = spark_app.createDataFrame( students,['subject_id','name','age','m1','m2'])
 
dataframe_obj.show()
 
 
# Import the shiftright function from the module - pyspark.sql.functions
from pyspark.sql.functions import shiftright
 
# Perform the shiftright operation on all values in the m1 column by 4 positions
# and in the m2 column by 4 positions
dataframe_obj.select(dataframe_obj.m1, shiftright(dataframe_obj.m1, 4), dataframe_obj.m2, shiftright(dataframe_obj.m2, 4)).show()

Output:

Explanation

In the m1 column:

78 => 78/(2^4) = 4
67 => 67/(2^4) = 4
54 => 54/(2^4) = 3
100 => 100/(2^4) = 6
50 => 50/(2^4) = 3

In the m2 column:

90 => 90/(2^4) = 5
89 => 89/(2^4) = 5
67 => 67/(2^4) = 4
100 => 100/(2^4) = 6
50 => 50/(2^4) = 3
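The values in the Explanation lists can be cross-checked without a Spark session. The sketch below uses plain Python lists that mirror the m1 and m2 columns of the students data; the helper name shift_all is hypothetical. It mimics the arithmetic only and is not a substitute for running shiftleft() or shiftright() on a DataFrame.

```python
# Values copied from the m1 and m2 columns of the students data.
m1 = [78, 67, 54, 100, 50]
m2 = [90, 89, 67, 100, 50]


def shift_all(values, shift, direction):
    """Shift every integer in `values` left or right by `shift` bits."""
    if direction == "left":
        return [v << shift for v in values]
    return [v >> shift for v in values]


print(shift_all(m1, 2, "left"))   # [312, 268, 216, 400, 200]
print(shift_all(m2, 4, "left"))   # [1440, 1424, 1072, 1600, 800]
print(shift_all(m1, 2, "right"))  # [19, 16, 13, 25, 12]
print(shift_all(m2, 4, "right"))  # [5, 5, 4, 6, 3]
```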

Conclusion

In this PySpark tutorial, we saw how to shift the bits of the values in a PySpark DataFrame column. The shiftleft() function shifts the bits to the left, which is equivalent to value*(2^shift). The shiftright() function shifts the bits to the right, which is equivalent to value/(2^shift) rounded down to an integer. Remember to import both functions from the pyspark.sql.functions module before using them.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain