
PySpark – Pandas Series: Arithmetic Operations

In Python, PySpark is a Spark module used to provide a similar kind of processing to Spark using a Series, which stores the given data in an array (internally, a PySpark column). A PySpark pandas Series represents a pandas Series, but it holds a PySpark column internally. The pandas API supports the Series data structure, and pandas is imported from the pyspark module.

Arithmetic operations are used to perform operations like addition, subtraction, multiplication, division, and modulus. The PySpark pandas Series supports built-in functions that perform these operations. We will demonstrate these operations in this LinuxHint tutorial.

Before that, you have to install the pyspark module and import it, as shown below:

Command

pip install pyspark

Syntax to import

from pyspark import pandas

After that, we can create and use a Series from the pandas module.

Syntax to create pandas Series

pyspark.pandas.Series()

We can pass a Python list of values to it.

Let’s create a pandas Series through pyspark that has five numeric values.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

print(pyspark_series)

Output

Now, let's get into the tutorial.

We will use add(), sub(), mul(), div(), and mod() in this tutorial on arithmetic operations in PySpark. Let's see them one by one.
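Before looking at each method in detail, note that every one of these methods also has an equivalent Python operator on a PySpark pandas Series, mirroring the regular pandas API. The short sketch below (using the same five-element series as above) shows the method calls next to their operator forms.

# import pandas from the pyspark module
from pyspark import pandas

# the same five-element series used throughout this tutorial
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# each method has an operator equivalent (mirroring the pandas API)
print(pyspark_series.add(10))   # same as pyspark_series + 10
print(pyspark_series.sub(10))   # same as pyspark_series - 10
print(pyspark_series.mul(10))   # same as pyspark_series * 10
print(pyspark_series.div(10))   # same as pyspark_series / 10
print(pyspark_series.mod(10))   # same as pyspark_series % 10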

pyspark.pandas.Series.add()

add() in the PySpark pandas Series is used to add a value to every element in the entire series.

It takes the value as a parameter.

Syntax

pyspark_series.add(value)

Where,

  1. pyspark_series is the PySpark pandas Series.
  2. value is the numeric value to be added to pyspark_series.

Example
In this example, we will add 10 to each element in the series.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# add 10 to each element in the series
print(pyspark_series.add(10))

Output

We can see that 10 is added to each element in the series.
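The value passed to add() does not have to be a scalar. Following the regular pandas API, which pyspark.pandas mirrors, it can also be another Series, in which case the elements are added pairwise by index. Below is a minimal sketch under that assumption; note that combining two separately created series normally requires enabling the compute.ops_on_diff_frames option.

# import pandas from the pyspark module
from pyspark import pandas

# allow combining series that come from different internal frames
pandas.set_option("compute.ops_on_diff_frames", True)

# two series with the same default index (0 to 4)
first_series = pandas.Series([90, 56, 78, 54, 0])
second_series = pandas.Series([10, 20, 30, 40, 50])

# element-wise addition: 90+10, 56+20, 78+30, 54+40, 0+50
print(first_series.add(second_series))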

pyspark.pandas.Series.sub()

sub() in the PySpark pandas Series is used to subtract a value from every element in the entire series.

It takes the value as a parameter.

Syntax

pyspark_series.sub(value)

Where,

  1. pyspark_series is the PySpark pandas Series.
  2. value is the numeric value to be subtracted from pyspark_series.

Example
In this example, we will subtract 10 from each element in the series.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# subtract 10 from each element in the series
print(pyspark_series.sub(10))

Output

We can see that 10 is subtracted from each element in the series.
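sub() subtracts the value from the series. If you need the reverse, i.e. the series subtracted from a value, the pandas API that pyspark.pandas mirrors also provides rsub(), which behaves like the expression value - series. A small sketch, assuming rsub() works as in plain pandas:

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# reversed subtraction: 100 minus each element (100-90, 100-56, ...)
print(pyspark_series.rsub(100))

# the operator form should give the same result
print(100 - pyspark_series)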

pyspark.pandas.Series.mul()

mul() in the PySpark pandas Series is used to multiply every element in the entire series by a value.

It takes the value as a parameter.

Syntax

pyspark_series.mul(value)

Where,

  1. pyspark_series is the PySpark pandas Series.
  2. value is the numeric value to multiply pyspark_series by.

Example
In this example, we will multiply each element in the series by 10.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# multiply each element in the series by 10
print(pyspark_series.mul(10))

Output

We can see that each element in the series is multiplied by 10.
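The multiplier does not have to be a whole number. Multiplying by a float scales the values and, as in plain pandas (which pyspark.pandas mirrors), the result becomes a float series. A small sketch:

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# multiply each element by 0.5 - the result is a float series (45.0, 28.0, 39.0, 27.0, 0.0)
print(pyspark_series.mul(0.5))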

pyspark.pandas.Series.div()

div() in the PySpark pandas Series is used to divide every element in the entire series by a value. It returns the quotient.

It takes the value as a parameter.

Syntax

pyspark_series.div(value)

Where,

  1. pyspark_series is the PySpark pandas Series.
  2. value is the numeric value by which pyspark_series is divided.

Example
In this example, we will divide the series by 10.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# divide each element in the series by 10
print(pyspark_series.div(10))

Output

We can see that each element in the series is divided by 10 and the quotient is returned.
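Note that div() performs true division, so the quotients are floating-point values (9.0, 5.6, and so on). If you want only the whole-number part of the quotient, the pandas API that pyspark.pandas mirrors also provides a floordiv() method. A small sketch comparing the two, under that assumption:

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# true division: 9.0, 5.6, 7.8, 5.4, 0.0
print(pyspark_series.div(10))

# floor division keeps only the whole-number part: 9, 5, 7, 5, 0
print(pyspark_series.floordiv(10))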

pyspark.pandas.Series.mod()

mod() in the PySpark pandas Series is used to divide every element in the entire series by a value. It returns the remainder.

It takes the value as a parameter.

Syntax

pyspark_series.mod(value)

Where,

  1. pyspark_series is the PySpark pandas Series.
  2. value is the numeric value by which pyspark_series is divided; the remainder is returned.

Example
In this example, we will divide the series by 10 and get the remainder.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# get the remainder when each element is divided by 10
print(pyspark_series.mod(10))

Output

We can see that each element in the series is divided by 10 and the remainder is returned.
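To tie div() and mod() together: for positive values, the whole-number quotient times the divisor plus the remainder reconstructs the original element. The sketch below checks this with floordiv(), assuming (as in pandas-on-Spark) that series derived from the same original series can be combined directly.

# import pandas from the pyspark module
from pyspark import pandas

# create a series with 5 elements
pyspark_series = pandas.Series([90, 56, 78, 54, 0])

# whole-number quotient and remainder when dividing by 10
quotient = pyspark_series.floordiv(10)
remainder = pyspark_series.mod(10)

# quotient * 10 + remainder gives back the original values (90, 56, 78, 54, 0)
print(quotient.mul(10).add(remainder))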

Conclusion

In this pyspark pandas tutorial, we discussed the arithmetic operations performed on the PySpark pandas Series. add() adds a value to every element in the series, and sub() subtracts a value from every element. mul() multiplies every element in the series by a value, div() divides every element by a value and returns the quotient, and mod() divides every element by a value and returns the remainder. The difference between mod() and div() is that mod() returns the remainder while div() returns the quotient.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain.