Arithmetic operations include addition, subtraction, multiplication, division, and modulus. The pyspark pandas Series supports built-in functions that perform these operations on every element. We will demonstrate these operations in this LinuxHint tutorial.
Before that, you have to install the pyspark module and import it, like below:
Command
pip install pyspark
Syntax to import
from pyspark import pandas
After that, we can create or use the series from the pandas module.
Syntax to create pandas Series
We can pass a list or list of lists with values.
Let’s create a pandas Series through pyspark that has five numeric values.
from pyspark import pandas
#create series with 5 elements
pyspark_series=pandas.Series([90,56,78,54,0])
print(pyspark_series)
Output
Now, we will go into our tutorial.
We will use add(), sub(), mul(), div(), and mod() in this tutorial on arithmetic operations in PySpark. Let’s see them one by one.
pyspark.pandas.Series.add()
add() in the pyspark pandas series is used to add a value to every element in the series.
It takes the value as a parameter.
Syntax
pyspark_series.add(value)
Where,
- pyspark_series is the pyspark pandas series
- value is the numeric value to be added to each element of the pyspark_series.
Example
In this example, we will add 10 to the series.
from pyspark import pandas
#create series with 5 elements
pyspark_series=pandas.Series([90,56,78,54,0])
#add 10 to each element in series
print(pyspark_series.add(10))
Output
We can see that 10 is added to each element in the series.
pyspark.pandas.Series.sub()
sub() in the pyspark pandas series is used to subtract a value from every element in the series.
It takes the value as a parameter.
Syntax
pyspark_series.sub(value)
Where,
- pyspark_series is the pyspark pandas series
- value is the numeric value to be subtracted from each element of the pyspark_series.
Example
In this example, we will subtract 10 from the series.
from pyspark import pandas
#create series with 5 elements
pyspark_series=pandas.Series([90,56,78,54,0])
#subtract 10 from each element in series
print(pyspark_series.sub(10))
Output
We can see that 10 is subtracted from each element in the series.
pyspark.pandas.Series.mul()
mul() in the pyspark pandas series is used to multiply every element in the series by a value.
It takes the value as a parameter.
Syntax
pyspark_series.mul(value)
Where,
- pyspark_series is the pyspark pandas series
- value is the numeric value by which each element of the pyspark_series is multiplied.
Example
In this example, we will multiply the series by 10.
from pyspark import pandas
#create series with 5 elements
pyspark_series=pandas.Series([90,56,78,54,0])
#multiply each element in series by 10
print(pyspark_series.mul(10))
Output
We can see that each element in the series is multiplied by 10.
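As a quick cross-check of the add(), sub(), and mul() examples above, the following plain-Python sketch reproduces the same element-wise arithmetic on the tutorial's values without needing a Spark session. It only illustrates what each method computes: the list comprehensions are a stand-in, not how pyspark performs the operation, and the variable names are made up for this sketch.

```python
# Plain-Python stand-in for the element-wise arithmetic.
# Assumption: this mimics the results only; pyspark itself is not used here.
values = [90, 56, 78, 54, 0]   # same data as pyspark_series
scalar = 10                    # the value passed to add(), sub(), and mul()

added = [v + scalar for v in values]        # mirrors pyspark_series.add(10)
subtracted = [v - scalar for v in values]   # mirrors pyspark_series.sub(10)
multiplied = [v * scalar for v in values]   # mirrors pyspark_series.mul(10)

print(added)       # [100, 66, 88, 64, 10]
print(subtracted)  # [80, 46, 68, 44, -10]
print(multiplied)  # [900, 560, 780, 540, 0]
```

Note that the last element becomes negative after the subtraction, since 0 - 10 is -10.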
pyspark.pandas.Series.div()
div() in the pyspark pandas series is used to divide the elements in the entire series by a value. It returns a quotient.
It takes the value as a parameter.
Syntax
pyspark_series.div(value)
Where,
- pyspark_series is the pyspark pandas series
- value is the numeric value by which each element of the pyspark_series is divided.
Example
In this example, we will divide the series by 10.
from pyspark import pandas
#create series with 5 elements
pyspark_series=pandas.Series([90,56,78,54,0])
#divide series by 10
print(pyspark_series.div(10))
Output
We can see that each element in the series is divided by 10 and the quotient is returned.
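The quotient here is the result of true division, so you should expect floating-point values even though the series holds integers. The plain-Python sketch below shows the values to expect for this example; it is a stand-in for the arithmetic under that assumption, not pyspark code, and the variable names are hypothetical.

```python
# Plain-Python stand-in for pyspark_series.div(10).
# Assumption: div() behaves like true division (the / operator).
values = [90, 56, 78, 54, 0]   # same data as pyspark_series
divisor = 10

quotients = [v / divisor for v in values]

print(quotients)  # [9.0, 5.6, 7.8, 5.4, 0.0]
```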
pyspark.pandas.Series.mod()
mod() in the pyspark pandas series is used to divide the elements in the entire series by a value. It returns the remainder.
It takes the value as a parameter.
Syntax
pyspark_series.mod(value)
Where,
- pyspark_series is the pyspark pandas series
- value is the numeric value by which each element of the pyspark_series is divided to get the remainder.
Example
In this example, we will divide the series by 10 and return the remainder.
from pyspark import pandas
#create series with 5 elements
pyspark_series=pandas.Series([90,56,78,54,0])
#get the remainder after dividing series by 10
print(pyspark_series.mod(10))
Output
We can see that each element in the series is divided by 10 and the remainder is returned.
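To make the remainder concrete, the plain-Python sketch below computes the same values with the % operator and checks that the whole-number quotient and the remainder add back up to the original element. It is only an illustration of the arithmetic, not pyspark code, and the variable names are hypothetical.

```python
# Plain-Python stand-in for pyspark_series.mod(10).
values = [90, 56, 78, 54, 0]   # same data as pyspark_series
divisor = 10

remainders = [v % divisor for v in values]
print(remainders)  # [0, 6, 8, 4, 0]

# Each element equals (whole-number quotient * divisor) + remainder.
for v in values:
    whole, rest = divmod(v, divisor)
    assert whole * divisor + rest == v
```

For example, 56 splits into 5 whole tens with 6 left over, which is why the second remainder is 6.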
Conclusion
In this pyspark pandas tutorial, we discussed arithmetic operations performed on the pyspark pandas series. add() is used to add a value to every element in the series, and sub() is used to subtract a value from every element. mul() is used to multiply every element in the series by a value. div() is used to divide every element by a value and return the quotient, and mod() is used to divide every element by a value and return the remainder. The difference between mod() and div() is that mod() returns the remainder but div() returns the quotient.