Cumulative Product Pandas

Python is a user-friendly, high-level programming language that is widely used for data analysis, largely because of its ecosystem of data-centric packages such as Pandas. Pandas makes it simple to import and analyze data.

One of the Pandas functions is Series.cumprod(). This method calculates the cumulative product of a series. In this article, we will explain how to calculate the cumulative product using the Pandas library in Python.

What Is a Cumulative Product?

A cumulative product is the running product of the elements in a sequence. The sequence is traversed from the first element to the last, and each resulting value is the product of the current element and all the elements before it. For example, for a sequence of 3 elements [x, y, z], the cumulative product is [x, xy, xyz].

Pandas provides the cumprod() function for this purpose, and it is available on both Series and DataFrame objects. When cumprod() is called on a DataFrame or Series, it calculates the cumulative product and returns a DataFrame or Series of the same size containing the result.
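To make the definition concrete, here is a minimal sketch in plain Python, without Pandas. It uses itertools.accumulate from the standard library; the variable names and sample values are only illustrative and are not part of the original article.

```python
from itertools import accumulate
from operator import mul

# Illustrative values standing in for [x, y, z]
values = [2, 3, 5]

# accumulate() with mul keeps a running product: [x, x*y, x*y*z]
running_product = list(accumulate(values, mul))

print(running_product)  # [2, 6, 30]
```

Each position holds the product of the current element and everything before it, which is the same result cumprod() produces for a series.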

What Is the Syntax of Series.cumprod() Method?

Here is the syntax of the Series.cumprod() method:

Series.cumprod(axis=None, skipna=True)

Series.cumprod() takes two main parameters: axis and skipna. The axis value is either 0 (or "index") or 1 (or "columns"). On a DataFrame, 0 computes the product down the rows of each column, while 1 computes it across the columns of each row. A Series has only one axis, so only 0 applies to it. The skipna parameter is a Boolean value (True or False) that controls whether NA values are skipped, and it defaults to True. Series.cumprod() returns a series of the same size as the input.
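The following sketch shows the axis parameter in action. Because a Series has a single axis, the difference between 0 and 1 is only visible on a DataFrame, so a small DataFrame is used here. The column names "a" and "b" and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical two-column DataFrame used only for this illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# axis=0: multiply down the rows, separately for each column
down_rows = df.cumprod(axis=0)

# axis=1: multiply across the columns, separately for each row
across_cols = df.cumprod(axis=1)

# A Series has a single axis, so axis=0 is the only meaningful choice
s = pd.Series([1, 2, 3])
series_result = s.cumprod(axis=0, skipna=True)

print(down_rows)
print(across_cols)
print(series_result)
```

With axis=0, column "b" becomes 4, 20, 120. With axis=1, column "b" becomes 4, 10, 18, because each row's "a" value is multiplied into it.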

Now, let us proceed with the examples to see how we can implement the cumprod() function in Python.

Example 1

In this example, we will create a small series of numbers that also contains an NA value. The NA value is kept in the series to see how cumprod() reacts to it. Moreover, no value is provided for the skipna parameter, so cumprod() uses the default skipna value, True. See the code below.

As previously discussed, the cumulative product is the product of the current value and all the previous values in the array. The first item in the original array is always equal to the first cumulative product. The second value is the product of the first and second values, 2 * 3 = 6, and the third value is the product of the first three values, 2 * 3 * 5 = 30.

The fourth value is NaN. Because skipna is True, cumprod() skips the NA value: the result at that position is NaN, but the running product is not affected by it. The calculation then continues with the remaining values, so the fifth result is 30 * 7 = 210.

import pandas as pd

import numpy as np

num = [2, 3, 5, np.nan, 7, 9, 1, 0]

s = pd.Series(num)

cumprod = s.cumprod()

print(cumprod)

See the following output to know the cumulative product of each value in the array:
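As a quick check of the values described for this example, the sketch below recomputes the series and compares it with the result worked out by hand. The expected values are derived from the arithmetic (2, 2 * 3, 2 * 3 * 5, and so on), not copied from a captured run, and the names result and expected are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, 5, np.nan, 7, 9, 1, 0])
result = s.cumprod()  # skipna is left at its default, True

# The NaN position stays NaN, but the running product carries on after it
expected = pd.Series([2.0, 6.0, 30.0, np.nan, 210.0, 1890.0, 1890.0, 0.0])

# Series.equals() treats NaN values in the same position as equal
print(result.equals(expected))
```

Note that the final value is 0 because the last element of the series is 0, and any product that includes 0 is 0.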

Example 2

In the previous example, we did not provide a value for skipna, so it kept its default of True. Now, we will pass False for skipna so that cumprod() does not skip NA, and see what happens in that case.

By passing False for skipna, we force cumprod() to include the NA value in the calculation wherever it occurs. See the following code to learn how to pass False for the skipna parameter:

import pandas as pd

import numpy as np

num = [2, 3, 5, np.nan, 7, 9, 1, 0]

s = pd.Series(num)

cumprod = s.cumprod(skipna = False)

print(cumprod)

Here is the output of the previous code:

Note that the first four values are the same as in the previous example. However, the fifth value becomes NA because we passed skipna=False, which means that the NA is not ignored but is included in the product when it occurs in the list. Since any product involving NA is NA, all the remaining values become NA as well.
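To see the effect of skipna side by side, the sketch below computes both variants on the same series and places them in one DataFrame. The column labels and variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, 5, np.nan, 7, 9, 1, 0])

skipped = s.cumprod()                  # default: skipna=True
not_skipped = s.cumprod(skipna=False)  # NaN propagates from index 3 onward

# Put both results next to each other for comparison
comparison = pd.DataFrame({"skipna=True": skipped, "skipna=False": not_skipped})
print(comparison)
```

The two columns agree up to and including the NaN at index 3. After that, the skipna=True column keeps multiplying, while the skipna=False column stays NaN.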

Example 3

We have seen the cumulative product of a simple series in the previous examples. Let’s look at how the result for a two-dimensional array depends on the axis. This example uses NumPy's np.cumprod() function on an array with two rows of four elements each and, as a starting point, does not pass an axis. Here is the code for that:

import numpy as np

arr = np.array([[1, 3, 5, 7], [2, 4, 6, 8]])

print("The input array is = ",arr)

res = np.cumprod(arr)

print("The cumulative product of the input array is = ",res)

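Note that, because no axis is given, np.cumprod() flattens the array before multiplying. The result is therefore a one-dimensional array whose length is the total number of elements, 4 + 4 = 8. See the output below: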
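The sketch below confirms the shape and values of the result for this input. The values were worked out by multiplying through the flattened array [1, 3, 5, 7, 2, 4, 6, 8], and the name flat_result is illustrative.

```python
import numpy as np

arr = np.array([[1, 3, 5, 7], [2, 4, 6, 8]])

# With no axis argument, np.cumprod() works on the flattened array
flat_result = np.cumprod(arr)

print(flat_result.shape)     # (8,)
print(flat_result.tolist())  # [1, 3, 15, 105, 210, 840, 5040, 40320]
```

The running product does not restart at the second row: the fifth value is 105 * 2 = 210, continuing from the end of the first row.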

Example 4

Now that we know the cumulative product of an array can depend on the axis, we can decide whether to compute it over the whole flattened array or along one axis at a time. See the code below to know how we can achieve this.

As you can observe, we have only provided the additional axis parameter to the cumprod() function. The value of the axis parameter is 1, which means that the cumulative product is calculated along axis 1, that is, across each row. In simple words, cumprod() takes the first row, calculates its cumulative product, and returns the result. After that, it takes the second row, starts a new cumulative product, and returns the result for the second row.

import numpy as np

arr = np.array([[1, 3, 5, 7], [2, 4, 6, 8]])

print("The input array is = ",arr)

res = np.cumprod(arr, axis = 1)

print("The cumulative product of the input array is = ",res)

Here is the output:
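For comparison, the sketch below computes the cumulative product along both axes of the same array. The axis=0 case is not part of the original example and is added only to show the contrast; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 3, 5, 7], [2, 4, 6, 8]])

# axis=1: running product across each row, restarting for every row
along_rows = np.cumprod(arr, axis=1)

# axis=0: running product down each column
down_cols = np.cumprod(arr, axis=0)

print(along_rows.tolist())  # [[1, 3, 15, 105], [2, 8, 48, 384]]
print(down_cols.tolist())   # [[1, 3, 5, 7], [2, 12, 30, 56]]
```

In both cases the result keeps the 2 x 4 shape of the input, unlike the flattened result obtained when no axis is passed.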

Conclusion

We covered the fundamental concept of calculating the cumulative product in this article. We also showed how to calculate the cumulative product using Pandas in Python. Pandas provides the cumprod() function to calculate the cumulative product of a series, and NumPy's np.cumprod() does the same for arrays, either flattened or along a chosen axis.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write about Linux and other technology-related topics.