
Cumulative Percentage Pandas

“Cumulative percentages” show what share of a total has been reached at or below a given point in an ordered set of records. For instance, if a group of students took a test, the cumulative percentage for a given score is the percentage of students who got that score or lower. The same idea applies to sums: the cumulative percentage of a column of sales is the running total of the sales expressed as a percentage of the overall total. If you work with data or statistics, you will often need to compute cumulative percentages.

In Python, you can easily find cumulative percentages using the “pandas” library. In this blog, we will show you how to get cumulative percentages in Python utilizing “pandas”.
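As a minimal sketch of the student example above, assuming a small made-up list of test scores (the values and variable names are illustrative only), the percentage of students at or below each score could be computed like this:

```python
import pandas

# Hypothetical test scores for five students (illustrative values only)
scores = pandas.Series([55, 70, 70, 85, 90])

# Count how many students got each score, ordered from lowest to highest score
counts = scores.value_counts().sort_index()

# Running count of students at or below each score, as a percentage of all students
cumulative_percentage = 100 * counts.cumsum() / len(scores)

print(cumulative_percentage)
```

Here the entry for a score of 70 is 60, meaning three of the five students scored 70 or lower.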

How to Calculate/Find Cumulative Percentages Using Pandas in Python?

Follow the steps below to find cumulative percentages using “pandas” in Python:

Step 1: Importing Required Libraries

The first step is to import the required library named “pandas” (used for data manipulation and analysis):

import pandas

Step 2: Creating DataFrame

Now, let’s create a data frame to find cumulative percentages based on it:

import pandas

df = pandas.DataFrame({'year': [11, 22, 33, 44, 55, 66],
                       'sale': [100, 175, 737, 847, 114, 234]})

print(df)

In the above code block, the “pandas.DataFrame()” constructor creates a data frame with two columns, “year” and “sale”, holding the stated values.

Output

The output shows a six-row data frame with the “year” and “sale” columns, confirming that it has been created as intended.

Step 3: Calculating Cumulative Percentages

To calculate cumulative percentages, we first sort the data in ascending order of “sale” so that the values accumulate from smallest to largest, and then calculate the running total of the values with the pandas “cumsum()” method. Dividing each running total by the column total from “sum()” and multiplying by 100 gives the cumulative percentage.
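Note that “cumsum()” accumulates in whatever order the rows are currently in, so the sort step changes the intermediate values but not the final one. A small sketch, reusing this article's data and assuming we also want the running share in the original “year” order (the names “by_year” and “by_sale” are illustrative):

```python
import pandas

df = pandas.DataFrame({'year': [11, 22, 33, 44, 55, 66],
                       'sale': [100, 175, 737, 847, 114, 234]})

# Accumulate in the existing (year) order, without sorting by 'sale'
by_year = 100 * df['sale'].cumsum() / df['sale'].sum()

# Accumulate after sorting by 'sale', from smallest to largest
by_sale = 100 * df.sort_values('sale')['sale'].cumsum() / df['sale'].sum()

print(by_year.round(2).tolist())
print(by_sale.round(2).tolist())
```

Both lists end at 100, but the values along the way differ, so choose the ordering that matches the question being asked.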

Syntax

The syntax of the “cumsum()” function in Python pandas is shown below:

dataframe.cumsum(axis=None, skipna=True, *args, **kwargs)
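To illustrate the “axis” and “skipna” parameters, here is a minimal sketch on a made-up data frame (the column names “a” and “b” and all values are illustrative). It relies on the documented pandas behavior that, with “skipna=True”, a missing value stays missing in the result while the running total continues past it:

```python
import pandas

# Hypothetical data with one missing value in column 'a'
df = pandas.DataFrame({'a': [1.0, float('nan'), 3.0],
                       'b': [10.0, 20.0, 30.0]})

# Default: accumulate down each column, skipping the missing value
down_skip = df.cumsum()

# skipna=False: once a missing value is hit, the rest of that column is missing too
down_noskip = df.cumsum(skipna=False)

# axis=1: accumulate across the columns of each row instead
across = df.cumsum(axis=1)

print(down_skip)
print(down_noskip)
print(across)
```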

Here is the example code:

import pandas

df = pandas.DataFrame({'year': [11, 22, 33, 44, 55, 66],
                       'sale': [100, 175, 737, 847, 114, 234]})

# Sort by 'sale' so the values accumulate from smallest to largest
df_sorted = df.sort_values('sale')

# Running total of the sorted 'sale' values
df_sorted['cumulative_sum'] = df_sorted['sale'].cumsum()
print('Cumulative Sum:')
print(df_sorted)

# Overall total of the 'sale' column
total_sum = df_sorted['sale'].sum()
print('\nTotal Sum: ')
print(total_sum)

# Running total expressed as a percentage of the overall total
df_sorted['cumulative_percentage'] = 100 * df_sorted['cumulative_sum'] / total_sum
print('\nCumulative Percentage')
print(df_sorted)

In the above code:

  • The “pandas” library is loaded at the start.
  • The “pandas.DataFrame()” function creates a data frame with two columns: “year” and “sale”.
  • The “df.sort_values()” function sorts the data frame by the values in the “sale” column, from lowest to highest.
  • The “cumsum()” method calculates the cumulative sum of the values in the “sale” column and stores it in a new column of the sorted data frame named “cumulative_sum”.
  • Each entry in “cumulative_sum” is the sum of that row's “sale” value and all the “sale” values above it.
  • The “sum()” method calculates the total of the values in the “sale” column.
  • Dividing each value in the “cumulative_sum” column by the total and multiplying by 100 gives the “cumulative_percentage” column.

Output

The output shows the cumulative sum, the total sum (2207), and the cumulative percentage, in that order. The cumulative percentage in the last row is 100.
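The steps above can also be wrapped in a small reusable helper. This is only a sketch: the function name “add_cumulative_percentage” and its parameters are hypothetical and not part of pandas.

```python
import pandas


def add_cumulative_percentage(frame, column):
    """Return a sorted copy of frame with cumulative sum and percentage columns."""
    # sort_values returns a new data frame, so the caller's frame is not modified
    result = frame.sort_values(column).reset_index(drop=True)
    result['cumulative_sum'] = result[column].cumsum()
    result['cumulative_percentage'] = 100 * result['cumulative_sum'] / result[column].sum()
    return result


df = pandas.DataFrame({'year': [11, 22, 33, 44, 55, 66],
                       'sale': [100, 175, 737, 847, 114, 234]})

result = add_cumulative_percentage(df, 'sale')
print(result.round(2))
```

The call to “reset_index(drop=True)” renumbers the rows after sorting. That is a design choice; leave it out if the original row labels should be kept.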

Conclusion

The pandas “cumsum()” and “sum()” methods are used together to find cumulative percentages. The “cumsum()” method returns the running total of a column's values and the “sum()” method returns the column's overall total. Dividing the running total by the overall total and multiplying by 100 gives the cumulative percentage. This article demonstrated how to find the cumulative percentage using pandas.

About the author

Talha Saif Malik

Talha is a contributor at Linux Hint with a vision to bring value and do useful things for the world. He loves to read, write and speak about Linux, Data, Computers and Technology.