
Pandas Sum Column

This article demonstrates how to sum all or particular columns of a Pandas DataFrame in Python. The examples use the DataFrame.sum() function along with a few of its helpful parameters.

The DataFrame.sum() function in Pandas returns the sum of the values along the specified axis. With the index axis (axis=0), it adds the values in each column and returns a Series holding one total per column. With the columns axis (axis=1), it adds the values across each row instead. By default, missing values are ignored in the calculation.

Syntax

pandas.DataFrame_object.sum(axis = None, skipna = None, level = None, numeric_only = None, min_count = 0, **kwargs)

Parameters

  1. axis: {index (0), columns (1)}. The axis to sum along: 0 adds the values down each column, and 1 adds the values across each row.
  2. skipna: Ignore NA/null values when calculating the result. Missing values are skipped by default.
  3. level: If the specified axis is hierarchical (a MultiIndex), sum along a particular index level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...) instead.
  4. numeric_only: Include only float, int, and Boolean columns. If None, pandas tries to use everything and then falls back to numeric data only. Not implemented for Series.
  5. min_count: The number of valid (non-NA) values required to perform the operation. The result is NA if fewer than min_count non-NA values are present.
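
The effect of skipna and min_count is easiest to see on data that contains missing values. The following is a minimal sketch; the small ‘df’ DataFrame and its ‘a’ and ‘b’ columns are hypothetical and are not part of the ‘analysis’ DataFrame used in the rest of this tutorial.

```python
import numpy
import pandas

# Hypothetical frame with missing values, used only for illustration
df = pandas.DataFrame({"a": [1.0, numpy.nan, 3.0],
                       "b": [numpy.nan, numpy.nan, numpy.nan]})

# Default behaviour: NaN values are skipped, so the all-NaN column sums to 0
default_sum = df.sum()

# skipna=False: a single NaN in a column makes that column's sum NaN
strict_sum = df.sum(skipna=False)

# min_count=1: require at least one non-NA value, otherwise the result is NaN
min_count_sum = df.sum(min_count=1)

print(default_sum)    # a -> 4.0, b -> 0.0
print(strict_sum)     # a -> NaN, b -> NaN
print(min_count_sum)  # a -> 4.0, b -> NaN
```

Setting min_count=1 is a common way to tell a column with no data apart from a column whose values really add up to zero.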

Return

DataFrame (if level specified) or Series.

DataFrame

For all the examples, we will use the following ‘analysis’ DataFrame. It holds 12 rows with 5 columns.

import pandas

# Create the dataframe using lists

analysis = pandas.DataFrame([[23,'sravan',1000,34,56],
                             [23,'sravan',700,11,0],
                             [23,'sravan',20,4,2],
                             [21,'siva',400,32,45],
                             [21,'siva',100,456,78],
                             [23,'sravan',0,90,12],
                             [21,'siva',400,32,45],
                             [20,'sahaja',120,1,67],
                             [23,'sravan',0,90,12],
                             [22,'suryam',450,76,56],
                             [22,'suryam',40,0,1],
                             [22,'suryam',12,45,0]

],columns=['id','name','points3','points1','points2'])

# Display the DataFrame - analysis

print(analysis)

Output

    id    name  points3  points1  points2
0   23  sravan     1000       34       56
1   23  sravan      700       11        0
2   23  sravan       20        4        2
3   21    siva      400       32       45
4   21    siva      100      456       78
5   23  sravan        0       90       12
6   21    siva      400       32       45
7   20  sahaja      120        1       67
8   23  sravan        0       90       12
9   22  suryam      450       76       56
10  22  suryam       40        0        1
11  22  suryam       12       45        0

Here, the ‘id’, ‘points3’, ‘points1’, and ‘points2’ columns are numeric, while ‘name’ holds strings. Load this DataFrame before running any of the examples in this tutorial.

Scenario 1: Sum of All Columns

We can directly apply sum() on the DataFrame to return the sum of values in each column.

pandas.DataFrame_object.sum()

Example

# Return the sum of values in all columns

print(analysis.sum())

Output

id                                                       264
name       sravansravansravansivasivasravansivasahajasrav...
points3                                                 3242
points1                                                  871
points2                                                  374

Explanation

You can see that the sum of values in each numeric column is returned. The ‘name’ column holds strings, so its values are concatenated rather than added.
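
If the concatenated ‘name’ string in the output above is not wanted, the numeric_only parameter can restrict the sum to numeric columns. The sketch below uses a hypothetical three-row ‘sample’ DataFrame (rows 0, 3, and 7 of ‘analysis’) so that it runs on its own.

```python
import pandas

# Hypothetical three-row subset of the 'analysis' DataFrame
sample = pandas.DataFrame([[23, 'sravan', 1000, 34, 56],
                           [21, 'siva', 400, 32, 45],
                           [20, 'sahaja', 120, 1, 67]],
                          columns=['id', 'name', 'points3', 'points1', 'points2'])

# Sum only the numeric columns; the string column 'name' is left out
numeric_totals = sample.sum(numeric_only=True)

print(numeric_totals)
```

The result holds totals for ‘id’, ‘points3’, ‘points1’, and ‘points2’ only.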

Scenario 2: Sum of Particular Column

If you want to return the sum of values in a particular column, select that column from the DataFrame by name and call sum() on it.

pandas.DataFrame_object['column'].sum()

Example

Let’s return the sum of values in the ‘points1’, ‘points2’, and ‘points3’ columns separately.

# Return the sum of values in points1 column
print(analysis['points1'].sum())

# Return the sum of values in points2 column
print(analysis['points2'].sum())

# Return the sum of values in points3 column
print(analysis['points3'].sum())

Output

871
374
3242

Explanation

  1. Sum of values in the points1 column is 871.
  2. Sum of values in the points2 column is 374.
  3. Sum of values in the points3 column is 3242.
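
The three separate calls can also be combined by selecting a list of columns and calling sum() once. This is a minimal sketch on a hypothetical three-row ‘sample’ DataFrame (a subset of ‘analysis’), so the totals differ from the output above.

```python
import pandas

# Hypothetical three-row subset of the 'analysis' DataFrame
sample = pandas.DataFrame([[23, 'sravan', 1000, 34, 56],
                           [21, 'siva', 400, 32, 45],
                           [20, 'sahaja', 120, 1, 67]],
                          columns=['id', 'name', 'points3', 'points1', 'points2'])

# Selecting with a list of names returns a DataFrame; sum() gives one total per column
points_totals = sample[['points1', 'points2', 'points3']].sum()

# Summing the resulting Series gives the grand total over all three columns
grand_total = points_totals.sum()

print(points_totals)
print(grand_total)  # 1755
```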

Scenario 3: Sum Across Rows

If you want to return the sum of values across each row, then you need to specify the axis parameter in the sum() function and set it to 1.

pandas.DataFrame_object[[column/s…]].sum(axis=1)

Example

Let’s return the sum of values of ‘points1’, ‘points2’, and ‘points3’ across all rows and store the result in the ‘SUM’ column.

# Return the sum of values across each row
analysis['SUM']=analysis[['points1','points2','points3']].sum(axis=1)

print(analysis)

Output

    id    name  points3  points1  points2   SUM
0   23  sravan     1000       34       56  1090
1   23  sravan      700       11        0   711
2   23  sravan       20        4        2    26
3   21    siva      400       32       45   477
4   21    siva      100      456       78   634
5   23  sravan        0       90       12   102
6   21    siva      400       32       45   477
7   20  sahaja      120        1       67   188
8   23  sravan        0       90       12   102
9   22  suryam      450       76       56   582
10  22  suryam       40        0        1    41
11  22  suryam       12       45        0    57

Explanation

Now, the new ‘SUM’ column holds the total of the three points columns for each row.

We can also add across rows without using sum(). By using the “+” operator, we can achieve the previous functionality.

Example

  1. Add the values in the points1 and points2 columns and store the result in the ‘2 Added’ column.
  2. Add the values in the points1, points2, and points3 columns and store the result in the ‘3 Added’ column.

import pandas

# Create the dataframe using lists

analysis = pandas.DataFrame([[23,'sravan',1000,34,56],
                             [23,'sravan',700,11,0],
                             [23,'sravan',20,4,2],
                             [21,'siva',400,32,45],
                             [21,'siva',100,456,78],
                             [23,'sravan',0,90,12],
                             [21,'siva',400,32,45],
                             [20,'sahaja',120,1,67],
                             [23,'sravan',0,90,12],
                             [22,'suryam',450,76,56],
                             [22,'suryam',40,0,1],
                             [22,'suryam',12,45,0]


],columns=['id','name','points3','points1','points2'])

# Add values in points1 and points2 columns and store the result in '2 Added' column
analysis['2 Added']=analysis['points1']+analysis['points2']

# Add values in points1, points2, and points3 columns and store the result in '3 Added' column
analysis['3 Added']=analysis['points1']+analysis['points2']+analysis['points3']

print(analysis)

Output

    id    name  points3  points1  points2  2 Added  3 Added
0   23  sravan     1000       34       56       90     1090
1   23  sravan      700       11        0       11      711
2   23  sravan       20        4        2        6       26
3   21    siva      400       32       45       77      477
4   21    siva      100      456       78      534      634
5   23  sravan        0       90       12      102      102
6   21    siva      400       32       45       77      477
7   20  sahaja      120        1       67       68      188
8   23  sravan        0       90       12      102      102
9   22  suryam      450       76       56      132      582
10  22  suryam       40        0        1        1       41
11  22  suryam       12       45        0       45       57
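
The “+” operator and sum(axis=1) give the same result here because the data has no missing values. When a value is missing, they can differ: “+” propagates NaN, while sum(axis=1) skips it by default. The following sketch shows this on a hypothetical two-row ‘scores’ DataFrame that is not part of the tutorial data.

```python
import numpy
import pandas

# Hypothetical frame with one missing value in points1
scores = pandas.DataFrame({'points1': [34, numpy.nan],
                           'points2': [56, 12]})

# "+" returns NaN for any row where one of the operands is NaN
scores['plus'] = scores['points1'] + scores['points2']

# sum(axis=1) skips NaN by default, so the second row still gets a total
scores['row_sum'] = scores[['points1', 'points2']].sum(axis=1)

print(scores)
```

In the second row, ‘plus’ is NaN while ‘row_sum’ is 12.0.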

Scenario 4: sum() With groupby()

If you want to return the sum of values for individual groups, use groupby() together with sum(). The groupby() function groups the rows that share the same value in a particular column, and sum() returns the sum within each group.

pandas.DataFrame_object.groupby('grouping_column').sum()

Example

Let’s group the rows based on the name column and return the sum of values in each group for all columns.

import pandas

# Create the dataframe using lists

analysis = pandas.DataFrame([[23,'sravan',1000,34,56],
                             [23,'sravan',700,11,0],
                             [23,'sravan',20,4,2],
                             [21,'siva',400,32,45],
                             [21,'siva',100,456,78],
                             [23,'sravan',0,90,12],
                             [21,'siva',400,32,45],
                             [20,'sahaja',120,1,67],
                             [23,'sravan',0,90,12],
                             [22,'suryam',450,76,56],
                             [22,'suryam',40,0,1],
                             [22,'suryam',12,45,0]

],columns=['id','name','points3','points1','points2'])

# group the rows based on name column and return sum of values in each group for all columns
print(analysis.groupby('name').sum())

Output

        id  points3  points1  points2
name                                  
sahaja   20      120        1       67
siva     63      900      520      168
sravan  115     1720      229       82
suryam   66      502      121       57

Explanation

So there are 4 groups in the ‘name’ column. For each group, the sum of id, points3, points1, and points2 is returned.
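
Summing ‘id’ within a group is rarely meaningful, so it can help to select only the columns of interest after groupby(). This is a minimal sketch on a hypothetical four-row ‘sample’ DataFrame (the first two ‘sravan’ rows and the first two ‘siva’ rows of ‘analysis’).

```python
import pandas

# Hypothetical four-row subset of the 'analysis' DataFrame
sample = pandas.DataFrame([[23, 'sravan', 1000, 34, 56],
                           [23, 'sravan', 700, 11, 0],
                           [21, 'siva', 400, 32, 45],
                           [21, 'siva', 100, 456, 78]],
                          columns=['id', 'name', 'points3', 'points1', 'points2'])

# Group by name, keep only the two points columns, then sum within each group
group_totals = sample.groupby('name')[['points1', 'points2']].sum()

print(group_totals)
```

The result has one row per name, with totals for ‘points1’ and ‘points2’ only.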

Conclusion

This tutorial showed how to compute sums in a DataFrame using the Pandas sum() method. The examples covered column-wise and row-wise addition of values. Additionally, you learned how to add columns with the “+” operator and how to sum the values after grouping the rows by a column of the DataFrame. You should now be able to sum the columns of a DataFrame together or sum the values within a single DataFrame column.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP MySQL; published 500+ articles in the computer science domain