
Geometric Mean Pandas

Python is a popular language for data analysis, partly because its syntax is approachable for beginners and partly because its ecosystem includes libraries for mathematical and statistical computation, such as NumPy, SciPy, and pandas.

The geometric mean is one of the statistics you may need to calculate for a set of numbers, a list, or a DataFrame column. pandas does not ship a dedicated geometric mean function, so in practice it is computed with plain Python, with SciPy’s gmean() function, or with NumPy. This article demonstrates how to find the geometric mean of data held in Python lists and pandas DataFrames.

What Is the Geometric Mean?

The geometric mean is a type of average that is computed by multiplying the numbers together rather than adding them. It is commonly used for quantities that combine multiplicatively, such as growth rates; the compound annual growth rate is a well-known application. To calculate the geometric mean, multiply all the numbers in the set together and take the nth root of the product, where n is the number of observations in the set. The values should be positive for the result to be meaningful.
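Written as a formula, the definition looks as follows. The symbols x_1, ..., x_n for the observations and GM for the result are notation introduced here for illustration; they do not come from any particular library.

```latex
\[
\mathrm{GM}(x_1, \dots, x_n) = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
\]
```

For example, the geometric mean of 2 and 8 is the square root of 2 * 8 = 16, which is 4, whereas their arithmetic mean is 5.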

How to Find Geometric Mean using Pandas in Python?

There are several ways to calculate the geometric mean in Python. Here we discuss four simple ones.

Method 1: Manual Calculation of Geometric Mean

The first method is simple but tedious. It mirrors what you would do on a calculator: take the product of all the numbers and then take the nth root of the product. Here is an example of the manual method.

Example 1

In this example, we provide 5 numbers and take their product with * (the multiplication sign). We then raise the product to the power 1/5, which is the same as taking its fifth root, because 5 is the number of observations. Now let’s see the code:

# Product of the five observations
numbers = 10 * 20 * 1 * 5 * 6
n = 5
# The geometric mean is the nth root of the product
gm = numbers ** (1 / n)
print('The manually calculated Geometric Mean is: ' + str(gm))

Note that the product of 10 * 20 * 1 * 5 * 6 is 6000, and the fifth root of 6000 is approximately 5.70, which is the value the code prints.
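The same manual calculation can be written without typing out the multiplication by hand. The following is a minimal sketch, assuming Python 3.8 or later, where the standard library provides math.prod(); the variable names are illustrative only.

```python
import math

values = [10, 20, 1, 5, 6]

# math.prod (Python 3.8+) multiplies all items of an iterable
product = math.prod(values)

# nth root of the product, where n is the number of values
gm = product ** (1 / len(values))

print(f"Product: {product}")
print(f"Geometric mean: {gm:.4f}")
```

Because the number of observations is taken from len(values), the list can be changed without having to update n separately.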

Method 2: Using a Loop to Calculate the Geometric Mean

An alternative to writing out the product by hand is to put all the numbers in a list and use a loop to calculate the product. See the example below.

Example 2

In this example, we put all the numbers in a list, use a ‘for’ loop to calculate their product, and then apply the geometric mean formula. See the code below.

numbers = [10, 20, 1, 5, 6]
n = len(numbers)
# Multiply the values together one at a time
product = 1
for value in numbers:
    product *= value
# Take the nth root of the product
gm = product ** (1 / n)
print('The manually calculated Geometric Mean is: ' + str(gm))

The loop produces the same result as the previous example, approximately 5.70. Let’s move on to the third method.
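The loop can also be wrapped in a reusable function. The sketch below is one possible version, not a definitive implementation: the function name geometric_mean and its input checks are assumptions added for illustration. It cross-checks the result against statistics.geometric_mean(), which is part of the standard library from Python 3.8 onward.

```python
import statistics

def geometric_mean(values):
    """Return the geometric mean of a non-empty sequence of positive numbers."""
    if len(values) == 0:
        raise ValueError("at least one value is required")
    product = 1
    for value in values:
        # The nth root of a zero or negative product is not a meaningful average
        if value <= 0:
            raise ValueError("all values must be positive")
        product *= value
    return product ** (1 / len(values))

numbers = [10, 20, 1, 5, 6]
loop_result = geometric_mean(numbers)

# Cross-check against the standard library (Python 3.8+)
stdlib_result = statistics.geometric_mean(numbers)

print(loop_result, stdlib_result)
```

The two values should agree up to floating-point rounding.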

Method 3: Use SciPy and Pandas to Calculate the Geometric Mean

The pandas library is well suited to holding and manipulating data for statistical and mathematical work, but the geometric mean function itself comes from SciPy: the scipy.stats module provides a gmean() function that finds the geometric mean of a set of numbers. In the examples below, we use gmean() first on a plain list and then on a pandas DataFrame column.

Example 3

This example is very simple: we import the ‘stats’ module of SciPy and call the gmean() function on a set of numbers. See the code below:

from scipy import stats

# gmean() computes the geometric mean in a single call
gm = stats.gmean([10, 20, 1, 5, 6])
print('The Geometric Mean calculated with gmean() is: ' + str(gm))

Because we used the same set of numbers, the output is the same as in the previous examples, approximately 5.70. A single gmean() call replaces the several lines of code needed for the manual calculation.

Now let’s create a DataFrame and see how gmean() behaves with it. First we create the DataFrame, then we pass one of its columns to gmean() to calculate the geometric mean of that column. See the code below:

from pandas import DataFrame
from scipy.stats import gmean

# Build a one-column DataFrame from a dictionary
data = {'numbers': [10, 20, 1, 5, 6]}
df = DataFrame(data)
# Pass the 'numbers' column to gmean()
gm = gmean(df.loc[:, 'numbers'])
print('The Geometric Mean of the DataFrame column is: ' + str(gm))

As before, the same result is produced. Now, let us move to the fourth and last method.
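When a DataFrame has several numeric columns, the geometric mean of each column can be computed in one expression. The following is a sketch under stated assumptions: the second column, ‘other’, is hypothetical data added for illustration, and the calculation takes the exponential of the mean of the logs with NumPy rather than calling gmean().

```python
import numpy as np
import pandas as pd

# Hypothetical two-column DataFrame; 'numbers' reuses the article's data
df = pd.DataFrame({
    'numbers': [10, 20, 1, 5, 6],
    'other': [2, 8, 4, 4, 4],
})

# exp(mean(log(x))) computed column by column;
# the result is a Series indexed by column name
column_gm = np.exp(np.log(df).mean())

print(column_gm)
```

The product of the ‘other’ column is 1024, whose fifth root is 4, so that column gives an easy value to check by hand.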

Method 4: Use Numpy to Calculate the Geometric Mean

This method calculates the geometric mean with the NumPy library. NumPy does not have a dedicated geometric mean function, so we combine its log(), mean(), and exp() functions instead. See the example below.

Example 4

In this example, we define a custom function that calculates the geometric mean using NumPy’s log() and exp() functions together with the array mean() method. The custom function and gmean() compute the same quantity, so they should return the same result.

The function first takes the natural log of each number, then takes the ordinary arithmetic mean of those logs, and finally applies exp() to convert the mean of the logs back into the geometric mean. This works because the log of a product is the sum of the logs. See the code below.

import numpy as np

def g_mean(x):
    # Mean of the logs, then exp() to undo the log
    a = np.log(x)
    return np.exp(a.mean())

gm = g_mean([10, 20, 1, 5, 6])
print('The Geometric Mean calculated with NumPy is: ' + str(gm))

Because we provided the same data as input, the output is again approximately 5.70.
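One practical reason to prefer the log-based approach is that a running product of floating-point values can overflow. The sketch below illustrates this with hypothetical large values chosen for the purpose; it uses the standard math module in place of NumPy so that it runs without extra dependencies, but the idea is the same as in g_mean().

```python
import math

# Hypothetical large values chosen so that their product overflows a float
values = [1e200, 1e200, 1e200]

# Product-then-root: the running product becomes inf
product = 1.0
for value in values:
    product *= value
naive_gm = product ** (1 / len(values))

# Mean of logs, then exp: stays within floating-point range
log_gm = math.exp(sum(math.log(v) for v in values) / len(values))

print(naive_gm)
print(log_gm)
```

The first calculation yields infinity, while the second stays close to 1e200, the value expected when all three inputs are equal.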

Conclusion

In this article, we have learned how to calculate the geometric mean in Python. We demonstrated four methods: the first is a manual calculation, the second uses a ‘for’ loop, the third uses SciPy’s gmean() function on a list and on a pandas DataFrame column, and the last uses a custom function built from NumPy’s log() and exp() functions.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write about Linux and other technology-related topics.