
Pandas Group by Quantile

Python is a popular high-level programming language whose libraries are simple to learn and use, which makes it comfortable for beginners and experienced developers alike. One of those libraries, Pandas, provides a quantile() function that can be used to calculate quantiles by group.

There are several ways to find a quantile in Python. However, Pandas makes it simple to find the quantile of each group in just a few lines of code using the groupby.quantile() function. In this article, we will explore how to find the quantile by group in Python.

What Is a Quantile Group?

The basic idea of quantile groups is to sort the subjects and divide them into ordered groups of equal size. In other words, the subjects are distributed so that each group contains the same number of subjects. This concept is also called fractiles, and the groups are commonly known as S-tiles.

What Is the Quantile Group in Python?

A quantile marks a specific position in a dataset. It defines how many values lie below and above a certain limit in a distribution. Quantiles in Python follow the general concept of quantile groups. The quantile() function takes an array as input along with a number, say q, between 0 and 1, and returns the value below which that fraction of the data falls. Some quantiles have special names: quartiles divide the data into four parts, quintiles into five parts, and percentiles into one hundred parts.
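As a minimal sketch of this idea, the snippet below calls Series.quantile() on a small made-up series. The values and variable names are illustrative assumptions and do not come from the examples in this article.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
values = pd.Series([10, 20, 30, 40, 50])

first_quartile = values.quantile(0.25)  # a quarter of the way through the sorted values
median = values.quantile(0.5)           # the middle value
third_quartile = values.quantile(0.75)  # three quarters of the way through

print(first_quartile, median, third_quartile)
```

With five evenly spaced values, each requested quantile lands exactly on one of the data points.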

For example, let’s say we have divided a sorted dataset into four groups of equal size. Each group now has the same number of elements or subjects. The first two groups hold the lower 50% of the values in the distribution, and the last two groups hold the upper 50%.
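The four-group split described above can be sketched with pd.qcut(), which assigns each value to a quantile-based bin. The data and the labels Q1 to Q4 are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical, unsorted sample values
values = pd.Series([7, 1, 5, 3, 8, 2, 6, 4])

# qcut sorts the values into four ordered bins with equal counts
quartile_group = pd.qcut(values, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Count how many values landed in each bin
group_sizes = quartile_group.value_counts().sort_index()
print(group_sizes)
```

Here Q1 and Q2 together hold the four smallest values, and Q3 and Q4 hold the four largest.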

What Is the Function of Groupby.quantile() in Python?

Pandas provides the groupby.quantile() function to calculate the quantile by group. It is commonly used for analyzing data. It first splits the rows of a DataFrame into groups based on the values of a specific column, so that rows with the same value end up in the same group. After that, it computes the requested quantile for every group. Along with groupby.quantile(), Pandas also provides other aggregate functions like mean, median, sum, max, min, etc.

However, this article will only discuss the quantile() function and provide relevant examples to show how to use it in code. Let us proceed with the examples to understand the usage of quantiles.
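To make the split-then-aggregate behavior concrete, here is a small sketch on a made-up DataFrame. The column names team and points are hypothetical. It computes several aggregates per group and shows that the 0.5 quantile matches the median.

```python
import pandas as pd

# Hypothetical data: two teams with three scores each
scores = pd.DataFrame({
    'team': ['x', 'x', 'x', 'y', 'y', 'y'],
    'points': [1, 2, 3, 10, 20, 30],
})

# Rows are split by the value in 'team', then each group is aggregated
summary = scores.groupby('team')['points'].agg(['mean', 'median', 'sum', 'max', 'min'])

# The 0.5 quantile of each group is the same as its median
median_via_quantile = scores.groupby('team')['points'].quantile(0.5)

print(summary)
print(median_via_quantile)
```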

Example 1

In the first example, we import Pandas with the “import pandas as pd” statement and then create the DataFrame whose quantiles we are going to find. The DataFrame consists of two columns: ‘Name’ holds the names of 3 players, and ‘Goals’ holds the number of goals each player has scored in different games.

import pandas as pd

# Five games for each of the three players
Hockey = {'Name': ['Adam', 'Adam', 'Adam', 'Adam', 'Adam',
                   'Biden', 'Biden', 'Biden', 'Biden', 'Biden',
                   'Cimon', 'Cimon', 'Cimon', 'Cimon', 'Cimon'],
          'Goals': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
df = pd.DataFrame(Hockey)

# 0.25 quantile (first quartile) of 'Goals' for each player
print(df.groupby('Name').quantile(0.25))

The quantile() function returns the value at whatever quantile you pass to it, given as a fraction between 0 and 1.

To help you understand, we will pass three numbers, 0.25, 0.5, and 0.75, to find the first quartile, the median, and the third quartile of each group. First, we passed 0.25 to see the 25th percentile. Now, we will pass 0.5 to see the 50th percentile of each group. Here is the complete code:

import pandas as pd

# Five games for each of the three players
Hockey = {'Name': ['Adam', 'Adam', 'Adam', 'Adam', 'Adam',
                   'Biden', 'Biden', 'Biden', 'Biden', 'Biden',
                   'Cimon', 'Cimon', 'Cimon', 'Cimon', 'Cimon'],
          'Goals': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
df = pd.DataFrame(Hockey)

# 0.5 quantile (median) of 'Goals' for each player
print(df.groupby('Name').quantile(0.5))

Observe how the output has changed: it now gives the middle value (the median) of each group.

Now, let us pass 0.75 to see the 75th percentile of each group.
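In the data above, every group has five values, so the 0.5 quantile lands exactly on a data point. When a quantile falls between two data points, Pandas interpolates linearly by default, and the interpolation parameter selects a different rule. The sketch below uses a made-up single group with four values to show the difference.

```python
import pandas as pd

# Hypothetical data: a single group with an even number of values
games = pd.DataFrame({'Name': ['Adam'] * 4, 'Goals': [1, 2, 3, 4]})
grouped = games.groupby('Name')['Goals']

linear = grouped.quantile(0.5)                          # default: halfway between 2 and 3
lower = grouped.quantile(0.5, interpolation='lower')    # take the lower neighbour
higher = grouped.quantile(0.5, interpolation='higher')  # take the higher neighbour

print(linear['Adam'], lower['Adam'], higher['Adam'])
```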

df.groupby('Name').quantile(0.75)

The complete code is shown below:

import pandas as pd

# Five games for each of the three players
Hockey = {'Name': ['Adam', 'Adam', 'Adam', 'Adam', 'Adam',
                   'Biden', 'Biden', 'Biden', 'Biden', 'Biden',
                   'Cimon', 'Cimon', 'Cimon', 'Cimon', 'Cimon'],
          'Goals': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
df = pd.DataFrame(Hockey)

# 0.75 quantile (third quartile) of 'Goals' for each player
print(df.groupby('Name').quantile(0.75))

Again, you can observe that the value three-quarters of the way through each group is returned as the 75th percentile.
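The three calls above can also be combined, since groupby.quantile() accepts a list of fractions. The following is a sketch under the assumption that only the ‘Goals’ column is of interest. It rebuilds the same data more compactly and uses unstack() to lay the result out with one row per player.

```python
import pandas as pd

# Same players and goals as in the example above, built more compactly
df = pd.DataFrame({
    'Name': ['Adam'] * 5 + ['Biden'] * 5 + ['Cimon'] * 5,
    'Goals': list(range(1, 16)),
})

# Passing a list returns every requested quantile for every group
quartiles = df.groupby('Name')['Goals'].quantile([0.25, 0.5, 0.75])

# Move the quantile level into columns: one row per player
table = quartiles.unstack()
print(table)
```

For Adam, whose goals are 1 to 5, the three columns hold 2.0, 3.0, and 4.0.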

Example 2

In the previous example, we computed the 25th, 50th, and 75th percentiles one at a time. Now, let us find the 12th, 37th, and 62nd percentiles together. For each one, we will define a function with “def” that returns that quantile of the group.

Let us see the following code to understand the difference between calculating the quantiles separately and together:

import pandas as pd

df = pd.DataFrame({'Name': ['Adam', 'Adam', 'Adam', 'Adam', 'Adam',
                            'Biden', 'Biden', 'Biden', 'Biden', 'Biden',
                            'Cimon', 'Cimon', 'Cimon', 'Cimon', 'Cimon'],
                   'Goals': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})

# One helper per quantile; each receives the 'Goals' values of a single group
def q12(x):
    return x.quantile(0.12)

def q37(x):
    return x.quantile(0.37)

def q62(x):
    return x.quantile(0.62)

# Apply all three helpers to the 'Goals' column of every group
vals = {'Goals': [q12, q37, q62]}
print(df.groupby('Name').agg(vals))

The output is a table with one row per player and one column for each of the 12th, 37th, and 62nd percentiles of ‘Goals’.
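Writing one function per quantile gets repetitive as the list grows. One possible alternative, sketched below, is a small factory function. The name quantile_at is a hypothetical helper introduced here and is not part of Pandas. It builds each aggregation function and names it, so that agg() can label the output columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Adam'] * 5 + ['Biden'] * 5 + ['Cimon'] * 5,
    'Goals': list(range(1, 16)),
})

def quantile_at(q):
    """Hypothetical helper: build a function that returns the q-th quantile."""
    def compute(x):
        return x.quantile(q)
    # agg() uses the function name as the column label, e.g. 'q12'
    compute.__name__ = f"q{int(round(q * 100))}"
    return compute

result = df.groupby('Name')['Goals'].agg(
    [quantile_at(0.12), quantile_at(0.37), quantile_at(0.62)]
)
print(result)
```

The resulting columns are named q12, q37, and q62, matching the hand-written functions in the example.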

Example 3

Now that we have learned how quantile() works with the help of simple examples, let us see a more complex example to get a clearer understanding. Here, we will provide two grouping columns in a DataFrame. First, we will calculate the quantile grouped by only one column, and then we will calculate the quantile grouped by both columns together. Let’s see the code below:

import pandas as pd  
data = pd.DataFrame({'A':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],  
                     'B':range(13, 25),
                     'g1':['Adam', 'Biden', 'Biden', 'Cimon', 'Cimon', 'Adam', 'Adam', 'Cimon', 'Cimon', 'Biden', 'Adam', 'Adam'],
                     'g2':['adam', 'adam', 'adam', 'adam', 'adam', 'adam', 'biden', 'biden', 'biden', 'biden', 'biden', 'biden']})
print(data)

First, we have created a DataFrame containing two grouping columns, ‘g1’ and ‘g2’, along with the numeric columns ‘A’ and ‘B’. The print() call displays its twelve rows.

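Now, let’s compute the quantile for each group in the first grouping column, ‘g1’. The numeric columns ‘A’ and ‘B’ are selected explicitly because ‘g2’ contains strings, and recent versions of Pandas raise a TypeError when asked for the quantile of a non-numeric column.

print(data.groupby('g1')[['A', 'B']].quantile(0.25))

The groupby.quantile() method returns one row per value of ‘g1’, holding the 25th percentile of ‘A’ and ‘B’ within that group.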

Now, let’s find the quantile grouped by both columns together.

print(data.groupby(['g1', 'g2']).quantile(0.25))

Here, we only added the name of the second grouping column and calculated the 25th percentile for every combination of ‘g1’ and ‘g2’.
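For reference, here is a self-contained sketch of both groupings on the same data. It selects the numeric columns explicitly in both calls, which is an assumption made so the code behaves the same across Pandas versions, and it shows how to read a single value out of the result.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'B': range(13, 25),
    'g1': ['Adam', 'Biden', 'Biden', 'Cimon', 'Cimon', 'Adam',
           'Adam', 'Cimon', 'Cimon', 'Biden', 'Adam', 'Adam'],
    'g2': ['adam'] * 6 + ['biden'] * 6,
})

# Group by one column: one row per value of g1
by_g1 = data.groupby('g1')[['A', 'B']].quantile(0.25)

# Group by two columns: one row per (g1, g2) combination
by_both = data.groupby(['g1', 'g2'])[['A', 'B']].quantile(0.25)

print(by_g1)
print(by_both)

# A single group is addressed with a tuple of its two keys
print(by_both.loc[('Adam', 'biden'), 'A'])
```

The two-column result has a MultiIndex, so each row is identified by the pair of ‘g1’ and ‘g2’ values rather than by a single label.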

Conclusion

In this article, we have discussed the general concept of quantiles and what they are used for. After that, we discussed quantile groups in Python. A quantile by group is computed by splitting the rows into groups and finding the requested quantile within each one. Pandas provides the groupby.quantile() function to calculate the quantile by group. We have also provided some examples to show how to use the quantile() function.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write about Linux and other technology-related topics.