“Count Distinct” is a common operation in data analysis that returns the number of unique values within a column. In Python, the Pandas “groupby()” method is combined with methods such as “nunique()”, “value_counts()”, “unique()”, and “agg()” to group data by a common value and count the number of unique values in each group.
This article provides a detailed guide on how to count the distinct values in a Pandas DataFrame group using the following methods:
- Using the “nunique()” Method
- Using the “value_counts()” Method
- Using the “unique()” Method
- Using the “agg()” Method
Method 1: Count Distinct Values in a Pandas DataFrame Group Using the “nunique()” Method
The “nunique()” method returns the number of unique values in a Pandas DataFrame column. When applied after “groupby()”, it counts the distinct values within each DataFrame group.
Example 1: Using a Single Column
The code below counts the distinct values of a single column within each DataFrame group:
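import pandas
# Build a sample DataFrame in which some names repeat
df = pandas.DataFrame({'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'], 'Age': [15, 17, 16, 19, 15, 15, 21], 'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Count the distinct "Age" values within each "Name" group
df1 = df.groupby('Name')['Age'].nunique()
print('\n', df1)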
In the above example, the “Pandas” module is imported, and a DataFrame is created with multiple columns. Next, the “df.groupby()” method groups the DataFrame by the single column “Name”. After grouping, the “nunique()” method is applied to the “Age” column of each group to count its distinct values.
Output
The output shows the number of distinct “Age” values in each “Name” group: 1 for “Carry”, 2 for “Lily”, and 2 for “Sybil”.
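To make it clearer what “nunique()” computes per group, here is a minimal sketch that cross-checks it against a manual count. The names “manual_counts” and “builtin_counts” are illustrative only, and the sketch assumes the column has no missing values:

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
})

# Manual cross-check: put each group's ages into a set and take its size
manual_counts = {name: len(set(group['Age'])) for name, group in df.groupby('Name')}

# Built-in equivalent: nunique() applied to the grouped column
builtin_counts = df.groupby('Name')['Age'].nunique().to_dict()

print(manual_counts)
print(builtin_counts)
```

Both dictionaries should hold the same count for every name, which suggests that “nunique()” is essentially a per-group "size of the set of values".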
Example 2: Using Multiple Columns
The following code counts the distinct values of multiple columns within each DataFrame group:
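import pandas
# Build a sample DataFrame in which some names repeat
df = pandas.DataFrame({'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'], 'Age': [15, 17, 16, 19, 15, 15, 21], 'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Count the distinct "Age" and "Score" values within each "Name" group
df1 = df.groupby('Name')[['Age', 'Score']].nunique()
print('\n', df1)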
In this code, the “df.groupby()” method groups the Pandas DataFrame by the single column “Name”. The “nunique()” method is then applied to the “Age” and “Score” columns to count the distinct values of each column within every group.
Output
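The output shows the distinct count of each selected column per group: “Carry” has 1 distinct “Age” and 1 distinct “Score”, “Lily” has 2 and 3, and “Sybil” has 2 and 2.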
Method 2: Count Distinct Values in a Pandas DataFrame Group Using the “value_counts()” Method
The “value_counts()” method returns how many times each distinct value, or each distinct combination of values across multiple columns, occurs. When applied to a DataFrame group, it lists every distinct value in the group together with its frequency, so the number of entries listed for a group equals that group's count of distinct values.
Example 1: Using a Single Column
Here is an example that lists the distinct values of a single column along with their frequencies:
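import pandas
# Build a sample DataFrame in which some names repeat
df = pandas.DataFrame({'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'], 'Age': [15, 17, 18, 19, 15, 16, 19]})
print(df)
# List each distinct "Age" per "Name" group with its frequency
df1 = df.groupby('Name')['Age'].value_counts()
print('\n', df1)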
In the above code, the “df.groupby()” method is used along with the “value_counts()” method to list each distinct value of the single column “Age” within every “Name” group, along with the number of times it occurs.
Output
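The output lists each distinct “Age” value per “Name” group with its frequency. For example, “Lily” has two distinct ages (18 and 16), each occurring once, while “Cyndy” has one distinct age (15) occurring twice.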
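Because “value_counts()” returns frequencies rather than a single number per group, one more step is needed to get the count of distinct values. The sketch below shows one way to do that, by counting the rows of the result per group. The variable names are illustrative, and other approaches are possible:

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'],
    'Age': [15, 17, 18, 19, 15, 16, 19],
})

# Frequency of each distinct Age within each Name group
freq = df.groupby('Name')['Age'].value_counts()

# Every row of `freq` is one distinct (Name, Age) pair, so the number of
# rows per Name (index level 0) is the count of distinct ages in that group
distinct_from_counts = freq.groupby(level=0).size()

# Direct computation for comparison
direct = df.groupby('Name')['Age'].nunique()

print(distinct_from_counts)
print(direct)
```

The two results should agree, which makes “value_counts()” a reasonable choice when both the frequencies and the distinct count are needed.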
Example 2: Using Multiple Columns
The same approach works for multiple columns:
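import pandas
# Build a sample DataFrame in which some names repeat
df = pandas.DataFrame({'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'], 'Age': [15, 17, 18, 19, 15, 16, 19], 'Score': [55, 66, 55, 88, 55, 66, 88]})
print(df)
# List each distinct ("Age", "Score") combination per "Name" group with its frequency
# (value_counts() on a grouped DataFrame requires pandas 1.4 or later)
df1 = df.groupby('Name')[['Age', 'Score']].value_counts()
print('\n', df1)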
In the above code, the “df.groupby()” method creates groups according to the values of the “Name” column. The “value_counts()” method then counts how often each distinct combination of “Age” and “Score” occurs within each group.
Output
The output lists every distinct combination of “Age” and “Score” per group together with its frequency.
Method 3: Count Distinct Values in a Pandas DataFrame Group Using the “unique()” Method
The “unique()” method returns the unique values of a column in a Pandas DataFrame. The code below retrieves the distinct values of each DataFrame group:
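import pandas
# Build a sample DataFrame in which some names repeat
df = pandas.DataFrame({'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'], 'Age': [15, 17, 16, 19, 15, 15, 21], 'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Collect the distinct "Age" values of each "Name" group into an array
df1 = df.groupby('Name')['Age'].unique()
print('\n', df1)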
Here, the “unique()” method is applied after “df.groupby()” to return, for each group, an array of the unique values rather than a count. The number of distinct values can then be determined by counting the elements of each returned array.
Output
The output shows the array of distinct “Age” values for each group: [17] for “Carry”, [15, 16] for “Lily”, and [19, 21] for “Sybil”.
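Since “unique()” returns the values themselves, turning them into a count takes one extra step. A minimal sketch, assuming the same sample data and using the illustrative names “unique_ages” and “distinct_counts”:

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
})

# One array of distinct ages per Name group
unique_ages = df.groupby('Name')['Age'].unique()

# The length of each array is that group's count of distinct values
distinct_counts = unique_ages.apply(len)

print(distinct_counts)
```

Note that “unique()” keeps a missing value (NaN) as one of the returned values, whereas “nunique()” ignores missing values by default, so the two approaches may differ on columns that contain NaN.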
Method 4: Count Distinct Values in a Pandas DataFrame Group Using the “agg()” Method
The “agg()” method can also be used to count the distinct values of a Pandas DataFrame group. Here is an example:
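import pandas
# Build a sample DataFrame in which some names repeat
df = pandas.DataFrame({'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'], 'Age': [15, 17, 16, 19, 15, 15, 21], 'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Apply the "nunique" aggregation to the "Age" column of each "Name" group
df1 = df.groupby('Name')[['Age']].agg(['nunique'])
print('\n', df1)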
In the above code, the “df.groupby()” method is used along with the “agg()” method, which applies the “nunique” aggregation to the “Age” column and returns the count of distinct values for each group.
Output
The output shows the distinct “Age” count of each group under the “Age” and “nunique” column headers: 1 for “Carry”, 2 for “Lily”, and 2 for “Sybil”.
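The “agg()” method is most useful when several aggregations are needed at once. As a sketch, the snippet below uses named aggregation (available in pandas 0.25 and later) to count distinct values of two columns in one call. The output column names “distinct_ages” and “distinct_scores” are made up for this illustration:

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
    'Score': [55, 66, 25, 88, 55, 66, 18],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby('Name').agg(
    distinct_ages=('Age', 'nunique'),
    distinct_scores=('Score', 'nunique'),
)

print(summary)
```

This form yields flat, self-describing column names instead of the two-level column header produced by passing a list such as ['nunique'].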
Conclusion
The “nunique()”, “value_counts()”, “unique()”, and “agg()” methods can be used to determine the count of distinct values in a Pandas DataFrame group. In each case, the DataFrame is first grouped by a specific column, and the method is then applied to one or more columns of each group. The “nunique()” and “agg()” methods return the count directly, whereas “value_counts()” returns the distinct values with their frequencies and “unique()” returns the distinct values themselves, from which the count can be derived. This article has provided a detailed guide on counting the distinct values of a Pandas DataFrame group using numerous examples.