Pandas Count Distinct

This article explores several ways to count the unique values in a Pandas DataFrame.

Sample Data

Before counting unique values, we need some sample data.

The code below builds a small DataFrame:

# import pandas
import pandas as pd
df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    'department': ['game developer', 'database developer', 'front-end developer', 'full-stack developer', 'database developer', 'security researcher', 'cloud-engineer'],
    'rating': [4.3, 4.4, 4.3, 3.3, 4.3, 5.0, 4.4]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])
df

The code above should create a sample DataFrame, indexed by employee name, with salary, department, and rating columns. We use it throughout this tutorial.

#1 Pandas Unique Method

The unique() function is the first method we can use to find the unique values in a DataFrame column.

The function takes a one-dimensional object such as a Series as input and returns its unique values in order of first appearance. For a numeric column like the one below, the result is a NumPy array rather than a Python list.

For example, to get the unique items in the salary column, we can do:

print(pd.unique(df['salary']))

The code above should return the unique items in the ‘salary’ column.

[120000 100000  90000 110000  56000]

If you want the number of unique values, you can get the length of the returned array as shown:

print(f"Unique items: {len(pd.unique(df['salary']))}")

The code above should return:

Unique items: 5
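
As a small sketch of the same idea, a Series also has its own unique() method, and the NumPy array it returns exposes a size attribute that can stand in for len(). The variable names below are illustrative, and the values are copied from the salary column of the sample data:

```python
import pandas as pd

# Same values as the salary column in the sample DataFrame
salaries = pd.Series([120000, 100000, 90000, 110000, 120000, 100000, 56000])

# Series.unique() returns the distinct values in order of first appearance
unique_salaries = salaries.unique()

# The size attribute of the returned array is the number of distinct values
unique_count = unique_salaries.size

print(unique_salaries)
print(f"Unique items: {unique_count}")
```

Both spellings should give the same five values; which one to use is mostly a matter of style.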

#2 Pandas nunique Function

The nunique() function allows you to get the number of unique values along a specified axis.

An example is as shown:

print(f"[number of unique items/column]\n{df.nunique(axis=0)}")

The code above should return the number of unique items in each column. The resulting output is as shown:

[number of unique items/column]
salary        5
department    6
rating        4
dtype: int64
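
The axis argument also accepts 1, which counts the distinct values across each row instead of down each column. A minimal sketch, assuming a small hypothetical DataFrame named scores (it is not part of the sample data and is only here for illustration):

```python
import pandas as pd

# Hypothetical data: three rows of answers to three questions
scores = pd.DataFrame({
    "q1": [1, 2, 3],
    "q2": [1, 2, 4],
    "q3": [1, 3, 4],
})

# axis=1 counts distinct values within each row
per_row = scores.nunique(axis=1)
print(per_row)
```

Here the first row holds a single distinct value, while the second and third rows each hold two.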

You can also fetch the number of unique items in a specific column as shown:

print(df.salary.nunique())

The code above should return the number of unique items in the salary column, which is 5 for the sample data.
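
One detail worth knowing is that nunique() leaves missing values out of the count by default, and the dropna parameter controls this. The sketch below uses a made-up ratings Series with two missing entries (the sample data has none) to show the difference:

```python
import pandas as pd

# Illustrative ratings with two missing entries (stored as NaN)
ratings = pd.Series([4.3, 4.4, None, 4.3, None])

# Default behaviour: NaN is ignored
count_without_nan = ratings.nunique()

# dropna=False treats NaN as one additional distinct value
count_with_nan = ratings.nunique(dropna=False)

print(count_without_nan, count_with_nan)
```

Whether a missing value should count as a distinct item depends on the question being asked, so it helps to set dropna explicitly when the data may contain gaps.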

#3 Pandas value_counts()

Pandas also provides the value_counts() function. Its result has one entry per unique value in the specified column, so we can use it to count the unique values.

An example is as shown:

res = list(df.salary.value_counts())
print(f"unique items: {len(res)}")

The value_counts() function returns the count of each value in the column. We then convert the result into a list and get the length.

This should get the number of unique items in the column:

unique items: 5
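
To see what value_counts() holds before its length is taken, the sketch below prints the intermediate result. It also calls len() on the returned Series directly, which works without converting it to a list first. The variable names are illustrative, and the values mirror the salary column of the sample data:

```python
import pandas as pd

# Same values as the salary column in the sample DataFrame
salaries = pd.Series(
    [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    name="salary",
)

# One entry per distinct salary, holding the number of times it occurs
counts = salaries.value_counts()
print(counts)

# The number of entries equals the number of distinct salaries
distinct = len(counts)
print(f"unique items: {distinct}")
```

This route is most useful when the per-value frequencies are needed as well; if only the count matters, nunique() is the more direct call.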

Conclusion

This article discussed three ways to count the unique values in a Pandas DataFrame: unique(), nunique(), and value_counts().

About the author

John Otieno
