Sample Data
Before discussing how to determine the number of unique values in a DataFrame, we need some sample data.
The following code creates it:
import pandas as pd

df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    'department': ['game developer', 'database developer', 'front-end developer', 'full-stack developer', 'database developer', 'security researcher', 'cloud-engineer'],
    'rating': [4.3, 4.4, 4.3, 3.3, 4.3, 5.0, 4.4]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])
df
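The code above creates the sample DataFrame used throughout this tutorial. In tabular form, the data looks like this:
          salary            department  rating
Alice     120000        game developer     4.3
Michael   100000    database developer     4.4
Joshua     90000   front-end developer     4.3
Patricia  110000  full-stack developer     3.3
Peter     120000    database developer     4.3
Jeff      100000   security researcher     5.0
Ruth       56000        cloud-engineer     4.4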
#1 Pandas Unique Method
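The unique() function is the first method we can use to determine the number of unique values in a DataFrame column.
It is called on a Series (a single column) and returns the unique values in order of first appearance. For the numeric columns in the sample data, the result is a NumPy array rather than a Python list.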
For example, to calculate the unique items in the salary column, we can do:
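A minimal sketch of that call, assuming the sample data defined earlier (only the salary column is rebuilt here so the snippet runs on its own; the variable name unique_salaries is illustrative):

```python
import pandas as pd

# Rebuild just the salary column of the sample DataFrame
df = pd.DataFrame(
    {'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'],
)

# unique() keeps one copy of each value, in order of first appearance
unique_salaries = df['salary'].unique()
print(unique_salaries)
```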
The code above should return the unique items in the 'salary' column, in order of first appearance: 120000, 100000, 90000, 110000, and 56000.
If you want the number of unique values, you can get the length of the list as shown:
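One way to sketch this, again assuming the salary column from the sample data (rebuilt here so the snippet is self-contained; n_unique is an illustrative name):

```python
import pandas as pd

# Rebuild just the salary column of the sample DataFrame
df = pd.DataFrame(
    {'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'],
)

# len() of the unique() result is the number of distinct salaries
n_unique = len(df['salary'].unique())
print(n_unique)
```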
The code above should return:
5
#2 Pandas nunique Function
The nunique() function allows you to get the number of unique values along a specified axis.
An example is as shown:
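A minimal sketch of such a call, assuming the sample DataFrame from the Sample Data section (rebuilt here so the snippet runs on its own; counts is an illustrative name):

```python
import pandas as pd

# Sample DataFrame from the start of the tutorial
df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    'department': ['game developer', 'database developer', 'front-end developer',
                   'full-stack developer', 'database developer',
                   'security researcher', 'cloud-engineer'],
    'rating': [4.3, 4.4, 4.3, 3.3, 4.3, 5.0, 4.4]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])

# With the default axis=0, nunique() counts distinct values in each column
counts = df.nunique()
print(counts)
```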
The code above should return the number of unique items in each column. The resulting output is as shown:
salary        5
department    6
rating        4
dtype: int64
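The axis argument controls the direction of the count. As a small sketch, assuming the same sample DataFrame (rebuilt here so the snippet is self-contained; per_row is an illustrative name), axis=1 counts distinct values across each row instead of down each column:

```python
import pandas as pd

# Sample DataFrame from the start of the tutorial
df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    'department': ['game developer', 'database developer', 'front-end developer',
                   'full-stack developer', 'database developer',
                   'security researcher', 'cloud-engineer'],
    'rating': [4.3, 4.4, 4.3, 3.3, 4.3, 5.0, 4.4]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])

# axis=1 counts distinct values within each row (one result per person)
per_row = df.nunique(axis=1)
print(per_row)
```

Every row in the sample data holds three different values, so each entry in the result is 3.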
You can also fetch the number of unique items in a specific column as shown:
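A minimal sketch for a single column, assuming the salary column from the sample data (rebuilt here so the snippet runs on its own; n_salaries is an illustrative name):

```python
import pandas as pd

# Rebuild just the salary column of the sample DataFrame
df = pd.DataFrame(
    {'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'],
)

# On a single column (a Series), nunique() returns one integer
n_salaries = df['salary'].nunique()
print(n_salaries)
```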
The above should return the number of unique items in the salary column, which is 5.
#3 Pandas value_counts()
Pandas also provides the value_counts() function. It does not return the number of unique values directly. Instead, it returns how many times each distinct value occurs in a specified column, with one entry per unique value.
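An example using the salary column is as shown:
res = df['salary'].value_counts().tolist()  # one count per distinct salary
print(f"unique items: {len(res)}")
Because value_counts() yields one count per distinct value, converting the result into a list and taking its length gives the number of unique items.
This should print the number of unique items in the column:
unique items: 5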
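The list conversion is optional: value_counts() returns a Series, so len() can be applied to it directly. A minimal sketch, assuming the salary column from the sample data (rebuilt here so the snippet is self-contained; counts and n_unique are illustrative names):

```python
import pandas as pd

# Rebuild just the salary column of the sample DataFrame
df = pd.DataFrame(
    {'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'],
)

# value_counts() maps each distinct salary to the number of times it occurs
counts = df['salary'].value_counts()

# One entry per distinct value, so the length is the number of unique items
n_unique = len(counts)
print(f"unique items: {n_unique}")
```

Note that value_counts() skips missing values by default, as does nunique(), so the two approaches agree on data that contains NaN.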
Conclusion
This article discussed three ways to determine the number of unique values in a Pandas DataFrame: unique(), nunique(), and value_counts().