Python Pandas

Pandas Distinct Values Column

Pandas” provides several methods to perform data analysis and manipulation operations on data. In Python, while working with a Pandas DataFrame, we often need to extract unique or distinct values from a specific column. To accomplish this, various methods are used in Python.

This write-up will provide a comprehensive guide on selecting Pandas distinct from the Pandas DataFrame column using numerous examples.

How to Get/Determine Distinct Values From Pandas DataFrame Column?

To get distinct values from the Pandas DataFrame column, the following methods are used in Python:

Method 1: Get Distinct Values From Pandas DataFrame Column Using “pandas.unique()” Function

The “pandas.unique()” function is utilized to get distinct unique values from the Pandas DataFrame column. Here is an example code:

import pandas
data1 = {'Name':["Anna","Joseph","Lily","Henry","Lily","Anna","Henry"],'Age' :[18,22,33,13,33,18,13],'Height':[5.7,3.7,4.9,5.5,3.7,5.7,5.5],'Salary':['$100','$300','$500','$700','$300','$100','$700']}
df = pandas.DataFrame(data1)
print(df, '\n')
df1 = pandas.unique(df[['Name']].values.ravel())
print(df1)

 

In the above example:

  • The “pandas” module is imported.
  • The “pd.DataFrame()” function takes/accepts the dictionary as an argument and constructs a DataFrame.
  • The “pandas.unique()” function is utilized to determine the unique value in the “Name” column of Pandas DataFrame.
  • The “ravel()” method is used along with the “pandas.unique()” function to flatten the column into a one-dimensional array.

Output

The unique value from the specified column has been returned in the output.

We can also get distinct values from the multiple columns of Pandas DataFrame. Here is an example code:

import pandas
data1 = {'Name':["Anna","Joseph","Lily","Henry","Lily","Anna","Henry"],'Age' :[18,22,33,13,33,18,13],'Height':[5.7,3.7,4.9,5.5,3.7,5.7,5.5],'Salary':['$100','$300','$500','$700','$300','$100','$700']}
df = pandas.DataFrame(data1)
print(df, '\n')
df1 = pandas.unique(df[['Name', 'Age']].values.ravel())
print(df1)

 

Here in this code:

The “pandas.unique()” function gets the distinct values from the more than one column of Pandas DataFrame.

Output

The distinct values of multiple columns have been returned.

Method 2: Get Distinct Values From Pandas DataFrame Column Using “Series.unique()” Function

The “Series.unique()” function can also be used to get distinct or unique values from the Pandas DataFrame column. Let’s overview the below code:

import pandas
data1 = {'Name':["Anna","Joseph","Lily","Henry","Lily","Anna","Henry"],'Age' :[18,22,33,13,33,18,13],'Height':[5.7,3.7,4.9,5.5,3.7,5.7,5.5],'Salary':['$100','$300','$500','$700','$300','$100','$700']}
df = pandas.DataFrame(data1)
print(df, '\n')
print(df['Name'].unique())

 

Here in this code:

The “Series.unique()” method is applied to the column “Name” to get the distinct values from the Pandas DataFrame column.

Output

The distinct value of the specified column has been returned in the output.

Method 3: Get Distinct Values From Pandas DataFrame Column Using “Numpy.unique()” Function

The “numpy.unique()” function can also be used to get the distinct values from the Pandas DataFrame column. Here is an example:

import pandas
import numpy
data1 = {'Name':["Anna","Joseph","Lily","Henry","Lily","Anna","Henry"],'Age' :[18,22,33,13,33,18,13],'Height':[5.7,3.7,4.9,5.5,3.7,5.7,5.5],'Salary':['$100','$300','$500','$700','$300','$100','$700']}
df = pandas.DataFrame(data1)
print(df, '\n')

df1 = numpy.unique(df[['Name', 'Salary']].values)
print(df1)

 

In the above code:

The “numpy.unique()” function of the “numpy” module is applied to the multiple columns named “Name” and “Salary” and returns the distinct value by removing the duplicate value.

Output

The distinct value of the multiple columns has been returned.

Method 4: Get Distinct Values From Pandas DataFrame Column Using “pandas.concat()” Method

The “pandas.concat()” method is used along with the “unique()” function to get the distinct values from Pandas DataFrame. Let’s examine this method via the following code:

import pandas
data1 = {'Name':["Anna","Joseph","Lily","Henry","Lily","Anna","Henry"],'Age' :[18,22,33,13,33,18,13],'Height':[5.7,3.7,4.9,5.5,3.7,5.7,5.5],'Salary':['$100','$300','$500','$700','$300','$100','$700']}
df = pandas.DataFrame(data1)
print(df, '\n')
df1 = pandas.concat([df['Name'],df['Age']]).unique()
print(df1)

 

In this code block:

  • The “pandas.concat()” and “unique()” methods are applied on the multiple Pandas DataFame columns “Name” and “Age” to get the unique/distinct value.

The unique value of the specified column has been returned by removing the duplicate value.

Conclusion

The “pandas.unique()”, “Series.unique()”, “Numpy.unique()”, and “pandas.concat()” methods are used to get distinct values of the Pandas DataFrame column. The “Pandas.unique()” function is used to get the distinct or unique value of a single or multiple DataFrame column by removing the duplicate item. This guide has presented various methods to get distinct values of Pandas DataFrame columns using numerous examples.

About the author

Haroon Javed

Hi, I'm Haroon. I am an electronics engineer and a technical content writer. I am a tech geek who loves to help people to the best of my knowledge.