
Pandas Drop Duplicate Index

In plain Python, an index is the position of an element in a sequence, starting at “0” and running up to the last element. In a Pandas DataFrame, however, the “Index” holds the row labels. By default, the first row is labeled “0”, the second row “1”, and so on. We can also create a custom index and assign it to a DataFrame. An index may contain duplicate labels, which makes label-based lookups ambiguous. To remove duplicate values from an index, the “Index.drop_duplicates()” method can be used.

This article explains the “Index.drop_duplicates()” method and walks through two examples of dropping duplicate index values.
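
To make the ambiguity concrete, here is a minimal sketch. The DataFrame, its labels, and the “score” column are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Hypothetical data: the label "a" is used for two different rows.
df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "a"])

# is_unique reports whether every index label appears only once.
print(df.index.is_unique)

# Selecting by a duplicated label returns every matching row, not a single row.
rows_for_a = df.loc["a"]
print(rows_for_a)
```

Because “a” labels two rows, a lookup by that label cannot identify a single row.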

What is the Pandas “Index.drop_duplicates()” in Python?

The “Index.drop_duplicates()” method returns an index with duplicate values removed. Depending on the “keep” parameter, it keeps the first occurrence of each duplicated value, keeps the last occurrence, or drops every occurrence. The syntax of the “Index.drop_duplicates()” method is shown below:

Index.drop_duplicates(keep='first')

In the above syntax:

  • The “keep='first'” value is the default. It drops duplicate values except for the first occurrence of each.
  • The “keep=False” value drops every value that occurs more than once, including its first occurrence, so only values that appear exactly once remain.
  • The “keep='last'” value drops duplicate values except for the last occurrence of each.
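
The three “keep” values can be compared side by side. The following is a small sketch; the variable names are chosen for illustration only.

```python
import pandas as pd

idx = pd.Index([12, 12, 3, 3, 11, 3, 23, 21])

# Keep the first occurrence of each value (the default).
first = idx.drop_duplicates(keep="first")
# Keep the last occurrence of each value.
last = idx.drop_duplicates(keep="last")
# Drop every value that occurs more than once.
none_kept = idx.drop_duplicates(keep=False)

print(first.tolist())      # [12, 3, 11, 23, 21]
print(last.tolist())       # [12, 11, 3, 23, 21]
print(none_kept.tolist())  # [11, 23, 21]
```

Note that the surviving values keep their original relative order, which is why “keep='last'” places 11 before 3.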

Return Value

The “Index.drop_duplicates()” method returns a new index with the duplicate values removed. The original index is not modified.
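
As a quick check of the return value, this sketch (with illustrative variable names) shows that the method hands back a separate index and leaves the one it was called on untouched:

```python
import pandas as pd

original = pd.Index([12, 12, 3])

# The result must be assigned to a name; the call does not change "original".
deduplicated = original.drop_duplicates()

print(original.tolist())      # [12, 12, 3]
print(deduplicated.tolist())  # [12, 3]
```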

Example 1: Using the “Index.drop_duplicates()” Method to Drop All Duplicate Index Values

This example drops every index value that occurs more than once:

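import pandas
# Build an index in which 12 and 3 occur more than once.
index_1 = pandas.Index([12, 12, 3, 3, 11, 3, 23, 21])
print('Original Index:\n', index_1)
# keep=False drops every occurrence of a duplicated value.
print('\nAfter Removing Duplicate Index:\n', index_1.drop_duplicates(keep=False))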

Here in this code:

  • The “pandas” module is imported.
  • The “pandas.Index()” constructor creates an index from the given list of values.
  • The “drop_duplicates()” method takes “keep=False” as an argument and returns a new index from which every value that occurs more than once has been removed.

Output

Only the values 11, 23, and 21 remain. The values 12 and 3 each occur more than once, so every occurrence of them is dropped.

Example 2: Using the “Index.drop_duplicates()” Method to Drop Duplicate Index Values Except for the First Occurrence

This example drops the duplicate index values but keeps the first occurrence of each:

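import pandas
# Build an index in which 12 and 3 occur more than once.
index_1 = pandas.Index([12, 12, 3, 3, 11, 3, 23, 21])
print('Original Index:\n', index_1)
# keep='first' keeps the first occurrence of each value and drops later repeats.
print('\nAfter Removing Duplicate Index:\n', index_1.drop_duplicates(keep='first'))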

In the above code:

  • The “pandas” module is imported.
  • The “drop_duplicates()” method takes “keep='first'” as an argument and returns a new index that keeps only the first occurrence of each value.

Output

The result contains 12, 3, 11, 23, and 21. The first occurrence of each value is kept in its original order, and the later repeats are dropped.

Note: Similarly, the “keep='last'” parameter value drops duplicate index values except for the last occurrence of each.
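
Keep in mind that “Index.drop_duplicates()” only returns a cleaned index; it does not remove rows from a DataFrame. One common way to drop the rows themselves, sketched here under the assumption of a made-up DataFrame, is to build a boolean mask with the related “Index.duplicated()” method, which accepts the same “keep” values:

```python
import pandas as pd

# Hypothetical data: the label "a" appears twice.
df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "a"])

# duplicated() marks every occurrence except the one selected by keep.
mask = df.index.duplicated(keep="last")

# Keep only the rows whose label is not marked as a duplicate.
deduplicated_df = df[~mask]
print(deduplicated_df)
```

With “keep='last'”, the first “a” row is marked as the duplicate, so the rows labeled “b” and the second “a” remain.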

Conclusion

The “Index.drop_duplicates()” method returns an index with duplicate values removed, according to the specified “keep” parameter value. The “keep=False” value drops every occurrence of a duplicated value, while “keep='first'” and “keep='last'” keep only the first or last occurrence of each value, respectively. This guide explained the method and demonstrated it with two examples.

About the author

Haroon Javed

Hi, I'm Haroon. I am an electronics engineer and a technical content writer. I am a tech geek who loves to help people to the best of my knowledge.