
Pandas DataFrame Indexing

In pandas, indexing means selecting specific rows and columns of data from a DataFrame. You can select all of the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also called subset selection. When we build a DataFrame with the pd.DataFrame() function, every data element gets an address made up of a row label and a column label. The row labels are called the DataFrame’s index, while the column labels are simply called columns. If no index is supplied, pandas generates a default integer index starting at 0. The index identifies the rows of the DataFrame. Let’s look at how to set the index of a pandas DataFrame object.
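The three kinds of subset selection described above can be sketched as follows. This is a minimal illustration that assumes a small made-up DataFrame; the variable names (df, names_and_marks, first_two, subset) are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['Alex', 'Ramen', 'Zayn'],
    'Marks': [33, 66, 88],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# All of the rows, some of the columns
names_and_marks = df[['Name', 'Marks']]

# Some of the rows, all of the columns
# (.loc slices by label, and the end label is included)
first_two = df.loc[0:1]

# Some of the rows and some of the columns
subset = df.loc[[0, 2], ['Name', 'City']]

print(names_and_marks)
print(first_two)
print(subset)
```

Here the row labels 0, 1, and 2 come from the default integer index, since no index was passed when the DataFrame was created.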

Example 1

The index parameter of pd.DataFrame() can be used to set the index of a DataFrame as it is being created. We generate a list and pass it to the index parameter. In the code below, we import the pandas module, then create a dictionary and a Python list. The dictionary supplies the data for the DataFrame. We then call pd.DataFrame() with the index parameter so that the list rr becomes the index.

import pandas as pd
dd = {'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
        'Marks': [33, 66, 88, 67, 78],
        'City': ['New York', 'Los Angeles', 'Chicago', 'San Diego', 'Dallas']}
rr = [1, 2, 3, 4, 5]
ff = pd.DataFrame(dd, index = rr)
print(ff)

See the output in the following image.
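One consequence of a custom index is that row labels and row positions no longer coincide. The sketch below uses a trimmed version of the data from this example; by_label and by_position are illustrative names, not part of the original code.

```python
import pandas as pd

dd = {'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
      'Marks': [33, 66, 88, 67, 78]}
ff = pd.DataFrame(dd, index=[1, 2, 3, 4, 5])

# .loc looks up the row whose index label is 1 (the first row here)
by_label = ff.loc[1, 'Name']

# .iloc looks up the row at position 1 (the second row), column position 0
by_position = ff.iloc[1, 0]

print(by_label)
print(by_position)
```

With the default index the two lookups would return the same row, so it helps to be explicit about which one is intended once the index has been changed.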

Example 2

We can also make one or more existing columns of a DataFrame its index by using the set_index() method. The column passed to set_index() replaces the DataFrame’s old index. The method’s inplace argument is False by default, so set_index() returns a new DataFrame and leaves the original unchanged; the code below therefore assigns the result back to ff. Setting inplace to True would instead modify ff directly. Let’s see how this works.

import pandas as pd
dd = {'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
        'Rollnum': ['1', '2', '3', '4', '5'],
        'City': ['New York', 'Los Angeles', 'Chicago', 'San Diego', 'Dallas']}
ff = pd.DataFrame(dd)
print("\nInitial DataFrame:")
print(ff)
ff = ff.set_index('Rollnum')
print("\nFinal DataFrame:")
print(ff)

The output is given in the following screenshot.
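The difference between the default behaviour and inplace=True can be sketched as below. The names original, returned, modified, and result are hypothetical and exist only for this comparison.

```python
import pandas as pd

dd = {'Name': ['Alex', 'Ramen', 'Zayn'],
      'Rollnum': ['1', '2', '3']}

# Default (inplace=False): the original is untouched, a new DataFrame is returned
original = pd.DataFrame(dd)
returned = original.set_index('Rollnum')

# inplace=True: the DataFrame itself is modified and the call returns None
modified = pd.DataFrame(dd)
result = modified.set_index('Rollnum', inplace=True)

print(original.columns.tolist())
print(returned.index.name)
print(result)
```

Reassigning the returned DataFrame, as the example above does, is generally the easier style to follow, because each statement produces a visible value.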

Example 3

In this case, we use the drop argument of the set_index() method. By default, drop is True, and the column used as the new index is removed from the DataFrame’s columns. Here we set drop to False so that the DataFrame keeps the column that has been assigned as the new index. Let’s put this into practice with the code below.

import pandas as pd
dd = {'Rollnum': ['1', '2', '3', '4', '5'],
        'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
        'Marks': [33, 66, 88, 67, 78]}
ff = pd.DataFrame(dd)
print("\nInitial DataFrame:")
print(ff)
ff = ff.set_index('Name', drop = False)
print("\nFinal DataFrame:")
print(ff)

Here is the result.
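To make the effect of drop easier to see, the following sketch compares both settings side by side on a shortened version of the data. The names dropped and kept are illustrative only.

```python
import pandas as pd

dd = {'Rollnum': ['1', '2', '3'],
      'Name': ['Alex', 'Ramen', 'Zayn'],
      'Marks': [33, 66, 88]}
ff = pd.DataFrame(dd)

# Default drop=True: 'Name' moves into the index and leaves the columns
dropped = ff.set_index('Name')

# drop=False: 'Name' becomes the index and also stays as a column
kept = ff.set_index('Name', drop=False)

print(dropped.columns.tolist())
print(kept.columns.tolist())
```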

Example 4

By passing a list of column names to the set_index() method, we can set several columns of the DataFrame as its index. The resulting index is called a MultiIndex.

import pandas as pd
dd = {'Rollnum': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
        'Marks': [33, 66, 88, 67, 78],
        'City': ['New York', 'Los Angeles', 'Chicago', 'San Diego', 'Dallas']}
ff = pd.DataFrame(dd)
print("\nInitial DataFrame:")
print(ff)
ff = ff.set_index(['Rollnum', 'Name'])
print("\nFinal DataFrame:")
print(ff)

Here you can see the output of the code given above.
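Once a MultiIndex is in place, rows can be looked up by a tuple of labels, or by the first level alone. The sketch below assumes a shortened version of the data from this example; levels, marks, and rows_for_3 are hypothetical names.

```python
import pandas as pd

dd = {'Rollnum': [1, 2, 3],
      'Name': ['Alex', 'Ramen', 'Zayn'],
      'Marks': [33, 66, 88],
      'City': ['New York', 'Los Angeles', 'Chicago']}
ff = pd.DataFrame(dd).set_index(['Rollnum', 'Name'])

# The index now has two levels, named after the columns used
levels = list(ff.index.names)

# Full key: a tuple with one label per level
marks = ff.loc[(2, 'Ramen'), 'Marks']

# Partial key: only the first level; the result is indexed by the remaining level
rows_for_3 = ff.loc[3]

print(levels)
print(marks)
print(rows_for_3)
```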

Example 5

The index of a DataFrame can also be set from a Python object such as a list, a range, or a Series. In this approach, we use the pd.Index() and set_index() functions together with a Python list. First, we build a list of labels and pass it to the pd.Index() function, which returns an Index object. That Index object is then set as the DataFrame’s new index using the set_index() method. Let’s implement this code.

import pandas as pd
dd = {'Rollnum': [1, 2, 3],
        'Name': ['Alex', 'Ramen', 'Zayn'],
        'Marks': [33, 66, 88],
        'City': ['New York', 'Los Angeles', 'Chicago']}
ff = pd.DataFrame(dd)
print("\nInitial DataFrame:")
print(ff)
my_list = ['I', 'II', 'III']
idx = pd.Index(my_list)
ff = ff.set_index(idx)
print("\nFinal DataFrame:")
print(ff)

See the output below.

Example 6

The index can also be set from a range using the set_index() and pd.Index() functions. We begin by using the range() function to build a sequence of integers, which we then pass to the pd.Index() function. This function returns an Index object, which is then set as the DataFrame’s new index using the set_index() method.

import pandas as pd
dd = {'Rollnum': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
        'Marks': [33, 66, 88, 67, 78],
        'City': ['New York', 'Los Angeles', 'Chicago', 'San Diego', 'Dallas']}

ff = pd.DataFrame(dd)
print("\nInitial DataFrame:")
print(ff)
indx = pd.Index(range(1, 6, 1))
ff = ff.set_index(indx)
print("\nFinal DataFrame:")
print(ff)

The result is given in the attached screenshot.
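Wrapping the labels in pd.Index() matters in the last two examples. In the pandas versions I am aware of, set_index() treats the items of a bare Python list as column names, so a list of new labels is rejected with a KeyError. The sketch below illustrates this; labels, list_error, and indexed are hypothetical names.

```python
import pandas as pd

ff = pd.DataFrame({'Name': ['Alex', 'Ramen', 'Zayn'],
                   'Marks': [33, 66, 88]})
labels = ['I', 'II', 'III']

# A plain list is interpreted as a list of column names, none of which exist here
try:
    ff.set_index(labels)
    list_error = None
except KeyError as exc:
    list_error = exc

# An Index object is interpreted as the new row labels themselves
indexed = ff.set_index(pd.Index(labels))

print(type(list_error).__name__)
print(indexed)
```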

Example 7

Using the pd.Series() and set_index() functions, we can also set the index of a pandas DataFrame object from a Series. Generating a list and passing it to the pd.Series() function returns a pandas Series that can be used as the DataFrame’s index. The resulting Series is passed to the set_index() method, which sets it as the DataFrame’s new index. Let’s look at the following code to understand how this works.

import pandas as pd
dd = {'Rollnum': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
        'Marks': [33, 66, 88, 67, 78],
        'City': ['New York', 'Los Angeles', 'Chicago', 'San Diego', 'Dallas']}
ff = pd.DataFrame(dd)
print("\nInitial DataFrame:")
print(ff)

ser_indx = pd.Series([5, 4, 3, 2, 1])
ff = ff.set_index(ser_indx)
print("\nFinal DataFrame:")
print(ff)

Here you can see the output.
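A small variation on this example: if the Series has a name, that name is carried over as the name of the new index. The sketch below assumes a Series named 'Rank', which is a made-up label for illustration, as are the variable names ser and ranked.

```python
import pandas as pd

dd = {'Name': ['Alex', 'Ramen', 'Zayn', 'Travis', 'Scott'],
      'Marks': [33, 66, 88, 67, 78]}
ff = pd.DataFrame(dd)

# Hypothetical named Series used as the new index
ser = pd.Series([5, 4, 3, 2, 1], name='Rank')
ranked = ff.set_index(ser)

print(ranked.index.name)
print(ranked.loc[5, 'Name'])
```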

Conclusion

Indexing is the process of selecting values from specific rows and columns in a DataFrame. Using indexing, we can choose all rows and some columns, some rows and all columns, or some of each. This article discussed what the index is, how to set the index while creating a DataFrame, how to set existing DataFrame columns as an index or a MultiIndex, and how to set Python objects such as a range, a list, or a Series as the index.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write about Linux and other technology-related topics.