Pandas Change Index

In this article, we show how to set the index of a Pandas DataFrame using either a list of labels or existing columns. We cover both assigning new row labels and changing existing ones. The tabular data structure in the Pandas package is called a DataFrame, and each of its rows and columns is identified by a label. Row labels are called the index, while column labels are called the column index or header. By default, Pandas assigns a range of integers (beginning at 0) as the row index when a DataFrame is created. We will use the set_index() function to change the row index of a DataFrame, whether that index was created by default or set earlier.

How to Change the Index of a Pandas DataFrame

We can make one of the columns in the DataFrame into the index using the Pandas set_index method. To understand how the set_index() method works, let’s look at its syntax.

Syntax for DataFrame.set_index()

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

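Parameters

  • keys: A column label, a list of column labels, or array-like objects (such as a Series or an Index) to use as the new index.
  • drop: If True (the default), the columns used as the new index are removed from the columns of the DataFrame.
  • append: If True, the new columns are appended to the existing index instead of replacing it.
  • inplace: If True, the DataFrame is modified in place and None is returned. Otherwise, a new DataFrame is returned.
  • verify_integrity: If True, the new index is checked for duplicates and a ValueError is raised if any are found.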
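The parameters listed above can be tried out with a minimal sketch like the following. The DataFrame and its column names ("id", "name", "city") are made up for illustration and do not come from the examples in this article.

```python
import pandas as pd

# Hypothetical data used only to illustrate the parameters
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "city": ["Oslo", "Rome", "Oslo"],
})

# drop=True (default): "id" becomes the index and is removed from the columns
dropped = df.set_index("id")

# drop=False: "id" becomes the index and is also kept as a regular column
kept = df.set_index("id", drop=False)

# append=True: "name" is added as a second index level next to "id"
appended = dropped.set_index("name", append=True)

# verify_integrity=True: "city" contains duplicates, so a ValueError is raised
try:
    df.set_index("city", verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True

# inplace=True: the DataFrame itself is modified and the call returns None
in_place = df.copy()
result = in_place.set_index("id", inplace=True)
```

Without inplace=True, set_index() leaves the original DataFrame untouched, so the result has to be assigned to a variable to be kept.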

Now that we have seen the syntax, the following examples show how to use the set_index() function to set or change the index of a DataFrame.

Example 1: Setting the Index of the DataFrame Using the set_index() Function

We first create a simple sample DataFrame that contains “dummy” student records. The name, age, subject, and fee are the four columns or variables in the DataFrame “df”.

We first import the Pandas module to use the features and functions provided by the library. Then, a dictionary is passed as an argument to the pd.DataFrame() function to create the “df” DataFrame.
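A sketch of this step could look as follows. The article's original values are not available, so the seven student records below are placeholders. Only the column names and the number of rows follow the description.

```python
import pandas as pd

# Placeholder student records; only the column names come from the article
df = pd.DataFrame({
    "name": ["Ali", "Sara", "John", "Maya", "Omar", "Lina", "Zoe"],
    "age": [20, 21, 19, 22, 20, 23, 21],
    "subject": ["Math", "Physics", "Biology", "History", "Chemistry", "English", "Art"],
    "fee": [1000, 1200, 1100, 1500, 1300, 1400, 1250],
})

print(df)        # the default integer labels appear on the left of each row
print(df.index)  # the default index object created by Pandas
```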

Observe that on the left side of the displayed DataFrame, there is a number at the beginning of each row (the numbers from 0 to 6). These numbers are the default index. Now, we use the Pandas set_index() method to set the index of the “df” DataFrame. To accomplish this, we type the name of the DataFrame, followed by a dot and then the method name, which is set_index(). We pass the column name between the parentheses of the set_index() function.

The “fee” column has taken the place of the previous integer index (0 to 6). We passed the “fee” column inside the set_index() function as an argument to set it as the row index of our DataFrame.
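The call described above might be written like this. The data values are again placeholders, and the variable name df_fee is chosen here only for illustration.

```python
import pandas as pd

# Placeholder student records; only the column names come from the article
df = pd.DataFrame({
    "name": ["Ali", "Sara", "John", "Maya", "Omar", "Lina", "Zoe"],
    "age": [20, 21, 19, 22, 20, 23, 21],
    "subject": ["Math", "Physics", "Biology", "History", "Chemistry", "English", "Art"],
    "fee": [1000, 1200, 1100, 1500, 1300, 1400, 1250],
})

# Use the "fee" column as the row index; a new DataFrame is returned
df_fee = df.set_index("fee")
print(df_fee)
```

Because drop defaults to True, “fee” no longer appears among the regular columns of the result.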

Example 2: Setting the Index of the DataFrame Using a List

We can also provide the DataFrame with a list of labels, which can be either strings or numbers. We use the set_index() function to create a new index in the DataFrame from the list object. Let’s create our DataFrame with dummy data after importing the Pandas module.

Our DataFrame is created with three columns: “name”, “age”, and “country”. Now, using a list of labels, a Pandas Index is created which we then pass to the DataFrame.set_index() function as an input.

We passed a list containing the row labels [‘R1’, ‘R2’, ‘R3’, ‘R4’, ‘R5’, ‘R6’] to the pd.Index() function and assigned the result to the “index” variable. The variable is then passed as an argument inside the parentheses of the set_index() function to set the DataFrame’s index.

As seen in the given DataFrame, our specified list replaced the default index of the DataFrame with labels (“R1”, “R2”, “R3”, “R4”, “R5”, “R6”).
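The steps of this example can be sketched as follows. The column names and the row labels follow the text, while the data values are placeholders.

```python
import pandas as pd

# Placeholder data; only the column names come from the article
df = pd.DataFrame({
    "name": ["Ali", "Sara", "John", "Maya", "Omar", "Lina"],
    "age": [25, 31, 28, 22, 35, 29],
    "country": ["Egypt", "Spain", "Canada", "India", "Jordan", "Chile"],
})

# Build a Pandas Index from the list of row labels
index = pd.Index(["R1", "R2", "R3", "R4", "R5", "R6"])

# Replace the default integer index with the new labels
df = df.set_index(index)
print(df)
```

The Index must have the same length as the DataFrame, which is why six labels are supplied for six rows.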

Example 3: Setting the Index of the DataFrame Using Multiple Columns

Pandas DataFrames that use more than one column as the row index are known as multi-index DataFrames. Using the DataFrame.set_index() function, we can set several columns as row labels. Keep in mind that a multi-level index makes the DataFrame more complex to work with. The index can be structured in several ways; here, we show a simple way to set several columns as the index. Let’s first create our DataFrame.

Our DataFrame has four columns – “id”, “name”, “course”, and “code”.

From these columns, we decide which columns are appropriate to use as indexes of our DataFrame. After deciding the suitable columns, we pass a list with two labels inside the set_index() function.

The columns “id” and “code” are set as the row index of the DataFrame. We assigned them by placing the column names in the list [“id”, “code”] and passing that list as the argument of set_index(). Both the “id” and “code” columns now form the new index, as seen in the output.
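A minimal sketch of this example is shown below. The four column names come from the text, and the values are placeholders.

```python
import pandas as pd

# Placeholder data; only the column names come from the article
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Ali", "Sara", "John", "Maya"],
    "course": ["Math", "Physics", "Biology", "History"],
    "code": ["M101", "P201", "B301", "H401"],
})

# Pass a list of column names to build a two-level row index
df_multi = df.set_index(["id", "code"])
print(df_multi)
```

The result has a MultiIndex with the levels “id” and “code”, and only “name” and “course” remain as regular columns.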

Example 4: Setting the Index of the DataFrame Using Pandas Series

When we need to replace the existing integer index with Pandas Series rather than with the DataFrame’s columns, we can pass the Series objects to the DataFrame.set_index() function. Passing more than one Series creates a multi-index DataFrame. To demonstrate how Series can be used as the first-level and second-level indexes, we first create a DataFrame by passing a dictionary to the pd.DataFrame() function.

Now, we create a Series by passing a list of integers to the pd.Series() function and assign it to the variable “n”. We then pass a list containing “n” and “n ** 2” to the set_index() function.

As seen in the given DataFrame, our Series “n” and “n ** 2” are set as the first-level and second-level indexes.
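This example might look like the sketch below. The article does not list the DataFrame's columns or the integers in the Series, so the column names ("name", "marks") and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical DataFrame; the original columns and values are not shown
df = pd.DataFrame({
    "name": ["Ali", "Sara", "John", "Maya"],
    "marks": [82, 91, 77, 68],
})

# A Series of integers, one per row of the DataFrame
n = pd.Series([1, 2, 3, 4])

# Use n as the first index level and n ** 2 as the second level
df_series = df.set_index([n, n ** 2])
print(df_series)
```

Since neither Series is a column of the DataFrame, all original columns are kept and the two index levels are unnamed.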

Example 5: Setting the Index of the DataFrame Using Python Range

Let’s say we want the DataFrame’s index to be a sequence of integers that begins at a number other than 0. For instance, we want the id numbers in an employee DataFrame to start at 1. Passing a plain list of numbers directly to the DataFrame.set_index() function does not work because the list items are interpreted as column labels. The Python range() function can be used in this situation: we use it to create a Pandas Index and then pass that Index to the DataFrame.set_index() function. Let’s create a DataFrame so that we can replace its row index by using the range() function.

We created our DataFrame with the columns “name”, “rank”, “bonus”, and “salary”. Now, let’s set the index using the range() function in place of the default integer index. The range() function returns a sequence of numbers that starts at 0 by default, increases by 1 by default, and ends before a specified number.

We specified the range to start at 1, increase by 1, and end before 6. We then passed the resulting “index” variable to the set_index() function as an input to set the row index of our DataFrame.
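The range-based index can be sketched as follows. The column names come from the text, the five employee records are placeholders, and wrapping the range in pd.Index() is one way to build the “index” variable mentioned above.

```python
import pandas as pd

# Placeholder employee records; only the column names come from the article
df = pd.DataFrame({
    "name": ["Ali", "Sara", "John", "Maya", "Omar"],
    "rank": ["A", "B", "A", "C", "B"],
    "bonus": [500, 300, 450, 200, 350],
    "salary": [5000, 4200, 4800, 3900, 4100],
})

# A plain list of numbers is read as column labels, so this raises KeyError
try:
    df.set_index([1, 2, 3, 4, 5])
    plain_list_failed = False
except KeyError:
    plain_list_failed = True

# Start at 1, step by 1, stop before 6: one label per row
index = pd.Index(range(1, 6, 1))
df_range = df.set_index(index)
print(df_range)
```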

Conclusion

In this tutorial, we discussed the index of a DataFrame and how to set a new index on an existing DataFrame. We have seen that the DataFrame constructor creates an integer index for the rows by default, but that it can be changed by using the set_index() function. We looked at the syntax of the set_index() function and worked through multiple examples that show how to set the row index of a DataFrame using columns, lists, Series, and ranges in Pandas.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.