Return the Index of the First and Last Valid Not-Null Values in Pandas

In machine learning, handling missing values is an important data-cleaning step, because missing values make the data inconsistent and can reduce prediction accuracy. In some cases, we need to locate the first or the last value in a Series/DataFrame that is not null. In this guide, we will discuss how to return the index of the first valid (not-null) value and the index of the last valid value using the first_valid_index() and last_valid_index() functions of Series and DataFrame.

Pandas First_Valid_Index

The pandas.Series.first_valid_index() function returns the index of the first non-NA value. If every value in the Series is null or if the Series is empty, “None” is returned.

Syntax:

Let’s see the syntax of this function. It doesn’t take any parameters:

pandas.Series.first_valid_index()
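As a small sketch of the behavior described above (the “scores” Series and its string labels are made up for illustration), note that the function returns an index label rather than an integer position, and that an empty Series gives “None”:

```python
import pandas

# Hypothetical Series with string labels; the first and third entries are missing
scores = pandas.Series([None, 7.5, None, 9.0], index=["a", "b", "c", "d"])

# The result is the label of the first non-NA entry, not its position
first_label = scores.first_valid_index()
print(first_label)

# An empty Series has no valid entry, so None comes back
empty_result = pandas.Series([], dtype="float64").first_valid_index()
print(empty_result)
```

With the default integer index used in the examples of this guide, the label and the position happen to coincide.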

The pandas.DataFrame.first_valid_index() function returns the index of the first row that contains at least one non-NA value. If every value in the DataFrame is null or if the DataFrame is empty, “None” is returned.

Syntax:

Let’s see the syntax of this function. It doesn’t take any parameters:

pandas.DataFrame.first_valid_index()
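For a DataFrame, a row counts as valid as soon as any one of its cells is not null. The following minimal sketch illustrates this with a made-up “frame” whose first row is entirely missing (the column names and row labels are assumptions chosen for the example):

```python
import pandas

# Hypothetical DataFrame: row "r0" is all missing, row "r1" has one valid cell
frame = pandas.DataFrame(
    {"a": [None, None, 3.0], "b": [None, "x", None]},
    index=["r0", "r1", "r2"],
)

# "r0" is skipped because every cell in it is missing;
# "r1" qualifies because column "b" holds a value there
first_label = frame.first_valid_index()
print(first_label)
```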

Example 1: Series with Some Missing Values

Create the “departments” Series with 10 entries, some of which are missing. Use the pandas.Series.first_valid_index() function to get the index of the first not-null value in this Series.

import pandas

# Create Series with 10 strings that include Missing values
departments=pandas.Series([None,None,'Computer','Mechanical','Auto-Mobile',None,'Civil','Electronics','Bio-tech',None])
print(departments,'\n')

# Series.first_valid_index()
print(departments.first_valid_index(),'\n')

Output:

“Computer” is the first not-null value that is present in the Series. Its index is 2.
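Since the function returns an index and not the value itself, the value can be looked up afterwards. The following is one possible sketch that uses the same “departments” Series; the “label” and “first_value” names are only illustrative, and the check for “None” covers the case where no valid entry exists:

```python
import pandas

departments = pandas.Series([None, None, 'Computer', 'Mechanical', 'Auto-Mobile',
                             None, 'Civil', 'Electronics', 'Bio-tech', None])

# Index label of the first not-null entry (None if there is no such entry)
label = departments.first_valid_index()

# Look up the value by label only when a valid label was found
first_value = departments.loc[label] if label is not None else None
print(label, first_value)
```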

Example 2: Series with All Missing Values

Create the “departments” Series with five missing values. Use the pandas.Series.first_valid_index() function and try to return the index of the first not-null value.

import pandas

# Create Series with 5 entries that are all Missing values
departments=pandas.Series([None,None,None,None,None])
print(departments,'\n')

# Series.first_valid_index()
print(departments.first_valid_index(),'\n')

Output:

All values in the previous “departments” Series are missing. So, the result is “None”.

Example 3: DataFrame with Some Missing Values

Create the “departments” DataFrame with five rows and two columns. Return the first valid index of the DataFrame using the pandas.DataFrame.first_valid_index() function.

import pandas

# Create DataFrame with some missing values
departments=pandas.DataFrame([[120,None],[None,'Chemical'],[100,None],[100,'Computers'],[None,'Biotech']],
                      columns=['Department_id','Department_name'])
print(departments,"\n")

# DataFrame.first_valid_index()
print(departments.first_valid_index())

Output:

The Department_id (120.0) in the first row is not-null. So, the index of that row (0) is returned.

Example 4: DataFrame with All Missing Values

Create the “departments” DataFrame with two columns with all missing values. Try to return the first valid index of the DataFrame using the pandas.DataFrame.first_valid_index() function.

import pandas

# Create DataFrame with all missing values
departments=pandas.DataFrame([[None,None],[None,None]],
                      columns=['Department_id','Department_name'])
print(departments,"\n")

# DataFrame.first_valid_index()
print(departments.first_valid_index(),'\n')

Output:

The result is “None” since every value in the DataFrame is missing.

Pandas Last_Valid_Index

The pandas.Series.last_valid_index() function returns the index of the last non-NA value. If every value in the Series is null or if the Series is empty, “None” is returned.

Syntax: 

Let’s see the syntax of this function. It doesn’t take any parameters:

pandas.Series.last_valid_index()

The pandas.DataFrame.last_valid_index() function returns the index of the last row that contains at least one non-NA value. If every value in the DataFrame is null or if the DataFrame is empty, “None” is returned.

Syntax: 

Let’s see the syntax of this function. It doesn’t take any parameters:

pandas.DataFrame.last_valid_index()
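One common use of the two functions together is to trim the leading and trailing rows that contain no data. The following is a rough sketch under assumed data (the “readings” DataFrame, its columns, and its dates are invented for illustration). Label-based slicing with loc includes both end points:

```python
import pandas

# Hypothetical daily readings; the first and last days have no data at all
readings = pandas.DataFrame(
    {"temp": [None, 21.5, None, 22.0, None],
     "humidity": [None, None, 40.0, None, None]},
    index=pandas.date_range("2024-01-01", periods=5, freq="D"),
)

start = readings.first_valid_index()
end = readings.last_valid_index()

# Keep only the span between the first and last valid rows;
# fall back to an empty frame if nothing is valid
trimmed = readings.loc[start:end] if start is not None else readings.iloc[0:0]
print(trimmed)
```

Rows inside the span that are only partly filled are kept. Only the fully missing rows at the edges are dropped.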

Example 1: Series with Some Missing Values

Create the “departments” Series with 10 entries, some of which are missing. Return the index of the last not-null value using the pandas.Series.last_valid_index() function.

import pandas

# Create Series with 10 strings that include Missing values
departments=pandas.Series([None,None,'Computer','Mechanical','Auto-Mobile',None,'Civil','Electronics','Bio-tech',None])
print(departments,'\n')

# Series.last_valid_index()
print(departments.last_valid_index(),'\n')

Output:

“Bio-tech” is the last not-null value that is present in the Series. Its index is 8.

Example 2: Series with All Missing Values

Create the “departments” Series with five missing values. Use the pandas.Series.last_valid_index() function and try to return the index of the last not-null value.

import pandas

# Create Series with 5 entries that are all Missing values
departments=pandas.Series([None,None,None,None,None])
print(departments,'\n')

# Series.last_valid_index()
print(departments.last_valid_index(),'\n')

Output:

All values in the previous “departments” Series are missing. So, the result is “None”.

Example 3: DataFrame with Some Missing Values

Create the “departments” DataFrame with five rows and two columns. Return the last valid index of the DataFrame using the pandas.DataFrame.last_valid_index() function.

import pandas

# Create DataFrame with some missing values
departments=pandas.DataFrame([[120,None],[None,'Chemical'],[100,None],[100,'Computers'],[None,'Biotech']],
                      columns=['Department_id','Department_name'])
print(departments,"\n")

# DataFrame.last_valid_index()
print(departments.last_valid_index())

Output:

The Department_name (“Biotech”) in the last row is not-null. So, the index of that row (4) is returned.

Example 4: DataFrame with All Missing Values

Create the “departments” DataFrame with two columns with all missing values. Try to return the last valid index of the DataFrame using the pandas.DataFrame.last_valid_index() function.

import pandas

# Create DataFrame with all missing values
departments=pandas.DataFrame([[None,None],[None,None]],
                      columns=['Department_id','Department_name'])
print(departments,"\n")

# DataFrame.last_valid_index()
print(departments.last_valid_index())

Output:

The result is “None” since every value in the DataFrame is missing.

Bonus Example:

Let’s return the first and last valid indices in specific columns of the DataFrame.

import pandas
departments=pandas.DataFrame([[120,None],[None,'Chemical'],[100,None],[100,'Computers'],[567,None]],
                      columns=['Department_id','Department_name'])
print(departments,"\n")

print(departments['Department_name'].last_valid_index())
print(departments['Department_name'].first_valid_index())
print(departments['Department_id'].last_valid_index())
print(departments['Department_id'].first_valid_index())

Output:

  1. The last not-null value in the Department_name column is “Computers”. So, its index, which is 3, is returned.
  2. The first not-null value in the Department_name column is “Chemical”. So, its index, which is 1, is returned.
  3. The last not-null value in the Department_id column is 567.0. So, its index, which is 4, is returned.
  4. The first not-null value in the Department_id column is 120.0. So, its index, which is 0, is returned.
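If the same lookup is needed for every column, calling the functions column by column can become repetitive. One possible alternative, shown here only as a sketch, is to pass each column to the function through DataFrame.apply; the “first_by_column” and “last_by_column” names are illustrative:

```python
import pandas

departments = pandas.DataFrame(
    [[120, None], [None, 'Chemical'], [100, None], [100, 'Computers'], [567, None]],
    columns=['Department_id', 'Department_name'])

# apply() hands each column to the lambda as a Series,
# so the result holds one index per column name
first_by_column = departments.apply(lambda column: column.first_valid_index())
last_by_column = departments.apply(lambda column: column.last_valid_index())

print(first_by_column, "\n")
print(last_by_column)
```

The results agree with the four values listed above: 0 and 4 for Department_id, 1 and 3 for Department_name.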

Conclusion

We learned how to find the index of the first and last non-null elements in a Series and a DataFrame using the first_valid_index() and last_valid_index() functions, with separate examples for each. We also covered the case where a Series or DataFrame holds only missing values, in which “None” is returned. As a bonus, we provided one example that returns the first and last valid indices from a specific column in the DataFrame.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain