Pandas Remove Rows with Condition

This article will discuss how to use the Pandas drop() function to delete rows that match a specific condition.

Sample DataFrame

In this tutorial, we will use a sample DataFrame built from the following data, stored in a file named movies.csv:

,title,release_year,imdb_rating
0,Iron Man,2008,7.9
1,The Incredible Hulk,2008,6.6
2,Iron Man 2,2010,6.9
3,Thor,2011,7.0
4,Captain America: The First Avenger,2011,6.9
5,The Avengers,2012,8.0
6,Iron Man 3,2013,7.1
7,Thor: The Dark World,2013,6.8
8,Captain America: The Winter Soldier,2014,7.8
9,Guardians of the Galaxy,2014,8.0
10,Avengers: Age of Ultron,2015,7.3
11,Ant-Man,2015,7.3
12,Captain America: Civil War,2016,7.8
13,Doctor Strange,2016,7.5
14,Guardians of the Galaxy: Volume 2,2017,7.6
15,Spider-Man: Homecoming,2017,7.4
16,Thor: Ragnarok,2017,7.9
17,Black Panther,2018,7.3
18,Avengers: Infinity War,2018,8.4
19,Ant-Man and the Wasp,2018,7.0
20,Captain Marvel,2019,6.8
21,Avengers: Endgame,2019,8.4
22,Spider-Man: Far From Home,2019,7.4
23,Black Widow,2021,6.7
24,Shang-Chi,2021,7.4
25,Spider-Man: No Way Home,2021,8.4
26,Doctor Strange: In the Multiverse of Madness,2022,7.5

Save the data above as movies.csv, then load it as shown below:

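import pandas as pd

# Use the first (unnamed) column of the CSV as the row index
df = pd.read_csv('movies.csv', index_col=0)

# In a notebook, evaluating df on its own line displays the DataFrame
df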
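If you do not have movies.csv on disk, a minimal sketch like the following builds an equivalent DataFrame from an in-memory string with io.StringIO. For brevity it assumes only the first six rows of the sample data; paste in the full CSV text to reproduce the rest of the tutorial.

```python
import io

import pandas as pd

# First six rows of the sample data, in the same CSV layout:
# the unnamed first column holds the row labels.
csv_text = """,title,release_year,imdb_rating
0,Iron Man,2008,7.9
1,The Incredible Hulk,2008,6.6
2,Iron Man 2,2010,6.9
3,Thor,2011,7.0
4,Captain America: The First Avenger,2011,6.9
5,The Avengers,2012,8.0
"""

# index_col=0 turns the first (unnamed) column into the DataFrame index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df)
```

Reading from a string keeps the example self-contained; pd.read_csv accepts any file-like object, so the call is otherwise identical to reading movies.csv.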

Delete Rows Based on a Column Condition

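To delete rows based on a single condition in a specified column, we can use the drop() function. Because drop() removes rows by their index labels, we first select the rows that match the condition and then pass their .index to drop(). For example, to delete all rows where release_year is below 2012, we can do: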

df = df.drop(df[df['release_year'] < 2012].index, inplace=False)
df

In this example, drop() deletes every row whose value in the ‘release_year’ column is less than 2012. Setting ‘inplace’ to False (the default) makes drop() return a new DataFrame instead of modifying the existing one; assigning that result back to df replaces the original with the filtered version.

The resulting DataFrame keeps only rows 5 through 26, all of which have a release_year of 2012 or later.
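To see what each piece of the single-condition call does, the sketch below breaks it into steps on a small hand-built DataFrame (a hypothetical subset of the sample data, named movies here to keep it separate from df). It also contrasts the default inplace=False, which returns a new DataFrame and leaves the original untouched, with inplace=True, which modifies the DataFrame directly and returns None.

```python
import pandas as pd

# Hypothetical subset of the sample data, built inline.
movies = pd.DataFrame(
    {
        "title": ["Iron Man", "Thor", "The Avengers", "Iron Man 3"],
        "release_year": [2008, 2011, 2012, 2013],
        "imdb_rating": [7.9, 7.0, 8.0, 7.1],
    }
)

# Step 1: a boolean Series that is True where the condition holds.
mask = movies["release_year"] < 2012

# Step 2: the index labels of the matching rows; drop() works on labels.
labels = movies[mask].index

# Step 3a: inplace=False (the default) returns a new DataFrame.
kept = movies.drop(labels)
print(kept)
print(len(movies))  # the original still has all 4 rows

# Step 3b: inplace=True modifies movies itself and returns None.
result = movies.drop(labels, inplace=True)
print(result)  # None
print(movies)
```

Forgetting to either assign the return value or pass inplace=True is a common reason a drop() call appears to have no effect.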

Delete Rows Based on Multiple Conditions

We can also combine several conditions when removing rows. For example, to remove the rows where the release year is later than 2018 and the rating is 7.3 or higher, we can do:

# Assign the result back to df; with inplace=False, drop() leaves df unchanged
df = df.drop(df[(df['release_year'] > 2018) & (df['imdb_rating'] >= 7.3)].index, inplace=False)
df

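The ampersand (&) operator combines the two conditions element-wise, so a row is dropped only when both are true. Each condition must be wrapped in its own parentheses, because & binds more tightly than comparison operators such as > and >=.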
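The same pattern extends to other boolean operators: | (OR) drops a row if either condition holds, and ~ negates a mask. Because drop() removes the matching rows, keeping the rows of the negated mask with boolean indexing should give the same result. The sketch below checks this on a small hypothetical subset of the sample data.

```python
import pandas as pd

# Hypothetical subset of the sample data, built inline.
movies = pd.DataFrame(
    {
        "title": ["Captain Marvel", "Avengers: Endgame", "Black Widow", "Shang-Chi"],
        "release_year": [2019, 2019, 2021, 2021],
        "imdb_rating": [6.8, 8.4, 6.7, 7.4],
    }
)

# OR: drop rows released after 2020 OR rated below 7.0.
# As with &, each comparison gets its own parentheses.
or_mask = (movies["release_year"] > 2020) | (movies["imdb_rating"] < 7.0)
dropped = movies.drop(movies[or_mask].index)

# Equivalent: keep only the rows where the mask is False.
filtered = movies[~or_mask]

print(dropped)
print(dropped.equals(filtered))  # True
```

Which form to use is mostly a matter of taste: drop() reads as "remove these rows", while boolean indexing with ~ reads as "keep the rest".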

Closing

This article demonstrated how to use the Pandas drop() function to remove rows that match one or more conditions in a DataFrame.

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.