The popular Python library “Pandas” is used to create and manipulate DataFrames, which are tabular structures that store data in rows and columns. When working with DataFrames, we often need to select only the rows that do not contain specific values in one or more columns. Pandas has no dedicated “NOT IN” operator, so this is done by combining the “df.isin()” method with the “~” (NOT) operator.
This guide covers the following content:
- How to Use Pandas “NOT IN” in Python?
- Using Pandas “NOT IN” Filter to Filter Rows of Single Column
- Using Pandas “NOT IN” Filter to Filter Rows of Multiple Columns
- Using NumPy With “NOT IN” Filter
How to Use Pandas “NOT IN” in Python?
Pandas does not have a “NOT IN” operator. Instead, the “df.isin()” method checks whether each value in a column is contained in a given list and returns a Boolean Series that is True where the value is found. The “~” operator then inverts this Series, so indexing the DataFrame with it keeps only the rows whose values are NOT in the list.
Syntax
df[~df['col_name'].isin(values_list)]
In this syntax, “col_name” is the name of the column to check, and “values_list” is the list of values whose rows should be excluded.
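To see exactly what the “~” operator inverts, the minimal sketch below builds a small Series (the names are illustrative sample data) and prints the Boolean mask returned by “isin()” next to its inverted form:

```python
import pandas

# Illustrative Series; the values are sample data for this sketch
names = pandas.Series(["Jason", "Joseph", "Lily"])

# isin() returns a Boolean Series: True where the value is in the list
mask = names.isin(["Jason", "Lily"])

# ~ flips every Boolean, marking the values that are NOT in the list
not_in_mask = ~mask

print(mask.tolist())                # [True, False, True]
print(not_in_mask.tolist())         # [False, True, False]
print(names[not_in_mask].tolist())  # ['Joseph']
```

Python's “not” keyword cannot be used in place of “~” here, because it tries to treat the whole Series as a single truth value and raises a ValueError.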
Example 1: Using Pandas “NOT IN” Filter to Filter Rows of Single Column
This example filters rows based on the values of a single DataFrame column using the “~” operator and the “df.isin()” method:
import pandas
# Sample DataFrame with employee data
df = pandas.DataFrame({'Name':["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
'Age' :[22, 25, 23, 24, 26],
'Salary':[2000, 3000, 4000, 5000, 10000],
'Team':['IT', 'QA', 'Technical', 'IT', 'Video Editing']})
print(df, '\n')
# Keep only the rows whose Name is NOT "Jason" or "Lily"
df2 = df[~df['Name'].isin(["Jason", "Lily"])]
print(df2)
In the above code:
- We imported the “Pandas” module and created a DataFrame with multiple columns.
- Next, the “~” operator is used together with the “df.isin()” method to retrieve all rows except those whose “Name” value is “Jason” or “Lily”.
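As an alternative sketch, not part of the original example, “DataFrame.query()” also understands a “not in” expression inside its query string. The code below rebuilds the same DataFrame and checks that both approaches keep the same rows:

```python
import pandas

df = pandas.DataFrame({'Name': ["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
                       'Age': [22, 25, 23, 24, 26],
                       'Salary': [2000, 3000, 4000, 5000, 10000],
                       'Team': ['IT', 'QA', 'Technical', 'IT', 'Video Editing']})

# Approach used in this guide: invert the isin() mask with ~
by_isin = df[~df['Name'].isin(["Jason", "Lily"])]

# Alternative: express the same filter as a query string
by_query = df.query('Name not in ["Jason", "Lily"]')

print(by_isin['Name'].tolist())   # ['Joseph', 'Anna', 'Scarlet']
print(by_query['Name'].tolist())  # ['Joseph', 'Anna', 'Scarlet']
```

The “~df[...].isin(...)” form works anywhere a Boolean mask is accepted, while “query()” can read more naturally when the filter is short.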
Output
The output shows that the rows for “Jason” and “Lily” have been removed, leaving only the rows for “Joseph”, “Anna”, and “Scarlet”.
Example 2: Using Pandas “NOT IN” Filter to Filter Rows of Multiple Columns
The code below filters Pandas DataFrame rows based on the values of multiple columns:
import pandas
# Sample DataFrame with two team columns that share some values
df = pandas.DataFrame({'Name':["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
'Age' :[22, 25, 23, 24, 26],
'Salary':[2000, 3000, 4000, 5000, 10000],
'Team-1':['IT', 'QA', 'Technical', 'IT', 'Video Editing'],
'Team-2':['Author', 'IT', 'QA', 'IT', 'Graphics']})
print(df, '\n')
# Drop every row where Team-1 or Team-2 is "QA" or "Graphics"
df1 = df[~df[['Team-1', 'Team-2']].isin(['QA', 'Graphics']).any(axis=1)]
print(df1)
Here:
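- We imported the Pandas module and created a DataFrame with multiple columns, where the “Team-1” and “Team-2” columns share some values.
- Next, “df.isin()” is applied to the “Team-1” and “Team-2” columns, producing a Boolean DataFrame that is True wherever a cell equals “QA” or “Graphics”. The “.any(axis=1)” call marks each row in which at least one of these columns matches, and the “~” operator inverts that result, so every row containing either value in either column is eliminated.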
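The one-line filter above can be split into its three intermediate steps. This sketch keeps only the “Name” and team columns from the example, since the other columns do not affect the result:

```python
import pandas

df = pandas.DataFrame({'Name': ["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
                       'Team-1': ['IT', 'QA', 'Technical', 'IT', 'Video Editing'],
                       'Team-2': ['Author', 'IT', 'QA', 'IT', 'Graphics']})

# Step 1: cell-by-cell membership test on the two team columns
cell_matches = df[['Team-1', 'Team-2']].isin(['QA', 'Graphics'])

# Step 2: a row matches if ANY of its team columns matches
row_matches = cell_matches.any(axis=1)

# Step 3: invert to keep rows where NEITHER team column matches
kept = df[~row_matches]

print(row_matches.tolist())   # [False, True, True, False, True]
print(kept['Name'].tolist())  # ['Jason', 'Anna']
```

Swapping “.any(axis=1)” for “.all(axis=1)” would instead drop only the rows in which both team columns match.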
Output
Only the rows for “Jason” and “Anna” remain, because every other row contains “QA” or “Graphics” in at least one of the two team columns.
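If each column should exclude a different set of values, “DataFrame.isin()” also accepts a dictionary that maps column names to lists of values. The minimal sketch below uses hypothetical exclusion rules (drop “IT” from “Team-1” and “Graphics” from “Team-2”) chosen only for illustration:

```python
import pandas

df = pandas.DataFrame({'Name': ["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
                       'Team-1': ['IT', 'QA', 'Technical', 'IT', 'Video Editing'],
                       'Team-2': ['Author', 'IT', 'QA', 'IT', 'Graphics']})

# Hypothetical per-column exclusion rules, for illustration only
exclude = {'Team-1': ['IT'], 'Team-2': ['Graphics']}

# Each column is checked only against its own list of values
matches = df[['Team-1', 'Team-2']].isin(exclude)

# Drop any row where at least one column hits its exclusion list
kept = df[~matches.any(axis=1)]

print(kept['Name'].tolist())  # ['Joseph', 'Lily']
```

Note that “Joseph” is kept even though his “Team-2” value is “IT”, because “IT” is excluded only from “Team-1” in this sketch.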
Example 3: Using NumPy With “NOT IN” Filter
This code uses the “numpy.isin()” function with the “~” operator to filter the DataFrame rows:
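import numpy
import pandas
# Sample DataFrame with employee data
df = pandas.DataFrame({'Name':["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
'Age' :[22, 25, 23, 24, 26],
'Salary':[2000, 3000, 4000, 5000, 10000],
'Team':['IT', 'QA', 'Technical', 'IT', 'Video Editing']})
print(df, '\n')
# numpy.isin() returns a Boolean array; ~ inverts it to keep non-matching rows
df1 = df[~numpy.isin(df['Name'], ["Jason", "Lily"])]
print(df1)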
According to the above code:
- The “~” operator is used along with the “numpy.isin()” function to filter rows.
- First, “numpy.isin()” checks each value of the “Name” column against the list and returns a Boolean NumPy array that is True where the value is found.
- The “~” operator then inverts this array, so indexing the DataFrame with it returns every row except those whose value was found.
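“numpy.isin()” also has an “invert” parameter that performs the negation itself, so the “~” is not strictly needed. This sketch rebuilds the example DataFrame (only the relevant columns) and compares both forms:

```python
import numpy
import pandas

df = pandas.DataFrame({'Name': ["Jason", "Joseph", "Lily", "Anna", "Scarlet"],
                       'Team': ['IT', 'QA', 'Technical', 'IT', 'Video Editing']})

excluded = ["Jason", "Lily"]

# Form used above: invert the Boolean array with ~
with_tilde = df[~numpy.isin(df['Name'], excluded)]

# Same filter using numpy.isin's invert parameter instead of ~
with_invert = df[numpy.isin(df['Name'], excluded, invert=True)]

print(with_tilde['Name'].tolist())   # ['Joseph', 'Anna', 'Scarlet']
print(with_invert['Name'].tolist())  # ['Joseph', 'Anna', 'Scarlet']
```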
Output
The result is the same as in Example 1: the rows for “Jason” and “Lily” are excluded.
Conclusion
Because Pandas does not provide a separate “NOT IN” operator, the “~” operator is combined with the “DataFrame.isin()” method to filter out rows based on the values of one or more DataFrame columns. The “numpy.isin()” function can also be combined with the “~” operator to achieve the same result. This tutorial demonstrated the Pandas “NOT IN” filter using several examples.