Using “filter()”, we can subset a DataFrame based on its index labels. The result contains only the rows or columns of the original DataFrame whose labels match the ones that we specify. Note that filter() looks only at the labels of the index, not at the contents of the DataFrame.
There are different methods to filter the rows of a DataFrame based on their index, but in this tutorial, our main focus is the filter() function. Let’s check its syntax first so we can use it to filter the data. The method returns an object of the same type as its input.
Syntax:
DataFrame.filter(items=None, like=None, regex=None, axis=None)
Parameters:
- items: A list of the axis labels that you want to keep.
- like: A string. Keeps the labels from the info axis for which “like in label == True”, that is, the labels that contain the given substring.
- regex: A regular expression string. Keeps the labels from the info axis for which re.search(regex, label) == True.
- axis: The axis to filter on, given as {‘index’ or 0, ‘columns’ or 1, None}. By default, this is the info axis: “index” for a Series and “columns” for a DataFrame.
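The parameters above can be tried out on a small DataFrame. The following is a minimal sketch; the sample data and the variable names (df, by_items, by_like, by_regex, rows) are made up for illustration and are not part of the pandas API.

```python
import pandas

# Hypothetical sample data, used only to illustrate the parameters.
df = pandas.DataFrame(
    {"from": ["city 1", "city 3"], "to": ["city 2", "city 4"], "distance": [200, 466]},
    index=["passenger 1", "passenger 2"],
)

# items: exact labels. With no axis given, a DataFrame is filtered on its columns.
by_items = df.filter(items=["from", "distance"])

# like: keep the labels that contain the substring.
by_like = df.filter(like="dist")

# regex: keep the labels for which re.search(regex, label) finds a match.
by_regex = df.filter(regex="^(from|to)$")

# axis=0 applies the same matching to the row labels instead.
rows = df.filter(like="2", axis=0)

print(list(by_items.columns))  # ['from', 'distance']
print(list(by_like.columns))   # ['distance']
print(list(by_regex.columns))  # ['from', 'to']
print(list(rows.index))        # ['passenger 2']
```

Only one of items, like, and regex can be passed in a single call. Passing more than one raises a TypeError.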
Now that we have seen the syntax, the following examples demonstrate the filter() function:
Example 1: Filter by Numeric Index
Create a DataFrame with 2 columns and 5 records, and return only particular rows based on their index labels.
import pandas
# Build a DataFrame with the default integer index (0 to 4)
hobbies = pandas.DataFrame({
    'stud_name': ['stud 1', 'stud 2', 'stud 3', 'stud 4', 'stud 5'],
    'hobbies': ['music', 'singing', 'dance', 'play', 'drink'],
})
print(hobbies)
print()
# Get only the first row (index label 0)
print(hobbies.filter(items=[0], axis=0))
print()
# Get only the fifth row (index label 4)
print(hobbies.filter(items=[4], axis=0))
Output:
stud_name hobbies
0 stud 1 music
1 stud 2 singing
2 stud 3 dance
3 stud 4 play
4 stud 5 drink
stud_name hobbies
0 stud 1 music
stud_name hobbies
4 stud 5 drink
Explanation:
- In the first output, we returned the first row using the index label 0.
- In the second output, we returned the fifth row using the index label 4.
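Keep in mind that filter() matches index labels, not row positions. With the default index, the two happen to coincide. The sketch below shows what happens when a requested label does not exist; the labels 10 and 11 and the variable names partial and empty are chosen only for illustration.

```python
import pandas

hobbies = pandas.DataFrame({
    "stud_name": ["stud 1", "stud 2", "stud 3", "stud 4", "stud 5"],
    "hobbies": ["music", "singing", "dance", "play", "drink"],
})

# Label 10 is not in the index, so only the row labelled 0 comes back.
partial = hobbies.filter(items=[0, 10], axis=0)
print(list(partial.index))  # [0]

# None of the requested labels exist: the result is an empty DataFrame,
# and no error is raised.
empty = hobbies.filter(items=[10, 11], axis=0)
print(empty.empty)  # True
```

Because missing labels are dropped silently, it can be worth checking the shape of the result when the list of labels comes from user input.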
Example 2: Filter by Multiple Numeric Indices
Create a DataFrame with 2 columns and 5 records, and return several rows at once based on their index labels.
import pandas
hobbies = pandas.DataFrame({
    'stud_name': ['stud 1', 'stud 2', 'stud 3', 'stud 4', 'stud 5'],
    'hobbies': ['music', 'singing', 'dance', 'play', 'drink'],
})
# Get the first two rows
print(hobbies.filter(items=[0, 1], axis=0))
print()
# Get only the second, third, and fifth rows
print(hobbies.filter(items=[1, 2, 4], axis=0))
Output:
stud_name hobbies
0 stud 1 music
1 stud 2 singing
stud_name hobbies
1 stud 2 singing
2 stud 3 dance
4 stud 5 drink
Explanation:
- In the first output, we returned the first and second rows at once using the index labels 0 and 1.
- In the second output, we returned the second, third, and fifth rows using the index labels 1, 2, and 4.
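As mentioned earlier, filter() is not the only way to select rows by index. For comparison, here is a short sketch that selects the same three rows with .loc; the variable names via_filter and via_loc are only for illustration.

```python
import pandas

hobbies = pandas.DataFrame({
    "stud_name": ["stud 1", "stud 2", "stud 3", "stud 4", "stud 5"],
    "hobbies": ["music", "singing", "dance", "play", "drink"],
})

# Select the rows labelled 1, 2, and 4 in two different ways.
via_filter = hobbies.filter(items=[1, 2, 4], axis=0)
via_loc = hobbies.loc[[1, 2, 4]]

# Both approaches return the same rows.
print(list(via_filter.index))  # [1, 2, 4]
print(list(via_loc.index))     # [1, 2, 4]
```

One practical difference is that .loc raises a KeyError when a label in the list is missing, while filter() simply leaves that label out of the result.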
Example 3: Filter by Non-Numeric Index
Create a DataFrame with 3 columns and 4 records, and return particular rows separately based on their index labels. Here, the index labels are strings.
import pandas
# Build a DataFrame with a custom string index
journey = pandas.DataFrame({
    'from': ['city 1', 'city 1', 'city 3', 'city 4'],
    'to': ['ap', 'usa', 'city 2', 'city 1'],
    'distance': [200, 500, 466, 100],
}, index=['passenger 1', 'passenger 2', 'passenger 3', 'passenger 4'])
print(journey)
print()
# Get the row whose index label is 'passenger 3'
print(journey.filter(items=['passenger 3'], axis=0))
print()
# Get the row whose index label is 'passenger 1'
print(journey.filter(items=['passenger 1'], axis=0))
Output:
from to distance
passenger 1 city 1 ap 200
passenger 2 city 1 usa 500
passenger 3 city 3 city 2 466
passenger 4 city 4 city 1 100
from to distance
passenger 3 city 3 city 2 466
from to distance
passenger 1 city 1 ap 200
Explanation:
- In the first output, we returned the third row using the index label “passenger 3”.
- In the second output, we returned the first row using the index label “passenger 1”.
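The same call also works on a Series. In that case, the axis defaults to the index and the result is again a Series, in line with the note that filter() returns the same type as its input. A minimal sketch, with the variable names distance and picked chosen only for illustration:

```python
import pandas

journey = pandas.DataFrame({
    "from": ["city 1", "city 1", "city 3", "city 4"],
    "to": ["ap", "usa", "city 2", "city 1"],
    "distance": [200, 500, 466, 100],
}, index=["passenger 1", "passenger 2", "passenger 3", "passenger 4"])

# Selecting a single column gives a Series that shares the string index.
distance = journey["distance"]

# No axis argument is needed: a Series is filtered on its index by default.
picked = distance.filter(items=["passenger 3"])

print(type(picked).__name__)   # Series
print(picked["passenger 3"])   # 466
```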
Example 4: Filter by Multiple Non-Numeric Indices
Return the last three rows at once based on their index labels.
import pandas
journey = pandas.DataFrame({
    'from': ['city 1', 'city 1', 'city 3', 'city 4'],
    'to': ['ap', 'usa', 'city 2', 'city 1'],
    'distance': [200, 500, 466, 100],
}, index=['passenger 1', 'passenger 2', 'passenger 3', 'passenger 4'])
# Get the rows whose index labels are 'passenger 2', 'passenger 3', and 'passenger 4'
print(journey.filter(items=['passenger 2', 'passenger 3', 'passenger 4'], axis=0))
Output:
from to distance
passenger 2 city 1 usa 500
passenger 3 city 3 city 2 466
passenger 4 city 4 city 1 100
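Since filter() returns a DataFrame, a row filter and a column filter can be chained when both need to be narrowed down. The following sketch keeps the same three rows but only the “distance” column; the variable name subset is only for illustration.

```python
import pandas

journey = pandas.DataFrame({
    "from": ["city 1", "city 1", "city 3", "city 4"],
    "to": ["ap", "usa", "city 2", "city 1"],
    "distance": [200, 500, 466, 100],
}, index=["passenger 1", "passenger 2", "passenger 3", "passenger 4"])

# First filter the rows (axis=0), then filter the columns (the default axis).
subset = (
    journey
    .filter(items=["passenger 2", "passenger 3", "passenger 4"], axis=0)
    .filter(items=["distance"])
)

print(subset.shape)                 # (3, 1)
print(subset["distance"].tolist())  # [500, 466, 100]
```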
Example 5: Filter Using the Like Parameter
Let’s use the “like” parameter to return the rows whose index labels contain “passenger” and “r 1”, separately.
import pandas
journey = pandas.DataFrame({
    'from': ['city 1', 'city 1', 'city 3', 'city 4'],
    'to': ['ap', 'usa', 'city 2', 'city 1'],
    'distance': [200, 500, 466, 100],
}, index=['passenger 1', 'passenger 2', 'passenger 3', 'passenger 4'])
# Get the rows whose index label contains 'passenger'
print(journey.filter(like='passenger', axis=0))
print()
# Get the rows whose index label contains 'r 1'
print(journey.filter(like='r 1', axis=0))
Output:
from to distance
passenger 1 city 1 ap 200
passenger 2 city 1 usa 500
passenger 3 city 3 city 2 466
passenger 4 city 4 city 1 100
from to distance
passenger 1 city 1 ap 200
Explanation:
- All index labels contain “passenger”. So, all rows are returned in the first output.
- Only one index label contains “r 1”. So, only the row with the index label “passenger 1” is returned in the second output.
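The “like” match is a plain substring test on each label. To make that concrete, the sketch below reproduces the second result with Python’s own “in” operator; the variable names matched and via_like are only for illustration.

```python
import pandas

journey = pandas.DataFrame({
    "from": ["city 1", "city 1", "city 3", "city 4"],
    "to": ["ap", "usa", "city 2", "city 1"],
    "distance": [200, 500, 466, 100],
}, index=["passenger 1", "passenger 2", "passenger 3", "passenger 4"])

# A manual substring check over the index labels.
matched = [label for label in journey.index if "r 1" in label]

# The same selection expressed with the like parameter.
via_like = journey.filter(like="r 1", axis=0)

print(matched)                          # ['passenger 1']
print(list(via_like.index) == matched)  # True
```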
Example 6: Filter Using the Like Parameter with No Match
Let’s consider a DataFrame with the index labels ['sravan', 'ravan', 'pavan', 'Ravi'] and return the rows whose labels contain “n” and “M”, separately.
import pandas
journey = pandas.DataFrame({
    'from': ['city 1', 'city 1', 'city 3', 'city 4'],
    'to': ['ap', 'usa', 'city 2', 'city 1'],
    'distance': [200, 500, 466, 100],
}, index=['sravan', 'ravan', 'pavan', 'Ravi'])
# Get the rows whose index label contains 'n'
print(journey.filter(like='n', axis=0))
print()
# Get the rows whose index label contains 'M'
print(journey.filter(like='M', axis=0))
Output:
from to distance
sravan city 1 ap 200
ravan city 1 usa 500
pavan city 3 city 2 466
Empty DataFrame
Columns: [from, to, distance]
Index: []
Explanation:
- There are three rows where the index label includes “n”.
- There is no row where the index label includes “M”. So, an empty DataFrame is returned.
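The “like” match is also case-sensitive, which matters for labels such as “Ravi”. When a case-insensitive match is needed, one option is the regex parameter with the inline (?i) flag from Python’s re module. This is a sketch under that assumption, with variable names chosen only for illustration.

```python
import pandas

journey = pandas.DataFrame({
    "from": ["city 1", "city 1", "city 3", "city 4"],
    "to": ["ap", "usa", "city 2", "city 1"],
    "distance": [200, 500, 466, 100],
}, index=["sravan", "ravan", "pavan", "Ravi"])

# like is case-sensitive: 'r' and 'R' select different rows.
lower = journey.filter(like="r", axis=0)
upper = journey.filter(like="R", axis=0)

# regex with the inline (?i) flag ignores case and matches both.
either = journey.filter(regex="(?i)r", axis=0)

print(list(lower.index))   # ['sravan', 'ravan']
print(list(upper.index))   # ['Ravi']
print(list(either.index))  # ['sravan', 'ravan', 'Ravi']
```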
Conclusion
We covered how to retrieve DataFrame rows based on their index labels in Pandas. We first looked at the syntax of the filter() function to understand its parameters and how it works. We then worked through examples that filter a DataFrame by numeric and non-numeric index labels, and examples that use the like parameter to keep only the rows whose index labels contain a particular character or string.