This article will help you understand various methods we can use to search for a string in a Pandas DataFrame.
Pandas Contains Method
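Pandas provides the str.contains() method, which tests whether a substring or pattern is contained in each string of a Pandas Series or Index. To search a DataFrame, call it on one of the DataFrame's columns.
The function accepts a literal string or a regular expression pattern, which is then matched against the existing data. By default, the pattern is treated as a regular expression.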
The function syntax is as shown:
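As a rough sketch of the call form, the snippet below spells out the main keyword arguments with their usual defaults. The Series `names` is illustrative data that is not part of the original article, and the `na` argument is left at its default because that default differs between pandas versions.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch)
names = pd.Series(["Irene Coleman", "Emmett Shelton"])

# General form: Series.str.contains(pat, case=True, flags=0, regex=True)
# pat is the only required argument; the rest are shown with their defaults.
mask = names.str.contains(pat="Shelton", case=True, flags=0, regex=True)
print(mask.tolist())
```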
The function parameters are:
- pat – the character sequence or regex pattern to search for.
- case – if True (the default), the search is case-sensitive.
- flags – flags to pass to Python's re module, such as re.IGNORECASE. Defaults to 0 (no flags).
- na – the value to place in the result where the data is missing.
- regex – if True (the default), treats the input pattern as a regular expression; if False, treats it as a literal string.
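To make the parameters above more concrete, here is a small sketch that contrasts regex and literal matching and shows how `na` and `flags` affect the result. The Series `s` and its values are made up for illustration.

```python
import re
import pandas as pd

# Illustrative data with one missing value (an assumption for this sketch)
s = pd.Series(["a.b", "axb", None])

# regex=True: "." matches any character, so both strings match.
regex_mask = s.str.contains("a.b", regex=True, na=False)

# regex=False: "a.b" is taken literally, so only the first string matches.
literal_mask = s.str.contains("a.b", regex=False, na=False)

# flags are forwarded to the re module; here the match ignores case.
flag_mask = s.str.contains("A.B", flags=re.IGNORECASE, na=False)

# na=False makes the missing entry come out as False in every mask.
print(regex_mask.tolist(), literal_mask.tolist(), flag_mask.tolist())
```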
Return Value
The function returns a Series or Index of Boolean values indicating whether the pattern or substring is found in each element of the Series or Index.
Example
Suppose we have a sample DataFrame shown below:
import pandas as pd
df = pd.DataFrame({"full_names": ["Irene Coleman", "Maggie Hoffman", "Lisa Crawford", "Willow Dennis", "Emmett Shelton"]})
df
Search a String
To search for a string, we can pass the substring as the pattern parameter as shown:
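A minimal sketch of such a call, rebuilt here on the sample DataFrame so that it runs on its own (the variable name `mask` is an illustrative choice):

```python
import pandas as pd

# Sample DataFrame from the example above
df = pd.DataFrame({"full_names": ["Irene Coleman", "Maggie Hoffman", "Lisa Crawford",
                                  "Willow Dennis", "Emmett Shelton"]})

# Boolean mask: True where the name contains the substring "Shelton"
mask = df["full_names"].str.contains("Shelton")
print(mask)
```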
The code above checks if the string ‘Shelton’ is contained in the full_names column of the DataFrame.
This returns a Series of Boolean values with one entry per row of the column: True where the string is found and False otherwise.
For the sample DataFrame, the result is:
0    False
1    False
2    False
3    False
4     True
Name: full_names, dtype: bool
To get the matching values instead of the Boolean mask, you can use the result of the contains() method as a Boolean index on the column or the DataFrame.
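One way this Boolean indexing might look is sketched below; the names `mask`, `matching_rows`, and `matching_names` are illustrative.

```python
import pandas as pd

# Sample DataFrame from the example above
df = pd.DataFrame({"full_names": ["Irene Coleman", "Maggie Hoffman", "Lisa Crawford",
                                  "Willow Dennis", "Emmett Shelton"]})

mask = df["full_names"].str.contains("Shelton")

# Index the DataFrame with the mask to keep only the matching rows...
matching_rows = df[mask]

# ...or select just the matching values from the column.
matching_names = df.loc[mask, "full_names"]
print(matching_names)
```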
The above should return:
4    Emmett Shelton
Case Sensitive Search
The search is case-sensitive by default, since the case parameter defaults to True. You can also set the case parameter to True explicitly as shown:
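A short sketch of a case-sensitive search on the sample DataFrame, with a case-insensitive call added for contrast (the variable names are illustrative):

```python
import pandas as pd

# Sample DataFrame from the example above
df = pd.DataFrame({"full_names": ["Irene Coleman", "Maggie Hoffman", "Lisa Crawford",
                                  "Willow Dennis", "Emmett Shelton"]})

# case=True: lowercase "shelton" does not match "Shelton"
sensitive = df["full_names"].str.contains("shelton", case=True)

# case=False: the same search now matches the last row
insensitive = df["full_names"].str.contains("shelton", case=False)

print(sensitive.any(), insensitive.any())
```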
In the example above, we set the case parameter to True, enabling a case-sensitive search.
Since we search for the lowercase string ‘shelton’, the capitalized ‘Shelton’ in the data does not match, and the function returns False for every row.
RegEx Search
We can also search using a regular expression pattern. A simple example is as shown:
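The sketch below shows one plausible form of such a search. The alternation pattern `wi|em` is an assumption based on the description in this section, not a confirmed copy of the original snippet.

```python
import pandas as pd

# Sample DataFrame from the example above
df = pd.DataFrame({"full_names": ["Irene Coleman", "Maggie Hoffman", "Lisa Crawford",
                                  "Willow Dennis", "Emmett Shelton"]})

# "wi|em" matches either "wi" or "em"; case=False ignores letter case
pattern = "wi|em"  # assumed pattern for illustration
regex_mask = df["full_names"].str.contains(pattern, case=False, regex=True)

print(df[regex_mask])
```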
In the code above, we search for any string matching ‘wi’ or ‘em’. Note that we set the case parameter to False, so the match ignores case.
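For the sample DataFrame, this matches ‘Irene Coleman’ (the ‘em’ in ‘Coleman’), ‘Willow Dennis’, and ‘Emmett Shelton’, but not ‘Maggie Hoffman’ or ‘Lisa Crawford’.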
Closing
This article covered how to search for a substring in a column of a Pandas DataFrame using the str.contains() method. Check the docs for more.