Find Strings in Pandas

This article explains several ways to search for a string in a Pandas DataFrame using the str.contains() method.

Pandas Contains Method

Pandas provides a str.contains() method that checks whether a substring is contained in each element of a Series, such as a column of a DataFrame.

The method accepts a literal string or a regular expression pattern, which is then matched against each value in the Series.
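A quick way to see the difference: because regex=True is the default, a character such as "." is interpreted as "any character". The sketch below uses hypothetical version strings (not from the original article) to compare regex=True with regex=False.

```python
import pandas as pd

# Hypothetical sample values chosen to show the difference
s = pd.Series(["v1.2", "v132", "release"])

# With regex=True (the default), "." matches any single character,
# so "1.2" also matches the "132" in "v132"
regex_match = s.str.contains("1.2")

# With regex=False, "." is treated as a literal dot
literal_match = s.str.contains("1.2", regex=False)

print(regex_match.tolist())    # [True, True, False]
print(literal_match.tolist())  # [True, False, False]
```

If your search string contains characters such as ".", "*", "(" or "|" and you want them matched literally, passing regex=False avoids having to escape them.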

The function syntax is as shown:

Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

The parameters are as follows:

  1. pat – the character sequence or regular expression pattern to search for.
  2. case – if True (the default), the search is case sensitive.
  3. flags – flags to pass to the re module, such as re.IGNORECASE.
  4. na – the value to use in place of missing values (NaN) in the Series.
  5. regex – if True (the default), treats pat as a regular expression; if False, treats it as a literal string.
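The sketch below, using a hypothetical Series with one missing value, shows flags and na together: re.IGNORECASE makes the match case insensitive, and na=False turns the missing entry into False so the result can be used directly as a Boolean mask.

```python
import re
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series(["Irene Coleman", None, "Lisa Crawford"])

# flags=re.IGNORECASE makes the regex case insensitive;
# na=False fills the missing entry with False instead of NaN
ignore_case = s.str.contains("irene", flags=re.IGNORECASE, na=False)

print(ignore_case.tolist())     # [True, False, False]

# Because there are no NaN values left, the result works as a mask
print(s[ignore_case].tolist())  # ['Irene Coleman']
```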

Return Value

The method returns a Series or Index of Boolean values indicating whether the pattern or substring is found in each element.
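As a sketch using the sample names from the example below, the returned mask has dtype bool and shares the index of the original Series, so it can be summed to count matches or reduced with any() to check whether at least one row matches.

```python
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

# Boolean Series aligned with df's index: True where 'an' occurs
mask = df.full_names.str.contains('an')

print(mask.dtype)                   # bool
print(mask.index.equals(df.index))  # True
print(int(mask.sum()))              # 2 (Coleman and Hoffman)
print(bool(mask.any()))             # True
```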

Example

Suppose we have a sample DataFrame shown below:

# import pandas
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})
print(df)

Search a String

To search for a string, pass the substring as the pat parameter as shown:

print(df.full_names.str.contains('Shelton'))

The code above checks whether the string ‘Shelton’ is contained in each value of the full_names column of the DataFrame.

This should return a series of Boolean values indicating whether the string is located in each row of the specified column.

The output is:

0    False
1    False
2    False
3    False
4     True
Name: full_names, dtype: bool

To retrieve the matching rows rather than the Boolean values, pass the result of contains() as a Boolean index to the DataFrame:

print(df[df.full_names.str.contains('Shelton')])

The above should return:

       full_names
4  Emmett Shelton
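If you only need the matching values rather than whole rows, one option (a sketch using the same sample data) is to pass the mask together with the column label to .loc:

```python
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

mask = df.full_names.str.contains('Shelton')

# .loc with a Boolean mask and a column label returns only the matching values
matching_names = df.loc[mask, 'full_names'].tolist()

print(matching_names)  # ['Emmett Shelton']
```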

Case Sensitive Search

Searches are case sensitive by default (case=True). You can also set the case parameter explicitly to make the intent clear:

print(df.full_names.str.contains('shelton', case=True))

In the example above, we set the case parameter to True, enabling a case-sensitive search.

Because the search is for the lowercase string ‘shelton’, the value ‘Emmett Shelton’ does not match, and every element of the result is False.
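For comparison, here is a sketch that runs the same search with case=True and case=False on the sample DataFrame; only the case-insensitive version finds ‘Emmett Shelton’.

```python
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

# Case-sensitive: 'shelton' does not match 'Shelton'
sensitive = df.full_names.str.contains('shelton', case=True)

# Case-insensitive: 'shelton' matches 'Shelton'
insensitive = df.full_names.str.contains('shelton', case=False)

print(sensitive.any())   # False
print(df[insensitive])   # only row 4, Emmett Shelton
```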

RegEx Search

We can also search using a regular expression pattern. A simple example is as shown:

print(df.full_names.str.contains('wi|em', case=False, regex=True))

The code above matches any value containing ‘wi’ or ‘em’. Note that the case parameter is set to False, so the search ignores case.

The code above should return:

0     True
1    False
2    False
3     True
4     True
Name: full_names, dtype: bool
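To get the matching rows instead of Boolean values, the regex result can be used as a mask just like the literal search. The sketch below also uses the $ anchor (an illustration not taken from the original example) to match only names ending in ‘man’.

```python
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

# Same pattern as the example: keep rows containing 'wi' or 'em', ignoring case
matches = df[df.full_names.str.contains('wi|em', case=False, regex=True)]

# '$' anchors the pattern to the end of the string: names ending in 'man'
ends_with_man = df[df.full_names.str.contains(r'man$', regex=True)]

print(matches)        # rows 0, 3 and 4
print(ends_with_man)  # rows 0 and 1
```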

Closing

This article covered how to search for a substring in a Pandas DataFrame using the str.contains() method. See the pandas documentation for Series.str.contains for the full list of options.

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.