After finishing this tutorial, you will know:
- Python methods and operators that determine whether a string contains a substring.
- How to filter a DataFrame when a substring is present in a column.
- How to use regex to determine whether a string matches a pattern.
How to Find if a Substring or Expression Exists in the String Value in Pandas?
There are several ways to determine whether a string contains a particular substring.
Example # 1: Check Whether the Specified Substring is Included in String Data Using the in Operator
In Python, the in operator can be used with iterable types like lists and strings. It’s used to determine whether an element is present in the iterable or not. A found element is indicated by the in operator returning True. If not, it returns False. The in operator is the quickest and most Pythonic approach to determine whether a string includes a substring in Python. The operator makes it plain to every reader of your code what you’re trying to accomplish.
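As a minimal sketch of the behavior described above, the snippet below applies the in operator to a plain string and to a list. The sample values are made up for illustration and do not come from the tutorial's data.

```python
# Substring test on a plain string (sample values are illustrative).
text = "Henderson"
has_son = "son" in text   # "son" occurs inside "Henderson"
has_xyz = "xyz" in text   # "xyz" does not occur

# On a list, `in` compares whole elements, not substrings.
fruits = ["apple", "banana"]
has_apple = "apple" in fruits  # exact element match
has_app = "app" in fruits      # "app" is not an element of the list

print(has_son, has_xyz, has_apple, has_app)
```

Note the difference between the two cases: on a string the operator looks for a substring, while on a list it looks for an element equal to the left operand.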
After importing the pandas module, we create a series with the pd.Series() function. Our series consists of the string values “Floor”, “our”, “cancel”, “sure”, “tour”, “store”, “bore”, and “evil”. Now we will use the in operator to find whether the specified substring exists in each string value of the series. A “for” loop iterates over the values of the series and applies the in operator to each one.
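The loop described above might look like the following sketch. The series values are the ones listed in the text; the substring “our” is an assumption chosen for illustration, since the text does not name the substring that was searched for.

```python
import pandas as pd

# Series values taken from the text above.
series = pd.Series(["Floor", "our", "cancel", "sure", "tour", "store", "bore", "evil"])

# Assumption: the substring is not named in the text; "our" is illustrative.
substring = "our"

results = []
for value in series:
    found = substring in value  # True if the substring occurs in this string
    results.append(found)
    print(value, found)
```

Iterating matters here: writing `substring in series` directly would test the index labels of the series, not its values.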
By using the in operator, we got the results in the form of True and False. “True” indicates the presence of the substring in a string value, and “False” indicates its absence. We can also use the in operator with Python lists and dataframe columns containing string values. Let’s try the in operator on a dataframe’s column. To create the dataframe, we will use the pandas pd.DataFrame() function.
First, we have created a Python dictionary “dic” consisting of key-value pairs. Then we passed the “dic” dictionary to pd.DataFrame(). We have created our dataframe with three columns, i.e., id, name, and course. We aim to find whether substrings exist in the string columns, so we will only focus on those. There are two string columns in our dataframe, “name” and “course”, having the string values (“Davidson”, “Hendery”, “Henderson”, “Jason”, “Kim”, “Jenson”, “Jackson”, “Carl”) and (“Python”, “Amazon”, “Economics”, “Business”, “Languages”, “Database”, “Designing”, “Drawing”) respectively.
We have specified the column “name”, which is iterated over by a for loop to check whether the substring “son” is present in each string value of the column. The loop generates the result by checking each value inside the column.
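A sketch of the dataframe and the loop over the “name” column is shown below. The name and course values come from the text; the id values 1 to 8 are an assumption, because the text does not list them.

```python
import pandas as pd

# "name" and "course" values are from the text; the "id" values are assumed.
dic = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "name": ["Davidson", "Hendery", "Henderson", "Jason",
             "Kim", "Jenson", "Jackson", "Carl"],
    "course": ["Python", "Amazon", "Economics", "Business",
               "Languages", "Database", "Designing", "Drawing"],
}
df = pd.DataFrame(dic)

# Check each value of the "name" column for the substring "son".
name_results = []
for name in df["name"]:
    found = "son" in name
    name_results.append(found)
    print(name, found)
```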
Example # 2: Filter a String if the Substring is Present
The in operator can also be used to filter a list, series, or dataframe column by extracting the string values in which the substring is present. To accomplish this, we will iterate through each item of the object using a for loop to see if the substring is present. If an item contains the substring, it is added to another list. Let’s first create a list object.
First, we have created a list containing the string values “banana”, “apple”, “Nature”, “analyze”, “Fish”, “name”, “shirt”, and “analogue”. Then an empty list “filtered” is created to store the resultant values. We have used the in operator to determine the presence of the substring. The append function is used to append the matching strings (where the substring was present) to the empty list “filtered”. We got four values, i.e., “banana”, “analyze”, “name”, and “analogue”, which contain the substring “na”. Note that “Nature” is not included because the comparison is case sensitive. Now let’s try this with a dataframe column. We will use the dataframe which we created in Example # 1.
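The list filtering described above can be sketched as follows. The list values and the substring “na” are taken from the text; the variable name `items` is an assumption.

```python
# List values and the substring "na" are from the text above.
items = ["banana", "apple", "Nature", "analyze", "Fish", "name", "shirt", "analogue"]

filtered = []  # collects the strings that contain the substring
for item in items:
    if "na" in item:  # case-sensitive, so "Nature" does not match
        filtered.append(item)

print(filtered)
```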
This time we will check for the course column.
We have specified the course column to be iterated over by for loop to check whether the substring “on” is included in the course column of the dataframe. The values in which the substring exists are appended to an empty list “filtered” which we have printed as an output.
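A sketch of the same filter applied to the course column follows. The dataframe is rebuilt here so the snippet runs on its own, and the id values are again an assumption. The last two lines go one step further and keep the matching rows of the dataframe by using a list of booleans as a mask; that step is an illustration, not something the text above performs.

```python
import pandas as pd

# Same dataframe as in Example # 1; the "id" values are assumed.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "name": ["Davidson", "Hendery", "Henderson", "Jason",
             "Kim", "Jenson", "Jackson", "Carl"],
    "course": ["Python", "Amazon", "Economics", "Business",
               "Languages", "Database", "Designing", "Drawing"],
})

# Collect the course values that contain the substring "on".
filtered = []
for course in df["course"]:
    if "on" in course:
        filtered.append(course)
print(filtered)

# Illustrative extra step: keep whole rows whose course contains "on".
mask = ["on" in course for course in df["course"]]
filtered_df = df[mask]
print(filtered_df)
```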
The “Series.str” accessor exposes string methods that operate on each value of a series. To check whether a pattern or regex is present within the strings of a Series or Index, we can use the “Series.str.contains()” function in pandas. It returns a boolean Series or Index indicating, for each element, whether the specified pattern or regex is found.
Syntax: Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
pat: Character sequence or regular expression to search for.
case: If True, the match is case sensitive.
flags: Flags passed through to the re module, for example, re.IGNORECASE.
na: Fill value to use for missing or null values.
regex: If True, pat is treated as a regular expression; if False, it is treated as a literal string.
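To make the parameters above concrete, here is a small sketch that exercises case, flags, na, and regex. The series values are invented for illustration and are not part of the tutorial's data.

```python
import re
import pandas as pd

# Illustrative values, including one missing entry.
s = pd.Series(["Apple", "banana", None, "a.b", "axb"])

# case=False ignores letter case; na=False reports missing values as False.
ignore_case = s.str.contains("apple", case=False, na=False).tolist()

# The same effect using a flag from the re module.
with_flag = s.str.contains("apple", flags=re.IGNORECASE, na=False).tolist()

# regex=True: "." matches any character, so both "a.b" and "axb" match.
as_regex = s.str.contains("a.b", regex=True, na=False).tolist()

# regex=False: "a.b" is a literal string, so only "a.b" matches.
as_literal = s.str.contains("a.b", regex=False, na=False).tolist()

print(ignore_case, with_flag, as_regex, as_literal)
```

Passing na=False is a common choice when the result will be used as a filter mask, since a mask should not contain missing values.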
Example # 4: Use Series.str.contains() Function to Determine if the Substring is Present in the Data
First, we will create a series with string values. Along with pandas, we will also import the re module. The re module offers a set of regular expression features that let you determine whether a string matches a pattern at its start, using the match method, or contains the pattern anywhere, using the search method.
We have created a series with the pd.Series() function with the string values “team_A”, “team_AB”, “team_B”, “Team_Alpha”, “team_Ace”, “team_Stars”, and “team_C”. We have also specified an index for our series “sr” as “team 1”, “team 2”, “team 3”, “team 4”, “team 5”, “team 6”, and “team 7”. Now, let’s use the Series.str.contains() function to find whether the substring is present in the string values of the series.
Inside the str.contains() function, we have specified the pat parameter as “team_A” to check whether the substring “team_A” is included in the string values of the series. The output shows that the Series.str.contains() function returns a series object containing boolean values. Where the supplied pattern is found in the string, the value is True; otherwise, it is False. Because the match is case sensitive by default, “Team_Alpha” yields False.
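The example described above can be reconstructed roughly as follows, using the values, index labels, and pattern named in the text. Treat it as a sketch rather than the original listing.

```python
import re  # imported as in the text; not strictly needed for this call
import pandas as pd

# Values and index labels are from the text above.
sr = pd.Series(
    ["team_A", "team_AB", "team_B", "Team_Alpha", "team_Ace", "team_Stars", "team_C"],
    index=["team 1", "team 2", "team 3", "team 4", "team 5", "team 6", "team 7"],
)

# Boolean series: True where "team_A" occurs in the value (case-sensitive).
result = sr.str.contains(pat="team_A")
print(result)
```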
Example # 5: Use Series.str.contains() Function to Determine if the Pattern is Present in the Data
We will now check whether the specified pattern exists in the string data of the underlying series object. Let’s create a series containing string values.
We have created a series with the values “Mickey”, “Rickon”, “Alex”, “Nick”, “Rov”, “Tim”, and “Danny”. To determine whether a pattern is included in the string data of the series object, we will now use the “Series.str.contains()” function.
We have specified pat = “i[a-z]” to find the string values in the series “s” that contain the letter “i” followed by any lowercase letter.
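A sketch of this regex check, using the series values and the pattern given in the text:

```python
import pandas as pd

# Series values are from the text above.
s = pd.Series(["Mickey", "Rickon", "Alex", "Nick", "Rov", "Tim", "Danny"])

# "i[a-z]" matches the letter "i" followed by any lowercase letter.
matches = s.str.contains(pat="i[a-z]")
print(matches)
```

Values without an “i”, such as “Alex”, “Rov”, and “Danny”, produce False.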
In this tutorial, we have shown how to determine whether a string includes a substring in pandas. This can be done in several ways, and we have discussed a few of them in the examples. We implemented examples that determine whether a string contains a specified substring using the in operator, filter strings in a list or dataframe column when the substring is present, and use the str.contains() function to determine whether a substring or expression is present in the data.