There are several scenarios in which you need to extract a substring from a string in Python. For instance, when working with large datasets, you may need to pull specific data out of text fields or match a particular pattern in a string, such as an email address or phone number. Substring extraction is also useful in text processing and analysis.
This post will cover the following approaches:
- Method 1: Python Extract Substring Using Regex in “re.search()” Method
- Method 2: Python Extract Substring Using Regex in “re.match()” Method
- Method 3: Python Extract Substring Using Regex in “re.findall()” Method
- Method 4: Python Extract Substring Using Regex in “re.finditer()” Method
Method 1: Python Extract Substring Using Regex in “re.search()” Method
The Python “re.search()” method scans a string for the first location where the given pattern matches and returns a “Match” object, or “None” if there is no match. It can be used when you want to locate a specific substring inside a longer string but do not know where or how often it occurs.
Syntax
To use the re.search() method, follow the given syntax:
re.search(pattern, string, flags=0)
Here:
- “pattern” represents the regex that you want to search.
- “string” refers to the specified string in which you want to search.
- “flags” represents the optional parameters, such as multi-line mode, case sensitivity, etc.
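As a minimal sketch of how the three arguments fit together, the snippet below runs the same search with and without a flag. The sample text and the pattern are made up for illustration; re.IGNORECASE is one of the standard flags of the “re” module.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "Python is fun"

# Without flags the search is case-sensitive, so "python" is not found
no_flag = re.search(r"python", text)

# re.IGNORECASE makes the same pattern match "Python"
with_flag = re.search(r"python", text, flags=re.IGNORECASE)

print(no_flag)            # None
print(with_flag.group())  # Python
```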
Example 1: Extracting Text-based Substring Using “re.search()” Method
To use the “re.search()” method to extract a substring, first import the “re” module, which provides regex support:
import re
Define the string from which you want to retrieve a substring:
Then, specify the regex. Here, “r” marks a raw string, so backslashes are treated as literal characters, and “best” is the regular expression:
regex = r'best'
Pass the created “regex” and “string” to the re.search() method and store the resulting object in “match”:
match = re.search(regex, string)
Now, add a condition that extracts the matched substring from the “match” object returned by re.search() and prints it to the console:
if match:
    sub_string = match.group()
    print(sub_string)
It can be observed that the substring “best” has been extracted by using the “group()” method of the match object.
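Put together, the steps of this example might look like the sketch below. The sample sentence is an assumption, since the original string is not shown here; any text containing “best” behaves the same way.

```python
import re

# Assumed sample sentence; the original string is not shown in the text
string = "Python is one of the best languages for text processing"
regex = r"best"

match = re.search(regex, string)

# re.search() returns None when nothing matches, so check before calling group()
if match:
    sub_string = match.group()
    print(sub_string)    # best
    print(match.span())  # (start, end) indexes of the match
else:
    print("No match found")
```

The “if match:” check matters because calling “group()” on “None” raises an AttributeError.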
Example 2: Extracting Numeric Substring Using “re.search()” Method
Now, define a string that contains numbers and search for the first run of one or more digits in it by passing “\d+” as the regex to the “re.search()” method:
print(re.search(r'\d+', string))
In the specified regex:
- “\d” is a character class that matches any single digit; the backslash gives the letter “d” this special meaning.
- “+” means one or more of the preceding element, so “\d+” matches one or more digits in a row.
As you can see, a “Match” object has been returned by the “re.search()” method.
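A runnable version of this example could look as follows. The sample string is an assumption; it contains two runs of digits to show that only the first one is returned.

```python
import re

# Assumed sample string containing two runs of digits
string = "Order 66 shipped in 2023"

match = re.search(r"\d+", string)

print(match)          # the Match object, including its span and matched text
print(match.group())  # 66
```

Printing the “Match” object shows where the match was found, while “group()” returns just the matched text.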
Method 2: Python Extract Substring Using Regex in “re.match()” Method
“re.match()” only checks for the regex at the start of the string and returns a “Match” object on success, or “None” otherwise. This method can be used when you know that the substring can only occur at the start of the given string.
Syntax
To invoke the re.match() method, follow the given syntax:
re.match(pattern, string, flags=0)
Example
Firstly, define the regular expression as “^l.......t$”. This regex matches strings that begin with “l”, end with “t”, and have exactly 9 characters, because each of the seven dots stands for any single character.
Then, declare the string. Pass it to the re.match() method, along with the regex as arguments:
result = re.match(regex, string)
Add an “if-else” condition with print statements for the two cases, depending on whether a “Match” object has been returned:
if result:
    print("Search has been done successfully", result)
else:
    print("Search was unsuccessful.")
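The sketch below shows how such a check might behave on a few inputs. It assumes the pattern is “l”, seven arbitrary characters, then “t”; the test strings and the helper “check()” are hypothetical and serve only to illustrate the behavior.

```python
import re

# Assumed pattern: "l", then seven arbitrary characters, then "t"
regex = r"^l.......t$"

def check(string):
    """Return the matched text, or None if the string does not match."""
    result = re.match(regex, string)
    return result.group() if result else None

print(check("linuxhint"))     # linuxhint
print(check("my linuxhint"))  # None: the pattern must match from the start
print(check("linux"))         # None: too short

# For comparison, re.search() finds the same text anywhere in the string
found_by_search = re.search(r"l.......t", "my linuxhint")
print(found_by_search.group())  # linuxhint
```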
Method 3: Python Extract Substring Using Regex in “re.findall()” Method
The “re.findall()” method searches for every non-overlapping match of a pattern within the given string and returns a list of the extracted substrings, in the order they are found. This method is used when you need to retrieve all matching substrings at once.
Syntax
To invoke the re.findall() method, check out the given syntax:
re.findall(pattern, string, flags=0)
Example
Define a string comprising numeric values. Then, specify the regex pattern as “r'\d+'” to match one or more digits:
regex = r'\d+'
Then, call the “re.findall()” method and pass the defined regex and the string as arguments:
matches = re.findall(regex, string)
Now, iterate over the returned list of strings stored in the “matches” variable and print the elements on the console:
for match in matches:
    print(match)
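As a complete sketch of this example, with an assumed sample string standing in for the one that is not shown:

```python
import re

# Assumed sample string with several numbers
string = "Room 12 has 3 windows and 45 chairs"
regex = r"\d+"

matches = re.findall(regex, string)
print(matches)  # ['12', '3', '45']

for match in matches:
    print(match)  # each item is a plain str, not a Match object
```

Note that if the pattern contains capture groups, re.findall() returns the group contents instead of the whole match.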
Method 4: Python Extract Substring Using Regex in “re.finditer()” Method
The “re.finditer()” method searches in the same way as the re.findall() method. However, it returns an iterator of “Match” objects rather than a list of substrings. This method is useful for large data sets, where you do not want to store all matches at once: re.finditer() lets you process the matches one at a time.
Syntax
To invoke the re.finditer() method, follow the given syntax:
re.finditer(pattern, string, flags=0)
Example
First, create a string. Then, define a regex pattern as “r'[A-Z]+'” that matches one or more uppercase letters:
regex = r'[A-Z]+'
Pass the regex and the string as arguments to the “re.finditer()” method and store the resulting iterator of “Match” objects in “matches”:
matches = re.finditer(regex, string)
Lastly, iterate over the “matches” iterator, extract each substring with the help of the “group()” method, and print it on the console:
for match in matches:
    sub_string = match.group()
    print(sub_string)
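A self-contained sketch of this example is shown below. The sample string is an assumption, and the “results” list is added only so the collected values can be inspected afterwards.

```python
import re

# Assumed sample string mixing upper- and lowercase letters
string = "NASA and ESA launched the JWST"
regex = r"[A-Z]+"

matches = re.finditer(regex, string)

results = []
for match in matches:
    # Each item is a Match object, so the position is available too
    results.append((match.group(), match.start()))
    print(match.group(), match.start())

print(results)  # [('NASA', 0), ('ESA', 9), ('JWST', 26)]
```

Because each item is a “Match” object, re.finditer() gives access to positions through “start()”, “end()”, and “span()”, which re.findall() does not provide.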
We have covered the essential approaches for extracting substrings with regex in Python.
Conclusion
To extract a substring using regex in Python, use the “re.search()”, “re.match()”, “re.findall()”, or “re.finditer()” method. Depending on your requirements, use “re.search()” to extract only the first match of the regex, “re.match()” to extract a substring located at the start of a string, “re.findall()” to retrieve all matching substrings as a list, and “re.finditer()” to process the matches one at a time. This blog covered the methods for extracting substrings in Python.
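For a quick side-by-side comparison, the sketch below applies all four functions to one hypothetical sample string; the text and patterns are assumptions chosen only to make the differences visible.

```python
import re

# Hypothetical sample text used only to compare the four functions
text = "cat bat cat"
pattern = r"[cb]at"

first = re.search(pattern, text).group()                     # first match anywhere
at_start = re.match(r"bat", text)                            # only checks the start
all_found = re.findall(pattern, text)                        # list of strings
positions = [m.start() for m in re.finditer(pattern, text)]  # from Match objects

print(first)      # cat
print(at_start)   # None, because the text starts with "cat"
print(all_found)  # ['cat', 'bat', 'cat']
print(positions)  # [0, 4, 8]
```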