str.replace()
The str.replace() method substitutes each occurrence of a string or regex pattern with a replacement string. The pattern can be a literal character sequence or a regular expression, and the replacement can be a string or a callable. Take a look at the syntax of str.replace().
Syntax
Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)
Parameters:
- pat: str or compiled regex. The character sequence or regular expression to search for.
- repl: str or callable. The replacement string, or a callable that receives the regex match object and returns the replacement string.
- n: int, -1 by default. The number of replacements to make in each string, counted from the start; -1 replaces all occurrences.
- case: bool, None by default. Determines whether the replacement is case-sensitive:
  - Case-sensitive if True.
  - Case-insensitive if False.
  - Cannot be set if pat is a compiled regex.
- flags: int, 0 (no flags) by default. Flags from the re module, such as re.IGNORECASE. Cannot be set if pat is a compiled regex.
- regex: bool. Determines whether the passed-in pattern is a regular expression: if True, the pattern is treated as a regular expression; if False, it is treated as a literal string. The default was True in older pandas releases and is False since pandas 2.0, so pass it explicitly when the distinction matters.
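The parameters above can be illustrated with a minimal sketch. The Series contents and variable names here are made up for illustration and are not part of the tutorial's data; regex is passed explicitly so that the behaviour does not depend on the installed pandas version.

```python
import pandas

# Hypothetical sample data, chosen only to exercise each parameter.
s = pandas.Series(["id1-id22", "ID333"])

# n=1 replaces only the first match in each string.
first_only = s.str.replace(r"\d+", "#", n=1, regex=True)
# -> ["id#-id22", "ID#"]

# case=False makes the match case-insensitive.
any_case = s.str.replace("id", "x", case=False, regex=True)
# -> ["x1-x22", "x333"]

# regex=False treats the pattern as literal text, so nothing matches here.
as_literal = s.str.replace(r"\d+", "#", regex=False)
# -> ["id1-id22", "ID333"]

# regex=True treats the same pattern as a regular expression.
as_regex = s.str.replace(r"\d+", "#", regex=True)
# -> ["id#-id#", "ID#"]

# A callable repl receives the match object and returns the replacement.
by_callable = s.str.replace(r"\d+", lambda m: str(len(m.group())), regex=True)
# -> ["id1-id2", "ID3"]
```

Each call returns a new Series; the original Series is left unchanged.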
Scenario 1: str.replace() in DataFrame
We will apply this function to pandas DataFrame columns to replace single or multiple values. This scenario walks through several examples.
Syntax:
Single – DataFrame['column'].str.replace(old, new)
Multiple – DataFrame['column'].replace([old, ...], [new, ...])
Here, old is the existing string and new is the new string that replaces the existing one.
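As a rough sketch of the two call shapes, the snippet below applies a single replacement with str.replace() and a multiple replacement with replace() to one column. The replacement values 'H', 'O', and 'S' are placeholders picked for this illustration.

```python
import pandas

records = pandas.DataFrame(
    {"chemical": ["hydrogen", "nitrogen", "oxygen", "hydrogen", "sulphur"]}
)

# Single: replace one old string with one new string.
single = records["chemical"].str.replace("hydrogen", "H", regex=False)
# -> ["H", "nitrogen", "oxygen", "H", "sulphur"]

# Multiple: pass a list of old values and a list of new values to replace().
multiple = records["chemical"].replace(["oxygen", "sulphur"], ["O", "S"])
# -> ["hydrogen", "nitrogen", "O", "hydrogen", "S"]
```

Neither call modifies the DataFrame; assign the result back to the column to keep it.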
Example 1: Replace Single String
In this example, we have a DataFrame named “records” that holds the “chemical”, “alphabet”, and “valency” columns. Replace ‘hydrogen’ with “Hydrogen Chemical” in the chemical column.
import pandas

records = pandas.DataFrame({'chemical': ['hydrogen', 'nitrogen', 'oxygen', 'hydrogen', 'sodium'],
                            'alphabet': ['HY', 'N', 'O', 'HY', 'NA'],
                            'valency': [10, 2, 3, 4, 11]})
print(records)
# Replace 'hydrogen' with 'Hydrogen Chemical' in the chemical column.
records['chemical'] = records['chemical'].str.replace(
    'hydrogen', 'Hydrogen Chemical')
print()
print(records)
Output:
Explanation
The chemical column contains ‘hydrogen’ twice, so both occurrences are replaced with “Hydrogen Chemical”.
Example 2: Replace Single Character
In this example, we have a DataFrame named “records” that holds the “chemical” and “alphabet” columns. Replace ‘O’ with “o” in the alphabet column.
import pandas

records = pandas.DataFrame({'chemical': ['hydrogen', 'nitrogen', 'oxygen', 'hydrogen'],
                            'alphabet': ['HY', 'N', 'O', 'NO']})
print(records)
# Replace 'O' with 'o' in the alphabet column.
records['alphabet'] = records['alphabet'].str.replace('O', 'o')
print()
print(records)
Output:
The character ‘O’ appears twice in the alphabet column, in ‘O’ and ‘NO’. Both occurrences are replaced with “o”, giving ‘o’ and ‘No’.
Example 3: Replace Multiple Strings
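Let us replace ‘hydrogen’ with ‘HYDROGEN’, ‘oxygen’ with ‘OXY’, and ‘sulphur’ with ‘S’ in the chemical column. This example uses the replace() method rather than str.replace(): it takes a list of old values and a list of new values, and it replaces cells whose entire value matches an old value.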
import pandas

records = pandas.DataFrame({'chemical': ['hydrogen', 'nitrogen', 'oxygen', 'hydrogen', 'sulphur'],
                            'alphabet': ['HY', 'N', 'O', 'NO', 'SUL']})
print(records)
# Replace multiple values at a time: old values first, new values second.
records['chemical'] = records['chemical'].replace(['hydrogen', 'oxygen', 'sulphur'],
                                                  ['HYDROGEN', 'OXY', 'S'])
print()
print(records)
Output:
Both “hydrogen” strings are replaced with “HYDROGEN”, the single “oxygen” is replaced with “OXY”, and the single “sulphur” is replaced with “S”.
Example 4: Replace Multiple Strings using Dictionary
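Let us replace ‘hydrogen’ with ‘ACID’ and ‘sulphur’ with ‘BASE’ in the chemical column. Here the old and new values are passed to replace() as a dictionary, with each key being an old value and each value its replacement.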
import pandas

records = pandas.DataFrame({'chemical': ['hydrogen', 'nitrogen', 'oxygen', 'hydrogen', 'sulphur'],
                            'alphabet': ['HY', 'N', 'O', 'NO', 'SUL']})
print(records)
# Replace multiple values at a time using an {old: new} dictionary.
records['chemical'] = records['chemical'].replace({'hydrogen': 'ACID', 'sulphur': 'BASE'})
print()
print(records)
Output:
Both “hydrogen” strings are replaced with “ACID”, and the single “sulphur” is replaced with “BASE”.
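One point worth checking in the multiple-value examples is that replace() matches whole cell values, whereas str.replace() matches substrings. The sketch below uses made-up cell values in which the old strings appear only as part of a longer string; the loop over the dictionary is one possible way to apply several substring replacements, not something the original examples do.

```python
import pandas

# Hypothetical data: the old strings are only part of each cell value.
records = pandas.DataFrame(
    {"chemical": ["hydrogen gas", "nitrogen", "liquid sulphur"]}
)
mapping = {"hydrogen": "ACID", "sulphur": "BASE"}

# replace() compares whole values, so no cell matches and nothing changes.
whole = records["chemical"].replace(mapping)
# -> ["hydrogen gas", "nitrogen", "liquid sulphur"]

# Applying str.replace() once per pair replaces the substrings instead.
partial = records["chemical"]
for old, new in mapping.items():
    partial = partial.str.replace(old, new, regex=False)
# -> ["ACID gas", "nitrogen", "liquid BASE"]
```

In Examples 3 and 4 every old string is a complete cell value, which is why replace() is sufficient there.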
Scenario 2: str.replace() in Series
Let us create a pandas Series and replace substrings using a regex pattern. Compile the substring to be replaced with re.compile() and pass the compiled pattern to str.replace() as the first argument; the second argument is the new string that replaces each match. Since pandas 2.0, regex defaults to False and a compiled pattern is rejected unless regex=True is passed.
Syntax:
Series.str.replace(re.compile(old), new, regex=True)
Here, old is the existing string and new is the new string that replaces the existing one.
Example: Replace Single String
In this example, we have a Series named “record” that holds 4 strings.
Replace the substring “gen” with “AND”.
import re

import pandas

# Create pandas Series with 4 strings
record = pandas.Series(['hydrogen', 'nitrogen', 'HY', 'N'])
# Case-sensitive pattern "GEN": it does not match "gen", so nothing is replaced.
# regex=True is required for a compiled pattern in pandas 2.0 and later.
print(record.str.replace(re.compile("GEN"), "AND", regex=True))
print()
# Replace the substring "gen" with "AND" by ignoring the case.
print(record.str.replace(re.compile("GEN", flags=re.IGNORECASE), "AND", regex=True))
Output:
Explanation
- In the first output, the match is case-sensitive, so “gen” and “GEN” are different. As “GEN” does not exist in the Series, no replacement is done.
- In the second output, the match ignores case because the pattern is compiled with “flags=re.IGNORECASE”, so “gen” and “GEN” are treated as the same. Each match is replaced with “AND”. Hence, the updated strings are “hydroAND” and “nitroAND”.
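The same case-insensitive replacement can also be written without re.compile(), by passing a plain string pattern together with the case or flags parameter. This is a minimal sketch of that alternative, with regex=True passed explicitly so it does not depend on the version's default.

```python
import re

import pandas

record = pandas.Series(["hydrogen", "nitrogen", "HY", "N"])

# case=False ignores case for a plain string pattern.
with_case = record.str.replace("GEN", "AND", case=False, regex=True)
# -> ["hydroAND", "nitroAND", "HY", "N"]

# flags=re.IGNORECASE has the same effect for a plain string pattern.
with_flags = record.str.replace("GEN", "AND", flags=re.IGNORECASE, regex=True)
# -> ["hydroAND", "nitroAND", "HY", "N"]
```

Note that case and flags can only be used with a string pattern; they cannot be set when pat is a compiled regex.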
Conclusion
This tutorial showed how to substitute string values in pandas. We discussed the syntax of the str.replace() method to understand its functionality, then worked through examples that replace a string, replace a single character, and replace multiple strings in a DataFrame column using the str.replace() and replace() functions. We also saw how to replace a substring in a Series using a compiled regex.