In this Python article, you will learn how to search and replace text in a Python string using regular expressions. Python's re module offers more ways to match patterns and operate on strings than the built-in string methods do.
This tutorial discusses the regular expression sub() and subn() methods, which search for a pattern in a string and replace one or more of its occurrences. By the end of this tutorial, you will have a solid understanding of the following methods.
| Methods | Description |
| --- | --- |
| re.sub(pattern, replacement, string) | Replaces all occurrences of pattern in the string with replacement. |
| re.sub(pattern, replacement, string, count=n) | Replaces the first n occurrences of pattern in the string with replacement. |
| re.subn(pattern, replacement, string) | Replaces all occurrences of pattern in the string with replacement, same as sub(), but returns a tuple, whereas sub() returns a string. |
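As a quick illustration of the three calls summarized above, the sketch below runs them on the same sentence. The sample text and variable names are made up for this example.

```python
import re

text = "red apple, red cherry, red rose"  # made-up sample text

# sub() returns only the new string
all_replaced = re.sub(r"red", "green", text)

# count=2 limits the replacement to the first two matches
first_two = re.sub(r"red", "green", text, count=2)

# subn() returns a tuple: (new string, number of replacements made)
new_text, n_changes = re.subn(r"red", "green", text)

print(all_replaced)
print(first_two)
print(new_text, n_changes)
```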
Python re.sub() method
With the re.sub() method, we can replace the parts of a string that match a regular expression pattern.
How to use re.sub() method?
In order to use re.sub() we first need to understand its syntax and the value it returns.
re.sub() syntax:
import re
re.sub(pattern, repl, string, count = 0, flags=0)
Arguments
The sub() method accepts 5 arguments: the first three are mandatory and the other two are optional.
- pattern: The regular expression pattern that we want to find in the target string.
- repl: The replacement, which can be a string or a function.
  - If repl is a string, the sub() method replaces every match of the pattern with that string.
  - If repl is a function, the sub() method calls it with the match object for every match and replaces the match with the value the function returns.
- string: The target string in which we want to replace the pattern.
- count: An integer that sets the maximum number of pattern occurrences to replace. It is optional; its default value, 0, means replace all occurrences of the pattern.
- flags: Flags that modify how the pattern is matched. It is optional; its default value, 0, means no flag is set. You can pass flags such as re.I for case-insensitive matching or re.A for ASCII-only matching.
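To make the flags argument more concrete, here is a small sketch that compares a case-sensitive replacement with one that passes re.I. The sample sentence and variable names are assumptions chosen only for illustration.

```python
import re

text = "Python is fun. I like PYTHON and python."  # made-up sample text

# Without a flag, only the exact-case match "python" is replaced
case_sensitive = re.sub(r"python", "Java", text)

# re.I (re.IGNORECASE) also matches "Python" and "PYTHON"
case_insensitive = re.sub(r"python", "Java", text, flags=re.I)

print(case_sensitive)
print(case_insensitive)
```

Passing flags by keyword, as done here, avoids accidentally supplying the flag in the position of count.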
Return value
The sub() method returns a new string in which the specified number of occurrences have been replaced. If the pattern is not found in the target string, the string is returned unchanged.
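The no-match case can be checked with a short sketch. The text below is a made-up sample that contains no digits, so the digit pattern never matches.

```python
import re

text = "Hello World"  # made-up sample text without digits

# The pattern \d+ finds nothing, so sub() hands back the text unchanged
result = re.sub(r"\d+", "#", text)

print(result)
print(result == text)
```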
Example 1
Let’s say we are given a string and need to replace every occurrence of UK, England, and Britain with United Kingdom.
import re
string = '''UK is an island country located on the northwestern coast of Europe. English is the main Language of England. The Capital of Britain is London'''
#regular expression pattern
pattern = 'UK|Britain|England'
#string to replace
repl = 'United Kingdom'
#replace UK, England and Britain with United Kingdom
replaced_string = re.sub(pattern, repl, string)
print(replaced_string)
Output
United Kingdom is an island country located on the northwestern coast of Europe. English is the main Language of United Kingdom. The Capital of United Kingdom is London
Example 2
If the pattern contains an escape sequence such as \n, sub() matches the character it represents, so a pattern can also target line breaks.
import re
string = 'UK\n is an island country located on northwestern coast Europe. English is the main Language of England\n. The Capital of Britain\n is London'
#regular expression pattern
pattern = 'UK\n|Britain\n|England\n'
#string to replace
repl = 'United Kingdom'
#replace UK, England and Britain with United Kingdom
replaced_string = re.sub(pattern, repl, string)
print(replaced_string)
Output
United Kingdom is an island country located on northwestern coast Europe. English is the main Language of United Kingdom. The Capital of United Kingdom is London
In the above example, the string has a newline right after UK, England, and Britain. The pattern UK\n|Britain\n|England\n matches each of those words only together with the newline that follows it, so the newline is replaced along with the word.
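Escape sequences in the repl string are processed as well. The sketch below, which uses made-up sample strings, relies on two behaviours described in the re documentation: \n in repl becomes a newline character, and \1, \2, and so on refer back to the groups captured by the pattern.

```python
import re

text = "first,second,third"  # made-up sample text

# r"\n" in repl is converted to a real newline character by sub()
one_per_line = re.sub(r",", r"\n", text)

# r"\1", r"\2", r"\3" in repl refer to the text captured by each group
date = "2024-05-17"  # made-up sample date
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)

print(one_per_line)
print(swapped)
```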
Python re.sub() examples
Let’s discuss some more examples of the re.sub() method.
Example 1: Replace all the whitespaces with underscores
Suppose we have a string and need to replace every whitespace character with an underscore (_). A whitespace character can be matched with the \s special sequence.
import re
string = ' Hello World Welcome to TechgeekBuzz '
#regular expression pattern
pattern = r'\s'
#string to replace
repl = '_'
#replace all whitespace by _
replaced_string = re.sub(pattern, repl, string)
print(replaced_string)
Output
_Hello_World_Welcome_to_TechgeekBuzz_
Example 2: Remove all the whitespaces from a string
To remove all the whitespace, we can set the pattern to r'\s' and repl to ''. To remove only specific whitespace, we use different patterns:
- \s+ pattern for removing single or multiple spaces.
- ^\s+ pattern for removing leading spaces.
- \s+$ pattern for removing trailing spaces.
- ^\s+|\s+$ pattern for removing leading and trailing spaces.
1. Remove all the spaces from the string.
import re
string = ' Hello World Welcome to TechgeekBuzz .'
#regular expression pattern
pattern = r'\s'
#string to replace
repl = ''
#replace all whitespace
replaced_string = re.sub(pattern, repl, string)
print(replaced_string)
Output
HelloWorldWelcometoTechgeekBuzz.
2. Remove the leading whitespaces from a string in Python
import re
string = ' Hello World Welcome to TechgeekBuzz .'
#regular expression pattern
pattern = r'^\s+'
#string to replace
repl = ''
#replace leading whitespace
replaced_string = re.sub(pattern, repl, string)
print(replaced_string)
Output
Hello World Welcome to TechgeekBuzz .
3. Remove all the trailing whitespaces from a string in Python
import re
string = ' Hello World Welcome to TechgeekBuzz '
#regular expression pattern
pattern = r'\s+$'
#string to replace
repl = ''
#replace trailing whitespace
replaced_string = re.sub(pattern, repl, string)
print(f"'{replaced_string}'")
Output
' Hello World Welcome to TechgeekBuzz'
4. Remove the leading and trailing whitespaces from a string in Python
import re
string = ' Hello World Welcome to TechgeekBuzz '
#regular expression pattern
pattern = r'^\s+|\s+$'
#string to replace
repl = ''
#replace leading and trailing whitespace
replaced_string = re.sub(pattern, repl, string)
print(f"'{replaced_string}'")
Output
'Hello World Welcome to TechgeekBuzz'
5. Replace multiple whitespaces with single whitespace using regex
import re
string = '  Hello   World  Welcome   to TechgeekBuzz  '
#regular expression pattern
pattern = r'\s+'
#string to replace
repl = ' '
#replace multiple whitespaces with single
replaced_string = re.sub(pattern, repl, string)
print(f"'{replaced_string}'")
Output
' Hello World Welcome to TechgeekBuzz '
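The trimming and collapsing patterns shown above can also be combined. The following is only a sketch: normalize_spaces is a hypothetical helper name, and the messy input string is made up for the example.

```python
import re

def normalize_spaces(text):
    """Trim both ends and collapse inner whitespace runs (hypothetical helper)."""
    # First remove leading and trailing whitespace
    trimmed = re.sub(r"^\s+|\s+$", "", text)
    # Then replace every remaining run of whitespace with a single space
    return re.sub(r"\s+", " ", trimmed)

messy = "  Hello   World \t Welcome\nto TechgeekBuzz  "  # made-up sample text
cleaned = normalize_spaces(messy)
print(f"'{cleaned}'")
```

Because \s also matches tabs and newlines, this helper turns them into plain spaces too, which may or may not be what a given program wants.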
How to limit the maximum number of pattern occurrences to be replaced
The sub() method also accepts an optional count argument that limits the number of replacements.
re.sub(pattern, repl, string, count=0)
By default, the value of count is 0, which means all occurrences of the pattern in the string are replaced. Setting it to a positive integer limits the number of replacements.
Example
Replace the first 3 whitespaces with underscores in a string.
import re
string = 'Hello World Welcome to TechgeekBuzz.'
#regular expression pattern
pattern = r'\s+'
#string to replace
repl = '_'
#replace first 3 whitespaces by _
replaced_string = re.sub(pattern, repl, string, count =3)
print(f"'{replaced_string}'")
Output
'Hello_World_Welcome_to TechgeekBuzz.'
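A few more count values are worth trying out. In this sketch the sample text is made up; it shows a count of 1, a count larger than the number of matches, and the default of 0.

```python
import re

text = "a-b-c-d"  # made-up sample text with three hyphens

# Replace only the first match
once = re.sub(r"-", "+", text, count=1)

# A count larger than the number of matches simply replaces them all
too_many = re.sub(r"-", "+", text, count=10)

# count=0 is the default and means "no limit"
zero = re.sub(r"-", "+", text, count=0)

print(once, too_many, zero)
```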
Regex Replacement function
So far we have only used a string value for the repl argument. The repl argument can be a string or a function, so now let’s see how to use a function as the repl argument. The function receives a match object for each match and returns the replacement string.
Example
import re

def digit_to_word(match_obj):
    # map each single digit to its English word
    digi_words = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
                  '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'}
    # group() returns the text that matched the pattern
    digit = match_obj.group()
    return digi_words[digit]

string = 'There are 3 red balls, 2 green balls and 5 black balls in the bag'
# regular expression pattern for a single digit
pattern = r'[0-9]'
# function to call for each match
repl = digit_to_word
replaced_string = re.sub(pattern, repl, string, count=3)
print(f"'{replaced_string}'")
Output
'There are three red balls, two green balls and five black balls in the bag'
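A replacement function is not limited to dictionary lookups; it can compute its result from the match. As a sketch, the hypothetical helper double_number below converts each matched number to an integer and doubles it. The pattern \d+ is used so that multi-digit numbers are handled as a whole, and the sample sentence is made up.

```python
import re

def double_number(match_obj):
    """Return the matched number multiplied by two (hypothetical helper)."""
    value = int(match_obj.group())
    # The replacement function must return a string
    return str(value * 2)

text = "I bought 3 pens, 12 pencils and 100 sheets"  # made-up sample text
doubled = re.sub(r"\d+", double_number, text)
print(doubled)
```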
Python re.subn() method
The re.subn() method is similar to the sub() method: it also replaces the matches of a regex pattern in a string with a replacement string or the result of a function. The only difference between sub() and subn() is that subn() returns a tuple of two values.
- The first value is the new string with the replacements applied.
- The second value is the number of replacements that were made.
Example
Let’s see an example where we convert the capitalized names in a string to uppercase using the subn() method and check the number of replacements applied.
import re

def cap_to_upper(match_obj):
    # group() returns the matched capitalized word
    name = match_obj.group()
    return name.upper()

string = 'class 10 has 3 toppers Rahul, Jay and Raj'
# regular expression for capitalized words
pattern = r'[A-Z]+[a-z]*'
# function to call for replacement
repl = cap_to_upper
result = re.subn(pattern, repl, string)
new_string = result[0]
changes = result[1]
print("Replaced String:", new_string)
print('Number of changes:', changes)
Output
Replaced String: class 10 has 3 toppers RAHUL, JAY and RAJ
Number of changes: 3
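The tuple returned by subn() can also be unpacked directly, and the count gives a simple way to tell whether anything changed. The sketch below assumes a hypothetical helper named mask_digits and made-up input strings.

```python
import re

def mask_digits(text):
    """Replace each digit with '#' and report whether anything changed (hypothetical helper)."""
    # Unpack the (new_string, number_of_replacements) tuple in one step
    masked, changes = re.subn(r"\d", "#", text)
    return masked, changes > 0

with_digits = mask_digits("Call 555-0199")
without_digits = mask_digits("No digits here")
print(with_digits)
print(without_digits)
```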
Conclusion
In this Python Regular Expression tutorial, you learned how to replace the parts of a string that match a pattern. To do this, we learned two regex methods, sub() and subn(). The sub() method accepts a regular expression pattern, replaces the matches in the string with the replacement string or the result of a function, and returns the new string. The subn() method is similar to the sub() method, but it returns a tuple containing two items: the new string and the number of replacements made.