Understanding Python's re Module and Regular Expressions

This article introduces Python's re module, explaining regular expression fundamentals, key functions such as match, search, compile, sub, findall, finditer, and split, detailing their syntax, parameters, flags, and providing numerous code examples to illustrate pattern matching, searching, replacing, and splitting strings.

Python Programming Learning Circle

Feb 9, 2020

Regular expressions are special character sequences that help you conveniently check whether a string matches a certain pattern.

The re module was added in Python 1.5 and provides Perl-style regular expression patterns.

The re module gives Python full regular expression capabilities.

The compile function creates a regular expression object from a pattern string and optional flag arguments; the object has methods for matching and substitution.

The re module also offers functions that perform the same operations directly using a pattern string as the first argument.
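As a rough illustration of the two calling styles described above, the sketch below runs the same search once through the module-level function and once through a compiled pattern object. The sample text and variable names are invented for this example.

```python
import re

text = "order 66 shipped, order 77 pending"  # made-up sample text

# Module-level call: the pattern string is the first argument.
direct = re.search(r"\d+", text)

# Compiled form: build the pattern object once, then call its methods.
number = re.compile(r"\d+")
compiled = number.search(text)

# Both approaches find the same first match.
print(direct.group(), compiled.group())
print(direct.span() == compiled.span())
```

Compiling is mostly a convenience when the same pattern is reused many times; for a one-off search the module-level function is usually enough.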

>> re.match function

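re.match attempts to match a pattern at the beginning of a string; if the pattern does not match at the start, match() returns None.

Syntax: re.match(pattern, string, flags=0)

Parameters:

pattern: the regex pattern string.

string: the string to match against.

flags: optional matching flags (for example re.I or re.M).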

If the match succeeds, re.match returns a match object; otherwise it returns None.

You can use group(num) or groups() on the match object to retrieve the matched expressions.

Example

#!/usr/bin/python3
import re
print(re.match('www', 'www.runoob.com').span())  # matches at start
print(re.match('com', 'www.runoob.com'))         # does not match at start

Output:

(0, 3)
None

Another example:

#!/usr/bin/python3
import re
line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)
if matchObj:
    print ("matchObj.group() : ", matchObj.group())
    print ("matchObj.group(1) : ", matchObj.group(1))
    print ("matchObj.group(2) : ", matchObj.group(2))
else:
    print ("No match!!")

Output:

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter

>> re.search method

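re.search scans the entire string and returns the first successful match.

Syntax: re.search(pattern, string, flags=0)

Parameters:

pattern: the regex pattern string.

string: the string to search.

flags: optional matching flags (for example re.I or re.M).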

If the match succeeds, re.search returns a match object; otherwise None.

You can use group(num) or groups() on the match object to retrieve the matched expressions.
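A minimal sketch of re.search together with group(num) and groups(), reusing the sentence from the re.match example. The pattern is an illustrative choice, not taken from the original article.

```python
import re

line = "Cats are smarter than dogs"

# search() scans forward, so the pattern need not start at index 0.
found = re.search(r"(\w+) than (\w+)", line)

if found:
    print(found.group())    # whole match
    print(found.group(1))   # first parenthesized group
    print(found.groups())   # all groups as a tuple
else:
    print("No match")
```

Checking the result before calling group() matters because a failed search returns None, which has no group method.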

>> Difference between re.match and re.search

re.match only matches at the start of the string; if the start does not satisfy the pattern, it fails and returns None. re.search scans the whole string until a match is found.

#!/usr/bin/python3
import re
line = "Cats are smarter than dogs"
matchObj = re.match(r'dogs', line, re.M|re.I)
if matchObj:
    print ("match --> matchObj.group() : ", matchObj.group())
else:
    print ("No match!!")
matchObj = re.search(r'dogs', line, re.M|re.I)
if matchObj:
    print ("search --> matchObj.group() : ", matchObj.group())
else:
    print ("No match!!")

Output:

No match!!
search --> matchObj.group() :  dogs

>> Search and Replace

Python's re module provides re.sub to replace matched substrings in a string.

Syntax: re.sub(pattern, repl, string, count=0, flags=0)

Parameters:

pattern: the regex pattern string.

repl: the replacement string or a function.

string: the original string to be processed.

count: maximum number of replacements (0 means replace all).

flags: optional matching flags.

The first three parameters are required; count and flags are optional.

#!/usr/bin/python3
import re
phone = "2004-959-559 # This is a phone number"
# Remove the trailing comment
num = re.sub(r'#.*$', "", phone)
print("Phone number : ", num)
# Remove everything that is not a digit
num = re.sub(r'\D', "", phone)
print("Phone number : ", num)

Output:

Phone number :  2004-959-559
Phone number :  2004959559
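The optional count and flags parameters can be sketched as follows. The sample text is invented for illustration; both optional arguments are passed by keyword, which keeps the call unambiguous.

```python
import re

text = "Spam spam SPAM eggs"  # made-up sample text

# count=0 (the default) replaces every match.
all_replaced = re.sub(r"spam", "ham", text, flags=re.I)

# count=2 stops after the first two matches, scanning left to right.
two_replaced = re.sub(r"spam", "ham", text, count=2, flags=re.I)

print(all_replaced)
print(two_replaced)
```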

>> repl parameter can be a function

In the following example, matched numbers are multiplied by 2.

#!/usr/bin/python3
import re
# Multiply matched numbers by 2
def double(matched):
    # matched is a match object; 'value' is the named group in the pattern
    value = int(matched.group('value'))
    return str(value * 2)
s = 'A23G4HFD567'
print(re.sub(r'(?P<value>\d+)', double, s))

Output:

A46G8HFD1134

>> compile function

The compile function compiles a regular expression into a pattern object for use with match() and search().

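Syntax: re.compile(pattern[, flags])

Parameters:

pattern: a string representing the regular expression.

flags (optional): matching modes, combined with | when more than one is needed. Common flags include re.I (ignore case), re.L (locale-dependent matching, bytes patterns only), re.M (multiline: ^ and $ also match at each line boundary), re.S (. also matches a newline), re.U (Unicode matching, already the default for str patterns in Python 3), and re.X (verbose: whitespace and comments are allowed inside the pattern).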

>>> import re
>>> pattern = re.compile(r'\d+')
>>> m = pattern.match('one12twothree34four')        # no match at start
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10) # start at 'e', no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10) # start at '1', matches
>>> print(m)
<re.Match object; span=(3, 5), match='12'>
>>> m.group(0)
'12'
>>> m.start(0)
3
>>> m.end(0)
5
>>> m.span(0)
(3, 5)

When a match succeeds, a Match object is returned. It provides:

group([group1, …]): returns the matched substring(s).

start([group]): start index of the matched substring.

end([group]): end index (exclusive) of the matched substring.

span([group]): tuple of (start, end).

Another example:

>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)
>>> m = pattern.match('Hello World Wide Web')
>>> print(m)
<re.Match object; span=(0, 11), match='Hello World'>
>>> m.group(0)
'Hello World'
>>> m.span(0)
(0, 11)
>>> m.group(1)
'Hello'
>>> m.span(1)
(0, 5)
>>> m.group(2)
'World'
>>> m.span(2)
(6, 11)
>>> m.groups()
('Hello', 'World')
>>> m.group(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group

>> findall

Find all substrings matching the regex in a string and return them as a list; returns an empty list if no matches are found.

Note: match and search perform a single match, while findall finds all.

Syntax: re.findall(pattern, string, flags=0), or pattern.findall(string[, pos[, endpos]]) on a compiled pattern.

Parameters of the compiled-pattern form:

string – the target string.

pos (optional) – start position (default 0).

endpos (optional) – end position (default length of string).

Example – find all numbers:

import re
pattern = re.compile(r'\d+')
result1 = pattern.findall('runoob 123 google 456')
result2 = pattern.findall('run88oob123google456', 0, 10)
print(result1)
print(result2)

Output:

['123', '456']
['88', '12']
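One detail worth a quick sketch: when the pattern contains capture groups, findall returns the group text rather than the whole match. The sample string below is invented for illustration.

```python
import re

text = "width=640 height=480"  # made-up sample text

# No groups: each list item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# One group: each item is just that group's text.
values = re.findall(r"\w+=(\d+)", text)

# Two groups: each item is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)
print(values)
print(pairs)
```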

>> re.finditer

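Similar to findall, but returns an iterator yielding match objects.

Syntax: re.finditer(pattern, string, flags=0)

Parameters:

pattern: the regex pattern string.

string: the string to search.

flags: optional matching flags.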

Example:

import re
it = re.finditer(r"\d+","12a32bc43jf3")
for match in it:
    print (match.group() )

Output:

12
32
43
3
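Because finditer yields match objects rather than plain strings, position information is available for every match. A small sketch using the same input string as the example above:

```python
import re

text = "12a32bc43jf3"

# Each item is a match object, so span() can be read alongside group().
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

for value, (start, end) in found:
    print(value, start, end)
```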

>> re.split

The split method splits a string by the occurrences of the pattern and returns a list.

Syntax:

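re.split(pattern, string, maxsplit=0, flags=0)

Parameters:

pattern: the regex pattern string.

string: the string to split.

maxsplit: maximum number of splits (0, the default, means no limit).

flags: optional matching flags.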

Example:

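>>> import re
>>> re.split(r'\W+', 'runoob, runoob, runoob.')
['runoob', 'runoob', 'runoob', '']
>>> re.split(r'(\W+)', ' runoob, runoob, runoob.')   # a capture group keeps the separators
['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
>>> re.split(r'\W+', ' runoob, runoob, runoob.', maxsplit=1)
['', 'runoob, runoob, runoob.']
>>> re.split(r'a+', 'hello world')   # no match, so the string is not split
['hello world']
>>> # A pattern that can match the empty string, such as 'a*', splits between every character in Python 3.7+.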

>> Regular expression objects

re.compile() returns a pattern object (re.Pattern in Python 3; older documentation calls it RegexObject).

A match object (re.Match in Python 3; formerly MatchObject) provides:

group(): returns the matched string.

start(): returns the start position of the match.

end(): returns the end position of the match.

span(): returns a tuple (start, end).

>> Regular expression flags – optional modifiers

Regular expressions can include optional flag modifiers to control matching behavior. Multiple flags can be combined using bitwise OR (|).

For example, re.I | re.M sets both the I and M flags.
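A short sketch of what combining re.I and re.M changes in practice. The three-line sample text is invented for illustration.

```python
import re

text = "first line\nSecond line\nsecond try"  # made-up sample text

# Without flags, ^ anchors only at the very start of the string
# and letters are case-sensitive, so nothing is found.
plain = re.findall(r"^second", text)

# re.M alone lets ^ match at the start of every line.
multiline_only = re.findall(r"^second", text, re.M)

# re.I | re.M adds case-insensitive matching on top of that.
both = re.findall(r"^second", text, re.I | re.M)

print(plain)
print(multiline_only)
print(both)
```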

>> Regular expression patterns

Pattern strings use special syntax to represent a regular expression.

Most letters and digits match themselves. Preceded by a backslash, some of them take on a special meaning; for example, \d matches any digit.

Many punctuation characters (such as . * + ? ( ) [ ] ^ $ |) have special meanings and match themselves only when escaped with a backslash.

A backslash itself must be escaped. Because regular expressions often contain backslashes, it is best to use raw strings (e.g., r'\t').
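The difference between ordinary and raw string literals can be checked directly. This is only a sketch; the Windows-style path is a made-up example.

```python
import re

# In a normal literal, "\\d+" is needed to produce backslash + d;
# a raw string writes the same characters directly.
same_pattern = ("\\d+" == r"\d+")

# "\t" is a single tab character, while r"\t" is backslash + t.
tab_len = len("\t")
raw_len = len(r"\t")

# To match a literal backslash the regex needs \\ ,
# which a raw string writes as r"\\".
path = r"C:\temp\notes.txt"  # hypothetical path
parts = re.split(r"\\", path)

print(same_pattern, tab_len, raw_len)
print(parts)
```

Without the raw prefix, the last pattern would have to be written as "\\\\", which is why raw strings are the usual choice for regex patterns.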

>> Regular expression examples

Character matching

Character classes

Special character classes
