Master Python’s re Module: Essential Regex Techniques Explained

This article provides a comprehensive guide to Python’s re module, covering regex definitions, common methods, special character sets, pattern‑matching functions, match object attributes, and practical code examples for tasks such as validating phone numbers, IP addresses, and HTML snippets.

Ops Development Stories

re Regex Handling

Regex Definition

A regular expression is a pattern that describes a set of strings. It combines literal characters with predefined special characters into a rule string that expresses matching and filtering logic for text.

Common regex methods

re.compile – compile a pattern into a regex object

pattern.match – match from the start of a string

pattern.search – find the first match anywhere

pattern.findall – return all matches

pattern.sub – replace matches
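As a rough illustration of how the methods listed above relate to each other, the following sketch runs them against one compiled pattern. The sample string and the variable names are made up for this example.

```python
import re

# Compile once, then reuse the pattern object for every call.
number = re.compile(r'\d+')
text = 'order 66 shipped 12 items'  # illustrative input

at_start = number.match(text)      # None: the string does not start with a digit
first = number.search(text)        # match object for the first run of digits
everything = number.findall(text)  # list of every run of digits
masked = number.sub('#', text)     # every run of digits replaced by '#'

print(at_start)       # None
print(first.group())  # 66
print(everything)     # ['66', '12']
print(masked)         # order # shipped # items
```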

Special character set

Key metacharacters include:

. – any character except newline

^ – start of string

$ – end of string

* – zero or more repetitions, greedy

+ – one or more repetitions

? – zero or one

{m} – exactly m repetitions

{m,n} – between m and n repetitions

*?, +?, {m,n}? – non‑greedy variants of the quantifiers above

Escape sequences such as \d (digits), \w (word characters), and \s (whitespace), and assertions like \b (word boundary), are also essential.
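The difference between greedy and non‑greedy quantifiers, and the effect of \b, is easiest to see side by side. This is a small sketch with made-up input strings, not an exhaustive tour of the syntax.

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # illustrative input

# Greedy: .+ grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r'<.+>', html)
# Non-greedy: .+? stops at the first '>' that lets the match succeed.
lazy = re.findall(r'<.+?>', html)

# \b keeps 'cat' from matching inside 'concatenate'.
words = re.findall(r'\bcat\b', 'cat concatenate cat.')

# {m,n} bounds the repetition count: standalone runs of 2 to 3 digits.
runs = re.findall(r'\b\d{2,3}\b', '7 42 123 4567')

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
print(words)   # ['cat', 'cat']
print(runs)    # ['42', '123']
```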

Regex methods

re.compile(pattern, flags=0)

<code>>>> import re
>>> comp = re.compile(r'\d+')
>>> ret = comp.match('123456')
>>> ret.group()
'123456'</code>

Equivalent to:

<code>>>> ret = re.match(r'\d+', '123456')</code>

re.search(pattern, string, flags=0) – scans the string for the first location where the pattern matches and returns a match object, or None if nothing matches.

re.match(pattern, string, flags=0) – matches only at the beginning of the string; returns None otherwise.

re.fullmatch(pattern, string, flags=0) – succeeds only if the pattern matches the entire string.
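A short sketch of how the three functions differ on the same input. The sample strings are arbitrary.

```python
import re

pattern = r'\d+'
s = 'abc123'  # illustrative input

m_match = re.match(pattern, s)          # None: 'a' is not a digit
m_search = re.search(pattern, s)        # finds '123' further into the string
m_full = re.fullmatch(pattern, s)       # None: 'abc' is not covered by \d+
m_full_ok = re.fullmatch(pattern, '123')

print(m_match)             # None
print(m_search.group())    # 123
print(m_full)              # None
print(m_full_ok.group())   # 123
```

Because all three return None on failure, it is usually safer to test the result before calling .group() on it.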

re.split(pattern, string, maxsplit=0, flags=0)

<code>>>> re.split(r'\W+', 'Words words wordS')
['Words', 'words', 'wordS']
>>> re.split(r'\W+', 'Words words wordS', maxsplit=1)
['Words', 'words wordS']
>>> re.split(r'\d+', '1q2W3e4R', flags=re.IGNORECASE)
['', 'q', 'W', 'e', 'R']</code>
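One behavior the examples above do not show: when the pattern contains a capturing group, re.split also returns the text matched by that group. A minimal sketch, with a made-up input string:

```python
import re

text = 'Words, words, words.'  # illustrative input

# Without a capturing group the separators are dropped.
plain = re.split(r'\W+', text)
# With a capturing group the separators are kept in the result.
kept = re.split(r'(\W+)', text)
# maxsplit limits the number of splits; the remainder stays in one piece.
limited = re.split(r'\W+', text, maxsplit=1)

print(plain)    # ['Words', 'words', 'words', '']
print(kept)     # ['Words', ', ', 'words', ', ', 'words', '.', '']
print(limited)  # ['Words', 'words, words.']
```

The trailing empty string appears because the final '.' is itself a separator with nothing after it.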

re.findall(pattern, string, flags=0)

<code>>>> re.findall(r'\d+', '123,456')
['123', '456']
>>> re.findall(r'(\d+)(\w+)', '123qw,werrc')
[('123', 'qw')]
>>> re.findall(r'(\d+)|(\w+)', '123qw,werrc')
[('123', ''), ('', 'qw'), ('', 'werrc')]</code>

re.finditer(pattern, string, flags=0)

<code>>>> for i in re.finditer(r'\d+', '123456'):
...     print(i.group())
...
123456</code>
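Unlike re.findall, re.finditer yields match objects, so positions are available alongside the matched text. A small sketch with an arbitrary input string:

```python
import re

text = 'a1b22c333'  # illustrative input

# Each item from finditer is a match object, so span() is available.
positions = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

print(positions)  # [('1', (1, 2)), ('22', (3, 5)), ('333', (6, 9))]
```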

re.sub(pattern, repl, string, count=0, flags=0)

<code>>>> re.sub(r'(\d+) (\w+)', r'\2 \1', '12345 asdfd')
'asdfd 12345'</code>

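If repl is a function, it is called once per match with the match object as its only argument, and its return value is used as the replacement string.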

<code>>>> def mat(m):
...     if m.group(2) == '1234':
...         return m.group(1)
...     else:
...         return '1234'
...
>>> re.sub(r'(\d+) (\d+)', mat, '123 1234qer')
'123qer'</code>

re.subn(pattern, repl, string, count=0, flags=0)

<code>>>> re.subn(r'(\d+) (\d+)', mat, 'as123 1234qer')
('as123qer', 1)</code>
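To make the roles of count and of the tuple returned by re.subn more concrete, here is a sketch using a hypothetical replacement function named double. The input string is made up.

```python
import re

def double(m):
    # Hypothetical helper: replace each number with twice its value.
    return str(int(m.group()) * 2)

text = '3 apples, 10 pears'  # illustrative input

all_doubled = re.sub(r'\d+', double, text)
first_only = re.sub(r'\d+', double, text, count=1)  # stop after one replacement
result, n = re.subn(r'\d+', double, text)           # also report how many were made

print(all_doubled)  # 6 apples, 20 pears
print(first_only)   # 6 apples, 10 pears
print(result, n)    # 6 apples, 20 pears 2
```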

Match object

match.group([group1, …]) – returns the matched subgroup(s).

match.groups(default=None) – returns a tuple of all subgroups.

match.groupdict(default=None) – returns a dict of named groups.

match.start([group]) / match.end([group]) – start and end indices of a group.

match.span([group]) – tuple of (start, end) indices.

match.lastindex – index of the last matched group.

match.lastgroup – name of the last matched named group.
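The attributes above can be tried out on a single match. The pattern, group names, and input below are chosen only for illustration.

```python
import re

m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'released 2024-05 worldwide')

print(m.group())         # 2024-05  (group 0 is the whole match)
print(m.group(1, 2))     # ('2024', '05')
print(m.group('year'))   # 2024
print(m.groups())        # ('2024', '05')
print(m.groupdict())     # {'year': '2024', 'month': '05'}
print(m.span())          # (9, 16)
print(m.start('month'), m.end('month'))  # 14 16
print(m.lastindex)       # 2
print(m.lastgroup)       # month

# default fills in groups that did not take part in the match.
version = re.match(r'(\d+)(?:\.(\d+))?', '24')
print(version.groups(default='0'))  # ('24', '0')
```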

Simple examples

Match characters following "123":

<code>>>> re.search(r'(?<=123)\w+', '123asd,wer').group(0)
'asd'</code>

Match characters after "123" and before "_":

<code>>>> re.search(r'(?<=123)\w+(?=_)', '123asd_123wer').group(0)
'asd'</code>

Match mobile numbers:

<code>>>> re.match(r'1[3578]\d{9}', '13573528479').group()
'13573528479'</code>
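For validation, as opposed to extraction, re.fullmatch is usually the better fit because it rejects trailing characters. The sketch below uses a hypothetical helper, is_mobile, and assumes the same second-digit set (3, 5, 7, 8) as the example; that set is not a statement about any current numbering plan.

```python
import re

# Assumption: 11 digits, leading 1, second digit one of 3, 5, 7, 8.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
MOBILE = re.compile(r'1[3578][0-9]{9}')

def is_mobile(s):
    """Return True if s is exactly one number of the assumed shape."""
    return MOBILE.fullmatch(s) is not None

print(is_mobile('13573528479'))    # True
print(is_mobile('135735284790'))   # False: one digit too many
print(is_mobile('1,573528479'))    # False: a comma is not a valid second digit
print(is_mobile(''))               # False: empty input
```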

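Match telephone numbers (re.match does not anchor the end of the string, so the second alternative matches only the first seven digits after the hyphen):

<code>>>> re.match(r'\d{3}-\d{8}|\d{4}-\d{7}', '0531-82866666').group()
'0531-8286666'</code>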

Match IP addresses:

<code>>>> re.match(r'\d+\.\d+\.\d+\.\d+', '192.168.10.25').group()
'192.168.10.25'</code>
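The pattern above also accepts strings such as 999.1.1.1, since \d+ places no limit on each part. One possible stricter variant restricts every octet to 0 through 255 and uses re.fullmatch. The helper name is_ipv4 is hypothetical, and this is a sketch rather than a complete address validator.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
IPV4 = re.compile(rf'(?:{OCTET}\.){{3}}{OCTET}')

def is_ipv4(s):
    """Return True if s is a dotted-quad address with every octet in 0..255."""
    return IPV4.fullmatch(s) is not None

print(is_ipv4('192.168.10.25'))  # True
print(is_ipv4('999.1.1.1'))      # False: 999 is out of range
print(is_ipv4('256.1.1.1'))      # False: 256 is out of range
print(is_ipv4('1.2.3'))          # False: only three octets
print(is_ipv4('1.2.3.04'))       # False: leading zero rejected by this sketch
```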

Match NetEase email addresses:

<code>>>> re.findall(r'\w+@163\.com|\w+@126\.com', '[email protected] [email protected]')
['[email protected]', '[email protected]']</code>

Match HTML text:

<code>>>> re.match(r'<(\w*)><(\w*)>.*</\2></\1>', '<body><h2>wahaha5354</h2></body>').group()
'<body><h2>wahaha5354</h2></body>'</code>
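The numbered backreferences \1 and \2 can also be written with named groups, which some readers find easier to follow. The group names below (outer, inner, text) are chosen for illustration, and a regex like this only suits small, well-formed snippets rather than general HTML.

```python
import re

# (?P<name>...) defines a named group; (?P=name) matches the same text again.
NESTED = re.compile(
    r'<(?P<outer>\w+)><(?P<inner>\w+)>(?P<text>.*?)</(?P=inner)></(?P=outer)>'
)

ok = NESTED.match('<body><h2>wahaha5354</h2></body>')
bad = NESTED.match('<body><h2>wahaha5354</h3></body>')  # closing tag does not match

print(ok.groupdict())  # {'outer': 'body', 'inner': 'h2', 'text': 'wahaha5354'}
print(bad)             # None
```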
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
