
Python Regular Expressions: Syntax, Character Classes, Quantifiers, Groups, Assertions, Flags, and Usage

This article provides a comprehensive overview of Python's regular expression features, covering special characters, character classes, quantifiers, grouping, backreferences, assertions, conditional matching, flags, and the most commonly used methods of the re module with illustrative code examples.

Python Programming Learning Circle

1. Regular Expression Syntax

Special characters: \ . ^ $ ? + * { } [ ] ( ) | . To match any of them literally, escape it with a backslash (for example, \. matches a literal dot).

Character classes (enclosed in [] ) match any one of the characters inside; a range such as [a-zA-Z0-9] covers a-z, A-Z, and 0-9. A leading ^ inside the class negates it (e.g., [^0-9] matches any character that is not a digit). Inside a class, most special characters lose their special meaning; the exceptions are \ , ^ (only in the first position), - (between two characters, where it forms a range), and ] (which closes the class).

Shorthand classes such as \d (digit), \s (whitespace), and \w (word character) can be used both outside and inside a character class.
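The rules above can be checked with a few short calls. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Escaping: "\." matches a literal dot, not "any character".
dots = re.findall(r"\.", "a.b.c")

# A range class and its negation.
alnum = re.findall(r"[a-zA-Z0-9]", "a-B_9!")
non_digits = re.findall(r"[^0-9]", "a1b2")

# Inside a class, "." and "+" are ordinary characters.
literal_in_class = re.findall(r"[.+]", "1.5+2")

# Shorthand classes, outside and inside a class.
numbers = re.findall(r"\d+", "tel 555 0199")
words = re.findall(r"[\w-]+", "well-known fact")

print(dots, alnum, non_digits, literal_in_class, numbers, words)
```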

2. Quantifiers

? – 0 or 1 occurrence

* – 0 or more occurrences

+ – 1 or more occurrences

{m} – exactly m occurrences

{m,} – at least m occurrences

{,n} – at most n occurrences

{m,n} – between m and n occurrences (inclusive)

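All quantifiers are greedy by default: they match as much text as possible. Appending ? (as in *? , +? , ?? , or {m,n}? ) makes them non-greedy, so they match as little as possible.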
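The difference between greedy and non-greedy matching is easiest to see on input with repeated delimiters. The following sketch uses an invented HTML-like string and a few counted quantifiers; all names are illustrative.

```python
import re

html = "<b>one</b><b>two</b>"  # made-up sample input

# Greedy: .* runs to the last "</b>", so there is a single match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first "</b>", giving one match per tag.
lazy = re.findall(r"<b>.*?</b>", html)

# Counted repetition.
four_digits = re.findall(r"\b\d{4}\b", "12 1234 123456")
two_to_three = re.findall(r"\d{2,3}", "1 22 333 4444")
at_most_two = re.fullmatch(r"a{,2}", "aa") is not None
too_many = re.fullmatch(r"a{,2}", "aaa") is not None

print(greedy, lazy, four_digits, two_to_three, at_most_two, too_many)
```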

3. Groups and Capturing

Parentheses ( ) capture the matched sub‑expression for later use; non‑capturing groups use (?:...) .

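Captured groups can be referenced later in the pattern by number (e.g., \1 ) or by name: (?P<name>...) defines a named group and (?P=name) refers back to it.

Note: Backreferences cannot be used inside a character class. Within [] , a sequence such as \1 is read as a numeric character escape, not as a reference to group 1.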
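A short sketch of numbered and named backreferences, plus the effect of a non-capturing group on findall(). The input strings are invented for the example.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a test")

# Named group and named backreference: the closing tag must equal the opening one.
tag = re.search(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>", "<b>bold</b> <i>it</b>")

# With a capturing group, findall() returns the group (its last repetition);
# with a non-capturing group, it returns the whole match.
capturing = re.findall(r"(ab)+", "ababab ab")
non_capturing = re.findall(r"(?:ab)+", "ababab ab")

print(doubled.group(1), tag.group("tag"), tag.group("body"))
print(capturing, non_capturing)
```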

4. Assertions and Anchors

\b – word boundary (outside [] ), \B – non‑word boundary

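\A – start of string only; ^ – start of string, and with re.MULTILINE also the start of each line

\Z – end of string only; $ – end of string (or just before a trailing newline), and with re.MULTILINE also the end of each line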

Positive lookahead (?=e) , negative lookahead (?!e)

Positive lookbehind (?<=e) , negative lookbehind (?<!e) ; in the re module, the lookbehind expression e must match text of a fixed length
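The assertions above consume no characters; they only test the position. A minimal sketch with invented sample text:

```python
import re

text = "cat concat\ncatalog"

# Word boundaries: only the standalone word matches.
whole_word = re.findall(r"\bcat\b", text)

# ^ matches at every line start only with re.MULTILINE; \A never does.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
string_start = re.findall(r"\A\w+", text, re.MULTILINE)

prices = "10 USD, 20 EUR, 30 USD"
usd = re.findall(r"\d+(?= USD)", prices)            # followed by " USD"
not_usd = re.findall(r"\b\d+\b(?! USD)", prices)    # not followed by " USD"

order = "cost $15, qty 3"
after_dollar = re.findall(r"(?<=\$)\d+", order)     # preceded by "$"
no_dollar = re.findall(r"(?<!\$)\b\d+", order)      # not preceded by "$"

print(whole_word, line_starts, string_start, usd, not_usd, after_dollar, no_dollar)
```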

5. Conditional Matching

Syntax: (?(id)yes_exp|no_exp) – id is a group number or name; if that group has matched, the pattern continues with yes_exp , otherwise with no_exp . The |no_exp part is optional.
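As an illustration, the sketch below accepts a number either bare or wrapped in parentheses, but rejects an unbalanced parenthesis. The pattern and test strings are assumptions chosen for the example, and no_exp is omitted.

```python
import re

# Group 1 is the optional "(". If it matched, a ")" is required afterwards.
rx = re.compile(r"(\()?\d+(?(1)\))")

results = {s: rx.fullmatch(s) is not None for s in ["42", "(42)", "(42", "42)"]}
print(results)
```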

6. Regular Expression Flags

Pass flags to re.compile() using bitwise OR, e.g., re.compile(r"pattern", re.IGNORECASE|re.MULTILINE) .

Or embed flags inline with (?flags) at the start of the pattern, e.g., (?ms)pattern .

Common flags: re.A / re.ASCII , re.I / re.IGNORECASE , re.M / re.MULTILINE , re.S / re.DOTALL , re.X / re.VERBOSE .
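A brief sketch of how a few of these flags change matching, and of the equivalence between compile-time and inline flags. The sample text is made up.

```python
import re

text = "Alpha\nbeta\nGAMMA"

# IGNORECASE: letters match regardless of case.
ci = re.findall(r"alpha|gamma", text, re.IGNORECASE)

# Flags combined with | and the equivalent inline form.
combined = re.compile(r"^[a-z]+$", re.IGNORECASE | re.MULTILINE).findall(text)
inline = re.findall(r"(?im)^[a-z]+$", text)

# DOTALL: "." also matches a newline.
dot_default = re.search(r"Alpha.beta", text)
dot_all = re.search(r"Alpha.beta", text, re.DOTALL)

print(ci, combined, inline, dot_default, dot_all is not None)
```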

import re

img_rx = re.compile(r"""
    <img\s+                        # start of tag
    [^>]*?                         # any attributes before src
    src=                           # start of src attribute
    (?:
        (?P<quote>["'])            # opening quote
        (?P<image_name>[^"'>]+?)   # image name: no quotes or >
        (?P=quote)                 # closing quote
    )
""", re.VERBOSE | re.IGNORECASE)

7. Using the re Module in Python

Four main tasks:

Match – test whether a string conforms to a pattern.

Search – locate substrings that match.

Replace – substitute matching parts with new text.

Split – divide a string using a pattern.

Two ways to use the module:

Compile a pattern with re.compile(pattern, flags) to obtain a regex object and call its methods repeatedly.

Use module‑level functions like re.search() , re.sub() directly; suitable for one‑off use.
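The two styles might look like this in practice. The log-like lines and the name id_rx are hypothetical.

```python
import re

lines = ["id=17", "name=x", "id=42"]  # made-up input

# Style 1: compile once, then reuse the regex object in a loop.
id_rx = re.compile(r"id=(\d+)")
ids = []
for line in lines:
    m = id_rx.search(line)
    if m:  # search() returns None when nothing matches
        ids.append(int(m.group(1)))

# Style 2: a module-level function for a one-off operation.
cleaned = re.sub(r"\s+", " ", "a   b \t c")

print(ids, cleaned)
```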

Common regex object methods:

rx.findall(s, start, end) – returns a list of all matches: whole-match strings if the pattern has no groups, the group's text if it has one group, or tuples if it has several.

rx.finditer(s, start, end) – returns an iterator of match objects.

rx.search(s, start, end) – returns the first match object or None .

rx.match(s, start, end) – matches only at the start of the string.

rx.sub(repl, s, count) – returns a new string with replacements; repl can be a function.

rx.subn(repl, s, count) – like sub but also returns the number of substitutions.

rx.split(s, maxsplit) – splits the string; captured groups appear in the result list.

import re

rx = re.compile(r"(\d)[a-z]+(\d)")
s = "ab12dk3klj8jk9jks5"
result = rx.split(s)  # text between matches, plus both captured digits

Result: ['ab1', '2', '3', 'klj', '8', '9', 'jks5']
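The other methods listed above can be sketched the same way. The pattern and input below are assumptions chosen for illustration.

```python
import re

rx = re.compile(r"(\w+)=(\d+)")
s = "a=1, b=22, c=333"

pairs = rx.findall(s)                        # tuples, because there are two groups
from_pos = rx.findall(s, 5)                  # start scanning at index 5
spans = [m.span() for m in rx.finditer(s)]   # one match object per match
first = rx.search(s).group(0)

at_start = rx.match(s)                       # succeeds: s begins with "a=1"
not_at_start = rx.match(", b=22")            # None: the match must begin at the start

# repl as a function: double every number.
doubled = rx.sub(lambda m: f"{m.group(1)}={int(m.group(2)) * 2}", s)

# subn() also reports how many replacements were made.
limited, n = rx.subn("X", s, count=2)

print(pairs, from_pos, spans, first, doubled, limited, n)
```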

Regex objects also expose the attributes rx.flags and rx.pattern . Match objects provide the methods m.group() , m.groups() , m.start() , m.end() , and m.span() , and the attributes m.re , m.string , m.pos , and m.endpos . Note that the attributes are read without parentheses.
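A minimal sketch of reading these values from one match. The input string and the group names key and value are invented.

```python
import re

rx = re.compile(r"(?P<key>\w+)=(?P<value>\d+)", re.IGNORECASE)
text = "x: width=80;"
m = rx.search(text, 3)  # start searching at index 3

# Regex object attributes (no parentheses).
pattern_text = rx.pattern
ignores_case = bool(rx.flags & re.IGNORECASE)

# Match object methods.
whole = m.group()
key = m.group("key")
groups = m.groups()
span = m.span()
value_start = m.start("value")

# Match object attributes.
same_regex = m.re is rx
same_string = m.string == text
search_window = (m.pos, m.endpos)

print(whole, key, groups, span, value_start, search_window)
```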

8. Summary

match() and search() do not return True or False; they return a match object on success and None on failure, so test the result against None (or use it directly in an if statement).

Use search() or match() when you need only the first match; use finditer() or findall() when you need every match.

Use sub() or subn() for replacements; both the module-level functions and the regex object methods accept a callable as the replacement for dynamic substitution.

Use split() for dividing strings; if the pattern contains capturing groups, the captured text is included in the output list.
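The None check from the first point can be wrapped in a small helper when a plain boolean is needed. The function name is_hex_color and its pattern are hypothetical, shown only as a sketch.

```python
import re

def is_hex_color(text):
    """Return True if text looks like #RRGGBB (hypothetical helper)."""
    # match() anchors at the start; \Z anchors at the very end of the string.
    return re.match(r"#[0-9a-fA-F]{6}\Z", text) is not None

checks = [is_hex_color("#1a2B3c"), is_hex_color("#12345"), is_hex_color("x#123456")]
print(checks)
```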
