
A Comprehensive Guide to Using Regular Expressions in Python

This article introduces Python's built‑in re module: how to import it, write raw‑string patterns, and use common functions such as findall, match, search, sub, and split. It also covers compiling patterns, working with match objects, flags, meta‑characters, and Unicode encoding for robust text processing.

Test Development Learning Exchange

In Python, regular expressions (regex) are handled through the built‑in re module, which provides functions for pattern creation, compilation, and various string operations such as matching, searching, replacing, and splitting.

1. Import the re module

import re

2. Write a regex pattern

email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

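Using a raw string (prefix r) keeps backslashes literal, so sequences such as \. and \d reach the regex engine unchanged instead of being processed as Python string escapes. The pattern above is a simplified illustration and does not cover every valid email address.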
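To see why the raw-string prefix matters, the short sketch below compares the same word-boundary pattern written with and without it. The sample text is made up for illustration.

```python
import re

text = "a cat sat"  # illustrative sample text

# In a normal string literal, "\b" is the backspace character,
# so the regex engine never sees a word-boundary token.
plain = re.search("\bcat\b", text)

# In a raw string the backslash reaches the regex engine untouched,
# so \b is interpreted as a word boundary.
raw = re.search(r"\bcat\b", text)

print(plain)        # None
print(raw.group())  # cat
```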

3. Common re functions

findall() – returns all non‑overlapping matches as a list.

# The example.com addresses are placeholders.
matches = re.findall(email_pattern, "Contact us at support@example.com or sales@example.com")
print(matches)  # Output: ['support@example.com', 'sales@example.com']
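One detail worth knowing is that the shape of the list returned by findall() depends on whether the pattern contains capturing groups. A minimal sketch, using a made-up key=value string:

```python
import re

text = "width=80, height=24"  # illustrative sample text

# No groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

# finditer() yields match objects, which also expose positions.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(whole)      # ['width=80', 'height=24']
print(pairs)      # [('width', '80'), ('height', '24')]
print(positions)  # [('80', 6), ('24', 17)]
```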

match() – attempts to match a pattern at the start of a string.

result = re.match(r"Hello", "Hello world!")
if result:
    print("Match found:", result.group())  # Output: Match found: Hello
else:
    print("No match")

search() – scans the entire string and returns the first match.

result = re.search(r"world", "Hello world!")
if result:
    print("Found:", result.group())  # Output: Found: world
else:
    print("Not found")
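The difference between match() and search() is easiest to see on the same input. The sketch below also includes re.fullmatch(), which is not covered above but belongs to the same family: it succeeds only if the pattern covers the whole string.

```python
import re

text = "Hello world!"

at_start = re.match(r"world", text)         # "world" is not at the start
anywhere = re.search(r"world", text)        # found later in the string
partial = re.fullmatch(r"Hello", text)      # pattern does not cover everything
whole = re.fullmatch(r"Hello world!", text)

print(at_start)          # None
print(anywhere.group())  # world
print(partial)           # None
print(whole.group())     # Hello world!
```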

sub() – replaces matched substrings with a new string.

new_text = re.sub(r"\d+", "number", "There are 123 apples and 456 oranges.")
print(new_text)  # Output: There are number apples and number oranges.
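Beyond a fixed replacement string, sub() also accepts backreferences, a function, and a count limit. The following is a small sketch; the sample sentences and the helper name double are illustrative only.

```python
import re

# Backreferences: reorder an ISO date (YYYY-MM-DD) into DD/MM/YYYY.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Released on 2024-01-15.")

# A function as the replacement: double every number found.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "There are 123 apples and 456 oranges.")

# count limits how many replacements are made.
first_only = re.sub(r"\d+", "number", "There are 123 apples and 456 oranges.", count=1)

print(reordered)   # Released on 15/01/2024.
print(doubled)     # There are 246 apples and 912 oranges.
print(first_only)  # There are number apples and 456 oranges.
```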

split() – splits a string by the pattern and returns a list.

words = re.split(r"\W+", "Hello, how are you?")
print(words)  # Output: ['Hello', 'how', 'are', 'you', '']
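The empty string at the end of that result appears because the trailing "?" counts as a separator. A short sketch of how one might drop it, along with two other split() options (maxsplit and capturing groups):

```python
import re

text = "Hello, how are you?"

# Filter out the empty string left behind by the trailing separator.
words = [w for w in re.split(r"\W+", text) if w]

# maxsplit stops after the given number of splits.
head_tail = re.split(r"\W+", text, maxsplit=1)

# A capturing group keeps the separators in the result.
with_seps = re.split(r"(,)", "a,b,c")

print(words)      # ['Hello', 'how', 'are', 'you']
print(head_tail)  # ['Hello', 'how are you?']
print(with_seps)  # ['a', ',', 'b', ',', 'c']
```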

4. Compile a regex for performance

# Compile once, then reuse the pattern object; the example.com addresses are placeholders.
compiled_pattern = re.compile(email_pattern)
matches = compiled_pattern.findall("Contact us at support@example.com or sales@example.com")
print(matches)  # Output: ['support@example.com', 'sales@example.com']
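Because the re module already keeps a cache of recently compiled patterns, the gain from re.compile() tends to show up mainly when one pattern is reused many times, and it also gives the pattern a readable name. A minimal sketch, with a hypothetical ISO_DATE pattern and made-up log lines:

```python
import re

# Hypothetical pattern and data, for illustration only.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

lines = ["start 2024-01-15", "no date here", "end 2024-02-01"]

# The compiled object exposes the same operations as methods.
dates = []
for line in lines:
    found = ISO_DATE.search(line)
    if found:
        dates.append(found.group())

is_date = ISO_DATE.fullmatch("2024-01-15") is not None
not_date = ISO_DATE.fullmatch("2024-1-15") is not None

print(dates)     # ['2024-01-15', '2024-02-01']
print(is_date)   # True
print(not_date)  # False
```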

5. Using match objects for more information

result = re.search(r"(\w+) (\w+)", "John Doe")
if result:
    print("Full name:", result.group())          # Output: Full name: John Doe
    print("First name:", result.group(1))        # Output: First name: John
    print("Last name:", result.group(2))         # Output: Last name: Doe
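Match objects carry more than numbered groups. The sketch below shows named groups, groups(), groupdict(), and position methods; the group names first and last and the sample text are chosen here for illustration.

```python
import re

m = re.search(r"(?P<first>\w+) (?P<last>\w+)", "Name: John Doe")

first = m.group("first")   # access a group by name
both = m.groups()          # all groups as a tuple
as_dict = m.groupdict()    # named groups as a dict
whole_span = m.span()      # (start, end) of the full match
last_start = m.start("last")

print(first)       # John
print(both)        # ('John', 'Doe')
print(as_dict)     # {'first': 'John', 'last': 'Doe'}
print(whole_span)  # (6, 14)
print(last_start)  # 11
```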

6. Flags to modify regex behavior

case_insensitive_match = re.search("hello", "Hello World!", flags=re.IGNORECASE)
if case_insensitive_match:
    print("Case‑insensitive match found!")

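Common flags include re.IGNORECASE (or re.I) for case‑insensitive matching, re.MULTILINE (re.M) so that ^ and $ also match at the start and end of each line, and re.DOTALL (re.S) so that . also matches a newline.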
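As a rough illustration of re.MULTILINE and re.DOTALL, and of combining flags with the | operator, consider the sketch below. The sample strings are made up.

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, flags=re.M)

# Without re.S, the dot does not match the newline between the lines.
no_dotall = re.search(r"line.second", text)
dotall = re.search(r"line.second", text, flags=re.S)

# Flags can be combined with the bitwise OR operator.
combined = re.findall(r"^second", "first line\nSECOND line", flags=re.I | re.M)

print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']
print(no_dotall)         # None
print(repr(dotall.group()))  # 'line\nsecond'
print(combined)          # ['SECOND']
```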

7. Regex meta‑characters and special sequences

Understanding symbols such as ^, $, ., *, +, ?, {m,n}, [], (), and |, along with sequences like \d, \s, and \w, is essential for building effective patterns.
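A few one-line experiments can make these symbols concrete. The inputs below are made up, and each call isolates one or two of the meta-characters.

```python
import re

optional = re.findall(r"colou?r", "color colour")        # ? makes "u" optional
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")        # {m,n} bounds repetition
either = re.findall(r"cat|dog", "cat dog bird")          # | is alternation
vowels = re.findall(r"[aeiou]", "regex")                 # [] is a character class
anchors = re.findall(r"^\w+|\w+$", "first middle last")  # ^ start, $ end
greedy = re.findall(r"<.+>", "<a><b>")                   # + takes as much as it can
lazy = re.findall(r"<.+?>", "<a><b>")                    # +? takes as little as it can
spaces = re.findall(r"\s", "a b\tc")                     # \s is any whitespace

print(optional)  # ['color', 'colour']
print(bounded)   # ['22', '333', '444']
print(either)    # ['cat', 'dog']
print(vowels)    # ['e', 'e']
print(anchors)   # ['first', 'last']
print(greedy)    # ['<a><b>']
print(lazy)      # ['<a>', '<b>']
print(spaces)    # [' ', '\t']
```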

8. Handling Chinese characters and file encoding

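Save Python scripts in UTF‑8 encoding and make sure the terminal supports UTF‑8 so that Chinese characters display correctly. Python 3 already reads source files as UTF‑8 by default, so the # -*- coding: utf-8 -*- declaration at the top of the file is only needed for Python 2 or when the file uses a different encoding.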
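For matching Chinese text itself, one common approach is a character class over a Unicode range. The sketch below assumes the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is sufficient; rarer characters in the extension blocks are not covered. The sample text is made up.

```python
import re

text = "Python 正则表达式 regex 教程"

# Assumption: U+4E00..U+9FFF covers the characters of interest.
chinese_runs = re.findall(r"[\u4e00-\u9fff]+", text)

# For str patterns in Python 3, \w is Unicode-aware by default;
# re.ASCII restricts it to [a-zA-Z0-9_].
unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(chinese_runs)   # ['正则表达式', '教程']
print(unicode_words)  # ['Python', '正则表达式', 'regex', '教程']
print(ascii_words)    # ['Python', 'regex']
```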
