
Understanding Greedy and Non‑Greedy Matching in Regular Expressions

This article explains the difference between greedy and non‑greedy (lazy) matching in regular expressions, describes how quantifiers behave by default, shows how to switch to lazy mode using a trailing question mark, and provides multiple Python code examples illustrating both approaches.

Test Development Learning Exchange

Regular expressions use quantifiers such as *, +, ?, and {n,m} to specify how many times the preceding element may repeat. By default these quantifiers are greedy: they consume as many characters as possible while still letting the overall pattern match. Adding a trailing question mark makes a quantifier non‑greedy (lazy), so it consumes as few characters as possible instead.
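As a quick illustration, the sketch below runs each quantifier in both modes against a made-up string of four "a" characters. The sample string and the helper name show are assumptions chosen for this illustration only.

```python
import re

def show(pattern, text="aaaa"):
    """Return the text matched by the first occurrence of pattern."""
    return re.search(pattern, text).group()

# Greedy forms take as many repetitions as they can.
greedy = {p: show(p) for p in (r"a*", r"a+", r"a?", r"a{2,3}")}
# Lazy forms (same quantifier plus a trailing ?) take as few as they can.
lazy = {p: show(p) for p in (r"a*?", r"a+?", r"a??", r"a{2,3}?")}

print(greedy)  # {'a*': 'aaaa', 'a+': 'aaaa', 'a?': 'a', 'a{2,3}': 'aaa'}
print(lazy)    # {'a*?': '', 'a+?': 'a', 'a??': '', 'a{2,3}?': 'aa'}
```

Note that the lazy forms of * and ? are allowed to match the empty string, which is why two of the lazy results are empty.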

Greedy Matching

In greedy mode the engine expands the match to the longest possible string that still satisfies the pattern. The following Python example demonstrates this behavior when searching for HTML‑like tags.

import re
# The tag names below are illustrative.
text = "Here is some text with <div> and <span>."
pattern = r'<.*>'
match = re.search(pattern, text)
print(match.group())  # output: <div> and <span>

The pattern <.*> starts at the first '<' and continues until the last '>', capturing everything in between.
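One way to check this claim is to compare the span of the match with the positions of the first '<' and the last '>'. The sample string below is a hypothetical stand-in, and the variable names are illustrative.

```python
import re

# Hypothetical sample text containing two tags.
text = "Start <b>bold</b> and <i>italic</i> end"

match = re.search(r"<.*>", text)

first_open = text.index("<")    # position of the first '<'
last_close = text.rindex(">")   # position of the last '>'

# The greedy match spans from the first '<' through the last '>'.
print(match.span() == (first_open, last_close + 1))  # True
print(match.group())  # <b>bold</b> and <i>italic</i>
```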

Non‑Greedy (Lazy) Matching

Appending ? after a quantifier forces the engine to stop as soon as the rest of the pattern can be satisfied. The example below extracts each tag individually.

import re
# The tag names below are illustrative.
text = "Here is some text with <div> and <span>."
pattern = r'<.*?>'
matches = re.findall(pattern, text)
print(matches)  # output: ['<div>', '<span>']

Here the engine stops at the first closing '>', returning separate matches for each tag.
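To see exactly where each lazy match ends, re.finditer can report the span of every match. This is a small sketch on a hypothetical sample string, not the text used above.

```python
import re

# Hypothetical sample text with an opening and a closing tag.
text = "Start <b>bold</b> end"

# Each lazy match ends at the nearest '>' after its opening '<'.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"<.*?>", text)]
print(spans)  # [('<b>', 6, 9), ('</b>', 13, 17)]
```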

Key Points Summary

• Greedy quantifiers (*, +, ?, {n,m}) are the default and match as many characters as possible.
• Non‑greedy quantifiers (*?, +?, ??, {n,m}?) match the minimal number of characters needed.
• Choosing between them depends on the structure of the data you need to extract.

Additional Illustrative Examples

1. HTML tag matching (greedy vs lazy)

import re
# The <div> tags are illustrative.
text = "<div>First</div><div>Second</div>"
pattern_greedy = r'<div>.*</div>'
print(re.findall(pattern_greedy, text))  # ['<div>First</div><div>Second</div>']
pattern_lazy = r'<div>.*?</div>'
print(re.findall(pattern_lazy, text))  # ['<div>First</div>', '<div>Second</div>']

2. Matching repeated words

import re
text = "This is a test test sentence."
pattern_greedy = r"(\b\w+\b)\s+\1"
match = re.search(pattern_greedy, text)
print("Greedy:", match.group(0))  # Greedy: test test
pattern_lazy = r"(\b\w+\b)\s+?\1"
match = re.search(pattern_lazy, text)
print("Lazy:", match.group(0))  # Lazy: test test

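Both patterns produce the same result here. The backreference \1 can only match at the start of the second word, so \s+ and \s+? must both consume all of the whitespace between the two words, no more and no less.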
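The same holds when several whitespace characters separate the repeated words. A short check on a hypothetical sentence with three spaces between the words:

```python
import re

# Hypothetical input: three spaces between the repeated words.
text = "This is a test   test sentence."

greedy = re.search(r"(\b\w+\b)\s+\1", text).group(0)
lazy = re.search(r"(\b\w+\b)\s+?\1", text).group(0)

# \1 cannot begin on whitespace, so both modes must consume all three spaces.
print(greedy == lazy)  # True
print(repr(greedy))    # 'test   test'
```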

3. Extracting file names from paths

import re
path = "/home/user/documents/report.docx"
pattern_greedy = r".*/(.*)"
match = re.search(pattern_greedy, path)
print("Greedy file name:", match.group(1))  # report.docx
pattern_lazy = r".*?/(.*)"
match = re.search(pattern_lazy, path)
print("Lazy file name:", match.group(1))  # home/user/documents/report.docx

The greedy pattern captures everything after the last slash. The lazy pattern stops at the first slash, so its group captures everything after that first slash, which is the rest of the path. The choice of quantifier changes which slash acts as the separator.
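The greedy pattern can be wrapped in a small helper. The function name file_name is hypothetical, and the comparison against posixpath.basename from the standard library is added here only as a sanity check, not as part of the original example.

```python
import posixpath
import re

def file_name(path):
    """Return the text after the last '/' (hypothetical helper)."""
    # Greedy .* runs to the end, then backtracks to the last '/'.
    match = re.search(r".*/(.*)", path)
    # No slash at all: treat the whole string as the file name.
    return match.group(1) if match else path

for p in ("/home/user/documents/report.docx", "report.docx", "/tmp/"):
    # Print the extracted name and whether it agrees with posixpath.basename.
    print(repr(file_name(p)), file_name(p) == posixpath.basename(p))
```

For real path handling, the standard library's path functions are usually the safer choice; the regex version is shown only to illustrate the greedy quantifier.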

Conclusion

Greedy quantifiers (*, +, ?, {n,m}) are the default and consume as much text as the rest of the pattern allows; non‑greedy quantifiers (*?, +?, ??, {n,m}?) consume as little as the rest of the pattern allows. Understanding and selecting the appropriate mode allows precise control over pattern extraction in regular expressions.
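One caveat about "shortest": a lazy quantifier minimizes the match only from the point where it starts, and the engine still prefers the leftmost starting point. A small sketch on a hypothetical string with a stray '<':

```python
import re

# Hypothetical input with a stray '<' before the real tag.
text = "a < b and <i>"

# The match starts at the leftmost '<', then extends lazily to the first '>'.
result = re.search(r"<.*?>", text).group()
print(result)  # < b and <i>
```

The shorter candidate "<i>" is not returned, because the engine commits to the earliest position at which any match is possible.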
