Mastering Regular Expressions: Essential Rules and Advanced Techniques
This comprehensive guide explains what regular expressions are, outlines basic syntax, character classes, quantifiers, anchors, grouping, backreferences, lookahead/lookbehind assertions, and advanced options, providing practical examples to help developers validate, search, and manipulate strings effectively.
What Is a Regular Expression
A regular expression is a string pattern that describes a characteristic of text; a program then tests whether another string matches that pattern, for example with a call such as s.match("a"). It can validate strings, search within text, and perform flexible replacements.
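The three uses named above can be sketched with Python's standard re module. The article does not name a language, so Python is an assumption here, and the sample strings are made up for illustration.

```python
import re

# Validate: does the whole string consist of exactly four digits?
is_year = re.fullmatch(r"\d{4}", "2024") is not None

# Search: find the first run of digits inside a longer text.
found = re.search(r"\d+", "order 66 shipped")
first_number = found.group() if found else None

# Replace: collapse every run of whitespace into a single space.
cleaned = re.sub(r"\s+", " ", "a  b \t c")

print(is_year, first_number, cleaned)
```

re.fullmatch only succeeds when the pattern covers the entire string, which is usually what validation needs; re.search accepts a match anywhere in the text.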
Basic Rules
Literal Characters
Letters, digits, Chinese characters, underscores, and punctuation that have no special meaning match themselves; e.g., the pattern a matches the first "a" in "abcde".
Escape Characters
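A backslash introduces escape sequences for characters that are hard to type directly, e.g., \r (carriage return), \n (newline), \t (tab), and \\ (a literal backslash). Symbols that carry a special meaning in a pattern, such as ^, $, and ., must also be escaped to match literally: \^, \$, \.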
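A short sketch of why escaping matters, again assuming Python's re module. The sample strings are invented; re.escape is a standard-library helper that escapes literal text for you.

```python
import re

# An unescaped dot matches any character, so "3x14" is accepted too.
loose = re.search(r"3.14", "3x14") is not None

# Escaping the dot restricts the match to a literal ".".
strict = re.search(r"3\.14", "3x14") is not None

# \$ matches a literal dollar sign instead of the end of the string.
price = re.search(r"\$\d+", "cost: $25").group()

# re.escape produces the escaped form of arbitrary literal text.
escaped = re.escape("a.b")

print(loose, strict, price, escaped)
```

Raw string literals (the r"..." prefix) keep Python from interpreting the backslashes before the regex engine sees them.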
Character Classes
\d: any digit (0‑9)
\w: any word character (letters, digits, underscore)
\s: any whitespace character
.: any character except a newline
Custom classes can be defined with brackets, e.g., [123] matches "1", "2", or "3"; [^abc] matches any single character except "a", "b", or "c".
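The predefined and custom classes above can be tried out as follows. This is a minimal sketch in Python's re module with made-up input strings.

```python
import re

text = "id_7 = 42;"

digits = re.findall(r"\d", text)    # every single digit
words = re.findall(r"\w+", text)    # runs of word characters
spaces = re.findall(r"\s", text)    # every whitespace character

# The dot skips the newline unless a "dot matches all" flag is set.
no_newline = re.findall(r".", "a\nb")

# Custom classes: one of the listed characters, or anything but them.
only_123 = re.findall(r"[123]", "0123456")
not_abc = re.findall(r"[^abc]", "aXbYc")

print(digits, words, len(spaces), no_newline, only_123, not_abc)
```

In Python 3, \d and \w also match non-ASCII digits and letters in str patterns by default; passing re.ASCII limits them to the ranges listed above.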
Quantifiers
{n}: exactly n repetitions
{m,n}: between m and n repetitions
{m,}: at least m repetitions
?: 0 or 1 time
+: 1 or more times
*: 0 or more times
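Each quantifier in the list above can be checked against small inputs. The following is an illustrative sketch in Python's re module; the test strings are assumptions chosen to show the boundaries.

```python
import re

# {n}: exactly three digits, no more and no fewer.
exactly = re.fullmatch(r"\d{3}", "123") is not None
too_long = re.fullmatch(r"\d{3}", "1234") is not None

# {m,n}: two or three "a" characters per match.
ranged = re.findall(r"a{2,3}", "a aa aaa aaaa")

# ?: the "u" is optional.
optional = re.findall(r"colou?r", "color colour")

# +: at least one digit.
one_or_more = re.findall(r"\d+", "a1b22c")

# *: "b" may be absent entirely.
zero_or_more = re.fullmatch(r"ab*", "a") is not None

print(exactly, too_long, ranged, optional, one_or_more, zero_or_more)
```

Note that a{2,3} takes only three of the four characters in "aaaa", and the single leftover "a" is too short to match again.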
Special Symbols
^: start of a string (or line in multiline mode)
$: end of a string (or line in multiline mode)
\b: word boundary
|: alternation (OR)
( ): grouping, also captures matched substrings
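A small sketch of the symbols above, assuming Python's re module, where multiline mode is enabled with the re.MULTILINE flag. The input strings are hypothetical.

```python
import re

lines = "cat\ncatalog\nbobcat"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^cat", lines)

# With re.MULTILINE, ^ also matches at the start of every line.
starts_multiline = re.findall(r"^cat", lines, re.MULTILINE)

# \b keeps "cat" from matching inside "catalog" or "bobcat".
whole_words = re.findall(r"\bcat\b", "cat catalog bobcat cat.")

# Grouping limits the alternation to "apple|pear" and captures both parts.
m = re.search(r"(\d+)-(apple|pear)", "3-pear")
count, fruit = m.group(1), m.group(2)

print(starts_default, starts_multiline, whole_words, count, fruit)
```

Without the parentheses around apple|pear, the alternation would split the whole pattern into "(\d+)-apple" or "pear", because | has the lowest precedence.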
Advanced Rules
Greedy vs. Lazy Matching
Quantifiers are greedy by default, matching as much as possible. Adding ? after a quantifier makes it lazy, matching as little as needed.
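The difference shows up clearly on text with repeated delimiters. This sketch assumes Python's re module and an invented HTML-like string.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last "</b>", so both elements form one match.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first "</b>" that lets the pattern succeed.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)
print(lazy)
```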
Backreferences
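Parenthesized sub‑expressions are stored and can be referenced later with \1, \2, etc., enabling patterns like (['"]).*?\1 to match quoted strings: \1 requires the closing quote to be the same character that the first group captured as the opening quote.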
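The quoted-string pattern above can be exercised as follows. This is a sketch in Python's re module; the sample text and the repeated-word pattern are additions for illustration, not part of the original article.

```python
import re

text = "say \"it's fine\" or 'ok'"

# \1 must repeat whatever quote character group 1 captured, so the
# apostrophe inside the double-quoted part does not end the match.
quoted = [m.group(0) for m in re.finditer(r"""(['"]).*?\1""", text)]

# Another common use: detect a word that is immediately repeated.
repeat = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = repeat.group(1) if repeat else None

print(quoted, repeated_word)
```

re.finditer is used instead of re.findall because findall would return only the captured quote character when the pattern contains a group.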
Lookahead and Lookbehind
Positive lookahead (?=pattern) asserts that pattern follows the current position without consuming characters; negative lookahead (?!pattern) asserts that it does not. Similarly, (?<=pattern) and (?<!pattern) are positive and negative lookbehind assertions, which check the text before the current position.
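All four assertions can be sketched in Python's re module. The sample strings are made up, and note that Python requires lookbehind patterns to have a fixed width, which other engines may not.

```python
import re

# Positive lookahead: digits followed by "px"; "px" is not in the match.
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: words that do not start with "un".
not_un = re.findall(r"\b(?!un)\w+", "unhappy happy undo do")

# Positive lookbehind: digits preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")

# Negative lookbehind: numbers not preceded by a dollar sign.
plain = re.findall(r"(?<!\$)\b\d+", "$5 and 7 and $12")

print(px_values, not_un, dollars, plain)
```

Because the assertions consume nothing, the matched text contains only the digits or words, never the "px" or "$" that was checked.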
Tips
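Use ^ and $ to anchor a pattern to the whole string.
Use \b to match whole words.
Avoid patterns that can match an empty string; a loop that repeatedly searches from the end of the previous match may then fail to advance and run forever.
When using |, arrange the alternatives so that only one side can match a given character; otherwise the result depends on the order in which the alternatives are written.
Choose greedy or lazy quantifiers appropriately for the desired match.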
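Several of these tips can be observed directly. The sketch below assumes Python's re module, whose engine tries alternatives from left to right; other engines may behave differently, and the inputs are hypothetical.

```python
import re

# Anchoring: without ^ and $, a partial match is enough to succeed.
unanchored = re.search(r"\d+", "123abc") is not None
anchored = re.search(r"^\d+$", "123abc") is not None

# Alternation order: both sides can match "a", so swapping them
# changes the result.
short_first = re.search(r"a|ab", "ab").group()
long_first = re.search(r"ab|a", "ab").group()

# A pattern that can match nothing yields empty matches.
empties = re.findall(r"\d*", "a1")
has_empty_match = "" in empties

print(unanchored, anchored, short_first, long_first, has_empty_match)
```

Python's own findall and finditer advance past empty matches, but a hand-written loop that restarts the search at the end of the previous match needs to handle the zero-length case itself.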
Source: Backend Technology Talk, author: 飒然Hang