
Mastering Regular Expressions: Essential Rules and Advanced Techniques

This comprehensive guide explains what regular expressions are, outlines basic syntax, character classes, quantifiers, anchors, grouping, backreferences, lookahead/lookbehind assertions, and advanced options, providing practical examples to help developers validate, search, and manipulate strings effectively.

Efficient Ops

What Is a Regular Expression

A regular expression is a string pattern that describes a characteristic of text. An engine then tests whether another string matches that characteristic, as in s.match("a"). Regular expressions can validate strings, search within text, and perform flexible replacements.
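The three uses named above can be sketched with Python's built-in re module. The choice of Python and the sample strings are assumptions made for illustration; the original does not name a language.

```python
import re

text = "order 66 shipped, order 7 pending"

# Validate: does the whole string consist only of digits?
is_number = re.fullmatch(r"\d+", "12345") is not None

# Search: find the first run of digits in the text.
first = re.search(r"\d+", text)
first_number = first.group() if first else None

# Replace: mask every run of digits with "#".
masked = re.sub(r"\d+", "#", text)

print(is_number)     # True
print(first_number)  # 66
print(masked)        # order # shipped, order # pending
```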

Basic Rules

Literal Characters

Letters, digits, Chinese characters, underscores, and punctuation with no special meaning match themselves. For example, the pattern a matches the "a" in "abcde".

Escape Characters

Non-printing characters are written as backslash escapes: \r (carriage return), \n (newline), and \t (tab). A backslash also removes the special meaning of a metacharacter: \\ matches a literal backslash, and \^, \$, and \. match a literal caret, dollar sign, and period.
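A short Python sketch of why escaping matters; the sample string is made up for illustration. It compares an escaped period with an unescaped one and uses re.escape, a standard-library helper that escapes metacharacters in arbitrary text.

```python
import re

price = "Total: $3.50 (was $4x50)"

# Escaped: "\$" and "\." match a literal dollar sign and a literal period.
literal = re.findall(r"\$\d\.\d\d", price)

# Unescaped period: "." matches any character, so "$4x50" is accepted too.
loose = re.findall(r"\$\d.\d\d", price)

# re.escape turns arbitrary text into a pattern that matches it literally.
escaped = re.escape("a.b")

print(literal)  # ['$3.50']
print(loose)    # ['$3.50', '$4x50']
```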

Character Classes

\d: any digit (0-9)

\w: any word character (letters, digits, underscore)

\s: any whitespace character

. (period): any character except a newline

Custom classes are defined with square brackets: [123] matches "1", "2", or "3", and [^abc] matches any single character except "a", "b", or "c".
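The classes above can be tried out with a small Python sketch; the sample strings are assumptions. One caveat specific to Python: on str patterns, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed.

```python
import re

sample = "id_42 x-7\tend"

digits = re.findall(r"\d", sample)          # single digits
words = re.findall(r"\w+", sample)          # runs of word characters
spaces = re.findall(r"\s", sample)          # whitespace characters
custom = re.findall(r"[123]", "a1b2c3d4")   # only "1", "2", or "3"
negated = re.findall(r"[^abc]", "aXbYc")    # anything except "a", "b", "c"

print(digits)   # ['4', '2', '7']
print(words)    # ['id_42', 'x', '7', 'end']
print(spaces)   # [' ', '\t']
print(custom)   # ['1', '2', '3']
print(negated)  # ['X', 'Y']
```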

Quantifiers

{n}: exactly n repetitions

{m,n}: between m and n repetitions

{m,}: at least m repetitions

?: 0 or 1 time

+: 1 or more times

*: 0 or more times
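Each quantifier in the list can be checked against a few candidate strings. The sketch below is in Python; the helper name fully_matches and the test strings are hypothetical, chosen only for illustration.

```python
import re

def fully_matches(pattern: str, s: str) -> bool:
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

candidates = ["1", "12", "123", "1234"]

exactly_two = [s for s in candidates if fully_matches(r"\d{2}", s)]
two_to_three = [s for s in candidates if fully_matches(r"\d{2,3}", s)]
at_least_two = [s for s in candidates if fully_matches(r"\d{2,}", s)]

# "?" makes the preceding "u" optional.
optional_u = [s for s in ["color", "colour", "colouur"]
              if fully_matches(r"colou?r", s)]

# "+" needs at least one "a"; "*" allows zero or more "b".
one_or_more = re.findall(r"a+", "caaandy a")
zero_or_more = [s for s in ["a", "ab", "abbb", "b"]
                if fully_matches(r"ab*", s)]

print(exactly_two)   # ['12']
print(two_to_three)  # ['12', '123']
print(at_least_two)  # ['12', '123', '1234']
print(optional_u)    # ['color', 'colour']
print(one_or_more)   # ['aaa', 'a']
print(zero_or_more)  # ['a', 'ab', 'abbb']
```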

Special Symbols

^: start of a string (or of a line in multiline mode)

$: end of a string (or of a line in multiline mode)

\b: word boundary

|: alternation (OR)

( ): grouping; also captures the matched substring
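A Python sketch of these symbols in use; the sample text is made up. In Python, multiline mode is enabled with the re.MULTILINE flag.

```python
import re

log = "cat concat\ncatalog cat"

# Without multiline mode, ^ matches only at the very start of the string.
starts = re.findall(r"^cat", log)

# With re.MULTILINE, ^ also matches at the start of each line.
starts_ml = re.findall(r"^cat", log, flags=re.MULTILINE)

# \b on both sides keeps "concat" and "catalog" out.
whole_words = re.findall(r"\bcat\b", log)

# | tries the left alternative first, then the right.
either = re.findall(r"cat|dog", "dog cat bird")

# Parentheses group and capture; groups() returns the captured substrings.
m = re.search(r"(\d{4})-(\d{2})", "released 2024-05")
year, month = m.groups()

print(starts)       # ['cat']
print(starts_ml)    # ['cat', 'cat']
print(whole_words)  # ['cat', 'cat']
print(either)       # ['dog', 'cat']
print(year, month)  # 2024 05
```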

Advanced Rules

Greedy vs. Lazy Matching

Quantifiers are greedy by default, matching as much as possible. Adding ? after a quantifier makes it lazy, matching as little as needed.
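The difference is easiest to see on text with repeated delimiters. A minimal Python sketch, with a made-up input string:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last "</b>", swallowing everything in between.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first "</b>" that lets the match succeed.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```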

Backreferences

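Each parenthesized sub-expression stores the text it matched, and that text can be referenced later in the same pattern with \1, \2, and so on. This enables patterns like (['"]).*?\1, which matches a quoted string whose closing quote is the same character as its opening quote.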
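The quoted-string pattern from the text can be run as is in Python. The input string and the second, doubled-word pattern are illustrative assumptions. finditer is used here because findall would return only the captured quote character.

```python
import re

# \1 must repeat whatever group 1 captured: either ' or ".
quoted = re.compile(r"""(['"]).*?\1""")

text = "say \"it's fine\" or 'ok' now"
found = [m.group(0) for m in quoted.finditer(text)]

# Another common use: detect an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
doubled_text = doubled.group(0) if doubled else None

print(found)         # ['"it\'s fine"', "'ok'"]
print(doubled_text)  # is is
```

The apostrophe inside "it's fine" does not end the first match, because \1 holds a double quote at that point.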

Lookahead and Lookbehind

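Positive lookahead (?=pattern) asserts that pattern follows the current position without consuming any characters; negative lookahead (?!pattern) asserts that it does not follow. Similarly, (?<=pattern) and (?<!pattern) are lookbehind assertions: they assert that pattern does, or does not, immediately precede the current position.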
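All four assertions can be sketched in Python with made-up sample strings. Lookbehind support varies between engines; Python's re module requires the lookbehind pattern to have a fixed width.

```python
import re

# Positive lookahead: digits followed by " dollars"; the suffix is not consumed.
ahead = re.findall(r"\d+(?= dollars)", "5 dollars, 10 euros")

# Negative lookahead: three-letter codes with an amount, except USD.
not_usd = re.findall(r"\b(?!USD)[A-Z]{3}\d+", "USD100 EUR200 USD300")

# Positive lookbehind: digits immediately preceded by "USD".
usd = re.findall(r"(?<=USD)\d+", "USD100 EUR200 USD300")

# Negative lookbehind: whole numbers not preceded by "$".
no_dollar = re.findall(r"(?<!\$)\b\d+", "$5 and 10")

print(ahead)      # ['5']
print(not_usd)    # ['EUR200']
print(usd)        # ['100', '300']
print(no_dollar)  # ['10']
```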

Tips

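Use ^ and $ together to require that the pattern match the whole string rather than a part of it.

Use \b to match whole words.

Avoid patterns that can match an empty string; they succeed at every position, which yields spurious empty matches and can cause infinite loops in code that searches repeatedly.

Write alternations with | so that only one side can match a given character. Alternatives are tried from left to right, so when they overlap the result depends on their order.

Choose greedy or lazy quantifiers according to the match you want.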
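Three of these tips can be sketched in Python with illustrative inputs: anchoring, empty matches, and the order of alternatives.

```python
import re

# Anchoring: without ^ and $ the pattern is satisfied by any substring.
partial = re.search(r"\d+", "abc123") is not None
whole = re.search(r"^\d+$", "abc123") is not None

# Empty matches: \d* succeeds even where there are no digits,
# so the results include empty strings; \d+ avoids that.
empty_hits = re.findall(r"\d*", "a1")
solid_hits = re.findall(r"\d+", "a1")

# Alternation order: both sides can match at "a", so order changes the result.
short_first = re.search(r"a|ab", "ab").group()
long_first = re.search(r"ab|a", "ab").group()

print(partial, whole)           # True False
print(solid_hits)               # ['1']
print(short_first, long_first)  # a ab
```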

Source: Backend Technology Talk, author: 飒然Hang