
What Is a Regular Expression? A Comprehensive Guide

This article explains regular expressions, covering their definition, basic matching, metacharacters, character sets, quantifiers, grouping, alternation, escaping, anchors, zero‑width assertions, flags, and greedy versus lazy matching, with clear examples and online practice links.

DevOps Engineer

What Is a Regular Expression?

A regular expression is a special string of letters and symbols that describes a format, and it is used to find the parts of a text that match that format.

A regular expression is a pattern that is matched against a target string from left to right. The term "regular expression" is often shortened to "regex" or "regexp". Regular expressions are used to replace text within a string, validate input, extract substrings that fit a pattern, and more.

Imagine you are writing an application and want to enforce a username rule allowing letters, numbers, underscores, and hyphens with length constraints. The following regex validates such usernames:

The regex accepts john_doe, jo-hn_doe, and john12_as but does not match Jo, because Jo contains an uppercase letter and is too short.
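The username pattern itself is not reproduced in the text above. As a rough sketch, one pattern consistent with the description could look like the following in Python. The length bounds of 3 and 16 and the names USERNAME_RE and is_valid_username are assumptions made for illustration, not values taken from the article.

```python
import re

# Assumed rule: lowercase letters, digits, underscore, hyphen; 3 to 16 characters.
# The bounds 3 and 16 are illustrative assumptions, not taken from the article.
USERNAME_RE = re.compile(r"[a-z0-9_-]{3,16}")


def is_valid_username(name: str) -> bool:
    """Return True when the whole string satisfies the assumed username rule."""
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Using fullmatch means the whole string has to satisfy the pattern, which plays the same role as wrapping the pattern in ^ and $.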

Table of Contents

1. Basic Matching

2. Metacharacters
   2.1 Dot Operator .
   2.2 Character Set
      2.2.1 Negated Character Set
   2.3 Quantifier
      2.3.1 *
      2.3.2 +
      2.3.3 ?
   2.4 {} Quantifier
   2.5 (...) Group
   2.6 | Alternation
   2.7 Escape Special Characters
   2.8 Anchors
      2.8.1 ^
      2.8.2 $
3. Shorthand Character Sets
4. Zero‑Width Assertions (Lookahead/Lookbehind)
5. Flags
Additional Notes
Contributions
License

1. Basic Matching

The regex "the" matches the literal string "the". Example:

'"the" => The fat cat sat on the mat.'
Match: the lowercase "the" before "mat". The capitalized "The" at the start is not matched.

Online practice: https://regex101.com/r/dmRygT/1

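The regex "123" matches the string "123": the engine compares the pattern with the input one character at a time.

Regular expressions are case‑sensitive by default, so the pattern "The" does not match the string "the".

'"The" => The fat cat sat on the mat.'
Match: only the capitalized "The" at the start of the sentence.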

Online practice: https://regex101.com/r/1paXsy/1
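The two examples in this section can be reproduced with a minimal Python sketch. The choice of Python's re module and the variable names are assumptions for illustration; the article itself is engine-neutral.

```python
import re

text = "The fat cat sat on the mat."

# A literal pattern matches exactly those characters, in that order.
# Matching is case-sensitive by default, so each pattern finds one occurrence.
lower_positions = [m.start() for m in re.finditer(r"the", text)]
upper_positions = [m.start() for m in re.finditer(r"The", text)]

print(lower_positions)  # [19]
print(upper_positions)  # [0]
```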

2. Metacharacters

Metacharacters have special meanings. The table below lists common ones:

Metacharacter   Description
.               Matches any single character except newline.
[ ]             Character class; matches any character inside the brackets.
[^ ]            Negated character class; matches any character not inside the brackets.
*               Matches zero or more of the preceding element.
+               Matches one or more of the preceding element.
?               Marks the preceding element as optional (0 or 1).
{n,m}           Matches the preceding element at least n and at most m times.
(xyz)           Capturing group; matches the characters xyz in that exact order.
|               Alternation operator; matches the expression on either side.
\               Escape character; makes the following special symbol literal.
^               Matches the start of the input (of each line in multiline mode).
$               Matches the end of the input (of each line in multiline mode).

2.1 Dot Operator

The dot . matches any single character except a newline. Example: .ar matches any character followed by "ar", such as "car" or "par".

'".ar" => The car parked in the garage.'
Matches: "car", "par" (in "parked"), and "gar" (in "garage").

Online practice: https://regex101.com/r/xc9GkU/1
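A short Python sketch of the dot operator, assuming the standard re module (the variable names are illustrative):

```python
import re

text = "The car parked in the garage."

# "." stands for any single character, so ".ar" finds every
# three-character run that ends in "ar".
dot_matches = re.findall(r".ar", text)
print(dot_matches)  # ['car', 'par', 'gar']

# By default the dot does not match a newline character.
newline_matches = re.findall(r".ar", "\nar")
print(newline_matches)  # []
```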

2.2 Character Set

Square brackets define a character set, which matches any one of the characters listed inside it. Example: [Tt]he matches "The" and "the".

'"[Tt]he" => The car parked in the garage.'
Matches: "The" and "the".

Online practice: https://regex101.com/r/2ITLQ4/1

2.2.1 Negated Character Set

When ^ appears at the start of a character set, it negates the set. Example: [^c]ar matches any character except "c", followed by "ar".

'"[^c]ar" => The car parked in the garage.'
Matches: "par" and "gar". "car" is not matched.

Online practice: https://regex101.com/r/nNNlq3/1
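Character sets and negated sets can be compared side by side. The following is a small sketch in Python, assuming the standard re module:

```python
import re

text = "The car parked in the garage."

# [Tt] accepts either "T" or "t" in the first position.
set_matches = re.findall(r"[Tt]he", text)
print(set_matches)  # ['The', 'the']

# [^c] accepts any single character except "c" before "ar".
negated_matches = re.findall(r"[^c]ar", text)
print(negated_matches)  # ['par', 'gar']
```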

2.3 Quantifiers

The symbols *, +, and ? specify how many times the preceding element may occur.

2.3.1 *

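* matches zero or more occurrences of the preceding element. Example: a* matches "", "a", or "aaa", and [a-z]* matches any run of lowercase letters, including the empty run.

'"[a-z]*" => The car parked in the garage #21.'
Non-empty matches: "he", "car", "parked", "in", "the", and "garage".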

Online practice: https://regex101.com/r/Dzf9Aa/1

2.3.2 +

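+ matches one or more occurrences of the preceding element. Example: c.+t matches a "c", then at least one character, then a "t". Because + takes as many characters as it can, the match runs to the last "t" in the sentence.

'"c.+t" => The fat cat sat on the mat.'
Match: "cat sat on the mat".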

Online practice: https://regex101.com/r/Dzf9Aa/1

2.3.3 ?

? makes the preceding element optional. Example: [T]?he matches "he" with or without a leading "T".

'"[T]?he" => The car is parked in the garage.'
Matches: "The" and the "he" in "the".

Online practice: https://regex101.com/r/cIg9zm/1
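The three quantifiers can be tried together. This is a minimal Python sketch using the standard re module. Filtering out empty matches for the * example is a choice made here for readability, since a pattern that can match nothing also reports empty matches.

```python
import re

# "*": zero or more. The pattern also matches the empty string, so
# findall returns empty matches too; they are filtered out here.
star_nonempty = [m for m in re.findall(r"[a-z]*", "The car parked in the garage #21.") if m]
print(star_nonempty)  # ['he', 'car', 'parked', 'in', 'the', 'garage']

# "+": one or more. It is greedy, so the match runs to the last "t".
plus_match = re.search(r"c.+t", "The fat cat sat on the mat.").group()
print(plus_match)  # cat sat on the mat

# "?": zero or one. The leading "T" is optional.
optional_matches = re.findall(r"[T]?he", "The car is parked in the garage.")
print(optional_matches)  # ['The', 'he']
```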

2.4 {} Quantifier

{n,m} limits the number of repetitions to at least n and at most m. Example: [0-9]{2,3} matches two to three digits.

'"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0.'
Matches: "999" and "10".

Online practice: https://regex101.com/r/juM86s/1
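A short Python sketch of the braces quantifier on the same sentence. The {2,} and {3} forms are not discussed above. They are included only as closely related variants that the re module also accepts.

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

# {2,3}: at least two and at most three digits in a row.
two_to_three = re.findall(r"[0-9]{2,3}", text)
print(two_to_three)  # ['999', '10']

# {2,}: two or more digits (no upper bound).
two_or_more = re.findall(r"[0-9]{2,}", text)
print(two_or_more)  # ['9997', '10']

# {3}: exactly three digits.
exactly_three = re.findall(r"[0-9]{3}", text)
print(exactly_three)  # ['999']
```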

2.5 (...) Group

Parentheses group sub‑patterns so that a quantifier or alternation applies to the whole group. Example: (ab)* matches zero or more repetitions of "ab".

'"(ab)*" => "" (the empty string), "ab", "abab", and so on.'

Alternation inside a group: (c|g|p)ar matches "car", "gar", or "par".

'"(c|g|p)ar" => The car is parked in the garage.'
Matches: "car", "par", and "gar".
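Groups both scope a quantifier and capture the text they match. The following Python sketch (standard re module, illustrative variable names) shows both effects:

```python
import re

# The quantifier applies to the whole group "ab".
empty_ok = re.fullmatch(r"(ab)*", "") is not None
repeated_ok = re.fullmatch(r"(ab)*", "abab") is not None
partial_ok = re.fullmatch(r"(ab)*", "aba") is not None
print(empty_ok, repeated_ok, partial_ok)  # True True False

# group(0) is the whole match; group(1) is what the parentheses captured.
text = "The car is parked in the garage."
whole = [m.group(0) for m in re.finditer(r"(c|g|p)ar", text)]
captured = [m.group(1) for m in re.finditer(r"(c|g|p)ar", text)]
print(whole)     # ['car', 'par', 'gar']
print(captured)  # ['c', 'p', 'g']
```

Note that re.findall returns the captured group rather than the whole match when a pattern contains exactly one group, which is why finditer is used here.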

2.6 | Alternation

The pipe | acts as a logical OR between the expressions on either side. Example: (T|t)he|car matches "The", "the", or "car".

'"(T|t)he|car" => The car is parked in the garage.'
Matches: "The", "car", and "the".
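A minimal Python sketch of alternation, assuming the standard re module. It also shows that the group stays empty when the right-hand alternative is the one that matched.

```python
import re

text = "The car is parked in the garage."

# Either "(T|t)he" or "car" may match at each position.
alternation_matches = [m.group(0) for m in re.finditer(r"(T|t)he|car", text)]
print(alternation_matches)  # ['The', 'car', 'the']

# When "car" matches, the (T|t) group did not take part, so it is None.
group_values = [m.group(1) for m in re.finditer(r"(T|t)he|car", text)]
print(group_values)  # ['T', None, 't']
```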

2.7 Escape Special Characters

A backslash \ escapes a special character so that it is treated literally, e.g., \. matches a literal dot.

'"(f|c|m)at\.?" => The fat cat sat on the mat.'
Matches: "fat", "cat", and "mat." (with the trailing dot).
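Escaping can be sketched in Python as follows. The use of re.escape to build a literal pattern from arbitrary text is an addition for illustration and is not part of the article's example.

```python
import re

text = "The fat cat sat on the mat."

# "\." is a literal dot and "?" makes it optional.
escaped_matches = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", text)]
print(escaped_matches)  # ['fat', 'cat', 'mat.']

# An unescaped dot matches any character; an escaped one matches only ".".
unescaped_ok = re.fullmatch(r"3.14", "3x14") is not None
escaped_ok = re.fullmatch(re.escape("3.14"), "3x14") is not None
print(unescaped_ok, escaped_ok)  # True False
```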

2.8 Anchors

^ matches at the start of the string, and $ matches at the end.

2.8.1 ^

Using ^ ensures the pattern matches at the beginning of the string.

2.8.2 $

Using $ ensures the pattern matches at the end of the string.
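The effect of both anchors can be seen by running the same patterns with and without them. This is a small Python sketch with sample sentences chosen here for illustration:

```python
import re

text = "The car is parked in the garage."

# Without an anchor the pattern matches anywhere in the string.
unanchored = [m.group(0) for m in re.finditer(r"(T|t)he", text)]
print(unanchored)  # ['The', 'the']

# With ^ only the occurrence at the very start qualifies.
anchored_start = [m.group(0) for m in re.finditer(r"^(T|t)he", text)]
print(anchored_start)  # ['The']

# With $ only the occurrence at the very end qualifies.
sentence = "The fat cat sat on the mat."
anchored_end = re.findall(r"at\.$", sentence)
print(anchored_end)  # ['at.']
```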

3. Shorthand Character Sets

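Shorthand   Description
.           Any character except newline.
\w          Word characters (letters, digits, underscore).
\W          Non‑word characters.
\d          Digits.
\D          Non‑digits.
\s          Whitespace characters.
\S          Non‑whitespace characters.
\f          Form feed.
\n          Newline.
\r          Carriage return.
\t          Tab.
\v          Vertical tab.
\r\n        CR/LF (DOS line ending).

Note: \p on its own is not a CR/LF shorthand in common engines. Where it is supported, \p{...} selects a Unicode property class, and a DOS line ending is matched with \r\n.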
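The shorthand sets can be tried on a small sample string. This is a sketch in Python with an input chosen here for illustration. Note that in Python 3 the \w, \d, and \s classes also cover non-ASCII Unicode characters by default, which this ASCII-only sample does not exercise.

```python
import re

sample = "Room 42\tis_open\n"

digits = re.findall(r"\d", sample)         # ['4', '2']
words = re.findall(r"\w+", sample)         # ['Room', '42', 'is_open']
whitespace = re.findall(r"\s", sample)     # [' ', '\t', '\n']
non_digits = re.findall(r"\D+", "a1b22c")  # ['a', 'b', 'c']
non_space = re.findall(r"\S+", sample)     # ['Room', '42', 'is_open']
non_word = re.findall(r"\W", sample)       # [' ', '\t', '\n']

# A DOS line ending is the two-character sequence carriage return + newline.
dos_line_endings = re.findall(r"\r\n", "one\r\ntwo\r\n")

print(digits, words, whitespace, non_digits, non_space, non_word)
print(len(dos_line_endings))  # 2
```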

4. Zero‑Width Assertions (Lookahead/Lookbehind)

Lookaheads and lookbehinds are non‑capturing groups that assert a condition without consuming characters.

4.1 Positive Lookahead (?=...)

Asserts that the given pattern follows the current position. Example: (T|t)he(?=\sfat) matches "The" or "the" only when followed by a space and "fat".

'"(T|t)he(?=\sfat)" => The fat cat sat on the mat.'
Match: only the first "The", which is followed by " fat".

4.2 Negative Lookahead (?!...)

Asserts that the given pattern does NOT follow the current position. Example: (T|t)he(?!\sfat) matches "The" or "the" when not followed by " fat".

'"(T|t)he(?!\sfat)" => The fat cat sat on the mat.'
Match: only the "the" before "mat".

4.3 Positive Lookbehind (?<=...)

Matches only when preceded by a certain pattern. Example: (?<=(T|t)he\s)(fat|mat) matches "fat" or "mat" when preceded by "The " or "the ".

'"(?<=(T|t)he\s)(fat|mat)" => The fat cat sat on the mat.'
Matches: "fat" and "mat".

4.4 Negative Lookbehind (?<!...)

Matches only when NOT preceded by a certain pattern. Example: (?<!(T|t)he\s)(cat) matches "cat" not preceded by "The " or "the ".

'"(?<!(T|t)he\s)(cat)" => The cat sat on cat.'
Match: only the second "cat".
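All four assertions can be checked in one place. The following is a minimal Python sketch using the standard re module. Python requires lookbehind patterns to have a fixed width, which the patterns in this section satisfy. The helper name find_with_positions is hypothetical.

```python
import re


def find_with_positions(pattern: str, text: str) -> list:
    """Return (matched text, start index) for every match of pattern in text."""
    return [(m.group(0), m.start()) for m in re.finditer(pattern, text)]


text = "The fat cat sat on the mat."

# The asserted text is checked but never becomes part of the match.
positive_ahead = find_with_positions(r"(T|t)he(?=\sfat)", text)
negative_ahead = find_with_positions(r"(T|t)he(?!\sfat)", text)
positive_behind = find_with_positions(r"(?<=(T|t)he\s)(fat|mat)", text)
negative_behind = find_with_positions(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")

print(positive_ahead)   # [('The', 0)]
print(negative_ahead)   # [('the', 19)]
print(positive_behind)  # [('fat', 4), ('mat', 23)]
print(negative_behind)  # [('cat', 15)]
```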

5. Flags

Flags modify how the regex engine processes the pattern.

Flag   Description
i      Case‑insensitive matching.
g      Global search (find all matches).
m      Multiline mode; ^ and $ match start/end of each line.

5.1 Case Insensitive (i)

Using the i flag makes the pattern ignore case, e.g., /The/gi matches "The" and "the" globally.

5.2 Global Search (g)

The g flag returns all matches, not just the first.

5.3 Multiline (m)

The m flag makes ^ and $ apply to the start and end of each line.
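The /pattern/flags notation above is the slash-delimited style used by JavaScript and by tools such as regex101. As a rough sketch of the same three behaviours in Python, flags are passed as constants. Python has no g flag, so the difference between first match and all matches comes from which function is called.

```python
import re

sentence = "The fat cat sat on the mat."

# i flag: re.IGNORECASE lets "the" match both "The" and "the".
case_sensitive = re.findall(r"the", sentence)
case_insensitive = re.findall(r"the", sentence, flags=re.IGNORECASE)
print(case_sensitive, case_insensitive)  # ['the'] ['The', 'the']

# g flag: re.search stops at the first match, while re.findall
# collects every non-overlapping match.
first_only = re.search(r"at", sentence).group()
all_matches = re.findall(r"at", sentence)
print(first_only, all_matches)  # at ['at', 'at', 'at', 'at']

# m flag: re.MULTILINE makes ^ and $ apply to each line.
lines = "The fat\ncat sat\non the mat."
single_line_starts = re.findall(r"^\w+", lines)
multi_line_starts = re.findall(r"^\w+", lines, flags=re.MULTILINE)
print(single_line_starts, multi_line_starts)  # ['The'] ['The', 'cat', 'on']
```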

6. Greedy vs Lazy Matching

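Quantifiers are greedy by default: they match as many characters as possible. Adding ? after a quantifier makes it lazy, so it matches as few characters as possible.

'"(.*at)" => The fat cat sat on the mat.'
Match: "The fat cat sat on the mat".

'"(.*?at)" => The fat cat sat on the mat.'
First match: "The fat".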
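The difference between the two modes is easy to see by running both patterns on the same sentence. This is a small Python sketch using the standard re module:

```python
import re

text = "The fat cat sat on the mat."

# Greedy: ".*" grabs as much as it can, then backs off to the last "at".
greedy = re.search(r"(.*at)", text).group(1)
print(greedy)  # The fat cat sat on the mat

# Lazy: ".*?" grabs as little as it can, stopping at the first "at".
lazy = re.search(r"(.*?at)", text).group(1)
print(lazy)  # The fat

# Collecting every lazy match splits the sentence at each "at".
lazy_all = re.findall(r".*?at", text)
print(lazy_all)  # ['The fat', ' cat', ' sat', ' on the mat']
```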

Online practice links are provided throughout the article.

License

MIT © Zeeshan Ahmad

Written by

DevOps Engineer

DevOps engineer, Pythonista and FOSS contributor. Created cpp-linter, commit-check, etc.; contributed to PyPA.
