
Understanding Regular Expressions: Definitions, Structure, and Practical Java Examples

This article introduces regular expressions, explains their core concepts, character classes, quantifiers, anchors, grouping, and assertions, and demonstrates how to solve common string‑validation, formatting, and placeholder‑replacement problems in Java with clear code examples.

IT Services Circle

Jan 30, 2023

Regular Expressions

Regular expressions (also called regex, regexp, or RE) are textual patterns composed of literal characters and special meta‑characters that describe a set of strings. They are widely used for searching, validating, extracting, and replacing text in many programming languages.

Typical Use Cases

Validate that a password contains upper- and lower-case letters, digits, and special characters (!@#$%^&), and is 6-12 characters long.

Format a numeric string like 12345678 into a currency style 12,345,678.

Replace placeholder expressions such as ${...} (similar to ES6 template syntax) with actual values.

Definition

A regular expression is a logical formula for string manipulation that combines ordinary characters (e.g., a‑z) and meta‑characters (e.g., +, \d) to define matching rules.

Functions

Match : Check whether a target string conforms to a pattern (e.g., password strength, phone number, URL).

Replace : Substitute parts of a string that match a pattern with another string.

Extract : Capture substrings that satisfy a pattern.
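The three functions above can be sketched in Java as follows. This is a minimal illustration rather than a definitive implementation: the class name, the method names, and the 11-digit phone rule are assumptions made for the example and do not come from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFunctionsDemo {
    // Match: does the whole input consist of exactly 11 digits? (hypothetical phone rule)
    static boolean isElevenDigits(String input) {
        return Pattern.matches("\\d{11}", input);
    }

    // Replace: collapse every run of whitespace into a single space.
    static String collapseWhitespace(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Extract: collect every run of digits found in the input.
    static List<String> extractNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(isElevenDigits("13800138000"));      // true
        System.out.println(collapseWhitespace("a  b \t c"));    // a b c
        System.out.println(extractNumbers("order 42, qty 7"));  // [42, 7]
    }
}
```

Pattern.matches requires the whole input to match, while Matcher.find scans for the next matching substring, which is why extraction uses find in a loop.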

Structure

Regular expressions consist of ordinary characters (digits, letters, symbols) and meta‑characters such as +, ?, \d, \s, etc.

Character Types

Ordinary characters: [ABC], [^ABC], [A-Z], [a-d[m-p]], [a-z&&[^bc]]

Non-printable characters: \cx, \f, \n, \r, \s, \S, \t, \v

Special characters: ^, $, \, *, +, ., ?, [, {, |, ( )
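A short Java sketch can make the character-class forms listed above more concrete. The class name and the helper method are hypothetical; the patterns themselves are the ones from the list, plus an escaped dot to show how a special character is matched literally.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[ABC]", "B"));         // true: B is listed
        System.out.println(matches("[^ABC]", "B"));        // false: B is excluded
        System.out.println(matches("[a-d[m-p]]", "n"));    // true: union of a-d and m-p
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // false: a-z except b and c
        System.out.println(matches("[a-z&&[^bc]]", "d"));  // true
        System.out.println(matches("a\\.b", "a.b"));       // true: escaped dot is a literal dot
        System.out.println(matches("a\\.b", "axb"));       // false
    }
}
```

Note that the union and intersection forms ([a-d[m-p]] and [a-z&&[^bc]]) are Java syntax and are not portable to every regex flavor.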

Quantifiers

*: zero or more

+: one or more

?: zero or one (or makes the preceding quantifier lazy when placed after it)

{n}: exactly n times

{n,}: at least n times

{n,m}: between n and m times
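Each quantifier can be checked with a one-line whole-string match. The sketch below assumes nothing beyond java.util.regex; the class and helper names are made up for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab*c", "ac"));         // true: zero b's allowed
        System.out.println(matches("ab+c", "ac"));         // false: at least one b required
        System.out.println(matches("ab?c", "abc"));        // true: one optional b
        System.out.println(matches("ab?c", "abbc"));       // false: two b's are too many
        System.out.println(matches("\\d{3}", "1234"));     // false: exactly three digits
        System.out.println(matches("\\d{2,}", "12345"));   // true: two or more digits
        System.out.println(matches("\\d{2,4}", "12345"));  // false: at most four digits
    }
}
```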

Anchors

^: start of the string (or line in multiline mode)

$: end of the string (or line in multiline mode)

\b: word boundary

\B: non-word boundary
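To see how anchors change what matches, the following sketch counts matches of a few anchored patterns in a three-line string. The class name, the helper, and the sample text are assumptions for the example; passing 0 as the flags argument means no flags are set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Counts how many times the regex is found in the input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat\ncatalog\nbobcat";
        System.out.println(countMatches("^cat", 0, text));                  // 1: start of input only
        System.out.println(countMatches("^cat", Pattern.MULTILINE, text));  // 2: lines "cat" and "catalog"
        System.out.println(countMatches("cat$", Pattern.MULTILINE, text));  // 2: lines "cat" and "bobcat"
        System.out.println(countMatches("\\bcat\\b", 0, text));             // 1: the standalone word
        System.out.println(countMatches("\\Bcat", 0, text));                // 1: the "cat" inside "bobcat"
    }
}
```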

Greedy vs Lazy Matching

Greedy quantifiers try to consume as many characters as possible while still allowing the overall pattern to match; lazy quantifiers (formed by appending ? to a quantifier) consume as few as possible. Example:

Source string: ...<div>hello <div>Regex</div> !</div>...
Greedy mode:   <div>.*</div>  -> <div>hello <div>Regex</div> !</div>
Lazy mode:     <div>.*?</div> -> <div>hello <div>Regex</div>
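The same comparison can be reproduced in Java. This is a small sketch with a hypothetical helper, firstMatch, that returns the first substring the pattern finds (or null if there is none).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLazyDemo {
    // Returns the first substring matched by the regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String source = "...<div>hello <div>Regex</div> !</div>...";
        // Greedy: .* runs to the last </div> it can reach.
        System.out.println(firstMatch("<div>.*</div>", source));
        // prints <div>hello <div>Regex</div> !</div>

        // Lazy: .*? stops at the first </div> it can reach.
        System.out.println(firstMatch("<div>.*?</div>", source));
        // prints <div>hello <div>Regex</div>
    }
}
```

Neither form understands nesting; the lazy result still starts at the outer opening tag, which is one reason regular expressions are a poor fit for parsing nested markup.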

Backtracking

When a regex engine encounters a quantifier or alternation, it may need to backtrack to previous decision points if a later part of the pattern fails. Excessive backtracking can impact performance.
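One way to observe backtracking is to compare a greedy quantifier with its possessive counterpart. Possessive quantifiers (written *+ in Java) are not covered in this article; they are used here only as an assumption-laden illustration, because they behave like greedy quantifiers that never give characters back.

```java
import java.util.regex.Pattern;

class BacktrackingDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String quoted = "\"abc\"";  // the five characters "abc" including both quotes

        // Greedy: .* first swallows abc", then backtracks one character
        // so the closing quote in the pattern can match.
        System.out.println(matches("\".*\"", quoted));       // true

        // Possessive: .*+ swallows abc" and refuses to backtrack,
        // so the closing quote in the pattern has nothing left to match.
        System.out.println(matches("\".*+\"", quoted));      // false

        // A negated class never runs past the closing quote, so no backtracking is needed.
        System.out.println(matches("\"[^\"]*+\"", quoted));  // true
    }
}
```

Writing patterns so that each part can only match one way, as in the last case, is a common way to keep backtracking under control.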

Grouping, References, and Assertions

Grouping syntax: (...) creates capture groups numbered from left to right; named groups use (?<name>...).

Reference syntax: \1, \2, etc., to reuse captured text.

Assertions:

Positive look-ahead: (?=pattern)

Negative look-ahead: (?!pattern)

Positive look-behind: (?<=pattern)

Negative look-behind: (?<!pattern)
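The grouping, reference, and assertion syntax above can be tried out with a few small helpers. Everything named here (the class, the methods, the sample strings) is hypothetical and only meant as a sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
    // Named group: returns the year of a yyyy-MM-dd string, or null if the shape is wrong.
    static String yearOf(String date) {
        Matcher matcher = Pattern.compile("(?<year>\\d{4})-(\\d{2})-(\\d{2})").matcher(date);
        return matcher.matches() ? matcher.group("year") : null;
    }

    // Back-reference: \1 must repeat exactly the text captured by group 1.
    static boolean hasRepeatedWord(String text) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(text).find();
    }

    // Returns the first substring matched by the regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2023-01-30"));                  // 2023
        System.out.println(hasRepeatedWord("this is is a test"));  // true
        System.out.println(hasRepeatedWord("this is a test"));     // false

        // Look-arounds check context without including it in the match.
        System.out.println(firstMatch("\\d+(?=px)", "100px 20em"));             // 100
        System.out.println(firstMatch("\\d+(?!\\d|px)", "100px 20em"));         // 20
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost: $45, qty 3"));     // 45
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", "cost: $45, qty 3"));  // 3
    }
}
```

The negative look-ahead includes \d so that the engine cannot backtrack to a shorter number such as 10 and accept it because the next character is 0 rather than px.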

Java Regex Flags (Modes)

UNIX_LINES: only \n is a line terminator.

CASE_INSENSITIVE (i): ignore case.

COMMENTS: whitespace and # comments are ignored.

MULTILINE (m): ^ and $ match line boundaries.

LITERAL: treat meta-characters as literals.

DOTALL (s): . matches line terminators.

UNICODE_CASE: case-insensitive matching for Unicode.

CANON_EQ: enable canonical equivalence.

UNICODE_CHARACTER_CLASS: Unicode-aware predefined character classes.
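A few of these flags are sketched below using Pattern.compile(regex, flags). The class and helper names are assumptions for the example; flags are combined with the bitwise OR operator, and 0 means no flags.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True when the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // True when the regex is found anywhere in the input under the given flags.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("hello", Pattern.CASE_INSENSITIVE, "HeLLo"));  // true
        System.out.println(matches("(?i)hello", 0, "HELLO"));  // true: embedded form of the same flag

        System.out.println(matches("a.b", 0, "a\nb"));               // false: . skips line terminators
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));  // true

        System.out.println(matches("a+b", Pattern.LITERAL, "a+b"));  // true: + is an ordinary character
        System.out.println(matches("a+b", Pattern.LITERAL, "aab"));  // false

        // COMMENTS lets the pattern carry whitespace and # comments.
        String commented = "\\d{3}  # prefix\n-\\d{4}  # number";
        System.out.println(matches(commented, Pattern.COMMENTS, "555-1234"));  // true

        // Two flags combined: ^ and $ match per line, ignoring case.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^end$", flags, "start\nEND\n"));  // true
    }
}
```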

Meta‑character Reference Table

Character       Description

\               Escapes the next character, creates a back-reference, or introduces an octal/Unicode escape.
^               Matches start of the input (or line in multiline mode).
$               Matches end of the input (or line in multiline mode).
*               Zero or more of the preceding token.
+               One or more of the preceding token.
?               Zero or one of the preceding token, or makes the preceding quantifier lazy.
{n}             Exactly n repetitions.
{n,}            At least n repetitions.
{n,m}           Between n and m repetitions.
.               Any character except line terminators (enable DOTALL or use [\s\S] to include them; inside a character class, . is a literal dot).
(pattern)       Capturing group.
(?:pattern)     Non-capturing group.
(?=pattern)     Positive look-ahead.
(?!pattern)     Negative look-ahead.
(?<=pattern)    Positive look-behind.
(?<!pattern)    Negative look-behind.
x|y             Alternation (match x or y).
[xyz]           Character class.
[^xyz]          Negated character class.
\d              Digit (equivalent to [0-9]).
\w              Word character (letters, digits, underscore).
\s              Whitespace character.

Practical Java Examples

Problem 1 – Password validation using look‑aheads:

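import java.util.regex.Pattern;

// Each look-ahead asserts that one required kind of character occurs somewhere in the input;
// the final character class restricts the allowed characters and the length to 6-12.
@Test
public void checkPassword() {
    String password = "aaa123@Z";
    Pattern pattern = Pattern.compile("(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&])[a-zA-Z\\d!@#$%^&]{6,12}");
    log.info(">>> {}", pattern.matcher(password).matches()); // true
}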

Problem 2 – Insert commas into a numeric string:

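// The look-ahead matches each position that is followed by complete groups of three digits
// up to the end of the string; \B prevents a comma from being inserted at the very start.
@Test
public void addThousandsSeparators() {
    String number = "123456789";
    String result = number.replaceAll("(?=\\B(\\d{3})+$)", ",");
    log.info(">> {}", result); // 123,456,789
}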

Problem 3 – Replace placeholders using groups and a context map:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Test
public void replaceHolder() {
    Map<String, String> context = new HashMap<>();
    context.put("company", "north");
    context.put("project", "blob");
    context.put("model", "regex");
    String packages = "com.{company}.{project}.{model}.*";
    // Group 1 captures the key between the braces.
    Pattern pattern = Pattern.compile("\\{([^}]*)\\}");
    Matcher matcher = pattern.matcher(packages);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
        String key = matcher.group(1);
        // quoteReplacement keeps any '$' or '\' in the value from being read as a group reference.
        matcher.appendReplacement(result, Matcher.quoteReplacement(context.getOrDefault(key, "")));
    }
    matcher.appendTail(result);
    log.info(result.toString()); // com.north.blob.regex.*
}

Tools

For a comprehensive list of regex operators, see the Runoob Regex Operator Reference.

Extended Knowledge

Regular expression engines are typically based on NFA (non‑deterministic finite automaton) or DFA (deterministic finite automaton) implementations, each with different performance characteristics.

Conclusion

Regular expressions provide a concise yet powerful way to describe and manipulate text. Mastering their syntax, meta‑characters, quantifiers, and engine behavior is essential for developers across all programming domains.
