Understanding Regular Expressions: Definitions, Structure, and Practical Java Examples
This article introduces regular expressions, explains their core concepts, character classes, quantifiers, anchors, grouping, and assertions, and demonstrates how to solve common string‑validation, formatting, and placeholder‑replacement problems in Java with clear code examples.
Regular Expressions
Regular expressions (also called regex, regexp, or RE) are textual patterns composed of literal characters and special meta‑characters that describe a set of strings. They are widely used for searching, validating, extracting, and replacing text in many programming languages.
Typical Use Cases
Validate that a password contains upper‑ and lower‑case letters, digits, and special characters (!@#$%^&), and is 6 to 12 characters long.
Format a numeric string such as 12345678 with thousands separators: 12,345,678.
Replace placeholder expressions such as ${...} (similar to ES6 template syntax) with actual values.
Definition
A regular expression is a logical formula for string manipulation that combines ordinary characters (e.g., a‑z ) and meta‑characters (e.g., + , \d ) to define matching rules.
Functions
Match : Check whether a target string conforms to a pattern (e.g., password strength, phone number, URL).
Replace : Substitute parts of a string that match a pattern with another string.
Extract : Capture substrings that satisfy a pattern.
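The three functions above map onto a handful of java.util.regex calls. The following is a minimal sketch, not a canonical implementation; the class name RegexFunctionsDemo and the digit pattern are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFunctionsDemo {
    // Hypothetical pattern for illustration: a run of one or more digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Match: does the whole input conform to the pattern?
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Replace: substitute every digit run with a fixed marker.
    static String maskDigits(String input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    // Extract: collect every digit run found in the input.
    static List<String> extractDigits(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));               // true
        System.out.println(maskDigits("room 12, floor 3"));    // room #, floor #
        System.out.println(extractDigits("room 12, floor 3")); // [12, 3]
    }
}
```

Note that matches() tests the entire input, while find() looks for the pattern anywhere inside it.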
Structure
Regular expressions consist of ordinary characters (digits, letters, symbols) and meta‑characters such as + , ? , \d , \s , etc.
Character Types
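Ordinary characters and character classes: [ABC] (any one of A, B, or C), [^ABC] (any character except A, B, or C), [A-Z] (a range), [a-d[m-p]] (union of two ranges), [a-z&&[^bc]] (intersection: a to z except b and c)
Non‑printable and whitespace escapes: \cx (control character), \f (form feed), \n (newline), \r (carriage return), \t (tab), \v (vertical whitespace); \s and \S match any whitespace and any non‑whitespace character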
Special characters: ^ , $ , \ , * , + , . , ? , [ , { , | , ( )
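The character-class forms listed above can be checked directly with Pattern.matches. This is a small sketch under the assumption of Java's default matching mode; the helper name matches and the class name CharacterClassDemo are hypothetical.

```java
import java.util.regex.Pattern;

final class CharacterClassDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[ABC]", "B"));         // true: B is in the set
        System.out.println(matches("[^ABC]", "B"));        // false: B is excluded
        System.out.println(matches("[A-Z]", "q"));         // false: range is upper-case only
        System.out.println(matches("[a-d[m-p]]", "n"));    // true: union of a-d and m-p
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // false: b is subtracted
        System.out.println(matches("[a-z&&[^bc]]", "d"));  // true
        // A special character must be escaped to be matched literally.
        System.out.println(matches("a.b", "axb"));         // true: . matches any character
        System.out.println(matches("a\\.b", "axb"));       // false: \. matches only a dot
    }
}
```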
Quantifiers
* : zero or more
+ : one or more
? : zero or one; placed directly after another quantifier (e.g., *? , +? ), it makes that quantifier lazy
{n} : exactly n times
{n,} : at least n times
{n,m} : between n and m times
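Each quantifier above can be exercised with a one-line check. The sketch below uses made-up sample patterns (ab*c, colou?r, and a few digit counts); the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab*c", "ac"));         // true: * allows zero b's
        System.out.println(matches("ab+c", "ac"));         // false: + needs at least one b
        System.out.println(matches("ab+c", "abbbc"));      // true
        System.out.println(matches("colou?r", "color"));   // true: ? makes u optional
        System.out.println(matches("colou?r", "colour"));  // true
        System.out.println(matches("\\d{3}", "12"));       // false: exactly three digits required
        System.out.println(matches("\\d{2,}", "12345"));   // true: at least two digits
        System.out.println(matches("\\d{2,4}", "12345"));  // false: at most four digits
    }
}
```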
Anchors
^ : start of the string (or line in multiline mode)
$ : end of the string (or line in multiline mode)
\b : word boundary
\B : non‑word boundary
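Anchors match positions rather than characters, which is easiest to see with find(). The following sketch uses hypothetical sample inputs; count is an illustrative helper that counts matches under a given set of flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorDemo {
    // True when the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Number of non-overlapping matches of the regex under the given flags.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(found("^cat", "cat nap"));            // true
        System.out.println(found("^cat", "a cat"));              // false
        System.out.println(found("cat$", "a cat"));              // true
        System.out.println(found("\\bcat\\b", "a cat nap"));     // true: cat is a whole word
        System.out.println(found("\\bcat\\b", "concatenate"));   // false
        System.out.println(found("\\Bcat\\B", "concatenate"));   // true: cat sits inside a word
        // ^ matches only at the start of input unless MULTILINE is set.
        System.out.println(count("^\\w+", 0, "one\ntwo"));                 // 1
        System.out.println(count("^\\w+", Pattern.MULTILINE, "one\ntwo")); // 2
    }
}
```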
Greedy vs Lazy Matching
Greedy quantifiers try to consume as many characters as possible while still allowing the overall pattern to match; lazy quantifiers (by appending ? ) consume as few as possible. Example:
Source string: ... hello Regex ! ...
Greedy mode: .* -> hello Regex !
Lazy mode: .*? -> hello Regex
Backtracking
When a regex engine encounters a quantifier or alternation, it may need to backtrack to previous decision points if a later part of the pattern fails. Excessive backtracking can impact performance.
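To make the greedy/lazy distinction and the role of backtracking concrete, here is a small sketch. The sample string with <b> tags is an assumption made for this example, not the article's original input, and the possessive quantifier *+ (a Java feature not covered above) is used only to show what happens when the engine is not allowed to backtrack.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyLazyDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>hello</b><b>Regex</b>!"; // hypothetical sample

        // Greedy: .* runs to the end, then backtracks to the last </b>.
        System.out.println(firstMatch("<b>.*</b>", input));   // <b>hello</b><b>Regex</b>

        // Lazy: .*? expands one character at a time and stops at the first </b>.
        System.out.println(firstMatch("<b>.*?</b>", input));  // <b>hello</b>

        // Greedy a* first takes all three a's, then gives one back so the final a can match.
        System.out.println("aaa".matches("a*a"));   // true

        // Possessive a*+ never gives characters back, so the final a has nothing left to match.
        System.out.println("aaa".matches("a*+a"));  // false
    }
}
```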
Grouping, References, and Assertions
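Grouping syntax: (...) creates capture groups, numbered by the position of their opening parenthesis from left to right; named groups use (?<name>...) .
Reference syntax: \1 , \2 , etc. reuse the text captured by a numbered group; \k<name> does the same for a named group.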
Assertions:
Positive look‑ahead: (?=pattern)
Negative look‑ahead: (?!pattern)
Positive look‑behind: (?<=pattern)
Negative look‑behind: (?<!pattern)
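The constructs above can be sketched in a few short methods. The date, CSS, and price inputs are hypothetical examples, as are the class and method names; the regex syntax itself is standard java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupingDemo {
    // Named groups for a hypothetical yyyy-MM-dd date format.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Reads a named group; returns null when the input is not a date.
    static String yearOf(String date) {
        Matcher m = DATE.matcher(date);
        return m.matches() ? m.group("year") : null;
    }

    // Back-reference \1: the same word twice in a row, e.g. "is is".
    static boolean hasRepeatedWord(String text) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text).find();
    }

    // Positive look-ahead: digits that are followed by "px"; "px" is not part of the match.
    static String pixelValue(String css) {
        Matcher m = Pattern.compile("\\d+(?=px)").matcher(css);
        return m.find() ? m.group() : null;
    }

    // Positive look-behind: digits that are preceded by "$"; "$" is not part of the match.
    static String dollarAmount(String text) {
        Matcher m = Pattern.compile("(?<=\\$)\\d+").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2024-05-17"));                 // 2024
        System.out.println(hasRepeatedWord("this is is a test")); // true
        System.out.println(pixelValue("width: 120px"));           // 120
        System.out.println(dollarAmount("total $45"));            // 45
    }
}
```

Because look-around assertions are zero-width, the asserted text ("px" or "$") stays out of the returned match.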
Java Regex Flags (Modes)
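UNIX_LINES ( d ): only \n is recognized as a line terminator by . , ^ , and $ .
CASE_INSENSITIVE ( i ): ignore case (US‑ASCII only by default).
COMMENTS ( x ): whitespace in the pattern is ignored, and # starts a comment that runs to the end of the line.
MULTILINE ( m ): ^ and $ match at line boundaries as well as at the start and end of the input.
LITERAL : treat the whole pattern as a literal string; meta‑characters and escapes lose their special meaning.
DOTALL ( s ): . also matches line terminators.
UNICODE_CASE ( u ): when combined with CASE_INSENSITIVE, case‑insensitive matching follows Unicode rules.
CANON_EQ : enable canonical equivalence.
UNICODE_CHARACTER_CLASS ( U ): Unicode-aware predefined character classes.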
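Flags are passed as the second argument to Pattern.compile and can be combined with the bitwise OR operator; some can also be embedded in the pattern, such as (?i). The sketch below checks a few of the flags listed above. The helper found and the sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

final class FlagDemo {
    // Compiles the regex with the given flags and reports whether it occurs in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("hello", 0, "HELLO"));                        // false
        System.out.println(found("hello", Pattern.CASE_INSENSITIVE, "HELLO")); // true
        System.out.println(found("(?i)hello", 0, "HELLO"));                    // true: embedded flag

        System.out.println(found("^b$", 0, "a\nb"));                  // false
        System.out.println(found("^b$", Pattern.MULTILINE, "a\nb"));  // true

        System.out.println(found("a.b", 0, "a\nb"));               // false
        System.out.println(found("a.b", Pattern.DOTALL, "a\nb"));  // true

        System.out.println(found("a.c", Pattern.LITERAL, "abc"));  // false: . is a plain dot
        System.out.println(found("a.c", Pattern.LITERAL, "a.c"));  // true

        // COMMENTS: pattern whitespace and the trailing # comment are ignored.
        System.out.println(found("\\d+   # one or more digits", Pattern.COMMENTS, "42")); // true

        // Flags combine with bitwise OR.
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^B$", both, "a\nb"));  // true
    }
}
```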
Meta‑character Reference Table
Character : Description
\ : Escapes the next character, creates a back‑reference, or introduces an octal/Unicode escape.
^ : Matches start of the input (or line in multiline mode).
$ : Matches end of the input (or line in multiline mode).
* : Zero or more of the preceding token.
+ : One or more of the preceding token.
? : Zero or one of the preceding token, or makes the preceding quantifier lazy.
{n} : Exactly n repetitions.
{n,} : At least n repetitions.
{n,m} : Between n and m repetitions.
. : Any character except line terminators (use the DOTALL flag or [\s\S] to include them; inside brackets, as in [.\n], the dot is a literal).
(pattern) : Capturing group.
(?:pattern) : Non‑capturing group.
(?=pattern) : Positive look‑ahead.
(?!pattern) : Negative look‑ahead.
(?<=pattern) : Positive look‑behind.
(?<!pattern) : Negative look‑behind.
x|y : Alternation (match x or y).
[xyz] : Character class.
[^xyz] : Negated character class.
\d : Digit (equivalent to [0-9]).
\w : Word character (letters, digits, underscore).
\s : Whitespace character.
Additional rows omitted for brevity
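A few of the table rows can be spot-checked in code. The sketch below focuses on how . treats line terminators and on why a bracket expression such as [.\n] is not a general substitute, since a dot inside brackets is a literal. The class name DotDemo and the sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

final class DotDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a.b", "a\nb"));           // false: . skips line terminators
        System.out.println(matches("(?s)a.b", "a\nb"));       // true: DOTALL via embedded flag
        System.out.println(matches("a[\\s\\S]b", "a\nb"));    // true: whitespace or non-whitespace
        System.out.println(matches("a[.\\n]b", "axb"));       // false: bracketed dot is literal
        System.out.println(matches("a[.\\n]b", "a.b"));       // true
        System.out.println(matches("gr(a|e)y", "grey"));      // true: alternation inside a group
        System.out.println(matches("(?:ab)+", "ababab"));     // true: non-capturing group repeated
        // A non-capturing group does not add to the group count.
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1
    }
}
```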
Practical Java Examples
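Problem 1 – Password validation using look‑aheads:
// Needs java.util.regex.Pattern; @Test and the log field come from the surrounding test class.
@Test
public void checkPassword() {
    String password = "aaa123@Z";
    // Each look-ahead requires one kind of character; the final class limits the allowed set and the length to 6-12.
    Pattern pattern = Pattern.compile("(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&])[a-zA-Z\\d!@#$%^&]{6,12}");
    log.info(">>> {}", pattern.matcher(password).matches());
}
Problem 2 – Insert commas into a numeric string: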
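@Test
public void insertThousandsSeparators() {
    String number = "123456789";
    // Zero-width match at every position inside the number that is followed by a multiple of three digits up to the end.
    String result = number.replaceAll("(?=\\B(\\d{3})+$)", ",");
    log.info(">> {}", result); // 123,456,789
}
Problem 3 – Replace placeholders using groups and a context map: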
// Needs java.util.HashMap, java.util.Map, java.util.regex.Matcher, and java.util.regex.Pattern.
@Test
public void replaceHolder() {
    Map<String, String> context = new HashMap<>();
    context.put("company", "north");
    context.put("project", "blob");
    context.put("model", "regex");
    String packages = "com.{company}.{project}.{model}.*";
    // Group 1 captures the key between the braces.
    Pattern pattern = Pattern.compile("\\{([^}]*)\\}");
    Matcher matcher = pattern.matcher(packages);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
        String key = matcher.group(1);
        // quoteReplacement keeps '$' and '\' in a value from being read as group references.
        matcher.appendReplacement(result, Matcher.quoteReplacement(context.getOrDefault(key, "")));
    }
    matcher.appendTail(result);
    log.info(result.toString()); // com.north.blob.regex.*
}
Tools
For a comprehensive list of regex operators, see the Runoob Regex Operator Reference.
Extended Knowledge
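Regular expression engines are typically based on NFA (non‑deterministic finite automaton) or DFA (deterministic finite automaton) implementations, each with different performance characteristics. Backtracking NFA‑style engines, such as Java's java.util.regex, support back‑references and look‑around but can slow down sharply on patterns that backtrack heavily; DFA‑based engines scan the input in linear time but do not offer those features.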
Conclusion
Regular expressions provide a concise yet powerful way to describe and manipulate text. A working knowledge of their syntax, meta‑characters, quantifiers, and engine behavior pays off wherever a program has to validate, search, or transform strings.
IT Services Circle