Introduction to Regular Expressions
Regular expressions (regex) are powerful patterns used for matching and manipulating text. They're essential for validation, searching, and text processing in programming.
Basic Syntax
Understanding the fundamental building blocks:
Literal Characters
abc - Matches the exact string "abc"
123 - Matches the exact string "123"
Metacharacters
. - Matches any single character
^ - Matches start of string
$ - Matches end of string
* - Matches 0 or more times
+ - Matches 1 or more times
? - Matches 0 or 1 time
| - OR operator
() - Grouping
[] - Character class
\ - Escape character
Character Classes
Match specific sets of characters:
[abc] // Matches a, b, or c
[a-z] // Matches any lowercase ASCII letter
[A-Z] // Matches any uppercase ASCII letter
[0-9] // Matches any digit
[a-zA-Z] // Matches any ASCII letter
[^abc] // Matches anything except a, b, or c
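As a rough illustration of the classes above, here is a minimal Python sketch using the standard `re` module. The sample text and variable names are made up for this example.

```python
import re

# Sample text invented for this illustration.
text = "Order A7 shipped to Zone b3!"

# [A-Z] matches one uppercase ASCII letter at a time.
uppercase = re.findall(r"[A-Z]", text)      # ['O', 'A', 'Z']

# [0-9] matches one digit at a time.
digits = re.findall(r"[0-9]", text)         # ['7', '3']

# [^a-zA-Z ] matches any character that is NOT an ASCII letter or a space.
others = re.findall(r"[^a-zA-Z ]", text)    # ['7', '3', '!']

print(uppercase, digits, others)
```

Note that a leading `^` inside the brackets negates the class; anywhere else inside the brackets it is a literal caret.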
Predefined Character Classes
Shortcuts for common patterns:
\d - Digit [0-9]
\D - Non-digit [^0-9]
\w - Word character [a-zA-Z0-9_]
\W - Non-word character
\s - Whitespace (space, tab, newline)
\S - Non-whitespace
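A short Python sketch of the shortcuts in use, with an invented sample string. One caveat worth knowing: for Python `str` patterns, `\d` and `\w` are Unicode-aware by default, so they match more than the ASCII ranges listed above unless the `re.ASCII` flag is passed.

```python
import re

# Sample text invented for this illustration; \t is a real tab character.
text = "Room 42,\tfloor_3"

digits = re.findall(r"\d+", text)   # runs of digits: ['42', '3']
words = re.findall(r"\w+", text)    # runs of word characters: ['Room', '42', 'floor_3']
parts = re.split(r"\s+", text)      # split on runs of whitespace: ['Room', '42,', 'floor_3']

print(digits, words, parts)
```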
Quantifiers
Specify how many times a pattern should match:
a* // 0 or more 'a'
a+ // 1 or more 'a'
a? // 0 or 1 'a'
a{3} // Exactly 3 'a'
a{3,} // 3 or more 'a'
a{3,5} // Between 3 and 5 'a'
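To see the counted quantifiers in action, the following Python sketch checks a few made-up strings against `a{3,5}` and uses `?` for an optional letter. `fullmatch` is used so the whole string has to fit the pattern.

```python
import re

# a{3,5} accepts between 3 and 5 consecutive 'a' characters.
samples = ["aa", "aaa", "aaaa", "aaaaa", "aaaaaa"]
pattern = re.compile(r"a{3,5}")
accepted = [s for s in samples if pattern.fullmatch(s)]
print(accepted)   # ['aaa', 'aaaa', 'aaaaa']

# u? makes the 'u' optional, so both spellings match; "colouur" does not.
spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']
```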
Greedy vs Lazy
.* // Greedy - matches as much as possible
.*? // Lazy - matches as little as possible
.+? // Lazy one or more
.{3,5}? // Lazy quantifier
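The difference is easiest to see on text with repeated delimiters. A minimal Python sketch with an invented HTML-like string:

```python
import re

# Sample text invented for this illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST '>' in the string, so there is one big match.
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the FIRST '>' it can, so each tag matches separately.
lazy = re.findall(r"<.*?>", html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```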
Anchors and Boundaries
Match positions rather than characters:
^ - Start of string/line
$ - End of string/line
\b - Word boundary
\B - Non-word boundary
^Hello // Matches "Hello" at start
World$ // Matches "World" at end
\bcat\b // Matches "cat" as whole word
\Bcat\B // Matches "cat" within a word
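The following Python sketch tries these patterns on invented sample text and records where each match starts, which makes the effect of `\b` and `\B` visible.

```python
import re

starts_with_hello = bool(re.search(r"^Hello", "Hello World"))  # True
ends_with_world = bool(re.search(r"World$", "Hello World"))    # True

# Sample text invented for this illustration.
text = "cat concatenate cats bobcat"

# \bcat\b only matches the standalone word at index 0.
whole_word_positions = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat\B only matches "cat" with word characters on both sides,
# which here is the one inside "concatenate" (index 7).
inner_positions = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(whole_word_positions, inner_positions)  # [0] [7]
```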
Groups and Capturing
Group patterns and capture matched text:
(abc) // Capturing group
(?:abc) // Non-capturing group
(a|b|c) // Alternation in group
(ab)+ // Group with quantifier
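A small Python sketch of how capturing and non-capturing groups differ in what they return. The date-like sample string is invented for the example.

```python
import re

# The month is wrapped in (?:...) so it is matched but not captured.
m = re.search(r"(\d{4})-(?:\d{2})-(\d{2})", "Released 2024-03-15")
print(m.group(0))   # whole match: '2024-03-15'
print(m.groups())   # captured groups only: ('2024', '15')

# A quantified group matches repeatedly; the group keeps its LAST repetition.
repeated = re.fullmatch(r"(ab)+", "ababab")
print(repeated.group(1))  # 'ab'

# Alternation inside a group: exactly one of a, b, or c.
choice = re.fullmatch(r"(a|b|c)", "b")
print(choice.group(1))    # 'b'
```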
Backreferences
(\w+)\s\1 // Matches repeated words: "hello hello"
(["']).*?\1 // Matches quoted strings
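A Python sketch of both backreference patterns on invented text. The repeated-word pattern is wrapped in `\b` here, which is an addition for this example: without it, the pattern would also match the "is is" that spans the end of "this" and the following "is".

```python
import re

# \1 must repeat exactly what group 1 captured.
doubled = re.search(r"\b(\w+)\s\1\b", "this is is a test")
print(doubled.group(0))  # 'is is'

# Replacing the match with \1 collapses the repeated word.
fixed = re.sub(r"\b(\w+)\s\1\b", r"\1", "hello hello world")
print(fixed)             # 'hello world'

# The closing quote must be the same character as the opening quote,
# so an apostrophe inside double quotes does not end the string.
text = """He said "it's fine" and 'ok'."""
quoted = [m.group(0) for m in re.finditer(r"""(["']).*?\1""", text)]
print(quoted)            # ['"it\'s fine"', "'ok'"]
```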
Common Patterns
Practical regex patterns for everyday use:
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
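A quick Python sketch that runs the pattern above against a few made-up addresses. It only checks the shape of the string and says nothing about whether the address actually exists.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Candidate strings invented for this illustration.
candidates = [
    "user@example.com",
    "first.last+tag@sub.example.org",
    "user@example",    # no top-level domain
    "@example.com",    # empty local part
]
valid = [c for c in candidates if EMAIL.match(c)]
print(valid)  # ['user@example.com', 'first.last+tag@sub.example.org']
```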
Phone Number (US)
^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
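Because the pattern captures the three number blocks, it can also normalize input, not just validate it. A minimal Python sketch; `normalize_phone` is a hypothetical helper name chosen for this example.

```python
import re

PHONE = re.compile(r"^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$")

def normalize_phone(raw):
    """Return the number as 'AAA-EEE-LLLL', or None if it does not match."""
    m = PHONE.match(raw)
    if m is None:
        return None
    area, exchange, line = m.groups()
    return f"{area}-{exchange}-{line}"

print(normalize_phone("(555) 123-4567"))  # 555-123-4567
print(normalize_phone("555.123.4567"))    # 555-123-4567
print(normalize_phone("555-1234"))        # None
```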
URL
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b
IP Address
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
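The alternation in each octet is what keeps values within 0-255. A short Python sketch with made-up inputs; `is_ipv4` is a hypothetical helper name.

```python
import re

IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_ipv4(candidate):
    """True if the string looks like a dotted-quad IPv4 address."""
    return IPV4.match(candidate) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three octets
```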
Date (YYYY-MM-DD)
^\d{4}-\d{2}-\d{2}$
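Keep in mind that this pattern checks the format only: "2024-13-45" has the right shape but is not a real date. One possible approach, sketched below in Python, is to use the regex as a cheap first filter and let `datetime.strptime` do the calendar check. `is_real_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def is_real_date(s):
    """True only if s has the YYYY-MM-DD shape AND is a valid calendar date."""
    if not DATE.match(s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(bool(DATE.match("2024-13-45")))  # True: the shape is right
print(is_real_date("2024-13-45"))      # False: there is no month 13
print(is_real_date("2024-02-29"))      # True: 2024 is a leap year
```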
Password Strength
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
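Each `(?=...)` in this pattern is an independent requirement checked from the start of the string, and the final character class restricts which characters are allowed at all. A Python sketch with invented passwords; `is_strong` is a hypothetical helper name.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password):
    """True if the password satisfies every rule encoded in the pattern."""
    return PASSWORD.match(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd"))    # False: no special character
print(is_strong("Pass w0rd!"))  # False: space is not an allowed character
```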
Lookahead and Lookbehind
Match patterns based on what comes before or after:
Lookahead
(?=...) // Positive lookahead
(?!...) // Negative lookahead
// Example: Password must contain digit
^(?=.*\d).{8,}$
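Lookaheads check what follows without consuming it, so the checked text is not part of the match. A Python sketch on an invented price list; note the `\b` in the negative case, added here so the engine cannot backtrack to a shorter number such as "1" and then pass the check.

```python
import re

# Sample text invented for this illustration.
text = "10 USD, 20 EUR, 30 USD"

# Positive lookahead: numbers that ARE followed by " USD".
usd_amounts = re.findall(r"\d+(?= USD)", text)
print(usd_amounts)      # ['10', '30']

# Negative lookahead: whole numbers that are NOT followed by " USD".
non_usd_amounts = re.findall(r"\b\d+\b(?! USD)", text)
print(non_usd_amounts)  # ['20']
```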
Lookbehind
(?<=...) // Positive lookbehind
(?<!...) // Negative lookbehind
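Lookbehind works the same way as lookahead but inspects the text before the current position. A Python sketch on invented text, using the positive form `(?<=...)` and the negative form `(?<!...)`; Python's `re` module requires the lookbehind pattern to have a fixed width.

```python
import re

# Sample text invented for this illustration.
text = "Cost: $45, qty 3, total $135"

# Positive lookbehind: digits that ARE preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", text)
print(prices)      # ['45', '135']

# Negative lookbehind: whole numbers that are NOT preceded by a dollar sign.
quantities = re.findall(r"(?<!\$)\b\d+", text)
print(quantities)  # ['3']
```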
Regex in Different Languages
JavaScript
const regex = /\d+/g;
const text = "I have 2 cats and 3 dogs";
const matches = text.match(regex); // ["2", "3"]
// Replace
const newText = text.replace(/\d+/g, "X"); // "I have X cats and X dogs"
// Test
const isValid = /^\d{3}$/.test("123"); // true
Python
import re
text = "I have 2 cats and 3 dogs"
matches = re.findall(r'\d+', text) # ['2', '3']
# Replace
new_text = re.sub(r'\d+', 'X', text)
# Match
match = re.match(r'^\d{3}$', '123')
if match:
    print("Valid")
PHP
$text = "I have 2 cats and 3 dogs";
preg_match_all('/\d+/', $text, $matches);
// $matches[0] = ["2", "3"]
// Replace
$newText = preg_replace('/\d+/', 'X', $text);
// Test
if (preg_match('/^\d{3}$/', '123')) {
    echo "Valid";
}
Regex Flags
Modify regex behavior:
g - Global (find all matches)
i - Case insensitive
m - Multiline (^ and $ also match at the start and end of each line)
s - Dot matches newline
u - Unicode
x - Extended (ignore whitespace in the pattern)
/pattern/gi // Global, case insensitive
/pattern/m // Multiline mode
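Flag spelling varies by language. In Python, for instance, flags are passed as constants from the `re` module, and there is no `g` flag because `re.findall` and `re.finditer` already return every match. A minimal sketch with an invented log snippet:

```python
import re

# Sample text invented for this illustration.
text = "Error: disk full\nerror: retry\nWarning: low memory"

# i: case-insensitive matching.
errors = re.findall(r"error", text, flags=re.IGNORECASE)
print(errors)         # ['Error', 'error']

# m: without it, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(first_only)     # ['Error']
print(every_line)     # ['Error', 'error', 'Warning']

# s: without it, '.' does not match a newline.
no_dotall = re.search(r"full.*retry", text)
with_dotall = re.search(r"full.*retry", text, flags=re.DOTALL)
print(no_dotall is None, with_dotall is not None)  # True True
```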
Performance Tips
Write efficient regex patterns:
- Be specific - avoid overly broad patterns
- Use non-capturing groups when you don't need the captured text
- Avoid nested quantifiers (catastrophic backtracking)
- Use anchors to limit search scope
- Test with large inputs
- Consider alternatives for complex parsing
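As one illustration of the nested-quantifier tip, the Python sketch below compares a pattern with a quantifier inside a quantified group to a flat equivalent. The pattern names are made up for this example, and the inputs are deliberately tiny: on a long run of "a" followed by a non-matching character, the nested form can take exponentially long in a backtracking engine.

```python
import re

# Nested quantifier: (a+)+ gives the engine many ways to split a run of 'a's.
RISKY = re.compile(r"^(a+)+$")

# Flat equivalent: accepts exactly the same strings with a single quantifier.
SAFE = re.compile(r"^a+$")

# Short inputs only; these just confirm both patterns agree.
samples = ["a", "aaa", "aab", ""]
risky_results = [bool(RISKY.match(s)) for s in samples]
safe_results = [bool(SAFE.match(s)) for s in samples]
print(risky_results)  # [True, True, False, False]
print(safe_results)   # [True, True, False, False]
```

Compiling a pattern once with `re.compile` and reusing it, as done here, also avoids re-parsing the pattern on every call.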
Common Mistakes
Avoid these regex pitfalls:
- Forgetting to escape special characters
- Using greedy quantifiers when lazy is needed
- Not testing edge cases
- Overcomplicating patterns
- Using regex for parsing HTML/XML
- Not considering performance
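The first pitfall is easy to reproduce. In the Python sketch below, an unescaped `.` in a search term matches any character; `re.escape` from the standard library escapes the special characters for you. The sample strings are invented.

```python
import re

price = "3.50"

# Unescaped: '.' matches ANY character, so "3x50" is accepted by mistake.
unescaped_hit = re.search(price, "total 3x50")
print(unescaped_hit is not None)  # True (a false positive)

# Escaped: re.escape turns "3.50" into a pattern that matches it literally.
escaped_miss = re.search(re.escape(price), "total 3x50")
escaped_hit = re.search(re.escape(price), "total 3.50")
print(escaped_miss is None, escaped_hit is not None)  # True True
```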
Testing and Debugging
Tools for testing regex patterns:
- Regex101: Online regex tester with explanations
- RegExr: Visual regex builder
- Debuggex: Visual regex debugger
- RegexPal: Simple online tester
Best Practices
- Keep patterns simple and readable
- Add comments for complex patterns
- Test thoroughly with various inputs
- Use raw strings in languages that support them
- Consider maintainability
- Document your regex patterns
- Use named groups for clarity
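Several of these practices can be combined. The Python sketch below is one way to do it, using a raw triple-quoted string, the `re.VERBOSE` flag for inline comments, and named groups; the group names are chosen for this example.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats '#' as a comment start.
DATE = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)

m = DATE.fullmatch("2024-03-15")
print(m.group("year"))  # '2024'
print(m.groupdict())    # {'year': '2024', 'month': '03', 'day': '15'}
```

Named groups keep the calling code readable even if the pattern is later reordered, since nothing depends on group numbers.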
Conclusion
Regular expressions are powerful tools for text processing. Start with simple patterns and gradually build complexity. Practice with real-world examples and use testing tools to validate your patterns. Remember that sometimes simpler string methods are more appropriate than regex.