Regular Expressions

Regex

Regular expressions are:

  • patterns used to match and manipulate text strings
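A minimal Python sketch of both uses, matching and manipulating, with the standard-library `re` module. The sample sentence is made up for illustration.

```python
import re

text = "The cat sat on the mat."

# Matching: re.search returns a Match object for the first occurrence, or None
m = re.search(r"cat", text)
print(m.group(), m.span())   # cat (4, 7)

# Manipulating: re.sub replaces every match of the pattern
replaced = re.sub(r"[cm]at", "dog", text)
print(replaced)              # The dog sat on the dog.
```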

Basic Patterns

  • Literal characters match themselves (“cat” matches “cat”)
  • Special characters (metacharacters) have specific meanings:
    • . (dot) - matches any single character except newline
    • * - matches 0 or more of the previous item (a single character or a bracketed set)
    • + - matches 1 or more of the previous item
    • ? - matches 0 or 1 of the previous item
    • ^ - matches the start of the string (or of each line in multiline mode)
    • $ - matches the end of the string (or of each line in multiline mode)
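The metacharacters above can be tried out with `re.findall`, which returns every non-overlapping match. This is a small sketch. The sample strings are invented, and `re.MULTILINE` is the Python flag that makes ^ and $ work line by line.

```python
import re

# . matches any single character except newline, so "c\nt" is skipped
dot = re.findall(r"c.t", "cat cot c\nt")              # ['cat', 'cot']

# *, + and ? apply to the item just before them (here, the letter b or u)
star = re.findall(r"ab*", "a ab abbb")                # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                # ['ab', 'abbb']
opt = re.findall(r"colou?r", "color colour")          # ['color', 'colour']

# ^ and $ anchor to the start/end of the string by default;
# re.MULTILINE makes them match at every line boundary as well
lines = "first line\nsecond line"
starts = re.findall(r"^\w+", lines)                   # ['first']
starts_ml = re.findall(r"^\w+", lines, re.MULTILINE)  # ['first', 'second']
ends = re.findall(r"\w+$", lines)                     # ['line']
```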

Other syntax

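  • \d - matches any digit (0-9)
  • \w - matches any word character (a-z, A-Z, 0-9, _)
  • \s - matches any whitespace character (e.g., space, tab, newline)
  • [] - matches any single character from the set listed inside the brackets
  • {m} - matches exactly m repetitions of the previous item
  • {m,n} - matches m to n repetitions (inclusive) of the previous item

Inside [], a hyphen gives a range of characters, e.g., lowercase letters: [a-z], uppercase letters: [A-Z]. (In Python 3, \d and \w also match non-ASCII Unicode digits and letters unless the re.ASCII flag is set.)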
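A short sketch combining the shorthand classes, ranges, and counted repetition. The strings and the "capitalized name of 3 to 6 letters" rule are hypothetical examples chosen only to exercise each piece of syntax.

```python
import re

# \d with {m}: a date written as YYYY-MM-DD
date = re.search(r"\d{4}-\d{2}-\d{2}", "Shipped on 2024-05-17.").group()
# '2024-05-17'

# \w: word characters include letters, digits and the underscore
words = re.findall(r"\w+", "user_42 says: hi!")
# ['user_42', 'says', 'hi']

# \s: split on any run of whitespace (spaces, tab, newline)
parts = re.split(r"\s+", "a  b\tc\nd")
# ['a', 'b', 'c', 'd']

# [A-Z], [a-z] and {m,n}: a capital letter followed by 2 to 5 lowercase letters
names = [w for w in ["Bob", "Al", "Alice", "Charlotte"]
         if re.fullmatch(r"[A-Z][a-z]{2,5}", w)]
# ['Bob', 'Alice']
```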

Examples

[abc] - matches any single character from the set (a, b, or c)

[^abc] - matches any single character NOT in the set
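A quick illustration of a set and its negation on a made-up string. The last line shows that ^ only negates when it is the first character inside the brackets. Anywhere else in the set it is an ordinary character.

```python
import re

text = "cab, dab!"
in_set = re.findall(r"[abc]", text)       # ['c', 'a', 'b', 'a', 'b']
not_in_set = re.findall(r"[^abc]", text)  # [',', ' ', 'd', '!']

# ^ negates the set only as the first character inside []; elsewhere it is literal
literal_caret = re.findall(r"[a^]", "a^b")  # ['a', '^']
```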

Python examples

Email validation pattern (a simplified format check, not a full RFC 5322 validator):

pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
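One way to apply the email pattern, assuming it is compiled once and reused. The candidate addresses are invented. Note that the last one is accepted even though "b..c" is not a well-formed domain, which shows how loose this check is.

```python
import re

# Same pattern as in the notes: local part, "@", domain, ".", final part
EMAIL = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')

candidates = ["alice@example.com", "first.last@mail.example.org",
              "no-at-sign.com", "bob@localhost", "a@b..c"]
accepted = [c for c in candidates if EMAIL.match(c)]
# ['alice@example.com', 'first.last@mail.example.org', 'a@b..c']
```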

Phone number pattern (US format):

pattern = r'\d{3}-\d{3}-\d{4}'
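Because the phone pattern has no ^ or $ anchors, the function you call decides how strict it is. A sketch with made-up numbers: `re.search` and `re.findall` locate numbers inside longer text, while `re.fullmatch` requires the entire string to fit the format.

```python
import re

PHONE = r'\d{3}-\d{3}-\d{4}'

# re.search finds the first occurrence anywhere in the text
found = re.search(PHONE, "Call 555-123-4567 today").group()
# '555-123-4567'

# re.findall returns every non-overlapping occurrence
numbers = re.findall(PHONE, "555-123-4567 or 555-987-6543")
# ['555-123-4567', '555-987-6543']

# Without anchors, a string with an extra digit still "contains" a match;
# re.fullmatch checks that the whole string is a phone number
loose = re.search(PHONE, "555-123-45678") is not None      # True
strict = re.fullmatch(PHONE, "555-123-45678") is not None  # False
```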