What Is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. You use regex to find, match, extract, or replace text that follows a specific structure: for example, all email addresses in a document, all lines starting with a number, or all words longer than 10 characters. Regex is not a programming language; it is a mini-language for describing text patterns. It is built into the syntax or standard library of JavaScript, Python, PHP, Ruby, Go, Java, and most other programming languages, and it is also available in shells such as bash, in text editors (VS Code, Sublime Text, Notepad++), and in tools like grep. The core syntax described here is widely shared, but details vary between engines, so check the documentation for the one you use.
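
As a rough illustration of finding, extracting, and replacing, here is a minimal sketch using Python's standard re module. The sample text and variable names are invented for this example, and the patterns use syntax explained in the sections that follow.

```python
import re

# Hypothetical sample text, invented for this sketch.
text = "1. Regex is straightforward\nSome internationalization notes\n2. Done"

# Extract: all words longer than 10 characters.
long_words = re.findall(r"\b\w{11,}\b", text)

# Find: all lines starting with a number (MULTILINE applies ^ and $ per line).
numbered_lines = re.findall(r"^\d.*$", text, re.MULTILINE)

# Replace: mask every digit with '#'.
masked = re.sub(r"\d", "#", text)

print(long_words)      # ['straightforward', 'internationalization']
print(numbered_lines)  # ['1. Regex is straightforward', '2. Done']
print(masked)
```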

The Building Blocks: Literals and Metacharacters

A regex pattern consists of two types of characters. Literal characters match exactly: the pattern 'cat' matches the string 'cat'. Metacharacters have special meanings. The most important ones:

. (dot): matches any single character except a newline.
* (asterisk): matches the preceding element zero or more times.
+ (plus): matches the preceding element one or more times.
? (question mark): matches the preceding element zero or one time (makes it optional).
^ (caret): asserts the start of the string, or the start of each line in multiline mode.
$ (dollar sign): asserts the end of the string, or the end of each line in multiline mode.
[] (square brackets): a character class; matches any one character inside the brackets.
[^]: a negated character class; matches any one character NOT inside the brackets.
| (pipe): alternation (OR); matches either the left or the right side.
() (parentheses): a capturing group; groups part of the pattern and captures the text it matches.
\ (backslash): escape character; treats the next metacharacter as a literal, so \. matches an actual dot.
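
The metacharacters above can be tried out one at a time. The following is a minimal sketch using Python's standard re module; the sample strings and variable names are made up for illustration.

```python
import re

# Literal characters match themselves, anywhere in the text.
literal = re.search(r"cat", "concatenate") is not None   # True

# . matches any single character except a newline.
dot = re.findall(r"c.t", "cat cot c\nt cut")             # ['cat', 'cot', 'cut']

# [] matches one character from the set; [^] matches one character outside it.
vowels = re.findall(r"[aeiou]", "regex fun")             # ['e', 'e', 'u']
not_vowels = re.findall(r"[^aeiou ]", "regex fun")       # ['r', 'g', 'x', 'f', 'n']

# ^ and $ anchor the pattern to the start and end of the text.
starts = re.search(r"^cat", "a cat") is not None         # False
ends = re.search(r"nap$", "cat nap") is not None         # True

# | chooses between alternatives; () groups them and captures the match.
animal = re.search(r"(cat|dog)s", "hotdogs").group(1)    # 'dog'

# \ turns a metacharacter into a literal.
unescaped = re.findall(r"3.14", "3.14 3x14")             # ['3.14', '3x14']
escaped = re.findall(r"3\.14", "3.14 3x14")              # ['3.14']
```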

Character Classes and Shorthand

Character classes let you match sets of characters:

[aeiou]: matches any single vowel.
[a-z]: matches any lowercase letter.
[A-Z]: matches any uppercase letter.
[0-9]: matches any digit.
[a-zA-Z0-9]: matches any alphanumeric character.

Regex also provides shorthand character classes:

\d: matches any digit. This is equivalent to [0-9] in ASCII-only engines such as JavaScript; some engines, including Python 3 on text strings, also match digits from other scripts.
\D: matches any non-digit.
\w: matches any word character (letters, digits, underscore). This is equivalent to [a-zA-Z0-9_] in ASCII-only engines; Unicode-aware engines also match non-English letters.
\W: matches any non-word character.
\s: matches any whitespace character (space, tab, newline, and others such as carriage return).
\S: matches any non-whitespace character.

A related escape, \b, is not a character class but a zero-width assertion: it matches a word boundary, the position between a word character and a non-word character. Example: \bcat\b matches 'cat' as a whole word but not the 'cat' inside 'concatenate'.
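
A short sketch of the classes and shorthands above, assuming Python's standard re module; the sample string is invented for illustration. The last two lines show one engine-specific detail: in Python 3, \d on text strings also matches digits from other scripts unless the re.ASCII flag is set.

```python
import re

# Hypothetical sample text; "\t" is a tab character.
sample = "Order #A-1042, qty 3_x\tshipped"

digits = re.findall(r"\d", sample)        # ['1', '0', '4', '2', '3']
words = re.findall(r"\w+", sample)        # ['Order', 'A', '1042', 'qty', '3_x', 'shipped']
spaces = re.findall(r"\s", sample)        # [' ', ' ', ' ', '\t']
capitals = re.findall(r"[A-Z]", sample)   # ['O', 'A']

# \b keeps the match to whole words only.
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")  # ['cat', 'cat']

# ARABIC-INDIC DIGIT THREE: a digit, but not in the range 0-9.
arabic_three = "\u0663"
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None          # True
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None  # False
```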

Quantifiers: Controlling How Many Times

Quantifiers specify how many times a pattern element must appear:

*: zero or more times (e.g., 'go*gle' matches 'ggle', 'gogle', 'google', 'gooooogle').
+: one or more times (e.g., 'go+gle' matches 'gogle', 'google', but NOT 'ggle').
?: zero or one time (makes it optional; e.g., 'colou?r' matches both 'color' and 'colour').
{n}: exactly n times (e.g., '\d{4}' matches exactly 4 digits).
{n,}: n or more times (e.g., '\d{3,}' matches 3 or more digits).
{n,m}: between n and m times (e.g., '\d{3,5}' matches 3, 4, or 5 digits).

By default, quantifiers are greedy: they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible): .* is greedy; .*? is lazy.
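
The quantifier examples above can be checked directly. This is a minimal sketch using Python's standard re module; the HTML-like string used for the greedy versus lazy comparison is an invented sample, not from the original text.

```python
import re

candidates = ["ggle", "gogle", "google", "gooooogle"]

# * allows zero o's, + requires at least one.
star = [w for w in candidates if re.fullmatch(r"go*gle", w)]
plus = [w for w in candidates if re.fullmatch(r"go+gle", w)]

# ? makes the 'u' optional, but does not allow two of them.
spellings = [w for w in ["color", "colour", "colouur"]
             if re.fullmatch(r"colou?r", w)]

# {n} and {n,m} count repetitions.
exactly_four = re.findall(r"\b\d{4}\b", "7 2024 123456")   # ['2024']
three_to_five = re.findall(r"\d{3,5}", "12 123 1234567")   # ['123', '12345']

# Greedy .* runs to the last </b>; lazy .*? stops at the first one.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)    # ['<b>one</b> and <b>two</b>']
lazy = re.findall(r"<b>.*?</b>", html)     # ['<b>one</b>', '<b>two</b>']
```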

Practical Regex Examples

These patterns cover the most common real-world uses:

Email address (simplified): [a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}
UK postcode: [A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}
US phone number: (\+1)?[\s\-.]?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}
URL: https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
IPv4 address: (\d{1,3}\.){3}\d{1,3}
Date (YYYY-MM-DD): \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
Hexadecimal colour (3 or 6 hex digits): #([0-9a-fA-F]{6}|[0-9a-fA-F]{3})
Empty lines: ^\s*$
Lines starting with a number: ^\d

Note: no single regex perfectly validates all valid emails or URLs; these are practical approximations suitable for most use cases. The IPv4 and date patterns check only the shape of the text, so they also accept out-of-range values such as 999.999.999.999 or 2024-02-31.
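
To show how such patterns might be used in code, here is a minimal sketch with Python's standard re module. It copies three of the patterns above verbatim; the constant names, the sample inputs, and the log line are invented for illustration. It uses fullmatch to validate a whole string and finditer to extract matches from longer text, because findall would return only the captured groups for patterns that contain parentheses.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")
DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")
IPV4 = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

def is_valid(pattern, text):
    """True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

email_ok = is_valid(EMAIL, "user.name+tag@example.co.uk")   # True
email_bad = is_valid(EMAIL, "not-an-email")                  # False
date_ok = is_valid(DATE, "2024-02-29")                       # True
month_bad = is_valid(DATE, "2024-13-01")                     # False

# Approximations: the shape is right, so these pass despite being invalid.
fake_date = is_valid(DATE, "2024-02-31")                     # True
fake_ip = is_valid(IPV4, "999.999.999.999")                  # True

# Extraction from longer text: group(0) is the full match.
log = "2024-01-05 login from 10.0.0.7; 2024-01-06 login from 192.168.1.20"
dates = [m.group(0) for m in DATE.finditer(log)]
addresses = [m.group(0) for m in IPV4.finditer(log)]
```

When a value must be genuinely valid, one option is to use the regex only as a first filter and then check ranges in code, for example by parsing the date with a date library.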

Flags / Modifiers

Most regex engines support flags that modify how the pattern is applied. The most common:

i (case-insensitive): /hello/i matches 'hello', 'Hello', 'HELLO'.
g (global): find all matches, not just the first. In JavaScript this is a flag; Python has no g flag, and re.findall or re.finditer return all matches instead.
m (multiline): ^ and $ match the start and end of each line, not just the whole string.
s (dotAll / DOTALL): makes . match newline characters too.
x (verbose/extended): allows whitespace and comments in the pattern for readability (Python, Ruby; not JavaScript).

Flag letters are not identical in every engine; in Ruby, for example, m is the flag that makes . match newlines. In JavaScript, flags appear after the closing slash: /pattern/gi. In Python, they are passed as arguments: re.findall(pattern, text, re.IGNORECASE).
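
The effect of each flag is easiest to see side by side. The following is a minimal sketch using Python's standard re module, where the flags are named re.IGNORECASE, re.MULTILINE, re.DOTALL, and re.VERBOSE and can be combined with |. The sample strings and variable names are invented for illustration.

```python
import re

# i: ignore case.
case_insensitive = re.findall(r"hello", "hello Hello HELLO", re.IGNORECASE)

text = "first line\nsecond line"

# m: without it, ^ matches only at the start of the whole string.
first_only = re.findall(r"^\w+", text)                 # ['first']
per_line = re.findall(r"^\w+", text, re.MULTILINE)     # ['first', 'second']

# s: without it, . stops at the newline.
dot_default = re.search(r"first.*second", text)            # None
dot_all = re.search(r"first.*second", text, re.DOTALL)     # a match object

# Flags combine with the | operator.
combined = re.findall(r"^second", "First\nSECOND", re.IGNORECASE | re.MULTILINE)

# x: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)
parts = date.fullmatch("2024-03-15").groups()          # ('2024', '03', '15')
```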

How to Test Regex in Your Browser

The Regex Tester at allio.tools/tools/developer/regex-tester/ lets you write and test regular expressions instantly in your browser — no setup, no library imports. Enter your pattern, paste sample text, and see all matches highlighted in real time. You can toggle flags (i, g, m) and see capturing group results. Regex is notoriously difficult to write correctly on the first try — always test with real-world sample data that includes edge cases (empty strings, special characters, international characters, very long inputs). Testing in the browser before embedding regex in code saves significant debugging time.
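
Once a pattern looks right in the tester, the same edge cases can be kept alongside the code as a small table of inputs and expected results. The sketch below assumes Python's standard re module; the helper failing_cases and the case list are hypothetical names and data invented for this example, using the YYYY-MM-DD pattern from earlier in this article.

```python
import re

def failing_cases(pattern, cases):
    """Return (text, expected, actual) for every case the pattern gets wrong."""
    compiled = re.compile(pattern)
    failures = []
    for text, expected in cases:
        actual = compiled.fullmatch(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

DATE = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"

cases = [
    ("2024-03-15", True),    # ordinary input
    ("", False),             # empty string
    ("2024-3-15", False),    # missing leading zero
    ("2024-00-10", False),   # month out of range
    ("2024-03-15 ", False),  # trailing space
    ("x" * 10_000, False),   # very long input
    ("2024-02-31", False),   # not a real date, yet it has the right shape
]

failures = failing_cases(DATE, cases)
print(failures)  # [('2024-02-31', False, True)]
```

The one failure is the kind of gap that testing with realistic data tends to expose: the pattern checks the shape of a date, not whether the day exists in that month.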