What is a regular expression (regex)?

A regular expression (regex) is a pattern that describes a set of strings. It is used to search, validate, and transform text. For example, the pattern \d{3}-\d{4} matches phone number formats like 555-1234. Regex is supported in virtually every programming language and text editor. It looks cryptic at first but becomes an essential skill for text processing tasks.
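As a minimal sketch of the three uses named above (search, validate, transform), the following uses Python's built-in re module with the \d{3}-\d{4} pattern. The sample text and the variable names are assumptions made for illustration.

```python
import re

# Pattern from the text: three digits, a hyphen, four digits.
PHONE = re.compile(r"\d{3}-\d{4}")

text = "Call 555-1234 or 555-9876 today."

# Search: collect every substring that fits the pattern.
found = PHONE.findall(text)

# Validate: check that an entire string fits the pattern.
is_valid = PHONE.fullmatch("555-1234") is not None
too_long = PHONE.fullmatch("555-12345") is not None

# Transform: replace every match with a placeholder.
masked = PHONE.sub("XXX-XXXX", text)

print(found)     # ['555-1234', '555-9876']
print(is_valid)  # True
print(too_long)  # False
print(masked)    # Call XXX-XXXX or XXX-XXXX today.
```

Note that validation uses fullmatch rather than search, so that extra characters before or after the number cause a rejection.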

What are the most important regex characters to learn first?

Start with these: . (any character), * (zero or more), + (one or more), ? (zero or one), ^ (start of string), $ (end of string), [] (character class — e.g. [a-z]), \d (any digit), \w (any word character), \s (whitespace), and () for grouping. With just these, you can write useful patterns for email validation, phone number extraction, and URL matching.

How do I test a regex pattern?

Use a regex tester that shows real-time matches as you type. Enter your pattern, your test string, and the tester highlights all matches. Good testers also show captured groups separately and explain what each part of the pattern matches. Test against edge cases — what happens with an empty string, very long input, or characters you did not expect?
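The same edge-case checks can also be scripted. The sketch below is one possible approach in Python, not a standard tool: the pattern, the case list, and the run_cases helper are all hypothetical names chosen for this example.

```python
import re

# Hypothetical pattern under test: three digits, a hyphen, four digits.
PATTERN = re.compile(r"\d{3}-\d{4}")

# Each entry pairs an input with whether the pattern should accept it.
CASES = [
    ("555-1234", True),     # the normal case
    ("", False),            # empty string
    ("555-12345", False),   # one digit too many
    ("555 1234", False),    # unexpected separator
    ("abc-defg", False),    # letters instead of digits
    ("5" * 10_000, False),  # very long input
]


def run_cases(pattern, cases):
    """Return the inputs whose actual result differs from the expected one."""
    failures = []
    for text, expected in cases:
        actual = pattern.fullmatch(text) is not None
        if actual != expected:
            failures.append(text)
    return failures


failures = run_cases(PATTERN, CASES)
print(failures)  # []
```

An empty list means every case behaved as expected. Adding a new edge case is a one-line change to CASES.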

What is the difference between greedy and lazy matching in regex?

By default, regex quantifiers such as * and + are greedy: they match as much text as possible. For example, the pattern <b>.+</b> applied to "<b>start</b> bold <b>end</b>" matches the entire string, from the first <b> to the last </b>. Adding ? makes the quantifier lazy (<b>.+?</b>), so it matches as little as possible: just "<b>start</b>". Lazy matching is essential when extracting content between HTML tags or delimiters.
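A short Python sketch makes the difference visible. The sample string with <b> tags is an assumption chosen for illustration.

```python
import re

# Assumed sample input: two bold elements with plain text between them.
html = "<b>start</b> bold <b>end</b>"

# Greedy: .+ runs to the end of the string, then backs up to the LAST </b>.
greedy = re.search(r"<b>.+</b>", html).group()

# Lazy: .+? stops at the FIRST </b> it can reach.
lazy = re.search(r"<b>.+?</b>", html).group()

# Lazy matching with a capturing group extracts each tag's content.
contents = re.findall(r"<b>(.+?)</b>", html)

print(greedy)    # <b>start</b> bold <b>end</b>
print(lazy)      # <b>start</b>
print(contents)  # ['start', 'end']
```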

Regular Expressions (Regex) Explained for Beginners — With Examples

What Is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. You use regex to find, match, extract, or replace text that follows a specific structure — for example, all email addresses in a document, all lines starting with a number, or all words longer than 10 characters. Regex is not a programming language — it is a mini-language for describing text patterns, supported natively in JavaScript, Python, PHP, Ruby, Go, Java, bash, and nearly every other programming language. It is also available in text editors (VS Code, Sublime Text, Notepad++) and tools like grep.

The Building Blocks: Literals and Metacharacters

A regex pattern consists of two types of characters. Literal characters match exactly: the pattern 'cat' matches the string 'cat'. Metacharacters have special meanings. The most important ones:
. (dot): matches any single character except a newline.
*: matches the preceding element zero or more times.
+: matches the preceding element one or more times.
?: matches the preceding element zero or one time (makes it optional).
^: asserts the start of the string (or the start of each line when the multiline flag is set).
$: asserts the end of the string (or the end of each line when the multiline flag is set).
[]: a character class; matches any one character inside the brackets.
[^]: a negated character class; matches any one character NOT inside the brackets.
|: alternation (OR); matches either the left or the right side.
(): a capturing group; groups part of the pattern and captures the text it matches.
\: the escape character; treats the next metacharacter as a literal (for example, \. matches a literal dot).
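The metacharacters above can be tried out one at a time. The following is a minimal sketch in Python; the sample strings and variable names are assumptions made for illustration.

```python
import re

# Literal characters match themselves, anywhere in the string.
literal = re.search(r"cat", "concatenate") is not None

# . matches any single character except a newline ("c\nt" is skipped).
dot = re.findall(r"c.t", "cat cot c\nt cut")

# ^ and $ anchor the match to the start and end of the string.
at_start = re.search(r"^cat", "cat nap") is not None
not_at_start = re.search(r"^cat", "a cat") is not None
at_end = re.search(r"nap$", "cat nap") is not None

# [^...] matches any one character that is NOT listed.
consonants = re.findall(r"[^aeiou ]", "cat nap")

# | chooses between alternatives; () captures the part that matched.
pet = re.search(r"(cat|dog)s", "two dogs").group(1)

# \ turns a metacharacter into a literal.
literal_dots = re.findall(r"\.", "a.b.c")

print(literal, dot)                    # True ['cat', 'cot', 'cut']
print(at_start, not_at_start, at_end)  # True False True
print(consonants)                      # ['c', 't', 'n', 'p']
print(pet, literal_dots)               # dog ['.', '.']
```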

Character Classes and Shorthand

Character classes let you match sets of characters.
[aeiou] matches any single vowel.
[a-z] matches any lowercase letter.
[A-Z] matches any uppercase letter.
[0-9] matches any digit.
[a-zA-Z0-9] matches any alphanumeric character.
Regex also provides shorthand character classes:
\d: matches any digit (equivalent to [0-9] for ASCII text; some engines also match non-ASCII digits when working in Unicode mode).
\D: matches any non-digit.
\w: matches any word character (letters, digits, underscore; equivalent to [a-zA-Z0-9_] for ASCII text).
\W: matches any non-word character.
\s: matches any whitespace character (space, tab, newline, carriage return, and similar).
\S: matches any non-whitespace character.
\b: a word boundary (the position between a word character and a non-word character). Example: \bcat\b matches 'cat' as a whole word but not the 'cat' inside 'concatenate'.
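The classes and shorthands above can be checked with a few one-liners. This is a sketch in Python with assumed sample strings. The last two lines illustrate a Python-specific detail: for text patterns, \d matches Unicode digits unless the re.ASCII flag is given.

```python
import re

# Explicit character class: any single vowel.
vowels = re.findall(r"[aeiou]", "regex")

# \d+ : runs of digits.
numbers = re.findall(r"\d+", "Order 66 shipped in 2024")

# \w+ : runs of word characters; the hyphen is not a word character.
words = re.findall(r"\w+", "snake_case, kebab-case")

# \s+ : split on any run of whitespace (spaces, tab, newline).
fields = re.split(r"\s+", "a  b\tc\nd")

# \b : match 'cat' only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")

# In Python, \d also matches non-ASCII digits unless re.ASCII is set.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None

print(vowels)       # ['e', 'e']
print(numbers)      # ['66', '2024']
print(words)        # ['snake_case', 'kebab', 'case']
print(fields)       # ['a', 'b', 'c', 'd']
print(whole_words)  # ['cat', 'cat']
print(unicode_digit, ascii_digit)  # True False
```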

Quantifiers: Controlling How Many Times

Quantifiers specify how many times a pattern element must appear.
*: zero or more times (e.g., 'go*gle' matches 'ggle', 'gogle', 'google', 'gooooogle').
+: one or more times (e.g., 'go+gle' matches 'gogle', 'google', but NOT 'ggle').
?: zero or one time (makes it optional; e.g., 'colou?r' matches both 'color' and 'colour').
{n}: exactly n times (e.g., '\d{4}' matches exactly 4 digits).
{n,}: n or more times (e.g., '\d{3,}' matches 3 or more digits).
{n,m}: between n and m times (e.g., '\d{3,5}' matches 3, 4, or 5 digits).
By default, quantifiers are greedy: they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible): .* is greedy; .*? is lazy.
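The quantifier examples above can be verified directly. The sketch below uses Python; the word lists and the quoted-text sample are assumptions for illustration. It also shows that {n,m} on its own will happily match part of a longer run of digits, which is why word boundaries are often added.

```python
import re

candidates = ["ggle", "gogle", "google", "gooooogle"]

# * allows zero o's; + requires at least one.
star = [w for w in candidates if re.fullmatch(r"go*gle", w)]
plus = [w for w in candidates if re.fullmatch(r"go+gle", w)]

# ? makes the preceding element optional.
spellings = [w for w in ["color", "colour", "colouur"]
             if re.fullmatch(r"colou?r", w)]

# {n,m} without anchors also matches inside longer digit runs...
loose = re.findall(r"\d{3,5}", "12 123 12345 1234567")
# ...so add \b to require whole numbers of 3 to 5 digits.
strict = re.findall(r"\b\d{3,5}\b", "12 123 12345 1234567")

# Greedy .* spans both quoted phrases; lazy .*? stops at each closing quote.
quoted = 'say "hi" and "bye"'
greedy = re.findall(r'"(.*)"', quoted)
lazy = re.findall(r'"(.*?)"', quoted)

print(star)       # ['ggle', 'gogle', 'google', 'gooooogle']
print(plus)       # ['gogle', 'google', 'gooooogle']
print(spellings)  # ['color', 'colour']
print(loose)      # ['123', '12345', '12345']
print(strict)     # ['123', '12345']
print(greedy)     # ['hi" and "bye']
print(lazy)       # ['hi', 'bye']
```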

Practical Regex Examples

These patterns cover the most common real-world uses.
Email address (simplified): [a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}
UK postcode: [A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}
US phone number: (\+1)?[\s\-.]?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}
URL: https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
IPv4 address: (\d{1,3}\.){3}\d{1,3}
Date (YYYY-MM-DD): \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
Hexadecimal colour: #[0-9a-fA-F]{3,6}
Empty lines: ^\s*$
Lines starting with a number: ^\d
Note: no single regex perfectly validates all valid emails or URLs; these are practical approximations suitable for most use cases. The IPv4 pattern, for example, also accepts out-of-range values such as 999.999.999.999, and the date pattern accepts impossible dates such as 2024-02-30.
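To show how a few of these patterns behave in practice, here is a sketch in Python. Two things in it are assumptions rather than part of the list above: the US phone pattern is written with \(? and \)? for the optional parentheses around the area code, and is_ipv4 is a hypothetical helper that adds the numeric range check a regex alone does not perform.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")
DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")
IPV4 = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
# Assumption: optional parentheses around the area code are \(? and \)?.
US_PHONE = re.compile(r"(\+1)?[\s\-.]?\(?\d{3}\)?[\s\-.]?\d{3}[\s\-.]?\d{4}")


def is_ipv4(text):
    """Hypothetical helper: pattern check plus a 0-255 range check."""
    if not IPV4.fullmatch(text):
        return False
    return all(int(part) <= 255 for part in text.split("."))


text = "Contact jane@example.com or bob.smith@mail.example.org by 2024-06-30."

# Extraction: pull matches out of free text.
emails = EMAIL.findall(text)
deadline = DATE.search(text).group()

# Validation: fullmatch requires the whole string to fit.
phone_ok = [p for p in ["(555) 123-4567", "+1 555-123-4567", "555-1234"]
            if US_PHONE.fullmatch(p)]

# The patterns are approximations: these impossible values still match.
bad_date_matches = DATE.fullmatch("2024-02-30") is not None
bad_ip_matches = IPV4.fullmatch("999.999.999.999") is not None

print(emails)    # ['jane@example.com', 'bob.smith@mail.example.org']
print(deadline)  # 2024-06-30
print(phone_ok)  # ['(555) 123-4567', '+1 555-123-4567']
print(bad_date_matches, bad_ip_matches)                  # True True
print(is_ipv4("192.168.0.1"), is_ipv4("999.999.999.999"))  # True False
```

A common design choice is to let the regex check the shape of the input and leave range or calendar checks to ordinary code, as is_ipv4 does here.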

Flags / Modifiers

Most regex engines support flags that modify how the pattern is applied. The most common:
i (case-insensitive): /hello/i matches 'hello', 'Hello', 'HELLO'.
g (global): find all matches, not just the first.
m (multiline): ^ and $ match the start and end of each line, not just the whole string.
s (dotAll / DOTALL): makes . match newline characters too.
x (verbose/extended): allows whitespace and comments in the pattern for readability (Python, Ruby; not JavaScript).
In JavaScript, flags appear after the closing slash: /pattern/gi. In Python, they are passed as arguments: re.findall(pattern, text, re.IGNORECASE).
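The effect of each flag is easiest to see side by side. The following is a sketch using Python's re module with assumed sample strings. Python has no g flag: the choice between the first match and all matches is made by calling re.search or re.findall.

```python
import re

text = "first line\nsecond line"

# i: case-insensitive matching.
hellos = re.findall(r"hello", "Hello HELLO hello", re.IGNORECASE)

# "Global" behaviour in Python: search returns one match, findall returns all.
first_word = re.search(r"\w+", text).group()
all_words = re.findall(r"\w+", text)

# m: ^ matches at the start of every line, not just the start of the string.
line_starts_default = re.findall(r"^\w+", text)
line_starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# s: . also matches the newline, so the pattern can span both lines.
spans_default = re.search(r"first.*second", text)
spans_dotall = re.search(r"first.*second", text, re.DOTALL).group()

# x: whitespace and comments inside the pattern are ignored.
PHONE = re.compile(
    r"""
    \d{3}   # three-digit prefix
    -       # literal hyphen
    \d{4}   # four-digit line number
    """,
    re.VERBOSE,
)
verbose_ok = PHONE.fullmatch("555-1234") is not None

# Flags are combined with the | operator.
combined = re.findall(r"^hello", "Hello\nhello\nHELLO",
                      re.IGNORECASE | re.MULTILINE)

print(hellos)                 # ['Hello', 'HELLO', 'hello']
print(first_word, all_words)  # first ['first', 'line', 'second', 'line']
print(line_starts_default, line_starts_multiline)  # ['first'] ['first', 'second']
print(spans_default, repr(spans_dotall))  # None 'first line\nsecond'
print(verbose_ok, combined)   # True ['Hello', 'hello', 'HELLO']
```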

How to Test Regex in Your Browser

The Regex Tester at allio.tools/tools/developer/regex-tester/ lets you write and test regular expressions instantly in your browser — no setup, no library imports. Enter your pattern, paste sample text, and see all matches highlighted in real time. You can toggle flags (i, g, m) and see capturing group results. Regex is notoriously difficult to write correctly on the first try — always test with real-world sample data that includes edge cases (empty strings, special characters, international characters, very long inputs). Testing in the browser before embedding regex in code saves significant debugging time.