### **Basic Characters**
- **.** (Dot): Matches any single character except a newline.
- Example: `a.b` matches "acb", "a+b", but not "ab" or "a\nb".
- **`\`** (Backslash): Escapes the next character, letting you match characters that have special meaning in regex (e.g., `\.` matches a literal dot).
- Example: `\.` matches ".", and `\\` matches a single literal backslash (`\`).
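To make the two basics concrete, here is a small check using Python's standard `re` module. Python is used only for illustration; the dot and backslash behave the same way in most engines. Raw strings (`r"..."`) keep Python from consuming backslashes before the regex engine sees them.

```python
import re

# '.' matches exactly one character other than a newline
dot = re.compile(r"a.b")
for s in ["acb", "a+b", "ab", "a\nb"]:
    print(repr(s), dot.fullmatch(s) is not None)
# 'acb' True
# 'a+b' True
# 'ab' False
# 'a\nb' False

# A backslash escapes a metacharacter so it is matched literally
print(re.findall(r"\.", "v1.2.3"))    # ['.', '.']
print(re.findall(r"\\", "C:\\temp"))  # ['\\'] (one literal backslash)
```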
---
### **Character Classes**
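- `\d`: Matches any digit. Equivalent to `[0-9]` in ASCII mode (some engines, such as Python 3's `re` with string patterns, also match digits from other scripts).
- `\D`: Matches any non-digit (the complement of `\d`; `[^0-9]` in ASCII mode).
- `\w`: Matches any word character (letters, digits, and underscore). Equivalent to `[a-zA-Z0-9_]` in ASCII mode; Unicode-aware engines also match non-ASCII letters and digits.
- `\W`: Matches any non-word character (the complement of `\w`; `[^a-zA-Z0-9_]` in ASCII mode).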
- `\s`: Matches any whitespace character (space, tab, newline, carriage return, form feed, and vertical tab).
- `\S`: Matches any non-whitespace character.
- `[abc]`: Matches any single character within the brackets (a, b, or c).
- `[^abc]`: Matches any single character **not** within the brackets.
- `[a-z]`: Matches any lowercase letter from 'a' to 'z'.
- `[A-Z]`: Matches any uppercase letter from 'A' to 'Z'.
- `[0-9]`: Matches any digit from '0' to '9'.
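A minimal sketch of these classes with Python's `re` module, assuming Python 3 string patterns. The last two lines show the ASCII-versus-Unicode difference for `\w`: `re.ASCII` restricts it to `[a-zA-Z0-9_]`.

```python
import re

text = "Order #42 ships_to: Bob\tin 3 days"

# \d: digits, \w+: runs of word characters, \s: whitespace
digits = re.findall(r"\d", text)   # ['4', '2', '3']
words = re.findall(r"\w+", text)   # ['Order', '42', 'ships_to', 'Bob', 'in', '3', 'days']
spaces = re.findall(r"\s", text)   # five spaces and one tab

# Bracket classes: a set, a negated set, and a range
vowels = re.findall(r"[aeiou]", "regex")    # ['e', 'e']
not_lower = re.findall(r"[^a-z]", "abC1_")  # ['C', '1', '_']

# Unicode-aware by default; re.ASCII limits \w to [a-zA-Z0-9_]
word = "caf\u00e9"  # "café"
print(re.findall(r"\w", word))            # ['c', 'a', 'f', 'é']
print(re.findall(r"\w", word, re.ASCII))  # ['c', 'a', 'f']
```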
---
### **Quantifiers**
Quantifiers specify how many times a character, group, or character class must be present in the input for a match to be found.
- **`*`**: Matches the previous element **zero or more** times.
- Example: `ab*c` matches "ac", "abc", "abbc", "abbbc", etc.
- **`+`**: Matches the previous element **one or more** times.
- Example: `ab+c` matches "abc", "abbc", but not "ac".
- **`?`**: Matches the previous element **zero or one** time (makes it optional).
- Example: `colou?r` matches "color" and "colour".
- **`{n}`**: Matches the previous element **exactly _n_** times.
- Example: `a{3}` matches "aaa".
- **`{n,}`**: Matches the previous element **_n_ or more** times.
- Example: `a{2,}` matches "aa", "aaa", "aaaa", etc.
- **`{n,m}`**: Matches the previous element **at least _n_ times** but no more than _m_ times.
- Example: `a{2,4}` fully matches "aa", "aaa", and "aaaa", but not "a" or "aaaaa" (a search inside "aaaaa" still finds "aaaa").
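The quantifier examples above can be checked with Python's `re.fullmatch`, which requires the whole string to match. The helper `full` is a hypothetical name introduced just for this sketch. The final line shows why "not aaaaa" only holds for whole-string matching.

```python
import re

def full(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(full(r"ab*c", ["ac", "abc", "abbc"]))                  # ['ac', 'abc', 'abbc']
print(full(r"ab+c", ["ac", "abc", "abbc"]))                  # ['abc', 'abbc']
print(full(r"colou?r", ["color", "colour", "colouur"]))      # ['color', 'colour']
print(full(r"a{3}", ["aa", "aaa", "aaaa"]))                  # ['aaa']
print(full(r"a{2,}", ["a", "aa", "aaaaa"]))                  # ['aa', 'aaaaa']
print(full(r"a{2,4}", ["a", "aa", "aaa", "aaaa", "aaaaa"]))  # ['aa', 'aaa', 'aaaa']

# re.search only needs a substring, so it still finds a match inside "aaaaa"
print(re.search(r"a{2,4}", "aaaaa").group())  # aaaa
```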
**Greedy vs. Lazy Quantifiers:**
By default, quantifiers are **greedy**, meaning they match as much text as possible. To make a quantifier **lazy** (match as little text as possible), add a `?` after it.
- `*?`: Matches zero or more times (lazy).
- `+?`: Matches one or more times (lazy).
- `??`: Matches zero or one time (lazy).
- `{n,}?`: Matches _n_ or more times (lazy).
- `{n,m}?`: Matches between _n_ and _m_ times (lazy).
- Example: `<.+>` (greedy) on `<a><b>` matches `<a><b>`.
- Example: `<.+?>` (lazy) on `<a><b>` matches `<a>` (and then `<b>` if searched again).
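The greedy/lazy contrast is easy to reproduce in Python's `re` (used here as an illustrative engine). `findall` plays the role of "searched again", returning every non-overlapping lazy match.

```python
import re

html = "<a><b>"

print(re.search(r"<.+>", html).group())   # <a><b>  (greedy: runs to the last '>')
print(re.search(r"<.+?>", html).group())  # <a>     (lazy: stops at the first '>')
print(re.findall(r"<.+?>", html))         # ['<a>', '<b>']

# Other lazy forms also take the minimum the rest of the pattern allows
print(re.search(r"a{2,}?", "aaaa").group())  # aa
print(re.search(r"ab??", "ab").group())      # a
```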
---
### **Anchors and Boundaries**
Anchors and boundaries are zero-width assertions: they match a position in the string rather than consuming any characters.
- **`^`**: Matches the beginning of the string (or the beginning of a line if the multiline flag is enabled).
- Example: `^abc` matches "abc" only if it's at the start of the string.
- **`$`**: Matches the end of the string (or the end of a line if the multiline flag is enabled).
- Example: `xyz$` matches "xyz" only if it's at the end of the string.
- **`\b`**: Matches a word boundary (the position between a word character and a non-word character, or at the start/end of a string if the first/last character is a word character).
- Example: `\bcat\b` matches "cat" in "the cat sat" but not in "caterpillar".
- **`\B`**: Matches a non-word boundary.
- Example: `\Bcat\B` matches "cat" in "concatenate" but not in "the cat sat" or "caterpillar" (in "caterpillar", the start of the string before "c" is a word boundary).
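A quick sketch of the anchor and boundary examples in Python's `re`, chosen for illustration. Note that `\Bcat\B` needs word characters on both sides of "cat", which is why it finds "concatenate" but not "caterpillar".

```python
import re

print(bool(re.search(r"^abc", "abcdef")))  # True
print(bool(re.search(r"^abc", "xabc")))    # False
print(bool(re.search(r"xyz$", "abcxyz")))  # True
print(bool(re.search(r"xyz$", "xyzabc")))  # False

print(re.findall(r"\bcat\b", "the cat sat"))  # ['cat']
print(re.findall(r"\bcat\b", "caterpillar"))  # []
print(re.findall(r"\Bcat\B", "concatenate"))  # ['cat']
print(re.findall(r"\Bcat\B", "caterpillar"))  # []
print(re.findall(r"\Bcat\B", "the cat sat"))  # []
```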
---
### **Grouping and Capturing**
- **`( )`**: Groups multiple tokens together and creates a capturing group. The matched content can be referred to later, either in the pattern itself (via backreferences) or in the match result.
- Example: `(abc)+` matches "abc", "abcabc", etc. The captured group would be "abc"; a repeated group keeps only the text from its last iteration.
- **`\1`, `\2`, etc.**: Backreferences. Match the text captured by the Nth capturing group.
- Example: `(a)b\1` matches "aba".
- **`(?: )`**: Non-capturing group. Groups tokens but does not create a capturing group. This is useful for applying quantifiers to a group of characters without needing to capture the result.
- Example: `(?:abc)+` matches "abc", "abcabc", but "abc" is not captured.
- **`|`**: Alternation (OR operator). Matches either the expression before or the expression after the pipe.
- Example: `cat|dog` matches "cat" or "dog".
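The grouping constructs above, sketched with Python's `re` for illustration. The doubled-word pattern `\b(\w+) \1\b` is an extra example added here, not from the original list; it shows a typical use of backreferences.

```python
import re

# Capturing group: group(0) is the whole match, group(1) the last iteration
m = re.fullmatch(r"(abc)+", "abcabc")
print(m.group(0), m.group(1))  # abcabc abc

# Backreference: \1 must repeat exactly what group 1 captured
print(bool(re.fullmatch(r"(a)b\1", "aba")))  # True
print(bool(re.fullmatch(r"(a)b\1", "abb")))  # False
print(re.findall(r"\b(\w+) \1\b", "this is is a test test"))  # ['is', 'test']

# Non-capturing group: quantifiable, but nothing is stored
n = re.fullmatch(r"(?:abc)+", "abcabc")
print(n.groups())  # ()

# Alternation
print(re.findall(r"cat|dog", "hotdog and catfish"))  # ['dog', 'cat']
```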
---
### **Lookarounds**
Lookarounds are zero-width assertions; they check for a pattern but don't include it in the match.
- **`(?=...)`**: Positive lookahead. Asserts that the characters following the current position match the pattern inside the lookahead, but doesn't consume those characters.
- Example: `Windows(?=95|98|NT|2000)` matches "Windows" only if it's followed by "95", "98", "NT", or "2000".
- **`(?!...)`**: Negative lookahead. Asserts that the characters following the current position do **not** match the pattern inside the lookahead.
- Example: `Windows(?!XP|Vista)` matches "Windows" only if it's **not** followed by "XP" or "Vista".
- **`(?<=...)`**: Positive lookbehind. Asserts that the characters preceding the current position match the pattern inside the lookbehind. (Note: Many regex engines require lookbehind patterns to have a fixed length).
- Example: `(?<=USD)\d+` matches numbers that are preceded by "USD".
- **`(?<!...)`**: Negative lookbehind. Asserts that the characters preceding the current position do **not** match the pattern inside the lookbehind. (Note: Fixed length often required).
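- Example: `(?<!EUR)\d+` matches numbers that are **not** preceded by "EUR". Beware that on "EUR200" it still matches "00", because the position before "00" is preceded by "UR2"; adding `(?<!\d)`, as in `(?<!EUR)(?<!\d)\d+`, prevents such partial matches.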
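The lookaround examples can be reproduced in Python's `re`, used here as one concrete engine. The sample strings are made up for illustration. The last lines show Python's fixed-width restriction on lookbehind (the error message text is not reproduced since it may vary by version).

```python
import re

systems = "Windows95 WindowsXP Windows7"
# Positive lookahead: the version must follow, but is not part of the match
print([m.start() for m in re.finditer(r"Windows(?=95|98|NT|2000)", systems)])  # [0]
# Negative lookahead: any "Windows" not followed by XP or Vista
print([m.start() for m in re.finditer(r"Windows(?!XP|Vista)", systems)])  # [0, 20]
print(re.search(r"Windows(?=95)", systems).group())  # Windows

prices = "USD100 EUR200 300"
print(re.findall(r"(?<=USD)\d+", prices))         # ['100']
print(re.findall(r"(?<!EUR)\d+", prices))         # ['100', '00', '300']
print(re.findall(r"(?<!EUR)(?<!\d)\d+", prices))  # ['100', '300']

# Python's re only accepts fixed-width lookbehind patterns
try:
    re.compile(r"(?<=a+)b")
except re.error as exc:
    print("rejected:", exc)
```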
---
### **Flags (Modifiers)**
Flags change how the regex engine interprets the pattern. The way to specify flags varies between programming languages and tools (e.g., `/pattern/flags` in JavaScript or Perl, `re.compile(pattern, flags)` in Python).
- **`i`**: Case-insensitive matching.
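- **`g`**: Global search (find all matches rather than stopping after the first match). Not every language exposes this as a flag; Python, for example, uses `re.findall` or `re.finditer` instead.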
- **`m`**: Multiline mode. `^` and `$` match at the start and end of each line, not just the start and end of the entire string.
- **`s`**: Dotall mode (also called single-line mode). The dot (`.`) matches any character, _including_ newlines. Naming varies by engine: Python calls it `re.DOTALL`, while Ruby uses the `m` flag for this behavior.
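In Python, the letter flags map to module constants (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`) or to inline groups such as `(?im)`. The sketch below assumes Python's `re` purely as an example; other languages spell these differently.

```python
import re

text = "First line\nsecond LINE\nthird"

# i -> re.IGNORECASE
print(re.findall(r"line", text, re.IGNORECASE))  # ['line', 'LINE']

# No 'g' flag: findall returns every match, search returns only the first
print(re.search(r"\w+", text).group())  # First

# m -> re.MULTILINE: ^ anchors at the start of every line
print(re.findall(r"^\w+", text))                # ['First']
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['First', 'second', 'third']

# s -> re.DOTALL: '.' also matches '\n'
print(re.search(r"First.*third", text))                     # None
print(re.search(r"First.*third", text, re.DOTALL) is not None)  # True

# Flags can also be written inline at the start of the pattern
print(re.findall(r"(?im)^second line$", text))  # ['second LINE']
```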