A pattern specifies which characters may match at a particular position by means of a "character range". A character range is written as a "bracket expression" that lists the allowed characters. For example, the following bracket expression matches a single character that is one of "a", "b", "c", "d", "e", or "f":
[abcdef]
As a simple demonstration, a pattern that matches either grey or gray would be:
gr[ae]y
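Python's `re` module happens to use the same bracket-expression syntax for these simple cases, so it is a convenient place to try them out. The sketch below uses Python's engine only as a stand-in. It is not the pattern engine this document describes.

```python
import re

# A bracket expression matches exactly one character from the listed set.
letters = re.compile(r"[abcdef]")
print([c for c in "abcxyz" if letters.fullmatch(c)])  # ['a', 'b', 'c']

# gr[ae]y accepts both spellings, but not other vowels or extra letters.
grey_or_gray = re.compile(r"gr[ae]y")
for word in ["grey", "gray", "groy", "greay"]:
    print(word, bool(grey_or_gray.fullmatch(word)))
```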
Contiguous ranges
When a range includes a contiguous sequence of characters, the "-" operator can be used to express every character between its two endpoints. For example, the preceding character range can be written more compactly as:
[a-f]
To include a literal "-" in a character range, place it at either the beginning or the end of the range. For example, the following range matches a lower-case letter or a hyphen:
[a-z-]
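The equivalence of the two forms, and the literal trailing hyphen, can be checked the same way. This again uses Python's `re` as a stand-in, on the assumption that it treats these bracket expressions the same as the pattern language described here.

```python
import re

explicit = re.compile(r"[abcdef]")
contiguous = re.compile(r"[a-f]")

sample = "abcdefghij"
# Both forms accept exactly the same single characters: a through f.
print([c for c in sample if contiguous.fullmatch(c)])  # ['a', 'b', 'c', 'd', 'e', 'f']
print(all(bool(explicit.fullmatch(c)) == bool(contiguous.fullmatch(c)) for c in sample))  # True

# Placing "-" last makes it a literal hyphen rather than a range operator.
slug_char = re.compile(r"[a-z-]")
print(all(slug_char.fullmatch(c) for c in "well-formed"))  # True
print(bool(slug_char.fullmatch("_")))  # False
```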
Inverted Ranges
It is often convenient to specify which characters do not match by using an "inverted range". A range is inverted when its first character is "^". For example, to match any single character other than "a", "b", or "c", one could use the character range:
[^a-c]
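In Python's `re`, which again serves only as a stand-in, an inverted range still matches exactly one character. To check a whole word, the sketch repeats the range with Python's `+` quantifier; repetition may be spelled differently in this pattern language, and the helper `avoids_a_to_c` is hypothetical.

```python
import re

not_a_to_c = re.compile(r"[^a-c]")

# An inverted range still matches exactly one character: anything but a, b or c.
print(not_a_to_c.findall("abcdef"))  # ['d', 'e', 'f']


def avoids_a_to_c(word: str) -> bool:
    """Hypothetical helper: True if word is non-empty and contains no a, b or c."""
    # "+" is Python's one-or-more quantifier, used here to repeat the inverted range.
    return re.fullmatch(r"[^a-c]+", word) is not None


print(avoids_a_to_c("dog"), avoids_a_to_c("cat"))  # True False
```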
Character escapes
Character ranges can also include some non-printable characters, written as the following escape sequences:
| Shortcut | Character |
|---|---|
| \t | Tab |
| \n | New line |
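Python's `re` also recognizes `\t` and `\n` inside a bracket expression, so the escapes can be tried out there, for example to split tab- and newline-separated text. As before, Python is only a stand-in engine for illustration.

```python
import re

# Inside a bracket expression, \t and \n stand for the tab and newline characters.
# Python's re interprets these escapes itself, so a raw string works.
separator = re.compile(r"[\t\n]")

record = "name\tage\nalice\t30"
print(separator.split(record))  # ['name', 'age', 'alice', '30']
```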
Pre-defined Ranges
Character ranges are a very convenient way to represent groups of characters. Some ranges are so common that there is an even shorter notation for them.
| Shortcut | Range | Inverted shortcut | Inverted range |
|---|---|---|---|
| \l | [a-z] | \L | [^a-z] |
| \u | [A-Z] | \U | [^A-Z] |
| \a | [a-zA-Z] | \A | [^a-zA-Z] |
| \h | [a-zA-Z_] | \H | [^a-zA-Z_] |
| \w | [a-zA-Z0-9_] | \W | [^a-zA-Z0-9_] |
| \d | [0-9] | \D | [^0-9] |
| \x | [0-9a-fA-F] | \X | [^0-9a-fA-F] |
| \o | [0-7] | \O | [^0-7] |
| \s | [ \t] | \S | [^ \t] |
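Each shortcut is pure notation for the bracket expression in the table. Python's `re` either rejects most of these escapes or gives them different meanings (for example, `\A` is an anchor and `\s` also matches newlines), so a minimal sketch, assuming Python's `re` as the target engine, rewrites each shortcut into its bracket expression before compiling. The names `SHORTCUTS` and `expand_shortcuts` are hypothetical, and the sketch assumes shortcuts appear outside bracket expressions.

```python
import re

# Hypothetical translation table, copied from the pre-defined ranges above.
SHORTCUTS = {
    "l": "[a-z]",         "L": "[^a-z]",
    "u": "[A-Z]",         "U": "[^A-Z]",
    "a": "[a-zA-Z]",      "A": "[^a-zA-Z]",
    "h": "[a-zA-Z_]",     "H": "[^a-zA-Z_]",
    "w": "[a-zA-Z0-9_]",  "W": "[^a-zA-Z0-9_]",
    "d": "[0-9]",         "D": "[^0-9]",
    "x": "[0-9a-fA-F]",   "X": "[^0-9a-fA-F]",
    "o": "[0-7]",         "O": "[^0-7]",
    "s": r"[ \t]",        "S": r"[^ \t]",
}


def expand_shortcuts(pattern: str) -> str:
    """Replace each shortcut (e.g. \\d) with its bracket expression.

    Assumes shortcuts appear outside bracket expressions. Any other escape,
    such as \\t or an escaped backslash, is copied through unchanged.
    """
    def replace(match: re.Match) -> str:
        return SHORTCUTS.get(match.group(1), match.group(0))

    # Each match consumes a backslash plus the next character, so an escaped
    # backslash is never mistaken for the start of a shortcut.
    return re.sub(r"\\(.)", replace, pattern, flags=re.DOTALL)


# Usage: an identifier is a head character followed by word characters.
# "*" is Python's repetition operator, not part of the table above.
identifier = expand_shortcuts(r"\h\w*")
print(identifier)                                 # [a-zA-Z_][a-zA-Z0-9_]*
print(bool(re.fullmatch(identifier, "_count2")))  # True
print(bool(re.fullmatch(identifier, "2count")))   # False
```

Because the expansion happens before compilation, the engine only ever sees ordinary bracket expressions. The trade-off is that a shortcut inside a bracket expression, such as `[\d_]`, would need a real parser rather than this text substitution.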
Meta-characters
The following meta-characters each match a single character:
| Shortcut | Match |
|---|---|
| . | Any character except \n |
| _ | Any character (including \n) |
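Python's `.` has the same default meaning as the `.` in the table. Python has no single-character spelling for "any character including newline", so the sketch uses the scoped flag `(?s:.)` as an assumed equivalent of `_`; this is an analogy in a stand-in engine, not this pattern language's implementation.

```python
import re

text = "a\nb"

# "." matches any character except a newline, so it cannot bridge "a" and "b" here.
dot_matches = re.findall(r"a.b", text)
print(dot_matches)  # []

# (?s:.) turns on DOTALL for this one atom: any character, newline included.
# This is assumed to correspond to "_" in the pattern language described above.
any_matches = re.findall(r"a(?s:.)b", text)
print(any_matches)  # ['a\nb']
```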