Patterns define the characters that match at a particular location by specifying a "character range". Character ranges are specified with a "bracket expression", which lists the allowed characters. For example, the following specifies that the character must be one of "a", "b", "c", "d", "e", or "f":
[abcdef]
As a simple demonstration, a pattern that matches either grey or gray would be:
gr[ae]y
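As a rough illustration, the same bracket expression can be tried out with Python's `re` module, used here only as a stand-in for the pattern engine this document describes. The helper name `is_grey` and the sample words are made up for the example.

```python
import re

# Python's re module stands in for the pattern engine described here;
# this simple bracket expression happens to be valid in both.
GREY_OR_GRAY = re.compile(r"gr[ae]y")

def is_grey(word: str) -> bool:
    """Return True when the whole word matches gr[ae]y."""
    return GREY_OR_GRAY.fullmatch(word) is not None

# Check a few sample words against the pattern.
results = {word: is_grey(word) for word in ["grey", "gray", "groy", "graey"]}
print(results)
```

Only the first two words match, because the bracket expression stands for exactly one character at that position.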
Contiguous ranges
When a range includes a contiguous sequence of characters, the "-" operator can be used to express all characters between the two endpoints. For example, the range [abcdef] shown earlier can also be expressed in the more compact form:
[a-f]
To include a literal "-" in a character range, place the "-" at either the beginning or the end of the range. For example, to define a range consisting of lower-case letters and a hyphen:
[a-z-]
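The two claims above (that the compact form is equivalent to the explicit list, and that a trailing "-" is taken literally) can be checked with a short sketch. It assumes Python's `re` module as a stand-in engine, which treats these particular bracket expressions the same way; the variable names are illustrative.

```python
import re
import string

explicit = re.compile(r"[abcdef]")
compact = re.compile(r"[a-f]")

# Both forms should agree on every printable ASCII character.
same = all(
    (explicit.fullmatch(ch) is not None) == (compact.fullmatch(ch) is not None)
    for ch in string.printable
)

# A "-" placed at the end of the range is a literal hyphen, not an operator.
with_hyphen = re.compile(r"[a-z-]")
hyphen_ok = with_hyphen.fullmatch("-") is not None
letter_ok = with_hyphen.fullmatch("q") is not None
digit_ok = with_hyphen.fullmatch("7") is not None

print(same, hyphen_ok, letter_ok, digit_ok)
```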
Inverted Ranges
It is often more convenient to specify which characters do not match, by specifying an "inverted range". A range is inverted when its first character is a "^". For example, to match any character other than a, b, or c, one could use the character range:
[^a-c]
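A minimal sketch of the inverted range, again assuming Python's `re` module as a stand-in engine. Because a range matches a single character, checking a whole word means testing every character in it; the helper `has_no_abc` is a hypothetical name used only for this example.

```python
import re

not_abc = re.compile(r"[^a-c]")

# Single characters: only those outside a-c match the inverted range.
chars = ["a", "b", "c", "d", "Z", "-"]
matches = [ch for ch in chars if not_abc.fullmatch(ch)]

def has_no_abc(word: str) -> bool:
    """Return True when every character of word matches [^a-c]."""
    return all(not_abc.fullmatch(ch) for ch in word)

print(matches)
print(has_no_abc("hello"), has_no_abc("table"))
```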
Character escapes
Character ranges can include some non-printable characters, written with the following escape shortcuts.
Shortcut | Character |
---|---|
\t | Tab |
\n | New line |
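For instance, a range built from both escapes picks out the tabs and line breaks in a piece of text. The sketch below assumes Python's `re` module, which interprets `\t` and `\n` inside a bracket expression the same way; the sample text is made up.

```python
import re

# A range containing the two escapes from the table above.
tab_or_newline = re.compile(r"[\t\n]")

text = "name\tvalue\nsize\t42\n"
found = tab_or_newline.findall(text)

print(found)
```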
Pre-defined Ranges
Character ranges are a very convenient way to represent groups of characters. Some ranges are so common that there is an even shorter notation for them.
Shortcut | Range | Inverted shortcut | Inverted range |
---|---|---|---|
\l | [a-z] | \L | [^a-z] |
\u | [A-Z] | \U | [^A-Z] |
\a | [a-zA-Z] | \A | [^a-zA-Z] |
\h | [a-zA-Z_] | \H | [^a-zA-Z_] |
\w | [a-zA-Z0-9_] | \W | [^a-zA-Z0-9_] |
\d | [0-9] | \D | [^0-9] |
\x | [0-9a-fA-F] | \X | [^0-9a-fA-F] |
\o | [0-7] | \O | [^0-7] |
\s | [ \t] | \S | [^ \t] |
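To make the table concrete, here is a hedged sketch that expands a shortcut into the bracket expression listed for it. The function `expand_shortcut` and the `RANGES` dictionary are hypothetical helpers, not part of any real engine. Python's `re` module is used only to test the expanded form, since its own `\w`, `\d`, and `\s` do not have exactly the meanings given in the table.

```python
import re

# Range bodies copied from the table: lower-case shortcut letter -> range.
RANGES = {
    "l": "a-z",
    "u": "A-Z",
    "a": "a-zA-Z",
    "h": "a-zA-Z_",
    "w": "a-zA-Z0-9_",
    "d": "0-9",
    "x": "0-9a-fA-F",
    "o": "0-7",
    "s": " \\t",  # a space followed by the tab escape
}

def expand_shortcut(shortcut: str) -> str:
    """Expand a backslash shortcut into its bracket expression."""
    if len(shortcut) != 2 or shortcut[0] != "\\":
        raise ValueError(f"not a shortcut: {shortcut!r}")
    letter = shortcut[1]
    if letter.lower() not in RANGES:
        raise ValueError(f"unknown shortcut: {shortcut!r}")
    # An upper-case letter selects the inverted form of the range.
    negate = "^" if letter.isupper() else ""
    return f"[{negate}{RANGES[letter.lower()]}]"

print(expand_shortcut(r"\x"))
print(expand_shortcut(r"\H"))
```

Keeping a single table and deriving the inverted form from the letter's case mirrors the way the shortcuts are paired in the table.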
Meta-characters
Shortcut | Match |
---|---|
. | Any character except \n |
_ | Any character (including \n) |
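The difference between the two meta-characters can be approximated in Python, with the caveat that Python has no "_" meta-character. In this sketch, "." compiled with the `re.DOTALL` flag is an assumed stand-in for "_", while plain "." behaves like the "." row of the table.

```python
import re

any_but_newline = re.compile(r".")        # like "." in the table
any_char = re.compile(r".", re.DOTALL)    # stand-in for "_" in the table

samples = ["a", " ", "\t", "\n"]
dot_results = [any_but_newline.fullmatch(ch) is not None for ch in samples]
underscore_results = [any_char.fullmatch(ch) is not None for ch in samples]

print(dot_results)
print(underscore_results)
```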