Character Ranges


A pattern restricts which characters can match at a particular position by specifying a "character range". A character range is written as a "bracket expression", which lists the allowed characters between "[" and "]". For example, the following specifies that the character must be one of "a", "b", "c", "d", "e", or "f":

[abcdef]

As a simple demonstration, a pattern that matches either grey or gray would be:

gr[ae]y
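
The grey/gray pattern can be tried out in any engine that supports bracket expressions. The following is a minimal sketch that assumes Python's `re` module as a stand-in; the pattern language described here may differ from Python's in other respects.

```python
import re

# A bracket expression matches exactly one character from the listed set.
pattern = re.compile(r"gr[ae]y")

words = ["grey", "gray", "groy", "graey"]

# fullmatch requires the whole word to fit the pattern.
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

"groy" is rejected because "o" is not in the range, and "graey" is rejected because the bracket expression consumes only a single character.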

Contiguous Ranges

When a range includes a contiguous sequence of characters, the "-" operator can be used to express every character between its two endpoints. For example, the range [abcdef] can also be written in the more compact form:

[a-f]

To include a literal "-" in a character range, place the "-" at either the beginning or the end of the range. For example, to define a range consisting of lower-case letters and a hyphen:

[a-z-]
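
As a rough check of both points, the sketch below compares the explicit and compact forms and then uses a trailing hyphen. It assumes Python's `re` module, whose bracket expressions behave the same way for these cases; the `+` repetition operator is Python's and is not described in this section.

```python
import re

explicit = re.compile(r"[abcdef]")
compact = re.compile(r"[a-f]")

# Both forms should accept exactly the same single characters.
same = all(
    bool(explicit.fullmatch(c)) == bool(compact.fullmatch(c))
    for c in "abcdefgz-A"
)

# A "-" at the end of the range is taken literally.
lower_or_hyphen = re.compile(r"[a-z-]+")
ok = lower_or_hyphen.fullmatch("well-known") is not None
bad = lower_or_hyphen.fullmatch("Well-Known") is not None

print(same, ok, bad)
```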

Inverted Ranges

It is often more convenient to specify which characters do not match, by specifying an "inverted range". A range is inverted when its first character is a "^". For example, to match any single character other than a, b, or c, one could use the character range:

[^a-c]
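
An inverted range still matches exactly one character. The sketch below illustrates this with Python's `re` module, used here as an assumed stand-in; repeating the range with Python's `+` operator is one way to test a whole word.

```python
import re

# One character that is anything except a, b, or c.
not_abc = re.compile(r"[^a-c]")
found = not_abc.findall("abcdxa")

# Repeating the inverted range checks every character of a word.
no_abc_word = re.compile(r"[^a-c]+")
clean = no_abc_word.fullmatch("hello") is not None
dirty = no_abc_word.fullmatch("cab") is not None

print(found, clean, dirty)
```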

Character Escapes

Character ranges can include some non-printable characters, written with the following escape shortcuts:

Shortcut    Character
\t          Tab
\n          New line
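
As an illustration, the escapes can be placed inside a bracket expression to match either non-printable character. This sketch assumes Python's `re` module, which accepts the same two escapes.

```python
import re

# One character that is either a tab or a new line.
tab_or_newline = re.compile(r"[\t\n]")

# Split a string wherever one of the two characters occurs.
parts = tab_or_newline.split("name\tvalue\nnext")
print(parts)
```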

Pre-defined Ranges

Character ranges are a very convenient way to represent groups of characters. Some ranges are so common that there is an even shorter notation for them.

Shortcut    Range           Inverted    Inverted Range
\l          [a-z]           \L          [^a-z]
\u          [A-Z]           \U          [^A-Z]
\a          [a-zA-Z]        \A          [^a-zA-Z]
\h          [a-zA-Z_]       \H          [^a-zA-Z_]
\w          [a-zA-Z0-9_]    \W          [^a-zA-Z0-9_]
\d          [0-9]           \D          [^0-9]
\x          [0-9a-fA-F]     \X          [^0-9a-fA-F]
\o          [0-7]           \O          [^0-7]
\s          [ \t]           \S          [^ \t]
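
Each shortcut is simply an abbreviation for the bracket expression listed beside it. The sketch below makes that expansion explicit. It is written in Python as an assumption, and `expand_shortcuts` and `RANGES` are hypothetical names introduced for illustration. The expansion matters in Python because its own `\w`, `\d`, and `\s` are defined differently (for example, Python's `\s` also matches a new line), and shortcuts such as `\l` and `\o` have no Python equivalent.

```python
import re

# Range bodies for each shortcut, copied from the table above.
RANGES = {
    "l": "a-z",
    "u": "A-Z",
    "a": "a-zA-Z",
    "h": "a-zA-Z_",
    "w": "a-zA-Z0-9_",
    "d": "0-9",
    "x": "0-9a-fA-F",
    "o": "0-7",
    "s": " \\t",
}


def expand_shortcuts(pattern):
    """Rewrite each shortcut as the bracket expression listed for it.

    Hypothetical helper: it assumes shortcuts appear outside bracket
    expressions and does not handle an escaped backslash before a letter.
    """
    def replace(match):
        letter = match.group(1)
        body = RANGES[letter.lower()]
        # Upper-case shortcuts are the inverted form of the lower-case ones.
        return f"[^{body}]" if letter.isupper() else f"[{body}]"

    return re.sub(r"\\([luahwdxosLUAHWDXOS])", replace, pattern)


print(expand_shortcuts(r"\u\l\l\d"))  # [A-Z][a-z][a-z][0-9]
```

The expanded string can then be handed to `re.fullmatch`; for instance, the expansion of `\u\l\l\d` accepts "Abc7".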

Meta-characters

Shortcut    Match
.           Any character except \n
_           Any character (including \n)
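
The difference between the two meta-characters only shows up when the text contains a new line. The sketch below assumes Python's `re` module for illustration. In Python, "_" is an ordinary literal underscore, so the scoped DOTALL form `(?s:.)` is used as an approximate stand-in for the "_" meta-character described here.

```python
import re

text = "a\nb"

# "." does not match a new line, so this pattern fails on the text.
dot = re.compile(r"a.b")
dot_matches = dot.fullmatch(text) is not None

# "(?s:.)" matches any character, including a new line.
any_char = re.compile(r"a(?s:.)b")
any_matches = any_char.fullmatch(text) is not None

print(dot_matches, any_matches)
```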