A character class is used to represent a set of characters. The following combinations are allowed in describing a character class:
Character Classes
Lua patterns come with a selection of "built-in" character classes that are useful in most situations:
Class | Description |
---|---|
x | (where x is not one of the "magic" characters) represents the character x itself. |
%x | (where x is any non-alphanumeric character) represents the character x. |
. | (a dot) represents all characters. |
%a | represents all letters. |
%c | represents all control characters. |
%d | represents all digits. |
%l | represents all lowercase letters. |
%p | represents all punctuation characters. |
%s | represents all space characters. |
%u | represents all uppercase letters. |
%w | represents all alphanumeric characters. |
%x | represents all hexadecimal digits. |
%z | represents the character with representation 0. |
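As a quick illustration of the table above, the following sketch applies a few built-in classes with `string.match`, which returns the first substring that matches. The sample string `text` is made up for this example, and the results assume Lua's default "C" locale.

```lua
-- Hypothetical sample input, chosen only for illustration.
local text = "Order #42: 3 apples"

print(string.match(text, "."))     -- O   (a dot matches any character)
print(string.match(text, "%d"))    -- 4   (first digit)
print(string.match(text, "%d%d"))  -- 42  (first two adjacent digits)
print(string.match(text, "%u"))    -- O   (first uppercase letter)
print(string.match(text, "%l"))    -- r   (first lowercase letter)
print(string.match(text, "%p"))    -- #   (first punctuation character)
print(string.match(text, "%s%d"))  -- " 3" (a space followed by a digit)
```

Each class matches exactly one character, so a pattern such as `%d%d` only matches two digits that sit next to each other.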
When the built-in character classes are not sufficient for a task, custom character classes can be easily defined. A custom character class is defined by surrounding the characters that should be included in the class with square brackets.
For example, `[set]` represents a character class consisting of the letters `s`, `e`, and `t`. When a character class contains a contiguous sequence of letters or numbers, that sequence can be represented by a shorthand notation consisting of the first and last characters of the sequence, separated by a `-`. For example, the sequence of digits `0123456789` can be shortened to `0-9`. As an even shorter alternative, Lua allows built-in character classes to be included in custom class specifications.
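The three forms described above (an explicit set, a range, and a built-in class inside a set) can be sketched as follows. The input strings are invented for illustration.

```lua
-- Explicit set: the first character that is s, e, or t.
print(string.match("best", "[set]"))         -- e

-- Range shorthand: [0-9] is the same set as [0123456789].
print(string.match("abc123", "[0-9]"))       -- 1
print(string.match("abc123", "[0-9][0-9]"))  -- 12

-- Built-in class inside a custom class: any letter or an underscore.
print(string.match("--_id", "[%a_]"))        -- _
```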
Finally, character classes can also be defined as the "complement" (or inverse) of the specified set. For built-in character classes the complementary class is specified by the upper-case class name, so since `%d` is the class containing all digits `0-9`, `%D` is the class containing all characters except for those. For custom character classes inversion is achieved by making the first character of the set a `^`. As an example, `[^set]` defines a character class containing all characters except `s`, `e`, and `t`.
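A small sketch of complemented classes, using made-up input strings; `string.match` returns `nil` when nothing in the string belongs to the class.

```lua
-- %D is the complement of %d: the first character that is not a digit.
print(string.match("2024-05", "%D"))     -- -

-- [^set] matches the first character other than s, e, or t.
print(string.match("tests!", "[^set]"))  -- !

-- %S is the complement of %s: the first non-space character.
print(string.match("   x", "%S"))        -- x

-- Every character here is a letter, so the complement finds nothing.
print(string.match("abc", "[^%a]"))      -- nil
```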
Here are some examples showing alternative implementations for some of the built-in character classes above:
Class | Equivalent |
---|---|
%d | [0123456789] |
%d | [0-9] |
%D | [^0-9] |
%a | [a-zA-Z] |
%l | [a-z] |
%u | [A-Z] |
%a | [%l%u] |
%w | [%a%d] |
%x | [0-9a-fA-F] |
In a few cases we show multiple alternative implementations for the same built-in character class in order to show their flexibility.
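One way to check equivalences like these is to compare two single-character patterns against every ASCII character. The helper `same_class` below is a hypothetical name introduced for this sketch. It only covers bytes 0 to 127 and assumes the default "C" locale, since classes such as `%a` depend on the current locale.

```lua
-- Hypothetical helper: true when both single-character patterns
-- accept exactly the same characters among bytes 0..127.
local function same_class(p1, p2)
  for byte = 0, 127 do
    local ch = string.char(byte)
    local m1 = string.find(ch, p1) ~= nil
    local m2 = string.find(ch, p2) ~= nil
    if m1 ~= m2 then
      return false
    end
  end
  return true
end

print(same_class("%d", "[0-9]"))        -- true
print(same_class("%D", "[^0-9]"))       -- true
print(same_class("%a", "[%l%u]"))       -- true
print(same_class("%w", "[%a%d]"))       -- true
print(same_class("%x", "[0-9a-fA-F]"))  -- true (%x accepts both letter cases)
```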
Magic Characters
In the preceding discussion we saw a number of characters that had special meaning. For example, `%` indicates a built-in character class, `[` and `]` indicate a custom character class, and `-` indicates a range of characters. These (and a few more) characters are designated "magic characters" because they have special meaning in patterns.
The magic characters are:
^$()%.[]*+-?
This matters whenever a pattern needs to match one of the magic characters literally. For example, the phone number we matched previously contained the magic characters `+`, `(`, `)`, and `-`.
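To see why the literal interpretation matters, here is a minimal sketch of what happens when a magic character is left unescaped. The sample strings are invented for illustration.

```lua
local expr = "3+4"

-- Unescaped, + means "one or more of the preceding character",
-- so this pattern looks for one or more 3s followed directly by a 4.
print(string.find(expr, "3+4"))   -- nil

-- Escaped with %, the + is matched literally.
print(string.find(expr, "3%+4"))  -- 1  3

-- An unescaped dot matches any character; %. matches only a dot.
print(string.match("a.b", "."))   -- a
print(string.match("a.b", "%."))  -- .
```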
There are two ways to indicate a literal interpretation of magic characters. First, a custom character class consisting of the magic character can be used to force a literal interpretation:
local pattern = "[+]%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"
print(string.match("+1(234)567-8910", pattern)) -- +1(234)567-8910
print(string.match("+1(23)4567-8910", pattern)) -- nil
print(string.match("(234)567-8910", pattern)) -- nil
The second option is to escape the magic character with a `%`:
local pattern = "%+%d%(%d%d%d%)%d%d%d%-%d%d%d%d"
print(string.match("+1(234)567-8910", pattern)) -- +1(234)567-8910
print(string.match("+1(23)4567-8910", pattern)) -- nil
print(string.match("(234)567-8910", pattern)) -- nil
Which to use is primarily a matter of personal preference and readability.
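When the text to match is not known in advance, escaping by hand is not practical. A common idiom is to escape every magic character programmatically with `string.gsub`. The helper name `escape_magic` below is an assumption made for this sketch, not part of Lua's standard library.

```lua
-- Hypothetical helper: prefix every magic character in s with %.
local function escape_magic(s)
  -- The custom class lists each magic character, itself escaped with %.
  -- In the replacement string, %% is a literal % and %0 is the whole match.
  -- The outer parentheses drop gsub's second return value (the match count).
  return (string.gsub(s, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

local phone = "+1(234)567-8910"
print(escape_magic(phone))  -- %+1%(234%)567%-8910

-- The escaped string can now be used as a pattern that matches literally.
print(string.find("call +1(234)567-8910 now", escape_magic(phone)))  -- 6  20
```

For plain substring searches, `string.find` also accepts a fourth argument `plain`; passing `true` turns off pattern matching altogether, which avoids the need for escaping in that one function.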