The next step in understanding patterns is repetition, which specifies not only the class of a character but also how sequences of those characters should be matched.
Repetition specifications are indicated by placing one of the following characters immediately after a character class:
| Specifier | Description | Greedy |
|---|---|---|
| Default | matches any single character in the class | |
| * | matches a sequence consisting of 0 or more characters in the class | Yes |
| - | matches a sequence consisting of 0 or more characters in the class | No |
| + | matches a sequence consisting of 1 or more characters in the class | Yes |
| ? | matches 0 or 1 character in the class | |
Greedy refers to how a variable-length repetition treats a matching sequence of characters. "Greedy" matching means that the repetition matches the longest possible sequence, while "non-greedy" matching means that it matches the shortest possible sequence that still allows the rest of the pattern to match.
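As a quick illustration of the difference, here is a minimal sketch that applies both kinds of repetition to the same target. It uses the `.` class (any character) and a made-up target string chosen only for this example.

```lua
local target = "<a><b>"

-- Greedy: "." with "*" consumes as much as possible,
-- so the match runs through to the last ">".
print(string.match(target, "<.*>")) -- <a><b>

-- Non-greedy: "." with "-" stops at the first ">"
-- that lets the whole pattern succeed.
print(string.match(target, "<.->")) -- <a>
```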
Let's look at each of these to get a better idea of how this works. First, let's look at the default behavior, when no repetition is specified:
local pattern = "%d"
print(string.match("abc", pattern)) -- nil
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 1
print(string.match("123abc", pattern)) -- 1
When no repetition specification is given, a character class matches exactly one character. In the first example no match was made because the target contains no digits, so the call to string.match returned nil. For each of the other targets, the pattern matched a single digit, the first one it found.
Next, let's look at *, which greedily matches 0 or more characters from the character class:
local pattern = "%d*"
print(string.match("abc", pattern)) -- (empty string)
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 12
print(string.match("123abc", pattern)) -- 123
Notice that although there are no digits in the first target, the pattern still matched (it did not return nil). Instead, it matched an empty string. This is due to the repetition specification, which allows the pattern to match 0 characters. For each of the other targets, this pattern matched all available digits.
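One consequence worth seeing directly: because matching zero characters counts as success, a pattern like this can succeed before it ever reaches the digits. The sketch below assumes a hypothetical target whose digits come after the letters, and uses string.find to show where the match landed.

```lua
local target = "abc123"

-- The search starts at position 1, where zero digits is already a valid
-- match, so the digits later in the string are never reached.
print(string.match(target, "%d*") == "") -- true

-- string.find reports a match starting at 1 and ending at 0,
-- which is how an empty match is represented.
print(string.find(target, "%d*")) -- 1    0
```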
Now let's compare that behavior to that of a non-greedy match:
local pattern = "%d-"
print(string.match("abc", pattern)) -- (empty string)
print(string.match("1abc", pattern)) -- (empty string)
print(string.match("12abc", pattern)) -- (empty string)
print(string.match("123abc", pattern)) -- (empty string)
What is interesting in this case is that the pattern matched every target but returned only an empty string, because zero digits is the shortest possible match. It is difficult to see this behavior with such a simple pattern, so let's repeat with a slightly more complicated pattern:
local pattern = "%d-%a"
print(string.match("abc", pattern)) -- a
print(string.match("1abc", pattern)) -- 1a
print(string.match("12abc", pattern)) -- 12a
print(string.match("123abc", pattern)) -- 123a
In this case, the pattern matched the fewest digits required to also match a single letter. To reach the letter, the pattern had to match all of the digits leading up to it.
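With these targets a greedy %d* would return the same results, so the contrast is easier to see when the class that follows can also match a digit. The following is a small sketch with a target picked for illustration:

```lua
local target = "12345"

-- Non-greedy: "%d-" first tries zero digits, and the trailing "%d"
-- then matches "1", so the match ends immediately.
print(string.match(target, "%d-%d")) -- 1

-- Greedy: "%d*" takes all five digits, then gives one back
-- so that the trailing "%d" can match "5".
print(string.match(target, "%d*%d")) -- 12345
```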
Another variation on the * specifier is the + specifier, which greedily matches at least 1 character from the character class:
local pattern = "%d+"
print(string.match("abc", pattern)) -- nil
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 12
print(string.match("123abc", pattern)) -- 123
The main difference is shown in the top line, where * matches an empty string, while + simply won't match. Finally, the ? specifier matches 0 or 1 repetitions of the character class:
local pattern = "%d?"
print(string.match("abc", pattern)) -- (empty string)
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 1
print(string.match("123abc", pattern)) -- 1
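The specifiers can also be combined within one pattern. As a rough sketch, the pattern below looks for an integer with an optional leading minus sign; the target strings are invented for the example.

```lua
-- "%-?" allows an optional minus sign; "%d+" then requires at least one digit.
-- The "-" is written as "%-" because "-" is a magic character in patterns.
local pattern = "%-?%d+"

print(string.match("temp: -42C", pattern)) -- -42
print(string.match("temp: 17C", pattern))  -- 17
print(string.match("temp: n/a", pattern))  -- nil
```

Here ? makes the sign optional without allowing more than one, while + ensures that a target with no digits returns nil rather than an empty string.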
Now that we have the tools we need to understand and create patterns, let's move on to the next topic, captures.