Repetition


The next step in understanding patterns is repetition, which lets a pattern specify not only the class of a character, but also how sequences of those characters should be matched.

Repetition specifications are indicated by placing one of the following characters immediately after a character class:

Specifier  Description                                                         Greedy
Default    matches any single character in the class
*          matches a sequence consisting of 0 or more characters in the class  Yes
-          matches a sequence consisting of 0 or more characters in the class  No
+          matches a sequence consisting of 1 or more characters in the class  Yes
?          matches 0 or 1 character in the class

Greedy refers to how a specifier with a variable count treats a matching sequence of characters. "Greedy" matching means that the pattern always matches the longest possible sequence, while "non-greedy" matching means that the pattern always matches the shortest possible sequence.
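As a quick preview of the difference, here is a minimal sketch that runs a greedy and a non-greedy pattern against the same target. It assumes that "." is the character class matching any character; the target string and variable names are made up for illustration.

```lua
-- Hypothetical target: three "x" delimiters with digits in between.
local target = "x123x456x"

-- Greedy: ".*" takes as much as it can, so the match ends at the last "x".
local greedy = string.match(target, "x.*x")

-- Non-greedy: ".-" takes as little as it can, so the match ends at the next "x".
local non_greedy = string.match(target, "x.-x")

print(greedy)     -- x123x456x
print(non_greedy) -- x123x
```

Both patterns start matching at the same position; only the amount of text consumed by the repeated class differs.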

Let's look at each of these to get a better idea of how this works. First, let's look at the default behavior, when no repetition is specified:

local pattern = "%d"

print(string.match("abc", pattern)) -- nil
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 1
print(string.match("123abc", pattern)) -- 1

With no repetition specifier, a character class matches exactly one character. In the first example there are no digits in the target, so the call to match returned nil. For each of the other targets, the pattern matched only the first digit.

Next, let's look at *, which greedily matches 0 or more characters from the character class:

local pattern = "%d*"

print(string.match("abc", pattern)) -- (empty string)
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 12
print(string.match("123abc", pattern)) -- 123

Notice that although there are no digits in the first target, the pattern still matched (it did not return nil). Instead, it matched an empty string, because the * specifier allows the pattern to match 0 characters. For each of the other targets, this pattern matched all available digits.
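One consequence of matching 0 characters is worth seeing directly. The sketch below uses string.find, which returns the start and end positions of a match, to show that the empty match happens at the very first position. It also uses a hypothetical target, "abc123", where the digits are not at the start of the string.

```lua
-- An empty match at position 1 is reported as start 1, end 0.
local first, last = string.find("abc", "%d*")
print(first, last) -- 1 0

-- The pattern succeeds with 0 digits at position 1, so the search never
-- advances to the digits later in the string.
local m = string.match("abc123", "%d*")
print(m == "") -- true
```

The earlier targets all began with their digits, which is why the pattern appeared to find every available digit.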

Now let's compare that behavior to that of a non-greedy match:

local pattern = "%d-"

print(string.match("abc", pattern)) -- (empty string)
print(string.match("1abc", pattern)) -- (empty string)
print(string.match("12abc", pattern)) -- (empty string)
print(string.match("123abc", pattern)) -- (empty string)

What is interesting in this case is that the pattern matched every target, but returned just an empty string, because the shortest possible match is 0 characters. It is difficult to see this behavior with such a simple pattern, so let's repeat with a slightly more complicated pattern:

local pattern = "%d-%a"

print(string.match("abc", pattern)) -- a
print(string.match("1abc", pattern)) -- 1a
print(string.match("12abc", pattern)) -- 12a
print(string.match("123abc", pattern)) -- 123a

In this case, the pattern matched the fewest digits required to also match a single letter. In order to reach the letter, the pattern had to match all of the digits leading up to it.

Another variation on the * specifier is the + specifier, which greedily matches at least 1 character from the character class:

local pattern = "%d+"

print(string.match("abc", pattern)) -- nil
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 12
print(string.match("123abc", pattern)) -- 123

The main difference is shown in the first line, where * matches an empty string, while + does not match at all and returns nil. Finally, the ? specifier matches 0 or 1 repetitions of the character class:

local pattern = "%d?"

print(string.match("abc", pattern)) -- (empty string)
print(string.match("1abc", pattern)) -- 1
print(string.match("12abc", pattern)) -- 1
print(string.match("123abc", pattern)) -- 1
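The ? specifier tends to be most useful in combination with other specifiers. As a small sketch, the pattern below matches an integer with an optional leading minus sign. It assumes that "%-" escapes the minus sign so it is treated as a literal character rather than as the non-greedy specifier; the target strings are invented for illustration.

```lua
-- "%-?" allows 0 or 1 literal minus signs; "%d+" requires at least one digit.
local pattern = "%-?%d+"

local negative = string.match("temp: -42C", pattern)
local positive = string.match("temp: 42C", pattern)
local missing = string.match("no digits", pattern)

print(negative) -- -42
print(positive) -- 42
print(missing)  -- nil
```

Because the digits are required by +, the optional sign alone is never enough to produce a match.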

Now that we have the tools we need to understand and create patterns, let's move on to the next topic, captures.