Using Captures in Lua Patterns

Now that we have learned a bit more about character classes and repetition, we have the tools we need to build patterns. Going back to our original example, we learned how to match an entire phone number:

local pattern = "+%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"

print(string.match("+1(234)567-8910", pattern)) -- +1(234)567-8910
print(string.match("+1(23)4567-8910", pattern)) -- nil
print(string.match("(234)567-8910", pattern)) -- nil

This is very useful, but the real power of Lua patterns is the ability to extract information from matched strings. This is done with captures, which return specific portions of the matched string. A capture is created by surrounding part of a pattern in parentheses, ( and ). When a pattern contains captures, string.match returns the captured substrings, in order, instead of the whole match.

The following example repeats the original example, but we have now created captures to locate and return the country code (cc), area code (ac), prefix (px), and suffix (sx) components of the phone number:

local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
--                (cc)   (  ac  )   (  px  )   (   sx   )

local cc, ac, px, sx = string.match("+1(234)567-8910", pattern)

print(cc) -- 1
print(ac) -- 234
print(px) -- 567
print(sx) -- 8910
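When the string does not match, string.match returns a single nil, so every capture variable ends up nil. A minimal sketch of how that check might be wrapped up follows; the helper name parse_phone and the field names in the returned table are made up for illustration and are not part of the original example:

```lua
-- Same capture pattern as above.
local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"

-- Hypothetical helper: returns the captured pieces in a table,
-- or nil when the string does not contain a matching phone number.
local function parse_phone(s)
  local cc, ac, px, sx = string.match(s, pattern)
  if cc == nil then
    return nil -- no match, so all four variables are nil
  end
  return { cc = cc, ac = ac, px = px, sx = sx }
end

local phone = parse_phone("+1(234)567-8910")
print(phone.cc, phone.ac, phone.px, phone.sx) -- 1  234  567  8910
print(parse_phone("(234)567-8910"))           -- nil
```

Note that captures are always returned as strings, so phone.ac here is "234" rather than the number 234; tonumber can convert it if a number is needed.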

In addition to string.match, the string library provides string.gmatch, which takes the same kind of pattern but returns an iterator over every match in the string rather than only the first. Because it returns an iterator, we access the results in a loop:

local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
--                (cc)   (  ac  )   (  px  )   (   sx   )

for cc, ac, px, sx in string.gmatch("+1(234)567-8910", pattern) do
  print(cc) -- 1
  print(ac) -- 234
  print(px) -- 567
  print(sx) -- 8910
end

It is a bit silly to iterate over a single match, but if we had a long string containing multiple phone numbers, this would let us process each phone number in the string, one at a time. A more interesting example for our single phone number is the following: since we know that phone numbers are sequences of digits, we can iterate over each group of digits. The pattern %d+ contains no captures, so string.gmatch returns the whole match on each iteration:

for d in string.gmatch("+1(234)567-8910", "%d+") do
  print(d)
end

which produces the result:

1
234
567
8910
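
Returning to the earlier point about a long string containing several phone numbers, the following is a small sketch of how the capture pattern and string.gmatch might be combined. The sample text and the area_codes table are invented for illustration:

```lua
-- Same capture pattern as in the earlier examples.
local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"

-- Made-up sample text containing two phone numbers.
local text = "Office: +1(234)567-8910, mobile: +1(555)010-9999"

-- Collect the area code of every phone number found in the text.
local area_codes = {}
for cc, ac, px, sx in string.gmatch(text, pattern) do
  print(cc, ac, px, sx)
  area_codes[#area_codes + 1] = ac
end

print(#area_codes) -- 2
```

Each pass through the loop receives the four captures of the next phone number, and text that does not match the pattern (such as "Office: ") is simply skipped.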