Substitution with Lua Patterns

Lua supports substitution through string.gsub, which replaces each match of a pattern with replacement text and returns the modified string along with the number of matches. Going back to our phone number example, suppose we want to redact phone numbers from a block of text:

-- The leading + has no preceding item to repeat, so it matches a literal plus sign.
local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
--                (cc)   (  ac  )   (  px  )   (   sx   )

local target = "abc +1(234)567-8910 def"

local repl = "+x(xxx)xxx-xxxx"

local result, count = string.gsub(target, pattern, repl)

print(result) -- abc +x(xxx)xxx-xxxx def
print(count) -- 1

Suppose we want to generalize the pattern to apply to a wider range of phone number formats. To do so we need to match digit sequences of varying length and replace each one with a redacted string of the same length. This can be achieved in several ways, but we will take the opportunity to demonstrate using a function for the replacement.
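For comparison, one of the simpler alternatives is to replace each digit individually with a plain replacement string. This is only a sketch that reuses the target string from the example above; note that count then reports one match per digit rather than one per digit sequence.

```lua
local target = "abc +1(234)567-8910 def"

-- Replace every single digit with "x"; no replacement function is needed.
local result, count = string.gsub(target, "%d", "x")

print(result) -- abc +x(xxx)xxx-xxxx def
print(count)  -- 11
```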

When a function is used for the replacement, the function is passed all captured text as arguments or, if the pattern defines no captures, the entire matched string as a single argument. The function is expected to return either a string or a number to be used as the replacement text, or a falsy value (false or nil) to indicate that the replacement should not be made and the original text retained.

To demonstrate, let's create a simple pattern that matches any sequence of digits, along with a function that returns a redaction string of the same length:

Replacement Functions

local pattern = "%d+"

local target = "abc +1(234)567-8910 def"

local repl = function(capture)
  return string.rep("x", string.len(capture))
end

local result, count = string.gsub(target, pattern, repl)

print(result) -- abc +x(xxx)xxx-xxxx def
print(count) -- 4
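
The listing above defines no captures, so the function receives the whole match. As a minimal sketch of the other behaviors described earlier, the example below uses a pattern with two captures and a hypothetical helper (redact_234, a name invented for this illustration) that returns nil for numbers it should leave alone. The target string is also made up for the example.

```lua
local target = "call (234)567-8910 or (345)678-9012"

-- Two captures: the area code and the rest of the number.
local pattern = "[(](%d%d%d)[)](%d%d%d[-]%d%d%d%d)"

-- Hypothetical helper: redact only numbers in area code 234.
-- Each capture arrives as a separate argument.
local function redact_234(area, rest)
  if area == "234" then
    return "(" .. area .. ")xxx-xxxx"
  end
  -- Returning nil (or false) keeps the original matched text.
  return nil
end

local result, count = string.gsub(target, pattern, redact_234)

print(result) -- call (234)xxx-xxxx or (345)678-9012
print(count)  -- 2
```

The count is 2 even though only one number was changed, because string.gsub counts matches rather than replacements.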

The next feature of string.gsub we want to demonstrate is the ability to use a table for the replacement. When a table is used for the replacement, the first capture (or the entire match, if the pattern defines no captures) is used as a table key. If the table holds a string or number for that key, that value replaces the matched text; if the lookup yields nil or false, the original text is retained.

For this example, suppose that the phone company has decided to introduce some new area codes, requiring existing phone numbers to be updated to the new area code. In this case we could define a table that maps old area codes to new area codes, then use patterns to match and update phone numbers in a document:

Replacement Tables

local pattern = "%d%d%d"

local target = "abc +1(234)567-8910 def"

local repl = {}
repl["234"] = "111"
repl["345"] = "222"
repl["456"] = "333"

local result, count = string.gsub(target, pattern, repl)

print(result) -- abc +1(111)567-8910 def
print(count) -- 3 (matches "234", "567" and "891"; only "234" has a table entry)

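A convenient feature of this approach is that any match with no entry in the replacement table is left unchanged, so phone numbers whose area codes have not changed are not modified. Note, however, that "%d%d%d" matches any three consecutive digits, not only area codes, so a prefix that happens to appear in the table would be rewritten as well.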
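
One way to restrict the update to area codes is to match the surrounding parentheses too. Because string.gsub replaces the whole match, a table replacement would then drop the parentheses, so the sketch below looks the code up inside a function instead. The names area_codes and update_area_code are hypothetical, and the target string is chosen so that the prefix equals an old area code.

```lua
local target = "abc +1(234)234-8910 def"

-- Old-to-new area code map, mirroring the table in the example above.
local area_codes = { ["234"] = "111", ["345"] = "222", ["456"] = "333" }

-- Match the parentheses as well, capturing only the digits inside them.
local pattern = "[(](%d%d%d)[)]"

-- The whole match (parentheses included) is replaced, so the function
-- puts the parentheses back around the new code.
local function update_area_code(code)
  local new_code = area_codes[code]
  if new_code then
    return "(" .. new_code .. ")"
  end
  -- No return value here, so the original text is kept.
end

local result, count = string.gsub(target, pattern, update_area_code)

print(result) -- abc +1(111)234-8910 def
print(count)  -- 1
```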

The final feature that we want to touch on is the ability to reference captured text in the replacement string. To demonstrate, suppose that we want to change the format of the phone numbers in our document. We will use the same pattern that we created earlier, but this time the replacement string refers to the captured text.

Capture References

Captured text is referenced from the replacement string by index. Capture groups are numbered by the position of their opening parenthesis, counting from the left side of the pattern and starting at 1. In the replacement string, %1 refers to the first capture, %2 to the second, and so on. In our example there are four capture groups, numbered 1 through 4, and we build our replacement string as shown below:

local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
--                1      2          3          4

local target = "text +1(234)567-8910 text"

local repl = "%1.%2.%3.%4"

local result, count = string.gsub(target, pattern, repl)

print(result) -- text 1.234.567.8910 text
print(count) -- 1
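
Two further escapes are available in a replacement string in standard Lua: %0 stands for the whole match, and %% produces a literal percent sign. The following is a small sketch; the variable names and the second input string are invented for the illustration, and the pattern writes the plus sign in its explicitly escaped form, %+.

```lua
local target = "text +1(234)567-8910 text"

-- Same phone number shape as above, without captures.
local pattern = "%+%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"

-- %0 inserts the entire matched text; no captures are required.
local wrapped = string.gsub(target, pattern, "<%0>")
print(wrapped) -- text <+1(234)567-8910> text

-- %% inserts a literal percent sign.
local percent = string.gsub("50 percent", " percent", "%%")
print(percent) -- 50%
```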