Format Strings
We have already learned that Lua strings are immutable, and that changing a string produces a copy of the string, which can make string construction through concatenation inefficient. String buffers can provide a more efficient means of constructing a string from one or more sub-strings in many situations. Lua format strings are another option that can provide a flexible and more efficient alternative.

Format strings are strings that contain "placeholders" that identify where values are to be inserted, and a simple "format specifier" that defines how each value should be formatted. These strings are then passed to the string.format library function, along with the values used to render the final string. Let's look at an example:

    print(string.format("%s %d %f", "hello", 123, math.pi)) -- hello 123 3.141593

This example used a simple template, consisting of placeholders %s, %d, and %f, which were populated by the values "hello", 123, and math.pi, respectively. Placeholders begin with % and have the form:

    %[flags][width][.precision]type

where the fields in brackets are optional, and behave slightly differently depending on the type specifier. Learning to use format strings effectively takes some experience, trial, and error, so we will touch on the highlights and provide examples to show some common use-cases.

Type Specifier
The type specifier indicates how the corresponding value should be formatted.
Most of the type options relate to numerical values, and render them in various ways:

    Type    Meaning              Example  Value    Result
    s       String               %s       "abc"    abc
    d or i  Decimal integer      %d       123      123
    f       Float                %f       math.pi  3.141593
    e       Scientific notation  %e       math.pi  3.141593e+00
    E       Scientific notation  %E       math.pi  3.141593E+00
    g       Autoformat           %g       math.pi  3.14159
    G       Autoformat           %G       math.pi  3.14159
    o       Octal                %o       123      173
    x       Hexadecimal          %x       123      7b
    c       Character            %c       97       a
    q       Lua code             %q       "abc"    "abc"

Two scientific notation options are available, with the difference being whether e or E is used to identify the exponent. There are also two autoformat options, which evaluate the value and automatically determine whether to format it as a float or in scientific notation. The two autoformat options differ in how they format the value when scientific notation is used, where g and G correspond to the e and E formats, respectively. With the default precision, g and G keep six significant digits, which is why math.pi renders as 3.14159 rather than 3.141593.

Character formatting converts a character code (typically an ASCII code) into the corresponding character, and behaves like the string.char library function. Finally, the q type specifier is a Lua-specific option that renders the value in a format that can be parsed and executed by Lua.

Width
The width field specifies the minimum number of characters the value should occupy. If the value requires fewer characters than the specified width, the output is padded to that width. If the value requires more characters, it is written in full and the width has no effect.
We can see how this behaves by formatting math.pi to a range of widths (the padded results are shown in quotes to make the spaces visible; print does not output the quotes):

    print(string.format("%1f", math.pi))  -- 3.141593
    print(string.format("%2f", math.pi))  -- 3.141593
    print(string.format("%3f", math.pi))  -- 3.141593
    print(string.format("%4f", math.pi))  -- 3.141593
    print(string.format("%5f", math.pi))  -- 3.141593
    print(string.format("%6f", math.pi))  -- 3.141593
    print(string.format("%7f", math.pi))  -- 3.141593
    print(string.format("%8f", math.pi))  -- 3.141593
    print(string.format("%9f", math.pi))  -- " 3.141593"
    print(string.format("%10f", math.pi)) -- "  3.141593"
    print(string.format("%11f", math.pi)) -- "   3.141593"

Not much happens when small width values are specified, because the value already occupies 8 characters. Once the width exceeds 8 characters, the value becomes "padded" with spaces. This is the default behavior, though we can change that with the flags field.

Flags
As we saw in the previous section, when a value requires fewer characters than the specified width, the value becomes "left-padded" with spaces. Format strings provide a few flags that allow this behavior to be changed (results are quoted to show the padding):

    Flag     Meaning                                     Example  Value  Result
    None     Default (right-align, pad with spaces)      %4d      1      "   1"
    -        Left-align                                  %-4d     1      "1   "
    +        Show both + and - signs                     %+4d     1      "  +1"
    0        When width is present, zero-pad the value   %04d     1      "0001"
    (space)  Prepend a space (" ") for positive values   % 4d     1      "   1"

Precision
The meaning of the precision field depends on the type specifier. For s it limits the maximum number of characters written, while for f and e it sets the number of digits written after the decimal point.
For floats, the precision sets the number of digits written after the decimal point, rounding the value as needed. The width of 4 still applies as a minimum, so the first two results below are left-padded (they are shown in quotes to make the padding visible):

    print(string.format("%4.0f", math.pi))  -- "   3"
    print(string.format("%4.1f", math.pi))  -- " 3.1"
    print(string.format("%4.2f", math.pi))  -- 3.14
    print(string.format("%4.3f", math.pi))  -- 3.142
    print(string.format("%4.4f", math.pi))  -- 3.1416
    print(string.format("%4.5f", math.pi))  -- 3.14159
    print(string.format("%4.6f", math.pi))  -- 3.141593
    print(string.format("%4.7f", math.pi))  -- 3.1415927
    print(string.format("%4.8f", math.pi))  -- 3.14159265
    print(string.format("%4.9f", math.pi))  -- 3.141592654
    print(string.format("%4.10f", math.pi)) -- 3.1415926536
    print(string.format("%4.11f", math.pi)) -- 3.14159265359
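The flags, width, and precision fields are most useful in combination. The following is a minimal sketch that lines up a few values in columns; the `items` table and its field names are hypothetical and exist only for illustration.

```lua
-- Hypothetical sample data, used only for illustration.
local items = {
  { name = "apple",  qty = 3,  price = 0.5 },
  { name = "banana", qty = 12, price = 0.25 },
}

local lines = {}
for _, item in ipairs(items) do
  -- %-8s  : left-align the name in 8 columns
  -- %3d   : right-align the quantity in 3 columns
  -- %6.2f : two decimal places, right-aligned in 6 columns
  lines[#lines + 1] = string.format("%-8s|%3d|%6.2f", item.name, item.qty, item.price)
end

print(lines[1]) -- apple   |  3|  0.50
print(lines[2]) -- banana  | 12|  0.25

-- For %s, the precision truncates the string instead of rounding.
local short = string.format("%.3s", "abcdef")
print(short) -- abc
```

Building each row with a single format string keeps the column layout in one place, so changing a width only requires editing the template.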
Working with Lua Strings
Words and other sequences of characters are represented by strings. Lua strings are immutable, meaning that they cannot be modified after they are created. Changing a string requires creating a new string that consists of the characters of the previous string, plus whatever changes are desired. We will learn more about this when we look at string buffers in the Strings chapter.

Lua strings are defined by a sequence of characters contained in:

- Double quotes
- Single quotes
- Double square brackets

The characters of strings defined with either single or double quotes must exist on a single line, while those in strings defined with brackets can span multiple lines. Strings defined with single and double quotes are equivalent, and both types of quotes are supported so that quote characters can be included in strings themselves.

    print("this 'contains' quotes")
    print('this "also" contains quotes')

The following compares strings defined with quotes vs brackets:

    -- single or double quotes
    print("this is a string")
    print("this is a \n multi-line string")

    -- square brackets
    print([[this is a string]])
    print([[this is a
    multi-line string]])

Note that the quoted multi-line string definition included a \n at the point where the string broke between lines. This is called an escape character (sometimes also called an "escape sequence" or simply an "escape"), which is the topic of the next section.
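To make the comparison concrete, the sketch below checks that a quoted string containing \n and a bracketed string with a real line break hold the same value, and that escapes are not interpreted inside brackets. The variable names are illustrative only.

```lua
local quoted = "first line\nsecond line"
local bracketed = [[first line
second line]]

-- Both definitions produce the same string value.
print(quoted == bracketed) -- true

-- Inside brackets a backslash is an ordinary character,
-- so this string holds the four characters a, \, n, b.
local raw = [[a\nb]]
print(#raw) -- 4
```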
Lua String Length
Lua includes two methods of determining the length of a string: the length operator # and the library function string.len. Whereas similar functions in most high-level languages return the length as the number of characters in the string, Lua returns the number of bytes of memory occupied by the string. In many common cases, such as ASCII text, these yield the same result, but care should be taken to watch for unexpected results. Let's take a look at a few examples:

    local x = "abcd"
    local y = "ab cd"
    print(#x)            -- 4
    print(#y)            -- 5
    print(string.len(x)) -- 4
    print(y:len())       -- 5

In this case, we defined two strings containing only ASCII characters, and demonstrated how to find the length using both the length operator and the string.len library function. Both strings are basically the same, although the second includes a space (" ") to show that spaces are counted just like any other character.

Now, let's take a look at a few Unicode strings, whose characters can each occupy multiple bytes:

    local x = "中文"
    local y = "zhōng wén"
    local z = "😊" -- careful!
    print(#x)            -- 6
    print(#y)            -- 11
    print(#z)            -- 4
    print(string.len(x)) -- 6
    print(y:len())       -- 11
    print(z:len())       -- 4

These are (probably) not the results that we would have expected. Our main point is to make you aware of how Lua treats string length, and how it might produce unexpected results.
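When a character count is needed rather than a byte count, one option is the standard utf8 library. The sketch below assumes Lua 5.3 or later (the utf8 library is not part of Lua 5.1 or 5.2) and builds the strings from code points so that it does not depend on the encoding of the source file.

```lua
-- Assumes Lua 5.3+, where the utf8 library is available.
local chinese = utf8.char(0x4E2D, 0x6587) -- the two characters of "中文"
print(#chinese)          -- 6 (bytes)
print(utf8.len(chinese)) -- 2 (code points)

local smiley = utf8.char(0x1F60A) -- the smiling face used above
print(#smiley)           -- 4 (bytes)
print(utf8.len(smiley))  -- 1 (code point)
```

Note that utf8.len counts code points, which may still differ from what a reader perceives as a single character when combining marks are involved.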
Escapes
Note that the quoted multi-line string definition included a \n at the point where the string broke between lines. This is called an escape character (sometimes also called an "escape sequence" or simply an "escape").

Escape sequences are special combinations of characters that consist of a backslash \ followed by another character that defines the meaning of the escape. In this case, \n represents a newline character, which defined the point at which the string continues on the next line. Lua strings support a variety of escape sequences:

    Escape  Meaning
    \a      bell
    \b      back space
    \f      form feed
    \n      newline
    \r      carriage return
    \t      horizontal tab
    \v      vertical tab
    \\      backslash
    \"      double quote
    \'      single quote
    \[      left square bracket
    \]      right square bracket

Note that \[ and \] are only accepted by older versions of Lua. Lua 5.2 and later reject them as invalid escape sequences, and square brackets need no escaping inside quoted strings.
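A few of these escapes in use are sketched below. The strings are arbitrary examples; the point is that each escape sequence is written as two characters in the source but stored as a single byte.

```lua
local tabbed = "name\tvalue"
print(tabbed)  -- the two words separated by a tab character
print(#tabbed) -- 10: \t is stored as a single byte

local path = "C:\\temp"
print(path)    -- C:\temp
print(#path)   -- 7

-- Escaping a quote lets it appear inside a string that uses the same quote.
local quote = "she said \"hi\""
print(quote == 'she said "hi"') -- true
```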
Lua
Lua is a lightweight, cross-platform, high-level programming language that has become increasingly popular as an embedded scripting language. Lua is often praised as a clean, simple, easy-to-learn scripting language that is applicable to a wide range of tasks. Today Lua can be found in a wide range of applications, from wezterm to xplr and neovim. Let's get started.
Working with Lua Patterns
In the previous section we looked at format strings, which provide a convenient way to create consistent, well-formatted strings that can contain unique values. In this chapter we look at patterns, which are roughly the opposite: patterns allow us to extract values from well-formatted strings.

A pattern defines a specification that describes the content of a string. When a pattern is compared to a "target" string, if the pattern describes the string we say that the target "matches the pattern". Patterns are generally composed of a sequence of smaller patterns, called "atoms". These atoms are arranged in a specific sequence to create the specification.

As a simple example, let's build a pattern that matches phone numbers, then we will describe the process in more detail in the coming sections. We will assume phone numbers matching the format:

    +1(234)567-8910

The first step is to break this string into a sequence of components:

    +[country code]([area code])[prefix]-[line number]

Next, we need to specify each component in terms of pattern items (Lua patterns resemble regular expressions, but are a simpler, separate mechanism), then combine them into the pattern. One possible final pattern might be:

    +%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d

which is made up of:

- A literal +
- The "character class" %d, which specifies that the characters in those locations must be digits
- Literal ( and ), written as [(] and [)], surrounding a sequence of 3 digits
- Sequences of 3 and 4 digits, separated by a literal -, written as [-] because a bare - after a character class would be read as a repetition specifier

If we apply this pattern to our target phone number, the target string is returned (which is a truthy value indicating that a match occurred). For comparison, we include a few other target strings with slightly different formats, which produce nil (i.e. falsy) values indicating that a match did not occur.
    local pattern = "+%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"
    print(string.match("+1(234)567-8910", pattern)) -- +1(234)567-8910
    print(string.match("+1(23)4567-8910", pattern)) -- nil
    print(string.match("(234)567-8910", pattern))   -- nil

With that quick introduction, let's learn how to create patterns.
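The pattern above makes characters such as ( and - literal by wrapping them in square brackets. Lua patterns also allow a magic character to be escaped with %, and the anchors ^ and $ tie a pattern to the start and end of the target. The sketch below compares the two spellings and shows the effect of anchoring; the variable names and the longer sample sentence are illustrative only.

```lua
local bracketed = "+%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"
local escaped   = "%+%d%(%d%d%d%)%d%d%d%-%d%d%d%d"
local target    = "+1(234)567-8910"

-- Both spellings describe the same set of strings.
print(string.match(target, bracketed) == string.match(target, escaped)) -- true

-- Without anchors, a match may be found inside a longer string.
local inside = string.match("call +1(234)567-8910 now", escaped)
print(inside) -- +1(234)567-8910

-- With ^ and $, the whole target must be a phone number.
local anchored = "^" .. escaped .. "$"
print(string.match("call +1(234)567-8910 now", anchored)) -- nil
print(string.match(target, anchored))                      -- +1(234)567-8910
```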
Captures with Lua Patterns
Now that we have learned a bit more about character classes and repetition, we have the tools we need to build patterns. Going back to our original example, we learned how to match an entire phone number:

    local pattern = "+%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"
    print(string.match("+1(234)567-8910", pattern)) -- +1(234)567-8910
    print(string.match("+1(23)4567-8910", pattern)) -- nil
    print(string.match("(234)567-8910", pattern))   -- nil

This is very useful, but the real power of Lua patterns is the ability to extract information out of matched strings. This is done by using captures, which return specific portions of the matched string. A capture is created when a portion of a pattern is surrounded by parentheses, ( and ). The following example repeats the original example, but we have now created captures to locate and return the country code (cc), area code (ac), prefix (px), and suffix (sx) components of the phone number:

    local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
    --                (cc)   (  ac  )   (  px  )   (   sx   )
    local cc, ac, px, sx = string.match("+1(234)567-8910", pattern)
    print(cc) -- 1
    print(ac) -- 234
    print(px) -- 567
    print(sx) -- 8910

In addition to the string.match library function there is also the string.gmatch function, which performs a similar task but returns an iterator over matches. Using this function is similar, but because it returns an iterator we have to access the results in a loop:

    local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
    --                (cc)   (  ac  )   (  px  )   (   sx   )
    for cc, ac, px, sx in string.gmatch("+1(234)567-8910", pattern) do
      print(cc) -- 1
      print(ac) -- 234
      print(px) -- 567
      print(sx) -- 8910
    end

It is a bit silly to iterate over a single match, but if we had a long string containing multiple phone numbers this would allow us to process each phone number in the string, one at a time.
A more interesting example for our single-phone-number case could be the following: since we know that phone numbers are sequences of digits, we could capture and iterate over each group of digits:

    for d in string.gmatch("+1(234)567-8910", "%d+") do
      print(d)
    end

which produces the result:

    1
    234
    567
    8910
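The multiple-phone-number case mentioned above can be sketched as follows. The `text` value is made up for illustration, and collecting the captures into a `numbers` table is just one possible way to use the iterator.

```lua
-- Hypothetical text containing two phone numbers.
local text = "home: +1(234)567-8910, work: +4(555)123-4567"
local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"

-- Each iteration yields the four captures of the next match.
local numbers = {}
for cc, ac, px, sx in string.gmatch(text, pattern) do
  numbers[#numbers + 1] = { cc = cc, ac = ac, px = px, sx = sx }
end

print(#numbers)      -- 2
print(numbers[1].ac) -- 234
print(numbers[2].sx) -- 4567
```

Captures are always returned as strings, so a call to tonumber would be needed before doing arithmetic on them.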
Repetition in Lua Patterns
The next step to understanding patterns is repetition, which specifies not only the class of a character, but how sequences of those characters should be matched. Repetition is indicated by placing one of the following characters immediately after a character class:

    Specifier  Description                                Greedy
    Default    matches any single character in the class
    *          matches 0 or more characters in the class  Yes
    -          matches 0 or more characters in the class  No
    +          matches 1 or more characters in the class  Yes
    ?          matches 0 or 1 character in the class

Greedy refers to how variable counts treat a matching sequence of characters. "Greedy" matching indicates that the pattern will always match the longest possible sequence, while "non-greedy" matching indicates that the pattern will always match the shortest possible sequence. Let's look at each of these to get a better idea of how this works.

First, let's look at the default behavior, when no repetition is specified:

    local pattern = "%d"
    print(string.match("abc", pattern))    -- nil
    print(string.match("1abc", pattern))   -- 1
    print(string.match("12abc", pattern))  -- 1
    print(string.match("123abc", pattern)) -- 1

When no repetition is specified, a character class matches a single character by default. In the first example no match was made because there are no digits in the target, and therefore the call to match returned nil, while the pattern was able to match a single character of the other targets.

Next, let's look at *, which greedily matches 0 or more characters from the character class:

    local pattern = "%d*"
    print(string.match("abc", pattern))    -- (empty string)
    print(string.match("1abc", pattern))   -- 1
    print(string.match("12abc", pattern))  -- 12
    print(string.match("123abc", pattern)) -- 123

Notice that although there are no digits in the first target, the pattern still matched (it did not return nil).
Instead, it matched an empty string. This is due to the repetition specification, which allows the pattern to match 0 characters. For each of the other targets, this pattern matched all available digits.

Now let's compare that behavior to that of a non-greedy match:

    local pattern = "%d-"
    print(string.match("abc", pattern))    -- (empty string)
    print(string.match("1abc", pattern))   -- (empty string)
    print(string.match("12abc", pattern))  -- (empty string)
    print(string.match("123abc", pattern)) -- (empty string)

What is interesting in this case is that the pattern matched, but returned just an empty string, because the shortest possible match is always 0 characters. It is difficult to see this behavior with such a simple pattern, so let's repeat with a slightly more complicated pattern:

    local pattern = "%d-%a"
    print(string.match("abc", pattern))    -- a
    print(string.match("1abc", pattern))   -- 1a
    print(string.match("12abc", pattern))  -- 12a
    print(string.match("123abc", pattern)) -- 123a

In this case, the pattern matched the fewest digits required to additionally match a single letter. In order to match the letter, the pattern had to match all digits leading up to it.

Another variation on the * specifier is the + specifier, which greedily matches at least 1 character from the character class:

    local pattern = "%d+"
    print(string.match("abc", pattern))    -- nil
    print(string.match("1abc", pattern))   -- 1
    print(string.match("12abc", pattern))  -- 12
    print(string.match("123abc", pattern)) -- 123

The main difference is shown in the top line, where * matches an empty string, while + simply won't match.

Finally, the ? specifier matches 0 or 1 repetitions of the character class:

    local pattern = "%d?"
    print(string.match("abc", pattern))    -- (empty string)
    print(string.match("1abc", pattern))   -- 1
    print(string.match("12abc", pattern))  -- 1
    print(string.match("123abc", pattern)) -- 1

Now that we have the tools we need to understand and create patterns, let's move on to the next topic, captures.
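The difference between greedy and non-greedy matching is easiest to see when the repeated class sits between two delimiters. The sketch below assumes the `.` character class, which matches any character, and uses a made-up target string containing simple tags.

```lua
local target = "<b>bold</b> and <i>italic</i>"

-- Greedy: .* runs to the last > it can reach.
local greedy = string.match(target, "<.*>")
print(greedy) -- <b>bold</b> and <i>italic</i>

-- Non-greedy: .- stops at the first > it can reach.
local lazy = string.match(target, "<.->")
print(lazy) -- <b>

-- Combined with gmatch and a capture, the non-greedy form
-- visits each tag in turn.
local tags = {}
for tag in string.gmatch(target, "<(.-)>") do
  tags[#tags + 1] = tag
end
print(table.concat(tags, ",")) -- b,/b,i,/i
```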
Substitution in Lua Patterns
Lua supports substitution, which allows matched strings to be replaced with some replacement text. For example, going back to our phone number example, suppose we want to redact phone numbers out of a block of text:

    local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
    --                (cc)   (  ac  )   (  px  )   (   sx   )
    local target = "abc +1(234)567-8910 def"
    local repl = "+x(xxx)xxx-xxxx"
    local result, count = string.gsub(target, pattern, repl)
    print(result) -- abc +x(xxx)xxx-xxxx def
    print(count)  -- 1

Suppose we want to generalize the pattern to apply to a wider range of phone number formats. In order to accomplish this goal we need to match varying-length sequences of digits, then replace them with a redacted string of the same length. This can be achieved in several ways, but we will take this as an opportunity to demonstrate using a function for the replacement.

When a function is used for the replacement, the function is passed all captured text as arguments, or, if the pattern defines no captures, the entire matched string is passed to the function as a single argument. The function is expected to return either a string or number that will be used as the replacement text, or a falsy value to indicate that the replacement should not be made (and the original text retained). To demonstrate, let's create a simple pattern that matches any sequence of digits, then return a redaction string of the same length:

Replacement Functions

    local pattern = "%d+"
    local target = "abc +1(234)567-8910 def"
    local repl = function(capture)
      return string.rep("x", string.len(capture))
    end
    local result, count = string.gsub(target, pattern, repl)
    print(result) -- abc +x(xxx)xxx-xxxx def
    print(count)  -- 4

The next feature of string.gsub we want to demonstrate is the ability to use a table for the replacement.
When a table is used for the replacement, the first capture (or the whole match, if the pattern defines no captures) is used as a table key. If the table holds a value for that key, that value replaces the matched text; otherwise the original text is retained. For this example, suppose that the phone company has decided to introduce some new area codes, requiring existing phone numbers to be updated to the new area code. In this case we could define a table that maps old area codes to new area codes, then use patterns to match and update phone numbers in a document:

Replacement Tables

    local pattern = "%d%d%d"
    local target = "abc +1(234)567-8910 def"
    local repl = {}
    repl["234"] = "111"
    repl["345"] = "222"
    repl["456"] = "333"
    local result, count = string.gsub(target, pattern, repl)
    print(result) -- abc +1(111)567-8910 def
    print(count)  -- 3

The count is 3 because the pattern matched three times (234, 567, and 891), even though only the first match had an entry in the table. A convenient feature of this implementation is that phone numbers with area codes that have not changed will not be updated, as long as their area codes are not in the replacement table.

The final feature that we want to touch on is the ability to reference captured text in the replacement string. To demonstrate, let's suppose that we want to change the format of the phone numbers in our document. To do so we will use the same pattern that we created earlier, but now we want to reference the captured text in the replacement string.

Capture References
Captured text is referenced from the replacement string by index. Each capture group is numbered by the position of its left-most parenthesis, starting from 1 at the left side of the pattern and continuing for each capture group.
In our example there are four capture groups, numbered 1 through 4, and we build our replacement string as shown below:

    local pattern = "+(%d)[(](%d%d%d)[)](%d%d%d)[-](%d%d%d%d)"
    --                 1        2          3           4
    local target = "text +1(234)567-8910 text"
    local repl = "%1.%2.%3.%4"
    local result, count = string.gsub(target, pattern, repl)
    print(result) -- text 1.234.567.8910 text
    print(count)  -- 1
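Two further uses of capture references are sketched below: %0, which refers to the whole match rather than a single capture group, and reordering captures in the replacement. The target strings are made up for illustration.

```lua
local target = "call +1(234)567-8910 or +1(345)678-9012"
local pattern = "+%d[(]%d%d%d[)]%d%d%d[-]%d%d%d%d"

-- %0 stands for the entire matched text, so no capture groups are needed.
local marked, count = string.gsub(target, pattern, "<%0>")
print(marked) -- call <+1(234)567-8910> or <+1(345)678-9012>
print(count)  -- 2

-- Captures can be referenced in any order.
local swapped = string.gsub("hello world", "(%a+) (%a+)", "%2 %1")
print(swapped) -- world hello
```

Because % introduces a reference in the replacement string, a literal percent sign in the replacement has to be written as %%.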
Sorting Lists in Lua
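One very common operation on lists is sorting them. Lua's table library contains the table.sort function, which sorts a list in place. By default, values are compared with the < operator; an optional second argument supplies a custom comparison function that returns true when its first argument should come before its second.

    local x = { "b", "e", "a", "d", "c" }

Default Sort

    -- list is sorted in-place
    table.sort(x)
    print(x[1]) -- a
    print(x[2]) -- b
    print(x[3]) -- c
    print(x[4]) -- d
    print(x[5]) -- e

Sorting in Reverse

    -- list is sorted in-place
    table.sort(x, function(a, b) return a > b end)
    print(x[1]) -- e
    print(x[2]) -- d
    print(x[3]) -- c
    print(x[4]) -- b
    print(x[5]) -- a

Random Order

A comparison function that returns random results does not define a consistent ordering, so the outcome is unpredictable and table.sort may raise an "invalid order function for sorting" error. Treat the following as an illustration only; the output shown is one possible result.

    local math = require("math")
    -- list is sorted in-place
    table.sort(x, function(a, b) return math.random() > 0.5 end)
    print(x[1]) -- d
    print(x[2]) -- a
    print(x[3]) -- b
    print(x[4]) -- c
    print(x[5]) -- e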
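If the goal is a random order, a more reliable approach than a random comparison function is to shuffle the list directly. The sketch below uses the Fisher-Yates shuffle, which is a substitute technique rather than anything table.sort provides; the `shuffle` helper is a hypothetical name introduced here.

```lua
-- Hypothetical helper: shuffles a list in place using Fisher-Yates.
local function shuffle(list)
  for i = #list, 2, -1 do
    local j = math.random(i) -- random index in the range [1, i]
    list[i], list[j] = list[j], list[i]
  end
  return list
end

local x = { "b", "e", "a", "d", "c" }
shuffle(x)
print(table.concat(x, " ")) -- order varies from run to run

-- Sorting afterwards restores a deterministic order.
table.sort(x)
print(table.concat(x, " ")) -- a b c d e
```

Each element is swapped at most once per step, so the shuffle never depends on the comparison rules that table.sort requires.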