Alternation in Neovim Patterns

It is often helpful for a pattern to allow any one of several atoms to match at a specific location. This concept is called "alternation": the ability to have alternate atoms match at a particular location. Alternatives are specified by separating them with "\|":

Pattern     Matches
a\|b        a or b
a\|b\|c     a, b, or c

where a, b, and c represent any atoms. Alternatives are tried from left to right, and the first one that matches is used.

Let's demonstrate how to use alternation with the following buffer, which contains a list of English language codes:

Initial conditions:
en-au
en-ca
en-ie
en-jm
en-nz
en-za
en-gb
en-us

As a simple example, let's first select either of the literal strings "au" or "ca":

/au\|ca

Result (simple alternation): "au" in en-au and "ca" in en-ca are selected.

Next, let's try an example with nested alternation. Suppose we want to select a literal "au" or any other code that ends with "a" or "m". We start by creating the nested alternation:

\w\%(a\|m\)

then create the outer alternation:

au\|\w\%(a\|m\)

and finally execute the search:

/au\|\w\%(a\|m\)

Result (nested alternation): "au", "ca", "jm", and "za" are selected.
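To experiment with alternation outside the editor, here is a rough translation of the two searches into Python's re module. This is a sketch under the assumption that Python's behavior is close enough for these simple patterns; it is not Vim itself. Python writes alternation as a bare "|" and a non-capturing group as "(?:...)", and the helper first_matches is a hypothetical name introduced only for this illustration.

```python
import re

# The example buffer: one language code per line.
codes = ["en-au", "en-ca", "en-ie", "en-jm", "en-nz", "en-za", "en-gb", "en-us"]

# Vim: au\|ca  ->  Python: au|ca
simple = re.compile(r"au|ca")

# Vim: au\|\w\%(a\|m\)  ->  Python: au|\w(?:a|m)
nested = re.compile(r"au|\w(?:a|m)")


def first_matches(pattern, lines):
    """Return the first match on each line, or None if the line has no match."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(0) if m else None)
    return results


simple_hits = first_matches(simple, codes)
nested_hits = first_matches(nested, codes)

# Alternatives are tried left to right; the first one that matches wins.
leftmost = re.search(r"a|au", "au").group(0)

print(simple_hits)  # ['au', 'ca', None, None, None, None, None, None]
print(nested_hits)  # ['au', 'ca', None, 'jm', None, 'za', None, None]
print(leftmost)     # a
```

The last search shows the left-to-right rule: with "a|au" the first alternative already matches, so the longer "au" is never selected.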

Repetition in Neovim Patterns

Many patterns include atoms that repeat multiple times. For example, to match a string of 6 word characters one might define the pattern:

\w\w\w\w\w\w

which is cumbersome and limited to exactly one length. What if we wanted to match words with 6 or 7 letters? This is solved with quantifiers, which specify how many times an atom repeats in a pattern. Using a quantifier, the previous pattern becomes:

\w\{6}

which specifies that the "\w" atom is repeated 6 times. Likewise, "\w\{6,7}" matches 6 or 7 word characters.

Greedy vs Non-greedy

Greedy matching is the default behavior of regular expressions: the engine tries to match as much text as possible. In contrast, non-greedy matching, also known as lazy matching, tries to match as little text as possible.

Specifier   From   To          Greedy
*           0      unlimited   Yes
\+          1      unlimited   Yes
\?          0      1           Yes
\=          0      1           Yes
\{}         0      unlimited   Yes
\{n,m}      n      m           Yes
\{n}        n      n           (exact count)
\{n,}       n      unlimited   Yes
\{,m}       0      m           Yes
\{-n,m}     n      m           No
\{-n}       n      n           (exact count)
\{-n,}      n      unlimited   No
\{-,m}      0      m           No
\{-}        0      unlimited   No

Note: "\?" is the standard "0 or 1" quantifier, but it cannot be used in backward searches (started with "?"), because there "?" ends the pattern. For this reason, Vim additionally defines "\=" as equivalent to "\?".
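As a quick way to see quantifiers and greediness side by side, the sketch below uses Python's re module as a stand-in for Vim. The mapping in the comments is our own translation, not something the Vim documentation prescribes: Python drops the backslash before "{", and marks a quantifier as lazy with a trailing "?" where Vim puts "-" inside the braces. The sample words and HTML-like line are made up for illustration.

```python
import re

words = ["garden", "kitchen", "sun", "planets"]

# Vim: \w\{6}  ->  Python: \w{6}   (exactly six word characters)
exactly_six = [w for w in words if re.fullmatch(r"\w{6}", w)]

# Vim: \w\{6,7}  ->  Python: \w{6,7}   (six or seven word characters)
six_or_seven = [w for w in words if re.fullmatch(r"\w{6,7}", w)]

line = "<b>bold</b> and <i>italic</i>"

# Greedy: Vim <.*>  ->  Python <.*>   (runs to the LAST ">" on the line)
greedy = re.search(r"<.*>", line).group(0)

# Non-greedy: Vim <.\{-}>  ->  Python <.*?>   (stops at the FIRST ">")
lazy = re.search(r"<.*?>", line).group(0)

print(exactly_six)   # ['garden']
print(six_or_seven)  # ['garden', 'kitchen', 'planets']
print(greedy)        # <b>bold</b> and <i>italic</i>
print(lazy)          # <b>
```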

Anchors in Neovim Patterns

By default, patterns match anywhere in a string. In many cases you want a pattern to match only in specific parts of a string. For example, if you want to match words that start with a pattern, you don't want to match the same characters in the middle of a word. This can be achieved using anchors. Anchors don't match any characters; instead, they specify where in a string a match is allowed to begin and/or end. Some common anchors are:

Anchor   Description
^        Match at the start of a line
$        Match at the end of a line
\<       Match at the start of a word
\>       Match at the end of a word
\%^      Match at the start of the file
\%$      Match at the end of the file

Let's demonstrate the behavior of some of the more common anchors using the buffer below:

Initial conditions:
John Evans Kim
Aaron Mia Johnson
Johnny Matthew Martin
Nancy Kathleen John
Heather Vanessa Perez-Johnson
Scott Johnathon Ford
Johnnie Dillon Odom
Jessica Jennifer LittleJohn

First, let's perform an un-anchored search to show the default behavior:

/John

This finds 8 instances of the string "John", which appear in all different parts of the respective lines. Next, let's add the "^" anchor and search again:

/^John

This matches only the 3 locations where "John" appears at the start of a line. Compare that to a search with the "$" anchor:

/John$

Here we match at the only two locations where a line ends with "John". Now let's try anchoring to word boundaries. First, let's anchor to the start of a word with "\<":

/\<John

This matches every "John" that begins a word, 7 in all, including the one in "Perez-Johnson"; only "LittleJohn" is skipped, because its "John" sits in the middle of a word. Finally, let's anchor to the end of a word, using "\>":

/John\>

This matches the 3 places where "John" ends a word: "John Evans", "Kathleen John", and "LittleJohn".
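The match counts above can be cross-checked with a short Python sketch. Two assumptions apply: the buffer is laid out three names per line (our reading of the example), and Python has no direct equivalent of "\<" and "\>", so "\b" (a word boundary on either side) stands in for them. That substitution works here because "J" and "n" are word characters, but Vim's word anchors also depend on the 'iskeyword' option, which Python knows nothing about.

```python
import re

# The example buffer, read as three names per line.
buffer = "\n".join([
    "John Evans Kim",
    "Aaron Mia Johnson",
    "Johnny Matthew Martin",
    "Nancy Kathleen John",
    "Heather Vanessa Perez-Johnson",
    "Scott Johnathon Ford",
    "Johnnie Dillon Odom",
    "Jessica Jennifer LittleJohn",
])


def count(pattern):
    """Count matches; re.MULTILINE makes ^ and $ work per line, as in Vim."""
    return len(re.findall(pattern, buffer, flags=re.MULTILINE))


# Keys are the Vim searches; values count the equivalent Python pattern.
counts = {
    "John": count(r"John"),        # un-anchored
    "^John": count(r"^John"),      # start of line
    "John$": count(r"John$"),      # end of line
    r"\<John": count(r"\bJohn"),   # start of word (approximation via \b)
    r"John\>": count(r"John\b"),   # end of word (approximation via \b)
}
print(counts)
```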

Character Ranges in Neovim Patterns

Patterns define the characters that may match at a particular location by specifying a "character range". Character ranges are written as a "bracket expression", which lists the allowed characters. For example, the following specifies that the character must be one of "a", "b", "c", "d", "e", or "f":

[abcdef]

As a simple demonstration, a pattern that matches either "grey" or "gray" would be:

gr[ae]y

Contiguous ranges

When a range includes a contiguous sequence of characters, the "-" operator can be used to express all characters in that sequence. For example, [abcdef] can also be expressed in the more compact form:

[a-f]

To include a literal "-" in a character range, place the "-" at either the beginning or the end of the range. For example, to define a range consisting of lower-case letters and a hyphen:

[a-z-]

Inverted Ranges

It is often convenient to specify which characters do not match, by specifying an "inverted range". A range is inverted when its first character is a "^". For example, to match any single character other than "a", "b", or "c", one could use the character range:

[^a-c]

Character escapes

Character ranges can include some non-printable characters:

Shortcut   Character
\t         Tab
\n         New line

Pre-defined Ranges

Character ranges are a very convenient way to represent groups of characters. Some ranges are so common that there is an even shorter notation for them:

Shortcut   Range            Inverted   Inverted range
\l         [a-z]            \L         [^a-z]
\u         [A-Z]            \U         [^A-Z]
\a         [a-zA-Z]         \A         [^a-zA-Z]
\h         [a-zA-Z_]        \H         [^a-zA-Z_]
\w         [a-zA-Z0-9_]     \W         [^a-zA-Z0-9_]
\d         [0-9]            \D         [^0-9]
\x         [0-9a-fA-F]      \X         [^0-9a-fA-F]
\o         [0-7]            \O         [^0-7]
\s         [ \t]            \S         [^ \t]

Meta-characters

Shortcut   Match
.          Any character except a new line
\_.        Any character, including a new line
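Bracket expressions look almost identical in Vim and Python, so the examples above can be tried in Python's re module with little translation. The sketch below is illustrative only; the sample strings are made up. Two differences are worth flagging as assumptions of the translation: Python's "\d" also matches non-ASCII digits (so the range is spelled out as [0-9] to keep the table's meaning), and Vim's "\_." corresponds roughly to Python's "." with the re.DOTALL flag.

```python
import re

# Bracket expressions are written the same way in Vim and Python.
spellings = [w for w in ["grey", "gray", "groy"] if re.fullmatch(r"gr[ae]y", w)]

# A "-" at the end of the range is literal: lower-case letters plus hyphen.
is_slug = re.fullmatch(r"[a-z-]+", "well-known") is not None

# Inverted range: any single character except a, b, or c.
not_abc = re.findall(r"[^a-c]", "abcde")

# Vim's \d is [0-9]; spelling the range out keeps it ASCII-only in Python too.
digits = re.findall(r"[0-9]+", "tel 555-0199")

# "." stops at a line break; Vim's \_. corresponds to "." with re.DOTALL.
no_newline = re.findall(r"a.b", "a\nb")
with_newline = re.findall(r"a.b", "a\nb", flags=re.DOTALL)

print(spellings)     # ['grey', 'gray']
print(is_slug)       # True
print(not_abc)       # ['d', 'e']
print(digits)        # ['555', '0199']
print(no_newline)    # []
print(with_newline)  # ['a\nb']
```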

Neovim Pattern Basics

We saw in the searching chapter that we can search for literal strings in our text. In this chapter we expand on that by introducing patterns. A pattern is a specification that describes a set of strings. When a pattern is compared to a "target" string and the pattern describes that string, we say that the target string "matches the pattern".

Patterns are generally composed of a sequence of smaller patterns, called "atoms", arranged in a specific order to create the specification. As a simple example, let's build a pattern that matches phone numbers of the format:

+1(234)567-8910

The first step is to break this string into a sequence of components:

+[country code]([area code])[prefix]-[line number]

Next, we express each component as pattern atoms, then combine them into the pattern. One possible final pattern is:

+\d(\d\{3\})\d\{3\}-\d\{4\}

which is made up of: the character range "\d", which specifies that the characters in those locations must be digits; quantifiers such as "\{3\}", which specify how many digits appear in each location; and the literal characters "+", "(", ")", and "-", which have no special meaning in Vim's default (magic) mode.
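One way to sanity-check the phone-number pattern is to translate it into Python and test a few strings. This is a hedged translation, not a Vim feature: in Python "+", "(" and ")" are special and must be escaped, while the quantifier "{3}" needs no backslashes, which is roughly the opposite of Vim's magic mode. The helper is_phone is a hypothetical name for this example.

```python
import re

# Vim (magic mode): +\d(\d\{3\})\d\{3\}-\d\{4\}
# Python: "+", "(" and ")" must be escaped; "{3}" is written without backslashes.
PHONE = re.compile(r"\+\d\(\d{3}\)\d{3}-\d{4}")


def is_phone(s: str) -> bool:
    """True if the whole string has the +1(234)567-8910 shape."""
    return PHONE.fullmatch(s) is not None


print(is_phone("+1(234)567-8910"))  # True
print(is_phone("1(234)567-8910"))   # False: missing "+"
print(is_phone("+1(23)567-8910"))   # False: area code too short
```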

Groups in Neovim Patterns

When building patterns it is often helpful to group sub-patterns together, for example to use the group in alternation or to apply a quantifier to it. A group is created by wrapping the sub-pattern in "\%(" and "\)". Technically, this creates a non-capturing group, which is slightly different from a capturing group, discussed in the next section. Everything in this section applies to both capturing and non-capturing groups, but this section uses the non-capturing notation because it is more efficient when capturing is not required.

Notation        Group type
\%( ... \)      Non-capturing
\( ... \)       Capturing

As a simple example, suppose we want to construct a pattern that matches "abcabcabc". Following the discussion in the quantifiers section, we might write:

abc\{3\}

Let's test this pattern using the following buffer:

Initial conditions:
abc
abcabcabc
abcc
abccc

First we execute the search:

/abc\{3\}

The match lands on the fourth line, "abccc", not on "abcabcabc". What happened? The quantifier is applied to the atom immediately to its left, which in this case is "c", so the pattern matches "ab" followed by three "c"s. To fix this, we need to create a group around "abc" so that the entire string is treated as a single atom:

\%(abc\)

then apply the quantifier to the group:

\%(abc\)\{3\}

Searching again selects "abcabcabc" on the second line, which confirms that the updated pattern achieves our goal.
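The same precedence rule holds in Python's re module, which makes it a convenient place to reproduce the experiment. In this sketch (a translation, not Vim), Python's "(?:...)" plays the role of Vim's "\%(...\)", and "{3}" replaces "\{3\}".

```python
import re

lines = ["abc", "abcabcabc", "abcc", "abccc"]

# Vim abc\{3\} -> Python abc{3}: the quantifier binds only to "c".
without_group = [s for s in lines if re.search(r"abc{3}", s)]

# Vim \%(abc\)\{3\} -> Python (?:abc){3}: the quantifier repeats the whole group.
with_group = [s for s in lines if re.search(r"(?:abc){3}", s)]

print(without_group)  # ['abccc']
print(with_group)     # ['abcabcabc']
```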

Match Boundaries

By default, when a pattern matches a string the entire match is returned. For example, suppose we have a buffer that contains a number of dates:

Initial conditions:
2022-01-14
2022-07-09
2022-10-23
2022-12-20
2023-01-26
2023-03-17
2023-08-06
2023-11-05

Suppose we wanted to edit all dates that occur in Q4 (months 10, 11, and 12) of 2022. Our first thought might be to define capturing groups and then use back-references to search and replace. That is a great solution when the edits are well-defined, but in this hypothetical scenario the dates have to be edited by hand. Our next thought might be to search for all dates that meet these constraints with a pattern such as:

2022-1\d-\d\d

Result: the search matches 2022-10-23 and 2022-12-20.

We could then jump to a result, move the cursor to the month and day, make the necessary edits, and repeat the search for each match. This works, but it can be improved. Our pattern matches the correct lines, but it returns the entire date when we only want the month and day. This is a good use case for match boundaries. At a high level, match boundaries break the searching process into two steps: first, the entire pattern defines which text is matched; then the match boundaries define which portion of the matched text is returned. Match boundaries are defined using one or both of the following markers:

Shortcut   Definition
\zs        Defines where the matched text starts
\ze        Defines where the matched text ends

When \zs is present, the match starts with the character immediately to its right. Likewise, when \ze is present, the match ends with the character immediately to its left.

Back to our example: our current pattern does a good job of matching the correct lines in the buffer; we just need to narrow the match to the month and day. This can be done by defining the starting boundary:

2022-\zs1\d-\d\d

and executing the search again:

Result: the matches now begin at the month ("10-23" and "12-20"), so the cursor lands on column 6.

Now, when we jump from match to match to make our edits, the cursor is already in the correct location. Pretty cool. To extend the example, let's add an ending boundary to isolate only the month portion of the date:

2022-\zs1\d\ze-\d\d

and execute again:

Result: only the months "10" and "12" are matched.

Match boundaries provide an extra degree of freedom that can be useful in cases such as this.
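Python's re module has no "\zs" or "\ze", but the two-step idea (the whole pattern decides whether a line matches, a smaller part decides what is reported) can be imitated. The sketch below shows two substitutes we chose for illustration, not Vim behavior: reporting the span of a capturing group, and using a look-behind plus look-ahead, which do not consume text. Positions are 0-based, so Python's (2, 5) corresponds to Vim's line 3, column 6. The helper month_spans is a hypothetical name.

```python
import re

dates = ["2022-01-14", "2022-07-09", "2022-10-23", "2022-12-20",
         "2023-01-26", "2023-03-17", "2023-08-06", "2023-11-05"]

# Vim: 2022-\zs1\d\ze-\d\d
# Substitute 1: the whole pattern selects the line, group 1 marks the month.
pattern = re.compile(r"2022-(1\d)-\d\d")


def month_spans(lines):
    """Return (line_index, start, end) of the month in each Q4-2022 date (0-based)."""
    spans = []
    for i, line in enumerate(lines):
        m = pattern.search(line)
        if m:
            spans.append((i, m.start(1), m.end(1)))
    return spans


spans = month_spans(dates)

# Substitute 2: look-behind/look-ahead check the context without consuming it.
lookaround_span = re.search(r"(?<=2022-)1\d(?=-\d\d)", dates[2]).span()

print(spans)            # [(2, 5, 7), (3, 5, 7)]
print(lookaround_span)  # (5, 7)
```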

Back-References in Neovim Patterns

In the previous section we introduced non-capturing groups as a way to group sub-patterns into atoms so that they can be used in alternation and with quantifiers. In this section we introduce capturing groups, which have all of the same behaviors and uses as non-capturing groups but additionally keep a reference to (i.e. capture) the matched text so that it can be used later.

As an example, suppose we need to write a pattern that matches both single- and double-quoted text in a document. A reasonable pattern to achieve this might be:

["'][a-z -]\+["']

We will use the following buffer to test it:

Initial conditions:
line of text
line with "double-quoted" text
line of text
line with 'single-quoted' text
line of text
some "double-quoted text with 'single-quoted' text inside"

Let's run it:

/["'][a-z -]\+["']

Hmm, that's not what we wanted. So what happened? To accommodate both single and double quotes, we included both of them in the pattern. But since either type of quotation mark is acceptable at each end, Vim happily matched inconsistent quotation marks: on the last line, the match runs from the opening double quote to the first single quote. What we need is for the closing quotation mark to be the same type as the opening one. This can be achieved using back-references, so let's update our pattern. To use back-references, first define one or more capturing groups in the pattern, then add the back-reference:

\(["']\)[a-z -]\+\1

In this example the back-reference "\1" refers to the capturing group; since there is only one capturing group, \1 refers to it. Patterns can have multiple capturing groups (referenced as \1 through \9), and those groups can also be nested. To identify which number to use for a back-reference, start from the left side of the pattern and count the opening "\(" of each capturing group until you reach the group in question, then use that number as the reference. Non-capturing "\%(" groups are not counted. In addition, in the replacement text of a substitution, "\0" refers to the entire matched text.

Let's now try our updated pattern:

Result: "double-quoted" on line 2 and 'single-quoted' on line 4 still match, but on the last line only 'single-quoted' matches.

That is a bit closer to the result we were looking for, but we should also allow nested quotes. We can add this by making one more change, so that quotation marks are permitted inside the quoted text:

Result (alternate pattern with back-reference): on the last line the match now extends from the opening double quote to the closing double quote.

So how does the back-reference work? When a capturing group matches, it retains the text it matched. When the pattern later reaches a back-reference, the back-reference matches only that same text. Therefore, if a double quote matched in the capturing group, the back-reference is looking for a double quote. This explains the difference between our second and third patterns. With the second pattern, Vim initially matched the double quote on the last line, but because the pattern did not allow quotation marks inside the matched text, no closing double quote could be reached and the attempt starting at the double quote failed. Vim then started a new match at the single quote, which ended at the closing single quote. With the third pattern, the first match (starting with the double quote) extended all the way to the closing double quote, which was our final result. This example showed how back-references can be used when searching, but back-references are also very useful when searching and replacing, in order to use the matched text in the replacement. 
The replacing section contains information about this.
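The difference between the first two patterns can be reproduced on the last line of the buffer with Python's re module, which supports back-references with the same "\1" notation. This is a translation for experimentation, not Vim itself: Python writes the capturing group as "(...)" and the quantifier as "+" instead of Vim's "\(...\)" and "\+". The third pattern is not reproduced here.

```python
import re

buffer = [
    "line of text",
    'line with "double-quoted" text',
    "line of text",
    "line with 'single-quoted' text",
    "line of text",
    "some \"double-quoted text with 'single-quoted' text inside\"",
]
last = buffer[-1]

# Vim: ["'][a-z -]\+["']   (either quote may close the match)
loose = re.compile(r"""["'][a-z -]+["']""")

# Vim: \(["']\)[a-z -]\+\1   (the closing quote must repeat group 1)
strict = re.compile(r"""(["'])[a-z -]+\1""")

loose_hits = [m.group(0) for m in loose.finditer(last)]
strict_hits = [m.group(0) for m in strict.finditer(last)]

print(loose_hits)   # ['"double-quoted text with \'', '\' text inside"']
print(strict_hits)  # ["'single-quoted'"]
```

The loose pattern pairs a double quote with a single quote, while the back-reference version skips the unmatched double quote and selects only 'single-quoted', mirroring the Vim results described above.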

Neovim

Neovim originated as a fork of Vim, and while it continues to maintain backward compatibility with Vim, the two projects have slightly different goals, which has led them to evolve down slightly different paths. Vim continues to be a great project with a great community, but Neovim has added several features that, in our opinion, significantly upgrade the user experience:

Lua

First and foremost, one of the major advantages of Neovim over Vim is the inclusion of Lua as a first-class alternative to Vimscript for plugins and configuration. Lua is easy to learn, read, and write, executes quickly, and lets Neovim benefit from the work being done by Lua's extensive community.

Tree-sitter

Tree-sitter is a fast document parser that maintains a syntax tree for each document as it is edited, which replaces slow and often-inaccurate regular expressions in a variety of features. Prior to Tree-sitter, syntax highlighting, indentation, and folding were implemented using regular expressions, which were often slow and inaccurate and lacked features such as the ability to handle nested code blocks. By using Tree-sitter's syntax tree, Neovim gains contextual information about the document that can provide accurate and consistent syntax highlighting, indentation, and folds; improved navigation between classes, functions, parameters, and conditional statements; and some useful extensions to text objects. You can learn more about Neovim's Tree-sitter integration in the nvim-treesitter and nvim-treesitter-textobjects repos.

Language Server Protocol (LSP)

Neovim includes a built-in Language Server Protocol client, which provides a wide range of functionality. Whereas Tree-sitter improves the experience of working with individual documents, LSP provides similar benefits across whole projects: improved code completion, snippets, formatting, jump to definition, refactoring, etc. Learn more about setting up Neovim's LSP in the nvim-lspconfig repo.

More Information

This is just a brief summary of the key improvements that Neovim offers over Vim. If you are transitioning from Vim to Neovim, you can find a complete list of the differences in Neovim's documentation under ":help vim-differences".

Resources: Neovim.io, GitHub home, GitHub releases, Documentation

Look-Arounds in Neovim Patterns

Look-arounds are used to check what comes before or after a match without consuming or capturing it. ("Without consuming" means that the text matched by a look-around assertion does not become part of the match, and therefore is not part of the text to be replaced.) Look-around patterns have two components: the main pattern to match, and a "test" pattern that is used to qualify the main pattern's match. The test pattern does not consume or capture any text; it is purely a logical test.

One can think of look-arounds as "conditional patterns": they create patterns that only match if another condition is met. Look-arounds come in two types:

Positive   Match the main pattern if the test pattern also matches
Negative   Match the main pattern if the test pattern does not match

Each look-around also applies in one of two directions:

Ahead    The test pattern is applied to the right of the main pattern
Behind   The test pattern is applied to the left of the main pattern

We will take a look at each of these in the following sections.
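Before the detailed sections, here is a compact sketch of all four combinations using Python's re module. The Vim notation in the comments ("\@=", "\@!", "\@<=", "\@<!" applied to a group) is our addition for orientation and is not shown elsewhere in this section. The sample strings are made up. One known difference: Python requires look-behind patterns to have a fixed width, a restriction Vim does not impose.

```python
import re

amounts = "100USD 200EUR 300USD"
labels = "USD100 EUR200 USD300"

# Positive look-ahead  (Vim: \d\{3}\(USD\)\@=): three digits followed by "USD"
pos_ahead = re.findall(r"\d{3}(?=USD)", amounts)
# Negative look-ahead  (Vim: \d\{3}\(USD\)\@!): three digits NOT followed by "USD"
neg_ahead = re.findall(r"\d{3}(?!USD)", amounts)
# Positive look-behind (Vim: \(USD\)\@<=\d\{3}): three digits preceded by "USD"
pos_behind = re.findall(r"(?<=USD)\d{3}", labels)
# Negative look-behind (Vim: \(USD\)\@<!\d\{3}): three digits NOT preceded by "USD"
neg_behind = re.findall(r"(?<!USD)\d{3}", labels)

print(pos_ahead)   # ['100', '300']
print(neg_ahead)   # ['200']
print(pos_behind)  # ['100', '300']
print(neg_behind)  # ['200']
```

In every case the "USD" or "EUR" text is only tested, never included in the returned match, which is what "without consuming" means in practice.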
