


Groups
When building patterns it is often helpful to group sub-patterns together, for example to use the group in alternation or to apply a quantifier to it. A group is created by wrapping the sub-pattern in \%( ... \). Technically, this creates a non-capturing group, which is slightly different from the capturing group discussed in the next section. Everything in this section applies to both capturing and non-capturing groups. This section uses the non-capturing notation because it is more efficient when capturing is not required.

Notation       Group type
\%( ... \)     Non-capturing
\( ... \)      Capturing

As a simple example, suppose we want to construct a pattern that matches "abcabcabc". Following the discussion in the quantifiers section, we might write:

    abc\{3\}

Let's test this pattern using the following buffer:

    abc
    abcabcabc
    abcc
    abccc

Executing the search /abc\{3\} moves the cursor to line 4, and the matched text is "abccc" rather than "abcabcabc". What happened? A quantifier applies to the atom immediately to its left, which in this case is "c". To fix this, we need to create a group around "abc" so that the entire string is treated as a single atom:

    \%(abc\)

Then we apply the quantifier to the group:

    \%(abc\)\{3\}

Searching again and selecting the matched text now highlights "abcabcabc" on line 2. This confirms that the updated pattern achieves our goal.
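The same grouping pitfall exists outside Vim. Below is a minimal sketch in Python's re module that cross-checks the example, using the four buffer lines above. Python writes a non-capturing group as (?:...) instead of \%(...\) and a quantifier as {3} instead of \{3\}, but the quantifier binds in the same way.

```python
import re

# The four lines from the example buffer.
lines = ["abc", "abcabcabc", "abcc", "abccc"]

# Vim abc\{3\}       -> Python abc{3}: the quantifier applies only to "c".
ungrouped = re.compile(r"abc{3}")
# Vim \%(abc\)\{3\}  -> Python (?:abc){3}: the quantifier applies to the group.
grouped = re.compile(r"(?:abc){3}")

ungrouped_hits = [line for line in lines if ungrouped.search(line)]
grouped_hits = [line for line in lines if grouped.search(line)]

print(ungrouped_hits)  # ['abccc']
print(grouped_hits)    # ['abcabcabc']

# A capturing group (abc) records a submatch; the non-capturing group does not.
capturing = re.compile(r"(abc){3}")
print(capturing.groups, grouped.groups)  # 1 0
```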
Look-Arounds
Look-arounds check what comes before or after a match, without consuming or capturing it. ("Without consuming" means that the text matched by a look-around assertion does not become part of the match, so it is not part of the string to be replaced.)

A look-around pattern has two components: the main pattern to match, and a "test" pattern that qualifies the main pattern's match. The test pattern does not consume or capture any text; it is purely a logical test. One can think of look-arounds as "conditional patterns": they create patterns that only match if another condition is met.

Look-arounds come in two types:

Positive    Match the main pattern if the test pattern also matches
Negative    Match the main pattern if the test pattern does not match

Each look-around also applies in one of two directions:

Ahead       The test pattern is applied to the right of the main pattern
Behind      The test pattern is applied to the left of the main pattern

We will take a look at each of these in the following sections.
Anchors
By default, patterns match anywhere in a string. In many cases, however, you want patterns to match only in specific parts of a string. For example, if you want to match words that start with a given pattern, you don't want to match the same characters in the middle of a word. This can be achieved using anchors. Anchors don't match any characters. Instead, they specify where in a string a match is allowed to begin and/or end. Some common anchors are:

Anchor    Description
^         Match at the start of a line
$         Match at the end of a line
\<        Match at the start of a word
\>        Match at the end of a word
\%^       Match at the start of the file
\%$       Match at the end of the file

Let's demonstrate the behavior of some of the more common anchors using a buffer of names. In this buffer, "John" appears at the start of some lines, at the end of others, and inside longer words such as "Johnson", "Johnny", "Perez-Johnson", and "LittleJohn".

First, let's perform an unanchored search, /John, to show the default behavior. We find 8 instances of the string "John", which appear in all different parts of their lines.

Next, let's add the "^" anchor and search again with /^John. This matches only the 3 locations where "John" appears at the start of a line.

Compare that to a search with the "$" anchor, /John$. Here we match only the two locations where a line ends with "John".

Now let's try anchoring to word boundaries. First, let's anchor to the start of a word with "\<", using /\<John. This matches every "John" that begins a word. That includes the one in "Perez-Johnson", because "-" is not a word character under Vim's default 'iskeyword' setting. It does not include the one in "LittleJohn".

Finally, let's anchor to the end of a word with "\>", using /John\>. This matches only where "John" ends a word, such as "LittleJohn" or a standalone "John", and skips words like "Johnson" and "Johnny".
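To make the anchor semantics concrete, here is a rough Python sketch that counts matches for each search. It rests on two assumptions.

- The line layout of the names buffer is a hypothetical reconstruction. The names come from the example, but the line breaks are guessed to fit the counts described above.
- Python's re has no \< or \>, so \b (a word boundary by Python's definition, not Vim's 'iskeyword') stands in for them.

re.MULTILINE makes ^ and $ match at every line, as they do in a Vim buffer. Vim's \%^ and \%$ correspond roughly to Python's \A and \Z.

```python
import re

# HYPOTHETICAL layout of the names buffer: the names are from the example,
# the line breaks are assumed.
buffer = """John Evans Kim
Aaron Mia Johnson
Johnny Matthew Martin
Nancy Kathleen John
Heather Vanessa Perez-Johnson
Scott Johnathon Ford
Johnnie Dillon Odom
Jessica Jennifer LittleJohn"""

# Vim search -> approximate Python pattern.
patterns = {
    "John": r"John",
    "^John": r"^John",
    "John$": r"John$",
    r"\<John": r"\bJohn",  # \b before a word character ~ start of a word
    r"John\>": r"John\b",  # \b after a word character ~ end of a word
}

# MULTILINE lets ^ and $ match at every line boundary, like Vim.
counts = {
    vim: len(re.findall(py, buffer, flags=re.MULTILINE))
    for vim, py in patterns.items()
}

for vim, n in counts.items():
    print(f"{vim:8} {n}")
# John 8, ^John 3, John$ 2, \<John 7, John\> 3
```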
Positive Look-behind
Positive look-behind patterns are look-around patterns that match only if the text immediately to the left of the match also matches the test pattern. As with all look-around patterns, the "test pattern" is not part of the match.

To demonstrate, let's re-implement our solution from the match-boundaries section using a pattern containing a positive look-behind assertion. In this example we have the following buffer, which contains a variety of dates:

    2022-01-14
    2022-07-09
    2022-10-23
    2022-12-20
    2023-01-26
    2023-03-17
    2023-08-06
    2023-11-05

Our task is to match the month and day on all lines for which the year is 2022 and the month is in Q4 (months 10, 11, and 12). The first step is to create the pattern that matches our desired content, the month and day:

    \(1\d-\d\d\)

The next step is to define the test pattern. We only want to match lines for which the year is 2022, so we can use the test pattern:

    \(2022-\)

Finally, to mark the test pattern as a positive look-behind assertion we append \@<= to it, which gives us the final pattern:

    \(2022-\)\@<=\(1\d-\d\d\)

When we execute this search we get the same result as before. "10-23" and "12-20" are matched, while the "2022-" prefix is not part of the match.
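As a cross-check, here is a minimal sketch of the same positive look-behind in Python's re module, using the dates from the buffer above. Python writes the assertion as (?<=...) rather than appending \@<= to a group. The span check shows that the tested "2022-" is not consumed.

```python
import re

dates = [
    "2022-01-14", "2022-07-09", "2022-10-23", "2022-12-20",
    "2023-01-26", "2023-03-17", "2023-08-06", "2023-11-05",
]

# Vim:    \(2022-\)\@<=\(1\d-\d\d\)
# Python: (?<=2022-)1\d-\d\d
pattern = re.compile(r"(?<=2022-)1\d-\d\d")

matches = [m.group(0) for d in dates if (m := pattern.search(d))]
print(matches)  # ['10-23', '12-20']

# "2022-" is tested but not consumed: the match starts at index 5.
print(pattern.search("2022-10-23").span())  # (5, 10)
```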
Positive Look-ahead
Positive look-ahead patterns are look-around patterns that match only if the text immediately to the right of the match also matches the test pattern. As with all look-around patterns, the "test pattern" is not part of the match.

This is best explained with an example. Using our buffer containing a variety of dates:

    2022-01-14
    2022-07-09
    2022-10-23
    2022-12-20
    2023-01-26
    2023-03-17
    2023-08-06
    2023-11-05

suppose we want to match all years for which the month is in Q4 (months 10, 11, and 12). The first step is to create the pattern that matches our desired content, the year portion of the date:

    \(\d\d\d\d\)

Now that we have defined the main pattern, the next step is to define the test pattern. We only want to match the year when the month starts with a "1", so we can use the test pattern:

    \(-1\)

Finally, to mark the test as a positive look-ahead assertion we append \@= to it, giving the final pattern:

    \(\d\d\d\d\)\(-1\)\@=

When we execute this pattern, the years of 2022-10-23, 2022-12-20, and 2023-11-05 are highlighted. Note that although the main pattern matches any year, only the years whose month also matches the test pattern are highlighted.
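The same look-ahead can be sketched in Python's re module as a cross-check. Python writes the assertion as (?=...) rather than \@=, and \d{4} stands in for Vim's \d\d\d\d. The result maps each matching date to the highlighted year.

```python
import re

dates = [
    "2022-01-14", "2022-07-09", "2022-10-23", "2022-12-20",
    "2023-01-26", "2023-03-17", "2023-08-06", "2023-11-05",
]

# Vim:    \(\d\d\d\d\)\(-1\)\@=
# Python: \d{4}(?=-1)
pattern = re.compile(r"\d{4}(?=-1)")

# Map each matching date to the year that the pattern highlights.
hits = {d: m.group(0) for d in dates if (m := pattern.search(d))}
print(hits)  # {'2022-10-23': '2022', '2022-12-20': '2022', '2023-11-05': '2023'}
```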
Negative Look-behind
Negative look-behind patterns are look-around patterns that match only if the text immediately to the left of the match does not match the test pattern. As with all look-around patterns, the "test pattern" is not part of the match.

This is best explained with an example, again using our buffer containing a variety of dates:

    2022-01-14
    2022-07-09
    2022-10-23
    2022-12-20
    2023-01-26
    2023-03-17
    2023-08-06
    2023-11-05

Suppose we want to match all months in Q4 (months 10, 11, and 12) that are not in 2023. The first step is to create the pattern that matches our desired content, the month portion of the date:

    \(1\d\)

Now that we have defined the main pattern, the next step is to define the test pattern. We only want to match months for which the year is not 2023, so we can use the test pattern:

    \(2023-\)

Finally, to mark the test as a negative look-behind assertion we append \@<! to it, giving the final pattern:

    \(2023-\)\@<!\(1\d\)

When we execute this pattern, it matches the months "10" and "12" as intended. It also matches the days "14" (in 2022-01-14) and "17" (in 2023-03-17). Er... not quite what we were hoping for.

This result highlights one of the complications of negative look-arounds: it is very easy to create patterns that match more than intended. Our test pattern works fine as long as only months match the main pattern, but days that start with a "1" also match it. When the main pattern matches a day, the text to its left is the month and its dashes (for example "-03-") rather than "2023-". As a result, the test never rejects it.

We can fix this by adding an alternative to the test pattern that matches "-MM-". The test then also rejects any match that falls on a day:

    \(\(2023-\)\|\(-\d\d-\)\)

With the modified test, the full pattern becomes:

    \(\(2023-\)\|\(-\d\d-\)\)\@<!\(1\d\)

This gives the result we were looking for: only the months "10" and "12" in 2022 are matched.
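Here is a rough Python sketch of both attempts, useful for checking the reasoning above. One adaptation is needed. Python's re only accepts fixed-width look-behinds, and the two alternatives in the fixed test have different widths (5 and 4 characters). The fixed version therefore uses two consecutive negative look-behinds, which is equivalent because "neither A nor B precedes" means "not (A or B) precedes".

```python
import re

dates = [
    "2022-01-14", "2022-07-09", "2022-10-23", "2022-12-20",
    "2023-01-26", "2023-03-17", "2023-08-06", "2023-11-05",
]

# First attempt. Vim: \(2023-\)\@<!\(1\d\)   Python: (?<!2023-)1\d
naive = re.compile(r"(?<!2023-)1\d")

# Fixed test. Vim: \(\(2023-\)\|\(-\d\d-\)\)\@<!\(1\d\)
# The alternation is split into two fixed-width negative look-behinds.
fixed = re.compile(r"(?<!2023-)(?<!-\d\d-)1\d")

def all_matches(pattern):
    """Every match of pattern on every date, in buffer order."""
    return [m.group(0) for d in dates for m in pattern.finditer(d)]

print(all_matches(naive))  # ['14', '10', '12', '17']
print(all_matches(fixed))  # ['10', '12']
```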
Negative Look-ahead
Negative look-ahead patterns are look-around patterns that match only if the text immediately to the right of the match does not match the test pattern. As with all look-around patterns, the "test pattern" is not part of the match.

This is best explained with an example, again using our buffer containing a variety of dates:

    2022-01-14
    2022-07-09
    2022-10-23
    2022-12-20
    2023-01-26
    2023-03-17
    2023-08-06
    2023-11-05

Suppose we want to match the year 2022 for each date that is not in Q4 (months 10, 11, and 12). The first step is to create the pattern that matches our desired content, which is simply:

    \(2022\)

Now that we have defined the main pattern, the next step is to define the test pattern. We want to reject the year when the month starts with a "1", so the test pattern describes exactly that case:

    \(-1\)

Finally, to mark the test as a negative look-ahead assertion we append \@! to it, giving the final pattern:

    \(2022\)\(-1\)\@!

When we execute this pattern, the year is highlighted only on 2022-01-14 and 2022-07-09. Although the main pattern matches every occurrence of 2022, the test matches any month that starts with "1", and that cancels the match. As a result, only the lines for which the year is 2022 and the month does not start with a "1" are highlighted.
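For comparison, here is a minimal Python sketch of the same negative look-ahead. Python writes the assertion as (?!...) where Vim appends \@! to a group.

```python
import re

dates = [
    "2022-01-14", "2022-07-09", "2022-10-23", "2022-12-20",
    "2023-01-26", "2023-03-17", "2023-08-06", "2023-11-05",
]

# Vim:    \(2022\)\(-1\)\@!
# Python: 2022(?!-1)
pattern = re.compile(r"2022(?!-1)")

hits = [d for d in dates if pattern.search(d)]
print(hits)  # ['2022-01-14', '2022-07-09']
```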
Basics
We saw in the searching chapter that we can search for literal strings in our text. In this chapter we expand on that by introducing patterns. A pattern is a specification that describes a set of strings. When a pattern is compared to a "target" string and the pattern describes that string, we say that the target string "matches the pattern".

Patterns are generally composed of a sequence of smaller patterns, called "atoms", arranged in a specific order to create the specification. As a simple example, let's build a pattern that matches phone numbers of the format:

    +1(234)567-8910

The first step is to break this string into a sequence of components:

    +[country code]([area code])[prefix]-[line number]

Next, we specify each component as a regular expression and combine the pieces into a single pattern. One possible final pattern is:

    +\d(\d\{3\})\d\{3\}-\d\{4\}

This pattern is made up of three kinds of pieces:

- the character class "\d", which specifies that the characters in those locations must be digits;
- quantifiers such as "\{3\}", which specify how many digits appear in each location;
- the literal characters "+", "(", ")", and "-".

In Vim's default "magic" mode, "+", "(" and ")" are literal characters, so they need no escaping.
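Here is a minimal Python sketch of the same phone-number pattern, assuming the exact +1(234)567-8910 format above. The main difference from Vim is escaping. In Python's re, "+", "(" and ")" are special and must be escaped to stay literal, while \{3\} becomes {3}. fullmatch() is used so that the whole string must fit the pattern.

```python
import re

# Vim (magic):  +\d(\d\{3\})\d\{3\}-\d\{4\}
# Python:       \+\d\(\d{3}\)\d{3}-\d{4}
phone = re.compile(r"\+\d\(\d{3}\)\d{3}-\d{4}")

ok = phone.fullmatch("+1(234)567-8910")
spaced = phone.fullmatch("+1 (234) 567-8910")  # spaces are not in the pattern
short = phone.fullmatch("+1(23)567-8910")      # area code has only 2 digits

print(bool(ok), bool(spaced), bool(short))  # True False False
```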
Repetition
Many patterns include atoms that repeat multiple times. For example, to match a string of 6 letters one might define the pattern:

    \w\w\w\w\w\w

Strictly, "\w" matches any word character: a letter, digit, or underscore. This pattern is both cumbersome and limited to exactly one length. What if we wanted to match words with 6 or 7 letters? This is resolved using quantifiers, which specify how many times an atom repeats in a pattern. Using quantifiers, we could express the previous pattern as:

    \w\{6}

which specifies that the "\w" atom is repeated 6 times. Matching 6 or 7 letters is then simply:

    \w\{6,7}

Greedy vs Non-greedy

Greedy matching is the default behavior of regular expressions: the regular expression engine tries to match as much text as possible. In contrast, non-greedy matching, also known as lazy matching, tries to match as little text as possible. Vim's quantifiers are:

Quantifier   From   To          Greedy
*            0      unlimited   Yes
\+           1      unlimited   Yes
\?           0      1           Yes
\=           0      1           Yes
\{}          0      unlimited   Yes
\{n,m}       n      m           Yes
\{n}         n      n           n/a (exact count)
\{n,}        n      unlimited   Yes
\{,m}        0      m           Yes
\{-n,m}      n      m           No
\{-n}        n      n           n/a (exact count)
\{-n,}       n      unlimited   No
\{-,m}       0      m           No
\{-}         0      unlimited   No

Note: "\?" is the standard "0 or 1" quantifier, but it cannot be used in Vim reverse searches (started with "?"), because there "?" is the search delimiter. For this reason, Vim additionally defines "\=" as equivalent to "\?".
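The quantifier rows above have close Python equivalents, which makes the greedy/lazy difference easy to try outside Vim. In this rough sketch, {6} and {6,7} stand in for \{6} and \{6,7}, {2,4}? stands in for Vim's lazy \{-2,4}, and .*? stands in for .\{-}. The word list and sample strings are made up for illustration. fullmatch() is used to test whole words, whereas a Vim search would also find 6-character runs inside longer words.

```python
import re

# Vim \w\{6} -> Python \w{6};  Vim \w\{6,7} -> Python \w{6,7}.
words = ["banana", "apricot", "cherries", "fig"]  # illustrative sample
six = [w for w in words if re.fullmatch(r"\w{6}", w)]
six_or_seven = [w for w in words if re.fullmatch(r"\w{6,7}", w)]
print(six)           # ['banana']
print(six_or_seven)  # ['banana', 'apricot']

# Vim a\{2,4} (greedy) vs a\{-2,4} (lazy) -> Python a{2,4} vs a{2,4}?
greedy_run = re.match(r"a{2,4}", "aaaaa").group(0)
lazy_run = re.match(r"a{2,4}?", "aaaaa").group(0)
print(greedy_run, lazy_run)  # aaaa aa

# Vim (.*) vs (.\{-}): "(" and ")" are literal in Vim but escaped in Python.
text = "(first) and (second)"
greedy_paren = re.search(r"\(.*\)", text).group(0)
lazy_paren = re.search(r"\(.*?\)", text).group(0)
print(greedy_paren)  # (first) and (second)
print(lazy_paren)    # (first)
```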
Alternation
It is often helpful for patterns to allow multiple atoms to match in a specific location. This concept is called "alternation": the ability to have alternative atoms match in a particular location. In Vim, alternatives are separated with "\|":

Pattern       Matches
a\|b          a or b
a\|b\|c       a, b, or c

Here a, b, and c represent any atoms. Alternatives are tried from left to right, and the first one that matches is used.

Let's demonstrate how to use alternation with the following buffer, which contains a list of English-language locale codes:

    en-au
    en-ca
    en-ie
    en-jm
    en-nz
    en-za
    en-gb
    en-us

As a simple example, let's first select either of the literal strings "au" or "ca":

    au\|ca

This matches "au" in "en-au" and "ca" in "en-ca". Next, let's try an example with nested alternation. Suppose we want to select a literal "au" or any code that ends with "a" or "m". We start by creating the nested alternation:

    \w\%(a\|m\)

Then we create the outer alternation:

    au\|\w\%(a\|m\)

Executing this search matches "au", "ca", "jm", and "za".
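Python's re module uses a bare "|" where Vim uses "\|", and (?:...) in place of \%(...\). Below is a minimal sketch of both searches over the locale codes above. The last two lines illustrate the left-to-right rule in Python: with the same input, swapping the order of the alternatives changes which one wins.

```python
import re

codes = ["en-au", "en-ca", "en-ie", "en-jm", "en-nz", "en-za", "en-gb", "en-us"]

# Vim au\|ca            -> Python au|ca
simple = re.compile(r"au|ca")
# Vim au\|\w\%(a\|m\)   -> Python au|\w(?:a|m)
nested = re.compile(r"au|\w(?:a|m)")

def all_matches(pattern):
    """Every match of pattern across all codes, in buffer order."""
    return [m.group(0) for c in codes for m in pattern.finditer(c)]

print(all_matches(simple))  # ['au', 'ca']
print(all_matches(nested))  # ['au', 'ca', 'jm', 'za']

# Alternatives are tried left to right; the first one that matches wins.
short_first = re.match(r"foo|foobar", "foobar").group(0)
long_first = re.match(r"foobar|foo", "foobar").group(0)
print(short_first, long_first)  # foo foobar
```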