Back-references

In the previous section we introduced non-capturing groups as a way to group sub-patterns into atoms so that they can be used in alternation and with quantifiers. In this section we introduce capturing groups, which behave exactly like non-capturing groups but additionally keep a reference to (i.e. capture) the matched text so that it can be used later.

As an example, suppose we need to write a pattern that matches both single- and double-quoted text in a document. A reasonable first attempt might be:

["'][a-z -]\+["']

We will use the following buffer to test it:

Initial Conditions
line of text
line with "double-quoted" text
line of text
line with 'single-quoted' text
line of text
some "double-quoted text with 'single-quoted' text inside"
Search command: /["'][a-z -]\+["']

Let's run it.

Pattern without back-reference

Hmm, that's not what we wanted. So what happened? To accommodate both single and double quotes, we allowed either character at both ends of the pattern. Because either type of quotation mark is acceptable at either end, Vim happily matched inconsistent pairs, such as a double quote closed by a single quote. What we need is for the closing quotation mark to be the same as the opening one. Back-references make this possible, so let's update our pattern.
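The mismatch is easy to reproduce outside Vim. The following is a minimal sketch using Python's re module. It assumes a direct translation of the Vim pattern: Vim's "\+" is written "+" in Python, and the bracket expressions stay the same. The buffer is copied from the test buffer above, and the names BUFFER, NO_BACKREF and find_matches are illustrative only.

```python
import re

# The test buffer from above, one string per line.
BUFFER = [
    "line of text",
    'line with "double-quoted" text',
    "line of text",
    "line with 'single-quoted' text",
    "line of text",
    "some \"double-quoted text with 'single-quoted' text inside\"",
]

# Python translation of the Vim pattern ["'][a-z -]\+["'].
# Either quotation mark is accepted at each end, independently.
NO_BACKREF = re.compile(r"""["'][a-z -]+["']""")


def find_matches(pattern, lines):
    """Return every non-overlapping match in lines, scanning left to right."""
    return [m.group(0) for line in lines for m in pattern.finditer(line)]


for text in find_matches(NO_BACKREF, BUFFER):
    print(text)
```

On the last line both matches open with one kind of quotation mark and close with the other. This is exactly the inconsistency described above.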

To use back-references, first define one or more capturing groups in the pattern, then add the back-reference:

\(["']\)[a-z -]\+\1

Here "\(" and "\)" delimit a capturing group that matches the opening quotation mark, and the back-reference "\1" matches whatever that group captured. Because this is the first (and only) capturing group in the pattern, its number is 1.
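Python's re module writes the same idea with plain parentheses for the capturing group and "\1" for the back-reference. Below is a minimal sketch under that translation. It also reports the character the group captured for each match. The names BUFFER, WITH_BACKREF and find_matches are illustrative.

```python
import re

# The test buffer from above, one string per line.
BUFFER = [
    "line of text",
    'line with "double-quoted" text',
    "line of text",
    "line with 'single-quoted' text",
    "line of text",
    "some \"double-quoted text with 'single-quoted' text inside\"",
]

# Python translation of the Vim pattern \(["']\)[a-z -]\+\1.
# Group 1 captures the opening quote; \1 must match that same character.
WITH_BACKREF = re.compile(r"""(["'])[a-z -]+\1""")


def find_matches(pattern, lines):
    """Return (whole match, captured quote) pairs, scanning left to right."""
    return [(m.group(0), m.group(1)) for line in lines for m in pattern.finditer(line)]


for whole, quote in find_matches(WITH_BACKREF, BUFFER):
    print(whole, "captured:", quote)
```

On the last line, the attempt that starts at the double quote fails, so only the single-quoted text is found there.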

Patterns can have multiple capturing groups, and those groups can also be nested. To work out which number to use for a back-reference, start at the left side of the pattern and count the opening parentheses up to and including the one that starts the group in question. That count is the group's number.
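To illustrate the counting rule, here is a hypothetical pattern with nested groups, written in Python's re syntax. In Vim it would read:

\(\(["']\)\([a-z -]\+\)\)\2

The outer group opens first, so it is group 1. The quote group opens second and is group 2, and the text group opens third and is group 3. To match the opening quote again, the back-reference therefore has to be "\2". The name NESTED is illustrative.

```python
import re

# Hypothetical pattern with nested groups. Counting opening parentheses
# from the left: group 1 = quote + text, group 2 = quote, group 3 = text.
NESTED = re.compile(r"""((["'])([a-z -]+))\2""")

m = NESTED.search('line with "double-quoted" text')
print(NESTED.groups)  # number of capturing groups in the pattern
for number in range(NESTED.groups + 1):
    # Group 0 is the whole match; groups 1..3 follow the counting rule.
    print(number, repr(m.group(number)))
```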

In addition, "\0" refers to the entire matched text. This is mainly useful in the replacement part of a substitute command.
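Python's re module exposes the whole match in two places. m.group(0) returns it, and in a replacement string it is written "\g<0>". Python does not treat "\0" in a replacement as the whole match; it reads it as an octal escape instead, so here the Vim and Python spellings differ. The minimal sketch below wraps each consistently quoted string in square brackets. The names QUOTED and line are illustrative.

```python
import re

# Python translation of the Vim pattern \(["']\)[a-z -]\+\1.
QUOTED = re.compile(r"""(["'])[a-z -]+\1""")

line = 'line with "double-quoted" text'

m = QUOTED.search(line)
print(m.group(0))  # the entire matched text

# In a replacement string, \g<0> inserts the entire matched text.
print(QUOTED.sub(r"[\g<0>]", line))
```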

Let's now try our updated pattern:

Pattern with back-reference

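That is closer to the result we were looking for, but we should also allow nested quotations, such as the single-quoted text inside the double-quoted string on the last line. We can support this with one more change: allowing quotation marks inside the bracket expression.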
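The changed pattern itself is not spelled out here. As an assumption, a natural version of the change adds both quotation marks to the bracket expression. In Vim that assumed pattern would read:

\(["']\)[a-z '"-]\+\1

The sketch below tries the assumed pattern in Python's re syntax on the test buffer. The names BUFFER, NESTED_OK and find_matches are illustrative.

```python
import re

# The test buffer from above, one string per line.
BUFFER = [
    "line of text",
    'line with "double-quoted" text',
    "line of text",
    "line with 'single-quoted' text",
    "line of text",
    "some \"double-quoted text with 'single-quoted' text inside\"",
]

# Assumed third pattern: quotation marks are now allowed between the
# delimiters, so the match can run past a nested quote of the other kind.
NESTED_OK = re.compile(r"""(["'])[a-z '"-]+\1""")


def find_matches(pattern, lines):
    """Return every non-overlapping match in lines, scanning left to right."""
    return [m.group(0) for line in lines for m in pattern.finditer(line)]


for text in find_matches(NESTED_OK, BUFFER):
    print(text)
```

One trade-off comes with this version. The "+" quantifier is greedy and the bracket expression now admits quotation marks. As a result, a line holding two separate strings, such as "a" and "b", is matched as one span from the first opening quote to the last closing quote.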

Alternate pattern with back-reference

So how does the back-reference work? When a capturing group matches, it remembers the exact text it matched. A back-reference later in the pattern then matches only that same text. Therefore, if the capturing group matched a double quote, the back-reference looks for a double quote; if it matched a single quote, the back-reference looks for a single quote.
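To make the mechanism concrete, the following hand-written Python sketch matches only the shape used in this section: a quotation mark, one or more allowed characters, then the same quotation mark. It is not how Vim's regex engine is implemented. It simply mimics the same greedy-then-backtrack result. The variable captured plays the role of the capturing group, and the back-reference is a comparison against it. The function names, and the body characters assumed for the third pattern, are hypothetical.

```python
import string

QUOTES = "\"'"
# Characters allowed between the quotes in the second pattern: [a-z -]
SECOND_BODY = string.ascii_lowercase + " -"
# Assumed body for the third pattern, which also admits quotation marks.
THIRD_BODY = string.ascii_lowercase + " -\"'"


def match_at(text, start, body_chars):
    """Try to match quote, body_chars+, same quote, starting exactly at start."""
    if start >= len(text) or text[start] not in QUOTES:
        return None
    captured = text[start]  # plays the role of the capturing group
    # Greedy step: consume as many body characters as possible.
    end = start + 1
    while end < len(text) and text[end] in body_chars:
        end += 1
    # Backtracking step: give characters back until the back-reference
    # (the captured quote) matches. At least one body character must remain.
    while end > start + 1:
        if text.startswith(captured, end):
            return text[start:end + len(captured)]
        end -= 1
    return None


def find_all(text, body_chars):
    """Scan left to right; after a failed attempt, retry one character later."""
    results = []
    pos = 0
    while pos < len(text):
        found = match_at(text, pos, body_chars)
        if found is None:
            pos += 1
        else:
            results.append(found)
            pos += len(found)
    return results


LAST_LINE = "some \"double-quoted text with 'single-quoted' text inside\""
print(find_all(LAST_LINE, SECOND_BODY))
print(find_all(LAST_LINE, THIRD_BODY))
```

With the second body, the attempt at the double quote fails and scanning resumes one character later, so only the single-quoted text is found. With the assumed third body, the first attempt succeeds and covers the whole double-quoted string.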

This explains the difference between our second and third patterns. With the second pattern, Vim first tried a match starting at the double quote on the last line. Because that pattern did not allow quotation marks between the delimiters, it could never reach the closing double quote, so no match could start there. Vim then started a new match at the single quote, which ended at the closing single quote.

With the third pattern, the match starting at the double quote extended all the way to the closing double quote, which was the result we wanted.

This example showed how back-references can be used when searching. They are also very useful when searching and replacing, because they let the replacement reuse the matched text. The replacing section covers this in more detail.
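As a preview, here is a small Python sketch of a substitution that uses back-references in both the pattern and the replacement. It rewrites consistently quoted text so that it always uses double quotes. Group 1 is the opening quote, matched again by "\1", and group 2 is the quoted text, reused as "\2" in the replacement. In Vim the corresponding command would probably look like the line below, though the replacing section is the place to check the details:

:%s/\(["']\)\([a-z -]\+\)\1/"\2"/g

The names QUOTED and to_double_quotes are illustrative.

```python
import re

# Group 1: opening quote (reused by \1). Group 2: the quoted text.
QUOTED = re.compile(r"""(["'])([a-z -]+)\1""")


def to_double_quotes(line):
    """Rewrite each consistently quoted string to use double quotes."""
    return QUOTED.sub(r'"\2"', line)


print(to_double_quotes("line with 'single-quoted' text"))
print(to_double_quotes('line with "double-quoted" text'))
```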