Pattern Basics

The searching chapter showed how to search for literal strings in our text. This chapter builds on literal searches by introducing patterns.

A pattern is a specification that describes a set of strings. When a pattern is compared to a "target" string and the pattern describes that string, we say that the target string "matches the pattern". Patterns are generally composed of smaller patterns, called "atoms", arranged in a specific sequence to form the specification.

As a simple example, let's build a pattern that matches phone numbers of the format:

+1(234)567-8910

The first step is to break this string into a sequence of components:

+[country code]([area code])[prefix]-[line number]

Next, we express each component as a regular expression, then combine the pieces into a single pattern. One possible final pattern might be:

+\d(\d\{3\})\d\{3\}-\d\{4\}

which is made up of:

  1. the character class "\d", which specifies that the character in that location must be a digit,
  2. quantifiers such as "\{3\}" and "\{4\}", which specify how many times the preceding "\d" repeats, that is, how many digits appear in each location, and
  3. literal strings "+", "(", ")", and "-".
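
To see the three kinds of atoms working together, here is a minimal sketch in Python. It is a translation, not the pattern above verbatim: it assumes the pattern above is written in a syntax where "+", "(" and ")" are literal and the quantifier braces are escaped, whereas Python's re module treats "+", "(" and ")" as special (so they are escaped here) and writes quantifiers as plain {3}. The names PHONE and is_phone_number are illustrative only.

```python
import re

# Python translation of the phone-number pattern, split by component.
# In Python, "+", "(" and ")" must be escaped to be taken literally,
# while quantifier braces are written without backslashes.
PHONE = re.compile(
    r"\+\d"        # literal "+" followed by a one-digit country code
    r"\(\d{3}\)"   # area code: three digits inside literal parentheses
    r"\d{3}"       # prefix: three digits
    r"-\d{4}"      # literal "-" followed by a four-digit line number
)


def is_phone_number(target: str) -> bool:
    """Return True if the whole target string matches the pattern."""
    return PHONE.fullmatch(target) is not None


print(is_phone_number("+1(234)567-8910"))  # True
print(is_phone_number("+1 234 567 8910"))  # False: literals are missing
```

fullmatch is used so that the entire target string has to be described by the pattern; a plain search would also accept a longer string that merely contains a phone number.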