Basics


We saw in the searching chapter that we can search for literal strings in our text. In this chapter we expand upon searching for literal strings by introducing patterns.

A pattern is a specification that describes a set of strings. When a pattern is compared to a "target" string and the pattern describes that string, we say that the target string "matches the pattern". Patterns are generally composed of smaller patterns, called "atoms", which are arranged in a specific sequence to create the specification.

As a simple example, let's build a pattern that matches phone numbers of the format:

+1(234)567-8910

The first step is to break this string into a sequence of components:

+[country code]([area code])[prefix]-[line number]

Next, we need to specify each component in terms of regular expressions, then combine them into the pattern. One possible final pattern might be:

+\d(\d\{3\})\d\{3\}-\d\{4\}

which is made up of:

  1. the character class "\d", which specifies that the character in that location must be a digit,
  2. quantifiers such as "\{3\}", which specify how many times the preceding atom (here, a digit) must appear in that location, and
  3. literal strings "+", "(", ")", and "-".
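
To make the decomposition concrete, here is a minimal sketch that assembles the same pattern from its atoms and tests it against a target string. It uses Python's "re" module purely for illustration, which is an assumption and not the tool this chapter describes. The escaping conventions differ from the pattern above: in Python, "+", "(" and ")" must be preceded by a backslash to be taken literally, and quantifiers are written as a bare {3}. The names ATOMS, PHONE and is_phone_number are hypothetical.

```python
import re

# Atoms in Python's `re` syntax (an assumption of this sketch). Unlike the
# pattern in the text, "+", "(" and ")" are escaped here to make them
# literal, and quantifiers are written without backslashes.
ATOMS = [
    r"\+",      # literal "+"
    r"\d",      # country code: one digit
    r"\(",      # literal "("
    r"\d{3}",   # area code: three digits
    r"\)",      # literal ")"
    r"\d{3}",   # prefix: three digits
    r"-",       # literal "-"
    r"\d{4}",   # line number: four digits
]

# The atoms are concatenated in order to form the full pattern.
PHONE = re.compile("".join(ATOMS))


def is_phone_number(target: str) -> bool:
    """Return True if the whole target string matches the pattern."""
    return PHONE.fullmatch(target) is not None


print(is_phone_number("+1(234)567-8910"))  # True
print(is_phone_number("+1(234)567-891"))   # False: line number is too short
```

Keeping the atoms in a list mirrors the component breakdown above: each entry corresponds to one bracketed component or one literal character of the format.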