ripgrep - Selecting Input Files

Now that we have learned the basics of working with ripgrep, let's start looking a bit deeper at how to use ripgrep effectively. Each time ripgrep is called it goes through a few basic steps:

1. Select the files to be searched
2. Apply the specified pattern(s) to each line of the selected input(s)
3. Format and return each line of output

In this chapter we look at step #1 more closely.

Smart Filtering

By default, ripgrep's "smart filtering" algorithm is applied, which collects inputs by searching each specified file and/or recursively searching through the specified directories. During this process, by default ripgrep:

ignores any files and directories that are defined in your .gitignore/.ignore file(s),
ignores any hidden files and directories, and
ignores any binary files

These are sensible defaults and are pretty close to what we want in many cases, but there are some cases where we want different behavior, and ripgrep provides a variety of options that allow us to modify that behavior.

The first option is to use the -u/--unrestricted option to reduce the level of default, or "smart", filtering. This option can be repeated up to 3 times in order to define the types of filtering that are desired, and each time it is called one of the 3 filter types above is disabled:

Applying this option once disables filtering according to your .gitignore/.ignore file(s),
Applying this option a second time disables the hidden file filter, and finally
Applying this option a third time disables the binary file filter.
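To see the three levels side by side, here is a minimal sketch that builds a throwaway directory and runs the same search at each level. The file names are made up for illustration, and --files-with-matches (not covered in this chapter) is used only to print the names of matching files rather than the matching lines. Because .gitignore is normally honored only inside a git repository, the sketch uses a .ignore file instead.

```shell
# Build a throwaway directory with one file per filter category.
demo=$(mktemp -d)
cd "$demo"
printf 'needle\n' > plain.txt        # an ordinary file
printf 'needle\n' > ignored.txt      # will be listed in .ignore
printf 'needle\n' > .hidden.txt      # hidden file
printf '\0needle\n' > binary.bin     # leading NUL byte, so it is detected as binary
printf 'ignored.txt\n' > .ignore     # rule that filters out ignored.txt

rg --files-with-matches needle .        # default: all three filters are active
rg -u --files-with-matches needle .     # also searches files matched by .ignore
rg -uu --files-with-matches needle .    # also searches hidden files
rg -uuu --files-with-matches needle .   # also searches binary files
```

Each additional u should add one more file to the output, starting from plain.txt alone.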

This option is convenient, but can be a bit of a blunt instrument in some cases since, for example, it requires that the .gitignore/.ignore filter be disabled in order to search hidden files, which is often not what we need to do. As one might expect, ripgrep allows each filter to be enabled and disabled directly.

Ignoring `.gitignore/.ignore`

The --ignore and --no-ignore flags can be used to directly control whether or not to filter inputs according to .gitignore/.ignore. ripgrep actually provides even greater control over how it selects and treats .gitignore/.ignore files, which we will get into a bit later.

However, it is helpful to note that when both .gitignore and .ignore are present and not ignored, the rules in .gitignore are read first, then .ignore is read. This means that rules in .ignore take precedence over those defined in .gitignore.

Hidden Files and Directories

The --hidden and --no-hidden flags can be used to directly control how ripgrep is to treat hidden files and directories.
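As a small sketch of controlling each filter on its own, the commands below search hidden files while still honoring ignore rules, and then do the reverse. The directory layout and file names are hypothetical, a .ignore file stands in for .gitignore so that no git repository is needed, and --files-with-matches is used only to keep the output to file names.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'needle\n' > plain.txt
printf 'needle\n' > ignored.txt
printf 'needle\n' > .hidden.txt
printf 'ignored.txt\n' > .ignore     # rule that filters out ignored.txt

# Hidden files are searched, but the .ignore rule still applies.
rg --hidden --files-with-matches needle .

# The .ignore rule is skipped, but hidden files are still filtered out.
rg --no-ignore --files-with-matches needle .
```

This is the case that -u alone cannot express: the first command reaches .hidden.txt without also pulling in ignored.txt.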

Binary Files

The --binary and --no-binary flags can be used to directly control how ripgrep treats binary files. However, there is a bit of nuance here since, after all, every file is technically a "binary" file.

By default, ripgrep deems a file to be "binary" if it encounters a NUL byte while parsing it. If encountered, ripgrep will throw away matches that may have occurred in the file, print a warning to the console, and return. However, when the --binary flag is used, if ripgrep encounters a NUL byte it continues searching until either a match is found, at which point it prints a warning to the console, ignores the match, and returns, or it reaches the end of the file.

ripgrep's --text and --no-text flags can also be used to disable NUL byte handling and search binary files as though they were text files. This can be a bit dangerous, since binary contents may be printed to the console if there is a match, which may cause escape codes to be printed that can cause problems in your terminal emulator.

In general, we recommend just using ripgrep's default setting unless you have a specific reason not to do so.
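To make the three binary modes described above concrete, here is a minimal sketch that searches one small file beginning with a NUL byte. The file name is hypothetical, and the exact wording of ripgrep's binary-match message varies between versions, so no output is shown. The last command pipes through cat -v purely as a precaution, so that the NUL byte is displayed as ^@ rather than sent raw to the terminal.

```shell
demo=$(mktemp -d)
cd "$demo"
printf '\0needle\n' > data.bin   # NUL byte first, so the file is detected as binary

rg needle .                    # default: data.bin is skipped, so nothing is reported
rg --binary needle .           # reports that data.bin matches, without printing the line
rg --text needle . | cat -v    # treats the file as text and prints the matching line
```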

Traversing Directories

ripgrep gives us a few additional tools that we can use to define how we traverse directories searching for files for input.

To start, when ripgrep searches a directory and finds sub-directories, it then searches those directories too. ripgrep counts depth from the paths it was given: the files directly inside a starting directory are at a depth of "1", the contents of its sub-directories are at a depth of "2", and so on until the search reaches the "bottom" of the tree. As you might imagine, this searching can take time, so in some cases it makes sense to limit how "deep" ripgrep will descend into the directory tree.

The -d/--max-depth option provides exactly this functionality, allowing the maximum depth to which it should descend to be defined in order to avoid unnecessary directory traversal. When using this option, a depth of "0" indicates that ripgrep should only search the specified paths themselves (so a directory given as a path is not descended into at all), a depth of "1" indicates that it should search only the direct children of each specified directory, and numbers greater than 1 indicate that ripgrep should descend further, but only up to that number of levels before stopping.
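A short sketch of how the depth limit plays out on a three-level tree follows. The directory and file names are invented for the example, and --files-with-matches is used only to list which files were reached.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p sub/deeper
printf 'needle\n' > top.txt
printf 'needle\n' > sub/mid.txt
printf 'needle\n' > sub/deeper/low.txt

rg --max-depth 0 --files-with-matches needle .   # "." is not descended into, so nothing is searched
rg --max-depth 1 --files-with-matches needle .   # only ./top.txt
rg --max-depth 2 --files-with-matches needle .   # adds ./sub/mid.txt
rg --files-with-matches needle .                 # no limit: all three files
```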

Directory traversal can also sometimes lead to paths that exist on other file systems, which can significantly slow a search down due to network latency. The --one-file-system option tells ripgrep that it should only search for inputs on the file system from which the search began, and simply ignore any paths that exist on other file systems. One interesting thing about this option is that it still allows a single search to define paths that exist on different file systems, but it will limit directory traversal for each search to its own file system. If needed, this option can be disabled using the complementary --no-one-file-system option.

Filtering Files by File Type

ripgrep also provides a variety of options that filter encountered files in various ways, allowing greater control over the inputs.

One of the most common options is the --type option, which defines the file types that should be searched, and has the call signature:

rg --type <filetype> <pattern>

where filetype defines the file type to search, such as md, markdown, or txt, and pattern follows the same regular expression conventions we have reviewed in previous chapters. This option can be repeated several times to combine multiple file types in the same search.

There are also times where we want to do the opposite - we want to search for a pattern in any files except for one or two file types. ripgrep has us covered there too:

rg --type-not <filetype> <pattern>

As with --type, this can be called multiple times to omit multiple file types.
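As a concrete sketch, the commands below run both options against three small files. The file names are hypothetical, md, py, and json are file types that ripgrep defines out of the box, and --files-with-matches is used only to show which files were selected.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'needle\n' > notes.md
printf 'needle\n' > script.py
printf 'needle\n' > data.json

rg --type md --files-with-matches needle .             # only notes.md
rg --type md --type py --files-with-matches needle .   # notes.md and script.py
rg --type-not json --files-with-matches needle .       # everything except data.json
```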

Both --type and --type-not take the <filetype> parameter, which expects the name of a supported file type. You can list all of the file types that ripgrep supports by calling:

rg --type-list

which will print all supported file types. The list can be quite long, so if you know what you are looking for you can pipe this output back to ripgrep to filter it. For example:

rg --type-list | rg markdown

will take the complete list of supported file types and filter it down to only those lines that include markdown. Pretty cool.

Although we won't go into detail in this chapter, we should also note that ripgrep provides options for adding and removing file types, so that you can work with custom file types or remove some file types that you might not want ripgrep to search. If you want to learn more about these, check the ripgrep help screen for --type-add and --type-clear.

Filtering Files by File Size

It stands to reason that large files can take a long time to search. Similarly, there are also times where we know that our target files don't exceed a certain size. In both cases, ripgrep's --max-filesize option can help focus the search on the right files.

This option has the call signature:

rg --max-filesize <num><suffix>? <pattern>

where num is a number, and suffix is an optional K, M, or G, corresponding to kilobytes, megabytes, and gigabytes, respectively. When no suffix is provided, then num is treated as bytes.
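Here is a minimal sketch of the size limit in action, using one tiny file and one padded to a little over 2 KB. The file names and sizes are arbitrary choices for the example, and --files-with-matches is used only to list the files that were searched and matched.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'needle\n' > small.txt
# Pad a second file with 2048 filler characters, then append the same match.
head -c 2048 /dev/zero | tr '\0' 'x' > large.txt
printf '\nneedle\n' >> large.txt

rg --max-filesize 1K --files-with-matches needle .   # large.txt exceeds 1K and is skipped
rg --max-filesize 1M --files-with-matches needle .   # both files are within the limit
```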

Following Symbolic Links

By default, ripgrep ignores symlinks while traversing directories. Following symlinks can be enabled using the --follow option or, if it is already enabled, disabled with the --no-follow option.
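The following sketch shows the difference on a directory that contains only a symlink to a file stored elsewhere. The directory and file names are hypothetical, and --files-with-matches is used only to show which paths were searched.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir real searched
printf 'needle\n' > real/target.txt
ln -s ../real/target.txt searched/link.txt   # symlink pointing outside of searched/

rg --files-with-matches needle searched            # the symlink is skipped, so there is no match
rg --follow --files-with-matches needle searched   # the symlink is followed: searched/link.txt
```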

Filtering Files and Directories by Name

Last but not least, filtering files and directories by name has the highest specificity, and therefore ripgrep treats it with the highest precedence. In other words, filtering by name always takes precedence over other methods of filtering out files and directories, which can be a very useful feature in some situations.

ripgrep filters file and directory names using "globs", which are similar to regular expressions in that they are a means of pattern-matching, but they are focused on matching file and directory names and use a syntax that is specific to that application.

Specify globs using the --glob option, as in:

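rg print --glob '*.py'

Notice that the --glob option here appears after the pattern, in the position where paths are normally given. ripgrep accepts options anywhere on the command line, and this placement reads naturally, since the glob narrows the set of paths that ripgrep will search for matches. Quote the glob so that the shell passes it to ripgrep as written instead of expanding it first.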

glob Syntax

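ripgrep follows .gitignore style globs, which have a few characteristics:

We can invert the glob by prefixing it with a !, meaning that any matching files and directories should be excluded from the input. Note that this can take some getting used to, because it is the reverse of a .gitignore file, where a plain pattern excludes and a ! prefix re-includes. When several globs match the same path, the one given later on the command line wins, so a file that was excluded by an earlier inverted glob becomes included again if a later glob matches it.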

A / marks a directory separator, which may occur at the beginning, in the middle, or at the end of a glob. If the separator occurs at the beginning and/or middle of the pattern, then the pattern is considered to be relative to the current working directory. Otherwise, the pattern can match at any level below the current working directory, and can match both files and directories. On the other hand, if the separator occurs at the end of the pattern then the pattern will only match directories.

An asterisk * matches anything but a slash (/), while a ? matches any single character except the slash /. Globs also have limited support for simple character classes, meaning that [a-zA-Z] can be used to match a character within the specified range.

Finally, two consecutive asterisks (**) in patterns provide some special features:

First, globs that start with **/pattern recursively descend through directories looking for pattern matches at any level.

If the glob includes a trailing pattern/**, on the other hand, everything below the specified pattern is matched.

Finally, two consecutive asterisks contained in the middle of a pattern such as left/**/right indicate that the left side pattern should match first, and that the right side pattern can then match at any depth of sub-directories beneath it, including directly beneath it.
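The glob rules above can be tried out with a small sketch like the following. The directory tree and file names are invented for the example, --files-with-matches is used only to list the selected files, and every glob is quoted so that the shell hands it to ripgrep unchanged.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p src/util docs
printf 'needle\n' > main.py
printf 'needle\n' > src/app.py
printf 'needle\n' > src/util/helpers.py
printf 'needle\n' > docs/guide.md

rg --files-with-matches needle --glob '*.py' .          # Python files at any depth
rg --files-with-matches needle --glob '!*.py' .         # everything except Python files
rg --files-with-matches needle --glob 'src/**' .        # everything below src/
rg --files-with-matches needle --glob '**/util/*.py' .  # Python files inside any util directory
rg --files-with-matches needle --glob 'src/**/*.py' .   # Python files at any depth below src/
```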

Case Sensitivity

ripgrep provides two options for defining the case sensitivity of the glob. First, the --iglob option is equivalent to the --glob option, except it indicates that the search should be done in a case-insensitive manner.

A second option is more explicit, but also more verbose - specifying the --glob-case-insensitive option executes a case-insensitive search, while a case-sensitive search can be executed by specifying the --no-glob-case-insensitive option.
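Both spellings can be compared with a quick sketch like this one, which searches two files whose extensions differ only in case. The file names are hypothetical, and --files-with-matches is used only to show which files each glob selected.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'needle\n' > README.MD
printf 'needle\n' > notes.md

rg --files-with-matches needle --glob '*.md' .    # case-sensitive: only notes.md
rg --files-with-matches needle --iglob '*.md' .   # case-insensitive: both files
rg --files-with-matches needle --glob-case-insensitive --glob '*.md' .   # same result as --iglob
```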

Which option to use is more or less a matter of personal taste and style.

A Bit More about `.gitignore`

Before we close out this chapter, let's get back to one of the very first topics we covered - .gitignore files. By default, ripgrep ignores any files and directories that are listed in the .gitignore file, and we saw at the top of this chapter that we can use the --ignore and --no-ignore options to enable and disable that feature. That is a fairly blunt operation, however, as .gitignore files often contain a complex list of patterns that address a variety of files and directories that are unique to each programming language and framework that might be used in a project. ripgrep includes some additional flexibility that can come in handy from time to time.

First, we should repeat that the rules defined by globs have the highest precedence, and overrule any conflicting rules that might be defined in any ignore file. There are many cases where monkeying around with ignore files seems to be the right thing to do, but in reality globs can provide a more direct path to achieving the desired result.

However, for the sake of completeness, ripgrep allows an additional ignore file to be specified with the --ignore-file option. The rules in this file are matched relative to the current working directory and are applied after those found in .gitignore and .ignore, and they have lower precedence than the ignore files that ripgrep discovers automatically in the directory tree. This can provide a flexible way to add rules for a single search without editing the project's own ignore files.
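As a minimal sketch, the commands below keep the extra rules in a temporary file outside of the searched directory and apply them to a single search. The file names and the *.log rule are made up for illustration, and --files-with-matches is used only to list which files were searched and matched.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'needle\n' > keep.txt
printf 'needle\n' > scratch.log

# An extra rules file in gitignore format, stored outside the searched directory.
rules=$(mktemp)
printf '*.log\n' > "$rules"

rg --files-with-matches needle .                          # keep.txt and scratch.log
rg --ignore-file "$rules" --files-with-matches needle .   # scratch.log is filtered out
```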

ripgrep offers a handful of other options for fine-tuning how ignore files are interpreted and used. We don't describe them in detail here, but if they sound interesting you can find more information about them in the ripgrep documentation.