Streams


We have seen how to execute commands, and how to build pipelines that pass information between commands to perform more complex tasks. In this section, we will touch on how this information is passed between commands, using "streams".

A stream refers to the information that flows through the pipeline, from command to command. A command receives information from the "input stream", processes it, then passes the result to the "output stream" (or possibly sends error information to the "error stream", if needed).

To help solidify the concept of a stream, suppose there is a command that reads text from the input, modifies it in some way, then passes the result to the output. One might expect the process of executing that command to be:

  1. Read the input into memory

  2. Modify the content in memory

  3. Pass the modified content to the output

However, consider two cases:

  1. Suppose that the input comes from a file that requires more memory than is available in the system. In this case it would be impossible to load the entire file into memory, and the command would fail.

  2. Suppose that the input comes from a continuous source, such as a temperature sensor that reads at regular intervals. Since the temperature is constantly being monitored, there is no such concept as "the end of the input".

In order to handle these situations effectively, the process is a bit more like this:

  1. Open the input source

  2. Read a line of data

  3. If the end of the input stream is detected, stop processing and close the stream

  4. Perform some operation on that line and pass the result to the output

  5. Repeat from step 2
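The loop above can be sketched in Python. This is only an illustration of the idea, not how any particular command is implemented; the function name process_stream and the upper-casing transformation are assumptions chosen for the example.

```python
import io
import sys


def process_stream(source, sink, transform):
    """Read `source` one line at a time, transform it, and write it to `sink`.

    Only one line is held in memory at a time, so the total size of the
    input does not matter. Returns the number of lines processed.
    """
    count = 0
    while True:
        line = source.readline()      # step 2: read a line of data
        if line == "":                # step 3: an empty string marks the end of input
            break
        sink.write(transform(line))   # step 4: process the line and pass it on
        count += 1                    # step 5: repeat from step 2
    return count


if __name__ == "__main__":
    # An in-memory text stream stands in for the input source (step 1).
    demo_input = io.StringIO("first line\nsecond line\n")
    process_stream(demo_input, sys.stdout, str.upper)
```

Passing sys.stdin as the source would make the same function read from standard input instead of the in-memory example.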

With this in mind, the command does not need to know how much data the input contains. Once the input stream is open, it simply reads a line, processes it, and passes the result to the output, then reads another line until (possibly) all input data has been consumed. Similarly, the next command in the pipeline receives each line of data, processes it, then passes the result to its own output. This continuous flow of lines from one command to the next is why the concept is called a "stream".
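One way to picture this hand-off is with Python generators, where each stage pulls one line from the previous stage only when it needs it. The sketch below is an analogy rather than a description of how a shell connects commands; sensor_readings, to_fahrenheit, and number are hypothetical stages, and the readings are made up.

```python
import itertools


def sensor_readings():
    """Hypothetical endless source: yields one made-up Celsius reading per line."""
    for tick in itertools.count():
        yield f"{20 + tick % 5}\n"


def to_fahrenheit(lines):
    """Convert each Celsius line to Fahrenheit as soon as it arrives."""
    for line in lines:
        celsius = int(line)
        yield f"{celsius * 9 / 5 + 32:.1f}\n"


def number(lines):
    """Prefix each line with its position in the stream."""
    for index, line in enumerate(lines, start=1):
        yield f"{index}: {line}"


# Chain the stages. Nothing is read until a consumer asks for a line.
pipeline = number(to_fahrenheit(sensor_readings()))

# The source never ends, so take only the first three lines.
first_three = list(itertools.islice(pipeline, 3))
# first_three == ["1: 68.0\n", "2: 69.8\n", "3: 71.6\n"]
```

Because no stage ever asks for the whole input, the pipeline works even though the source has no end.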

Standard Streams

When the input, output, or error stream is not specified, it defaults to a specific "standard" stream. Under the hood, each stream is implemented as a file that can be read from or written to.
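To make the "stream as a file" idea concrete, the sketch below writes to and reads from raw file descriptors using Python's os module. It assumes a Unix-like system, where standard output is conventionally descriptor 1; the helper name write_text is an assumption made for the example.

```python
import os


def write_text(fd, text):
    """Write `text` to an open file descriptor and return the number of bytes written."""
    return os.write(fd, text.encode("utf-8"))


# A pipe provides two descriptors: one to read from and one to write to.
read_fd, write_fd = os.pipe()
write_text(write_fd, "hello\n")
os.close(write_fd)

# Reading uses the same file-style interface.
received = os.read(read_fd, 1024).decode("utf-8")
os.close(read_fd)

# The same call works on standard output, conventionally descriptor 1.
write_text(1, received)
```

The helper does not care whether the descriptor refers to a pipe, a regular file, or the terminal, which is the point of treating streams as files.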

There are three standard streams:

Standard Input

Standard input, called stdin and sometimes referenced numerically as "0", is the stream from which a program reads its input data, if not otherwise specified.

Not all commands require an input stream. For example, the ls command, which displays information about the files contained in a directory, gets that information from the filesystem rather than from an input stream.

Standard Output

Standard output, called stdout and sometimes referenced numerically as "1", is the stream to which a program writes its output data. By default this is usually connected to the terminal, so that the results of a command are printed to the screen.

Not all commands generate output. For example, the mv command, which renames a file on the filesystem, does not generate any output when it is successfully invoked.

Standard Error

Standard error, called stderr and sometimes referenced numerically as "2", is an alternative output stream that is used by commands for error or diagnostic information.

The main purpose of stderr is to allow a command to generate diagnostic feedback without polluting the output stream.
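The following sketch shows why the separation is useful. The function sum_numbers is a hypothetical example, not a real command: it writes its result to the output stream and reports any lines it cannot parse on the error stream, so a later command reading the output sees only the result.

```python
import io
import sys


def sum_numbers(lines, out=None, err=None):
    """Write the sum of the numeric lines to `out`; report bad lines on `err`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    total = 0
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        try:
            total += int(text)
        except ValueError:
            # Diagnostics go to the error stream, not the output stream.
            print(f"line {line_number}: skipping {text!r}", file=err)
    print(total, file=out)
    return total


if __name__ == "__main__":
    # Prints 4 on standard output and one diagnostic line on standard error.
    sum_numbers(io.StringIO("1\ntwo\n3\n"))
```

If the diagnostic were written to the output stream instead, the next command in a pipeline would receive it mixed in with the result.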

Note

This section is intended to provide just enough information to allow users to begin understanding and using streams in many common applications. If you want to dig a bit deeper into streams and how they work, a good place to start is Everything is a File.

One of the useful features of the standard streams is that they can be replaced by other streams or combined with one another, in a process called redirection, which is the topic of the next section.