The `cut` command is used to extract information from each line of content passed to stdin.
The call signature is:
`cut {options} {path}`
If no path is given, `cut` reads from stdin; when a path is specified, `cut` reads that file instead.
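As a quick check, the two invocations below should print the same lines. The first passes a path argument and the second redirects the same file to stdin. The file name `sample.tsv` and its contents are hypothetical and were made up for this illustration.

```shell
# Hypothetical tab-delimited sample file, created just for this example
printf 'alpha\tone\nbeta\ttwo\n' > sample.tsv

# Read the file by passing its path: prints "alpha" then "beta"
cut -f 1 sample.tsv

# Redirect the same file to stdin instead: same output
cut -f 1 < sample.tsv
```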
`cut` is often used either to extract fields from delimited files or to extract data by character position. Let's look at each of those:
Extract By Delimited Fields
Data can be extracted from delimited fields in two steps.
First, each line is split based on the delimiter. By default `cut` assumes tab-delimited data, though this can be changed using the `-d` option, followed by the character to use as a delimiter.
Second, the fields to return are selected with the `-f` option, by numerical index starting from 1. Field specifications can take several forms:
Spec | Meaning |
---|---|
n | include field n |
n,m | include fields n and m |
n-m | include fields from n to m |
n- | include fields from n to the end |
-m | include fields from 1 to m |
where n and m are integers.
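Here is a minimal sketch of each spec form. It pipes a hypothetical five-field line to stdin. No `-d` is given, so `cut` splits on tabs, and it prints the selected fields joined by that same tab delimiter.

```shell
# Hypothetical tab-delimited line with five fields: a b c d e
line=$(printf 'a\tb\tc\td\te')

printf '%s\n' "$line" | cut -f 2     # b
printf '%s\n' "$line" | cut -f 2,4   # b<TAB>d
printf '%s\n' "$line" | cut -f 2-4   # b<TAB>c<TAB>d
printf '%s\n' "$line" | cut -f 3-    # c<TAB>d<TAB>e
printf '%s\n' "$line" | cut -f -2    # a<TAB>b
```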
To demonstrate, suppose you have the following file:
and want to extract the color name from each line. This can be achieved by setting the delimiter to `,` with `-d`, then selecting the first field with `-f 1`:
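The sketch below assumes a hypothetical `colors.csv` in which each line holds a color name and its hex value, separated by a comma. The data is illustrative only.

```shell
# Hypothetical sample data: <color name>,<hex value>
printf 'red,#ff0000\ngreen,#00ff00\nblue,#0000ff\n' > colors.csv

# Split on "," and keep field 1: prints red, green, blue (one per line)
cut -d ',' -f 1 colors.csv
```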
Alternatively, to extract the hex value for each color, select the second field instead:
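This sketch assumes a hypothetical `colors.csv` where each line is a color name and a hex value separated by a comma. Only the field number differs from selecting the name.

```shell
# Hypothetical sample data: <color name>,<hex value>
printf 'red,#ff0000\ngreen,#00ff00\nblue,#0000ff\n' > colors.csv

# Keep field 2: prints #ff0000, #00ff00, #0000ff (one per line)
cut -d ',' -f 2 colors.csv
```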
Multiple fields can be specified as a comma-separated list. One quirk of `cut` is that fields are always returned in the order they appear in the input, not in the order they are specified. For example, if we wanted to swap fields 1 and 2, we might try the following:
Note that both fields are returned in the original order, despite the ordering of the spec.
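The sketch below demonstrates this quirk using a hypothetical comma-separated `colors.csv`. Asking for fields `2,1` prints exactly what `1,2` prints, and the input delimiter is kept between the fields.

```shell
# Hypothetical sample data: <color name>,<hex value>
printf 'red,#ff0000\ngreen,#00ff00\n' > colors.csv

# Both commands print "red,#ff0000" then "green,#00ff00":
# cut emits fields in input order, whatever order the spec lists them in
cut -d ',' -f 2,1 colors.csv
cut -d ',' -f 1,2 colors.csv
```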
Extract By Index
Data can also be extracted from each line by character position, passing the column(s) to return to the `-c` option. Columns are specified in a manner similar to fields:
Spec | Meaning |
---|---|
n | include column n |
n,m | include columns n and m |
n-m | include columns from n to m |
n- | include columns from n to the end |
-m | include columns from 1 to m |
where n and m are integers.
For example, to return only the first 6 characters from each line:
or to return 6 columns, starting from column 3 (that is, columns 3 through 8):
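The following sketch shows both examples using `-c`, with a hypothetical input line `abcdefghij` piped to stdin. Counting starts at 1, so six columns starting from column 3 is the range `3-8`.

```shell
# Hypothetical input: ten characters, one per column
printf 'abcdefghij\n' | cut -c -6    # abcdef (columns 1 through 6)
printf 'abcdefghij\n' | cut -c 3-8   # cdefgh (columns 3 through 8)
```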
Finally, more complex cases can combine several column specifications, separated by commas:
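For instance, take a hypothetical ten-character line `abcdefghij`. The spec below combines a range, a single column, and an open-ended range. As with fields, the selected characters come back in input order, and with `-c` they are concatenated with nothing between them.

```shell
# Hypothetical input; select columns 1-2, column 5, and column 8 onward
printf 'abcdefghij\n' | cut -c 1-2,5,8-   # abehij
```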