Using the Linux cut command to grab portions of lines from files


One surprisingly easy command for grabbing a portion of every line in a text file on a Linux system is cut. It works something like awk in that it allows you to select only what you want to see from files, enabling you to pull fields (regardless of the delimiter used), characters or bytes. To check on cut, you can ask about its version like this:

$ cut --version
cut (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David M. Ihnat, David MacKenzie, and Jim Meyering.

Selecting by field

To illustrate how the cut command works, we’ll first run commands using a sample “cities” file that contains details of the largest cities in the US in a tab-separated format. The lines in this file look something like what is shown below:

$ head -5 cities
Rank  Name           State       2021 Pop.  2010 Census  Change  Density (mi²) Area (mi²)
1     New York City  New York    8,230,290    8,190,210          4,083        300
2     Los Angeles    California  3,983,540    3,795,510          1,266        469
3     Chicago        Illinois    2,679,080    2,697,480          1,756        227
4     Houston        Texas       2,323,660    2,100,280            541        640

To select a particular field from this file, you might use a command like this that shows the 4th field:

$ cut -f 4 cities | head -11
2021 Pop.
8,230,290
3,983,540
2,679,080
2,323,660
1,733,630
1,585,010
1,581,730
1,427,720
1,347,120
1,011,790

To add the city names to your selection, you would select the 2nd and 4th fields. Since the tab character is the default delimiter for the cut command, it easily extracts these fields.

$ cut -f2,4 cities | head -11
Name    2021 Pop.
New York City   8,230,290
Los Angeles     3,983,540
Chicago 2,679,080
Houston 2,323,660
Phoenix 1,733,630
Philadelphia    1,585,010
San Antonio     1,581,730
San Diego       1,427,720
Dallas  1,347,120
Austin  1,011,790

The option -f1-4 displays the first four fields of each line:

$ cut -f1-4 cities | head -5
Rank    Name    State   2021 Pop.
1       New York City   New York        8,230,290
2       Los Angeles     California      3,983,540
3       Chicago Illinois        2,679,080
4       Houston Texas   2,323,660
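
Field lists can also be open-ended. As a minimal sketch, assuming a small made-up tab-separated file (sample.tsv is hypothetical, not the cities file above), -f3- selects from the third field to the end of the line and -f-2 selects the first two. Note that cut prints the selected fields in the order they appear in the file, so -f4,2 produces the same output as -f2,4.

```shell
# Build a tiny tab-separated sample file (hypothetical data for illustration)
tmpdir=$(mktemp -d)
printf '1\tNew York City\tNew York\t8,230,290\n' >  "$tmpdir/sample.tsv"
printf '2\tLos Angeles\tCalifornia\t3,983,540\n' >> "$tmpdir/sample.tsv"

# Third field through the end of each line
from_third=$(cut -f3- "$tmpdir/sample.tsv")

# First field through the second
up_to_second=$(cut -f-2 "$tmpdir/sample.tsv")

# Fields come out in file order, whatever order the list uses
reordered=$(cut -f4,2 "$tmpdir/sample.tsv")

printf '%s\n' "$from_third" "$up_to_second" "$reordered"
rm -r "$tmpdir"
```

If you need the fields in a different order, a tool such as awk is usually a better fit than cut.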

To specify a different delimiter, you could add the -d option and use a command like this one, which pulls usernames from the /etc/passwd file:

$ cut -d: -f 1 /etc/passwd | head -10
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator

To see both login names and assigned shells, try this:

$ cut -d: -f 1,7 /etc/passwd | head -10
root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
lp:/sbin/nologin
sync:/bin/sync
shutdown:/sbin/shutdown
halt:/sbin/halt
mail:/sbin/nologin
operator:/sbin/nologin

The command above selects the 1st and 7th fields.
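
By default, cut joins the selected fields with the input delimiter, which is why the output above contains colons. GNU cut (the version shown earlier) also accepts an --output-delimiter option to change that. A minimal sketch, assuming GNU coreutils and a made-up passwd-style file rather than your real /etc/passwd:

```shell
# Two hypothetical passwd-style entries for illustration
tmpdir=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/bash\n'  >  "$tmpdir/passwd.sample"
printf 'bin:x:1:1:bin:/bin:/sbin/nologin\n' >> "$tmpdir/passwd.sample"

# Select login name and shell, but separate them with a space on output
login_shells=$(cut -d: -f1,7 --output-delimiter=' ' "$tmpdir/passwd.sample")

printf '%s\n' "$login_shells"
rm -r "$tmpdir"
```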

To count how many accounts use each of the shells, use a command like this:

$ cut -d: -f 7 /etc/passwd | sort | uniq -c
     17 /bin/bash
      1 /bin/sync
      1 /sbin/halt
     44 /sbin/nologin
      1 /sbin/shutdown
      1 /usr/sbin/nologin

Notice how many accounts cannot log in because they’re assigned the /sbin/nologin shell. These are, of course, accounts associated with system services.
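
To rank the shells from most to least used, the counts can be passed through a numeric reverse sort. A sketch using a made-up passwd-style file (the accounts below are hypothetical):

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
EOF

# Count each shell, then sort so the most common comes first;
# awk strips the padding uniq adds, leaving "count shell"
shell_counts=$(cut -d: -f7 "$tmpdir/passwd.sample" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

printf '%s\n' "$shell_counts"
rm -r "$tmpdir"
```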

Selecting by words

You can also use the cut command to select single or multiple words or strings from a file. Just remember that you need to specify the delimiter if the words or strings are not separated by tabs. The two commands below show different amounts of each line. The first (delimited by blanks) displays the first field. The second (delimited by commas) displays all of the text up to the first comma.

$ cut -d' ' -f1 addresses
7610
6803
1089
3833
$ cut -d, -f1 addresses
7610 West Park Drive
6803 Gravel Branch Rd
1089 Plymouth Drive
3833 Abingdon Circle
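
One thing to watch for with blank-delimited text: cut treats every delimiter as a field boundary, so a run of several spaces yields empty fields. A common workaround, sketched here with made-up input, is to squeeze repeated spaces with tr -s first:

```shell
line='7610   West Park Drive'   # three spaces after the number

# Field 2 is the empty string between the first and second space
raw_field=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Squeezing runs of spaces makes field 2 the first word after the number
squeezed_field=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

printf '[%s] [%s]\n' "$raw_field" "$squeezed_field"
```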

If we ask for the first field without specifying a delimiter, cut looks for tabs. Lines that contain no tab are printed in full, so we see entire lines from any file that is not delimited by tabs.

$ cut -f1 addresses
7610 West Park Drive, Hyattsville, MD 20783
6803 Gravel Branch Rd, Hurlock, MD 21643
1089 Plymouth Drive, Rahway, NJ 07065
3833 Abingdon Circle, Norfolk, VA 23513
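
The -s (--only-delimited) option changes this behavior by suppressing lines that do not contain the delimiter. A minimal sketch with made-up input in which only the first line contains a tab:

```shell
# Line 1 has a tab; line 2 does not
input=$(printf 'alpha\tbeta\nno tabs here')

# Without -s, the tab-free line is passed through whole
without_s=$(printf '%s\n' "$input" | cut -f1)

# With -s, the tab-free line is dropped
with_s=$(printf '%s\n' "$input" | cut -s -f1)

printf '%s\n---\n%s\n' "$without_s" "$with_s"
```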

Selecting by characters

To select part of each line using character ranges, you can do something like this:

$ cut -c1-3 weekdays
Sun
Mon
Tue
Wed
Thu
Fri
Sat

This displays the first three letters of each line of a file that lists the days of the week.
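
Character positions accept the same list syntax as fields. As a sketch with input supplied on standard input instead of the weekdays file, -c1,3 picks two separate positions and -c4- takes everything from the fourth character on:

```shell
# First and third characters of each line
first_and_third=$(printf 'Sunday\nMonday\n' | cut -c1,3)

# Fourth character through the end of each line
rest=$(printf 'Sunday\nMonday\n' | cut -c4-)

printf '%s\n' "$first_and_third" "$rest"
```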

Selecting by bytes

You can also ask cut to select by bytes. Unless your data file includes characters that occupy more than a single byte, you will not see any difference from selecting by characters. In this example, we do see a difference because the £ sign occupies two bytes in UTF-8.

$ cut -b1-23 cost
That biscuit cost me 2▒
$ cut -c1-23 cost
That biscuit cost me 2£

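In the first command above, the output ends with a placeholder block because the selection stops after the first byte of the £ sign. In the second, we select by character, so both bytes are included. (Some builds of GNU cut treat -c the same as -b; if yours does, the second command will behave like the first.) We could also have selected by bytes and added one more byte: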

$ cut -b1-24 cost
That biscuit cost me 2£
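
One way to check where a multibyte character falls is to count the bytes in the line with wc -c. A sketch, assuming UTF-8 encoding; the pound sign is written as the octal escapes \302\243 so the example does not depend on how this text itself is encoded, and the file is recreated in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf 'That biscuit cost me 2\302\243\n' > "$tmpdir/cost"

# 24 bytes of text (22 single-byte characters plus the two-byte pound sign)
# and one newline
byte_count=$(wc -c < "$tmpdir/cost" | tr -d ' ')

# Stopping at byte 22 ends cleanly just before the pound sign
before_pound=$(cut -b1-22 "$tmpdir/cost")

printf '%s bytes; first 22: %s\n' "$byte_count" "$before_pound"
rm -r "$tmpdir"
```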

Reversing your request

You can also select an option to reverse the output from your cut request. This doesn’t mean displaying it in reverse order, but “doing the opposite”. Selecting the first four characters of each line is one thing. Selecting everything but those characters is its “complement”. Here’s an example:

$ cut -b1-4 addresses
7610
6803
1089
3833
$ cut --complement -b1-4 addresses
 West Park Drive, Hyattsville, MD 20783
 Gravel Branch Rd, Hurlock, MD 21643
 Plymouth Drive, Rahway, NJ 07065
 Abingdon Circle, Norfolk, VA 23513
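
The --complement option, which is a GNU extension, works with fields as well. A sketch on made-up comma-separated input that drops the second field and keeps the rest:

```shell
# Remove field 2 (the city) and keep everything else
kept=$(printf '7610 West Park Drive,Hyattsville,MD 20783\n' | cut -d, --complement -f2)

printf '%s\n' "$kept"
```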

Wrap-up

The cut command offers a lot of flexibility for selecting portions of each line in a file. Consult the man page for more information on its many options.


Copyright © 2021 IDG Communications, Inc.


