Using the Linux cut command to grab portions of lines from files
One surprisingly easy command for grabbing a portion of every line in a text file on a Linux system is cut. It works something like awk in that it allows you to select only what you want to see from files, enabling you to pull fields (separated by tabs or any other single-character delimiter), characters or bytes. To check which version of cut is installed, you can ask like this:
$ cut --version
cut (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David M. Ihnat, David MacKenzie, and Jim Meyering.
Selecting by field
To illustrate how the cut command works, we’ll first run commands using a sample “cities” file that contains details of the largest cities in the US in a tab-separated format. The lines in this file look something like what is shown below:
$ head -5 cities
Rank    Name    State    2021 Pop.    2010 Census    Change    Density (mi²)    Area (mi²)
1    New York City    New York    8,230,290    8,190,210    4,083    300
2    Los Angeles    California    3,983,540    3,795,510    1,266    469
3    Chicago    Illinois    2,679,080    2,697,480    1,756    227
4    Houston    Texas    2,323,660    2,100,280    541    640
To select a particular field from this file, you might use a command like this that shows the 4th field:
$ cut -f 4 cities | head -11
2021 Pop.
8,230,290
3,983,540
2,679,080
2,323,660
1,733,630
1,585,010
1,581,730
1,427,720
1,347,120
1,011,790
To add the city names to your selection, you would select the 2nd and 4th fields. Since the tab character is the default delimiter for the cut command, it easily extracts these fields.
$ cut -f2,4 cities | head -11
Name            2021 Pop.
New York City   8,230,290
Los Angeles     3,983,540
Chicago         2,679,080
Houston         2,323,660
Phoenix         1,733,630
Philadelphia    1,585,010
San Antonio     1,581,730
San Diego       1,427,720
Dallas          1,347,120
Austin          1,011,790
The string -f1-4 would display the first four fields in the file:
$ cut -f1-4 cities | head -5
Rank    Name            State        2021 Pop.
1       New York City   New York     8,230,290
2       Los Angeles     California   3,983,540
3       Chicago         Illinois     2,679,080
4       Houston         Texas        2,323,660
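The field-list forms used so far (a single field, a comma-separated list and a range) can be tried on a throwaway file. This is only a sketch: the three-line file below is a hypothetical stand-in for the "cities" file, and the variable names are made up for illustration.

```shell
# Hypothetical three-line stand-in for the tab-separated "cities" file.
workdir=$(mktemp -d)
printf 'Rank\tName\tState\t2021 Pop.\n'          >  "$workdir/cities"
printf '1\tNew York City\tNew York\t8,230,290\n' >> "$workdir/cities"
printf '2\tLos Angeles\tCalifornia\t3,983,540\n' >> "$workdir/cities"

names=$(cut -f2 "$workdir/cities")          # one field
name_pop=$(cut -f2,4 "$workdir/cities")     # a list of fields
first_three=$(cut -f1-3 "$workdir/cities")  # a closed range
from_third=$(cut -f3- "$workdir/cities")    # an open-ended range: field 3 to the end

# cut prints fields in the order they appear in the file, so asking for
# -f4,2 gives the same result as -f2,4.
reordered=$(cut -f4,2 "$workdir/cities")

printf '%s\n' "$name_pop"
rm -r "$workdir"
```

Because cut never reorders fields, a tool such as awk is the usual choice when the output columns need to appear in a different order than in the file.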
To specify a different delimiter, you could add the -d option and use a command like this one, which pulls usernames from the /etc/passwd file:
$ cut -d: -f 1 /etc/passwd | head -10
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator
To see both login names and assigned shells, try this:
$ cut -d: -f 1,7 /etc/passwd | head -10
root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
lp:/sbin/nologin
sync:/bin/sync
shutdown:/sbin/shutdown
halt:/sbin/halt
mail:/sbin/nologin
operator:/sbin/nologin
The command above selects the 1st and 7th fields.
To count how many accounts use each of the shells, use a command like this:
$ cut -d: -f 7 /etc/passwd | sort | uniq -c
     17 /bin/bash
      1 /bin/sync
      1 /sbin/halt
     44 /sbin/nologin
      1 /sbin/shutdown
      1 /usr/sbin/nologin
Notice how many accounts cannot log in because they’re assigned the /sbin/nologin shell. These are, of course, accounts associated with system services.
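To experiment with the -d option without depending on the contents of a real /etc/passwd, the same commands can be run against a small sample. The three-line file and the variable names below are hypothetical; the awk step is only there to strip the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample in /etc/passwd format (seven colon-separated fields).
workdir=$(mktemp -d)
cat > "$workdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
EOF

# Field 1 is the login name; field 7 is the login shell.
logins=$(cut -d: -f1 "$workdir/passwd.sample")
login_shell=$(cut -d: -f1,7 "$workdir/passwd.sample")

# Count accounts per shell, then normalize "   2 /sbin/nologin" to "2 /sbin/nologin".
shell_counts=$(cut -d: -f7 "$workdir/passwd.sample" | sort | uniq -c | awk '{print $1, $2}')

printf '%s\n' "$shell_counts"
rm -r "$workdir"
```

Note that when several fields are selected, cut joins them with the same delimiter that was given with -d, which is why the login name and shell are still separated by a colon.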
Selecting by words
You can also use the cut command to select single and multiple words or strings from a file. Just remember that you need to specify the delimiter if the words or strings are not separated by tabs. The two commands below show different portions of each line. The first (delimited by blanks) displays the first field. The second (delimited by commas) displays all of the text up to the first comma.
$ cut -d' ' -f1 addresses
7610
6803
1089
3833
$ cut -d, -f1 addresses
7610 West Park Drive
6803 Gravel Branch Rd
1089 Plymouth Drive
3833 Abingdon Circle
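If we asked for the first field without specifying a delimiter, we would see entire lines in any file that is not delimited by tabs, because cut prints lines that contain no delimiter unchanged (adding the -s option suppresses such lines instead).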
$ cut -f1 addresses
7610 West Park Drive, Hyattsville, MD 20783
6803 Gravel Branch Rd, Hurlock, MD 21643
1089 Plymouth Drive, Rahway, NJ 07065
3833 Abingdon Circle, Norfolk, VA 23513
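The effect of the delimiter choice is easy to check on a two-line copy of the addresses data. This is a small sketch with made-up variable names; it also shows the -s option, which tells cut to drop lines that do not contain the delimiter rather than print them whole.

```shell
# Two lines taken from the "addresses" example, written to a scratch file.
workdir=$(mktemp -d)
printf '%s\n' '7610 West Park Drive, Hyattsville, MD 20783' \
              '6803 Gravel Branch Rd, Hurlock, MD 21643' > "$workdir/addresses"
original=$(cat "$workdir/addresses")

house_numbers=$(cut -d' ' -f1 "$workdir/addresses")  # text before the first blank
streets=$(cut -d, -f1 "$workdir/addresses")          # text before the first comma
towns=$(cut -d, -f2 "$workdir/addresses")            # keeps the blank after the comma

# No tabs in the file: the default delimiter never matches, so -f1 prints
# every line in full, while -s suppresses those lines and prints nothing.
whole_lines=$(cut -f1 "$workdir/addresses")
suppressed=$(cut -s -f1 "$workdir/addresses")

printf '%s\n' "$towns"
rm -r "$workdir"
```

Since -d accepts only a single character, the blank that follows each comma stays attached to the second field; trimming it takes a further step with another tool.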
Selecting by characters
To select part of each line by character position, you can do something like this:
$ cut -c1-3 weekdays
Sun
Mon
Tue
Wed
Thu
Fri
Sat
This displays the first three letters of each line of a file that lists the days of the week.
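Character lists accept the same forms as field lists: single positions, ranges and open-ended ranges. The sketch below assumes a "weekdays" file that holds full day names (the article does not show its contents), uses only ASCII text, and introduces variable names purely for illustration.

```shell
# Assumed contents for a shortened "weekdays" file: one full day name per line.
workdir=$(mktemp -d)
printf '%s\n' Sunday Monday Tuesday > "$workdir/weekdays"

abbrev=$(cut -c1-3 "$workdir/weekdays")      # characters 1 through 3
remainder=$(cut -c4- "$workdir/weekdays")    # character 4 to the end of the line
first_last=$(cut -c1,6 "$workdir/weekdays")  # characters 1 and 6 only

# cut also reads standard input when no file is named.
piped=$(printf 'Wednesday\n' | cut -c1-3)

printf '%s\n' "$abbrev"
rm -r "$workdir"
```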
Selecting by bytes
You can ask cut to select by bytes. Unless your data file includes characters that occupy more than a single byte, you will not see any difference between selecting by bytes and selecting by characters. In this example, we do see a difference because the £ sign occupies two bytes in UTF-8.
$ cut -b1-23 cost
That biscuit cost me 2▒
$ cut -c1-23 cost
That biscuit cost me 2£
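In the first command above, the response ends with a placeholder block because cut selected only the first byte of the £ sign, which is not a valid character on its own. In the second, we select by character, so it uses both bytes. (This depends on a cut build whose -c option is aware of multibyte characters; some builds of GNU cut treat -c the same as -b, in which case both commands give the same result.) We could also have just done this and added one more byte: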
$ cut -b1-24 cost
That biscuit cost me 2£
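The byte arithmetic can be confirmed by counting. The sketch below assumes the "cost" file is UTF-8 encoded, where £ is stored as the two bytes C2 A3 (written as octal escapes so the example does not depend on the terminal's encoding). It sticks to -b because the behavior of -c on multibyte text varies between cut builds; the variable names are hypothetical.

```shell
# Recreate the "cost" line; \302\243 are the two UTF-8 bytes of the pound sign.
workdir=$(mktemp -d)
printf 'That biscuit cost me 2\302\243\n' > "$workdir/cost"

# 22 ASCII bytes + 2 bytes for the pound sign + 1 newline = 25 bytes.
total_bytes=$(wc -c < "$workdir/cost" | tr -d ' ')

# -b1-23 stops in the middle of the pound sign: 23 bytes plus the newline cut adds.
split_bytes=$(cut -b1-23 "$workdir/cost" | wc -c | tr -d ' ')

ascii_only=$(cut -b1-22 "$workdir/cost")  # stops just before the pound sign
whole_sign=$(cut -b1-24 "$workdir/cost")  # includes both bytes of the pound sign

printf '%s %s\n' "$total_bytes" "$split_bytes"
rm -r "$workdir"
```

Counting bytes this way is a quick check before choosing a -b range for text that may contain non-ASCII characters.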
Reversing your request
You can also select an option to reverse the output from your cut request. This doesn’t mean displaying it in reverse order, but means “doing the opposite”. Selecting the first four characters of each line is one thing. Selecting everything but those characters is its “complement”. The --complement option, a GNU extension, does exactly that. Here’s an example:
$ cut -b1-4 addresses
7610
6803
1089
3833
$ cut --complement -b1-4 addresses
 West Park Drive, Hyattsville, MD 20783
 Gravel Branch Rd, Hurlock, MD 21643
 Plymouth Drive, Rahway, NJ 07065
 Abingdon Circle, Norfolk, VA 23513
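As a rough sketch of how the complement relates to an ordinary selection, the commands below compare --complement with an equivalent open-ended range and also apply it to fields. It assumes GNU cut, since --complement is not part of POSIX, and reuses two lines of the addresses data with made-up variable names.

```shell
# Two lines from the "addresses" example in a scratch file.
workdir=$(mktemp -d)
printf '%s\n' '7610 West Park Drive, Hyattsville, MD 20783' \
              '6803 Gravel Branch Rd, Hurlock, MD 21643' > "$workdir/addresses"

# Everything except bytes 1-4. Byte 5 is the blank after the house number,
# so each result line starts with a space.
rest=$(cut --complement -b1-4 "$workdir/addresses")

# The same selection written as an open-ended range, which works without --complement.
same=$(cut -b5- "$workdir/addresses")

# --complement also works with fields: drop the street, keep everything else.
no_street=$(cut -d, --complement -f1 "$workdir/addresses")

printf '%s\n' "$no_street"
rm -r "$workdir"
```

The option is most useful when the part to discard is easier to describe than the part to keep, such as dropping one field from the middle of a long record.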
Wrap-up
The cut command offers a lot of flexibility for selecting portions of each line in a file. Consult the man page for more information on its many options.
Copyright © 2021 IDG Communications, Inc.