Many ways to sort file content on Linux
The Linux sort command can arrange command output or file content in a lot more ways than you might realize: alphabetically, numerically, by month and randomly are only some of the more interesting choices. In this post, we take a look at some of the more useful sorting options and explain how they differ.
The default
The default sort might seem fairly straightforward, though the exact order depends on your locale settings. In a typical locale such as en_US.UTF-8, digits come first, followed by letters and, for each letter, lowercase characters precede uppercase characters. You can expect to see this kind of ordering:
012345aAbBcCdDeE
ASCII order
Looking at the numeric byte values for each of these characters, you may notice that the order shown above is not the “natural order” as far as ASCII is concerned.
$ echo 012345aAbBcCdDeE | od -bc
0000000 060 061 062 063 064 065 141 101 142 102 143 103 144 104 145 105
          0   1   2   3   4   5   a   A   b   B   c   C   d   D   e   E
As you’ll notice in this octal dump of the list of characters, uppercase letters have lower ASCII values and would come before lowercase letters if they were listed in ASCII order. To sort by byte value, prefix your sort command with LC_ALL=C. For example, here’s a comparison of a byte-order sort with the default sort order:
$ LC_ALL=C sort file        $ sort file
0                           0
1                           1
2                           2
3                           3
4                           4
5                           5
A <==                       a <==
B <==                       A <==
C                           b
D                           B
E                           c
a                           C
b                           d
c                           D
d                           e
e                           E
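If you want to try the comparison without creating a file first, a minimal sketch like the one below pipes a few made-up lines straight into sort. The sample values and the byte_order variable name are chosen only for illustration, and the second command's output depends on the locale in effect on your system.

```shell
# Byte-value (ASCII) order: digits, then uppercase, then lowercase.
byte_order=$(printf '%s\n' b A 1 a B 0 | LC_ALL=C sort)
echo "$byte_order"

# Locale-aware order; the result depends on your LANG/LC_* settings.
printf '%s\n' b A 1 a B 0 | sort
```

Setting LC_ALL=C on the command line affects only that one sort command, so the rest of your session keeps its normal locale.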
Numeric order
To sort numerically, you need to use -n. Otherwise, numbers are compared character by character and 100 would sort before 2. Here’s a comparison between a default sort and a numeric sort:
$ sort numbers        $ sort -n numbers
0                     0
1                     1
11                    4
4                     9
44                    11
9                     44
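The 100-versus-2 case mentioned above is easy to reproduce. This sketch uses made-up values and hypothetical variable names, and it forces LC_ALL=C on the first sort only so the character-by-character result is the same on every system.

```shell
# Sample values (made up for this sketch).
values='100
2
11
9'

# Character-by-character comparison: "100" sorts before "2".
text_order=$(printf '%s\n' "$values" | LC_ALL=C sort)
echo "$text_order"

# Numeric comparison: 2 sorts before 100.
numeric_order=$(printf '%s\n' "$values" | sort -n)
echo "$numeric_order"
```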
You can also sort numerically using a “human-friendly” sort order. This allows numbers to carry unit suffixes, such as 5M. The option for this sort order is -h. When you use it, 5K is treated as larger than 500 and smaller than 5M. Here’s a comparison of the default sort and a human-friendly sort:
$ sort numbers        $ sort -h numbers
0                     0
1                     1
11                    4
4                     9
44                    11
500                   44
5K                    500
5M                    5K
9                     5M
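Here is a small sketch of the same idea with made-up sizes. It assumes a sort that supports -h, as GNU sort does. The variable names are hypothetical.

```shell
# Sample sizes (made up for this sketch).
sizes='5M
500
5K
9'

# Plain numbers come first, then values with a K suffix, then M.
human_order=$(printf '%s\n' "$sizes" | sort -h)
echo "$human_order"
```

This is the ordering you would typically want when sorting the output of commands that print sizes with suffixes, such as du -h.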
By month
To sort by month name, you would use the -M option. Here’s an example of a default sort and a sort by month:
$ sort months        $ sort -M months
Apr                  Jan
Aug                  Feb
Dec                  Mar
Feb                  Apr
Jan                  May
Jul                  Jun
Jun                  Jul
Mar                  Aug
May                  Sep
Nov                  Oct
Oct                  Nov
Sep                  Dec
Notice that sorting by month works whether you spell out the names of the months or use abbreviations:
$ sort -M months2
Jan
Feb
March
Apr
May
June
Jul
August
Sep
October
November
Dec
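Understand that a sort by month is not a sort by date. This sort option compares only the month name, so it effectively assumes that all months are in the same year. Lines with the same month are then ordered by an ordinary comparison of the whole line, which is why the 2019 entries marked below end up among the 2020 ones.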
$ sort events                   $ sort -M events
Feb 10 2020 20:06 SOMETHING     Jan 23 2020 10:42 SOMETHING
Feb 11 2020 20:06 SOMETHING     Jan 29 2020 09:17 SOMETHING
Feb 12 2019 11:11 SOMETHING     Feb 10 2020 20:06 SOMETHING
Feb 27 2020 23:05 SOMETHING     Feb 11 2020 20:06 SOMETHING
Jan 23 2020 10:42 SOMETHING     Feb 12 2019 11:11 SOMETHING <==
Jan 29 2020 09:17 SOMETHING     Feb 27 2020 23:05 SOMETHING <==
Jun 26 2019 09:09 SOMETHING     Jun 26 2019 09:09 SOMETHING
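If you do need a true chronological order for lines laid out like these, one possible approach is to sort on several fields at once with the -k option, which this post doesn't otherwise cover. The sketch below assumes the exact layout shown above (month, day, year, time) and uses a shortened, made-up event list. LC_ALL=C is set so that the English month abbreviations are recognized regardless of your locale.

```shell
# A few sample events in the same layout as the listing above.
events='Feb 10 2020 20:06 SOMETHING
Feb 12 2019 11:11 SOMETHING
Jan 23 2020 10:42 SOMETHING
Jun 26 2019 09:09 SOMETHING'

# Sort by year (field 3, numeric), then month (field 1, month order),
# then day (field 2, numeric), then time (field 4, ignoring leading blanks).
chronological=$(printf '%s\n' "$events" |
  LC_ALL=C sort -k3,3n -k1,1M -k2,2n -k4b,4)
echo "$chronological"
```

Each -k option names one field and the kind of comparison to use for it, and sort moves on to the next key only when the earlier ones are equal.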
Reversing listings
To reverse the order of your sorted listings, add the -r option. Here’s a reverse listing of the months and human-readable numbers files:
$ sort -Mr months        $ sort -hr numbers
Dec                      5M
Nov                      5K
Oct                      500
Sep                      44
Aug                      11
Jul                      9
Jun                      4
May                      1
Apr                      0
Mar
Feb
Jan
Random sorting
To sort text in a pseudorandom fashion, use -R with your sort command. Here are some of the earlier files sorted using the random option:
$ sort -R months        $ sort -R numbers
Aug                     500
Nov                     4
Dec                     44
Sep                     5M
Apr                     0
Jan                     1
Jul                     5K
Jun                     11
May                     9
Mar
Feb
Oct
Another way to sort data randomly is to use the shuf (for “shuffle”) command. Here are a couple of examples using data from earlier in this post:
$ shuf months        $ shuf numbers
Nov                  0
Jun                  4
May                  500
Aug                  5K
Apr                  11
Dec                  44
Jul                  1
Feb                  9
Mar                  5M
Oct
Sep
Jan
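The two commands behave differently when the input contains duplicate lines, which the files above do not. The sketch below uses made-up data and hypothetical variable names, and it assumes GNU sort and shuf. With GNU sort, -R orders lines by a random hash of each line, so identical lines always land next to each other, while shuf produces a random permutation of all lines.

```shell
# Sample data with repeated lines (made up for this sketch).
lines='apple
banana
apple
cherry
apple'

# sort -R: the order of the groups is random, but identical lines stay together.
grouped=$(printf '%s\n' "$lines" | sort -R)
echo "$grouped"

# shuf: a random permutation, so duplicates can land anywhere.
printf '%s\n' "$lines" | shuf

# Collapsing adjacent duplicates in the sort -R output leaves one line
# per distinct value, which confirms that the duplicates were grouped.
distinct=$(printf '%s\n' "$grouped" | uniq | wc -l)
echo "$distinct"
```

If you want every line shuffled independently, shuf is usually the better fit.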
Sorting command output
You can also pipe data to any of the sort commands shown. The command below might not be particularly useful, but it demonstrates the point and shows some other commands related to sorting.
$ apropos sort | sort -r
XConsortium (7)      - X Consortium information
versionsort (3)      - scan a directory for matching entries
tsort (1)            - perform topological sort
sort (1)             - sort lines of text files
qsort_r (3)          - sort an array
qsort (3)            - sort an array
comm (1)             - compare two sorted files line by line
bzip2 (1)            - a block-sorting file compressor, v1.0.8
bunzip2 (1)          - a block-sorting file compressor, v1.0.8
bsearch (3)          - binary search of a sorted array
apt-sortpkgs (1)     - Utility to sort package index files
alphasort (3)        - scan a directory for matching entries
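For a pipeline that is a little more practical, you can combine the options covered earlier with other commands. The sketch below is only one possibility: it uses made-up input and hypothetical variable names to count how often each line appears and then ranks the counts with a reversed numeric sort.

```shell
# Made-up input: one command name per line, with repeats.
commands='sort
shuf
sort
comm
shuf
sort'

# sort groups identical lines, uniq -c counts each group,
# and sort -rn ranks the counts from highest to lowest.
ranked=$(printf '%s\n' "$commands" | sort | uniq -c | sort -rn)
echo "$ranked"

# The most frequent entry is on the first line.
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
echo "$top"
```

The first sort is needed because uniq only collapses lines that are adjacent.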
Copyright © 2020 IDG Communications, Inc.