Many ways to sort file content on Linux


The Linux sort command can arrange command output or file content in more ways than you might realize: alphabetically, numerically, by month and randomly are only some of the more interesting choices. In this post, we take a look at some of the more useful sorting options and explain how they differ.

The default

The default sort might seem fairly straightforward. In most locales (en_US.UTF-8, for example), digits come first, followed by letters and, for each letter, lowercase characters precede uppercase characters. You can expect to see this kind of ordering:

012345aAbBcCdDeE

ASCII order

Looking at the numeric byte values for each of these characters, you may note that what you see above is not the “natural order” as far as ASCII is concerned.

$ echo 012345aAbBcCdDeE | od -bc
0000000 060 061 062 063 064 065 141 101 142 102 143 103 144 104 145 105
          0   1   2   3   4   5   a   A   b   B   c   C   d   D   e   E

As you’ll notice in this octal dump of the list of characters, uppercase letters have lower ASCII values than lowercase letters and would come before them if the characters were listed in ASCII order. To sort by byte value, prefix your sort command with LC_ALL=C. For example, here’s a comparison of a byte-order sort with the default sort order:

$ LC_ALL=C sort file		$ sort file
0				0
1				1
2				2
3				3
4				4
5				5
A <==				a <==
B <==				A <==
C				b
D				B
E				c
a				C
b				d
c				D
d				e
e				E
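
If you’d like to try this yourself, here is a minimal sketch. The scratch file created with mktemp and its contents are made up for illustration; only the LC_ALL=C prefix comes from the discussion above.

```shell
# Create a scratch file; its name and contents are illustrative only.
sample=$(mktemp)
printf '%s\n' b A 1 a B 0 > "$sample"

# Byte order: digits, then uppercase letters, then lowercase letters.
byte_order=$(LC_ALL=C sort "$sample")
printf '%s\n' "$byte_order"
# prints 0, 1, A, B, a, b (one per line)

rm -f "$sample"
```

Because LC_ALL=C is set only for that one command, the rest of your shell session keeps its usual locale.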
 

Numeric order

To sort numerically, you need to use -n. Otherwise, numbers are compared character by character, and 100 would sort ahead of 2 because 1 comes before 2. Here’s a comparison between a default sort and a numeric sort:

$ sort numbers			$ sort -n numbers
0                               0
1                               1
11                              4
4                               9
44                              11
9                               44
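
The 100-versus-2 case can be checked with a quick sketch like this one. The three input values are just sample data fed in through printf.

```shell
# Default sort compares character by character, so "100" lands ahead of "2".
by_char=$(printf '%s\n' 2 100 11 | sort)
printf '%s\n' "$by_char"      # 100, 11, 2 (one per line)

# -n compares the values as numbers instead.
by_value=$(printf '%s\n' 2 100 11 | sort -n)
printf '%s\n' "$by_value"     # 2, 11, 100 (one per line)
```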

You can also sort numerically using a “human-friendly” sort order. This allows numbers to carry a unit suffix, such as 5K or 5M. The option for this sort order is -h. When you use it, 5K is treated as larger than 500 and smaller than 5M. Here’s a comparison of the default sort and a human-friendly sort:

$ sort numbers			$ sort -h numbers
0                               0
1                               1
11                              4
4                               9
44                              11
500                             44
5K                              500
5M                              5K
9                               5M
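
Here is a small sketch of -h at work, assuming GNU sort. The sample values are invented. The second command illustrates a caveat worth knowing: the suffix is compared before the number, so values are not converted to a common unit.

```shell
# Suffix-aware ordering: no suffix, then K, then M (assumes GNU sort).
sized=$(printf '%s\n' 5M 500 5K 44 | sort -h)
printf '%s\n' "$sized"        # 44, 500, 5K, 5M (one per line)

# The suffix wins over the number, so 2000K still sorts ahead of 1M.
mixed=$(printf '%s\n' 1M 2000K | sort -h)
printf '%s\n' "$mixed"        # 2000K, 1M (one per line)
```

This works best on output that is already scaled consistently, such as the sizes reported by du -sh * piped through sort -h.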

By month

To sort by month name, you would use the -M option. Here’s an example of a default sort and a sort by month:

$ sort months		            $ sort -M months
Apr                                 Jan
Aug                                 Feb
Dec                                 Mar
Feb                                 Apr
Jan                                 May
Jul                                 Jun
Jun                                 Jul
Mar                                 Aug
May                                 Sep
Nov                                 Oct
Oct                                 Nov
Sep                                 Dec

Notice that sorting by month works whether you spell out the names of the months or use abbreviations, because GNU sort only needs the leading abbreviation to match:

$ sort -M months2
Jan
Feb
March
Apr
May
June
Jul
August
Sep
October
November
Dec
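
The sketch below, which assumes GNU sort, shows a few related behaviors: month names are matched without regard to case, and anything that isn’t a month name sorts ahead of the real months. The word “Smarch” is a deliberately invalid entry added for illustration.

```shell
# LC_ALL=C pins the English month abbreviations for this example.
months_sorted=$(printf '%s\n' March Jan feb Smarch DEC | LC_ALL=C sort -M)
printf '%s\n' "$months_sorted"
# prints Smarch, Jan, feb, March, DEC (one per line)
```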

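Understand that a sort by month is not a sort by date. The -M option compares only the month name and ignores the year, so it is only reliable when all of the dates fall in the same year. In the example below, the 2019 entries end up mixed in with those from 2020: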

$ sort events			    $ sort -M events
Feb 10 2020 20:06 SOMETHING         Jan 23 2020 10:42 SOMETHING
Feb 11 2020 20:06 SOMETHING         Jan 29 2020 09:17 SOMETHING
Feb 12 2019 11:11 SOMETHING         Feb 10 2020 20:06 SOMETHING
Feb 27 2020 23:05 SOMETHING         Feb 11 2020 20:06 SOMETHING
Jan 23 2020 10:42 SOMETHING         Feb 12 2019 11:11 SOMETHING <==
Jan 29 2020 09:17 SOMETHING         Feb 27 2020 23:05 SOMETHING <==
Jun 26 2019 09:09 SOMETHING         Jun 26 2019 09:09 SOMETHING
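
If you do need chronological order for lines like these, one approach is to give sort several keys. This is only a sketch, assuming GNU sort and the “month day year” layout shown above; the scratch file is hypothetical and holds a few of the sample lines.

```shell
# Scratch copy of a few of the sample lines; the file name is illustrative.
events=$(mktemp)
cat > "$events" <<'EOF'
Feb 10 2020 20:06 SOMETHING
Jun 26 2019 09:09 SOMETHING
Jan 23 2020 10:42 SOMETHING
Feb 12 2019 11:11 SOMETHING
EOF

# Year (field 3, numeric), then month (field 1), then day (field 2, numeric).
# LC_ALL=C pins the English month abbreviations.
by_date=$(LC_ALL=C sort -k3,3n -k1,1M -k2,2n "$events")
printf '%s\n' "$by_date"
# Feb 12 2019 11:11 SOMETHING
# Jun 26 2019 09:09 SOMETHING
# Jan 23 2020 10:42 SOMETHING
# Feb 10 2020 20:06 SOMETHING

rm -f "$events"
```

Each -k option names one field and how to compare it, and the keys are applied in the order given.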

Reversing listings

To reverse the order of your sorted listings, add the -r option. Here’s a reverse listing of the months and human-readable numbers files:

$ sort -Mr months                   $ sort -hr numbers
Dec                                 5M
Nov                                 5K
Oct                                 500
Sep                                 44
Aug                                 11
Jul                                 9
Jun                                 4
May                                 1
Apr                                 0
Mar
Feb
Jan                           
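
The -r option combines with the other ordering options, as in this short sketch using made-up input:

```shell
# Numeric sort, largest value first.
largest_first=$(printf '%s\n' 4 11 9 0 | sort -nr)
printf '%s\n' "$largest_first"   # 11, 9, 4, 0 (one per line)

# Month sort, latest month first (LC_ALL=C pins English month names).
latest_first=$(printf '%s\n' Jan Dec Jun | LC_ALL=C sort -Mr)
printf '%s\n' "$latest_first"    # Dec, Jun, Jan (one per line)
```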

Random sorting

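To sort text in a pseudorandom fashion, use -R with your sort command. Because -R orders lines by a random hash of their content, identical lines always end up next to each other. Here are the earlier files sorted using the random option: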

$ sort -R months		    $ sort -R numbers
Aug                                 500
Nov                                 4
Dec                                 44
Sep                                 5M
Apr                                 0
Jan                                 1
Jul                                 5K
Jun                                 11
May                                 9
Mar
Feb
Oct
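
One thing to keep in mind is how GNU sort -R treats duplicate lines. The sketch below, with invented input, shows that identical lines stay together: the order of the groups changes from run to run, but the output always contains exactly two runs.

```shell
# Five lines, but only two distinct values.
shuffled=$(printf '%s\n' a b a b a | sort -R)
printf '%s\n' "$shuffled"

# uniq collapses adjacent duplicates, so this counts the runs of equal lines.
runs=$(printf '%s\n' "$shuffled" | uniq | wc -l)
printf 'runs: %d\n' "$runs"      # runs: 2
```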

Another way to sort data randomly is to use the shuf (for “shuffle”) command, which shuffles lines without regard to their content. Here are a couple of examples using data from earlier examples in this post:

$ shuf months			    $ shuf numbers
Nov                                 0
Jun                                 4
May                                 500
Aug                                 5K
Apr                                 11
Dec                                 44
Jul                                 1
Feb                                 9
Mar                                 5M
Oct
Sep
Jan 
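
The shuf command has a few options of its own. As a sketch (assuming GNU shuf and made-up input), -n limits the output to a given number of randomly chosen lines:

```shell
# Pick two lines at random; which two you get varies from run to run.
picked=$(printf '%s\n' Jan Feb Mar Apr May | shuf -n 2)
printf '%s\n' "$picked"
picked_count=$(printf '%s\n' "$picked" | wc -l)

# Shuffling only reorders lines: sorting the result gives back the same set.
restored=$(printf '%s\n' c a b | shuf | LC_ALL=C sort)
printf '%s\n' "$restored"        # a, b, c (one per line)
```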

Sorting command output

You can also pipe data to any of the sort commands shown. The command below might not be particularly useful, but it demonstrates the point and shows some other commands related to sorting.

$ apropos sort | sort -r
XConsortium (7)      - X Consortium information
versionsort (3)      - scan a directory for matching entries
tsort (1)            - perform topological sort
sort (1)             - sort lines of text files
qsort_r (3)          - sort an array
qsort (3)            - sort an array
comm (1)             - compare two sorted files line by line
bzip2 (1)            - a block-sorting file compressor, v1.0.8
bunzip2 (1)          - a block-sorting file compressor, v1.0.8
bsearch (3)          - binary search of a sorted array
apt-sortpkgs (1)     - Utility to sort package index files
alphasort (3)        - scan a directory for matching entries
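
A more practical pipeline chains two sorts around uniq -c to count repeated lines. This is only a sketch with invented input; in real use, the printf would be replaced by whatever command produces your data.

```shell
# The first sort groups identical lines so that uniq -c can count them;
# the second sort (-nr) puts the highest count at the top.
most_common=$(printf '%s\n' pear apple fig apple pear apple |
  sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')
printf '%s\n' "$most_common"     # apple
```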

Copyright © 2020 IDG Communications, Inc.


