Finding files on Linux with the longest names
File names on Linux systems can be as long as 255 characters (strictly, 255 bytes on common file systems such as ext4, so names that use multibyte characters hold fewer characters). While determining which files in a directory have the longest names might not be the most exciting task at hand, doing it with a script poses some interesting challenges that invite equally interesting solutions.
To start, consider piping the output of the ls command, which lists files, to the wc -c command, which counts bytes (one per character for plain ASCII names), like this:
$ ls myreport.txt | wc -c
13
If you count the characters in “myreport.txt”, you’ll find there are 12, not 13. The extra one is the newline that ls adds to the end of its output. echo does the same thing in the command below, and wc counts that newline along with everything else.
$ echo hello | wc -c
6
You can see this issue more clearly by passing the same output to the od -bc command. It makes the inclusion of the newline very obvious.
$ echo hello | od -bc
0000000 150 145 154 154 157 012
          h   e   l   l   o  \n    <=== There it is!
0000006
To avoid the extra character, add echo’s -n option, which tells it not to output the trailing newline.
$ echo -n hello | wc -c
5
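Besides echo -n, two other approaches avoid the stray newline. The sketch below wraps all three in small helper functions; the function names are hypothetical, chosen only for illustration. printf '%s' never appends a newline and won’t mistake a name such as -n for an option, while bash’s ${#var} expansion skips wc entirely, though it counts characters in the current locale whereas wc -c counts bytes.

```bash
#!/bin/bash
# Hypothetical helpers: each prints the length of its first argument
# without counting a trailing newline.

# echo -n suppresses echo's trailing newline (the approach shown above).
name_len_echo() { echo -n "$1" | wc -c; }

# printf '%s' never adds a newline, and a name like "-n" is printed as
# data rather than being treated as an option.
name_len_printf() { printf '%s' "$1" | wc -c; }

# ${#1} is expanded by the shell itself; it counts characters in the
# current locale, whereas wc -c counts bytes.
name_len_shell() { echo "${#1}"; }

name_len_printf myreport.txt    # prints 12
```

For plain ASCII names like myreport.txt, all three report the same length.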
If you try a command like the one below, you’ll quickly see that the period is taken literally rather than expanded into the files in the current directory. The resulting “.” followed by a newline yields a length of 2.
$ for file in .
do
    echo $file | wc -c
done
2
The command below will generate a list with file names and lengths, but it has one serious problem. It will break file names including blanks into a number of parts and report the lengths of each part separately.
$ for file in `ls`
do
    echo -n "$file "
    echo -n $file | wc -c
done
Here’s an example:
$ for file in `ls Speed*`
do
    echo -n "$file "
    echo -n $file | wc -c
done
Speeding 8
up 2
scripts 7
using 5
parallelization 15
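To see the difference between the two kinds of loop without touching real files, the sketch below creates a throwaway directory with mktemp -d, adds one file whose name contains blanks, and counts how many items each loop receives. The directory and counter variable names are illustrative.

```bash
#!/bin/bash
# Compare how many loop iterations `ls` output and a * glob produce
# for a single file whose name contains blanks.
demo=$(mktemp -d)
touch "$demo/Speeding up scripts using parallelization"
cd "$demo" || exit 1

split_count=0
for file in `ls`; do          # ls output is split on blanks: 5 words
    split_count=$((split_count + 1))
done

glob_count=0
for file in *; do             # the glob yields each name intact: 1 item
    glob_count=$((glob_count + 1))
done

echo "ls: $split_count, glob: $glob_count"   # prints "ls: 5, glob: 1"
cd / && rm -rf "$demo"
```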
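In contrast, the command below lists every file in the current directory, each followed by the length of its name. Because the shell expands the * wildcard into complete file names, blanks and all, no name gets broken apart.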
$ for file in *
do
    echo -n "$file "
    echo -n "$file" | wc -c
done
The extra blank in the first echo command leaves a space between each file name and its length, so each line of output looks like this:
hello 5
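The loop above measures names with echo, which can trip over a file called -n. A somewhat sturdier sketch, wrapped in a hypothetical list_name_lengths function that takes an optional directory argument, uses printf and the shell’s own ${#file} length instead. Note that ${#file} counts characters in the current locale rather than bytes.

```bash
#!/bin/bash
# Hypothetical helper: print "name length" for each entry in a directory
# (default: the current one). Hidden files are skipped, as with *.
list_name_lengths() {
    local file
    for file in "${1:-.}"/*; do
        # In an empty directory the pattern stays unexpanded; skip it.
        [ -e "$file" ] || [ -L "$file" ] || continue
        file=${file##*/}                  # drop everything up to the last "/"
        printf '%s %d\n' "$file" "${#file}"
    done
}
```

Running list_name_lengths /etc, for example, prints one line per visible entry in /etc.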
Put the length first on each line and pipe the output to sort -n, and the command will list the files in order of file name length.
$ for file in *; do len=`echo -n "$file" | wc -c`; echo "$len $file"; done | sort -n
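If you’d rather not write a shell loop at all, a pipeline of GNU find and awk can produce a similar sorted list. This is a swapped-in technique, not the one used above: find’s -printf '%f\n' prints each entry’s base name, awk’s length() supplies the count, and, unlike *, find also includes hidden files. Names containing newline characters would confuse it, and the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list directory entries sorted by name length,
# using GNU find and awk instead of a shell loop.
sort_by_name_length() {
    # -printf '%f\n' prints each entry's base name; awk prepends its
    # length; sort -n orders the lines from shortest to longest.
    find "${1:-.}" -mindepth 1 -maxdepth 1 -printf '%f\n' |
        awk '{ print length($0), $0 }' |
        sort -n
}
```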
Adding tail -1 to the end displays only the length and name of the file with the longest name.
$ for file in *; do len=`echo -n "$file" | wc -c`; echo "$len $file"; done | sort -n | tail -1
41 Speeding up scripts using parallelization
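One limitation of tail -1 is that when two or more files tie for the longest name, only one of them is shown. A sketch that reports every name sharing the maximum length, using a hypothetical longest_names function and bash arrays, might look like this:

```bash
#!/bin/bash
# Hypothetical helper: print "length name" for every entry whose name
# ties for the longest in a directory (default: the current one).
longest_names() {
    local file max=0
    local -a names=()
    for file in "${1:-.}"/*; do
        [ -e "$file" ] || [ -L "$file" ] || continue   # empty-dir pattern
        file=${file##*/}
        if (( ${#file} > max )); then
            max=${#file}
            names=("$file")            # new maximum: start the list over
        elif (( ${#file} == max )); then
            names+=("$file")           # tie: keep this name too
        fi
    done
    for file in "${names[@]}"; do
        printf '%d %s\n' "$max" "$file"
    done
}
```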
The script below prompts for the directory to examine and then displays only the file with the longest name. As it loops through the files, it keeps track of the longest name seen so far and replaces it whenever it finds a longer one. The “for file in $dir/*” loop visits every file without breaking up file names that contain blanks.
It also makes sure that the length recorded is that of the file name alone. The line following the “for file” command uses sed to remove the name of the directory being examined, along with the “/” that follows it, leaving just the file name. Commas serve as the sed delimiter instead of the usual forward slashes, because the directory path being removed may itself contain slashes.
#!/bin/bash
# find file with longest filename
echo -n "dir> "
read -r dir
longestname=0
for file in "$dir"/*; do
    file=`echo "$file" | sed "s,^$dir/,,"`   # strip the directory and "/"
    sz=`echo -n "$file" | wc -c`             # get filename length (no newline)
    if [ "$sz" -gt "$longestname" ]; then
        longestname=$sz
        longname=$file
    fi
done
echo "$longestname: $longname"
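The sed step in the script above can also be handled by the shell itself. In the sketch below (the function names are hypothetical), ${path#"$dir"/} removes the directory prefix. Because that prefix is not a regular expression, a . in the directory name matches only a period, and quoting "$dir" keeps glob characters such as * literal as well. ${path##*/} keeps only what follows the last slash, much like the basename command.

```bash
#!/bin/bash
# Hypothetical helpers for trimming a directory from a path without sed.

# $1 = directory, $2 = path; removes "$1/" from the front of the path.
strip_dir_prefix() { printf '%s\n' "${2#"$1"/}"; }

# $1 = path; removes everything up to and including the last "/".
strip_to_basename() { printf '%s\n' "${1##*/}"; }

strip_dir_prefix ./bin ./bin/loop-days-of-week    # prints loop-days-of-week
strip_to_basename ./bin/loop-days-of-week         # prints loop-days-of-week
```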
Running this script should look something like this:
$ ./LongFname
dir> .
41: Speeding up scripts using parallelization
$ ./LongFname
dir> ./bin
17: loop-days-of-week
Wrap-Up
Looping through a list of files to find those with the longest filenames requires a good understanding of how loops work and how blanks in filenames can complicate the required commands.
Copyright © 2022 IDG Communications, Inc.