Smart ways to compare files on Linux
Commands for comparing files have proliferated since the early days of Linux. In this post, we’ll look at a suite of commands available for comparing files and highlight the advantages that some of the newer ones provide.
diff
One of the oldest and still most popular commands for detecting and reporting file differences is diff. Compare two lists of meeting attendees, for example, and diff will simply and clearly show you the differences.
$ diff attendance-2020 attendance-2021
10,12c10
< Monroe Landry
< Jonathon Moody
< Donnell Moore
---
> Sandra Henry-Stocker
Only the lines that differ are displayed. The 10,12c10 line means that lines 10 through 12 of the first file would have to be changed (c) to line 10 of the second file to make the two match. Lines found only in the first file are preceded by <, and lines found only in the second file are preceded by >.
This output does not show the names of people who attended both meetings, only those who attended just the 2020 meeting or just the 2021 meeting. If you only want to know whether the files differ, add the -q option.
$ diff -q attendance-2020 attendance-2021
Files attendance-2020 and attendance-2021 differ
By default, diff prints nothing at all when two files are the same. If you want confirmation that files are identical, add the -s option. If the two attendance files had identical contents, plain diff would stay silent, while diff -s would confirm it:
$ diff attendance-2020 attendance-2021
$ diff -s attendance-2020 attendance-2021
Files attendance-2020 and attendance-2021 are identical
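diff also reports its verdict through its exit status: 0 when no differences are found, 1 when there are differences, and 2 when something went wrong, such as a missing file. That makes it easy to use in scripts. The following minimal sketch uses shortened, hypothetical copies of the attendance lists in a temporary directory; the compare_files function is an illustrative helper, not part of diff.

```shell
#!/usr/bin/env bash
# Minimal sketch: branch on diff's exit status rather than parsing its output.
# The files are shortened, hypothetical stand-ins for the attendance lists.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' "Kent Farmer" "Monroe Landry" "Leanne Park" > "$workdir/attendance-2020"
printf '%s\n' "Kent Farmer" "Sandra Henry-Stocker" "Leanne Park" > "$workdir/attendance-2021"
cp "$workdir/attendance-2021" "$workdir/attendance-2021.bak"

# compare_files (a hypothetical helper) maps diff's exit status to a word:
# 0 = no differences, 1 = differences found, anything else = trouble.
compare_files() {
    local status=0
    diff -q "$1" "$2" > /dev/null 2>&1 || status=$?
    case $status in
        0) echo "same" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

result_changed=$(compare_files "$workdir/attendance-2020" "$workdir/attendance-2021")
result_copy=$(compare_files "$workdir/attendance-2021" "$workdir/attendance-2021.bak")
result_missing=$(compare_files "$workdir/attendance-2020" "$workdir/no-such-file")

echo "2020 vs 2021:   $result_changed"
echo "2021 vs backup: $result_copy"
echo "missing file:   $result_missing"
```

Run as is, the script should report "different", "same" and "error" for the three comparisons. Because the output is thrown away and only the exit status is used, -q simply keeps diff from producing a detailed listing that nobody will read.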
The diff command can also compare binary files (e.g., executables and images), but will only tell you if they are the same or different.
$ diff -s penguin.png penguin0.png
Files penguin.png and penguin0.png are identical
If you want to see a side-by-side comparison of two text files, you can use the -y argument and see output like this:
$ diff -y attendance-2020 attendance-2021
Alfreda Branch                  Alfreda Branch
Hans Burris                     Hans Burris
Felix Burt                      Felix Burt
Ray Campos                      Ray Campos
Juliet Chan                     Juliet Chan
Denver Cunningham               Denver Cunningham
Tristan Day                     Tristan Day
Kent Farmer                     Kent Farmer
Terrie Harrington               Terrie Harrington
Monroe Landry                 | Sandra Henry-Stocker
Jonathon Moody                <
Donnell Moore                 <
Leanne Park                     Leanne Park
Alfredo Potter                  Alfredo Potter
Felipe Rush                     Felipe Rush
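In long files, the common lines can crowd out the changes. GNU diff's --suppress-common-lines option limits the side-by-side view to the lines that differ, and -W sets the total output width. A minimal sketch, assuming GNU diff and using shortened, hypothetical copies of the attendance lists:

```shell
#!/usr/bin/env bash
# Minimal sketch: side-by-side output limited to the lines that differ.
# The files are shortened, hypothetical stand-ins for the attendance lists.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' "Terrie Harrington" "Monroe Landry" "Jonathon Moody" "Donnell Moore" "Leanne Park" \
    > "$workdir/attendance-2020"
printf '%s\n' "Terrie Harrington" "Sandra Henry-Stocker" "Leanne Park" \
    > "$workdir/attendance-2021"

# -y: two columns; -W 60: 60 characters wide; --suppress-common-lines: omit
# lines found in both files. diff exits with status 1 when the files differ,
# so "|| true" keeps that from being treated as a failure.
side_by_side=$(diff -y -W 60 --suppress-common-lines \
    "$workdir/attendance-2020" "$workdir/attendance-2021" || true)
printf '%s\n' "$side_by_side"
```

The output keeps the | and < markers from the full view (Monroe Landry paired with Sandra Henry-Stocker, followed by the two names found only in the 2020 list) but drops Terrie Harrington and Leanne Park, who appear in both.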
colordiff
The colordiff command is a wrapper for diff that uses color to highlight the differences between two text files. Apart from the color, its output is the same as diff's.
$ colordiff attendance-2020 attendance-2021
10,12c10
< Monroe Landry
< Jonathon Moody
< Donnell Moore
---
> Sandra Henry-Stocker
If you add the -u option, colordiff produces unified output, which shows a few unchanged lines of context around each change. Those common lines appear in your normal font color.
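The unified format comes from diff itself: each change is shown with up to three lines of unchanged context by default, lines only in the first file are prefixed with -, lines only in the second with +, and context lines with a space. colordiff can also act as a filter, so the same output can be piped through it for color. A minimal sketch with shortened, hypothetical data; the colordiff step runs only if the command is installed.

```shell
#!/usr/bin/env bash
# Minimal sketch: unified diff output, optionally colorized by colordiff.
# The files are shortened, hypothetical stand-ins for the attendance lists.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' "Terrie Harrington" "Monroe Landry" "Jonathon Moody" "Donnell Moore" "Leanne Park" \
    > "$workdir/attendance-2020"
printf '%s\n' "Terrie Harrington" "Sandra Henry-Stocker" "Leanne Park" \
    > "$workdir/attendance-2021"

# "-" marks lines only in the first file, "+" lines only in the second,
# and a leading space marks unchanged context lines.
# "|| true" absorbs diff's exit status of 1 for "files differ".
unified=$(diff -u "$workdir/attendance-2020" "$workdir/attendance-2021" || true)
printf '%s\n' "$unified"

# If colordiff is available, pipe the same unified output through it for color.
if command -v colordiff > /dev/null; then
    printf '%s\n' "$unified" | colordiff
fi
```

Because both sample files are short, the single hunk (headed @@ -1,5 +1,3 @@) covers all five lines of the first file and all three of the second.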
wdiff
The wdiff command uses a different strategy. It compares the files word by word rather than line by line and marks the differences with special characters: text found only in the first file is enclosed in [- and -], and text found only in the second file is enclosed in {+ and +}.
$ wdiff attendance-2020 attendance-2021
Alfreda Branch
Hans Burris
Felix Burt
Ray Campos
Juliet Chan
Denver Cunningham
Tristan Day
Kent Farmer
Terrie Harrington
[-Monroe Landry              <== lines only in file 1 start
Jonathon Moody
Donnell Moore-]              <== lines only in file 1 stop
{+Sandra Henry-Stocker+}     <== line only in file 2
Leanne Park
Alfredo Potter
Felipe Rush
vimdiff
The vimdiff command takes an entirely different approach. It uses the vim editor to open the files side by side, highlights the lines that differ using background colors, and lets you edit the two files and save each of them separately.
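Unlike the commands described above, it is interactive rather than simply printing the differences. It runs in a terminal window like any other vim session; gvimdiff, where installed, shows the same view in a graphical window.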
On Debian systems, you can install vimdiff with this command:
$ sudo apt install vim
kompare
The kompare command is a graphical tool that runs on your desktop. It displays the differences between files so that they can be viewed and merged, and programmers often use it to see and manage differences in their code. It can compare files or folders, and it's quite customizable.
Learn more at kde.org.
kdiff3
The kdiff3 tool allows you to compare up to three files and not only see the differences highlighted, but merge the files as you see fit. This tool is often used to manage changes and updates in program code.
Like kompare, kdiff3 runs on the desktop.
You can find more information on kdiff3 at SourceForge.
Using checksums
One easy way to find out if files are the same or different is to compute checksums. If the results are the same, the likelihood that the files are different is infinitesimally small.
One of the primary advantages of using checksums is that the files don't even need to be on the same system: run the same checksum command on each system and compare the results. The disadvantage is that checksums won't tell you how the files differ or even by how much. If even a single byte is different, the checksums will be dramatically different; that's the way they work. These two files differ by only one letter, yet their checksums are dramatically different:
$ shasum words-1 words-2
36e191c4a932d239233ca8cced35f7689d070c0c  words-1
c09bb9b4b5f61a72a7ca6e933981e151cd35c9a7  words-2
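Both uses can be sketched in a few lines of shell, assuming GNU coreutils' sha256sum (shasum, shown above, works similarly). The words-1 and words-2 contents below are hypothetical stand-ins that differ by one letter. The script compares the two digests directly, then writes a checksum file of the kind you could copy to another system and verify there with sha256sum -c.

```shell
#!/usr/bin/env bash
# Minimal sketch: compare files by checksum, assuming GNU coreutils sha256sum.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Two hypothetical files that differ by a single letter.
printf 'apple\nbanana\ncherry\n' > words-1
printf 'apple\nbanana\ncherrY\n' > words-2

# Keep only the digest, which is the first field of sha256sum's output.
sum1=$(sha256sum words-1 | cut -d ' ' -f 1)
sum2=$(sha256sum words-2 | cut -d ' ' -f 1)
if [ "$sum1" = "$sum2" ]; then
    verdict="same"
else
    verdict="different"
fi
echo "words-1 vs words-2: $verdict"

# Record the checksum so a copy of words-1 can be checked elsewhere.
# Running "sha256sum -c words-1.sha256" next to a matching copy reports "words-1: OK".
sha256sum words-1 > words-1.sha256
check=$(sha256sum -c words-1.sha256)
echo "$check"
```

The one-letter change is enough to make the two 64-character digests completely different, while the saved checksum file verifies cleanly against the unchanged original.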
Keep in mind that there are many commands for calculating checksums. A command like this should help you identify those that are installed on your system:
$ apropos checksum
cksum (1)            - checksum and count the bytes in a file
Dpkg::Checksums (3perl) - generate and manipulate file checksums
shasum (1)           - Print or Check SHA Checksums
sum (1)              - checksum and count the blocks in a file
tc-csum (8)          - checksum update action
Wrap-Up
While there are many choices for comparing files (not all covered in this post), the ones that work best for you will depend on whether you just want to know if files are different or you want to work with the differences.
Copyright © 2021 IDG Communications, Inc.