Commands for finding out if compressed Linux files are the same
Compressed Linux files are helpful because they save disk space, but what should you do when you have a series of compressed files and want to determine if any are duplicates? The zdiff and zcmp commands can help.
To begin, if a directory contains two files like those below, it’s easy to tell just from the listing that they are not identical. After all, the file sizes are a little different. The files look like this:
$ ls -l
total 200
-rw-r--r--. 1 shs shs 102178 Nov 22 2021 2021.gz
-rw-r--r--. 1 shs shs 102181 Nov 22 11:19 2022.gz
If you compare the files with the diff command, it will confirm that the files differ:
$ diff 2021.gz 2022.gz
Binary files 2021.gz and 2022.gz differ
What the diff command doesn’t tell you (because it compares the compressed files themselves, byte by byte) is that the material that was compressed to create these two files is actually identical. To determine that, you would need to use the zdiff or the zcmp command. If the content that was compressed into each file is identical, neither command will produce any output.
$ zdiff 2021.gz 2022.gz
$
$ zcmp 2021.gz 2022.gz
$
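In a script, the exit status is usually easier to work with than an empty result. The sketch below assumes the GNU gzip versions of zdiff and zcmp, which exit with 0 when the contents match and 1 when they differ. The file names and sample text are made up for illustration.

```shell
# Build two throwaway .gz files from the same text (hypothetical names).
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/first"
cp "$workdir/first" "$workdir/second"
gzip "$workdir/first" "$workdir/second"

# zcmp prints nothing and exits 0 when the decompressed content matches.
if zcmp "$workdir/first.gz" "$workdir/second.gz" > /dev/null; then
    result="same content"
else
    result="different content"
fi
echo "$result"

rm -r "$workdir"
```

Testing the exit status rather than the output means the check keeps working even if options that change the output format are added later.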
After using gunzip to decompress the files, the resulting files are the same size and can be compared with the diff command to confirm their identical content. Again, the absence of output from the diff command indicates that there are no differences.
$ gunzip 2021.gz
$ gunzip 2022.gz
$ ls -l
total 852
-rw-r--r--. 1 shs shs 383654 Nov 22 2021 2021
-rw-r--r--. 1 shs shs 383654 Nov 22 11:19 2022
$ diff 2021 2022
$
Clearly, the file content is the same. Why, then, do the compressed versions appear to be different? That’s because gzip stores the original file name and the file’s timestamp in the compressed file when it compresses a file. The zdiff and zcmp commands leave this information out of their comparisons because they look only at the decompressed content.
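One way to see the effect of the stored name and timestamp is gzip’s -n (--no-name) option, which leaves both out of the compressed file. In the sketch below (file names and contents are hypothetical), two copies of the same text compressed in the default way differ byte for byte, while the same copies compressed with -n should come out identical.

```shell
workdir=$(mktemp -d)
printf 'same text in both files\n' > "$workdir/report-a"
cp "$workdir/report-a" "$workdir/report-b"
# Give one copy an older modification time (format: CCYYMMDDhhmm).
touch -t 202111220000 "$workdir/report-a"

# Default behavior: name and timestamp are stored, so the bytes differ.
gzip -c "$workdir/report-a" > "$workdir/a.gz"
gzip -c "$workdir/report-b" > "$workdir/b.gz"
if cmp -s "$workdir/a.gz" "$workdir/b.gz"; then
    with_metadata="identical"
else
    with_metadata="different"
fi

# With -n: no name or timestamp is stored in the compressed file.
gzip -n -c "$workdir/report-a" > "$workdir/a-n.gz"
gzip -n -c "$workdir/report-b" > "$workdir/b-n.gz"
if cmp -s "$workdir/a-n.gz" "$workdir/b-n.gz"; then
    without_metadata="identical"
else
    without_metadata="different"
fi

echo "default: $with_metadata, with -n: $without_metadata"
rm -r "$workdir"
```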
Comparing compressed and non-compressed files
While both the zdiff and zcmp commands can determine whether two compressed files are the same, they can also compare the content of a compressed file with a non-compressed file. In other words, if you compare a compressed file with the file that contains the original content but is not compressed, you will still get confirmation that the content matches.
$ zdiff 2021.gz 2022
$
$ zcmp 2021.gz 2022
$
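A practical use for this mixed comparison is checking that an archive still matches the file it was made from before the original is deleted. A minimal sketch, using a hypothetical log file:

```shell
workdir=$(mktemp -d)
printf 'entry 1\nentry 2\n' > "$workdir/app.log"
# Write the compressed copy alongside the original instead of replacing it.
gzip -c "$workdir/app.log" > "$workdir/app.log.gz"

# zcmp decompresses the .gz on the fly and compares it with the plain file.
if zcmp "$workdir/app.log.gz" "$workdir/app.log" > /dev/null; then
    verdict="archive matches original"
else
    verdict="archive differs from original"
fi
echo "$verdict"

rm -r "$workdir"
```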
In fact, although there’s no benefit to using zdiff and zcmp with non-compressed files, the commands would still comply with your request. The command below compares the two files when neither one is compressed.
$ zdiff 2021 2022
$
zdiff and zcmp differences
The main difference between the zdiff and zcmp commands is what they tell you when files are different. If you use the zdiff command, it will display any differences detected in the decompressed content, in the same format that diff uses.
$ zdiff 2022.gz 2023.gz
6409c6409
< There may be only one active coprocess at a time.
---
> There may be only one active coprocess at a time!
If you use the zcmp command, it will tell you that the file content is different and report the byte and line number of the first difference.
$ zcmp 2022.gz 2023.gz
/dev/fd/5 - differ: byte 383573, line 6409
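According to the zdiff manual page, options given to zdiff and zcmp are passed along to diff and cmp, so the report can be adjusted in the usual ways. The sketch below is an illustration only: it builds two small compressed files that differ in one line (the names and contents are hypothetical) and asks for a unified diff with -u.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' | gzip > "$workdir/old.gz"
printf 'one\nTWO\nthree\n' | gzip > "$workdir/new.gz"

# -u is handed to diff, which then prints a unified diff of the contents.
unified=$(zdiff -u "$workdir/old.gz" "$workdir/new.gz")
diff_status=$?   # 1 is expected here because the contents differ

printf '%s\n' "$unified"
echo "zdiff exit status: $diff_status"

rm -r "$workdir"
```

The file labels in the header lines of the unified diff depend on how zdiff feeds the decompressed data to diff, so they may not show the original file names.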
Wrap-Up
The zdiff and zcmp commands allow you to compare the content of files compressed with gzip. While both commands will show no output if the file content matches, they will show different details when the files are different. You can also use these commands to compare files compressed with gzip to files that are not compressed in order to determine if the original content is the same in both.
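To return to the opening question of spotting duplicates in a series of compressed files, the comparison can be wrapped in a loop. The following is only a sketch: it creates its own sample files in a temporary directory (all names are hypothetical) and compares every pair once with zcmp. That is fine for a handful of files but slow for large collections, since each file is decompressed again for every pair it appears in.

```shell
workdir=$(mktemp -d)
printf 'quarterly numbers\n' | gzip > "$workdir/q1.gz"
printf 'quarterly numbers\n' | gzip > "$workdir/q1-copy.gz"
printf 'something else\n'    | gzip > "$workdir/q2.gz"

duplicates=""
dup_count=0

# Load the file list into the positional parameters, then compare each
# file with every file that comes after it in the list.
set -- "$workdir"/*.gz
while [ "$#" -gt 1 ]; do
    first=$1
    shift
    for second in "$@"; do
        if zcmp "$first" "$second" > /dev/null 2>&1; then
            # ${var##*/} strips the directory part for a shorter report.
            duplicates="$duplicates${first##*/} and ${second##*/} have the same content
"
            dup_count=$((dup_count + 1))
        fi
    done
done

printf '%s' "$duplicates"
rm -r "$workdir"
```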
Copyright © 2022 IDG Communications, Inc.