Searching through compressed files on Linux
There are quite a few ways to search through compressed text files on Linux systems without having to uncompress them first. Depending on the format of the files, you can choose to view entire files, extract specific text, navigate through file contents searching for content of interest, and sometimes even edit content.
First, to show you how this works, I compressed the words file on one of my Linux systems (/usr/share/dict/words) using these commands:
$ cp /usr/share/dict/words .
$ 7z a words.7z words
$ bzip2 -k words
$ gzip -k words
$ xz -k words
$ zip words.zip words
The -k options used with the bzip2, gzip, and xz commands kept these commands from removing the original file, which they would by default. The resultant files then looked like this:
$ ls -l
total 9164
-rw-r--r--. 1 shs shs 4953598 Oct 27 16:11 words
-rw-r--r--. 1 shs shs 1230545 Oct 27 16:14 words.7z
-rw-r--r--. 1 shs shs 1712421 Oct 27 16:11 words.bz2
-rw-r--r--. 1 shs shs 1476067 Oct 27 16:11 words.gz
-rw-r--r--. 1 shs shs 1230236 Oct 27 16:11 words.xz
-rw-r--r--. 1 shs shs 1476203 Oct 28 12:42 words.zip
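If you want to see the effect of -k for yourself without touching a real file, a minimal sketch along these lines should do it. The temporary directory and the file name "sample" are made up for illustration, and gzip stands in for bzip2 and xz, which accept the same -k and -c options.

```shell
#!/bin/sh
# Sketch: compare gzip with and without -k on a throwaway file.
# "sample" is an illustrative name, not one of the article's files.
tmp=$(mktemp -d)
seq 1 1000 > "$tmp/sample"

gzip -k "$tmp/sample"                    # -k keeps sample next to sample.gz
ls "$tmp"

gzip -c "$tmp/sample" > "$tmp/copy.gz"   # -c writes to stdout, so sample is untouched
rm "$tmp/sample.gz" "$tmp/copy.gz"

gzip "$tmp/sample"                       # no -k: sample is replaced by sample.gz
ls "$tmp"
```

The -c form is handy on older systems whose gzip predates the -k option.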
Viewing compressed-file content
To view the entire content of a compressed file while leaving the compressed file intact, you can use any of these commands:
- for 7z: 7z x -so words.7z
- for bz2: bzcat words.bz2
- for gz: zcat words.gz
- for xz: xzcat words.xz
- for zip: zcat words.zip (works only when the zip archive holds a single file; unzip -p words.zip handles any zip archive)
For example:
$ bzcat words.bz2 | head -5     $ 7z x -so words.7z | head -5
1080                            1080
10-point                        10-point
10th                            10th
11-point                        11-point
12-point                        12-point
You can also pipe the output to commands like more or grep, or simply watch it scroll rapidly down your screen.
$ 7z x -so words.7z | grep overclever
overclever
overcleverly
overcleverness
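Since each format has its own "cat" command, you might wrap them in a small shell function that picks the right one from the file extension. The sketch below is only one way to do it: ccat is a name I made up (it is not a standard command), and it uses unzip -p for zip files so that multi-file archives also work.

```shell
#!/bin/sh
# ccat: hypothetical helper, not a standard command. It writes the
# decompressed content of a file to standard output, choosing the
# tool from the file extension.
ccat() {
    case "$1" in
        *.7z)  7z x -so "$1" ;;
        *.bz2) bzcat "$1" ;;
        *.gz)  zcat "$1" ;;
        *.xz)  xzcat "$1" ;;
        *.zip) unzip -p "$1" ;;
        *)     cat "$1" ;;      # anything else is assumed to be plain text
    esac
}

# Demo on a throwaway gzip file (illustrative data only).
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/sample"
gzip "$tmp/sample"
ccat "$tmp/sample.gz" | head -2
```

With the function defined in your shell, something like ccat words.xz | grep overclever should then work the same way for every format listed above.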
Browsing with less
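You can browse some types of compressed files (bz2, gz and xz) using the less command. This relies on the lesspipe input preprocessor that many distributions enable by default; where it is not set up, the zless, bzless and xzless commands do the same job.
$ less words.bz2     $ less words.gz     $ less words.xz
1080                 1080                1080
10-point             10-point            10-point
10th                 10th                10th
11-point             11-point            11-point
12-point             12-point            12-point
...                  ...                 ...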
Searching for text in 7z files
The 7z command can list the files included in an archive (7z l), but searching their contents requires the extract (x) command. Adding the -so option tells 7z to write the extracted data to standard output instead of to disk, so a command like the one below leaves the compressed file intact and creates no new files.
$ 7z x -so words.7z | grep clever | column
clever          cleverest       cleverly        overcleverly    uncleverness
cleverality     clever-handed   cleverness      overcleverness
clever-clever   cleverish       clevernesses    unclever
cleverer        cleverishly     overclever      uncleverly
There doesn’t seem to be a grep-like command for 7z files, but commands like this work very well.
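If you search 7z archives often, you could give yourself a grep-like command with a few lines of shell. The function below is a sketch under my own assumptions: grep7z is a made-up name (nothing by that name ships with 7-Zip or p7zip), and it simply wraps the pipeline shown above, passing any extra arguments to grep.

```shell
#!/bin/sh
# grep7z: hypothetical wrapper, not part of 7-Zip or p7zip.
# Usage: grep7z PATTERN ARCHIVE [grep options...]
# The body runs in a subshell so its variables stay local.
grep7z() (
    pattern=$1
    archive=$2
    shift 2
    7z x -so "$archive" | grep "$@" -- "$pattern"
)

# Demo on a throwaway archive, run only if 7z is installed.
if command -v 7z >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    printf 'clever\nunclever\nplain\n' > "$tmp/sample"
    (cd "$tmp" && 7z a sample.7z sample >/dev/null)
    grep7z clever "$tmp/sample.7z" -c    # count the matching lines
fi
```

Keep in mind that for an archive holding several files, the extracted data arrives on standard output as a single stream, so the matches will not tell you which file they came from.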
Searching for text in other types of compressed files
To search for specific text in compressed files, you can use commands like these:
$ bzgrep overclever words.bz2
$ zgrep overclever words.gz
$ xzgrep overclever words.xz
$ zipgrep overclever words.zip
Each of these commands should find the same three words in its compressed copy of the words file:
overclever
overcleverly
overcleverness
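These commands hand their options on to grep, so the usual grep switches should work as well. Here is a short sketch using zgrep on a throwaway file; the file name and its contents are invented for the demonstration, and bzgrep and xzgrep are expected to behave the same way.

```shell
#!/bin/sh
# Sketch: common grep options passed through zgrep.
# The sample file is illustrative, not the article's words file.
tmp=$(mktemp -d)
printf 'Clever\noverclever\novercleverly\nplain\n' > "$tmp/sample"
gzip "$tmp/sample"

zgrep -c overclever "$tmp/sample.gz"    # count the matching lines
zgrep -n overclever "$tmp/sample.gz"    # show line numbers with the matches
zgrep -i '^clever' "$tmp/sample.gz"     # ignore case, anchor at start of line
```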
Editing compressed files
Using vi or vim, you can actually edit some compressed files (bz2, gz and xz files) to add, change, or remove content. The files will remain compressed on your disk, but you’ll be able to notice the size changes.
$ xzcat words.xz | tail -3
Zz
zZt
ZZZ
$ vi words.xz
$ xzcat words.xz | tail -3
zZt
ZZZ
I added this line!
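If you would rather make a change like this from a script than from an editor, one option is to decompress to a pipe, alter the stream, and recompress. The sketch below is my own approach rather than anything vi does internally; it uses gzip and a made-up three-line sample file, and the same pattern should carry over to xz with xzcat and xz.

```shell
#!/bin/sh
# Sketch: edit a compressed file without an editor by rewriting it.
# The sample file and its contents are illustrative only.
tmp=$(mktemp -d)
printf 'Zz\nzZt\nZZZ\n' > "$tmp/sample"
gzip "$tmp/sample"

# Decompress, drop the first line, append a new one, and recompress
# into a new file before replacing the original.
{ zcat "$tmp/sample.gz" | sed '1d'; echo 'I added this line!'; } |
    gzip > "$tmp/sample.new.gz"
mv "$tmp/sample.new.gz" "$tmp/sample.gz"

zcat "$tmp/sample.gz"
```

Writing to a new file first and renaming it afterward means the original stays intact if anything in the pipeline goes wrong.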
Wrap-Up
Given all the ways that you can browse and select content from compressed files, it might be a good time to exercise your “overcleverness” and see how helpful the methods described in this post might be.
Copyright © 2021 IDG Communications, Inc.