Peering into binary files on Linux
Any file on a Linux system that isn’t a text file is considered a binary file, from system commands and libraries to image files and compiled programs. But being binary doesn’t mean that you can’t look into these files. In fact, there are quite a few commands that you can use to extract data from binary files or display their content. In this post, we’ll explore several of them.
file
One of the easiest ways to pull information from a binary file is the file command, which identifies files by type. It does this in several ways: by evaluating the content, looking for a “magic number” (a file type identifier), and checking the language. While we humans generally judge a file by its extension, the file command largely ignores that. Notice how it responds to the command shown below.
$ file camper.png
camper.png: JPEG image data, JFIF standard 1.01, resolution (DPI),
density 72x72, segment length 16, Exif Standard: [TIFF image data,
little-endian, direntries=11, manufacturer=samsung, model=SM-G935V,
orientation=upper-left, xresolution=164, yresolution=172,
resolutionunit=2, software=GIMP 2.8.18, datetime=2018:04:30 07:56:54,
GPS-Data], progressive, precision 8, 3465x2717, components 3
The file command easily determined that “camper.png” is actually a JPEG file, but in this case, it tells us a lot more. This includes the image dimensions (3465×2717), the date and time recorded in the image’s Exif data, and details about the image and the cell phone used to take the photo. Not all JPEG files will contain all of this data, but file will show you what is available.
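To make the “magic number” idea concrete, here is a minimal Python sketch that labels a file by its leading bytes and never looks at its name. The function names and the small signature table are assumptions made for illustration; the real file command consults a far larger database and runs other tests as well.

```python
# Minimal sketch: identify a file by its leading bytes, ignoring its name.
# SIGNATURES is a small illustrative subset, not the database that file uses.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\x7fELF", "ELF"),
    (b"GIF87a", "GIF image data"),
    (b"GIF89a", "GIF image data"),
]


def sniff_bytes(head: bytes) -> str:
    """Return a type label for the given leading bytes, or 'data' if unknown."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "data"


def sniff_file(path: str) -> str:
    """Read just enough of the file to compare against every signature."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as f:
        return sniff_bytes(f.read(longest))
```

Calling sniff_file on a JPEG that has been renamed with a .png extension would still return the JPEG label, because only the first few bytes are examined.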
Ask about a system binary and the output will look very different.
$ file /bin/date
/bin/date: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=9ce744916618c6eef6f28ff70a3758675c307fb2, for GNU/Linux
3.2.0, stripped
In this case, file reports that the date command is, not surprisingly, an ELF (Executable and Linkable Format) file, along with details such as the architecture, the dynamic loader it uses, and the fact that the binary is stripped.
ldd
The ldd command lists the shared libraries that are used by an executable. The date command uses only a few.
$ ldd /bin/date
linux-vdso.so.1 (0x00007fff21162000)
libc.so.6 => /lib64/libc.so.6 (0x00007f2572f45000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2573141000)
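If you want to work with this listing in a script, the lines are regular enough to parse. The sketch below is one possible approach rather than an official format definition. It assumes each line has the shape "name [=> path] (address)" as in the sample above, and the names parse_ldd and SAMPLE are made up for this example.

```python
import re

# Sample ldd output for /bin/date, copied from the example above.
SAMPLE = """\
linux-vdso.so.1 (0x00007fff21162000)
libc.so.6 => /lib64/libc.so.6 (0x00007f2572f45000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2573141000)
"""

LINE = re.compile(
    r"^\s*(?P<name>\S+)"                    # library name the binary asks for
    r"(?:\s+=>\s+(?P<path>\S+))?"           # resolved path, when one is printed
    r"\s+\((?P<addr>0x[0-9a-fA-F]+)\)\s*$"  # load address in parentheses
)


def parse_ldd(text):
    """Return (name, path_or_None, address) tuples for lines that match."""
    entries = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:  # lines without an address (e.g. "not found") are skipped
            entries.append((m["name"], m["path"], int(m["addr"], 16)))
    return entries
```

Running parse_ldd(SAMPLE) yields three entries, and only the libc.so.6 entry carries a resolved path.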
ltrace
The ltrace command traces the library calls made by an executable.
$ ltrace pwd
getenv("POSIXLY_CORRECT") = nil
strrchr("pwd", '/') = nil
setlocale(LC_ALL, "") = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale"
textdomain("coreutils") = "coreutils"
__cxa_atexit(0x5644cb982120, 0, 0x5644cb985b20, 0x6c69747565726f63) = 0
getopt_long(1, 0x7fff17badb18, "LP", 0x5644cb985b40, nil) = -1
getcwd(nil, 0) = ""
puts("/home/shs"/home/shs
) = 10
free(0x5644cbbdf440) = <void>
__fpending(0x7f18d802a520, 0, 0x5644cb982120, 1) = 0
fileno(0x7f18d802a520) = 1
__freading(0x7f18d802a520, 0, 0x5644cb982120, 1) = 0
strace
The strace command traces system calls and is considered a very useful diagnostic, debugging and instructional utility. One unusual thing about it is that it sends its output to stderr (standard error) and the output of the command being traced to stdout (standard out). So, if you want to save the tracing information in a file, use commands like these:
$ strace ls camp* 2>output.txt
camper_10.jpg  camper.jpg  camper.png
$ head -8 output.txt
execve("/usr/bin/ls", ["ls", "camper_10.jpg", "camper.jpg", "camper.png"], 0x7ffd7ec34f18 /* 34 vars */) = 0
brk(NULL) = 0x5646e6bae000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc3d514cf0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=61880, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 61880, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb2b67a9000
close(3) = 0
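The same separation of the two streams can be done from a script. The following Python sketch uses a stand-in child process, an assumption made so that the example runs even where strace isn’t installed, and captures stdout and stderr separately, much as the 2> redirection does in the shell.

```python
import subprocess
import sys

# Stand-in child process (hypothetical, for illustration only): it writes one
# line to stdout and one to stderr, playing the roles of the traced command's
# output and the trace itself.
CHILD = [
    sys.executable, "-c",
    "import sys; print('listing'); print('trace', file=sys.stderr)",
]


def run_split(cmd):
    """Run cmd and return its (stdout, stderr) as two separate strings."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout, result.stderr


out, err = run_split(CHILD)
```

On a system with strace installed, passing a command list such as ["strace", "ls"] to run_split should put the listing in the first string and the trace in the second. strace also accepts a -o option that writes the trace straight to a named file.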
hexdump
The hexdump command displays the content of binary files in hexadecimal. With the addition of the -C option, it also provides a character translation, so that we can easily pick out the “magic numbers” that identify the file types (JFIF and ELF in the samples below).
$ hexdump -C camper.jpg | head -5
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 01 00 48  |......JFIF.....H|
00000010  00 48 00 00 ff e1 38 7e  45 78 69 66 00 00 49 49  |.H....8~Exif..II|
00000020  2a 00 08 00 00 00 0b 00  0f 01 02 00 08 00 00 00  |*...............|
00000030  92 00 00 00 10 01 02 00  09 00 00 00 9a 00 00 00  |................|
00000040  12 01 03 00 01 00 00 00  01 00 00 00 1a 01 05 00  |................|
$ hexdump -C /bin/date | head -5
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  40 42 00 00 00 00 00 00  |..>.....@B......|
00000020  40 00 00 00 00 00 00 00  70 9a 01 00 00 00 00 00  |@.......p.......|
00000030  00 00 00 00 40 00 38 00  0d 00 40 00 1f 00 1e 00  |....@.8...@.....|
00000040  06 00 00 00 04 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
This is not unlike the output you would get using the od (octal dump) command, but the display is a little easier to read.
$ od -hc camper.jpg | head -6
0000000    d8ff    e0ff    1000    464a    4649    0100    0101    4800
        377 330 377 340  \0 020   J   F   I   F  \0 001 001 001  \0   H
0000020    4800    0000    e1ff    7e38    7845    6669    0000    4949
         \0   H  \0  \0 377 341   8   ~   E   x   i   f  \0  \0   I   I
0000040    002a    0008    0000    000b    010f    0002    0008    0000
          *  \0  \b  \0  \0  \0  \v  \0 017 001 002  \0  \b  \0  \0  \0
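The canonical hex-plus-ASCII layout is simple enough to reproduce, which helps when reading it. The Python sketch below mimics the look of hexdump -C for a bytes object. The function name hexdump_c is made up for this example, and the sketch does not collapse repeated lines into a “*” the way the real command does.

```python
def hexdump_c(data: bytes, base: int = 0) -> str:
    """Format data in the style of `hexdump -C`: offset, 16 hex bytes, ASCII."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Two groups of eight bytes, each padded to 23 columns ("xx " * 8 - 1).
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Printable ASCII is shown as-is; every other byte becomes a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + offset:08x}  {left:<23}  {right:<23}  |{text}|")
    # The final line is the offset just past the last byte.
    lines.append(f"{base + len(data):08x}")
    return "\n".join(lines)
```

Feeding it the first 16 bytes of the camper image reproduces the first line of the hexdump output for that file, including the JFIF marker in the character column.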
strings
The strings command pulls out “strings” (sequences of printable characters) from a binary file. In the first example below, notice how it extracts the details captured in the camper image. With an image file, going much further than the first 12 lines probably won’t yield much more useful information, since many sequences of bytes can appear to be text even though they are not.
$ strings camper.jpg | head -12
JFIF
8~Exif
samsung
SM-G935V
GIMP 2.8.18
2018:04:30 07:56:54
0220
0100
2018:04:29 13:23:09
2018:04:29 13:23:09
0100
ASCII
You can also use the strings command to look into executables.
$ strings HelloWorld | head -16
/lib64/ld-linux-x86-64.so.2
puts
__libc_start_main
libc.so.6
GLIBC_2.2.5
__gmon_start__
H=(@@
[]AA]A^A_
Hello World
;*3$"
GCC: (GNU) 11.0.0 20210210 (Red Hat 11.0.0-0)
GCC: (GNU) 11.0.1 20210405 (Red Hat 11.0.1-0)
3g961
running gcc 11.0.0 20210210
annobin gcc 11.0.0 20210210
GA*GOW
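The idea behind strings is easy to sketch: scan the bytes for runs of printable characters and keep those that are long enough. The Python version below is an approximation. It assumes printable means ASCII 0x20 through 0x7e plus tab, and it uses a minimum run of four characters to match the command’s default. The name extract_strings is made up for this example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters in data."""
    # Printable here means space through tilde, plus tab (an assumption that
    # approximates what the strings command treats as text).
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Leading bytes of the camper image, as shown in the hexdump output earlier.
HEAD = bytes.fromhex(
    "ffd8ffe000104a464946000101010048"
    "00480000ffe1387e4578696600004949"
)
found = extract_strings(HEAD)
```

For these 32 bytes the result is the same first two entries that strings reports for the image, JFIF and 8~Exif. Shorter runs such as “II” are dropped, which is also why random byte sequences of four or more printable characters show up as noise further into a file.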
readelf
Another useful command for looking at ELF files is readelf. You will, however, need to select an option to determine what it’s going to display for you. In the example below, we’re looking at the header (-h). Use -a if you want to see all the available data.
$ readelf -h HelloWorld
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401040
  Start of program headers:          64 (bytes into file)
  Start of section headers:          23304 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
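Most of what readelf -h prints comes straight from the first 64 bytes of the file. As a rough illustration, the Python sketch below decodes those bytes for a 64-bit ELF file using the standard ELF64 header layout. The function and field names are chosen for this example, and 32-bit files are deliberately rejected to keep it short.

```python
import struct

# Layout of the ELF64 header after the 16-byte e_ident block:
# e_type and e_machine (2 bytes each), e_version (4), e_entry, e_phoff and
# e_shoff (8 each), e_flags (4), then six 2-byte size and count fields.
ELF64_FORMAT = "HHIQQQIHHHHHH"
FIELD_NAMES = (
    "type", "machine", "version", "entry", "phoff", "shoff", "flags",
    "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx",
)


def parse_elf64_header(header: bytes) -> dict:
    """Decode the first 64 bytes of a 64-bit ELF file into a dict."""
    if len(header) < 64 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (or header too short)")
    if header[4] != 2:
        raise ValueError("this sketch only handles 64-bit ELF files")
    # e_ident[5] is the data encoding: 1 = little endian, 2 = big endian.
    byte_order = "<" if header[5] == 1 else ">"
    values = struct.unpack_from(byte_order + ELF64_FORMAT, header, 16)
    return dict(zip(FIELD_NAMES, values))


def read_elf64_header(path: str) -> dict:
    """Read and decode the header of the ELF file at path."""
    with open(path, "rb") as f:
        return parse_elf64_header(f.read(64))
```

The entry, phnum, shnum and shstrndx values in the returned dict correspond to the “Entry point address”, “Number of program headers”, “Number of section headers” and “Section header string table index” lines in the readelf output.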
Wrap-Up
There are quite a few commands on Linux systems for pulling information from binary files. They can be very helpful when you want to look more deeply into commands that you run or images that you want to know more about.
Copyright © 2021 IDG Communications, Inc.