Peering into binary files on Linux


Any file on a Linux system that isn’t a text file is considered a binary file, from system commands and libraries to image files and compiled programs. But being binary doesn’t mean these files can’t be examined. In fact, there are quite a few commands you can use to extract data from binary files or display their content. In this post, we’ll explore a number of them.

file

One of the easiest commands for pulling information from a binary file is the file command, which identifies files by type. It does this in several ways: by evaluating the content, looking for a “magic number” (a file type identifier), and checking the language. While we humans generally judge a file by its extension, the file command largely ignores it. Notice how it responds to the command shown below.

$ file camper.png
camper.png: JPEG image data, JFIF standard 1.01, resolution (DPI),
density 72x72, segment length 16, Exif Standard: [TIFF image data,
little-endian, direntries=11, manufacturer=samsung, model=SM-G935V,
orientation=upper-left, xresolution=164, yresolution=172,
resolutionunit=2, software=GIMP 2.8.18, datetime=2018:04:30 07:56:54,
GPS-Data], progressive, precision 8, 3465x2717, components 3

The file command easily determined that “camper.png” is actually a jpg file, and in this case it tells us a lot more. This includes the image dimensions (3465×2717 pixels), the date and time recorded in the image’s metadata, the make and model of the cell phone used to take the photo (a Samsung SM-G935V), and the software used to edit it (GIMP 2.8.18). Not all jpg files will contain all of this data, but file will show you what is available.
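
To make the magic-number idea concrete, here is a minimal Python sketch. It is not how file is implemented, since the real command consults a large magic database. The helper names sniff_bytes and sniff_file are hypothetical, and only a handful of signatures are included. Like file, it looks only at the leading bytes and never at the extension, so a JPEG saved as camper.png is still reported as JPEG data.

```python
import sys

# A few well-known signatures ("magic numbers") and the type each indicates.
# This is a tiny, hypothetical subset of what file's magic database knows.
MAGIC_SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"GIF87a", "GIF image data"),
    (b"GIF89a", "GIF image data"),
    (b"\x7fELF", "ELF"),
]


def sniff_bytes(header: bytes) -> str:
    """Return a type description based only on the leading bytes."""
    for signature, description in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return description
    # file prints "data" for binary content it cannot classify.
    return "data"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file; its name/extension is never used."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {sniff_file(name)}")
```

Checking a fixed list of prefixes in order is the simplest form of a magic test. Real magic files can also match at other offsets and inspect later fields, which is how file extracts details such as the EXIF data above.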

Ask about a system binary and the output will look very different.

$ file /bin/date
/bin/date: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=9ce744916618c6eef6f28ff70a3758675c307fb2, for GNU/Linux
3.2.0, stripped

In this case, file reports that the date command is, not surprisingly, an ELF (Executable and Linkable Format) file. It also shows that the binary is built for x86-64, is dynamically linked, and has been stripped of its symbol table.

ldd

The ldd command lists the shared libraries that are used by an executable. The date command uses only a few.

$ ldd /bin/date 
linux-vdso.so.1 (0x00007fff21162000) 
libc.so.6 => /lib64/libc.so.6 (0x00007f2572f45000) 
/lib64/ld-linux-x86-64.so.2 (0x00007f2573141000)
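
If you want to use ldd’s output in a script, the three kinds of lines shown above (a bare name with an address, a name resolved with =>, and the loader’s own path) can be parsed with a regular expression. This is a minimal Python sketch that assumes exactly those line shapes. parse_ldd and shared_libraries are hypothetical helper names. Lines with no address, such as “not found” entries, are simply skipped. The ldd man page warns that ldd may execute the program in some cases, so the helper that calls it should only be pointed at trusted binaries.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Matches "name (0xADDR)", "name => /path (0xADDR)" and "/path (0xADDR)".
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)"                     # library name or absolute path
    r"(?:\s+=>\s+(?P<path>\S+))?"            # optional "=> /resolved/path"
    r"\s+\((?P<addr>0x[0-9a-fA-F]+)\)\s*$"   # load address in parentheses
)

Library = Tuple[str, Optional[str], int]


def parse_ldd(output: str) -> List[Library]:
    """Turn ldd output into (name, resolved path or None, load address) tuples."""
    libs = []
    for line in output.splitlines():
        m = LDD_LINE.match(line)
        if m:  # lines without an address (e.g. "not found") are skipped
            libs.append((m.group("name"), m.group("path"), int(m.group("addr"), 16)))
    return libs


def shared_libraries(binary: str) -> List[Library]:
    # ldd may execute the program in some cases (see ldd(1)),
    # so only point this at binaries you trust.
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)


if __name__ == "__main__":
    for name, path, addr in shared_libraries("/bin/date"):
        print(f"{name:30} {path or '-':30} {addr:#x}")
```

The load addresses usually change from run to run because of address space layout randomization, so scripts should generally rely on the names and paths rather than the addresses.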

ltrace

The ltrace command traces the library calls that an executable makes.

$ ltrace pwd 
getenv("POSIXLY_CORRECT") = nil 
strrchr("pwd", '/') = nil
setlocale(LC_ALL, "") = "en_US.UTF-8" 
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale" 
textdomain("coreutils") = "coreutils" 
__cxa_atexit(0x5644cb982120, 0, 0x5644cb985b20, 0x6c69747565726f63) = 0 
getopt_long(1, 0x7fff17badb18, "LP", 0x5644cb985b40, nil) = -1 
getcwd(nil, 0) = "" 
puts("/home/shs"/home/shs 
) = 10 
free(0x5644cbbdf440) = <void> 
__fpending(0x7f18d802a520, 0, 0x5644cb982120, 1) = 0 
fileno(0x7f18d802a520) = 1 
__freading(0x7f18d802a520, 0, 0x5644cb982120, 1) = 0

strace

The strace command traces system calls and is a very useful diagnostic, debugging and instructional utility. One unusual thing about it is that it sends its own output to stderr (standard error), while the output of the command being traced goes to stdout (standard output) as usual. To save the tracing information in a file, you can either use strace’s -o option or redirect stderr with commands like these:

$ strace ls camp* 2>output.txt
camper_10.jpg  camper.jpg  camper.png
$
$ head -8 output.txt
execve("/usr/bin/ls", ["ls", "camper_10.jpg", "camper.jpg", "camper.png"], 0x7ffd7ec34f18 /* 34 vars */) = 0
brk(NULL)                               = 0x5646e6bae000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc3d514cf0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=61880, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 61880, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb2b67a9000
close(3)                                = 0
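
Each line of a saved trace follows a regular name(arguments) = result shape, which makes the log easy to post-process. The minimal Python sketch below tallies system calls and failed calls in a log such as output.txt. It assumes the default format without the PID prefixes that strace adds when following child processes, and summarize is a hypothetical helper name. For real work, strace’s own -c option prints a summary table of calls, errors, and time spent.

```python
import re
import sys
from collections import Counter

# A system call name at the very start of a line, followed by "(".
CALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")
# Failed calls end with "= -1 ERRNO_NAME (description)".
ERROR_RE = re.compile(r"=\s+-1\s+(E[A-Z0-9]+)")


def summarize(lines):
    """Count calls per system call name and failures per errno name."""
    calls, errors = Counter(), Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if not m:
            continue  # e.g. "+++ exited with 0 +++" or "--- SIG... ---" lines
        calls[m.group(1)] += 1
        err = ERROR_RE.search(line)
        if err:
            errors[err.group(1)] += 1
    return calls, errors


if __name__ == "__main__":
    with open(sys.argv[1]) as log:  # e.g. output.txt from the example above
        calls, errors = summarize(log)
    for name, count in calls.most_common():
        print(f"{count:6d}  {name}")
    for errno_name, count in errors.most_common():
        print(f"{count:6d}  failed with {errno_name}")
```

A failure count such as ENOENT from access("/etc/ld.so.preload", ...) is normal. The loader is just checking for an optional file, which is a good reminder that not every error in a trace indicates a problem.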

hexdump

The hexdump command displays the content of binary files in hexadecimal. Adding the -C option also displays a character translation alongside the hex values, so that we can easily pick out the “magic numbers” that identify the file types in the samples below. The JPEG file begins with ff d8 ff and contains the JFIF marker, while the ELF file begins with 7f followed by the letters ELF.

$ hexdump -C camper.jpg | head -5
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 01 00 48  |......JFIF.....H|
00000010  00 48 00 00 ff e1 38 7e  45 78 69 66 00 00 49 49  |.H....8~Exif..II|
00000020  2a 00 08 00 00 00 0b 00  0f 01 02 00 08 00 00 00  |*...............|
00000030  92 00 00 00 10 01 02 00  09 00 00 00 9a 00 00 00  |................|
00000040  12 01 03 00 01 00 00 00  01 00 00 00 1a 01 05 00  |................|
$ hexdump -C /bin/date | head -5
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  40 42 00 00 00 00 00 00  |..>.....@B......|
00000020  40 00 00 00 00 00 00 00  70 9a 01 00 00 00 00 00  |@.......p.......|
00000030  00 00 00 00 40 00 38 00  0d 00 40 00 1f 00 1e 00  |....@.8...@.....|
00000040  06 00 00 00 04 00 00 00  40 00 00 00 00 00 00 00  |........@.......|

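This is similar to the output you would get using the od (octal dump) command, but the display is a little easier to read. Note that od -h groups the bytes into two-byte words, so on a little-endian system such as x86-64 each pair of bytes appears swapped (ff d8 is shown as d8ff).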

$ od -hc camper.jpg | head -6
0000000    d8ff    e0ff    1000    464a    4649    0100    0101    4800
        377 330 377 340  \0 020   J   F   I   F  \0 001 001 001  \0   H
0000020    4800    0000    e1ff    7e38    7845    6669    0000    4949
         \0   H  \0  \0 377 341   8   ~   E   x   i   f  \0  \0   I   I
0000040    002a    0008    0000    000b    010f    0002    0008    0000
          *  \0  \b  \0  \0  \0  \v  \0 017 001 002  \0  \b  \0  \0  \0
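
To show how the hexdump -C layout is put together, here is a minimal Python sketch that formats bytes the same way. Each line has an eight-digit hex offset, two groups of eight hex bytes, and the printable characters between vertical bars, with non-printable bytes shown as dots. hexdump_c is a hypothetical name. Unlike hexdump, this sketch does not collapse runs of identical lines into a single *.

```python
import sys


def hexdump_c(data: bytes) -> list:
    """Format data in the canonical hex+ASCII layout, 16 bytes per line."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_bytes = [f"{b:02x}" for b in chunk]
        # Two groups of eight bytes, padded so the ASCII column always lines up.
        left = " ".join(hex_bytes[:8])
        right = " ".join(hex_bytes[8:])
        # Printable ASCII is shown as-is; every other byte becomes ".".
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    # Like hexdump -C, finish with a line holding the total length.
    lines.append(f"{len(data):08x}")
    return lines


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for line in hexdump_c(f.read()):
            print(line)
```

Fed the first 16 bytes of camper.jpg, the sketch reproduces the first line of the hexdump -C output shown earlier, including the JFIF marker in the character column.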

strings

The strings command pulls out “strings” (by default, sequences of at least four printable characters) from a binary file. In the first example below, notice how it extracts the metadata stored in the camper image: the camera maker and model, the editing software, and several timestamps. With an image file, going much further than the first 12 lines probably won’t turn up much more useful information, since many random byte sequences happen to look like text even though they are not.

$ strings camper.jpg | head -12
JFIF
8~Exif
samsung
SM-G935V
GIMP 2.8.18
2018:04:30 07:56:54
0220
0100
2018:04:29 13:23:09
2018:04:29 13:23:09
0100
ASCII

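You can also use the strings command to look into executables, such as the small compiled program HelloWorld used here. Library names, the text the program prints, and compiler version strings all show up in the output.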

$ strings HelloWorld | head -16
/lib64/ld-linux-x86-64.so.2
puts
__libc_start_main
libc.so.6
GLIBC_2.2.5
__gmon_start__
H=(@@
[]AA]A^A_
Hello World
;*3$"
GCC: (GNU) 11.0.0 20210210 (Red Hat 11.0.0-0)
GCC: (GNU) 11.0.1 20210405 (Red Hat 11.0.1-0)
3g961
running gcc 11.0.0 20210210
annobin gcc 11.0.0 20210210
GA*GOW
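
The rule strings applies is simple enough to sketch in a few lines of Python. Keep any run of at least four printable ASCII characters (tab included) and discard everything else. extract_strings is a hypothetical helper. GNU strings has additional options, such as -n to change the minimum length and -e to select other character encodings, which this sketch does not reproduce.

```python
import re
import sys


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    # Space (0x20) through tilde (0x7e), plus tab, repeated min_len+ times.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for s in extract_strings(f.read()):
            print(s)
```

This also explains odd entries such as “3g961” or “GA*GOW” in the output above. Any four or more bytes that happen to fall in the printable range are reported, whether or not they were ever meant as text.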

readelf

Another useful command for looking at ELF files is readelf. You will, however, need to supply an option that tells it what to display. In the example below, we’re looking at the ELF file header (-h). Use -a if you want to see all the available data.

$ readelf -h HelloWorld
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401040
  Start of program headers:          64 (bytes into file)
  Start of section headers:          23304 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
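
The values readelf -h reports come from fixed offsets at the start of the file, so they can be decoded with Python’s struct module. This minimal sketch assumes the standard ELF header layout: 16 identification bytes, followed by the type, machine, version, entry point, header offsets, and counts. The address-sized fields are 8 bytes wide in 64-bit files and 4 bytes in 32-bit ones. parse_elf_header and its small lookup tables are hypothetical and cover only a few type and machine values.

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
# Small, illustrative subsets of the e_type and e_machine values.
TYPES = {1: "REL (Relocatable file)", 2: "EXEC (Executable file)",
         3: "DYN (Shared object file)", 4: "CORE (Core file)"}
MACHINES = {3: "Intel 80386", 40: "ARM", 62: "Advanced Micro Devices X86-64",
            183: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode selected ELF header fields from the first bytes of a file."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]   # 2 = 64-bit; 1 = little endian
    endian = "<" if ei_data == 1 else ">"
    # Entry point and the two header offsets are 8 bytes in ELF64, 4 in ELF32.
    addr = "Q" if ei_class == 2 else "I"
    fmt = endian + "HHI" + addr * 3 + "IHHHHHH"
    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     e_ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from(fmt, data, 16)  # fields follow e_ident
    return {
        "Class": "ELF64" if ei_class == 2 else "ELF32",
        "Data": "little endian" if ei_data == 1 else "big endian",
        "Type": TYPES.get(e_type, hex(e_type)),
        "Machine": MACHINES.get(e_machine, hex(e_machine)),
        "Entry point address": hex(e_entry),
        "Start of program headers": e_phoff,
        "Start of section headers": e_shoff,
        "Size of this header": e_ehsize,
        "Number of program headers": e_phnum,
        "Number of section headers": e_shnum,
        "Section header string table index": e_shstrndx,
    }


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        header = parse_elf_header(f.read(64))
    for field, value in header.items():
        print(f"  {field + ':':<36}{value}")
```

Run against HelloWorld, the sketch would be expected to report the same class, type, entry point, and header counts as readelf -h above. The first four bytes it checks are the same 7f 45 4c 46 magic number visible in the hexdump of /bin/date.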

Wrap-Up

There are quite a few commands on Linux systems for pulling information from binary files. They can be very helpful when you want to look more deeply into commands that you run or images that you want to know more about.

Copyright © 2021 IDG Communications, Inc.


