Reverse Engineering: A Security Researcher’s Toolkit


Craig Young, Principal Security Researcher at Tripwire, unpacks the modern security researcher’s toolkit to reverse engineer complex designs.

Spotify: https://open.spotify.com/show/5UDKiGLlzxhiGnd6FtvEnm
Stitcher: https://www.stitcher.com/podcast/the-tripwire-cybersecurity-podcast
RSS: https://tripwire.libsyn.com/rss
YouTube: https://www.youtube.com/playlist?list=PLgTfY3TXF9YKE9pUKp57pGSTaapTLpvC3

Tim Erlin: Welcome everyone to the Tripwire Cybersecurity Podcast. I’m Tim Erlin, vice-president of product management and strategy at Tripwire. Today, I am joined by one of our security researchers, Craig Young.

Craig Young: Thank you for having me. It’s a pleasure to be here with you.

On the Basics of Reverse Engineering

TE: Today, we’re here to talk a little bit about vulnerability discovery and vulnerability research. Can you start by just talking a little bit about what that process looks like?

CY: So, we’re talking about a situation in which user input is not handled as safely as it should have been. When we’re looking for vulnerabilities, it’s all about thinking about the different places where less trustworthy users can provide input to a system or a program and then trying to identify what types of inputs might go into this process that are going to corrupt it, cause it to give results that were not the original intent of the developer and undermine the security of the application.

TE: I tend to picture someone sitting behind a keyboard and typing into a form or website, but there are obviously other types of input that matter. Is it strictly input coming from a user, or is that a sort of catch-all term for any type of input that a program could accept?

CY: It’s a catch-all term. Sometimes, there’s not necessarily going to be a human user, but it will be some kind of consumer or a system that’s involved with supplying some inputs. So, a common example of this would be a network server that’s going to sit on the network and listen for requests from a user that might want to log in. If we look back to the early days of computers on the internet, something that was common to do would be to connect to a remote service, send a long string of “A” characters and see if it would crash. And that would be telling you that you’ve supplied input that it’s not handling properly.

TE: That brings up an interesting point. The distinction of going from a crashed service or an application to an exploit that allows remote code execution…how are those two things related?

CY: This is where the reverse engineering process comes into play. Say you’ve got a telnet server on the internet, and you send it 65 “A’s,” and it crashes. Well, this is something that was a problem, but how do we know if it’s actually a vulnerability? In order to do that, we have to actually look at the instructions within the program and recognize where it was carving out storage for this data. What actually happened that caused it to do something wrong? Maybe the software said, “Hey, this is not the right input. We should just abort and close ourselves gracefully.” Or maybe it had a mathematical error and wrote data into a place on the computer system where it was not supposed to. We use reverse engineering tools to inspect programs on a computer and get an idea of what’s happening when we feed input into those programs.

TE: Maybe we should draw a distinction between reverse engineering as a process/concept and specific reverse engineering tools. You could take that situation where you’re sending 65 “A’s” and the application on the other end crashes. You could reverse engineer what’s happening by sending different types of input and basically through trial and error coming to a conclusion about what specific input is causing the problem. That would be reverse engineering, right?

CY: Yes, that’s definitely a valid tactic towards reverse engineering. Simple trial and error.

In the early days, if you wanted to reverse a program, you might actually open it up in a hex editor, read through the hex bytes and refer back to the processor instruction manuals. You could take those hex codes, translate them into individual instructions and start working through them to identify problems. So a hex dump would be one of the original reverse engineering tools, so to speak.
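To make the hex-dump workflow concrete, here is a minimal Python sketch of the kind of view a hex editor gives you. The sample bytes are a hypothetical illustration, not taken from the episode: they encode a trivial x86-64 function, which is exactly the sort of thing an analyst would once have decoded by hand against the processor manual.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx" groups joined by spaces
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the ASCII column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  |{text}|")
    return "\n".join(rows)


# Hypothetical sample: x86-64 machine code for
#   push rbp; mov rbp, rsp; pop rbp; ret
SAMPLE = bytes.fromhex("554889e55dc3")

if __name__ == "__main__":
    print(hexdump(SAMPLE))
```

Reading the output by hand, 55 is push rbp, 48 89 e5 is mov rbp, rsp, 5d is pop rbp and c3 is ret. A disassembler automates that lookup.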

TE: But we’ve as an industry moved beyond that at this point. So, there are also specific tools in the market for reverse engineering, right?

CY: Yeah. Some of the most basic tools like this would be required as part of developer toolkits. You’re not going to get very far with developing a complex computer system if you don’t have a way of debugging it. So, in the Linux world, you have GNU tools like objdump and others that will allow you to interpret the ELF binary headers, parse out functions and see disassembly so that you can understand what a program was doing. On the Windows side, you’ve got WinDbg and some of the other tools that are part of Visual Studio.
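As a rough idea of what “interpreting the ELF binary headers” involves, the Python sketch below decodes the first few fields of an ELF file header using only the standard library. It is a simplified illustration, not how objdump itself is implemented. The sample header is synthetic and its entry address is a made-up value. The lookup tables cover only a handful of common type and machine codes.

```python
import struct

ELF_MAGIC = b"\x7fELF"
E_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
E_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode class, byte order, type, machine and entry point of an ELF file."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # e_type (u16), e_machine (u16), e_version (u32), then e_entry,
    # which is 4 bytes in 32-bit files and 8 bytes in 64-bit files.
    fmt = endian + "HHI" + ("I" if ei_class == 1 else "Q")
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    e_type, e_machine, _version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "byte_order": "little" if ei_data == 1 else "big",
        "type": E_TYPES.get(e_type, hex(e_type)),
        "machine": E_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


# Synthetic 64-bit little-endian header; the entry address is hypothetical.
SAMPLE_HEADER = (
    ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
    + struct.pack("<HHIQ", 2, 0x3E, 1, 0x401000)
)

if __name__ == "__main__":
    print(parse_elf_header(SAMPLE_HEADER))
```

To try it on a real binary, pass the first 64 bytes of the file, for example `parse_elf_header(open(path, "rb").read(64))`, where `path` points at any ELF executable on your system.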

Sometimes debugging is about looking for compatibility issues, as well.

So, for example, working on the Tripwire products, we’ve seen before that Microsoft will change something within an API. We don’t have source code for the Windows APIs, so it has been necessary before to actually perform reverse engineering, looking into these binaries to understand what had changed so that we could keep our products compatible with these operations.

TE: So, that’s one group of tools that you were talking about. I think you were headed towards a second group, right?

CY: Yeah. Beyond these basic tools for analyzing binaries, which can really get you very far, there are other, more specialized tools that have come out over the years. In the open-source community, you have things like Radare. Among closed-source, proprietary tools, the big heavy hitter is IDA. These software packages can run into the thousands or tens of thousands of dollars for comprehensive support.

TE: We’re also here to talk a little bit about a class that you taught back at SecTor around one of these specific tools. That’s Ghidra, right?

CY: The Ghidra tool was really a game changer in this space.

Previously, if you wanted to get this premium quality decompilation support, you’re talking about thousands of dollars of investment. The National Security Agency of the United States released to the public their toolset for doing reverse engineering and decompilation, and it’s absolutely a fabulous tool. It is an amazing tool for being free and opens up a lot of possibilities for individuals, students, whoever—people from companies that aren’t looking to spend five figures on software reverse engineering tools.

It gives them the ability to actually get their hands in and see high-quality decompilation associated with binaries that they might want to analyze. I wanted to be able to get more people involved with using this, so I prepared some course material that I taught at the SecTor trainings this past year to get people familiar with Ghidra.

TE: Ghidra being introduced or released by the NSA and effectively lowering the barrier of entry into this particular part of information security seems like a big deal in terms of the vulnerability research community.

CY: Yeah, absolutely. It’s a big deal for offensive and defensive research because it democratizes these kinds of capabilities in a major way.

TE: Do you have any thoughts about why the NSA chose to do that?

CY: Yeah, actually. I had attended a presentation at the REcon conference in Montreal the other year where they talked about this a little bit. They have their public reasons about wanting to show that they’re giving back to society and whatnot. But I think more likely this has something to do with wanting to regain some PR after the debacle with the Shadow Brokers.

TE: I understand why you’re using Ghidra as part of your job for research. But what made you decide to teach about it? What was the motivation behind that?

CY: I had some discussions with people within InfoSec on Reddit and Discord channels. It seemed like there were a lot of people that were really excited about the fact that Ghidra is there, but a lot of people seemed intimidated about actually shifting over.

So, I thought that this would be a good opportunity to do some handholding and show not just “These are the features of Ghidra” but also “Here is how you use these features to perform these various tasks.” So, in the class, we’ll look over a Mirai sample and identify “Here’s how you can see where there’s some suspicious activity, where they’re obfuscating some functionality.” And if we dig into it through these techniques, we can ultimately recover the cryptography keys that are being used to protect the settings. And I walk the class through how to then recover them from a real-world Mirai malware sample.
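The episode does not spell out which scheme the sample uses or how the class recovers the keys. As a general illustration only, the Python sketch below assumes the simplest common case, a single-byte XOR over a configuration string, and recovers the key with a known-plaintext guess (a “crib”). The blob, the key 0x5A and the hostname are all hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; the same call encodes and decodes."""
    return bytes(b ^ key for b in data)


def find_xor_keys(blob: bytes, crib: bytes) -> list:
    """Return every one-byte key whose decoding of blob contains the crib."""
    return [key for key in range(256) if crib in xor_bytes(blob, key)]


# Hypothetical obfuscated setting, built here so the example is self-contained.
SECRET_KEY = 0x5A
OBFUSCATED = xor_bytes(b"cnc.example.com", SECRET_KEY)

if __name__ == "__main__":
    # Guess that the hidden setting is a hostname ending in ".com".
    for key in find_xor_keys(OBFUSCATED, b".com"):
        print(hex(key), xor_bytes(OBFUSCATED, key))
```

In practice the decompiler view shows where the decoding routine is called and which constant it uses, so brute force like this is mostly a quick sanity check on what the disassembly suggests.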

TE: I tend to think of reverse engineering in the context of vulnerability discovery. But you bring up an example of reverse engineering malware as a means to defending against it more effectively. And that’s a different use case.

CY: Yeah, certainly. Much of what I do is related to looking for code paths of new vulnerabilities, but in my IoT research especially, I run some honeypots, and I will from time to time get interesting samples, and they tend to be somewhat easy to analyze. But every once in a while, you can see other exploits that are being delivered through the same malware. So, you can learn things from that. You can also learn things about the owner of the infrastructure that was being used for the malware campaigns.

TE: Now, people still use the commercial tools. What is it that Ghidra’s missing?

CY: In a nutshell, the biggest missing feature would be debugging and emulation support. If you’re using IDA Pro, for example, I could load up a binary from an IoT device, and I can then instruct IDA to connect to a debugging server that I’ve loaded on that device and interactively step through the disassembly. Ghidra does not currently have that, but it’s actually interesting to see that they’ve been talking about adding debugging support, meaning the ability to actively debug a binary that you’re reverse engineering. They’ve been talking about this for some time, and now in the last couple of months, I saw that they’ve actually pushed a repo into Git which seems to have support for debugging, and in the notes there, they’re also indicating that this will be growing to introduce support for emulation, as well.

Thoughts on a Career Doing Vulnerability Research

TE: If you think back to when you started doing vulnerability research, how were you spending your time in comparison to how you spend it now?

CY: When I started my vulnerability research, it was a lot more fuzzing-oriented. It was wildly effective for many, many years. And then the advent of more sophisticated fuzzing tools that would do generational fuzzing just kept on getting results for a long time. But over time, the types of flaws that you can find through fuzzers become fewer and fewer. And you start getting into other kinds of flaws that become rather difficult to see without more involved analysis.
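For readers who have not seen fuzzing in practice, here is a deliberately tiny Python sketch. The target `parse_record` is a hypothetical parser with a planted bug, written for this example: it trusts a length byte and copies that many bytes into a fixed 16-byte buffer. The fuzzer randomizes the length field and pads the payload, then separates graceful rejections from real crashes, the same distinction drawn earlier between aborting cleanly and writing data where it does not belong.

```python
import random

BUF_SIZE = 16


def parse_record(data: bytes) -> bytes:
    """Toy target: first byte is a declared length, the rest is payload."""
    if not data:
        raise ValueError("empty input")
    declared, payload = data[0], data[1:]
    if declared > len(payload):
        raise ValueError("truncated record")  # graceful rejection
    buf = bytearray(BUF_SIZE)
    for i in range(declared):
        buf[i] = payload[i]  # planted bug: declared is never checked against BUF_SIZE
    return bytes(buf[:declared])


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Randomize the length byte and append a random amount of padding."""
    data = bytearray(seed)
    data[0] = rng.randrange(256)
    data.extend(b"A" * rng.randrange(65))
    return bytes(data)


def fuzz(seed: bytes, iterations: int = 1000, rng_seed: int = 1) -> list:
    """Return the mutated inputs that made the target fail unexpectedly."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except ValueError:
            continue  # the target rejected the input cleanly
        except Exception:
            crashes.append(candidate)  # anything else is worth triaging
    return crashes


SEED_INPUT = bytes([4]) + b"ABCD"

if __name__ == "__main__":
    found = fuzz(SEED_INPUT)
    print(f"{len(found)} crashing inputs; smallest is {len(min(found, key=len))} bytes")
```

Python raises an IndexError where a C program would silently write past the buffer, so this only mimics the shape of the bug. Against native code, the same loop would watch for signals or sanitizer reports instead of exceptions.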

TE: Alright. I really want to thank you for joining us. I think it was a really interesting conversation. I hope everybody who’s listening found it interesting as well. We’ll look forward to more material coming from you both on the vulnerability research side and on the tool usage and teaching side.

CY: Thank you, Tim. I really enjoyed it, as well.

TE: And thanks everyone for joining us. I hope you’ll join us for the next episode of the Tripwire Cybersecurity Podcast.




