CH 0x07 Disassembling & Decompilation
p. wardle
📝 Note:
This book is a work in progress.
The Art of Mac Malware: Analysis
Mach-O binaries, by definition, are “binary” ...meaning that while readily readable by
computers, their compiled binary code is not designed to be directly readable by humans.
As the vast majority of Mach-O malware is solely “available” in this compiled binary form
(i.e. its source code is not available), we as malware analysts rely on tools that are
able to extract meaningful information from such binaries.
In the previous chapter we covered various static analysis tools that can aid in the
triage of unknown Mach-O binaries. However, if we truly want to comprehensively
understand a Mach-O binary (for example a specimen that appears to be a new piece of Mac
malware), other more sophisticated tools are required.
Advanced reverse-engineering tools offer the ability to disassemble, decompile (and even
dynamically debug) binaries. In this chapter, we’ll stick to the static analysis
approaches of disassembling and decompilation (though in later chapters we’ll cover
dynamic debugging as well). While these tools require at least an elementary
understanding of low-level reversing concepts (such as assembly code), and may lead to
time-consuming analysis sessions, their analysis abilities are invaluable and unmatched.
Even the most sophisticated malware specimen is no match for a skilled analyst wielding
these tools!
Before discussing the specifics of disassemblers and decompilers, a brief foray into
assembly code is required.
📝 Note:
Entire books have been written on the topics of disassembling binary code and the
assembly language.
Here, we provide only the basics (and take some liberties in simplifying various
concepts), and assume the reader is familiar with various basic reversing concepts
(such as registers, etc.).
Software (including malware) is written in a programming language ...an unambiguous
“human friendly”-ish language that may then be translated (compiled) into binary code.
Scripts, which we discussed in Chapter 0x5 (“Non-Binary Analysis”), are not compiled per
se, but rather “interpreted” at runtime into commands or code that the system
understands.
As noted, when analyzing a compiled Mach-O binary suspected of being malicious, the
original source code is generally not available. We must leverage a tool that can
understand the compiled binary machine-level code, and translate it back into something
more readable: assembly code! This process is known as disassembling.
At its core, a disassembler takes as input a compiled binary (such as a malware sample)
and performs this translation back into assembly code. Of course, it’s up to us to make
sense of the provided assembly!
📝 Note:
There are various “versions” of assembly. We’ll focus on x86_64 (the 64-bit version of
the x86 instruction set), the System V ABI (calling convention) with Intel syntax.
...as this is the (current) instruction set and calling convention of macOS!
Assembly instructions are “represented by a mnemonic which [is], often combined with one
or more operands” [3]. Mnemonics generally describe the instruction:
Mnemonic   Example            Description
add        add rax, 0x100     Adds the second operand (e.g. 0x100) to the first.
mov        mov rax, 0x100     Moves the second operand (e.g. 0x100) into the first.
jmp        jmp 0x100000100    Jumps to (i.e. continues execution at) the address in the operand.
call       call rax           Executes the subroutine at the address specified in the operand.
Generally, operands are either registers (a named memory ‘slot’ on the CPU) or numeric
values. Some of the registers you’ll encounter while reversing a 64-bit Mach-O binary
include rax, rbx, rcx, rdx, rdi, rsi, rbp, rsp, and r8 - r15. As we’ll see shortly,
oftentimes specific registers are consistently used for specific purposes, which
simplifies reverse-engineering efforts.
📝 Note:
All 64-bit registers can also be “referenced” by their 32-bit (or smaller) components
...which you’ll (still) come across during binary analysis.
“All registers can be accessed in 16-bit and 32-bit modes. In 16-bit mode, the register
is identified by its two-letter abbreviation from the list above. In 32-bit mode, this
two-letter abbreviation is prefixed with an 'E' (extended). For example, 'EAX' is the
accumulator register as a 32-bit value.
Similarly, in the 64-bit version, the 'E' is replaced with an 'R' (register), so the
64-bit version of 'EAX' is called 'RAX'.” [3]
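To make the relationship between a register and its smaller components concrete, here is a minimal Python sketch. It is only a model (the sub_registers helper is a name invented for this illustration), and simply masks a 64-bit value to show what the 32-, 16-, and 8-bit views would hold:

```python
def sub_registers(rax: int) -> dict:
    """Return the smaller 'views' of a 64-bit rax value (illustrative model only)."""
    rax &= 0xFFFFFFFFFFFFFFFF        # keep the value within 64 bits
    return {
        "rax": rax,                  # full 64 bits
        "eax": rax & 0xFFFFFFFF,     # lower 32 bits
        "ax": rax & 0xFFFF,          # lower 16 bits
        "al": rax & 0xFF,            # lowest byte
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name:>3} = {value:#x}")
```

For the sample value, eax holds 0x55667788, ax holds 0x7788, and al holds 0x88.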
Before we wrap up our (cursory) discussion of assembly code, let’s briefly discuss
calling conventions. This will give us an understanding of how API (method) calls are
made, how arguments are passed in, and how the response is handled ...in assembly code.
Why is this relevant? Well, one can often gain a fairly comprehensive understanding of a
Mach-O binary by simply studying the system API methods it invokes. For example, a
malicious binary that makes a call to a “write file” API method, passing in both a
property list and path that falls within the ~/Library/LaunchAgents directory, is likely
persisting as a launch agent!
Thus, we often don’t need to spend hours understanding all assembly instructions in a
binary, but instead can focus on the instructions “around” API calls to understand which
calls are made, what arguments are passed in, and how the response is handled.
To facilitate the explanation of calling conventions and method calls (at the assembly
level), we’ll focus on a snippet of Objective-C that creates an NSURL object
initialized with "www.google.com":
//url object
NSURL* url = [NSURL URLWithString:@"www.google.com"];
When a program wants to invoke a method or a system API call, it first needs to “prepare”
the arguments for the call. In the source code above, when invoking the URLWithString:
method (which expects a string object as its only argument), the Objective-C code passes
in the string "www.google.com".
At the assembly level, there are specific “rules” about how to pass arguments to a method
or API function. This is referred to as the calling convention. The rules of the calling
convention are articulated in an Application Binary Interface (ABI), and for 64-bit macOS
systems are as follows:
Argument        Register
1st argument    rdi
2nd argument    rsi
3rd argument    rdx
4th argument    rcx
5th argument    r8
6th argument    r9
Thus, once a call is identified in the disassembly (by the call mnemonic), looking
backwards in the assembly code will reveal the values of the arguments passed to the
method or API. This can often provide valuable insight into the code’s logic (i.e. what
URL a malware sample is attempting to connect to, the path of a file it’s opening, etc.).
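The table above can be expressed as a small lookup. The following Python sketch is only an illustration: the assign_registers helper and the example call are hypothetical, and the model covers just the first six integer or pointer arguments (anything beyond that is ignored here):

```python
# System V x86_64: registers holding the first six integer/pointer arguments
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def assign_registers(args):
    """Map up to six arguments to the registers that would hold them."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("this sketch ignores stack-passed arguments")
    return dict(zip(ARG_REGISTERS, args))


# hypothetical call with two arguments: a file path and a flags value
layout = assign_registers(["/tmp/file", 0])
print(layout)
```

When reversing, we read the same table in the other direction: on spotting a call, we look backwards for the instructions that load rdi, rsi, and so on.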
And what about when the call instruction returns? Consulting the ABI reveals that the
return value of the method or API call will always be stored in the rax register. Thus
once the NSURL’s URLWithString: method call returns, the newly constructed NSURL object
will be in the rax register.
As the rax register holds the return value when the call instruction completes, you’ll
often see disassembly with a call instruction, immediately followed by instructions
checking and taking an action based on the result of the value in the rax register. For
example (as we’ll see shortly) a malicious sample choosing not to infect a system if a
function that checks for network connectivity returns zero (NO/false) in the rax
register.
Something else that is imperative to understand when reversing Objective-C binary code is
the objc_msgSend [4] function.
Recall the following Objective-C code that simply constructs a URL object:
//url object
NSURL* url = [NSURL URLWithString:@"www.google.com"];
When this code is compiled, the compiler (llvm) will translate this Objective-C call (and
most other Objective-C calls) into code that invokes the objc_msgSend function.
Apple’s developer documentation contains an entry for this function, stating that it “sends
a message with a simple return value to an instance of a class” [4].
As the vast majority of Objective-C calls are routed through this function, it is
imperative to understand it when reversing compiled Objective-C code. So, let’s break it
down!
First, what does “sends a message ...to an instance of a class” even mean? Simply put,
this means invoking (calling) an object’s method.
📝 Note:
The Objective-C runtime is based on the notion of sending messages, and other rather
unique object-oriented paradigms.
For an in-depth discussion of the Objective-C runtime and its internals, consult the
following by nemo:
■ The first parameter (self) is “a pointer that points to the instance of the class
that is to receive the message” [4]. Or more simply put, it’s the object that the
method is being invoked upon. If the method is a class method, this will be an
instance of the class object (as a whole), whereas for an instance method, self
will point to an instantiated instance of the class as an object.
■ The second parameter, (op), is “the selector of the method that handles the
message” [4]. Again, more simply put, this is just the name of the method.
■ The remaining parameters are any values that are required by the method (op).
Recall that the ABI defines how arguments are passed to a function call. As such, we can
map exactly which registers will hold objc_msgSend’s arguments at time of invocation:
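Argument          Register            (objc_msgSend) parameter
1st argument      rdi                 self: object that the method is being invoked upon
2nd argument      rsi                 op: name of the method
3rd+ arguments    rdx, rcx, r8, r9    any arguments required by the method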
Of course the registers rdx, rcx, r8, r9, are only used if the method being invoked
requires them (for arguments). For example, a method that only takes one argument will
only utilize the rdx register.
Also, like any other function or method call, once the call to objc_msgSend completes,
the rax register will hold the return value (which is actually the return value from the
method that was invoked).
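As a rough sketch of this mapping, the Python below models which register would hold each piece of an Objective-C call routed through objc_msgSend. The msgsend_registers helper is a name invented for this example, and the values are plain strings standing in for the real pointers:

```python
# registers left over for the method's own arguments (after self and op)
METHOD_ARG_REGISTERS = ["rdx", "rcx", "r8", "r9"]


def msgsend_registers(receiver, selector, *method_args):
    """Model the register layout for a call to objc_msgSend (illustrative only)."""
    if len(method_args) > len(METHOD_ARG_REGISTERS):
        raise ValueError("this sketch ignores stack-passed arguments")
    layout = {"rdi": receiver, "rsi": selector}   # self, op
    layout.update(zip(METHOD_ARG_REGISTERS, method_args))
    return layout


# [NSURL URLWithString:@"www.google.com"]
layout = msgsend_registers("NSURL", "URLWithString:", "www.google.com")
print(layout)
```

Note that the method’s first argument lands in rdx (the 3rd argument register), as rdi and rsi are already taken by self and op.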
This wraps up our very brief discussion on assembly language basics. Armed with a
foundational understanding of this low-level language, let’s now look at disassembling
binary code.
Disassembling
Disassembling involves converting binary code (1s and 0s) back into assembly
instructions. This assembly code can then be analyzed to gain a comprehensive
understanding of the binary. A disassembler (discussed shortly) is a program that is able
to perform this translation and facilitate the analysis of compiled binaries.
Here, we’ll discuss various disassembling concepts, illustrated via real world examples
(taken directly from malicious code). It is important to remember that generally
speaking, the goal of analyzing malicious code is to gain a comprehensive understanding
of its logic and capabilities ...not necessarily to understand each and every assembly
instruction. As we noted earlier, focusing on the logic around method and function calls
can often provide an efficient means to gain such an understanding.
As such, let’s briefly look at an example of disassembled code in order to illustrate how
to identify such calls, the parameters, and the (API) response. The end result? A
comprehensive understanding of the disassembled code snippet.
Malware sometimes contains logic to check if its host is connected to the internet. If
the infected system is offline, the malware will often wait (sleep) before trying to
connect to its command and control server for tasking.
A specific example of malware that checks for network connectivity is OSX.Komplex [7],
which contains a function named connectedToInternet. By studying the disassembled binary
code of this nation-state backdoor, we can confirm this function indeed checks if the
infected system is online as well as understand how it accomplishes this check.
Specifically our analysis will reveal the malware checks for network connectivity via
Apple’s NSData class, invoking the dataWithContentsOfURL: method [8]. If a remote URL
(www.google.com) is not reachable (i.e. the infected system is offline), the call will
fail, indicating the system is offline.
connectedToInternet() {

    //url object
    NSURL* url = [NSURL URLWithString:@"www.google.com"];
Previously we mentioned that Objective-C method calls are “translated” into calls to
the objc_msgSend function. Thus, it’s unsurprising to see a call to this function in the
disassembly:
connectedToInternet

;move a pointer to the NSURL class into rdi
; the rdi register holds the first parameter (‘self’)
mov rdi, qword [objc_cls_ref_NSURL]

;move a pointer to the method name ‘URLWithString:’ into rsi
; the rsi register holds the 2nd parameter (‘op’)
lea rsi, qword [URLWithString:]

;load the address of the url in rdx
; the rdx register holds the 3rd parameter, which is the 1st parameter passed to
; the method being invoked (URLWithString:)
lea rdx, qword [_www_google_com]

;move a pointer to objc_msgSend into the rax register
; and then invoke it
mov rax, cs:_objc_msgSend_ptr
call rax

;save the response into a (stack) variable named ‘url’
; the rax register holds the result of the method call
mov qword [rbp+url], rax
We also see that the single line of Objective-C code (NSURL* url = [NSURL
URLWithString:@"www.google.com"];) was translated into several lines of assembly code.
First the parameters are initialized (in the expected registers) for a call to
objc_msgSend, the call is then made, and the result is saved.
Specifically, the rdi register (the first parameter) is loaded with a reference to the
NSURL class. Then, the second parameter (rsi) is loaded with the name of the method:
URLWithString:. Finally rdx is initialized with the string “www.google.com”. Now the
call to objc_msgSend can be made. Once the call completes, the newly initialized NSURL
object is returned in the rax register and stored into a local variable.
Once an NSURL object has been constructed, the malware invokes NSData’s
dataWithContentsOfURL: method. Again, before looking at the disassembly, let’s construct
a likely representation in Objective-C:
//data object
// initialized by trying to connect/read to google.com
NSData* data = [NSData dataWithContentsOfURL:url];
Similar to the disassembly for the call into NSURL’s URLWithString: method, here we see
the parameters being initialized (in the expected registers) for a call to objc_msgSend,
the call is then made, and the result is saved (into a variable named data).
The malware authors likely wrote something similar to the following Objective-C code to
implement this return-value logic:
//set flag
// YES (true) if google was reachable
isConnected = (data != nil) ? YES : NO;
return (int)isConnected;
And what does this look like in (disassembled) assembly code? Glad you asked:
...

leave:

;move the value into rax
; note: al is the lower byte of rax
mov al, byte [rbp+isConnected]
and al, 0x1
movzx eax, al

return
First the cmp instruction is used to compare the value of the data variable (returned
from the call to dataWithContentsOfURL). If it’s 0 (nil), the assembly code jumps to the
notConnected label and sets the value of the isConnected variable to 0. Otherwise, if the
dataWithContentsOfURL method returned a non-nil value, the isConnected variable is set to
one.
Finally, the isConnected variable is moved into the rax (eax) register by means of a few
instructions. Such instructions are required to ensure the boolean value is correctly
converted into a (larger) integer value to be returned to the caller.
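The effect of these three instructions can be modeled with a few bit masks. The Python below is a sketch only (return_flag is a made-up name), and it relies on the x86_64 behavior that writing a 32-bit register such as eax clears the upper half of the full 64-bit register:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def return_flag(rax: int, is_connected: int) -> int:
    """Model: mov al, [isConnected] / and al, 0x1 / movzx eax, al."""
    # mov al, byte [rbp+isConnected]: only the lowest byte of rax changes
    rax = (rax & (MASK64 ^ 0xFF)) | (is_connected & 0xFF)
    # and al, 0x1: keep only the lowest bit of al
    rax = (rax & (MASK64 ^ 0xFF)) | (rax & 0x1)
    # movzx eax, al: zero-extend al into eax (which also clears the rest of rax)
    rax = rax & 0xFF
    return rax


# whatever 'junk' rax held before, the caller sees a clean 0 or 1
print(return_flag(0xDEADBEEFCAFEF00D, 1))
print(return_flag(0xDEADBEEFCAFEF00D, 0))
```

Without the final zero-extension, the upper bytes of rax could still contain leftover data from earlier instructions.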
As is often the case, a few lines of Objective-C code are expanded into many
assembly instructions, which makes analyzing disassembled code rather time consuming.
However, without access to source code, we often have little other choice. And the
assembly instructions do provide unparalleled insight into the malware’s inner workings
...so much so that often we can completely reconstruct the malware’s code in a
higher-level language. Here, for example, is a complete reconstruction of the
connectedToInternet function:
int connectedToInternet()
{
    //result
    BOOL isConnected = NO;

    //url object
    // let’s use google.com
    NSURL* url = [NSURL URLWithString:@"www.google.com"];

    //data object
    // init’d by trying to connect/read to google.com
    NSData* data = [NSData dataWithContentsOfURL:url];

    //set flag
    // YES (true) if google was reachable!
    isConnected = (data != nil) ? YES : NO;

    return (int)isConnected;
}
reconstruction of a connectivity check (OSX.Komplex)
Now, let’s walk through the (annotated) disassembly of the malware’s code that both invokes
the connectedToInternet function, and then acts upon its response.
isConnected:

;call the function
call connectedToInternet()

;check if 0x0 or 0x1 was returned
and al, 0x1
mov byte [rbp+isConnected], al
test byte [rbp+isConnected], 0x1

;take this if 0x0 (not connected)
jz notConnected

;take this if 0x1 (connected)
jmp continue

;sleep
notConnected:
mov edi, 0x3c
call sleep

;check connection (again)
jmp isConnected

continue:
...
invoking connectedToInternet, and processing the result (OSX.Komplex)
First the code invokes the connectedToInternet function. As this function takes no
parameters, no register setup is required. Following the call the malware checks if the
return value is 0x0 (NO/false). This is accomplished via a test and a jz (jump if zero)
instruction. The test instruction “performs a bitwise AND on two operands” [9] and sets
the zero flag based on the result. Thus if the connectedToInternet function returns a
zero, the jz instruction will be taken, jumping to the notConnected label. Here, the code
invokes the sleep function ...before jumping back to the isConnected label, to check for
connectivity once again. In other words, the malware will wait until the system is
connected to the internet, before continuing on.
With this comprehensive understanding, we can (re)construct this logic in the following
Objective-C code:
while(0x0 == connectedToInternet()) {
    sleep(0x3c);
}
...in Objective-C
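For readers who want to experiment with this logic without a Mac (or the malware!), the same loop can be sketched in Python. Everything here is a stand-in: wait_until_connected is a hypothetical name, and both the connectivity check and the sleep function are passed in so the loop can be exercised without touching the network or actually waiting:

```python
import time


def wait_until_connected(is_connected, delay=0x3C, sleep=time.sleep):
    """Poll is_connected() until it returns non-zero, sleeping between attempts."""
    attempts = 0
    while is_connected() == 0:
        sleep(delay)          # 0x3c == 60 (seconds)
        attempts += 1
    return attempts


# stand-in check that reports "offline" twice, then "online"
results = iter([0, 0, 1])
slept = []
attempts = wait_until_connected(lambda: next(results), sleep=slept.append)
print(attempts, slept)
```

Note that the hex constant 0x3c in the disassembly is simply 60, i.e. the malware naps for a minute between checks.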
Of course not all Mac binaries (including malware) are written in Objective-C. Let’s look
at another (abridged and annotated) snippet of disassembly - this time from a Lazarus
Group first-stage implant loader (originally written in C++) [10]. Specifically, we’ll
walk through a snippet of assembly code from a function named getDeviceSerial:
01 ;function: getDeviceSerial(char*)
02 ; first arg (rdi): output buffer ...for device serial #
03 ; return (rax): status (success/error)
04
05 ;move pointer to output buffer into r14
06 mov r14, rdi
07
08 ;move kIOMasterPortDefault into r15 register
09 mov rax, qword [_kIOMasterPortDefault]
10 mov r15d, dword [rax]
11
12 ;invoke IOServiceMatching
13 ;1st arg (rdi): the string "IOPlatformExpertDevice"
14 lea rdi, qword [IOPlatformExpertDevice]
15 call IOServiceMatching
16
17 ;invoke IOServiceGetMatchingService
...definitely a more sizable chunk of assembly code! But not to worry, we’ll walk through
it in detail.
First, observe that the disassembler has extracted the function declaration, which (luckily
for us) includes its original name as well as the number and format of its parameters.
From the name, getDeviceSerial, let’s assume (though we’ll also validate) that this
function will retrieve the serial number of the infected system. Since the function takes,
as its only parameter, a pointer to a string buffer (char*), it seems reasonable to
assume the function will store the extracted serial number in this buffer (so that it is
available to the caller).
Starting at line #06, we see the function first moves this argument (recall rdi always
holds the 1st argument), the address of the output buffer, into the r14 register. Why? As
noted, the rdi register is initialized with the first argument for any function call. If
the getDeviceSerial function makes any other calls (which it does), the rdi register will
have to be reinitialized (for those other calls). Thus, the function must ‘save’ the
address of the output buffer into another (non-used) register, so that this address may
be used later ...for example, at the end of the function when it’s populated with the
extracted serial number.
The function then (lines #09-10) moves a pointer to kIOMasterPortDefault into rax, and
dereferences it into the r15 register. According to Apple developer documentation,
kIOMasterPortDefault is “The default mach port used to initiate communication with
IOKit.” [11] It seems likely the malware will be communicating with IOKit as the means to
extract the infected device’s serial number.
In lines 14 and 15, the function getDeviceSerial makes its first call into an Apple API:
the IOServiceMatching function. Apple notes this function creates “a matching dictionary
that specifies an IOService class match” taking in a single parameter, and returning the
matching dictionary [12]:
the IOServiceMatching function
We know that when making a call to a function or method, the rdi register holds the first
argument. In line #14, we see the assembly code initialize this register with the value
of “IOPlatformExpertDevice”. In other words, it’s invoking the IOServiceMatching function
with the string “IOPlatformExpertDevice”.
Once the matching dictionary has been created, the code invokes the
IOServiceGetMatchingService function (line #22). Apple’s documentation states that this
function will “look up a registered IOService object that matches a matching dictionary.”
[14] For parameters, it expects a master port and a matching dictionary:
the IOServiceGetMatchingService function
On line #20, the assembly code moves a value from the r15 register into the edi register
(the 32-bit part of the rdi register). Looking back to lines #09-10, we see the code
previously moving kIOMasterPortDefault into the r15 register. The code on line #20 is
simply moving kIOMasterPortDefault into the edi register (as the first argument for the
call to IOServiceGetMatchingService).
On line #21, we see rax being moved into the rsi register (recall the rsi register is
used as the 2nd parameter for function calls). And (following a function call), the rax
register holds the result of the call. This means the rsi register will contain the
matching dictionary from the call to IOServiceMatching (made on line #15).
Next, the code sets up the parameters for a call to the IORegistryEntryCreateCFProperty
function, which Apple documentation states “creates an instantaneous snapshot of a
registry entry property.” [14] In other words, the code is extracting the value of some
(IOKit) registry property. But which one?
The parameter setup for the call to the IORegistryEntryCreateCFProperty function begins
by loading the kCFAllocatorDefault into the rdx register (lines #29-13). The rdx
register is used for the 3rd argument, which for the call to
IORegistryEntryCreateCFProperty is the “allocator to use” [12].
Next (line #32), the address of the string “IOPlatformSerialNumber” is loaded into the
rsi register. As the rsi register is used for the 2nd argument, this (according to
Apple’s documentation for the IORegistryEntryCreateCFProperty function) is the property
name of interest!
On line #33, rcx, the 4th argument (“options”), is initialized to zero (XORing a register
with itself sets it to zero). Finally, before making the call, the value from r15d is
moved into the 32-bit part of the rdi register (edi). This has the effect of initializing
the first parameter (rdi) with the value of kIOMasterPortDefault (previously stored in
r15d).
After the call to IORegistryEntryCreateCFProperty, the rax register will hold the value
of the required property: IOPlatformSerialNumber.
Finally, the function invokes the CFStringGetCString function to convert the extracted
property (which is a (CF)string object) to a plain null-terminated “C-string”. Of course,
the parameters have to be initialized prior to this call (lines #42-45).
The edx register (the 32-bit part of rdx, argument #3) is set to 0x20, which specifies
the output buffer size. Then the ecx register (the 32-bit part of rcx, argument #4) is
set to kCFStringEncodingUTF8 (0x8000100). The first argument (rdi) is set to the
value of rax, which is the result of the call to IORegistryEntryCreateCFProperty: the
extracted property value of IOPlatformSerialNumber.
Finally, the 2nd argument (rsi) is set to r14. And what is in the r14 register? Scrolling
back all the way to line #06, we see it comes from rdi, which is (was) the value of the
parameter passed to getDeviceSerial. Since Apple’s documentation for
CFStringGetCString states the 2nd argument is the “C string buffer into which to copy the
string,” [15] we now know the parameter passed to the getDeviceSerial function is a
buffer for a string!
This completes our (very thorough!) analysis of the malware’s getDeviceSerial function.
By focusing on the API calls made by this function, we were able to ascertain its exact
functionality: the retrieval of the infected system’s serial number
(IOPlatformSerialNumber) via IOKit. Moreover, via parameter analysis, we were able to
determine that the getDeviceSerial function would be invoked with a buffer for the serial
number.
However at this point, we can all agree that reading assembly code is rather tedious.
Luckily, due to recent advances in decompilers, there is hope!
Decompilation
Given a binary, such as a Mach-O, a disassembler can parse the file and translate the
binary code back into human-readable assembly, thus allowing detailed analysis to
commence.
Decompilers seek to take this translation one step further by recreating a source-code
level representation of extracted binary code. Source-code (i.e. C or Objective-C)
representation is both more succinct and “readable” than (dis)assembly, making analysis
of unknown binaries a simpler task.
Recall the getDeviceSerial function from the Lazarus Group first-stage implant loader.
The full disassembly of this function is about 50 lines. The decompilation? ...around 15:
getDeviceSerial decompiled
The decompilation is quite readable, and thus it is relatively easy to understand the
logic of this function!
int connectedToInternet()
{
    if( (@class(NSData), &@selector(dataWithContentsOfURL:), (@class(NSURL),
         &@selector(URLWithString:), @"http://www.google.com")) != 0x0)
    {
        var_1 = 0x1;
    }
    else {
        var_1 = 0x0;
    }
    rax = var_1 & 0x1 & 0xff;
    return rax;
}
connectedToInternet decompiled (OSX.Komplex)
📝 Note:
Taking into consideration the many benefits of decompilation over disassembly, one may
be wondering why disassembling was discussed at all.
First, even the best decompilers occasionally struggle to analyze complex binary code
(such as malware with anti-analysis logic). Disassemblers that simply translate binary
code (vs. attempt to (re)create source-code level representations) are far less
susceptible. Thus, “dropping down” to the assembly level code provided by the
disassembler may be the only option.
While decompilation can greatly simplify the analysis of binary code, the ability to
understand (dis)assembly code is arguably a foundational skill in comprehensive malware
analysis.
So far, we’ve discussed the concepts of disassembly and decompilation without mentioning
specific tools which provide these services. Such tools can be somewhat complex and thus
a bit daunting to the beginner malware analyst. As such, here we’ll briefly discuss one
such tool (Hopper), providing a high-level, hands-on “quick start” guide to binary
analysis!
Hopper is a “reverse engineering tool that lets you disassemble, decompile and debug your
applications.” [16]
Reasonably priced and designed natively for macOS, Hopper boasts a powerful disassembler
and decompiler that excels at analyzing Mach-O binaries. It’s a solid choice for Mac
malware analysis.
📝 Note:
A free demo version of Hopper is available from:
https://www.hopperapp.com/download.html
In this brief introduction to Hopper, we’ll disassemble and decompile Apple’s standard
“Hello World” (Objective-C) code:
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
    }
    return 0;
}
Apple’s “Hello World” template code
Though trivial, it affords us an example binary sufficient for illustrating many of
Hopper’s features and capabilities. An understanding of such features and capabilities,
of course, is imperative for the analysis of more complex (malicious) binaries.
We start by compiling the above Objective-C code, and confirm it is now (as expected), a
standard 64-bit Mach-O binary:
$ file helloWorld/Build/Products/Debug/helloWorld
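To get a feel for what such a check involves, here is a minimal Python sketch that inspects the first eight bytes of a Mach-O header. It is not how the file utility is implemented, and it deliberately handles only thin (non-universal), little-endian 64-bit binaries; the is_x86_64_macho name and the synthetic sample bytes are made up for this illustration:

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF        # magic value of a 64-bit Mach-O header
CPU_TYPE_X86_64 = 0x01000007    # CPU_TYPE_X86 (7) | CPU_ARCH_ABI64 (0x01000000)


def is_x86_64_macho(header: bytes) -> bool:
    """Check the magic and CPU type at the start of a thin, little-endian Mach-O."""
    if len(header) < 8:
        return False
    magic, cputype = struct.unpack("<II", header[:8])
    return magic == MH_MAGIC_64 and cputype == CPU_TYPE_X86_64


# synthetic header bytes for illustration (not read from a real file)
sample = struct.pack("<II", MH_MAGIC_64, CPU_TYPE_X86_64)
print(is_x86_64_macho(sample))
```

To try it on a real binary, one could pass in the first bytes read from the file, e.g. open(path, "rb").read(8).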
First, launch Hopper.app. To start analysis of our helloWorld (or any) Mach-O binary
simply choose: File -> Open (⌘+O). Select the Mach-O binary for analysis and in the
loader window that is shown leave the defaults selected, and click ‘OK’:
Loader Window
(Hopper.app)
Once its analysis is complete, Hopper will automatically display the disassembled code at
the binary’s entry point (extracted from the LC_MAIN load command in the Mach-O header).
...but first, let’s look at various information and options within the Hopper UI.
On the far right is the “inspector” view. This is where Hopper displays general
information about the binary being analyzed, including the type of binary (Mach-O),
architecture/CPU (Intel x86_64), and calling convention (System V):
On the far left is a segment selector that can toggle between various views related to
symbols and strings in the binary. For example, the “Proc.” view shows procedures that
Hopper has identified during its analysis. This includes functions and methods from the
original source code, as well as APIs that the code invokes. For example, in our “hello
world” binary, Hopper has identified the main function and the call to Apple’s NSLog API:
procedure view
(Hopper.app)
The “Str” view shows the embedded strings that Hopper has extracted from the binary. In
our simple binary, the only embedded string is “Hello, World!”:
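Conceptually, extracting embedded strings amounts to scanning the binary for runs of printable characters. The Python below is a rough sketch in that spirit; it is an assumption-laden stand-in (ASCII only, with a made-up extract_strings helper and a synthetic blob of bytes), and makes no claim about how Hopper itself locates strings:

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII that are at least min_len characters long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# synthetic bytes for illustration: a string surrounded by non-printable data
blob = b"\x00\x01\xffHello, World!\x00\x02ab\x00"
found = extract_strings(blob)
print(found)
```

The minimum length matters: the two-byte run "ab" in the sample is skipped, which helps filter out noise from arbitrary binary data.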
Before diving into the disassembly, it’s wise to peruse the extracted procedure names and
embedded strings as they are often an invaluable source of information about the
(possible) capabilities of the malware. Moreover, they can guide analysis efforts. Does a
procedure name or embedded string look of interest? Simply click on it, and Hopper will
show you exactly where it’s referenced in the binary.
By default, Hopper will automatically display the disassembly of the binary’s entry point
(often the main function). Here’s the disassembly of the main function in its
entirety:
...fairly standard (dis)assembly. However, Hopper does provide helpful annotations such
as identifying function names (i.e. mapping imp__stubs__NSLog to NSLog). Moreover, as it
also generally understands API prototypes, it will identify function/method parameters
and annotate the assembly code as such.
For example, for the assembly code at address 0x0000000100000f42 which moves the rcx
register (a pointer to our “Hello, World!” string) into rdi, Hopper has identified this
as initializing the arguments for a call to NSLog (a few lines later).
Various components within the disassembly are actually pointers to data elsewhere in the
binary. For example, the assembly code at 0x0000000100000f3b (lea rcx, qword
[cfstring_Hello__World_]) is loading the address of the “Hello, World!” string into the
rcx register.
This string object (of type CFConstantString) itself contains pointers ...and
double-clicking on those again takes you to the specified address.
aHelloWorld:
0x0000000100000fa2 db "Hello, World!", 0 ; DATA XREF=cfstring_Hello__World_
Note that Hopper also tracks (backwards) cross-references! For example, it has identified
that the string bytes (at address 0x0000000100000fa2) are cross-referenced by the
cfstring_Hello__World_ variable. That is to say, the cfstring_Hello__World_ variable
contains a reference to the 0x0000000100000fa2 address.
Such cross-references greatly facilitate static analysis of the binary code. For example,
if you notice a string of interest, you can simply ask Hopper where in the code that
string is referenced. To view such cross-references, control-click on the address or item
and select “References to …” ...or with the address/item selected simply hit “X”.
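Conceptually, a cross-reference table is just the forward references turned around: for every address that is referenced, we record who references it. The following Python sketch illustrates this using the addresses from our example. The function name build_xrefs is hypothetical, and for simplicity the string object's pointer to its bytes is attributed to the object's starting address; how Hopper stores this information internally is not something we're claiming to show.

```python
from collections import defaultdict


def build_xrefs(references):
    """Map each target address to the sorted addresses that reference it."""
    xrefs = defaultdict(list)
    for source, target in references:
        xrefs[target].append(source)
    return {target: sorted(sources) for target, sources in xrefs.items()}


# Forward references as (source address, target address) pairs.
forward_refs = [
    (0x100000F3B, 0x100001008),  # lea rcx, [cfstring_Hello__World_] in main
    (0x100001008, 0x100000FA2),  # string object -> "Hello, World!" bytes
]

xrefs = build_xrefs(forward_refs)
for target, sources in sorted(xrefs.items()):
    print(hex(target), "is referenced from", [hex(s) for s in sources])
```

Asking “where is this string used?” then becomes a simple lookup on the string's address, which is essentially what the “References to …” menu item answers.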
For example, say we want to see where in the disassembly the “Hello, World!” string object
is referenced. First we select the string object (at address 0x0000000100001008), then
control-click to bring up the context menu and select “References to cfstring_Hello__World_”:
cross references
(Hopper.app)
In this example there is only one cross-reference, the code at address 0x0000000100000f3b
(which falls within the main function). Click on this to jump to the code in the main
function, which references the “Hello, World!” string object:
Hopper also creates cross-references for functions, methods, and API calls so that you
can easily determine where in code these are invoked. For example, we can see via the
following “Cross References” window that the NSLog API is invoked within the main
function, specifically at 0x0000000100000f4b:
When bouncing around in Hopper (for example, following pointers or cross-references), one
often wants to quickly return to a previous spot of analysis. Luckily, the “esc” key is
mapped to “back” and will take you back to where you just were, or further (on multiple
key presses).
So far we’ve stayed in Hopper’s default display mode: “Assembly Mode.” As the name
suggests, this mode displays (dis)assembly of binary code. The display mode can be
toggled via a segment control found in Hopper’s main toolbar:
display modes
(Hopper.app)
■ Assembly mode:
The standard disassembly mode, in which Hopper “prints the lines of assembly code,
one after the other.” [15]
■ Control Flow Graph mode:
Similar to assembly mode, but the code of a procedure is broken into blocks, with the
control flow between the blocks shown graphically.
■ Pseudo-Code mode:
This is Hopper’s decompiler mode, in which a “source-code like” or pseudo-code
representation is generated.
■ Hex mode:
This mode shows the raw hex bytes of the binary, which is about as low-level as you
can get!
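To get a feel for what a hex view presents, here is a small Python sketch that formats raw bytes as rows of address, hex values, and printable characters. The layout and the function name hex_dump are our own choices for illustration and are not meant to match Hopper's display exactly; the sample simply reuses the string bytes and address from our example binary.

```python
def hex_dump(data, base=0, width=16):
    """Format data as lines of: address, hex bytes, printable ASCII."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the text column lines up on short rows.
        lines.append(f"{base + i:016x}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines


# The "Hello, World!" bytes (plus NUL terminator) at their address in our binary.
for line in hex_dump(b"Hello, World!\x00", base=0x100000FA2):
    print(line)
```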
Of the four display modes, the pseudo-code (decompiler) mode is arguably the most
powerful. To enter this mode, first select a procedure, then click on the 3rd button in
the Display Modes segment control:
This will instruct Hopper to decompile the code in the procedure in order to generate a
pseudo-code representation of the binary code. For our simple example “Hello World”
program, it does a lovely job:
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
    }
    return 0;
}
Apple’s “Hello World”
...thus, making the binary analysis (of this trivial binary) a breeze!
This wraps up our overview of the Hopper reverse-engineering tool. While brief, it
provides the basics to begin reversing Mach-O binaries!
📝 Note:
For a more comprehensive “how to” on using and understanding Hopper, check out the
application’s official tutorial:
https://www.hopperapp.com/tutorial.html [16]
Up Next
Armed with a solid understanding of static analysis techniques, ranging from basic file
type identification to advanced decompilation, we’re now ready to turn our attention to
methods of dynamic analysis. As we’ll see, such dynamic analysis often provides a more
efficient means of performing malware analysis.
Ultimately though, static and dynamic analysis are complementary; their combination
provides the ultimate analysis approach.
References
4. objc_msgSend function
https://developer.apple.com/documentation/objectivec/1456712-objc_msgsend
11. “kIOMasterPortDefault”
https://developer.apple.com/documentation/iokit/kiomasterportdefault?language=objc
12. “IOServiceMatching”
https://developer.apple.com/documentation/iokit/1514687-ioservicematching?language=objc
13. “IOServiceGetMatchingService”
https://developer.apple.com/documentation/iokit/1514535-ioservicegetmatchingservice?language=objc
14. “IORegistryEntryCreateCFProperty”
https://developer.apple.com/documentation/iokit/1514293-ioregistryentrycreatecfproperty?language=objc
15. “CFStringGetCString”
https://developer.apple.com/documentation/corefoundation/1542721-cfstringgetcstring?language=objc
16. Hopper
https://www.hopperapp.com/