Yang20 MSC
Yang20 MSC
Yang20 MSC
Reverse Engineering of an
Obfuscated Binary
Kaishuo Yang
Abstract
Reverse engineering is an important process employed by both attackers
seeking to gain entry to a system as well as the security engineers that protect
it. While there are numerous tools developed for this purpose, they often can be
tedious to use and rely on prior obtained domain knowledge. After examining a
number of contemporary tools, we design and implement a de-noising tool that
reduces the human effort needed to perform reverse engineering. The tool takes
snapshots of a target program’s memory as the user consistently interacts with
it. By comparing changes across multiple sets of snapshots, consistent changes
in memory that could be attributed to the user action are identified. We go on
to demonstrate its use on three Windows applications: Minesweeper, Solitaire
and Notepad++. Through assistance from the de-noising tool, we were able to
discover information such as the location of mines and values of cards in these
two games before they are revealed, and the data structure used for input to
Notepad++.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 History of Reverse Engineering . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Thesis Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
4 Case Studies 35
4.1 Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Minesweeper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.1 Goals of the analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
i
4.2.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 Notepad++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.1 Goals of the analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Solitaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1 Goals of the analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5 Conclusion 51
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
ii
Chapter 1
Introduction
1.1 Motivation
As modern software has become more sophisticated, the potential damage resulting from a
hacker gaining unauthorised access has also increased. In response, developers create better
security mechanisms while hackers find new ways around them, leading to a never ending
game of cat and mouse. The use of reverse engineering is crucial for both the hacker trying
to find weaknesses in program code as well as the developer seeking to understand how an
attack is executed. This interesting dynamic can be easily seen in a variety of fields from
malware analysis to cheating in video games.
What this means is that reverse engineers generally aim to analyse a target program, under-
stand how each program component works and then potentially create their own version,
incorporating at least some features or functionality from the original program.
1
testing is often the primary method by which specific domain knowledge is obtained. Suc-
cessfully reverse engineering software requires analysis of information flow within the pro-
gram, commonly done either dynamically while the target program runs, or through static
analysis methods involving disassembly or decompilation of the program executable.
Dynamic analysis involves analysing the program while it is running. This is often
achieved by providing the target program with a controlled set of input data and then
observing how this data is used and possibly transformed by the program. This must be
repeated many times with different data sets in order to achieve a greater understanding of
how the target program works. Both creating the test cases and running them are labour
intensive as test frameworks, often designed for use with source code, cannot be used with
compiled programs
Static analysis involves disassembling the target program and then analysing the assem-
bly code produced to determine how it works. Even with modern disassembly tools it is
still difficult to fully understand a program’s function through disassembly because infor-
mation is lost during the compilation process as source code is turned into machine code
[25]. Important details that often cannot be retrieved from disassembly include:
All of the above need to be reproduced through human intuition and guesswork and as
a result static analysis can be frustrating and tedious for the human engineer.
1.3 Contributions
This thesis makes contributions towards the creation of a noise reduction system used in
reverse engineering. Through a literature review we have identified several shortcomings
in current reverse engineering tools. Based on this analysis, we then design and implement
a prototype tool that meets these gaps. Finally, we evaluated the effectiveness of our tool
for reverse engineering several common Windows applications. Specifically, the main con-
tributions of this thesis are:
• Literature review of reverse engineering methods and tools. After analysing a vari-
ety of contemporary tools, we identified several shortcomings that caused inefficiency
when used to reverse engineer programs:
• Development of a new tool that meets these shortcomings. Having identified prob-
lems with existing tools that complicate the reverse engineering process, we then set
out to design and implement a prototype tool to meet these gaps. The key features of
our tool are:
2
– Usage of intersecting snapshots to enable accurate analysis of program memory
to attribute changes in memory to user actions. The tool first obtained sets of
memory snapshots before and after a user action is performed. By using a process
of intersecting sets, locations that changed consistently are isolated, allowing the
user to accurately see the changes
– Analysis based purely on user actions without need for the user to have prior
knowledge of the system. As it is designed to be snapshot program data while
a user interacts with the program normally, it avoids the domain knowledge re-
quirement set by other tools
– A storage system for memory snapshots that saves snapshot sets with the action
performed to achieve them. This allows future analysis of any experiment data
without loss of information.
• Chapter 4 demonstrates the use of our denoising tool on a number of case studies to
illustrate how a typical reverse engineer would make use of this tool to achieve their
goals
3
4
Chapter 2
5
malware difficult as antivirus programs must first decompress the binary without acciden-
tally executing it before analysis is possible. Other forms of malware obfuscation include
”crypters” that simply encrypt malware without compression, making it difficult for an-
tivirus programs to decrypt and analyse the program. Finally ”protectors”, more commonly
known as ”packers”[67] combine features from both compressors and crypters to create a
encrypted and compressed program that is difficult for other programs, such as antivirus,
to fully analyse. On the other end, ”unpacking” refers to the process by which a packed pro-
gram is decompressed and decrypted to allow effective analysis. As malware has become
more sophisticated [15], new methods to decrypt and statically analyse these programs have
also appeared to counter the rising threat [37]. A number of these examples are discussed
below.
Eureka is a framework designed to assist with static analysis of packed malware by de-
obfuscating the target program and executing it in a controlled virtual environment [59].
During execution, Eureka uses a special kernal driver to detect programs unpacking and
executing its malicious payload. The detection is done by detecting system calls used by
common packing programs. For example, if a program begins to Once the malware has
unpacked its payload, Eureka dumps the process to disk and uses IDA Pro to disassemble
and analyse the code. This provides a visual representation of the program flow, enabling
specialists to better analyse and understand the malware.
AppSpear, an automatic unpacking system designed to unpack obfuscated Android ap-
plications [68] was developed in response to the rise of malware targeting mobile devices.
It decrypts application at the bytecode level and reassembles the bytecode, thus resulting in
the creation of an unobfuscated binary. This binary can then be examined by program anal-
ysis tools or automated malware detection systems. During testing, both the packed and
unpacked binaries were tested for sensitive behaviours which could indicate the presence
of malware. Out of 31 malware samples tested, there were a total of 208 sensitive behaviours
detected in the packed binaries compared to 2493 in the unpacked versions. This system of
unpacking a binary before analysis was thus shown as effective in correctly detecting mali-
cious behaviour.
A novel way to study packed malware through the extraction and analysis of API calls
was presented by Alazab, Venkatraman and Watters [14]. The group obtained a collec-
tion of malware, unpacked each sample to obtain the payload and then extracted API calls
used from the dissembled binary. By extracting API function calls from known malware,
a database of suspicious API calls often used by malware was assembled, which was then
sorted into six main categories: file searching, file copy and deletion, obtaining file informa-
tion, moving files, reading and writing files, changing file attributes. These details can then
be used as signatures for classifying future programs as malware or normal software.
6
to hinder any efforts at reverse engineering it. As a counter to such behaviour, advances
in reverse engineering led to strategies being developed to discreetly monitor programs in
a sandbox environment. A possible strategy is dynamic instrumentation which refers to
programs on the sandbox’s host machine being able to directly monitor the behaviour of
programs within the sandbox (guest). Dynamic instrumentation is discussed in further de-
tail in Section 2.2.
However, not all dynamic analysis is done in a virtual environment [62]. For example,
anti-virus programs must run in realtime on the users computer, analysing and preventing
suspected malware activity. Malware authors often use methods to obfuscate their code so
that the malware is not detected by traditional syntactic anti-virus systems. This is problem-
atic as specialists need to wait for each new executable or variation of the malware to attack
a number of machines in order to identify a file signature. As a result, it is beneficial to ex-
amine a program’s behaviour rather than code signature to determine whether it is malware
or not. A number of ways to dynamically analyse obfuscated malware have been devised,
such as:
• Stealthily debugging malware by breaking its code stream into chunks and debugging
each chunk
One way of identifying malware presence in real time is the identification and detection
of system opcodes likely to be used by such programs [55]. This is similar in concept to
the Warden program discussed in section 2.5.5 using the presence of certain assembly in-
structions (JMP, INT3) to detect unauthorised programs accessing a game’s memory region.
By looking at the frequency different opcode sequences appear in both a collection of mal-
ware and benign programs, we can calculate how likely it is for a given opcode sequence
to appear in a malicious program. During experimental testing done by researchers at the
University of Deusto, opcode sequences of length 1 and 2 were used to compare identified
malware with both their variants as well as benign software. It was found 60% of malware
variants exhibited 90% similarity in these opcode sequences compared to their original ver-
sions. On the other hand, 85% of benign software exhibited 60% or less similarity in opcode
sequences when compared to the original malware. This shows that it is possible to identify
malware at runtime based on the opcodes they use.
Cobra is another framework designed to enable stealthy dynamic analysis of programs
and identify malware as they are executed [63]. It implements a concept of ”stealth localised
execution” where a program’s code stream is broken up into several groups (blocks) of in-
structions at runtime. Specific code constructs, such as debugging commands, can be added
to each block in order to allow analysis of the code. These blocks are then executed in order,
one at a time, to mimic normal program execution while enabling program analysis without
alerting the target program of the presence of the debugger. As a result, malware can be
successfully studied without triggering their self destruct functionality.
Another common tool used for dynamic analysis is the Microsoft Process Explorer tool
which monitors actions performed by active programs in real time [54]. It identifies re-
sources that are being used by a process in order to track down problems with a program or
identify a program as malware if it is accessing unexpected resources. This tool uses Win-
dows kernel-mode functions and is able to divulge information not normally available to
userland debuggers, such as which threads are using the CPU.
7
2.2 Dynamic instrumentation
Dynamic instrumentation refers to the ability to monitor a program’s behaviour and per-
formance while it is running [51]. Tools used for dynamic instrumentation usually aim to
support important features such as step-by-step program execution, debugging [70], perfor-
mance analysis [38], and data and event logging [47]. These tools are often used by security
researchers to monitor malware execution in controlled environments and understand its
mechanisms in order to develop anti-malware tools.
In order to hide their presence, many dynamic instrumentation tools attempt to be in-
visible to the application being debugged, i.e. applications being monitored can only ”see”
operating system components such as main memory and cache without being able to detect
any activity associated with dynamic instrumentation [51].
DynamoRIO [18] is a tool used in the industry for dynamic instrumentation of applica-
tions. It virtualises processes by copying the application’s code into a cache then executing
instructions from the cache along with additional instructions generated for dynamic in-
strumentation. Each memory address that is accessed by the application is either an original
application address, or an instruction cache address that has been translated to an original
application address. Furthermore, any indirect branch targets are automatically converted
from application addresses to addresses in the instruction cache. This results in the dynamic
instrumentation becoming very transparent, and makes it difficult for the application to de-
tect that it is being monitored.
MALT is a newer approach used to debug stealthy malware that operates at system-
level (”ring 0”) which can evade most virtualised and bare-metal debugging attempts [69].
Instead of virtualising the malware as DynamoRIO does, MALT executes at the System Man-
agement Mode (SMM also known as ”ring -2”). SMM is a mode present on x86 CPUs de-
signed to be used by firmware to control very low level features such as power or system
hardware functions. Executing a debugger at SMM level grants it high privileges, includ-
ing being able to suspend even the computer’s operating system. As a result, MALT is
able to debug the malware without displaying artefacts that alert the malware of its pres-
ence. Testing of MALT showed that is it invisible to many popular anti-debugging and anti-
virtualisation strategies while it is also able to implement common debugging tools such as
breakpoints and step-by-step program execution.
ANUBIS [61] is a framework of several tools for dynamic code analysis of unknown pro-
grams. It was created to automate analysis of malware as there were too many samples to
be completely analysed by humans. It works by first running the target unknown binary
in an emulated PC environment and then monitoring its behaviour. Because it is done in a
controlled virtual machine, it is difficult for the malware to detect the presence of a monitor-
ing tool that has full control over its virtual operating environment. Actions such as system
calls and Windows API functions are tracked and then a detailed report is generated. Based
on this report, experts can decide whether or not the target binary requires further detailed
analysis.
8
Figure 2.1: Visualised stack frame during a buffer overflow attack. By writing to the int[]
buffer beyond its capacity, an attacker can overwrite the return address to alter the intended
program flow. During this process, both int p and the frame pointer are also overwritten.
XXXXX represents junk data bytes inserted into the int[] buffer in order to get to the return
address.
commands at the same privilege level of the targeted, vulnerable program [36]. As a result,
detecting buffer overflow is important as it is an example of detecting possible intrusions
made by an attacker. For example, consider the code fragment below:
f ( int p ){
int [] buffer
int x
...
During execution as shown in Fig 2.1, an attacker has overwritten the function’s return
address by overflowing the int[] buffer. This changes the intended program flow and it is
often difficult to detect the presence of such an attack.
Due to the risks associated with such attacks, detecting and preventing buffer overflow
vulnerabilities is a commonly studied topic in software security. A wide range of defensive
mechanisms have been developed, such as runtime analysis to detect invalid malloc and free
calls, static analysis to determine the safety of each pointer used [71], and finally the use of
dummy values to detect unwanted memory overwriting. The use of dummy (canary) values
to detect overflow, implemented in the ”StackGuard” patch for the gcc compiler [2], has also
been used commercially by Blizzard to detect memory access in World of Warcraft. When a
program is compiled, StackGuard adds an otherwise useless ’canary’ memory block before a
function’s return address. Consider the code fragment above, but this time StackGuard has
been used during compilation which implants canary values in the program memory space
and maintains a record of these canary values elsewhere. An attacker attempts a buffer
overflow attack as shown in Fig 2.2, again overflowing the int[] buffer in order to corrupt
the return address. However this time, they have to overwrite the canary value before the
return address. The program is able to detect a change in the canary value thanks and self
terminates to prevent the attack.
In games such as ”World of Warcraft”, the game security system uses a more advanced
type of canary value known commonly as ”trapped memory” segments. These regions of
memory appear to contain relevant game information, such as player location data, but are
actually just non functional and often unencrypted copies of the real data. Any attempts to
modify these ”trapped” regions of memory results in the detection of unauthorised access
due to the mismatch between the data stored in the ”trapped” region and the memory ad-
9
Figure 2.2: Visualised stack frame during a buffer overflow attack with StackGuard protec-
tion. The attacker has, just like in Figure 2.1, attempted to attack the program by overflowing
int[] buffer in order to alter the return address. However this time, they have also over-
written the ”canary value” provided by StackGuard. As the program maintains a record of
expected canary values and locations, it notices that this canary value has changed and thus
self terminates to stop the attack.
dresses containing the real game data. Attempts in evading such buffer overflow detection
techniques have included identifying and saving canary values prior to an attack, and re-
instating them after the return address has been overwritten. However, due to the recent
heavy obfuscation of the such security systems, identifying trapped memory is difficult and
as a result similar strategies used in the past to counter this protection are no longer effective.
In addition to the approach of using canary values to detect buffer overflow, there have
also been ways to protect against overflow attacks without modifying the source code. Both
CCured [44] and a later development, Softbound [43], aim to detect overflow attacks that
modify the expected pointer values of applications. At compile time, pointer base and
bound metadata are stored in a separate storage system and runtime checks inserted into
code. These checks cause the storage to be checked upon load and store of pointer values
to ensure there has been no unexpected modification. During testing all forms of spatial
violations were detected showing that this system of checking pointer values is an effective
defence against buffer overflow attacks. However, the overhead from using such a system
was also significant at up to 127% with Softbound and 83% with CCured.
10
Figure 2.3: IDA Pro Control Flow Graph resulting from dissembling a function containing
multiple if statements [50]
in the disassembly process [60]. Interactive disassemblers, as their name suggests, actively
interact with the machine code while disassembling it. This means that they make use of
dynamic analysis tools such as debuggers alongside the traditional disassembly methods to
actively connect related code segments within the program [3], resulting in a graphical view
of the code’s structure. A control-flow graph, such as the one shown in Figure 2.3, may be
generated during this process can be used to aid in further understanding of the program’s
components.
On the other hand, a decompiler seeks to generate high level (such as C or Java) code
based on a program’s binary or assembly code. Even for a single language, there are many
different versions of programs that could result in the same, or very similar, assembly code.
For example, switch, nested if and multiple jump statements could all be functionally iden-
tical and result in a series of JMP assembly instructions. As a result, a decompiler can only
make an educated guess based on the control-flow analysis of the program when it is pro-
ducing output and any code produced will only be a possible version of the program with-
11
out any guarantees of resemblance to the real source code. Furthermore, obfuscation such
as the stripping of symbols means that it is difficult to identify variables and function names
as well as establish meaningful connections between them [26].
Ollydbg is a x86 debugger often used for reverse engineering of programs [56] with a
built-in disassembly and assembly system. It is popular amongst both engineers seeking
to verify correct operation of their own software as well as hackers looking to crack pro-
grams made by other developers. Ollydbg traces most program actions such as system and
API calls as well as provides a user friendly interface for working with program memory.
By providing these features, a reverse engineer is able to disassemble a piece of software,
modify it then reassemble it.
Another approach to reverse engineering includes the use of binary analysis tools to ex-
amine code at a binary level. By investigating features such as data types and control paths
in a binary, it is possible to gain insight into the otherwise hard to analyse workings of a
program without having access to its source code [41]. This is not an easy task as a vari-
ety of factors make binary code complex and difficult to understand. These include both
incidental causes such as compiler settings and runtime libraries, as well as deliberate mea-
sures taken to obfuscate binary code. For example, a tool called Zipr [29] rewrites binaries
efficiently to create an obfuscated binary whose CPU and memory footprint is less than 5%
more than the original [31]. This results in a program that is functionally identical to the
original, but has vastly different binary code, making it harder to perform binary analysis.
As a result, a number of innovative ways to perform binary analysis have been created.
Examples of newer binary analysis tools able to defeat obfuscation include BINSEC/SE
and BinGold. BINSEC/SE [21] is a new dynamic symbolic execution engine that enables
modular analysis of target binaries. It is able to perform functions such as obtain func-
tion parameters and return values from a binary and worked well in analysing and reverse
engineering malware as well as deobfuscate binaries. While different in implementation,
BinGold [30] is another binary analysis tool that defeats obfuscation by examining both data
and control flow to find similarities between binaries. Because data and control flow are
generally unaffected by light obfuscation or refactoring, BinGold performs better at tasks
such as identifying program authorship and presence of copied components.
While most binary analysis tools are platform dependant, there has also been attempts
to develop tools to support multi-platform analysis. For example, REV.NG [27] is a binary
analysis framework based on QEMU that allows the user to recover CFGs and function
boundaries from binaries built for a variety of system architectures. QEMU is a machine
emulator that uses dynamic binary translation to parse binary code from any of the sup-
ported architectures into a custom intermediate representation (IR). By performing binary
analysis on this custom IR, REV.NG was able to extract CFGs and function boundaries from
a number of different binaries.
As opposed to most previous tools focusing on optimising the back-end analysis, B2R2
[34] is a binary analysis framework focusing on an efficient front-end design. The front-end
consists of the disassembler and lifter which had been ignored by researchers in the past who
optimised analysing the intermediate representation that is produced by the front-end. By
using strategies such as parallelism in its analysis, B2R2 was able to perform binary analysis
with an order of magnitude improvement in efficiency.
12
them. Common examples of these modifications include
• ”Cheats” that enable players to gain an unfair advantage over others by gaining access
to additional information or actions not normally possible in games
We will examine the development of such modifications in the game World of Warcraft
to exemplify the reverse engineering tools and techniques used.
13
Figure 2.4: An example of navigation mesh based pathfinding. The black path is one that
a bot (circle) is likely to take within the navigation mesh(red) when collecting from 4 herb
nodes (green) available.
Gathering bots
Gathering bots gather resources from nodes such as herb bushes or ore veins in the game
world. Because each node disappears after being harvested from and is re-generated in a
different location, gathering bots must make use of a pathfinding system to travel between
nodes. Nodes may appear randomly in a number of fixed locations and as a result a useful
navigation mesh will encompass all known node locations. Bots can then use this navigation
mesh to travel from one node to the next, as shown in Figure 2.4.
When a bot reaches the target, it must interact with the node to harvest resources. This
interaction is the equivalent of a human player performing a single click on the node, at
which point the resources within are deposited into the player’s inventory and the node
disappears. After harvesting a node, the bot will then move to the next nearby node and
repeat the process.
Combat bots
Combat bots traverse the game world and kill enemies that they encounter which is de-
sirable as players are rewarded with items and experience points for defeating enemies.
Because monsters disappear when killed and then reappear in a different location after a
short time, combat bots have to be capable of movement.
Combat bots can operate on a navigation mesh system similar to gathering bots, but
some also function using direct movement between objectives. Bots using direct movement
often rely on automatic use of in-game functions such as ”target creature with specified
name” and ”move to target” which are provided by the developers for legal use by human
players through ingame actions. As a result, bots using these functions have more similar
movement characteristics to humans compared to bots using a navigation mesh system.
For example, consider the scenario in Fig 2.5 where a player character (black circle) must
kill the four monsters (black pigs). There is a distinct difference in pathfinding behaviour
between a navigation mesh based bot (red path) and a direct movement bot (green path).
This is detectable using techniques such as the lowest common prefix algorithm [42] to dis-
14
Figure 2.5: A comparison in pathfinding behaviour between combat bots using navigation
mesh (red) and direct movement (green)
tinguish between human and waypoint based bots and will be discussed in detail in Section
2.5.5.
15
dynamically linked libraries (DLLs), methods primarily used by bots to receive information
from and send instructions to the game. Its memory scanning capability was once able to
scan all memory addresses and processes on the user’s computer but also lead to signifi-
cant backlash and accusations of spyware [65] [40]. Furthermore, some programs have a
legitimate need to monitor the game and while more popular ones such as Windows system
drivers and Fraps (a screen recording software) are whitelisted, Warden has falsely flagged
and banned others.
16
The longest common prefix (LCP) for each character is recorded by calculating the longest
repeated subpath a character has used. Because bots travel using a navigation mesh to
move in straight lines between predefined waypoints, over a long running time they will
inevitably start to repeat previously used path segments. Humans on the other hand do
not have a defined navigation mesh and are unlikely to repeat exact movement patterns.
A study done by researchers at the Technical University Vienna in 2009 showed that bots’
average path segment repetitions and LCP tended to scale linearly with run time while that
of humans stayed at a constant low rate [42] [48]. After around 10 to 45 minutes, there was
a clear difference in LCP values between humans and bots.
17
Figure 2.6: A simplified example of the WoW memory space.
a result both the major bot developers have shut down in response [8] [1].
• Read player character data by accessing memory at ”WoW base address → character
data offset”. As this location is 0x08 bytes offset from the WoW base address, we can
use a pointer reference to access the character data from the base address.
18
2.5.8 Sending commands
After a bot successfully accesses the game’s memory space and obtains the necessary game
information, it must then issue commands to the game in order to control the character.
Movement and interaction are performed by modifying values or calling functions from
memory.
Human players move their characters within the game world through either the use of
keyboard keys to issue relative movement actions (such as ”move left”), or by right clicking
on locations in the game world to issue a command for the character to move there (”click
to move”). Relative movements require taking into account factors such as character facing
direction, camera angle, relative distance to target, etc, while click to move commands only
rely on the target’s absolute coordinate. Therefore it is much easier for a bot to use the click
to move system. When a player uses the click to move command, the following set of events
occurs:
• Game begins moving player character from current location to target in a straight line
• When player character reaches the target, game clears ’click to move target’ variable
The bot, through direct memory access, can write the target coordinates straight into the
games ’click to move target’ memory address and begin moving the character.
When the bot detects that the player character is close to a resource node, it will attempt
to gather from the node. In a normal human player’s operation, the program flow is as
follows:
• Resources from the node are deposited into the player’s bag
Similar to writing data to memory, external programs are also able to call functions from
memory if they know the base address of the function call when the program is loaded
into memory. Thus, the bot is able to simply call the BeginGathering() function when it is
near a node and gather resources without having to perform clicking. Although functions
in WoW were loaded into the program memory at runtime, their addresses could be found
by disassembling the game executable and statically analysing program flow by deducing
their relation to static variables. For example, if the memory location of the player location
variable was already obtained, one could obtain functions related to player movement by
finding any functions that modified this memory location.
19
20
Chapter 3
3.1 Introduction
In this chapter we explore the design and implementation of the memory tool at the centre
of this thesis. The memory tool’s purpose is to simplify the process of reverse engineering
the organisation of data in target programs and we will examine its use in some case studies
later in Chapter 4.
First we will consider why the reverse engineering process needs simplification. Re-
verse engineering of unknown programs is slow because we often must manually discover
required memory locations (also known as ’offsets’) in an iterative fashion. For example if
we wanted to know how to automatically move a player character in World of Warcraft,
the memory address holding the ”player location value” is of interest. Experimental testing
has shown that many games, including World of Warcraft, uses a fixed memory allocation
system [33] as previously shown in Figure 2.6.
In order to find this address, we might complete the following steps:
• 1. Move player to known location in game world, e.g. position with coordinates X =
1200, Y = 730, Z = 65
• 2. Iterate over the game’s memory space and create a list of all memory locations
containing the numerical value ”1200”
• 3. Move player to another known location in the game world, e.g. position with coor-
dinates X = 1300, Y = 730, Z = 65
• 4. Intersect the lists created in steps 2 and 3, thus creating a list of memory locations
whose values have changed from 1200 to 1300.
Steps 3 and 4 must be repeated many times in order to narrow down the address to a
single location for each value we are interested in. In the example of automating player
character movement, this means we have to repeat the entire process for the X, Y and Z
coordinates. As one can imagine, this takes a significant amount of time and effort to build
a list of memory locations in larger reverse engineering projects. Therefore our goal is to
create a tool that assists in automating parts of the reverse engineering process. We used a
simple Windows game, Minesweeper, to test this tool.
3.1.1 Design
The tool consists of memory dumper and dump analysis modules that work together to
analyse a target program’s operation by automating the analysis described previously. The
21
Figure 3.1: System architecture diagram of the tool showing the important components.
The memory dumper saves the target’s memory to disk. The analyser takes the snapshots
created by the dumper and compares them to determine important locations in memory
user interacts with the target program in a consistent manner and the memory dumper saves
the target’s memory data to disk as a series of snapshots. The analyser then reads memory
files and compares them to locate differences. This process is shown in Fig 3.1.
After the memory dumper has created a number of snapshots with consistent changes
between them, we can then use the analyser. The analyser detects differences between snap-
shots and given enough snapshots, it will be able to narrow down the locations in memory
which have consistently changed and mark them as important.
Key aspects of the design were a focus on tool efficiency, effectiveness and usability.
We used an iterative approach to optimise the tool in all three areas over the course of its
development. Due to the number and size of snapshots, iterating over each pair of snapshots
to find differences is computationally intensive. As a result, we had to optimise our tool to
complete this process efficiently to be able to perform a meaningful amount of testing.
In addition to efficiency, we had to ensure the tool was effective by taking measures to
eliminate false positives (noise) that cluttered the results. To achieve this, we implemented
an effective algorithm to detect and isolate memory regions that consistently changed as a
result of consistent user input. This filtered out the noise and thus made the analysis more
effective.
Finally, in order to be easily usable, we designed the tool’s output to be easy for the user
to understand. Initial versions used a visual representation of the target program’s memory
which proved to be hard for the user to read. By the final version, the output had evolved to
become a simple text output showing the address and data of important memory regions.
22
results in winning the level, while an incorrect use of logic results in a player exploding one
of the mines, immediately losing the level and having to begin the entire level again. There
are many possible goals for reverse engineering this game, such as:
• Finding squares with hidden mines easily
• Marking and unmarking mines automatically without user input
The primary goal of reverse engineering this game is to assist the user in determining
the relationship between an action and the memory space it affects. One approach involves
taking a snapshot of the target program, performing an action that changes the program
state and then taking another snapshot. The program will then take the difference of the
two snapshots to find memory locations that changed after the action was performed. We
have four assumptions heading into the experiment:
• A user action causing a change in program state must also cause a change in memory.
These changes in memory are useful to the engineer.
• Not every change in memory must be caused by a user action. These changes are not
useful and are considered noise.
• Statically allocated memory locations for a program will remain at a consistent and
fixed location between runs. This assumption is made because programs allocate
memory at runtime and generally make no attempts to randomise memory locations.
This is because randomising memory locations involves also updating pointer refer-
ences to each one and is too costly to be beneficial. While certain exceptions exist, such
as Denuvo anti-tamper software which repeatedly encrypt and decrypt programs at
runtime, they are rare and we will not be encountering them in this experiment.
• Dynamically allocated memory may remain at a consistent and fixed location under
some special circumstances particularly in cases when memory is allocated in exactly
the same order on every run. While malloc and dealloc operations may be determinis-
tic to a degree [57] [53], this is likely to not be the case in concurrent applications [46] or
those involving unpredictable user input. In such cases, the order in which memory,
and as a result the location of of certain memory data, is likely to change.
An example of such cases would be changes in the game state of minesweeper as shown
in Figures 3.2 and 3.3. In Figure 3.2, we can see an example of the first case where the user
has right clicked on the circled tile and caused it to be marked with a flag. In the game’s
memory space, a value corresponding to the state of that tile has changed. Finding the spe-
cific locations of this value in memory is important for finalising the relationship between
the action (”right click on the tile”) and memory location affected. On the other hand, con-
sider Figure 3.3 where there has been no user action, yet the timer value still changes. This
is noise because regardless of what user action was performed, the memory location storing
the timer value will always change. It is not very useful to identify this location as a result
of user actions despite it changing every time we perform an action.
Because program state and memory data can change without external output, we must
take many snapshots in order to filter out such noise and leave us with only the relevant
memory data. Therefore the tool must take snapshots at regular intervals while the user
performs actions in a consistent way.
Furthermore, because consistent, unrelated events such as timers will cause noise, we
also must use certain noise reduction techniques to filter this form of noise out over the
course of several experiments. Our main strategy was comparing sets of experiment runs
based on the same user action until there is a clear set of consistent memory changes.
23
Figure 3.2: An example where an outside action causing a change in program state (the
circled tile has been flagged) causes a change in program memory.
Figure 3.3: An example where no outside actions have occurred, yet there is still a change in
program state (the timer clock value has increased) and program memory.
3.3 Iteration 1
The first prototype was designed to be a helper tool that could find memory locations of
interest and assist in understanding the purpose of these locations. It was written in Java as
a starting point and made extensive use of the JNA (Java Native Access) library. Java runs
inside a Java Virtual Machine (JVM) and normally cannot access low level information such
as memory data of other processes as they lie outside the scope of its own JVM instance.
JNA essentially functions as a wrapper that allows the user to write applications in pure
Java and make use of native Windows functions without having to directly call Windows
API functions. The JNA library handles calling the equivalent Windows API functions when
necessary which means the user can avoid using C, via the Java Native Interface, within
their Java application. Because our experiment relied on reading and writing to the memory
space of other programs, JNA was essential for providing us access to Windows API calls
that can fetch this information. For example, two Java methods provided by JNA which
24
Figure 3.4: A simple example of a user interacting with the minesweeper game while the
tool takes snapshots. The red line indicating the boundary of the ’exposed’ region has been
added for comparison to information below.
The Java tool simply took snapshots of the target program memory at regular intervals
of 5 seconds over a minute period making a total of 12 snapshots per run while the user
interacts with the target program in a consistent, repeated manner. It was believed that such
consistent interactions would generate consistent results which then could be analysed later
to identify changes. For example, if the user repeatedly flagged and unflagged a single tile
in the minesweeper game, then in theory only the memory location corresponding to that
square would change. After snapshots were taken, the tool would then perform an inter-
section operation over all snapshots and output an image file representing changes detected
where each pixel represented a single byte of memory. Shading of pixels represented the
number of intersections where the value of that pixel changed, with darker pixels signify-
ing more changes. This was useful as a preliminary experiment because we were unsure
about the magnitude of memory changes resulting from a single action and thus wanted a
visual representation that provided the user with an overview of the number and location
of changes at a glance. The user could then use this information in conjunction with other
tools(such as Ollydbg) to analyse the actual data changes.
A simple example of using the tool on minesweeper is shown in Figure 3.4. The user
repeatedly flags and unflags the tile in sync with the tool taking snapshots at set intervals.
Because the tool was originally calibrated to perform one snapshot every 5 seconds, this
meant that the user had to ensure they performed the correct action between snapshots or
risk the experiment being contaminated by user errors. For example, consider Figure 3.5
which demonstrates three different possible scenarios encountered by a user attempting to
find information regarding a certain tile, ”X”, within Minesweeper by repeatedly flagging
and unflagging this tile while the Java tool takes snapshots:
• In attempt 1, the user was too slow to perform the flag action that was expected be-
tween 40-45 seconds into starting the program. As a result, between snapshot 8 and 9,
25
Figure 3.5: Examples of correct (3) and incorrect (1 and 2) operation of the basic Java tool.
Figure 3.6: An expanded and annotated section of the final output displayed by the program
where darker pixels represent locations in memory that have changed more often and lighter
pixels represent locations that have changed less often.
there is no change to tile X’s value when the program expects one. Therefore the ’true’
memory location of tile X would be excluded from the experiment results as it did not
change when the program expected such a change.
• In attempt 2, the user was too quick to perform the flag action before snapshot 6 was
taken. As a result snapshot 6 was taken with tile X in a ’flag’ state, same as snapshot
5. This results in the same issues with experiment results as attempt 1.
• Finally in attempt 3, the program and user behaved as expected, with exactly one
action performed between each snapshot.
While the tool required precise timing from the user to avoid problems, it achieved the
desired function of helping with reverse engineering when used correctly. An example of
the output is displayed in Figure 3.6 where darkest pixels represent memory locations ex-
hibiting the most changes. By identifying the regions with significant numbers of changes,
I was able to then inspect the values at these locations with tools such as Ollydbg while the
program ran to understand how and why they changed. Without the tool narrowing the
search region down, it would have required significantly more work to manually browser
the entire memory space instead of just the small areas that changed.
Figure 3.7 shows a small section of the game’s memory changes during this process. The
circled value alternating between 8F and 8E represents the ’flagged’ tile and is the key in-
formation we wish to discover. The memory addresses corresponding to the area of change
were found and examined in Ollydbg to view the specific contents at those locations. It was
found that the memory addresses belonging to some of the darkest pixels correlated with
the region in memory where the game board is stored. In other regions of the image output
there existed noise as there are many other regions of dark pixels which are memory loca-
tions which changed, but not directly as a result of user action. For example, the variable
containing time since game start continues to change as we perform the experiment but is
26
Figure 3.7: A visualisation of the data structure (array) containing the game board in mem-
ory. Note that the green region represents the ’exposed’ region and that states 1 and 3 are
identical. The ’flagged’ tile has changed from value 8F to 8E and back to 8F.
27
unrelated to user input. As a result, the memory region containing the clock variable as well
as any graphical assets used to render the clock will continue to change and be detected as
a dark region in the image output even though it is not very useful for the experimental
purpose of locating the memory region modified by a user clicking on the game board.
The tool mostly achieved its purpose and helped to clarify the data structure for certain
programs such as minesweeper and confirmed the presence of a fixed region in memory
containing the game board. While it did suffer from noise by locating regions that were not
interesting, the key regions were properly displayed as dark areas in the program output. In
addition, a consistently changing memory location corresponding to the tile being clicked
on was located by analysing the image output. Locations of most change (darkest regions)
were identified and the memory locations were opened in Ollydbg for manual analysis. As
a result, the value of a single memory location changing between values of ’8F’ (no flag) and
’8E’ (flag) was attributed to the user clicking on the tile and changing its state.
However the prototype could not be used as a standalone tool and was often confusing
due to significant noise in the output image. In addition due to a lack of optimisation, long
run times of up to several hours ensured that only a limited number of tests could be run on
a given day. Due to these issues it was clear a better solution was required.
3.4 Iteration 2
The first iteration was a good starting point that correctly identified locations where data
changed and supported further analysis of programs. However, two major problems with
the execution and output were identified and improved upon in our second iteration:
• Usability of the output data
• Program execution speed
3.4.1 Usability
The program output shown in Figure 3.6 was difficult to decipher as it lacked detailed in-
formation such as how values in memory were changing. As a result, the program by itself
was not able to easily connect a user action with a memory location and as a result overall
program usability suffered. The concept of a visual representation was useful to identify
rough areas of change in memory but was not specific about the exact location or values of
change. As a result, while it was successful for visualising the ’big picture’ in a preliminary
experiment, it was not particularly useful for obtaining the exact data needed for further
tests. Furthermore, because large amounts of program memory did not change, the user
must look over large amounts of white space to identify useful sections. This would prove
to be further problematic with larger programs as image sizes quickly became unmanage-
able sizes when even a relatively small program using 10MB of memory generated a large
image file of approximately 2000x5400 pixels.
In order to avoid these issues, the previous image based output was abandoned and in-
stead the location and values of memory changes were displayed directly. A sample output
of this is shown in Figure 3.8. By changing the output to display numerical values at each
memory location, the program was also able to now be used independently of other pro-
grams to determine patterns of change in memory. While this approach appeared to run
into the original problem of requiring the user to pore over large lists of meaningless data
including noise, this was in fact not the case as relatively few locations changed. Finally, by
having access to exact memory locations and values, it provided a good foundation for the
final version of the program.
28
Figure 3.8: A sample of the tool output showing how changes in value of memory locations
are displayed.
29
User action No user action
Consistent change Useful Not useful
Random change Not useful Not useful
No change Not useful Indeterminate
Table 3.1: A summary of possible user action and memory change combinations and
whether the result is helpful in determining the user action to be the cause of the memory
change.
enough to C# that it could be ported without much major changes. By using C#, we had ac-
cess directly to the low level Windows functions previously accessible only through the JNA
wrapper and was no longer slowed by the internal operations of the JVM. This further sped
up execution to around 3 to 5 minutes per experiment and also had the additional benefit
that C based Windows functions were much better documented than their JNA counter-
parts, leading to easier coding and debugging.
3.5 Iteration 3
While the upgrades added in iteration 2 significantly improved both the program runtime
and readability of the final result, it still suffered from noise due to memory values in tar-
get programs changing because of reasons other than user input. Noise included memory
locations such as the ones keeping track of ’time elapsed’ as these would change every time
a snapshot was made even though user input is not what causes a change in time elapsed.
Because there was noise in the output, it was difficult to determine exactly what changes
were caused by the user. In addition, the analysis process was still reliant on user timing
and manual interpretation of the results. As a result additional improvements were needed
to filter out this noise and make the analysis process more streamlined.
1. User performs an action and a memory region changes consistently as a result of the
action. This is the key useful case as it immediately tells us that region in memory
is linked to the action we performed. An example of this is user flagging a tile in
minesweeper, and we detect a change in the tile’s memory state.
2. No user action and a memory region changes consistently. This is observed to occur
in memory regions that have no relation to a user action. For example, the memory
location containing a timer value changes as a user idles.
3. User performs an action and a memory region changes inconsistently. We cannot di-
rectly establish a relationship between the user action and memory region
30
Example Location Experiment 1 Experiment 1 Experiment 2 Experiment 2
Snapshot 1 Snapshot 2 Snapshot 1 Snapshot 2
1 0x00034000 8E 8F 8E 8F
2 0x00034040 0F 1F 1F 2F
3 0x00034080 0F 1F 2F 2F
5. User performs an action and a memory region is not changed. Not useful as clearly
the memory region is not related to the user action being performed
6. No user action and a memory region remains unchanged. The memory region in ques-
tion is of indeterminate usefulness and more information is needed to assess its rela-
tion to user actions.
For the purposes of establishing direct relationships between user actions and affected
memory regions, cases which fall into category 1 are of the most interest to us. However
in preliminary experiments, categories 2, 3 and 4 are also included in results due to the
the program recording any memory region that experienced a change, without the ability
to distinguish between consistent and random changes. Furthermore, because the experi-
ments lacked a phase where no user input was expected, it was not possible to distinguish
between changes caused by user actions(categories 1,3) and changes that would have oc-
curred regardless of user actions(categories 2,4). The third iteration would need to solve
these shortcomings by using additional noise reduction strategies.
To solve these problems, we improved the strategy of intersecting sets of snapshots by
looking at not just what memory locations changed, but also how values changed between
snapshots. In addition, we further identified regions of memory that consistently changed
after a user action is performed. This resulted in obtaining a definitive set of memory loca-
tions that changed predictably in response to user actions.
This approach involved creating pairs of game state dumps that only differed by one
game action, such as flagging and unflagging of a single minesweeper tile. These pairs of
dumps were then compared and the difference stored on disk. After a number of differences
were recorded, they were compared to differences of other pairs and matching changes be-
tween the differences were recorded. By using this method of intersecting snapshots, only
addresses that changed consistently were identified, i.e. addresses that changed each time
between the two game states, but the change was the same between each test.
For example consider Table 3.2 where three memory locations which are found to have
changed during two experiment runs of the program where the user performed the same
action of marking the same tile in the Minesweeper game board.
While each memory location’s value did change during the experiments, only example 1
displayed a consistent change in both experiment runs given consistent input actions. That
location, 0x01005362, consistently changed between 8E and 8F. Therefore, we can hypothe-
size that this memory location is directly affected by the user action.
31
Figure 3.9: Example of console output of new tool where user dictates when snapshots
are taken. User must input the type of action performed (in this case ”click 1 1”) before
commencing the snapshot process. A snapshot is only taken when the user sends input to
the program by pressing enter.
First of all the user was given more control over the snapshot taking and saving process.
Prior to an experiment run, the user inputs the name and type of the action to be tested for.
The user then interacts with a target program as before, but the tool only took snapshots
when the user desired, i.e. by pressing a key on the tool’s console. The snapshot would then
be automatically saved to disk with the specified action encoded in its name. An example
of a typical program run is shown in Figure 3.9.
While this was a minor change, it gave the user better control over the experiment pro-
cess and allows better snapshot management. The old system saved memory dumps to file
based on a timestamp format, as we can see in Figure 3.10. This worked fine in the initial ex-
periments as memory dumps were not intended to be reused. However as we now wished
to preserve experiment data for future use, a new storage method was required.
This was changed by encoding the user action into the name of saved snapshots. Before
starting the recording process, a user must enter the the name of action being examined,
such as ”click 1 1” when experimenting with the user action of ”click on the tile at row 1
and column 1”. Once the test action was defined, the program can then be commanded
to take snapshots when needed. An example of the program console is shown in Figure
3.9. The resulting snapshots are saved to disk with the experiment details saved in the
filename. The new storage format, shown in Figure 3.11, is more useful as it records the
user action (”click at location 1-1”) as well as experiment number (1) and snapshot numbers
(1-4). Afterwards, the program would then parse both the file data as well as the file name
to establish a connection between the action taken and changes in memory.
Finally, we were able to automate the analysis process further as well as preserve the data
in its original form for future processing due to the improved naming format. Formerly in
memory, the recorded snapshots were stored in a data structure consisting of a collection
of byte arrays, with each array containing a single snapshot of the memory. In the new
implementation, instead of working with simple arrays, we instead have a collection of
MemoryDump data structures as follows:
32
Figure 3.10: Saved memory dumps under old format
By allowing the tool access to information regarding the details of the test, it was able
to automatically connect specific memory locations to actions that affects them. For example,
previously after an experiment the tool displayed "Location 0x00034000 change 8E -> 8F".
With contextual information from the user it was able to more clearly output information
such as: "Click 1 1 causes location 0x00034000 change 8E -> 8F".
Furthermore, these changes in the analysis process in addition to the improved data stor-
age system allows us to store large amounts of snapshots for mass analysis at a later time.
As each snapshot is encoded with the experiment type and numbers in its filename, we can
easily serialise and deserialise snapshots stored on disk back into its in memory represen-
tation. This is important as it greatly increases the level of automation of the program and
potentially allows future analysis methods to be used for data we record today.
33
34
Chapter 4
Case Studies
We will now discuss the process used to evaluate the tool developed. The purpose of the
evaluation was to assess how useful the tool was in assisting a user to perform reverse en-
gineering of a target program. It was hoped that the tool would expose information about
the target programs’ data structures in memory. To assess its usefulness, we tested the op-
eration of our tool on three different programs and recorded the memory locations and data
it reported. Finally we verified the accuracy of this information by examining the recorded
memory locations in target programs and observing how the data changes during program
operation.
• Notepad++ : Text input, moving the input cursor and highlighting a region of text
• Solitaire : Moving a card in the playing field and flipping (revealing) a hidden card
35
After enough iterations had been performed, defined as when the memory tool no longer
saw a change in the number of affected memory locations, the experiment was stopped. The
affected memory locations and data would then be analysed and verified using a third party
tool.
For all experiments, we used a desktop PC running Windows 10 Home Edition 64-bit.
This machine had 32 gigabyte of RAM and a quad core i7 6700k CPU at 4.2GHz. For verifi-
cation, we used OllyDbg v1.10.
4.2 Minesweeper
Minesweeper is a popular puzzle game shipped with the Windows Operating System since
1992 [22] where the player’s goal is to find all hidden mines in a grid of tiles. Players can
right click a tile to mark it with a flag indicating the suspected location of a mine or left click
a tile to reveal it. Successfully revealing all non-mined tiles and marking all mines results
in a victory while revealing a tile, through a left click action, containing a mine results in an
immediate loss. The objective is to win the game by correctly identifying all mines in the
shortest amount of time.
Each tile has a number of distinct states such as:
• A revealed empty tile which shows the number of mines near it. Note a revealed mine
causes the game to end immediately.
• A tile with a flag on it placed by the player showing that they believe there is a mine
in this tile.
The game itself has other functions such as a clock keeping track of the time spent on the
current level as well as a high score system keeping track of the player’s best playthroughs
defined as victories in the shortest amount of time.
• Determine how the different states of tiles are stored in memory, e.g. unrevealed vs
empty vs flagged
We were interested in seeing if it was possible to extract key information from the game
as it would show that the reverse engineering process worked and allows it to be used on
larger programs. For example, if we were able to locate mines before they are revealed in
Minesweeper, another reverse engineer using this process may be able to reveal his oppo-
nent’s units in a larger real time strategy game.
36
Furthermore, we were interested in seeing if our results were consistent over not just
several experiment runs, but also program executions. This was useful to know as it could
provide hints on how data structures are created at runtime.
4.2.2 Method
As Minesweeper is no longer included in Windows 10 we used Winmine [9], a 2001 version
of Minesweeper bundled with Windows XP to perform tests on. To perform the experiment
we ran the Minesweeper game and gave it controlled user actions while snapshots were
taken with the memory tool.
After launching Minesweeper, a typical experiment run involves:
• Use memory tool to take snapshot of game memory with no tiles flagged.
• Use memory tool to take snapshot of game memory with tile flagged.
• Remove the flag that we previously placed, returning the game to its initial state so we
may repeat these steps.
In order to get a good sample size, five locations on the game board area were tested:
• x=1, y=1
• x=1, y=2
• x=1, y=4
• x=2, y=1
• x=4, y=1
The flag and unflag process was repeated for each of these locations five times for a to-
tal of ten snapshots for the memory tool to analyse. This number of experiment runs was
picked as previous preliminary experiments showed that three to four runs were sufficient
to remove noise and isolate affected memory locations. However there were considerations
taken into account that should this be insufficient to isolate the memory locations we would
be able to perform more. As the iteration 3 of the tool was used in these experiments, snap-
shots were taken at user controlled intervals by interacting with the tool’s command line
interface.
For example, when testing the first location listed above, we defined the test as ”click 1 1”
in the tool to communicate that we were testing clicking on the x=1, y=1 location as shown in
Figure 3.9. We then performed our user action of flagging or unflagging the tile and pressed
enter on the tool to save the memory dump. As memory dumps are encoded with both
the experiment run and snapshot number there was no confusion or mixing of experiment
results during analysis. For example the tool can parse the filename ”Dump click 1 1 1 4”
and know that it the result of the 4th snapshot of the 1st experiment run testing the user
action ”click 1 1”.
The tool then analysed the snapshots and returned any memory locations it believes
are connected to the user action performed. Finally, these results were verified by opening
the process in Ollydbg and manually analysing the values of the locations returned by the
memory tool.
37
Figure 4.1: Example result of verifying experiment run in Ollydbg. The tile being tested is
at location 1,1 (top left corner, small red square outline). The game area in memory dump is
shown by the large red square outline
4.2.3 Results
With each experiment run, it was found that ten snapshots were sufficient in isolating the
key memory locations relevant to the user action of flagging a square. Furthermore it was
found that only one memory location in each test was isolated and that memory location’s
value consistently alternated between 8E (flag) and 8F (no flag). The result of observing one
such key location in Ollydbg for verification is shown in Figure 4.1.
The experiment demonstrated that each tile has a distinct location in memory containing
a value for each state the tile could be in. Examples of such values and their meanings are
listed below:
• 8F corresponds to a mine
Furthermore, by locating multiple tiles, we were able to determine that there is a 2D array
structure stored in memory representing the game board. Each tile on a given row of the
board is stored adjacently in 4 bytes of memory, with some unknown bytes separating the
rows. When viewed in Ollydbg, a clear 2D array can be seen which corresponds to the game
board.
For each memory location the results were consistent. Restarting the Minesweeper game
or using a ’fresh’ versus modified board produced no difference in results. Furthermore,
while the values of memory locations changed consistently, their addresses did not. For
example, the ”x=1, y=1” tile always had an address of 0x01005361, ”x=1, y=2” always had
an address of 0x01005362 (1 more than the ”x=1, y=1”), ”x=1, y=4” always had an address of
0x01005364 (3 more than the ”x=1, y=1”), etc. This confirmed our assumption that memory
locations would be fixed.
By combining these findings, we can therefore accomplish our goal of locating mines
before they are revealed by obtaining the location in memory where the game board is stored
38
and then finding any values of 9F in that region. By cross referencing the location of each 9F
value in memory, we can then accurately determine where a mine is.
4.2.4 Conclusion
Through our testing on Minesweeper, it was found that the memory tool was useful in
assisting with reverse engineering as we were successfully able to identify the data structure
of the game and understand how parts of the game worked. As it was able to isolate the key
memory location affected by the user action, we were able to locate and explore the game
board in memory. Finally by using our information gained, we were able to identify hidden
mines and avoid them in the game. This shows that tool is viable for strategically reducing
noise and identifying useful memory locations to help us reverse engineer programs.
4.3 Notepad++
Notepad++ is a free text and source code editor for Windows released in 2003. It features
basic editing and formatting options, resulting in a minimalistic operational footprint on any
user’s system. Basic formatting such as font and text size can be changed in this program
through selecting text and clicking the format tab, but this program is not commonly used
by those who wish to perform complex or extensive text formatting. Users can perform a
variety of file editing tasks such as:
• Changing the position of the text input cursor by either clicking a position in the text
and setting the cursor to that position, or using the arrow keys to move the cursor to a
new position relative from its previous one.
Often text editors make use of various data structures to optimise performance during
text editing tasks such as:
• A Gap Buffer [28] is a dynamic buffer following the input cursor which is maintained
by the text editor. When the user fills up the buffer by typing text or moves the cursor,
the text editor creates a new buffer behind the input cursor by moving all text after the
cursor backwards in memory. This allows the editor to efficiently implement insertion
and deletion from the middle of a text block without having to rearrange the text in
memory every time a user performs these actions.
39
• A Rope is a binary tree where leaves store individual strings and their lengths while
nodes hold the sum of leaf lengths in its left subtree [13]. As a result, each node splits
the overall string in two, with the left subtree holding the first part and right subtree
holding the later half. Input is handled by splitting the existing tree at the location
of the cursor, adding the new string to one of the subtrees then concatenating the
resulting subtrees.
• A Piece Table is a data structure that tracks all insertions and deletions made to a docu-
ment relative to its original version [20]. As both the original text and subsequent edits
are saved in immutable blocks, no information is lost through editing. This allows the
text editor to easily implement the undo functionality.
It was hoped that by analysing Notepad++’s memory changes we can discover which, if
any, of these data structures are used in Notepad++.
4.3.2 Method
To perform the experiment, we used the latest version of the memory tool for analysis,
Notepad++ 7.3 32 bit as the test target and Ollydbg for verification. As with Minesweeper,
tests would be run on Notepad++ in a controlled environment and the memory tool tried
to form relations between user actions and memory locations. Similar to the Minesweeper
experiments, each set of Notepad++ actions were performed five times for a total of ten
snapshots per experiment run. Finally, these memory locations would be viewed using the
Ollydbg process memory viewer to determine if the memory tool made correct conclusions.
We input a controlled sequence of characters, such as ”zxcvbnm”, to investigate text
input handing. This string was chosen initially as it is uncommon and thus easier to identify.
To test the text input handling, the memory tool was run while this string was typed into an
empty document in the following order:
• Snapshot taken with empty document
• Cursor moved back to original position on page (first line, before first character)
Finally, the text selection experiment was performed by comparing snapshots with and
without a selected section of text. The text document and region of selection remained con-
stant between experiments. The procedure involved:
40
• Snapshot taken with no text highlighted
4.3.3 Results
The memory tool was able to positively identify three memory regions that consistently
changed as a result of simple user text input. When viewing these locations in Ollydbg,
there was clear evidence that the text input was being saved plainly in memory, with each
character of the string visible in memory. Furthermore, when inserting shorter strings (such
as less than 5 letters), the same number of consistently changing memory regions were de-
tected. However when inserting longer strings (such as more than 20 letters), the denoising
tool failed to detect any consistently changing regions. The length of substring needed to
fail the experiment changed between experiment runs. This suggests the possible presence
of a gap buffer as longer insertions cause the total string length to exceed the capacity of the
gap buffer and as a result the program moves the entire string to a larger buffer elsewhere in
memory. Because the string now resides at a new memory location, the denoising tool does
not detect it as consistent change.
Furthermore, 2 different memory locations were found to have changed as a result of
cursor movement. By locating these 2 memory locations, we are able to perform further
analysis and understand how they work. For example, when testing the cursor position
of second line, 18th character, it was found that the following memory locations changed
consistently:
• 0x03039ED8
• 0x03039EE0
While these results by themselves are not very meaningful, further investigation into these
locations with Ollydbg presents the result shown in Figure 4.2. When the cursor was at
position second line, 18th character, the values of the key memory locations were:
• 0x03039ED8 = 0x24
• 0x03039EE0 = 0x24
As a result, we can conclude that these memory locations are indeed of interest because the
value 0x24 corresponds to the decimal value of 36. The cursor is at the 36th character in the
document, hence we can deduce that Notepad++ stores cursor information as the character
number it is located after.
Unfortunately, when analysing snapshots during the text highlighting experiment, the
memory tool was unable to find any memory locations that changed consistently. It is possi-
ble that highlighting may be defined as a attribute of a text location rather than being tracked
by a separate variable defining the start and end locations of a highlighted block of text.
4.3.4 Conclusion
Overall the memory tool was useful in locating memory regions related to user actions on
Notepad++ and successfully assisted in analysis of two out of the three scenarios. It was
able to remove noise from the experiment data and pinpoint the relevant memory location
41
Figure 4.2: Example result of verifying experiment run in Ollydbg. The cursor is at location
second line, after 18th character in Notepad++. The consistently changing memory locations
identified by the memory tool are shown in the red boxes in Ollydbg
for text input, thus allowing us to analyse and understand how Notepad++ handles its text.
When using Ollydbg to verify the memory locations found by our tool, there was discernible
change such as input text string actually being visible in the memory space. This shows that
the memory locations discovered were correctly connected to the user action. When dealing
with cursor locations, it was able to discover the memory locations containing location of
the cursor and help us to understand how Notepad++ parses its cursor value. However,
usability suffered in the third case (text selection) due to it requiring more information than
just the application memory. A useful improvement might be a way to relate function calls to
user actions similar to programs such as IDA Pro when the user is faced with more advanced
problems that our current memory tool cannot solve.
4.4 Solitaire
Microsoft Solitaire is a single player game based off the classical card game solitaire, shipped
with the Windows Operating System since version 3.0 in 1990 [5]. The game starts by cre-
ating seven columns of face down cards, with an increasing number of cards per column,
starting with one card in the left column and ending with seven in the rightmost one. Only
the first card of each column is revealed and the value of all others, including 24 cards in the
’deck’ column, are hidden from the player. The win condition of the game is to accumulate
all cards, sorted by suit, in ascending order in four ’foundation’ columns above the playing
area. Players may drag cards to move them between different columns subject to certain
restrictions such as:
• A card may be moved to another column in the playing area only if the card above
it is exactly one higher and of a different color. For example, a Queen of Spades may
be moved below a King of Hearts, but not a King of Clubs (same color) nor a Ace of
Hearts (Queen is not one lower than an Ace).
42
The objective of the game is to correctly sort all the cards into the four foundation columns
in the least number of moves. While game is lost when no legal moves are possibles, the
Microsoft version of solitaire allows the player to undo moves, thus making losses less likely.
As it was originally designed to help familiarise Windows users with a mouse [5], all actions
in the game involve moving cards by using the mouse to either click on the card and then
clicking its target destination or simply dragging the card to its destination.
For card games of this type, typical data structures used to store cards include:
• Simple arrays for card stacks on the playing field. More than one card may be re-
vealed at once in a pile and arrays provide access to each element in order to allow
this gameplay.
• Linked lists for the deck and foundation pile. As only one card at a time is accessible in
either pile, a linked list may be used to facilitate the player accessing the top card. Fur-
thermore, because the cards in the deck are saved in a set order upon game creation, a
linked list would efficiently preserve this order.
4.4.2 Method
To perform this experiment, we ran the Microsoft Solitaire game alongside our memory tool
and used Ollydbg to verify results. The Windows XP version was used again as Microsoft
no longer includes Solitaire with newer version of Windows. We started off with a fresh
game of solitaire that had an available move on the game board and then performed the
following actions to gather results:
• Perform a legal action by moving a card (to either another pile on the playing field or
foundation pile)
The undo button was important as it allowed us to reset the game to the exact state at the
start of game without randomising any cards that a ’new game’ would cause.
Similarly, we were able to analyse unrevealed cards by playing the game until we were
in a game state with a revealable card as shown in Figure 4.3. Then we took snapshots using
the following process:
43
Figure 4.3: Game states used to analyse unrevealed cards. State 1 contains a revealable card
(rightmost column), State 2 shows the game after this card is revealed
• Use memory tool to take a snapshot of the current game state (State 1 in Figure 4.3)
• Use memory tool to take a snapshot of the new game state (State 2 in Figure 4.3)
4.4.3 Results
It was found that, similar to Minesweeper, the cards in both the foundation and playing field
columns were arranged sequentially with some delimiting data between each card. When a
card is revealed, 8 bytes are changed consistently in memory, corresponding to the location
of the card. For example, revealing a card in the rightmost stack always causes 8 bytes of
memory starting at 0x100719C to be set to 0x00 as shown in Figures 4.5, 4.6 and 4.7. This
location was consistently identified by the memory tool across multiple experiment runs,
and did not change even after restarting the game.
This reveals information about the memory structure of Solitaire. First of all it appears
that the game always keeps track of the ’next unrevealed’ card in a pile with a separate
array data structure that contains the cards in play. As shown in Figures 4.5, 4.6 and 4.7, the
memory space goes through a number of changes as the game is played :
• Card is unrevealed. 0x100719C - 0x10071A3 is populated with the data of a card (Two
of Hearts)
• Two of Hearts is played, resulting in a new unrevealed card present at the top of the
pile. 0x100719C - 0x10071A3 is populated with the data of a new card (Five of Dia-
monds)
44
Figure 4.4: In both experiments, the action being tested is revealing the 6th card on the 7th
column. However, the same user action under different initial game states (compare game
1 versus game 2) will produce different changes in game state and memory
45
Figure 4.5: Diagram showing changes in memory data and game state. This is the state
before revealing the card on the rightmost column. The relevant memory location is high-
lighted in red
46
Figure 4.6: Diagram showing changes in memory data and game state. This is the state after
revealing the card on the rightmost column
47
Figure 4.7: Diagram showing changes in memory data and game state. This is the state after
moving the Two of Hearts, resulting in a different card being at the top of the rightmost
column. This new card’s memory data is highlighted in red
48
The contents of the key memory location are consistently different depending on what
card is unrevealed. Around 20 games of Solitaire later, another game arose where the Two of
Hearts was the next unrevealed card in the rightmost pile. In this game, memory locations
0x100719C - 0x10071A3 contained the same data as shown above: the values ”E2 FF FF FF
BB FF FF FF”. This showed that we are able to see cards before they are played because each
pile has a single memory region for its unrevealed card, and there is a one-to-one relation
between the card that is unrevealed and the data stored in the memory location.
Unfortunately, we were not able to use the tool to locate and analyse changes to the
foundation piles caused by the user action of moving a card from the playing area to the
foundation pile. The tool did not locate any memory locations that changed consistently.
This is possibly because unlike the static cards generated by the game in the playing field,
the foundation piles use a dynamically allocated data structure such as a linked list to handle
cards that have been placed there. As a result because the size of the data structure may
expand over the course of the game via malloc operations, the memory tool was unable to
find a consistently changing region of memory.
4.4.4 Conclusion
The tool was useful and helped to pinpoint the critical memory locations that enabled fur-
ther analysis to understand the Solitaire memory structure. As a result we were able to see
cards before they were revealed.
However, the memory tool did have a limitation as it did not fully support analysis on
events that were dependant on an initial game state. As a result, while it correctly obtained
memory information regarding what locations were affected by the user action of moving
a card, there was no way to encode the prior game state into the snapshots being stored on
disk. While this did not adversely affect the analysis as I could use my own knowledge of
the game state (such as what card was moved and what cards were in play), it meant that
we could not realistically perform analysis on these memory dumps at a later time if we
needed to.
49
50
Chapter 5
Conclusion
The difficulty of reverse engineering has increased as modern software has become more
sophisticated. To ensure reverse engineering is still possible, the development of new tools
and techniques have been necessary. We aimed to simplify the reverse engineering process
by removing noise from samples of program memory, thus leaving more useful data for
the reverse engineer to examine. In order to achieve this goal, we created a tool that took
snapshots of the target program’s memory while a user interacted with it in a consistent way.
Upon completion of a series of snapshots, the memory tool then performed an intersection
operation to identify regions of consistent change which are hypothesized to be as a result
of the user action. The user can then examine these regions of consistent change instead of
parsing the entire memory space.
Overall, the memory tool has been helpful in assisting reverse engineering of applica-
tions. It was effective in quickly locating important sections of memory being affected by
user actions and thus allows a reverse engineer to save time and effort when working with
more complex applications. In all three cases studied, usage of the tool was important in
achieving the goals of the analysis, but due to the prototype nature of the system it suffered
from some limitations as well.
The most important benefit of using an automatic noise reduction tool is being able to
rapidly find critical memory regions. On even small games such as Minesweeper or Soli-
taire, manually poring through megabytes of memory data is tedious and prone to human
error. With the tool, such an operation takes mere minutes to complete and the number of
memory locations that need to be manually assessed drops from megabytes to a mere few
bytes. The removal of millions of bytes of noise is greatly useful to increasing human effi-
ciency. Furthermore, the ability to curate experimental data for future use is important as
it allows more advanced analysis by future researchers or even future versions of the tool
without losing any original experiment data. By encoding the experiment conditions, such
as user action performed, into the filename we reduce the chances of confusion or mix ups
of experiment results
After three iterations, we reached a viable prototype that helps a reverse engineer anal-
yse memory by removing noise from program memory snapshots. Using this tool, we were
able to successfully obtain information from three Windows programs to illustrate the re-
verse engineering process that occurs in more complex scenarios.
51
Figure 5.1: An illustration of the changes in a program’s memory space. In this example, the
addition of a gap buffer has caused values from 33 and onwards to be moved four bytes to
the right
text in Notepad++. As the program assumes that a user action directly causes a change in a
fixed memory location, applications that do not adhere to this assumption cannot be easily
analysed using the memory tool. In order to solve this issue, we may need to bring more
advanced techniques into the tool and give it more intelligence or domain knowledge. An
example of this would be the implementation of a better intersect function in the tool, and its
effects on the results the tool produces when used on a program with a dynamically chang-
ing data structure as shown in Figure 5.1. In the current iteration of the tool, all memory
values from 0x010701A3 and onwards would be reported as changed as a result of the user
action. However in reality only 0x010701A3-0x010701A6 have been changed, and the rest
of the unchanged data has simply been shifted 4 bytes. A possible fix for issues such as
this would be the implementation of more advanced data comparison algorithms to solve
the ”longest common subsequence” problem. By doing so, we would be able to define the
longest common subsequences of memory data between two snapshots as the ”unchanged”
region, thus allowing our system to even more effectively remove noise. With such an im-
plementation, the results from the example in Figure 5.1 would change. Both the regions
0x0 - 0x2 as well as 0x7 onwards in the new data would be considered unchanged as they
match memory sequences from the old data. Only 0x3-0x6 would be considered the change
in memory, which is more likely to be what a user wants.
52
Bibliography
[1] The begining of the end for wrobot for official servers. https://wrobot.eu/articles/
news/the-begining-of-the-end-of-wrobot-for-official-servers-r124/.
[3] Dynamic Binary Instrumentation for Deobfuscation and Unpacking, IN-DEPTH SECURITY
CONFERENCE 2009 EUROPE; IN-DEPTH SECURITY CONFERENCE 2009 EUROPE,
Nov 2009, Vienne, Austria. 2009, HAL CCSD.
[7] r/overwatch - [serious] how does blizzard plan on going about anti-cheat?
https://www.reddit.com/r/Overwatch/comments/40ael4/serious_how_does_
blizzard_plan_on_going_about/cytsmuj/.
[14] A LAZAB , M., V ENKATARAMAN , S., AND WATTERS , P. Towards understanding mal-
ware behaviour by the extraction of api calls. 2010 Second Cybercrime and Trustworthy
Computing Workshop (2010), pp. 52–59.
53
[15] B ACCI , A., B ARTOLI , A., M ARTINELLI , F., M EDVET, E., M ERCALDO , F., AND V ISAG -
GIO , C. A. Impact of code obfuscation on android malware detection based onstatic
and dynamic analysis. Proceedings of the 4th International Conference on Information Sys-
tems Security and Privacy (2018), pp. 379385.
[17] B ROLL , D. W. Expert opinion on the influences of bots on the economy and gaming
enjoyment in mmorpgs. Bossland GmbH (29 March 2012) . https://www.honorbuddy.
com/Expert_opinion_bots.pdf.
[18] B RUENING , D., Z HAO , Q., AND A MARASINGHE , S. P. Transparent dynamic instru-
mentation. In Proceedings of the 8th International Conference on Virtual Execution Envi-
ronments, VEE 2012, London, UK, March 3-4, 2012 (co-located with ASPLOS 2012) (2012),
S. Hand and D. D. Silva, Eds., ACM, pp. 133–144.
[19] C HIKOFSKY, E. J., AND II, J. H. C. Reverse engineering and design recovery: A taxon-
omy. IEEE Software 7, 1 (1990), 13–17.
[21] D AVID , R., B ARDIN , S., TA , T. D., M OUNIER , L., F EIST , J., P OTET , M., AND M AR -
ION , J. Binsec/se: A dynamic symbolic execution toolkit for binary-level analysis. In
2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering
(SANER) (2016), vol. 1, pp. 653–656.
[26] E MMERIK , M. V., AND WADDINGTON , T. Using a decompiler for real-world source
recovery. In Working Conference on Reverse Engineering (2004), IEEE Computer Society,
pp. 27–36.
[27] F EDERICO , A., PAYER , M., AND A GOSTA , G. rev.ng: a unified binary analysis frame-
work to recover cfgs and function boundaries. pp. 131–141.
[28] GNU Emacs Manual, Emacs Version 19. Free Software Foundation, 59 Temple Place, Suite
330, Boston, MA 02111-1307, USA. Available on-line with the GNU Emacs distribution.
[29] H AWKINS , W., H ISER , J. D., N GUYEN -T UONG , A., C O , M., AND D AVIDSON , J. W.
Securing binary code. IEEE Security Privacy 15, 6 (2017), 77–81.
54
[30] H AWKINS , W., H ISER , J. D., N GUYEN -T UONG , A., C O , M., AND D AVIDSON , J. W.
Securing binary code. IEEE Security Privacy 15, 6 (2017), 77–81.
[31] H AWKINS , W. H., H ISER , J. D., C O , M., N GUYEN -T UONG , A., AND D AVIDSON , J. W.
Zipr: Efficient static binary rewriting for security. In 2017 47th Annual IEEE/IFIP Inter-
national Conference on Dependable Systems and Networks (DSN) (2017), pp. 559–566.
[32] I NSTITUTE , A.-T. T. I. I.-S. Malware statistics and trends report. https://www.
av-test.org/en/statistics/malware/.
[34] J UNG , M., K IM , S., H AN , H., C HOI , J., AND C HA , S. K. B2r2: Building an efficient
front-end for binary analysis.
[35] K RAMER , S., AND B RADFIELD , J. C. A general definition of malware. Journal in Com-
puter Virology (2010), pp. 105–114.
[37] L I , X., S HAN , Z., L IU , F., C HEN , Y., AND H OU , Y. A consistently-executing graph-
based approach for malware packer identification. IEEE Access 7 (2019), 51620–51629.
[38] L UK , C.-K., C OHN , R., M UTH , R., PATIL , H., K LAUSER , A., L OWNEY, G., WALLACE ,
S., R EDDI , V. J., AND H AZELWOOD , K. Pin: building customized program analysis
tools with dynamic instrumentation. ACM SIGPLAN Notices 40, 6 (June 2005), 190–200.
[41] M ENG , X., AND M ILLER , B. P. Binary code is not easy. In Proceedings of the 25th Inter-
national Symposium on Software Testing and Analysis (New York, NY, USA, 2016), ISSTA
2016, Association for Computing Machinery, p. 2435.
[42] M ITTERHOFER , S., K RUEGEL , C., K IRDA , E., AND P LATZER , C. Server-side bot detec-
tion in massively multiplayer online games. IEEE Security & Privacy 7, 3 (May 2009),
29–36.
[43] N AGARAKATTE , S., Z HAO , J., M ARTIN , M. M. K., AND Z DANCEWIC , S. Softbound:
highly compatible and complete spatial memory safety for c. In Proceedings of the
2009 ACM SIGPLAN Conference on Programming Language Design and Implementation,
PLDI 2009, Dublin, Ireland, June 15-21, 2009 (2009), M. Hind and A. Diwan, Eds., ACM,
pp. 245–258.
[44] N ECULA , G. C., M C P EAK , S., AND W EIMER , W. CCured: Type-safe retrofitting of
legacy code. In Conference Record of POPL’02: The 29th ACM SIGPLAN-SIGACT Sym-
posium on Principles of Programming Languages (Portland, Oregon, Jan. 16–18, 2002),
pp. 128–139.
55
[45] O’K ANE , P., S EZER , S., AND M C L AUGHLIN , K. Obfuscation: The hidden malware.
IEEE Security & Privacy 9, 5 (2011), 41–47.
[46] O LSZEWSKI , M., A NSEL , J., AND A MARASINGHE , S. P. Kendo: efficient deterministic
multithreading in software. In Proceedings of the 14th International Conference on Ar-
chitectural Support for Programming Languages and Operating Systems (14th ASPLOS’09)
(Washington, DC, USA, Mar. 2009), ACM, pp. 97–108.
[49] P OLAKOWSKI , M., AND S TEWART, D. J. Warships of the first punic war: An archaeo-
logical investigation and contributory reconstruction of the egadi 10 warship from the
battle of the egadi islands (241 b.c.). Master’s thesis, East Carolina University, 2016.
[51] R EISS , S. P., AND R ENIERIS , M. Languages for dynamic instrumentation. http://cs.
brown.edu/~spr/research/bloom/dyninst.pdf, Dec. 23 2003.
[52] R OCCIA , T. Malware packers use tricks to avoid analysis, detection. McAfee Blogs Oct
(2018).
[53] R OSU , G., S CHULTE , W., AND S ERBANUTA , T.-F. Runtime verification of c memory
safety. In Runtime Verification, 9th International Workshop, RV 2009, Grenoble, France, June
26-28, 2009. Selected Papers (2009), S. Bensalem and D. A. Peled, Eds., vol. 5779 of Lecture
Notes in Computer Science, Springer, pp. 132–151.
[55] S ANTOS , I., B REZO , F., N IEVES , J., P ENYA , Y. K., S ANZ , B., L AORDEN , C., AND
B RINGAS , P. G. Idea: Opcode-sequence-based malware detection. In Engineering Secure
Software and Systems, Second International Symposium, ESSoS 2010, Pisa, Italy, February 3-
4, 2010. Proceedings (2010), F. Massacci, D. S. Wallach, and N. Zannone, Eds., vol. 5965
of Lecture Notes in Computer Science, Springer, pp. 35–43.
[57] S HACHAM , H., PAGE , M., P FAFF , B., G OH , E.-J., M ODADUGU , N., AND B ONEH , D.
On the effectiveness of address-space randomization. Report, Department of Computer
Science, Stanford University, Stanford, CA, USA, Sept. 2007.
[58] S HAHZAD , R. K., L AVESSON , N., AND J OHNSON , H. Accurate adware detection using
opcode sequence extraction. In ARES (2011), IEEE Computer Society, pp. 189–195.
[59] S HARIF, M. I., Y EGNESWARAN , V., S A ÏDI , H., P ORRAS , P. A., AND L EE , W. Eureka:
A framework for enabling static malware analysis. In ESORICS (2008), S. Jajodia and
J. López, Eds., vol. 5283 of Lecture Notes in Computer Science, Springer, pp. 481–500.
56
[60] S IMS , S. Introduction to reverse engineering for penetration testers. https://www.
sans.org/summit-archives/file/summit-archive-1510603949.pdf.
[61] T HOMAS M ANDL , U LRICH B AYER , F. N. Anubis analyzing unknown binaries the au-
tomatic way. Virus Bulletin Conference 2009, Geneva, 2009.
[63] VASUDEVAN , A., AND Y ERRABALLI , R. Cobra: Fine-grained malware analysis using
stealth localized-executions. In IEEE Symposium on Security and Privacy (2006), IEEE
Computer Society, pp. 264–279.
[64] V IDYARTHI , D., C HOUDHARY, S. P., R AKSHIT, S., AND K UMAR , C. R. S. Malware
detection by static checking and dynamic analysis of executables. International Journal
of Information Security and Privacy 11, 3 (2017), 29–41.
[67] YAN , W., Z HANG , Z., AND A NSARI , N. Revealing packed malware. IEEE Security &
Privacy 6, 5 (2008), 65–69.
[68] YANG , W., Z HANG , Y., L I , J., S HU , J., L I , B., H U , W., AND G U , D. Appspear: Bytecode
decrypting and dex reassembling for packed android malware. Research in Attacks,
Intrusions, and Defenses Lecture Notes in Computer Science (2015), pp. 359–381.
[69] Z HANG , F., L EACH , K., S TAVROU , A., AND WANG , H. Towards transparent debug-
ging. IEEE Trans. Dependable Sec. Comput 15, 2 (2018), 321–335.
[70] Z HAO , Q., R ABBAH , R. M., A MARASINGHE , S. P., R UDOLPH , L., AND W ONG , W.-F.
How to do a million watchpoints: Efficient debugging using dynamic instrumentation.
In CC (2008), L. J. Hendren, Ed., vol. 4959, Springer, pp. 147–162.
[71] Z HIVICH , M., AND L EEK , T. Dynamic buffer overflow detection, Jan. 14 2012.
57