Debugging PDF
Debugging PDF
Debugging PDF
1 Introduction
The first point we should notice about debuggers is the shape of their modeliza-
tion, which is adapted for development but not for reverse engineering. Espe-
cially, both forensic analysis and advanced machine code analysis are required
for the analysis of malwares, protected software, and the control of systems in
hostile configurations.
Encrypted binaries, rootkits and malwares are often coming without section
table information, without symbols, and are compiled statically so that the pro-
gram cannot be debugged through the interactions it has with other component
libraries. Well often they are encrypted, so it is necessary to retrieve their mem-
ory image while being executed instead of trying to perform a static analysis on
the binary file.
Additionally, the use of the classic debugging API usually involves heavy
communications between the debugger and the debuggee process. While this is
not a problem when doing development debugging, more elaborated analysis
techniques such as tracing or fuzzy testing requires more fluidity, as existing
tools [4] [5] really lack a more real-time answer.
Finally, classic debuggers make the assumption that source code is available
most of the time and they don’t take advantage of internal binary format in-
formation. The most basic reverse engineering is made harsh on raw assembly
code without code analysis techniques, fingerprinting primitives, and the lack
of a language adapted to the discovery of information in raw disk or memory
dumps.
Our debugger is made for reverse engineers on the ELF format. While the global
architecture does not implies a scope limited to one binary format, we have
focused on the standard used on almost all Operating Systems (both free and
commercial), except Microsoft Windows and the OsX from Apple. Our choice
of the ELF format has been historically to fill the gap between the analysis
software already available on those commercial OS’s, and the poor playground
3
of the UNIX world, after the remark that an hegemony of GNU tools for those
OS’s would not adapt to reverse engineering.
The ELF shell project [1] started 6 years ago and implemented an inter-
active and scriptable machine for manipulation of on-disk ELF binaries. After
some first insights in this binary format, we implemented novel techniques for
program analysis only using on-disk modifications [7]. At this time, malwares
for UNIX were quite primitive and protections on top of ELF files were almost
absent, despite the release of burneye [8] binary encryptor, following the way of
the UPX packer [9]. Our platform was already compatible with PaX systems for
multiple architectures and we succesfully instrumented a wide amount of binary
programs for auditing and hooking purposes.
After experiments in the real world, we concluded that control over analyzed
binaries was insuficient, and adding runtime capable analysis, among other ad-
ditional improvements such as different types of control flow redirections, and
partial relinking of missing resources on both dynamic and static binaries [10]
had become necessary. We managed to reuse our whole API using an inspired
abstraction of data accesses for selecting source and destination buffers, depend-
ing on the choice of on-disk or in-memory requests. We also added at that time
the first support of major techniques for other architectures such as ALPHA
or MIPS, and improved the scripting langage to make it nearer a real reverse
engineering oriented interpreter.
Today, we bring one more layer of techniques for more advanced handling of
the mentioned problematic systems. Our framework is more intuitive to use in
the everyday life of reverse engineers and forensic analysts on UNIX platforms.
Our internal representations have been formalized to a type system adapted for
the inference of information, using a lazy way to abstract and concretize data
object types as we need to manipulate them, that make the interpreted language
more flexible to the addition of features of interactions. We standardize our re-
constructed information into a debugging format which can also benefit from
the information of other debugging formats like DWARF or STABS, if those are
available in analyzed binaries by any chance. The debugger has become com-
patible with multithreaded programs and also keep memory unintrusiveness by
proxying all allocations and disallocations happening in the debuggee. Thus, we
provide a real world debugging environment for hostile systems which does not
suffer from performance penalties due to the use of debug interfaces provided by
the Operating System.
2 Contributions
The ERESI framework [6] brings a new environment for reverse engineering on
UNIX operating systems. In this section we will introduce it with a high-level
4
perspective. In the next parts, we will enter more and more in details about
each component, starting with in-depth explanations of the used programming
techniques that made it possible.
Let’s understand this better, starting from the highest level components to
the lowest level components :
– e2dbg stands for the Embedded ELF debugger. It can hook processes with-
out the need of OS-level debugging API.
– etrace stands for the Embedded ELF tracer. It can trace processes at the
normal execution frequency.
– elfsh stands for the ELF shell. This is the ondisk analysis tool of the frame-
work.
– libedfmt is the library that deal with the ERESI debug format. It can con-
vert stabs and dwarf formats to the eresi format, and make the framework
aware of program types as indicated by debug information.
– libelfsh is the binary manipulation library for the ELF format. It can run
both ondisk and embedded in the debugger using a unified interface. Libelfsh
is currently the biggest and oldest component of the framework.
– libaspect handles the vectors and hash tables which allow for reflection of the
whole framework. It also provide the type system for the ERESI language.
The idea to implement the type system in such a low-level component make
it possible for handling types in an unified way in both the analyzed program
and all components of the framework itself.
As you can notice, the vector data structure is central in our framework.
Part 6 of this article is dedicated to the explanation of it. Before entering such
6
level of details, we will sum up the various contributions that make the ERESI
framework a unique environment for efficient analysis of hardened programs.
The first contribution of our framework is the ability to debug and trace pro-
grams efficiently. The usual debugging framework suffers from a lack of integra-
tion with the debuggee program. Because the normal architecture of a debugger
is to be a separate component, there is the need of context switching each time
the debugger wants to access variables of the debuggee program. The worse case
arises when the debugger is made scriptable. In that case, the number of context
switching is proportional to the number of variable access made by the script.
As this might not seem like a limitation, this make automated analysis much
slower. In our case, the interpreter of the ERESI language in which the scripts
are made is mapped directly in the debuggee program address space, so there is
no need of any context switching during debugging. Thus, our framework is much
more adapted for the automated runtime analysis of program. As an example,
advanced fuzzy testing techniques that make use of feedback information [11]
to optimize the choice of analyzed program paths is made a lot more efficient
without the use of OS-level API such as ptrace.
Another consequence of not using OS-level API is the ability to trace (and
debug under certain conditions) programs even if debug API is disabled. For
instance, our tracer and our base debugger are not blocked because of grsecurity,
when systems comes with the ptrace system call disabled. Because we inject
the debugger in the debuggee process, we are also not restricted for reading
and writing the memory content of such programs once the library is injected.
Additional memory protection can be provided by kernel patches such as PaX to
counter buffer overflow attempts or injection of malware. This protection makes
the executable part of the memory not writable in runtime, but injecting our
debugger is not a problem on those systems.
Because we use multiple techniques for data and code injection [7], we never
have to write in unauthorized areas when we do static injections. Nevertheless,
in some cases it happens that we may need to write in the code sections of the
program while debugging, for instance when we want to install breakpoints by
the special processor opcode, or by function hijacking. For this, we use a special
technique that is not blocked by the PaX kernel patch, even when all options
of it are enabled. The technique simply consists in remapping the memory area
from userland instead of trying to change their rights. This is smarter than just
disabling the mprotect PaX option for the desired binary (which is possible as
well in case the binary is read-accessible and PaX is compiled in soft mode. As
this is the general case, we do not restrict our analysis to this situation).
7
For the moment libedfmt supports stabs and dwarf2. This made us realize
that a different parsing engine had to be done for each debugging format. Stabs
manages types by identifiants without any reference to the position in the debug
section. You have to keep in memory all elements to be able to parse the infor-
mation. Dwarf2 contains more information and you cannot store all of it without
wasting a big part of your memory. At least, it contains a clear reference system
and you can find a dependence without parsing everything. Even in libedfmt,
you cannot read stabs and dwarf2 the same way, and each transformation has
to be implemented differently.
update a hash table for each type with the list of all variables typed as such. The
combination of ERESI and libedfmt creates a powerful debugging and reversing
environnement that allows for saving and retrieving the types information.
In the future, libedfmt will be able to parse more debugging formats and
should be used in e2dbg to display information that we cannot rebuild without
source code, such as current file lines of the debuggee program.
Finally, this feature leads to a very promising runtime reflection of the an-
alyzed program data structures, as typed objects of the debuggee are automat-
ically bound in the language. For instance, hooking allocation functions such
as malloc and free make it possible to inform a type about a precise variable
9
(given its address). In that particular case, we can simply define the heap chunk
type in the ERESI language and inform the chunk type of each address that is
returned by the malloc function, so that the list of chunks is accessible through
the chunk type hash table of variables directly from the language, opening the
door to an advanced analysis of the heap evolution directly in the ERESI lan-
guage. Obviously, ad-hoc types recovery is not limited to heap variables, but this
example was chosen for pedagogical purposes as it is really easy to hook those
functions and automatically know the exact list of legit heap chunks available in
the analyzed program in a sound way, at any moment of the execution.
3 Programming techniques
3.1 Aspects weaving
Features of the framework are modularized in a way that their interface is ac-
cessible to the user. This allows to refine analysis in runtime, when features are
implemented differently depending on user-definable criterions, or when feature
needs runtime updates. Those concepts originally comes from the notion of as-
pect oriented programming [15] which was developped some years ago to fill the
lack of flexibility of object oriented designs.
A new feature that comes with the current version of the libasm is its vector-
based architecture. This allows to overload the handling of disassembled instruc-
10
tion, which has a potentially wide amount of applications, such as dataflow anal-
ysis or opcode tracing. Let’s detail a little bit more our advanced disassembling
interface.
Type Description
IMPBRANCH Imperative branch (jump)
CONDBRANCH Conditional branch
CALLPROC Call to a procedure
RETPROC Return from a procedure
ARITH Arithmetic or logic operations
LOAD Memory data load
STORE Memory data store
ARCH Architecture-dependent instruction
FLAG Flag-modifier instruction
INT Interrupt or call-gate instruction
ASSIGN Assignment instruction
TEST Comparison or test instruction
NONE Instruction that does not fit any of the above
Table 2. Libasm instructions types
Another very interesting aspect that libasm features is the fact that every
opcode handling function written is stored inside a vector of the kind provided
by libaspect. When disassembling an instruction, libasm retrieves the correct
handler by querying this vector on its 4 dimensions: 1 regarding the machine type
and 3 about opcode information (including architecture-specific requirements,
such as SPARC’s secondary opcode). Storing the addresses to these functions in
this vector brings to the user the advantages of being able to dump and modify
vectors from inside ERESI’s scripting language. So, in practice, the function that
does the job of disassembling a given instruction can be replaced in runtime by
other code of the user’s choice, allowing for easy opcode tracing, among other
applications.
The use of containers abstracts type information, thus giving the possibility
to write analysis routines that work at this higher level of abstraction, walking
through the graph of containers. Currently we only store blocks and functions
inside containers, what suffices the needs of control flow analysis for these en-
tities. In the future, we may store data nodes inside containers too, in order to
12
perform data flow analysis. Finally, there is also the idea of having containers
of containers, providing a ”zoomed out” view of other graphs, eg. the linking
between modules as a more abstract view of the linking between functions.
The heap separation is more subtle, as the memory allocation is done entirely
in runtime, unlike the stack allocations which are partially realized at compi-
lation time (so we just have to take care about stack-related registers in runtime).
Our technique was named after the Syscall Proxying [14] idea which is mostly
useful for the writing of vulnerability exploits in order to simulate remote execu-
tion when some particular required system calls are not available on the target
machine, making it possible to execute entire binaries (such as a UNIX shell) but
only executing some particular syscalls (like file systems accesses) on the target
machine. However, heap separation cannot be implemented simply by alloca-
tion function proxying, since the returned values of those functions are memory
pointers that are accessed for reading and writing in the subsequent code of the
debugger. Using an external allocator would also require to proxy the accesses
of all memory zones allocated in such way.
Our choice was to isolate a dedicated portable heap management system for
the debugger, that allocate and destroy all its memory chunks in a single mapped
memory region. We then garantee that the heap allocator is totally unintrusive,
modulo the fact that a certain memory range will not be allocatable for the legit
13
heap. The choice of the base address for this zone should adapted from OS to
OS and from architecture to architecture, or even from debugged program to
debugged program. A judicious default choice is the one of a very low virtual
address (in the very first pages of the address spaces) that is rarely used, unless
special modification of the OS or behavior of the debuggee program implies so.
The challenge that represented allocation proxying is more than the choice of
the alternative heap base address. Indeed, as the analysis module of the frame-
work is mapped inside the debuggee process itself, it can shows the need of a
dynamic allocation before the minimal debugger environment is set up. This case
can happens in 2 situations : when the debugger thread is not yet created (as
we rely on the POSIX thread library of each operating system), or when the de-
bugger thread is created (so it already installed its own allocation handlers since
the debugger is mapped in first in the program using the LD LIBRARY PATH
enviroment variable.
Once the proxy allocator takes control in the proxyfied {m,c,re}alloc and
free functions, we can face two situations. As all dependences of the program are
already loaded when the first debuggee calls to malloc happens, the debugger
code and data are already mapped in both of these situations. The first initial
situation is when the debugger thread has not been created yet. In that case,
the debugger memory mapping i already done (so we can use allocation-free
functions inside the debugger), but the initialization of the debugger structures
(that requires allocations) did not happen yet. Our solution is simply to keep
a variable initialized to 0 to hold the thread id of the debugger. In case this
variable is 0 or equals to the thread idea of the debugger, we call the separate
heap implementation. In the other case, we call the legit allocations functions
for the debuggee program to allocate in its legit heap.
Is the technique acheived ? Not yet. Indeed, we still had to make sure that
we were able to resolve allocations functions without using dynamic memory
allocations. This is realizable using a lookup of the linkmap linked list content,
whoose first element is pointed by the second entry of the Global Offset Ta-
ble section. As each element of the linked list contains a cache of the exported
symbol table (the ELF .dynsym and .dynstr sections), we can resolve symbols
and guess base address of objects without any external allocations. The cost of
14
this technique is a small function that has to lookup the Global Offset Table
address statically in the ondisk file using data statically allocated, in order to
know where the linkmap base address stands (as specified by the ELF reference).
4 Applications
Thanks to the modular approach, we can derive multiple tools for debugging,
static and runtime analysis of programs. In that section, we present the archi-
tecture of our embedded debugger, the algorithm of our embedded tracer, and
an example of modular graph-based analysis taking advantages of our generic
containers data structures.
The first innovation that comes with this tool is about its architecture. The
fact that the debugger is made bipartite allows for a very flexible handling of
targets. For now, only the userland debugging is available, but in the future we
can imagine any kind of targets (embedded systems, kernel-level code..) being
supported by this framework without changing anything in the modelisation.
Of course, the way to handle breakpoints, stepping, or other debugging events
varies from target to target, but the vector system of the ERESI framework al-
lows for a very fine grained implementation of features, so that most of the code
is reusable without any change.
The embedded debugger was not created with this modelisation since the
beginning [10] but is an evolution after a lot of experiments with the debugging
of multithreads programs in an embedded context. We have been testing vari-
ous ways to keep a high performance while hooking programs, staying portable
and stealth for OS-level protections, and reusing the available code as much as
possible since our early development [7] about on-disk analysis of programs.
15
1. We first tried to map the complete ELF code as a new dependence library
of the program beeing debugged. This worked well, as we just had to cre-
ate an intermediate function that was responsible for accessing the data of
the manipulated object, but keeping the same manipulation code for on-
disk and runtime structures modifications. This was the core of our powerful
embedded debugger idea. However, this solution was not taking care of un-
trusiveness in the host process memory as we used the same heap than the
debuggee, which made the debugger useless for heap-related debugging, such
as heap overflow exploits. Additionally, this first implementation was mono-
lithic and not capable to handle multithreads programs.
16
stored function prototypes list. Despite the fact that it provides a correct pro-
totype on those functions, you cannot deal with unknown functions. Etrace is a
tracer built to deal with every functions. It means you do not have to create a
function prototypes database.
The previous algorithm shows how our tracer takes advantage of the ELFsh
framework features. The CFLOW and ALTPLT techniques were already de-
scribed in [10]. They allow for redirecting calls on our generated functions. Etrace
reduces architecture dependences by generating and compiling a .c file which
prints information on every redirected functions.
An example of the output for sshd can be found on section A page 27.
Additionally, our tracer provides a way to group functions. That allows for
the user to decide which pool of functions he wants to trace. For now, this
support is only static, as the tracer is not entirely interfaced with the debugger
at the moment of writing this article. However, the extension of this feature will
make possible to decide in runtime which function to continue tracing or which
function to remove from the inspection list as the analysis is going.
In this part of the article, we will go one step more abstract, by describing the
internals of the reverse engineering language of our framework. First, we detail
how the variables of ERESI programs change their type when it is necessary
(that could be labeled as weak typing) and how types are enforced when we
cannot do in another way (to avoid unrequested behavior to happen). At the
same time, we introduce our main internal data structure : the reflective vector.
The language of the framework was made primary with reflection and modularity
in mind. The concept of a reverse engineering type system follows the idea that
manual or automatic type information can be adapted to the required format of
data. Thus, each data object in the language does not have a determined type
as it evolves as you use the variable. For instance, writing an array of integers in
memory would first need to convert this array as a raw data buffer, then write
its contents at the desired address. Inversly, extracting data from raw sections
must allow for extraction in any desired format.
The language base types can be ordered as a semi-lattice, that is adapted for
allowing or refusing type conversion from a given type to another. We separate
simple types (including uniform function types) and container types such as hash
tables and vectors. Those last types are combining aspect oriented advantages
with the flexibility requirement of reverse engineering, as the framework objects
are contained in vectors and hash tables, so it is possible to change pieces of
the framework itself in runtime and overload it on demand. For instance, adding
new fingerprinting discriminants or additional binary analysis can be performed
by modifying the vectors of libmjollnir or libasm. We have not detailed the fin-
gerprinting capabilities of libmjollnir in that article as we plan to dedicate a
complete paper about it in the future.
21
– Simple types are denoted by ψ : { char, short, int, long, str, func } ⊂ Ψ
lookup : Γ × Ψ n 7−→ Σ
reif y : Λ 7−→ Σ
ref lect : Σ 7−→ Λ
convert : Ψ 7−→ Ψ
In order to understand better this structure, we give additional hints for using
this system :
3. The lookup and reflect operations combine as a control flow aspect when
ref lect(lookup(V ect × Ψ n )) 7−→ F ct. Features of the framework can be
hooked in runtime using such objects in the language.
4. The convert operation only applies to simple types and does not require to
enter the reflection and reification cycle. Obvious conversions are done by a
convert : ψ 7−→ Raw or convert : Raw 7−→ ψ . We distinguish particulary
some interresting conversions. convert : F unc 7−→ Raw as it allows for con-
verting a function object to a raw data object being the code of the function.
6 Related work
In this section, we briefly discuss the work related to modular debugging. Sev-
eral of our features are available in other debugging framework, such as remote
debugging, language based debugging, or graph based debugging. However, none
of the existing frameworks manage to provide a unified interface for all of those
features. Additionally, the innovation of our work resides not only in new fea-
tures available nowhere else (such as reflection on binary programs or on-the-fly
typing) but also on the way they are implemented in an embedded framework,
without any debug API at the OS level.
As this feature makes gdb attractive for porting on any kind of systems,
the stub has to be programmed in C language and linked with the server side
debugger. The vector system of ERESI provide a similar feature than Gdb stubs,
but additionally the vector can be overloaded using code in ERESI language
itself, so the porting of our debugger to additional targets is made easier than
gdb stubs. However, there is a wide amount of available gdb stubs, whereas our
framework is fresh and we still lack of experience on porting it to exotic systems
to conclude anything yet.
The reflection in the ERESI language can only handle the control structure
of binary program until now. We can think about handling data-level reflection
as well but it is not obvious how to acheive this on hardened systems without
relocation information in the binary files (as reflection on data variables would
be done using inline patching of the code that access those variables, which
would result in changing the location of assembly instructions, which is not al-
ways possible without information on how to relocate the code, or with a perfect
data flow analysis engine for binary code. For instance, instructions performing
relative memory accesses cannot be executed in another location unless they are
reencoded to do so).
There are multiple reasons for this. First, our language is dedicated to reverse
engineering and its primitives directly include hash tables of existing blocks and
functions, and handle regular expression in an attractive syntax. Additionally,
25
those tools are based on the IDA code base written in C language, so it comes
as an additional layer on top of an existing tool. On the opposite, our language
is integrated directly in the framework and we can adapt its syntax for our own
needs. The reflexivity of the framework data structures make possible to access
such computed information without any additional interfacing.
Finally, we do not need any general purpose virtual machine for such a com-
plete language as python or ruby, but we rely on the minimalistic vm of ERESI,
which is able to scale for the analysis of large programs even when embedded
in-process. We are not trying to argue that our language is better than python,
but that it is more dedicated to the task of reverse engineering, and the language
itself (and not only the framework features) can be adapted to our need as we
get inspiration from other programming languages. For instance, features such
as program transformation or type-based decompilation will be integrated in
the future using 2 simple additional commands (match and transform) whereas
similar features in python would turn into developping a library and provide
additional functions to the reverse engineer, which would lack the beauty and
the intuitive aspect of the language.
7 Conclusion
We provide an alternative framework for debugging programs in a hostile envi-
ronment with the perspective of reverse engineering. This novel model of analy-
sis allows for high performance, unintrusive, multiarchitecture analysis of binary
26
programs without needing the source code. Our implementation includes a dis-
assembly engine, a binary manipulation library, a fingerprinting library, and the
support for a new powerful debug format that keep compatibility with exist-
ing ones. We rely on a minimal aspect oriented interface for modularity and we
propose a practical scripting language and its ad-hoc type system. Thanks to
this fine-grained architecture, we manage to provide the base interpreter func-
tionalities in a small and extensible core for rapid development of specialized
interpreters. We gave three examples of such instances : the ELF shell for ondisk
analysis, the Embedded ELF debugger for runtime analysis, and the ELF tracer
that combine both facets.
References
1. The ELF shell crew, The ELF shell
http : //elf sh.asgardlabs.org/
10. Embedded ELF Debugging, The ELF shell crew, Phrack Magazine issue 63
http : //phrack.org/archives/63/p63 − 0x09 Embedded Elf Debugging.txt
v
13. Alan Mycroft, Type-based decompilation
European Symposium and Programming (ESOP99)
18. Ilfak Guilfanov and the datarescue team - The Interactive Disassembler
http : //www.datarescue.com/idabase/
A Etrace example