Reverse Engineering Malware: Hassen Saidi
Reverse Engineering
Malware
Hassen Saidi
Computer Science Laboratory
SRI International
Hassen.Saidi@sri.com
The Growth of a Network
The Growth of a Threat
Mass email campaign: Love letter, Melissa
Multiple vectors of infection, attacks against AV software,
Combined infection vectors, dangerous payloads: Code Red, Nimda
Input: a malware .exe, typically a stripped binary with no debugging information.
• What does the malware do?
• How does it do it?
• Identify triggers
• What is the purpose of the malware?
• Is this an instance of a known threat or a new malware?
• Who is the author?
• …
• Dynamic Analysis
– Techniques that profile actions of binary at
runtime
– More popular
• CWSandbox, TTAnalyze, multipath exploration
• Only provides a partial “effects-oriented profile” of
malware potential
• Static Analysis
– Can provide complementary insights
– Potential for more comprehensive assessment
Malware Evasions and Obfuscations
Unpacking
Example of Packed Code
The Eureka Framework
• Observations
– Statistical properties of packed executables differ
from those of unpacked executables
– As malware executes, the code-to-data ratio increases
• Complications
– Code and data sections are interleaved in PE
executables
– Data directories (import tables) look similar to data
but are often found in code sections
– Properties of data sections vary with packers
Statistics-based Unpacking (2)
• Our Approach
– Model statistical properties of unpacked code
• Estimating unpacked code
– N-gram analysis to look for frequent instructions
– We use bi-grams (2-grams) because x86 opcodes are 1 or 2
bytes
– Extract subroutine code from 9 benign executables
– FF 15 (call), FF 75 (push), E8 _ _ _ ff (call), E8 _ _ _ 00 (call)
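The bi-gram idea above can be illustrated with a minimal Python sketch. It counts overlapping 2-byte sequences in a buffer and reports how often two of the bi-grams named on the slide (FF 15 and FF 75) occur. The function names, the toy buffers, and the scoring rule are assumptions for illustration; they are not the Eureka implementation, and no decision threshold is implied.

```python
from collections import Counter

def bigram_counts(code: bytes) -> Counter:
    """Count overlapping 2-byte sequences in a byte string."""
    return Counter(code[i:i + 2] for i in range(len(code) - 1))

def marker_score(buf: bytes, markers) -> float:
    """Fraction of bi-gram positions in buf occupied by marker bi-grams."""
    positions = len(buf) - 1
    if positions <= 0:
        return 0.0
    counts = bigram_counts(buf)
    return sum(counts[m] for m in markers) / positions

# Two of the bi-grams named on the slide: FF 15 (call), FF 75 (push).
MARKERS = [bytes([0xFF, 0x15]), bytes([0xFF, 0x75])]

# Toy buffers, made up for illustration (not real malware bytes).
code_like = bytes.fromhex("ff7508ff15e4a04000ff750cff15e8a04000")
data_like = bytes(range(18))

if __name__ == "__main__":
    print("code-like score:", marker_score(code_like, MARKERS))
    print("data-like score:", marker_score(data_like, MARKERS))
```

A monitor could recompute such a score over the process image as it runs and watch for a rise, which is one way to read "estimating unpacked code"; how the real system sets its stopping criterion is not stated on the slide.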
Evaluation (ASPack)
Evaluation (MoleBox)
Evaluation (Armadillo)
Systematic Approach to Code Deobfuscation:
Unpacking
2. Runtime unpacking
3. Jump to OEP
Phase 3: Fixing the Disassembled Code
==================================================
Function Name : ADSICloseSearchHandle
Address : 0x76e3050a
Relative Address : 0x0002050a
Ordinal : 143 (0x8f)
Filename : adsldpc.dll
Full Path : c:\WINDOWS\system32\adsldpc.dll
Type : Exported Function
==================================================
==================================================
Function Name : ADSICreateDSObject
Address : 0x76e30447
Relative Address : 0x00020447
Ordinal : 144 (0x90)
Filename : adsldpc.dll
Full Path : c:\WINDOWS\system32\adsldpc.dll
Type : Exported Function
==================================================
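An export listing like the one above can be turned into an address-to-name table, which is the lookup needed to label call targets in the disassembly. The following is a minimal Python sketch that assumes the "key : value" layout and "=" separator lines shown on the slide; `parse_exports` and the `dll!function` naming are illustrative choices, not part of any tool named in the deck.

```python
DUMP = """\
==================================================
Function Name : ADSICloseSearchHandle
Address : 0x76e3050a
Relative Address : 0x0002050a
Ordinal : 143 (0x8f)
Filename : adsldpc.dll
==================================================
==================================================
Function Name : ADSICreateDSObject
Address : 0x76e30447
Relative Address : 0x00020447
Ordinal : 144 (0x90)
Filename : adsldpc.dll
==================================================
"""

def parse_exports(text):
    """Return {absolute address: 'dll!function'} from a 'key : value' dump."""
    table, record = {}, {}
    # A trailing sentinel separator flushes the last record.
    for line in text.splitlines() + ["="]:
        if line.startswith("="):
            if "Address" in record and "Function Name" in record:
                dll = record.get("Filename", "?")
                table[int(record["Address"], 16)] = f'{dll}!{record["Function Name"]}'
            record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return table

EXPORTS = parse_exports(DUMP)

if __name__ == "__main__":
    for address, name in sorted(EXPORTS.items()):
        print(hex(address), name)
```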
Using Dataflow Analysis
• Identify register-based indirect calls
[Figure: def/use chain for a register-based indirect call to GetEnvironmentStringsW]
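A def-use resolution of register-based indirect calls can be sketched in a few lines of Python. The sketch assumes straight-line code, a simplified instruction tuple `(address, mnemonic, dst, src)`, and a table of already-identified import slots. The slot name `dword_40A010` and the addresses are hypothetical; a real analysis would work over a control flow graph rather than a single linear pass.

```python
# Registers a stdcall/cdecl callee may overwrite.
CALLER_SAVED = ("eax", "ecx", "edx")

def resolve_indirect_calls(instrs, slots):
    """Map the address of each 'call reg' to the API reaching that register."""
    reaching = {}   # register -> API name defined by the last mov (the "def")
    resolved = {}
    for addr, mnem, dst, src in instrs:
        if mnem == "mov":
            if src in slots:
                reaching[dst] = slots[src]        # def from an import slot
            elif src in reaching:
                reaching[dst] = reaching[src]     # register-to-register copy
            else:
                reaching.pop(dst, None)           # unknown value kills the def
        elif mnem == "call":
            if dst in reaching:
                resolved[addr] = reaching[dst]    # the "use"
            for reg in CALLER_SAVED:
                reaching.pop(reg, None)
    return resolved

# Hypothetical listing: the first call is resolvable, the second is not.
LISTING = [
    (0x401000, "mov", "esi", "dword_40A010"),
    (0x401006, "call", "esi", None),
    (0x401008, "mov", "esi", "ebx"),
    (0x40100A, "call", "esi", None),
]
SLOTS = {"dword_40A010": "GetEnvironmentStringsW"}

RESOLVED = resolve_indirect_calls(LISTING, SLOTS)
```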
Handling Dynamic Pointer Updates
• Identify register-based indirect calls
[Figure: def/use diagram]
Leveraging Standard API Address Loading is not enough
There are many indirect ways to load and call a Windows API:
• access to list of loaded DLLs
• access to a loaded DLL and use of GetModuleHandle() + offset
• …
==================================================
Function Name : ADSICloseDSObject
Address : 0x76e30826
Relative Address : 0x00020826
Ordinal : 142 (0x8e)
Filename : adsldpc.dll
Full Path : c:\WINDOWS\system32\adsldpc.dll
Type : Exported Function
==================================================
==================================================
Function Name : ADSICloseSearchHandle
Address : 0x76e3050a
Relative Address : 0x0002050a
Ordinal : 143 (0x8f)
Filename : adsldpc.dll
Full Path : c:\WINDOWS\system32\adsldpc.dll
Type : Exported Function
==================================================
==================================================
Function Name : ADSICreateDSObject
Address : 0x76e30447
Relative Address : 0x00020447
Ordinal : 144 (0x90)
Filename : adsldpc.dll
Full Path : c:\WINDOWS\system32\adsldpc.dll
Type : Exported Function
==================================================
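When code computes an API address as a module handle plus an offset, the target can still be named if the analyst knows each module's load base and its exports by relative address. The Python sketch below uses the values from the listing (the base is the Address minus the Relative Address); the table layout and the function `resolve_base_plus_offset` are assumptions for illustration.

```python
# Load base recovered from the listing: Address - Relative Address.
ADSLDPC_BASE = 0x76e30826 - 0x00020826

MODULES = {
    "adsldpc.dll": {
        "base": ADSLDPC_BASE,
        "exports_by_rva": {
            0x00020826: "ADSICloseDSObject",
            0x0002050a: "ADSICloseSearchHandle",
            0x00020447: "ADSICreateDSObject",
        },
    },
}

def resolve_base_plus_offset(target, modules):
    """Name an absolute call target computed as module base + offset."""
    for dll, info in modules.items():
        rva = target - info["base"]
        name = info["exports_by_rva"].get(rva)
        if name is not None:
            return f"{dll}!{name}"
    return None

if __name__ == "__main__":
    print(hex(ADSLDPC_BASE))
    print(resolve_base_plus_offset(ADSLDPC_BASE + 0x20826, MODULES))
```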
Consequence of Failure to Identify APIs
...
.text:004011A7 push offset unk_40A2DC ; arg 1: name of a library
.text:004011AC xor ebx, ebx
.text:004011AE call dword ptr unk_40A0E4 ; load library call (LoadLibrary); .data:0040A0E4 00000000
.text:004011B4 mov edi, eax
.text:004011B6 cmp edi, ebx
.text:004011B8 jz short loc_401211
.text:004011BA push esi
.text:004011BB mov esi, dword ptr unk_40A0E8
.text:004011C1 push offset unk_40A2C4 ; arg 2: name of the library function
.text:004011C6 push edi ; arg 1: the library (handle returned by the load call)
.text:004011C7 call esi ; unk_40A0E8: API call to get the address of the loaded library function (GetProcAddress); .data:0040A0E8 00000000
.text:004011C9 push offset unk_40A2AC
.text:004011CE push edi
.text:004011CF mov dword_433480, eax
...
...
.text:00401132 lea eax, [ebp+var_4]
.text:00401135 push eax
.text:00401136 push ebx
.text:00401137 push 0
.text:00401139 mov [ebp+var_4], esi
.text:0040113C call dword_433480 ; library function call
.text:00401142 test eax, eax
…
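The listing shows why a static import table is not enough: the pointer in dword_433480 only exists after GetProcAddress runs. A small symbolic pass can recover it. The Python sketch below mirrors a simplified subset of the instructions above; the slot-to-API mapping comes from the slide's annotations, while the string contents (`example.dll`, `ExampleFunction`) are hypothetical because the slide does not show them. It handles straight-line code only.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}

def track_dynamic_pointers(instrs, slots, strings):
    """Resolve call targets, including pointers stored by GetProcAddress."""
    regs, mem, stack, calls = {}, dict(slots), [], {}
    for addr, mnem, dst, src in instrs:
        if mnem == "push":
            stack.append(dst)
        elif mnem == "mov":
            value = regs.get(src) if src in REGISTERS else mem.get(src)
            if dst in REGISTERS:
                regs[dst] = value
            else:
                mem[dst] = value              # dynamic pointer update
        elif mnem == "call":
            target = regs.get(dst) if dst in REGISTERS else mem.get(dst)
            calls[addr] = target or "<unresolved>"
            if target == "LoadLibrary" and stack:
                regs["eax"] = "module:" + strings.get(stack.pop(), "?")
            elif target == "GetProcAddress" and len(stack) >= 2:
                stack.pop()                              # arg 1: module handle
                regs["eax"] = strings.get(stack.pop())   # arg 2: function name
            else:
                regs["eax"] = None
    return calls

LISTING = [
    (0x4011A7, "push", "unk_40A2DC", None),
    (0x4011AE, "call", "unk_40A0E4", None),
    (0x4011B4, "mov", "edi", "eax"),
    (0x4011BB, "mov", "esi", "unk_40A0E8"),
    (0x4011C1, "push", "unk_40A2C4", None),
    (0x4011C6, "push", "edi", None),
    (0x4011C7, "call", "esi", None),
    (0x4011CF, "mov", "dword_433480", "eax"),
    (0x40113C, "call", "dword_433480", None),
]
SLOTS = {"unk_40A0E4": "LoadLibrary", "unk_40A0E8": "GetProcAddress"}
# Hypothetical string contents; the slide does not show them.
STRINGS = {"unk_40A2DC": "example.dll", "unk_40A2C4": "ExampleFunction"}

CALLS = track_dynamic_pointers(LISTING, SLOTS, STRINGS)
```

With these assumed strings, the final `call dword_433480` is labeled with the function name that was passed to GetProcAddress instead of remaining an anonymous indirect call.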
Failure to Perform Control Flow Analysis
• CreateThread
.data:009A3939 xxxxxxx
• Starting Services
• Thread synchronization
• Critical sections
• Callback functions
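APIs such as CreateThread receive a code pointer as an argument, so the disassembler needs that pointer as an extra entry point or the thread body is never analyzed. The following Python sketch assumes the call target is already resolved to CreateThread, that arguments are passed by consecutive pushes (stdcall, last argument pushed first), and that lpStartAddress is the third of six parameters. The operand `offset sub_401500` is hypothetical.

```python
def thread_entry_points(instrs, nargs=6, start_index=2):
    """Collect the operand passed as lpStartAddress to each CreateThread call.

    instrs is a list of (mnemonic, operand) pairs in program order.
    """
    entries, pushes = [], []
    for mnem, operand in instrs:
        if mnem == "push":
            pushes.append(operand)
        elif mnem == "call":
            if operand == "CreateThread" and len(pushes) >= nargs:
                # The last push is the first argument.
                args = pushes[-nargs:][::-1]
                entries.append(args[start_index])
            pushes = []
    return entries

# Hypothetical call site.
CALL_SITE = [
    ("push", "eax"),                # lpThreadId
    ("push", "0"),                  # dwCreationFlags
    ("push", "ebx"),                # lpParameter
    ("push", "offset sub_401500"),  # lpStartAddress
    ("push", "0"),                  # dwStackSize
    ("push", "0"),                  # lpThreadAttributes
    ("call", "CreateThread"),
]

ENTRIES = thread_entry_points(CALL_SITE)
```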
Advanced API Resolution
struct sockaddr_in {
short sin_family;
u_short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
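Once an argument is known to point at a sockaddr_in, its 16 bytes can be decoded into meaningful fields. The Python sketch below follows the struct layout above and assumes a little-endian x86 target, with the port and address stored in network byte order as the sockets API requires. The function name and the sample bytes are made up for illustration.

```python
import socket
import struct

def parse_sockaddr_in(raw: bytes):
    """Decode (sin_family, sin_port, dotted address) from 16 raw bytes."""
    if len(raw) < 16:
        raise ValueError("sockaddr_in is 16 bytes")
    (family,) = struct.unpack_from("<h", raw, 0)   # short sin_family, host order
    (port,) = struct.unpack_from("!H", raw, 2)     # u_short sin_port, network order
    address = socket.inet_ntoa(raw[4:8])           # struct in_addr sin_addr
    return family, port, address                   # sin_zero[8] is padding

# Sample buffer: family 2, port 80, address 192.0.2.1 (documentation range).
SAMPLE = struct.pack("<h", 2) + struct.pack("!H", 80) + bytes([192, 0, 2, 1]) + bytes(8)

if __name__ == "__main__":
    print(parse_sockaddr_in(SAMPLE))
```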
Type propagation and matching
• Type propagation using dataflow analysis
• Propagation of return values and arguments of functions
__in DWORD dwDesiredAccess,
__in DWORD dwShareMode,
__in LPSECURITY_ATTRIBUTES lpSecurityAttributes,
__in DWORD dwCreationDisposition,
__in DWORD dwFlagsAndAttributes,
__in HANDLE hTemplateFile
);
.text:00403710 push edi ; arg 6 type(f,6) = type (edi)
.text:00403711 push 4 ; arg 5 type(f,5) = union(int,char)
.text:00403713 push ebx ; arg 4 type(f,4) = type (ebx)
.text:00403714 push 2 ; arg 3 type(f,3) = union(int,char)
.text:00403716 push 2 ; arg 2 type(f,2) = union(int,char)
.text:00403718 push [ebp+arg_4] ; arg 1 type(f,1) = type([ebp+arg_4])
.text:0040371B mov [ebp+var_18], ebx
.text:0040371E call esi ; dword_40A06C type(ret(f)) = type(eax)
...
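The propagation step can be sketched as follows: once the callee is identified, each pushed operand inherits the type of the matching parameter, and eax inherits the return type. This Python sketch uses the six operands from the listing but a hypothetical six-parameter prototype, because the prototype on the slide is truncated and its pairing with this call site is not confirmed. `propagate_types` and the type names in `PROTOTYPE` are illustrative assumptions.

```python
def propagate_types(pushes, param_types, return_type, known=None):
    """Assign prototype types to pushed operands and to eax.

    pushes lists operands in push order (last argument first).
    """
    if len(pushes) != len(param_types):
        raise ValueError("argument count does not match the prototype")
    env = dict(known or {})
    args = pushes[::-1]                      # arg 1 is the last operand pushed
    for operand, ptype in zip(args, param_types):
        if operand.isdigit():
            continue                         # immediates carry no variable to type
        previous = env.get(operand)
        if previous is not None and previous != ptype:
            env[operand] = f"union({previous},{ptype})"
        else:
            env[operand] = ptype
    env["eax"] = return_type                 # type(ret(f)) = type(eax)
    return env

# Operands from the listing, in push order.
PUSHES = ["edi", "4", "ebx", "2", "2", "[ebp+arg_4]"]
# Hypothetical prototype, NOT the one from the slide.
PROTOTYPE = ["LPCSTR", "DWORD", "DWORD", "LPVOID", "DWORD", "HANDLE"]

TYPES = propagate_types(PUSHES, PROTOTYPE, "HANDLE")
```

Running the same pass over every call site that touches a register or stack slot lets conflicting assignments surface as unions, which matches the `union(int,char)` notation used for immediates on the slide.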
Advantages of Type Inference Analysis
• Programmers' data structures and types are going to be based on
known data structures and types provided by the libraries
• Identifying API calls and type information helps better capture the
semantics of the program execution
Compiler Unpacking
if ( a1 == 43200 ) result = 1;
else result = off_9BAAA5();
return result;
}
Systematic Approach to Code Deobfuscation:
Binary Rewriting
• Dechunking: The control flow of Conficker's P2P module has been significantly
obfuscated to hinder its disassembly and decompilation. Specifically, the code blocks
of each subroutine have been extracted and relocated to different portions of the
executable. These blocks (or chunks) are then linked through unconditional and
conditional jump instructions. In effect, the logical control flow of the P2P module has been
obscured (spaghetti code) to the degree that the module cannot be decompiled into coherent C-like
code, which typically drives more in-depth and accurate code interpretation. Dechunking moves
all blocks of a subroutine back into a contiguous memory block.
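The core of dechunking can be sketched as a walk along the chain of unconditional jumps that link the scattered chunks, emitting the chunk bodies in execution order and dropping the linking jumps. The Python sketch below is a simplification: it treats each chunk as ending in at most one unconditional jump and leaves conditional branches inside the bodies untouched. The chunk table, addresses, and instructions are hypothetical, and this is not the rewriting tool used on Conficker.

```python
def dechunk(chunks, entry):
    """Linearize scattered chunks into one contiguous instruction list.

    chunks maps a chunk address to (instructions, successor), where
    successor is the target of the chunk's trailing unconditional jmp,
    or None if the chunk ends the subroutine.
    """
    ordered, seen, addr = [], set(), entry
    while addr is not None and addr not in seen:
        seen.add(addr)                   # guard against jump cycles
        body, successor = chunks[addr]
        ordered.extend(body)             # the linking jmp itself is dropped
        addr = successor
    return ordered

# Hypothetical subroutine split into three relocated chunks.
CHUNKS = {
    0x9A1000: (["push ebp", "mov ebp, esp"], 0x9C4400),
    0x9C4400: (["mov eax, [ebp+8]"], 0x9B2200),
    0x9B2200: (["pop ebp", "retn"], None),
}

LINEAR = dechunk(CHUNKS, 0x9A1000)
```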
The P2P protocol was not just a mechanism for distributing PE executable
files but also digitally signed sets of x86 instructions that are executed in a
separate thread and take as argument the IP address of the sender. This
would provide a hot patch mechanism for all data manipulated by Conficker:
the list of peers, encryption/decryption keys, the Conficker code itself, etc.
Stuxnet: Keeping it “relatively” simple
• http://mtc.sri.com/