A Crash Course On x86 Disassembly



Chapter 4
Levels of Abstraction
• Computer systems are built from several levels of abstraction, each of which hides implementation details from the level above.
• Six levels of abstraction:
Hardware: digital logic gates (AND, OR, XOR, NOT)
Microcode: firmware that interfaces directly with the hardware
Machine code: opcodes, shown as hex digits (produced when a high-level program is compiled)
Low-level languages: the human-readable form of the instruction set (assembly language)
High-level languages: C/C++, compiled into machine code
Interpreted languages: C#, Java, translated into bytecode (which an interpreter translates into machine code at run time)
PC Architectures
Focus: x86 32-bit (Intel IA-32)
Other architectures: x64, MIPS, ARM

• CPU: executes the code
• RAM: stores data and code
• I/O: interfaces with hardware devices (keyboard, monitor, mouse)
• Control unit: fetches the instructions to execute from RAM, using a register (the instruction pointer) that holds the address of the next instruction
• Register: the CPU's basic data storage unit
Main Memory

• Data: values put in place when the program is initially loaded (static/global values).
• Code: the instructions fetched by the CPU to execute the program.
• Heap: dynamic memory; the program allocates new values and frees values it no longer needs.
• Stack: local variables and parameters of functions.
Distinguish Data from Code
• Can we distinguish data from code?

Specifying bytes incorrectly (data treated as code, or code as data) leads to errors, and the program will most likely crash.

*Sometimes IDA Pro has difficulty telling them apart as well (described later in the course).
Common Data Types
• Byte: 8 bits. Examples: AL, BL, CL
• Word: 16 bits. Examples: AX, BX, CX
• Double word: 32 bits. Examples: EAX, EBX, ECX
• Quad word: 64 bits. (32-bit x86 has no 64-bit general-purpose registers; a 64-bit value is usually held in a pair of registers, such as EDX:EAX)
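Instruction

• Example: mov ecx, 0x42 moves 0x42 into register ECX. It is encoded as the bytes B9 42 00 00 00:
• 0xB9 -> opcode for mov ecx, imm32
• 42 00 00 00 -> the 32-bit value 0x42, stored little-endian (least significant byte first)
• Memory address: denoted by a value, register, or equation between brackets, e.g. [eax]
• 0x30004040 -> immediate number (like a constant)
• [0x4000349e] -> immediate, hard-coded memory address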
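The byte breakdown above can be checked with a few lines of Python. This is only a minimal sketch: the helper name decode_mov_imm32 is made up for illustration, and it handles nothing except the 5-byte "mov r32, imm32" form (opcodes 0xB8 to 0xBF).

```python
import struct

# In opcodes 0xB8..0xBF the destination register is encoded in the low 3 bits.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_imm32(code: bytes) -> str:
    """Decode a 5-byte `mov r32, imm32` instruction (hypothetical helper)."""
    if len(code) != 5 or not 0xB8 <= code[0] <= 0xBF:
        raise ValueError("not a mov r32, imm32 instruction")
    # "<I" = little-endian unsigned 32-bit: bytes 42 00 00 00 -> 0x00000042
    (imm,) = struct.unpack("<I", code[1:5])
    return f"mov {REG32[code[0] - 0xB8]}, 0x{imm:x}"

print(decode_mov_imm32(bytes.fromhex("b942000000")))  # mov ecx, 0x42
```

A real disassembler has to handle prefixes, ModR/M bytes and variable-length instructions; this sketch only shows how the opcode byte and the little-endian immediate fit together.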
Registers
Small amount of data storage available to the CPU
Accessed more quickly than storage elsewhere

AX: references the lower 16 bits of EAX
AL: the lower 8 bits of AX
AH: the upper 8 bits of AX
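How AX, AH and AL overlap EAX can be illustrated with bit masks. A minimal sketch, assuming the register value is modeled as a plain Python integer; the function name sub_registers and the sample value are illustrative only.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the sub-registers that alias it."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # AX: lower 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # AH: upper 8 bits of AX
        "AL": ax & 0xFF,         # AL: lower 8 bits of AX
    }

parts = sub_registers(0xA9DC81F5)
print({name: hex(value) for name, value in parts.items()})
# {'EAX': '0xa9dc81f5', 'AX': '0x81f5', 'AH': '0x81', 'AL': '0xf5'}
```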
Registers
• EAX and EDX are used for multiplication and division
• Multiplication: multiplies the unsigned operand by EAX and stores the result as a 64-bit value in EDX:EAX. EDX:EAX means that the low (least significant) 32 bits are stored in EAX and the high (most significant) 32 bits are stored in EDX.
• Division: divides the 64-bit value held across EDX and EAX by the operand. The quotient is stored in EAX and the remainder in EDX.
Use of registers follows certain conventions
E.g. EAX generally contains the return value of a function call.
A malware analyst needs to know these conventions to examine code quickly
Flags
EFLAGS register: status register (32 bits; each bit is a flag). Some important flags:
ZF: zero flag, set if the result of an operation is zero
CF: carry flag, set if the result of an operation is too large or too small for the destination operand
SF: sign flag, set if the result of an operation is negative
TF: trap flag, used for debugging (the x86 processor executes one instruction at a time if it is set)
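To make ZF, CF and SF concrete, the sketch below models a 32-bit add and derives the three flags from the result. It is a simplified model, not processor documentation: add32 is a made-up name, and the other flags (OF, PF, AF) are ignored.

```python
MASK32 = 0xFFFFFFFF

def add32(dest: int, value: int):
    """Model `add dest, value` on 32-bit operands; return (result, flags)."""
    full = (dest & MASK32) + (value & MASK32)   # unbounded Python sum
    result = full & MASK32                      # what fits in the destination
    flags = {
        "ZF": result == 0,           # result is zero
        "CF": full > MASK32,         # result too large for 32 bits
        "SF": bool(result >> 31),    # top bit set: negative when read as signed
    }
    return result, flags

print(add32(0xFFFFFFFF, 1))  # (0, {'ZF': True, 'CF': True, 'SF': False})
print(add32(0x7FFFFFFF, 1))  # (2147483648, {'ZF': False, 'CF': False, 'SF': True})
```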
Extended Instruction Pointer (EIP)
EIP: a register that contains the memory address of the next instruction to be executed
It tells the processor what to do next
If EIP is corrupted and points to a memory address that does not contain legitimate code, the program will crash
Attackers gain control of EIP through exploitation: they place attack code in memory, then change EIP to point to that code to exploit the system
Example: buffer overflow attacks
Instructions
Mov: mov destination, source – move data into registers or
RAM

• Lea: load effective address – put a memory address into the destination, e.g. lea eax,
[ebx+8] -
> put EBX+8 into EAX
• Mov eax, [ebx+8] -> loads the data at memory address specified by EBX+8
• Lea eax,[ebx+8] = mov eax, ebx+8
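The difference between lea and a mov with a memory operand can be modeled in a few lines: lea only computes the address, mov also reads what is stored there. A minimal sketch, assuming a toy 64-byte memory; the names memory, lea and mov_load and all values are illustrative.

```python
import struct

# Toy memory: 64 bytes, with the dword 0xDEADBEEF stored at address 24.
memory = bytearray(64)
struct.pack_into("<I", memory, 24, 0xDEADBEEF)

def lea(base: int, disp: int) -> int:
    """lea reg, [base+disp]: compute the address, never touch memory."""
    return (base + disp) & 0xFFFFFFFF

def mov_load(mem: bytearray, base: int, disp: int) -> int:
    """mov reg, [base+disp]: compute the address, then load the dword there."""
    (value,) = struct.unpack_from("<I", mem, lea(base, disp))
    return value

ebx = 16
print(hex(lea(ebx, 8)))               # 0x18 (the address EBX+8)
print(hex(mov_load(memory, ebx, 8)))  # 0xdeadbeef (the data at that address)
```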
Arithmetic
Addition: add destination, value
Subtraction: sub destination, value (ZF is set if the result is zero; CF is set if destination < value)
inc/dec: increment or decrement a register by one
Multiplication and division act on predefined registers
mul value: multiplies EAX by value. The result is stored as a 64-bit value across EDX and EAX: EDX holds the most significant 32 bits, EAX the least significant 32 bits
div value: divides the 64-bit value across EDX and EAX by value. The quotient is stored in EAX and the remainder in EDX.
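The EDX:EAX convention used by mul and div can be sketched as follows. This is a simplified model of the unsigned 32-bit forms only; the function names are illustrative, and the exceptions stand in for the divide error the processor would raise.

```python
MASK32 = 0xFFFFFFFF

def mul(eax: int, value: int):
    """Model `mul value`: EDX:EAX = EAX * value. Returns (edx, eax)."""
    product = (eax & MASK32) * (value & MASK32)   # up to 64 bits wide
    return product >> 32, product & MASK32        # high half, low half

def div(edx: int, eax: int, value: int):
    """Model `div value`: divide EDX:EAX by value. Returns (eax, edx)."""
    if value == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, value & MASK32)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder                    # quotient -> EAX, remainder -> EDX

print(mul(0x80000000, 4))  # (2, 0): the product 0x200000000 spills into EDX
print(div(0, 7, 2))        # (3, 1): quotient 3 in EAX, remainder 1 in EDX
```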
Logical Operations
or, and, xor: xor eax, eax -> sets EAX to zero (an optimization for clearing a register)
33 C0 xor eax, eax; B8 00 00 00 00 mov eax, 0 -> 2 bytes vs. 5 bytes
shr/shl: shift a register right/left.
shr destination, count
Bits shifted beyond the destination boundary pass through CF, so CF holds the last bit shifted out.
ror/rol: rotate; bits do not fall off but wrap around to the other side
Shifting is an optimization of multiplication -> each shift left multiplies by two; shifting left by n bits multiplies by 2^n
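Shifts and rotates on a 32-bit register can be modeled as below. A minimal sketch under stated assumptions: the function names mirror the mnemonics but are ordinary Python helpers, and shift counts are restricted to 1..31 to keep the model simple.

```python
MASK32 = 0xFFFFFFFF

def shl(value: int, count: int):
    """Shift left; returns (result, CF) where CF is the last bit shifted out."""
    if not 1 <= count <= 31:
        raise ValueError("this sketch only models counts 1..31")
    value &= MASK32
    cf = (value >> (32 - count)) & 1
    return (value << count) & MASK32, cf

def shr(value: int, count: int):
    """Shift right; returns (result, CF) where CF is the last bit shifted out."""
    if not 1 <= count <= 31:
        raise ValueError("this sketch only models counts 1..31")
    value &= MASK32
    cf = (value >> (count - 1)) & 1
    return value >> count, cf

def rol(value: int, count: int) -> int:
    """Rotate left: bits leaving the top re-enter at the bottom."""
    value &= MASK32
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32

print(shl(3, 2))                 # (12, 0): 3 * 2**2
print(shl(0x80000001, 1))        # (2, 1): the top bit fell into CF
print(hex(rol(0x80000001, 1)))   # 0x3: the top bit wrapped around
```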
Stack
Last In First Out (LIFO)
What is stored on the stack?
Data for functions (parameters, return addresses), local variables, flow control
ESP and EBP registers
ESP -> stack pointer (memory address of the top of the stack)
EBP -> base pointer (stays constant within a given function, so it is used to keep track of local variables and parameters)
Short-term storage
Addresses "grow" from high to low: the stack grows toward lower addresses
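The LIFO behavior and the high-to-low growth can be seen in a small model of push and pop. A minimal sketch, assuming a 64-byte stack region and 4-byte values; the class name Stack and the sizes are illustrative.

```python
import struct

class Stack:
    """Toy x86-style stack: ESP starts high and moves down on each push."""

    def __init__(self, size: int = 64):
        self.mem = bytearray(size)
        self.esp = size                # empty stack: ESP just past the region

    def push(self, value: int) -> None:
        self.esp -= 4                  # stack grows toward lower addresses
        struct.pack_into("<I", self.mem, self.esp, value & 0xFFFFFFFF)

    def pop(self) -> int:
        (value,) = struct.unpack_from("<I", self.mem, self.esp)
        self.esp += 4                  # releasing space moves ESP back up
        return value

s = Stack()
s.push(1)
s.push(2)
print(s.esp)             # 56: two dwords below the starting ESP of 64
print(s.pop(), s.pop())  # 2 1: last in, first out
```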
Function Calls
Prologue: prepares the stack and registers
Epilogue: restores the stack and registers
1. Arguments are pushed onto the stack.
2. The function is called using call memory_location: the current instruction address (the contents of the EIP register) is pushed onto the stack for the return, and EIP is set to memory_location (the start of the function).
3. EBP is pushed onto the stack and space is allocated for local variables.
4. Finish: ESP is adjusted to free the local variables; EBP is restored; ret pops the return address off the stack into EIP (for the next instructions).
pusha: pushes the 16-bit registers in order: AX, CX, DX, BX, SP, BP, SI, DI (compilers rarely use it; shellcode uses it to store intermediate states)
pushad: pushes the 32-bit registers in order: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
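The call sequence described above can be traced with a toy model of ESP, EBP and EIP. This is a sketch, not a real emulator: the class Cpu, the addresses 0x401000 and 0x40050A, and the starting register values are all made up for illustration, and the caller is assumed to clean up its own arguments.

```python
class Cpu:
    """Toy model of the registers involved in a function call."""

    def __init__(self):
        self.stack = {}        # address -> 32-bit value
        self.esp = 0x1000
        self.ebp = 0x2000      # the caller's frame pointer (arbitrary value)
        self.eip = 0

    def push(self, value: int) -> None:
        self.esp -= 4
        self.stack[self.esp] = value

    def pop(self) -> int:
        value = self.stack.pop(self.esp)
        self.esp += 4
        return value

    def call(self, target: int, return_address: int) -> None:
        self.push(return_address)   # saved so ret can come back
        self.eip = target           # jump to the start of the function

    def prologue(self, local_bytes: int) -> None:
        self.push(self.ebp)         # push ebp
        self.ebp = self.esp         # mov ebp, esp
        self.esp -= local_bytes     # sub esp, N (space for locals)

    def epilogue(self) -> None:
        self.esp = self.ebp         # leave: free the locals
        self.ebp = self.pop()       # leave: restore the caller's EBP
        self.eip = self.pop()       # ret: return address goes into EIP

cpu = Cpu()
cpu.push(7)                         # step 1: argument
cpu.call(0x401000, 0x40050A)        # step 2: call
cpu.prologue(0x28)                  # step 3: prologue
cpu.epilogue()                      # step 4: finish
print(hex(cpu.eip), hex(cpu.ebp), hex(cpu.esp))  # 0x40050a 0x2000 0xffc
```

After the return, ESP still sits one dword below its starting value because the argument pushed in step 1 has not been removed yet; in this sketch that cleanup is left to the caller.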
Example (function call)

1. Push the base pointer (push ebp)
2. Copy the stack pointer into the base pointer (mov ebp, esp)
3. Reduce the stack pointer by 28h (Why?)
4. Perform the add, call printf
5. leave, then ret (return)
Conditionals
test: identical to and, except that the operands are not modified; only the flags are set
Testing a register against itself checks for NULL values: test eax, eax has the same effect as comparing EAX to zero (cmp eax, 0), but uses fewer bytes and fewer CPU cycles.
cmp: identical to sub, except that the operands are not modified; only the flags are set
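Both instructions can be modeled as "compute the result, keep only the flags". A minimal sketch for 32-bit operands; the function names are illustrative and only ZF, CF and SF are modeled.

```python
MASK32 = 0xFFFFFFFF

def flags_after_cmp(dest: int, src: int) -> dict:
    """cmp dest, src: subtract like sub, but discard the result."""
    dest &= MASK32
    src &= MASK32
    result = (dest - src) & MASK32
    return {"ZF": result == 0, "CF": dest < src, "SF": bool(result >> 31)}

def flags_after_test(a: int, b: int) -> dict:
    """test a, b: AND the operands, but discard the result (CF is cleared)."""
    result = a & b & MASK32
    return {"ZF": result == 0, "CF": False, "SF": bool(result >> 31)}

eax = 0
print(flags_after_test(eax, eax))  # ZF is True: EAX is zero (NULL)
print(flags_after_cmp(3, 5))       # CF is True: destination < source
print(flags_after_cmp(5, 5))       # ZF is True: the operands are equal
```

Because the inputs are passed by value and nothing is written back, the operands stay untouched, which mirrors how test and cmp behave.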
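Branching
Jump instructions: change which instruction is executed next. jmp location always transfers control to location; conditional jumps (e.g. jz, jnz) transfer control only if the flags meet their condition.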
Rep Instructions
Used for manipulating data buffers, usually in the form of an array of bytes (or of single or double words).
movsX, cmpsX, stosX, scasX, where X = b, w, d (byte, word, double word)
movsb -> moves only a single byte, from the address in ESI to the address in EDI (the source and destination index registers); the rep prefix is commonly used with movsb to copy a sequence of bytes, with the size defined by ECX.
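What rep movsb does can be written out as a loop. This is a sketch that assumes the direction flag is clear (ESI and EDI count upward) and models memory as one bytearray; the function name and the buffer contents are illustrative.

```python
def rep_movsb(mem: bytearray, esi: int, edi: int, ecx: int):
    """Copy ECX bytes from [ESI] to [EDI], one byte per iteration."""
    while ecx != 0:
        mem[edi] = mem[esi]   # movsb: move a single byte
        esi += 1              # both index registers advance
        edi += 1
        ecx -= 1              # rep: repeat until ECX reaches zero
    return esi, edi, ecx

mem = bytearray(b"hello\x00\x00\x00\x00\x00\x00")
regs = rep_movsb(mem, esi=0, edi=6, ecx=5)
print(bytes(mem))  # b'hello\x00hello'
print(regs)        # (5, 11, 0)
```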
Main Method in C
int main(int argc, char **argv)
argc: number of command-line arguments (including the program name)
argv: pointer to an array of strings containing the command-line arguments
Main Method in C – In Class Homework
