Chap-3 (Malware Analysis) (Sem-5)
• Basic static techniques are like looking at the outside of a body during an autopsy. You can use static analysis to
draw some preliminary conclusions, but more in-depth analysis is required to get the whole story.
• For example, you might find that a particular function is imported, but you won’t know how it’s used or whether
it’s used at all.
LEVELS OF ABSTRACTION
• In traditional computer architecture, a computer system can be represented as several levels of abstraction that
create a way of hiding the implementation details.
• For example, you can run the Windows OS on many different types of hardware, because the underlying
hardware is abstracted from the OS.
• There are three coding levels involved in malware analysis. Malware authors write programs in a high-level
language and use a compiler to generate machine code to be run by the CPU. Malware analysts and reverse
engineers operate at the low-level language level; we use a disassembler to generate assembly code that we can
read and analyze to figure out how a program operates.
• Computer systems are generally described with the following six different levels of abstraction.
• Hardware
• Microcode
• Machine Code
• Low-Level Languages
• High-Level Languages
• Interpreted Languages
HARDWARE
• The hardware level, the only physical level, consists of electrical circuits that implement complex combinations
of logical operators such as XOR, AND, OR, and NOT gates, known as digital logic.
• Because of its physical nature, hardware cannot be easily manipulated by software.
MACHINE CODE
• The machine code level consists of opcodes, hexadecimal digits that tell the processor what you want it to do.
• Machine code is typically implemented with several microcode instructions so that the underlying hardware can
execute the code.
• Machine code is created when a computer program written in a high-level language is compiled.
HIGH-LEVEL LANGUAGES
• Most computer programmers operate at the level of high-level languages. High-level languages provide strong
abstraction from the machine level and make it easy to use programming logic and flow-control mechanisms.
• High-level languages include C, C++, and others.
• These languages are typically turned into machine code by a compiler through a process known as compilation.
REVERSE ENGINEERING
• When malware is stored on a disk, it is typically in binary form at the machine code level.
• When we disassemble malware, we take the malware binary as input and generate assembly language code as
output, usually with a disassembler.
• x86 is by far the most popular architecture for PCs.
• Most 32-bit personal computers are x86.
• Most AMD64 or Intel 64 architectures running Windows also support x86 32-bit binaries. For this reason, most
malware is compiled for x86.
THE X86 ARCHITECTURE
• The internals of most modern computer architectures (including x86) follow the von Neumann architecture.
• This architecture has three hardware components:
• The central processing unit (CPU) executes code.
• The main memory of the system (RAM) stores all data and code.
• An input/output system (I/O) interfaces with devices such as hard drives, keyboards, and monitors.
• The control unit gets instructions to execute from RAM using a register (the instruction pointer), which stores
the address of the instruction to execute.
• Registers are the CPU’s basic data storage units and are often used to save time so that the CPU doesn’t need
to access RAM.
• The arithmetic logic unit (ALU) executes an instruction fetched from RAM and places the results in registers or
memory.
• The process of fetching and executing instruction after instruction is repeated as a program runs.
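The fetch-and-execute cycle described above can be sketched as a toy simulator. This is only an illustration: the three-instruction set, the tuple encoding, and the `run` function are hypothetical and far simpler than real x86.

```python
# Toy model of the fetch-execute cycle. The instruction set is hypothetical
# and much simpler than real x86; it only illustrates the loop itself.

def run(program):
    """Execute a list of (mnemonic, operands...) tuples held in 'RAM'."""
    registers = {"eax": 0, "ecx": 0}
    ip = 0  # instruction pointer: index of the next instruction to fetch

    while ip < len(program):
        instruction = program[ip]   # control unit fetches from RAM
        ip += 1                     # advance to the following instruction
        mnemonic, *operands = instruction

        if mnemonic == "mov":       # copy an immediate value into a register
            dest, value = operands
            registers[dest] = value
        elif mnemonic == "add":     # ALU adds one register into another (32-bit wrap)
            dest, src = operands
            registers[dest] = (registers[dest] + registers[src]) & 0xFFFFFFFF
        elif mnemonic == "hlt":     # stop the cycle
            break
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return registers


program = [
    ("mov", "eax", 0x10),
    ("mov", "ecx", 0x42),
    ("add", "eax", "ecx"),
    ("hlt",),
]
result = run(program)
```

After the run, `result["eax"]` holds 0x52, the sum that the "ALU" step placed back into a register.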
MAIN MEMORY (RAM)
• Data - This term can be used to refer to a specific section of memory called the data section, which contains
values that are put in place when a program is initially loaded. These values are sometimes called static values
because they may not change while the program is running, or they may be called global values because they are
available to any part of the program.
• Code - Code includes the instructions fetched by the CPU to execute the program’s tasks. The code controls
what the program does and how the program’s tasks will be orchestrated.
• Heap - The heap is used for dynamic memory during program execution, to create (allocate) new values and
eliminate (free) values that the program no longer needs. The heap is referred to as dynamic memory because its
contents can change frequently while the program is running.
• Stack - The stack is used for local variables and parameters for functions, and to help control program flow.
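As a rough illustration of how the stack holds parameters and local variables, the sketch below models each function call as a frame pushed onto a list and popped on return. The `call` and `ret` helpers and the frame layout are hypothetical; a real x86 stack is a region of memory managed through registers, not a Python list.

```python
# Toy model of the stack: each call pushes a frame with its parameters and
# local variables; returning pops the most recent frame (last in, first out).

stack = []

def call(name, params):
    """Push a new frame for a (hypothetical) function call."""
    frame = {"function": name, "params": params, "locals": {}}
    stack.append(frame)
    return frame

def ret():
    """Pop the frame of the function that is returning."""
    return stack.pop()

outer = call("main", {})
outer["locals"]["count"] = 3          # local variable lives in main's frame
inner = call("helper", {"n": 3})      # parameter lives in helper's frame
depth_during_call = len(stack)
finished = ret()                      # helper returns first
depth_after_return = len(stack)
```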
INSTRUCTIONS
• Instructions are the building blocks of assembly programs. In x86 assembly, an instruction is made of a mnemonic
and operands.
• The mnemonic is a word that identifies the instruction to execute, such as mov, which moves data.
• Operands are typically used to identify information used by the instruction, such as registers or data.
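The mnemonic-plus-operands structure can be made concrete with a small parser. This is a minimal sketch that assumes Intel-style text such as `mov ecx, 0x42`; the `parse_instruction` helper is hypothetical and ignores prefixes, labels, and comments.

```python
def parse_instruction(text):
    """Split an instruction such as 'mov ecx, 0x42' into mnemonic and operands."""
    # Everything before the first space is the mnemonic.
    mnemonic, _, rest = text.strip().partition(" ")
    # The remainder is a comma-separated operand list (possibly empty).
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return mnemonic, operands


mnemonic, operands = parse_instruction("mov ecx, 0x42")
```

Here `mnemonic` is `"mov"` and `operands` is `["ecx", "0x42"]`; an instruction with no operands, such as `ret`, yields an empty operand list.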
OPCODES AND ENDIANNESS
• Each instruction corresponds to opcodes (operation codes) that tell the CPU which operation the program
wants to perform.
• Disassemblers translate opcodes into human-readable instructions.
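• For example, the instruction mov ecx, 0x42 is encoded as the opcodes B9 42 00 00 00. The value 0xB9
corresponds to mov ecx, and the bytes 42 00 00 00 (0x42000000 when read left to right) correspond to the value 0x42.
• The bytes appear reversed because x86 is little-endian: the least significant byte of a multi-byte value is stored first.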
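The encoding of mov ecx, 0x42 can be reproduced with Python's standard `struct` module. This is a minimal sketch: the `encode_mov_ecx` helper is hypothetical and handles only this one instruction form, using the 0xB9 opcode byte given in the text.

```python
import struct

MOV_ECX_IMM32 = 0xB9  # opcode byte for "mov ecx, <32-bit immediate>", per the text

def encode_mov_ecx(value):
    """Return the machine-code bytes for 'mov ecx, value'."""
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return bytes([MOV_ECX_IMM32]) + struct.pack("<I", value)


encoded = encode_mov_ecx(0x42)
hex_bytes = encoded.hex(" ").upper()                  # "B9 42 00 00 00"

# Reading the four operand bytes back in each byte order:
as_little_endian = struct.unpack("<I", encoded[1:])[0]  # 0x42
as_big_endian = int.from_bytes(encoded[1:], "big")      # 0x42000000
```

The two decoded values show why the operand looks like 0x42000000 in a hex dump even though the CPU treats it as 0x42.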
OPERANDS
• Operands are used to identify the data used by an instruction. Three types of operands can be used:
• Immediate operands are fixed values, such as 0x42.
• Register operands refer to registers, such as ecx.
• Memory address operands refer to a memory address that contains the value of interest, typically denoted by a value,
register, or equation between brackets, such as [eax].
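The three operand types can be told apart with a few simple checks. The sketch below is illustrative only: the `operand_type` helper is hypothetical, and the `REGISTERS` set is a partial list of common 32-bit x86 general registers assumed for the example, not something taken from these notes.

```python
# Partial, assumed list of 32-bit general registers for illustration.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def operand_type(operand):
    """Classify an operand string as 'memory', 'register', or 'immediate'."""
    operand = operand.strip().lower()
    if operand.startswith("[") and operand.endswith("]"):
        return "memory"          # brackets mean "the value at this address"
    if operand in REGISTERS:
        return "register"
    try:
        int(operand, 0)          # base 0 accepts forms like 66 and 0x42
    except ValueError:
        return "unknown"
    return "immediate"


kinds = [operand_type(op) for op in ("0x42", "ecx", "[eax]")]
```

With these inputs, `kinds` is `["immediate", "register", "memory"]`, matching the three examples in the list above.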
REGISTERS
• A register is a small amount of data storage available to the CPU, whose contents can be accessed more quickly
than storage available elsewhere.
• x86 processors have a collection of registers available for use as temporary storage or workspace. These
registers fall into the following four categories:
• General registers are used by the CPU during execution.
• Segment registers are used to track sections of memory.
• Status flags are used to make decisions.
• Instruction pointers are used to keep track of the next instruction to execute.
TOOLS IN UNIT 3
• IDA Pro
• OllyDbg
• WinDBG
• Volatility
• There is no need to refer to these tools for now. Videos explaining each tool will be shared with you soon.