Chap-3 (Malware Analysis) (Sem-5)


ADVANCED STATIC ANALYSIS

LIMITATION OF BASIC STATIC ANALYSIS

• Basic static techniques are like looking at the outside of a body during an autopsy. You can use static analysis to
draw some preliminary conclusions, but more in-depth analysis is required to get the whole story.
• For example, you might find that a particular function is imported, but you won’t know how it’s used or whether
it’s used at all.
LEVELS OF ABSTRACTION

• In traditional computer architecture, a computer system can be represented as several levels of abstraction that
create a way of hiding the implementation details.
• For example, you can run the Windows OS on many different types of hardware, because the underlying
hardware is abstracted from the OS.
• Three coding levels are involved in malware analysis: high-level languages, machine code, and low-level
languages. Malware authors write programs in a high-level language and use a compiler to generate machine code
to be run by the CPU. Malware analysts and reverse engineers work at the low-level language level; we use a
disassembler to generate assembly code that we can read and analyze to figure out how a program operates.
• Computer systems are generally described with the following six different levels of abstraction.
• Hardware
• Microcode
• Machine Code
• Low-Level Languages
• High-level Languages
• Interpreted languages
HARDWARE

• The hardware level, the only physical level, consists of electrical circuits that implement complex combinations
of logical operators such as XOR, AND, OR, and NOT gates, known as digital logic.
• Because of its physical nature, hardware cannot be easily manipulated by software.
MICROCODE

• The microcode level is also known as firmware.


• Microcode operates only on the exact circuitry for which it was designed.
• It contains microinstructions that translate from the higher machine-code level to provide a way to interface
with the hardware.
• When performing malware analysis, we usually don’t worry about the microcode because it is often specific to
the computer hardware for which it was written.
MACHINE CODE

• The machine code level consists of opcodes, hexadecimal digits that tell the processor what you want it to do.
• Machine code is typically implemented with several microcode instructions so that the underlying hardware can
execute the code.
• Machine code is created when a computer program written in a high-level language is compiled.
LOW-LEVEL LANGUAGES

• A low-level language is a human-readable version of a computer architecture’s instruction set.


• The most common low-level language is assembly language.
• Malware analysts work at the low-level language level because machine code is too difficult for a human to
comprehend.
• Malware analysts use a disassembler to generate low-level language text, which consists of simple mnemonics
such as mov and jmp. Many different dialects of assembly language exist.
HIGH-LEVEL LANGUAGES

• Most computer programmers operate at the level of high-level languages. High-level languages provide strong
abstraction from the machine level and make it easy to use programming logic and flow-control mechanisms.
• High-level languages include C, C++, and others.
• These languages are typically turned into machine code by a compiler through a process known as compilation.
INTERPRETED LANGUAGES

• Interpreted languages are at the top level.


• Many programmers use languages at this level, such as Perl, Java, and the .NET languages (for example, C#).
• The code at this level is not compiled into machine code; instead, it is translated into bytecode.
• Bytecode is an intermediate representation that is specific to the programming language.
• Bytecode executes within an interpreter, which is a program that translates bytecode into executable machine
code on the fly at runtime.
• An interpreter provides an automatic level of abstraction when compared to traditional compiled code, because
it can handle errors and memory management on its own, independent of the OS.
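• As a rough illustration of bytecode, CPython (the standard Python implementation, which is not one of the languages named above) also translates source code into bytecode before its interpreter runs it. The sketch below uses the standard dis module to list the bytecode instructions of a small function. The function name add is made up for the example, and the exact instruction names differ between Python versions.

```python
import dis


def add(a, b):
    # A tiny function whose bytecode we want to inspect.
    return a + b


# The raw bytecode is stored on the function's code object as a bytes value.
raw_bytecode = add.__code__.co_code

# dis.get_instructions() decodes those bytes into named instructions.
opnames = [ins.opname for ins in dis.get_instructions(add)]

if __name__ == "__main__":
    print(raw_bytecode.hex())  # bytecode as hex digits (version dependent)
    print(opnames)             # instruction names (version dependent)
```

• The hex string is what is stored for the function; the names are what a bytecode disassembler shows to an analyst.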
REVERSE-ENGINEERING

• When malware is stored on a disk, it is typically in binary form at the machine code level.
• When we disassemble malware, we take the malware binary as input and generate assembly language code as
output, usually with a disassembler.
• x86 is by far the most popular architecture for PCs.
• Most 32-bit personal computers are x86.
• Most AMD64 or Intel 64 systems running Windows also support x86 32-bit binaries. For this reason, most
malware is compiled for x86.
THE X86 ARCHITECTURE

• The internals of most modern computer architectures (including x86) follow the Von Neumann architecture.
• It has three hardware components:
• The central processing unit (CPU) executes code.
• The main memory of the system (RAM) stores all data and code.
• An input/output system (I/O) interfaces with devices such as hard drives, keyboards, and monitors.
• The control unit gets instructions to execute from RAM using a register (the instruction pointer), which stores
the address of the instruction to execute.
• Registers are the CPU’s basic data storage units and are often used to save time so that the CPU doesn’t need
to access RAM.
• The arithmetic logic unit (ALU) executes an instruction fetched from RAM and places the results in registers or
memory.
• The process of fetching and executing instruction after instruction is repeated as a program runs.
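• The fetch-and-execute cycle described above can be sketched as a toy machine. This is a simplified model, not real x86: the instruction format (Python tuples), the two-register file, and the function name run are assumptions made for the example.

```python
def run(program):
    """Run a list of (mnemonic, operand...) tuples on a toy machine."""
    registers = {"eax": 0, "ecx": 0}
    ip = 0  # instruction pointer: index of the next instruction to execute

    while ip < len(program):
        instruction = program[ip]  # fetch the instruction the pointer refers to
        ip += 1                    # advance to the following instruction
        mnemonic, *operands = instruction

        if mnemonic == "mov":      # mov register, immediate
            dst, value = operands
            registers[dst] = value
        elif mnemonic == "add":    # add register, register (32-bit wraparound)
            dst, src = operands
            registers[dst] = (registers[dst] + registers[src]) & 0xFFFFFFFF
        elif mnemonic == "jmp":    # jmp index: overwrite the instruction pointer
            (ip,) = operands
        elif mnemonic == "hlt":    # stop the machine
            break
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")

    return registers


program = [
    ("mov", "eax", 1),
    ("mov", "ecx", 0x42),
    ("jmp", 4),
    ("mov", "eax", 99),      # never executed: the jmp skips it
    ("add", "eax", "ecx"),
    ("hlt",),
]
result = run(program)
print(result)  # {'eax': 67, 'ecx': 66}
```

• Note that jmp works only by changing the instruction pointer, which is how program flow is controlled.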
MAIN MEMORY (RAM)

• Data - This term can be used to refer to a specific section of memory called the data section, which contains
values that are put in place when a program is initially loaded. These values are sometimes called static values
because they may not change while the program is running, or they may be called global values because they are
available to any part of the program.
• Code - Code includes the instructions fetched by the CPU to execute the program’s tasks. The code controls
what the program does and how the program’s tasks will be orchestrated.
• Heap - The heap is used for dynamic memory during program execution, to create (allocate) new values and
eliminate (free) values that the program no longer needs. The heap is referred to as dynamic memory because its
contents can change frequently while the program is running.
• Stack - The stack is used for local variables and parameters for functions, and to help control program flow.
INSTRUCTIONS

• Instructions are the building blocks of assembly programs. In x86 assembly, an instruction is made of a mnemonic
and operands.
• The mnemonic is a word that identifies the instruction to execute, such as mov, which moves data.
• Operands are typically used to identify information used by the instruction, such as registers or data.
OPCODES AND ENDIANNESS

• Each instruction corresponds to opcodes (operation codes) that tell the CPU which operation the program
wants to perform.
• Disassemblers translate opcodes into human-readable instructions.
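• For example, the opcodes for the instruction mov ecx, 0x42 are B9 42 00 00 00. The value 0xB9
corresponds to mov ecx, and the bytes 42 00 00 00 correspond to the value 0x42.
• The bytes 42 00 00 00 are read as 0x42 rather than 0x42000000 because x86 uses the little-endian format:
the least significant byte of a multi-byte value is stored first.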
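• The byte order in this example can be checked with a short sketch using Python's standard struct module. The variable names are made up for the example; only the five bytes come from the text above.

```python
import struct

# The five bytes of "mov ecx, 0x42" as they appear in the binary.
machine_code = bytes.fromhex("B9 42 00 00 00")

opcode = machine_code[0]           # 0xB9 selects "mov ecx, <32-bit immediate>"
operand_bytes = machine_code[1:5]  # 42 00 00 00, in the order stored in memory

# "<I" reads an unsigned 32-bit value as little-endian, ">I" as big-endian.
little = struct.unpack("<I", operand_bytes)[0]
big = struct.unpack(">I", operand_bytes)[0]

print(hex(opcode), hex(little), hex(big))  # 0xb9 0x42 0x42000000
```

• Reading the same four bytes in the wrong byte order gives 0x42000000, which is why endianness matters when reading raw bytes.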
OPERANDS

• Operands are used to identify the data used by an instruction. Three types of operands can be used:
• Immediate operands are fixed values, such as the 0x42.
• Register operands refer to registers, such as ecx.
• Memory address operands refer to a memory address that contains the value of interest, typically denoted by a value,
register, or equation between brackets, such as [eax].
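• A minimal sketch of how one line of disassembly splits into a mnemonic and operands, and how each operand falls into one of the three types. The function names and the short register list are assumptions made for the example. A real disassembler works from the opcode bytes, not from text.

```python
import re

# A few 32-bit general registers; deliberately not a complete list.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def classify_operand(text):
    """Return 'memory', 'register', or 'immediate' for one operand string."""
    text = text.strip().lower()
    if text.startswith("[") and text.endswith("]"):
        return "memory"        # bracketed value, register, or equation
    if text in REGISTERS:
        return "register"
    if re.fullmatch(r"0x[0-9a-f]+|\d+", text):
        return "immediate"     # fixed hexadecimal or decimal value
    raise ValueError(f"unrecognized operand: {text}")


def parse_instruction(line):
    """Split 'mnemonic op1, op2' into (mnemonic, [(operand, kind), ...])."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    return mnemonic.lower(), [(op, classify_operand(op)) for op in operands]


print(parse_instruction("mov ecx, 0x42"))
# ('mov', [('ecx', 'register'), ('0x42', 'immediate')])
print(parse_instruction("mov eax, [eax]"))
# ('mov', [('eax', 'register'), ('[eax]', 'memory')])
```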
REGISTERS

• A register is a small amount of data storage available to the CPU, whose contents can be accessed more quickly
than storage available elsewhere.
• x86 processors have a collection of registers available for use as temporary storage or workspace. They fall
into the following four categories:
• General registers are used by the CPU during execution.
• Segment registers are used to track sections of memory.
• Status flags are used to make decisions.
• The instruction pointer is used to keep track of the next instruction to execute.
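• How status flags are used to make decisions can be sketched with the zero flag (ZF). On x86, a cmp instruction subtracts its operands and sets ZF when the result is zero, and a conditional jump such as jz then tests ZF. The Python function names and the constant 0x42 below are assumptions made for the example, and the sketch models only ZF.

```python
def compare(a, b):
    """Model of cmp: return the zero flag, 1 if a - b is zero, otherwise 0."""
    difference = (a - b) & 0xFFFFFFFF  # 32-bit subtraction; the result is discarded
    return 1 if difference == 0 else 0


def check_value(value):
    """Model of 'cmp value, 0x42' followed by a jz-style decision."""
    zf = compare(value, 0x42)
    if zf:                 # jz would take the jump when ZF is set
        return "match"
    return "no match"      # otherwise execution falls through


print(check_value(0x42))  # match
print(check_value(0x41))  # no match
```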
TOOLS IN UNIT 3

• IDA Pro
• OllyDbg
• WinDBG
• Volatility

• There is no need to refer to these tools for now. Videos explaining each tool will be shared with you soon.
