Format-String Vulnerability
Instructor: Fengwei Zhang
SUSTech CS 315 Computer Security
Outline
● Format String
● Access optional arguments
● How printf() works
● Format string attack
● How to exploit the vulnerability
● Countermeasures
Format String
● printf(): Prints out a string according to a format.
int printf(const char *format, ...);
● The argument list of printf() consists of :
○ One concrete argument format
○ Zero or more optional arguments
● Hence, a call still compiles even if fewer arguments are passed to
printf() than the format string expects.
Access Optional Arguments
● myprint() shows how printf()
actually works.
● Consider the invocation of myprint() in
line 7.
● va_list pointer (line 1) accesses
the optional arguments.
● va_start() macro (line 2)
calculates the initial position of
va_list based on the second
argument Narg (last argument
before the optional arguments
begin)
Access Optional Arguments
● va_start() macro takes the start
address of Narg, finds its size
from the data type, and sets the
va_list pointer just past Narg.
● va_list pointer advances using
va_arg() macro.
● va_arg(ap, int): Returns the int that ap
(va_list) points to and moves ap up by 4 bytes.
● When all the optional arguments
are accessed, va_end() is called.
How printf() Accesses Optional Arguments
● Here, printf() has three optional arguments. Elements starting with “%” are
called format specifiers.
● printf() scans the format string and prints out each character until “%” is
encountered.
● printf() then calls va_arg(), which returns the optional argument pointed to by
va_list and advances va_list to the next argument.
How printf() Accesses Optional Arguments
● When printf() is invoked, the
arguments are pushed onto the
stack in reverse order.
● When it scans and prints the
format string, printf() replaces
%d with the value from the first
optional argument and prints
out the value.
● va_list is then moved to
position 2.
Missing Optional Arguments
● va_arg() macro cannot tell
whether it has reached the
end of the optional argument
list.
● It continues fetching data from
the stack and advancing
va_list pointer.
Format String Vulnerability
● In these three examples,
user’s input (user_input)
becomes part of a format
string.
What will happen if
user_input contains format
specifiers?
Vulnerable Code
Vulnerable Program’s Stack
Inside printf(), the starting
point of the optional
arguments (va_list pointer) is
the position right above the
format string argument.
What Can We Achieve?
● Attack 1 : Crash program
● Attack 2 : Print out data on the stack
● Attack 3 : Change the program’s data in the
memory
● Attack 4 : Change the program’s data to specific
value
● Attack 5 : Inject Malicious Code
Attack 1 : Crash Program
● User input: %s%s%s%s%s%s%s%s
● printf() parses the format string.
● For each %s, it fetches a value where va_list points to
and advances va_list to the next position.
● Because we give %s, printf() treats each value as an address and
fetches data from that address. If the value is not a
valid address, the program crashes.
Attack 2 : Print Out Data on the Stack
● Suppose a variable on the stack contains a secret
(constant) and we need to print it out.
● Use user input: %x%x%x%x%x%x%x%x
● For each %x, printf() prints out the integer value pointed to by the va_list
pointer and advances it by 4 bytes.
● The number of %x needed is decided by the distance between the
starting point of the va_list pointer and the variable. It
can be found by trial and error.
Attack 3: Change Program’s Data in Memory
Goal: change the value of var variable from 0x11223344 to some other
value.
● %n: Writes the number of characters printed out so
far into memory.
● printf(“hello%n”,&i) ⇒ When printf() gets to %n, it
has already printed 5 characters, so it stores 5 to
the provided memory address.
● %n treats the value pointed to by the va_list pointer
as a memory address and writes into that location.
● Hence, if we want to write a value to a memory
location, we need to have its address on the stack.
Attack 3: Change Program’s Data in Memory
Assuming the address of var is 0xbffff304 (can be obtained using gdb)
● The address of var is given in the beginning of the input so that it is
stored on the stack.
● $(command): Command substitution. Allows the output of the command
to replace the command itself.
● “\x04”: Denotes the single byte with value 0x04, not the two ASCII
characters ‘0’ and ‘4’.
Attack 3: Change Program’s Data in Memory
● var’s address (0xbffff304)
is on the stack.
● Goal : To move the va_list
pointer to this location and
then use %n to store some
value.
● %x is used to advance the
va_list pointer.
● How many %x are
required?
Attack 3: Change Program’s Data in Memory
● Using trial and error, we check how many %x are needed to print out
0xbffff304.
● Here the 6th %x prints the address, so the attack input uses 5 %x followed by 1 %n.
● After the attack, the data at the target address is modified to 0x2c (44 in
decimal), because 44 characters had been printed out before %n.
Attack 4: Change Program’s Data to a Specific Value
Goal: To change the value of var from 0x11223344 to
0x9896a9
printf() has already printed out 41 characters before %.10000000x, so,
10000000+41 = 10000041 (0x9896a9) will be stored in 0xbffff304.
Precision modifier : Controls the minimum number of digits to print.
printf(“%.5d”, 10) prints number 10 with 5 digits: “00010”
Attack 4 : A Faster Approach
%n : Treats the argument as a pointer to a 4-byte integer.
%hn : Treats the argument as a pointer to a 2-byte short integer. Overwrites only the 2 least significant bytes of the target.
%hhn : Treats the argument as a pointer to a 1-byte char. Overwrites only the least significant byte of the target.
Attack 4 : A Faster Approach
Goal: change the value of var to 0x66887799
● Use %hn to modify the var variable two bytes at a time.
● Break the memory of var into two parts, each with two
bytes.
● Most computers use the Little-Endian architecture
○ The 2 least significant bytes (0x7799) are stored at address
0xbffff304
○ The 2 most significant bytes (0x6688) are stored at 0xbffff306
● If the first %hn gets value x, and before the next %hn, t
more characters are printed, the second %hn will get
value x+t.
Attack 4 : A Faster Approach
● Overwrite the bytes at 0xbffff306 with 0x6688.
● Print some more characters so that when we reach
0xbffff304, the number of characters will be increased to
0x7799.
Attack 4 : A Faster Approach
● Address A : first part of address of var ( 4 chars )
● Address B : second part of address of var ( 4 chars)
● 4 %.8x : To move va_list to reach Address A (Trial and error, 4x8=32)
● @@@@ : 4 chars
● 5 _ : 5 chars
● Total : 12+5+32 = 49 chars (12 = Address A + @@@@ + Address B)
Attack 4 : A Faster Approach
● For the first %hn to store 0x6688 (26248), 26248 characters must have
been printed, so the precision field of the last %x before it is
26248 - 49 = 26199.
● If the second %hn immediately followed the first, va_list would point to the
second address and the same value would be stored.
● Hence, we put @@@@ between the two addresses so that we
can insert one more %x and increase the number of printed
characters to 0x7799.
● After the first %hn, the va_list pointer points to @@@@; the extra %x
consumes it and advances the pointer to the second address. Its precision
field is set to 30617 - 26248 - 1 = 4368 so that 0x7799 (30617) characters
have been printed when we reach the second %hn.
Attack 5: Inject Malicious Code
Goal : To modify the return address of the vulnerable code
so that it points to the malicious code (e.g., shellcode to
execute /bin/sh). Get root access if the vulnerable code is a
Set-UID program.
Challenges :
● Inject Malicious code in the stack
● Find starting address (A) of the injected code
● Find return address (B) of the vulnerable code
● Write value A to B
Attack 5 : Inject Malicious Code
● Use gdb to get the return address and the start address of
the malicious code.
● Assume that the return address is 0xbffff38c
● Assume that the start address of the malicious code is
0xbffff358
Goal : Write the value 0xbffff358 to address 0xbffff38c
Steps :
● Break 0xbffff38c into two contiguous 2-byte memory
locations : 0xbffff38c and 0xbffff38e.
● Store 0xbfff into 0xbffff38e and 0xf358 into
0xbffff38c
Attack 5: Inject Malicious Code
● Number of characters printed before the first
%hn = 12 + (4x8) + 5 + 49102 = 49151
(0xbfff).
● After the first %hn, 13144 + 1 = 13145 more characters are
printed.
● 49151 + 13145 = 62296 (0xf358) is
written to 0xbffff38c.
Run the Exploit Code
● Compile the vulnerable code with an executable stack.
● Make the vulnerable program a Set-UID program.
● Turn off address randomization.
● Run the vulnerable program with our input payload.
Run the Exploit Code
We couldn’t get a shell even though the malicious
shellcode executes /bin/sh.
Hypothesis :
● We direct the standard input to a file called input while
running the vul program.
● When /bin/sh is triggered from the input file, it inherits the
standard input.
● But once the end of the file is reached, there is no more input
for the shell program, and hence it exits.
● So, the shell program is triggered but exits too quickly
for us to see it.
A Solution
● Create /tmp/bad as follows :
It runs /bin/sh and redirects the standard input (file
descriptor 0) so that the standard output (file
descriptor 1), which is the terminal, is also used as
the standard input.
Countermeasures: Developer
● Avoid using untrusted user inputs for format strings
in functions like printf, sprintf, fprintf, vprintf, scanf,
vfscanf.
Countermeasures: Compiler
Compilers can detect potential format string vulnerabilities
● Use two compilers to
compile the program:
gcc and clang.
● We can see that there
is a mismatch in the
format string.
Countermeasures: Compiler
● With default settings, both compilers give a warning for the first printf().
● Neither gives a warning for the second one.
Countermeasures: Compiler
● With the option -Wformat=2, both compilers give warnings for both
printf statements, stating that the format string is not a string literal.
● These warnings only remind developers that there is a potential
problem; the programs are still compiled.
Countermeasures
● Address randomization: Makes it difficult for attackers to
guess the address of the target memory (return address,
address of the malicious code)
● Non-executable Stack/Heap: This will not work. Attackers
can use the return-to-libc technique to defeat the
countermeasure.
● StackGuard: This will not work. Unlike buffer overflow, using
format string vulnerabilities, we can ensure that only the
target memory is modified; no other memory is affected.
Summary
● How format string works
● Format string vulnerability
● Exploiting the vulnerability
● Injecting malicious code by exploiting the
vulnerability