3 Programs & Programming.pptx
Introduction
• Programs are just strings of 0s and 1s, representing elementary machine
commands such as move one data item, compare two data items, or
branch to a different command.
• The Intel 32- and 64-bit instruction set has about 30 basic primitives (such
as move, compare, branch, increment and decrement, logical
operations, arithmetic operations, trigger I/O, generate and service
interrupts, push, pop, call, and return) and specialized instructions to
improve performance on computations such as floating-point operations
or cryptography.
• Program flaws can have two kinds of security implications: They can
cause integrity problems leading to harmful output or action, and they
offer an opportunity for exploitation by a malicious actor.
Memory
• Memory is a limited but flexible resource; any memory location can hold any
piece of code or data. To make managing computer memory efficient,
operating systems jam one data element next to another, without regard for
data type, size, content, or purpose.
• Users and programmers seldom know, much less have any need to know,
precisely which memory location a code or data item occupies.
• Instructions and data are all binary strings; only the context of use determines whether a byte,
for example 0x41, represents the letter A, the number 65, or the instruction to
move the contents of register 1 to the stack pointer.
• If you happen to put the data string “A” in the path of execution, it will be
executed as if it were an instruction.
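To make "only the context of use" concrete, here is a minimal C sketch (the helper names `describe_byte` and `text_as_u32` are hypothetical). It shows only the *data* interpretations of the same bits. Executing bytes as machine code is processor-specific and is deliberately not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one byte both as a character and as an unsigned integer.
   The byte itself carries no type; only the conversion we pick gives
   it a meaning. */
static void describe_byte(unsigned char b, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    snprintf(out, cap, "char '%c', integer %u", b, (unsigned)b);
}

/* Reinterpret the first four bytes of a text string as a 32-bit integer.
   memcpy copies raw bytes without conversion, so the text "AAAA" becomes
   the number 0x41414141 (identical on any byte order, because all four
   bytes are equal). */
static uint32_t text_as_u32(const char *text)
{
    uint32_t value;
    assert(text != NULL && strlen(text) >= sizeof value);
    memcpy(&value, text, sizeof value);
    return value;
}
```

Calling `describe_byte(0x41, buf, sizeof buf)` produces `char 'A', integer 65`. The bits never change; only the type they are read into does.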
Code and data separated, with the heap growing up toward high addresses and the stack growing down from
the high addresses. Instructions move from the bottom (low addresses) of memory up; left unchecked,
execution would proceed through the local data area and into the heap and stack.
The same hex value in the same spot in memory can be either a meaningful data value or a meaningful
instruction, depending on whether the computer treats it as code or data.
Buffer Overflows
• Occur when data is written beyond the space allocated for it,
such as a 10th byte in a 9-byte array
char sample[10];
int i;
for (i=0; i<=9; i++)
    sample[i] = 'A';
sample[10] = 'B';
This is a very simple buffer overflow. Character B is placed in memory that wasn't allocated by or for this
procedure.
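One way to stay within bounds is to check every index against the buffer's real size before writing. This is a minimal sketch, and `store_checked` is a hypothetical helper, not a standard function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-checked store: writes c to buf[idx] only if idx lies
   inside a buffer of len bytes. Returns 0 on success, -1 if the write
   would land outside the buffer (as sample[10] = 'B' does). */
static int store_checked(char *buf, size_t len, size_t idx, char c)
{
    assert(buf != NULL);
    if (idx >= len)          /* valid indexes are 0 .. len-1 */
        return -1;
    buf[idx] = c;
    return 0;
}
```

Passing `sizeof sample` as the length, instead of hard-coding 9 or 10, keeps the check correct if the array declaration changes. The write to index 10 is refused rather than silently corrupting adjacent memory.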
Memory Organization
The operating system’s code and
data coexist with a user’s code and
data. The heavy line between
system and user space is only to
indicate a logical separation
between those two areas; in
practice, the distinction is not so
solid.
Buffer Overflow
• An intruder may wander into the areas called the stack and the heap. Subprocedure calls
are handled with a stack, a data structure in which the most recent item inserted is the
next one removed (last arrived, first served).
• This structure works well because procedure calls can be nested, with each return
causing control to transfer back to the immediately preceding routine at its point of
execution.
• Each time a procedure is called, its parameters, the return address (the address
immediately after its call), and other local values are pushed onto a stack. An old stack
pointer is also pushed onto the stack, and a stack pointer register is reloaded with the
address of these new values. Control is then transferred to the sub-procedure.
• As the sub-procedure executes, it fetches parameters that it finds by using the address
pointed to by the stack pointer. Typically, the stack pointer is a register in the processor.
Therefore, by causing an overflow into the stack, the attacker can change either the
old stack pointer (changing the context for the calling procedure) or the return
address (causing control to transfer where the attacker intends when the
sub-procedure returns).
• Changing the context or return address allows the attacker to redirect execution to
code written by the attacker.
• The assailant must experiment a little to determine where the overflow is and how to
control it.
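The attack on the return address can be illustrated safely with a *simulated* frame, so no real stack memory is corrupted. This is a minimal sketch under a loud assumption: the layout (an 8-byte buffer followed directly by an 8-byte saved return address) is a toy model, not any real ABI, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A toy model of one stack frame, NOT a real ABI layout: an 8-byte local
   buffer followed directly by an 8-byte saved return address. Real
   compilers may insert padding, a saved frame pointer, or a canary. */
#define TOY_BUF_LEN 8

struct toy_frame {
    unsigned char bytes[TOY_BUF_LEN + sizeof(uint64_t)];
};

/* What the call would do: record where execution should resume. */
static void toy_set_return(struct toy_frame *f, uint64_t addr)
{
    memcpy(f->bytes + TOY_BUF_LEN, &addr, sizeof addr);
}

/* What the return would do: fetch the saved address. */
static uint64_t toy_get_return(const struct toy_frame *f)
{
    uint64_t addr;
    memcpy(&addr, f->bytes + TOY_BUF_LEN, sizeof addr);
    return addr;
}

/* An unchecked copy into the local buffer, like strcpy or gets would do.
   The only limit is the size of the model itself, so any bytes beyond
   TOY_BUF_LEN spill into the saved return address. */
static void toy_unchecked_copy(struct toy_frame *f, const unsigned char *src, size_t n)
{
    assert(n <= sizeof f->bytes);
    memcpy(f->bytes, src, n);
}
```

Copying 8 bytes leaves the saved address intact. Copying 16 bytes of 'A' replaces it with 0x4141414141414141, which is exactly the "change the return address" step described above.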
A buffer (or array or string) is a space in which data can be held. A buffer resides in memory.
Because memory is finite, a buffer’s capacity is finite. For this reason, in many programming languages the
programmer must declare the buffer’s maximum size so that the compiler can set aside that amount of space.
In this way, when procedure B is finished running, it can get popped off
the stack, and procedure A will just continue executing where it left off.
Compromised Stack
The problem that may result, as it does in this example, occurs when
data are being merged into a buffer that includes the contents of
another buffer, such that the space needed exceeds the space
available.
For the first run, the value read was small enough that the merged
response didn't corrupt the stack frame.
For the second run, the supplied input was much too large. However,
because a safe input function was used, only 15 characters were read,
as shown in the following line. When this was then merged with the
response string, the result was larger than the space available in the
destination buffer.
It overwrote the saved frame pointer, but not the return address. So the
function returned, as shown by the message printed by the main()
function. But when main() tried to return, because its stack frame had
been corrupted and was now some random value, the program
jumped to an illegal address and crashed.
In this case the combined result was not long enough to reach the
return address, but this would be possible if a larger buffer size had
been used.
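The crash above came from merging a safely read input into a destination that was too small. A minimal sketch of a bounded merge follows; `merge_response` and the `"Read "` prefix are hypothetical. It relies on the standard behavior of `snprintf`, which never writes more than the given size and returns the length the full result would have had.

```c
#include <assert.h>
#include <stdio.h>

/* Merge a fixed prefix with user input into dst without overflowing it.
   snprintf never writes more than dst_len bytes (including the '\0') and
   returns the length the complete result would have had, so a return
   value >= dst_len means the result did not fit. Returns 0 if the merged
   string fit, -1 otherwise. */
static int merge_response(char *dst, size_t dst_len,
                          const char *prefix, const char *input)
{
    assert(dst != NULL && dst_len > 0 && prefix != NULL && input != NULL);
    int needed = snprintf(dst, dst_len, "%s%s", prefix, input);
    if (needed < 0 || (size_t)needed >= dst_len)
        return -1;   /* reject rather than silently use a truncated reply */
    return 0;
}
```

With a 16-byte destination, a 15-character input plus the 5-character prefix no longer overwrites the saved frame pointer. The call returns -1 and the caller can decide what to do.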
Unsafe C Standard Library Routines
This shows that when looking for buffer overflows, all possible places where externally
sourced data are copied or merged have to be located.
Note that these do not even have to be in the code for a particular program, they can
(and indeed do) occur in library routines used by programs, including both standard
libraries and third-party application libraries.
Thus, for both attacker and defender, the scope of possible buffer overflow locations is
very large.
A list of some of the most common unsafe standard C Library routines is given in Table
10.2. These routines are all suspect and should not be used without checking the total
size of data being transferred in advance or, better still, should be replaced with safer
alternatives.
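The "check the total size in advance" advice can be sketched as length-checked wrappers for `strcpy` and `strcat`. Both `checked_strcpy` and `checked_strcat` are hypothetical helpers written for illustration, not part of the C standard library.

```c
#include <assert.h>
#include <string.h>

/* Length-checked replacement for strcpy: measure the source first and
   refuse to copy if the string plus its '\0' would not fit in dst. */
static int checked_strcpy(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    if (n >= dst_len)
        return -1;
    memcpy(dst, src, n + 1);          /* copies the terminator too */
    return 0;
}

/* Same idea for strcat: check the combined length before appending.
   dst must already hold a '\0'-terminated string. */
static int checked_strcat(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t used = strlen(dst);
    size_t n = strlen(src);
    if (used >= dst_len || n >= dst_len - used)
        return -1;                    /* used + n + 1 would exceed dst_len */
    memcpy(dst + used, src, n + 1);
    return 0;
}
```

On failure both functions leave `dst` unchanged, so the caller never works with a half-copied string. Standard `snprintf` is another bounded alternative.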
Machine code
• Specific to processor and operating system. Traditionally needed good assembly language skills to create
Metasploit Project
• Provides useful information to people who perform penetration testing and exploit research
Because shellcode is machine code specific to a processor and operating system, buffer overflow attacks are
usually targeted at a specific piece of software running on a specific operating system.
However, more recently a number of sites and tools have been developed that automate this process, thus
making the development of shellcode exploits available to a much larger potential audience.
Stack Overflow Variants
• Target program: commonly used library code
• Shellcode function: break out of a chroot (restricted execution) environment, giving full access to
the system
The targeted program need not be a trusted system utility. Another possible target is a program providing a
network service; that is, a network daemon. A common approach for such programs is listening for connection
requests from clients and then spawning a child process to handle that request. The child process typically has the
network connection mapped to its standard input and output. This means the child program’s code may use the
same type of unsafe input or buffer copy code as we’ve seen already.
This was the case with the stack overflow attack used by the Morris Worm in 1988. It targeted the use of gets() in the
fingerd daemon handling requests for the UNIX finger network service (which provided information about a system's users).
Yet another possible target is a program, or library code, which handles common document formats (e.g., the
library routines used to decode and display GIF or JPEG images). In this case, the input is not from a terminal or
network connection, but from the file being decoded and displayed.
The overflow can be triggered as the file contents are read, with the details encoded in a specially corrupted image. This
attack file would be distributed via e-mail, instant messaging, or as part of a Web page. The shellcode would
typically open a network connection back to a system under the attacker’s control, to return information and
possibly receive additional commands to execute.
Packetstorm
Packet Storm includes a large collection of packaged shellcode, including code that can
1. Dehashed: View leaked credentials.
2. SecurityTrails: Extensive DNS data.
3. DorkSearch: Really fast Google dorking.
4. ExploitDB: Archive of various exploits.
5. ZoomEye: Gather information about targets.
6. Pulsedive: Search for threat intelligence.
7. GrayHatWarfare: Search public S3 buckets.
8. PolySwarm: Scan files and URLs for threats.
9. Fofa: Search for various threat intelligence.
10. LeakIX: Search publicly indexed information.
11. DNSDumpster: Search for DNS records.
12. FullHunt: Search & discovery attack surfaces.
13. AlienVault: Extensive threat intelligence feed.
14. ONYPHE: Collects cyber-threat intelligence.
15. Grep App: Search a half million git repos.
16. URL Scan: Free service to scan and analyse websites.
17. Vulners: Search vulnerabilities in a large database.
18. WayBackMachine: View content from deleted websites.
19. Shodan: Search for devices connected to the internet.
20. Netlas: Search and monitor internet connected assets.
21. crt.sh: Search for certs that have been logged by CT.
22. Wigle: Database of wireless networks, with statistics.
23. PublicWWW: Marketing and affiliate marketing research.
24. Binary Edge: Scans the internet for threat intelligence.
25. GreyNoise: Search for devices connected to the internet.
26. Hunter: Search for email addresses belonging to a website.
27. Censys: Assessing attack surface for connected devices.
28. IntelligenceX: Search Tor, I2P, leaks, domains, and emails.
29. Packet Storm Security: Browse vulnerabilities and exploits.
30. SearchCode: Search 75 billion lines of code across 40m projects.
Buffer Overflow Defenses
Two broad defense approaches: compile-time and run-time.
• Compile-time defenses, which aim to harden programs to resist attacks in new programs
• Run-time defenses, which aim to detect and abort attacks in existing programs
Run-time defenses can be deployed as operating system updates and can provide some
protection for existing vulnerable programs.
Compile-time defenses aim to prevent or detect buffer overflows by instrumenting programs when
they are compiled. The possibilities for doing this range from choosing a high-level language that
does not permit buffer overflows, to encouraging safe coding standards, using safe standard
libraries, or including additional code to detect corruption of the stack frame.
Compile-Time Defenses: Programming Language
• Use a modern high-level language
  • Not vulnerable to buffer overflow attacks
  • Compiler enforces range checks and permissible operations on variables
• Disadvantages
  • Additional code must be executed at run time to impose checks
  • Distance from the underlying machine language and architecture means that access to some
    instructions and hardware resources is lost
  • Limits their usefulness in writing code, such as device drivers, that must interact with such resources
Run-Time Defenses: Executable Address Space Protection
• Requires support from memory management unit (MMU)
  • Long existed on SPARC / Solaris systems
  • Recent on x86 Linux/Unix/Windows systems
• Support for executable stack code
  • Special provisions are needed
Shadow Stack
• In computer security, a shadow stack is a mechanism for protecting a procedure's stored return
address from being overwritten, for example by a stack buffer overflow.
• The shadow stack itself is a second, separate stack that "shadows" the program call stack. In the
function prologue, a function stores its return address to both the call stack and the shadow stack.
• In the function epilogue, a function loads the return address from both the call stack and the shadow
stack, and then compares them. If the two records of the return address differ, then an attack is
detected; the typical course of action is simply to terminate the program or alert system
administrators about a possible intrusion attempt.
• A shadow stack is similar to stack canaries in that both mechanisms aim to maintain the control-flow
integrity of the protected program by detecting an attacker's tampering with the stored return address
during an exploitation attempt.
• Shadow stacks can be implemented by recompiling programs with modified prologues and
epilogues, by dynamic binary rewriting techniques to achieve the same effect, or with hardware
support. Unlike the call stack, which also stores local program variables, passed arguments, spilled
registers and other data, the shadow stack typically just stores a second copy of a function's return
address.
• Shadow stacks provide more protection for return addresses than stack canaries, which rely on the
secrecy of the canary value and are vulnerable to non-contiguous write attacks. Shadow stacks
themselves can be protected with guard pages or with information hiding, such that an attacker
would also need to locate the shadow stack to overwrite a return address stored there.
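The prologue/epilogue logic above can be modeled in software. This is a minimal sketch, assuming return addresses are plain integers and using hypothetical names (`shadow_push`, `shadow_check`). It is not a compiler or hardware implementation; it only shows the push-both, compare-on-return idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy shadow stack: a separate array that holds only return addresses. */
#define SHADOW_MAX 64

struct shadow_stack {
    uintptr_t slots[SHADOW_MAX];
    size_t top;
};

/* Prologue: the return address stored on the call stack is also
   recorded in the shadow copy. */
static void shadow_push(struct shadow_stack *s, uintptr_t ret)
{
    assert(s != NULL && s->top < SHADOW_MAX);
    s->slots[s->top++] = ret;
}

/* Epilogue: compare the return address read back from the call stack with
   the shadow copy. Returns 1 if they match, 0 if the call-stack copy was
   tampered with (a real system would then terminate or raise an alert). */
static int shadow_check(struct shadow_stack *s, uintptr_t ret_from_call_stack)
{
    assert(s != NULL && s->top > 0);
    uintptr_t expected = s->slots[--s->top];
    return expected == ret_from_call_stack;
}
```

Because the shadow copy only holds return addresses, an overflow of a local buffer on the call stack cannot reach it unless the attacker also locates the shadow stack.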
Input Size & Buffer Overflow
• Programmers often make assumptions about the maximum expected size
of input
• Allocated buffer size is not confirmed
• Resulting in buffer overflow
• The most notable software using OpenSSL includes the open-source web servers Apache and nginx. The
combined market share of just those two among active sites on the Internet was over 66% at the time.
• The Heartbleed bug in OpenSSL compromises the secret keys used to identify service providers and to
encrypt traffic, the names and passwords of users, and the actual content. This allows attackers to
eavesdrop on communications, steal data directly from the services, and impersonate services and users.
• OpenSSL was patched quickly, but operating system vendors, appliance vendors, and independent
software vendors had to adopt the fix and notify their users.
• This bug left a large amount of private keys and other secrets exposed on the Internet.
THE CODE SEGMENT
The Heartbleed bug is in OpenSSL’s TLS heartbeat implementation.
When a TLS heartbeat is sent, it comes with a couple of notable pieces of information:
• Some arbitrary payload data. This is intended to be repeated back to the sender so the sender can verify
the connection is still alive and that the right data is being transmitted through the communication channel.
• The length of that data, in bytes: len_payload.
On receipt, the OpenSSL implementation allocated a response buffer, used memcpy to copy len_payload bytes
of payload into it, and sent the heartbeat response (with len_payload bytes) back to the original sender.
The problem is that the OpenSSL implementation never checked that len_payload was actually correct,
that is, that the request actually contained that many bytes of payload. So a malicious person could send
a heartbeat request indicating a payload length of up to 65,535 (2^16 - 1) bytes but actually send a much
shorter payload.
In this case, memcpy ends up copying beyond the bounds of the payload into the response, giving up to
64 KB of OpenSSL's memory contents to an attacker.
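The missing check can be sketched on a simplified message format. Everything here is an assumption for illustration: the request is modeled as a 2-byte big-endian claimed length followed by the payload bytes actually received. This is neither OpenSSL's code nor the full RFC 6520 wire format, which also has a type byte and padding. `build_heartbeat_response` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified heartbeat model (hypothetical): req = 2-byte big-endian
   claimed length (len_payload) followed by the payload bytes received.
   Returns the number of bytes echoed into resp, or -1 if the request
   must be discarded. */
static long build_heartbeat_response(const unsigned char *req, size_t req_len,
                                     unsigned char *resp, size_t resp_cap)
{
    assert(req != NULL && resp != NULL);
    if (req_len < 2)
        return -1;
    size_t claimed = ((size_t)req[0] << 8) | req[1];   /* len_payload */
    size_t received = req_len - 2;

    /* The check Heartbleed lacked: the claimed length must not exceed the
       bytes that actually arrived. Without it, memcpy would read past the
       request into whatever memory follows it. */
    if (claimed > received || claimed > resp_cap)
        return -1;                    /* discard the request silently */

    memcpy(resp, req + 2, claimed);
    return (long)claimed;
}
```

An honest request claiming 3 bytes and carrying 3 bytes is echoed back. A request claiming 0xFFFF bytes but carrying 3 is discarded, which matches RFC 6520's rule that a heartbeat with too large a payload_length must be discarded silently.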
Countermeasures
• Staying within bounds
Types of Malware
• Virus
• A program that can replicate itself and pass on malicious
code to other nonmalicious programs by modifying them
• Worm
• A program that spreads copies of itself through a network
• Trojan horse
• Code that, in addition to its stated effect, has a second,
nonobvious, malicious effect
Malware Activation
• One-time execution (implanting)
• Boot sector viruses
• Memory-resident viruses
• Application files
• Code libraries
Virus Effects
Virus Signatures
• Detection mechanisms:
  • Known string patterns in files or memory
  • Execution patterns
  • Storage patterns
SQL Injection
• Consider an excerpt of PHP code from a CGI script that takes a name provided as input to the script,
typically from a form field.
• It uses this value to construct a request to retrieve the records relating to that name from the database. The
vulnerability in this code is very similar to that in the command injection example.
• The difference is that SQL metacharacters are used, rather than shell metacharacters. If a suitable name is
provided, for example, Bob, then the code works as intended, retrieving the desired record.
• However, an input such as Bob'; drop table suppliers results in the specified record being retrieved, followed by
deletion of the entire table.
• To prevent this type of attack, the input must be validated before use. Any metacharacters must either be
escaped, canceling their effect, or the input rejected entirely.
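The "escape the metacharacters" option can be sketched in C, using the same query shape as the text. The helper names are hypothetical and the table and column names are illustrative. Doubling a single quote is how standard SQL writes a quote inside a literal, but escaping rules differ between database products. Where the database API offers parameterized queries, they avoid building SQL text from input altogether.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Escape a value for use inside a single-quoted SQL string literal by
   doubling each single quote (standard SQL quoting). Returns 0 on
   success, -1 if out is too small. */
static int sql_escape_quotes(const char *in, char *out, size_t out_len)
{
    assert(in != NULL && out != NULL);
    size_t j = 0;
    for (; *in != '\0'; in++) {
        size_t need = (*in == '\'') ? 2 : 1;
        if (j + need >= out_len)        /* leave room for the final '\0' */
            return -1;
        out[j++] = *in;
        if (*in == '\'')
            out[j++] = '\'';            /* ' becomes '' and no longer ends the literal */
    }
    if (j >= out_len)                   /* only possible when out_len == 0 */
        return -1;
    out[j] = '\0';
    return 0;
}

/* Build the lookup query described in the text (names illustrative). */
static int build_supplier_query(const char *name, char *query, size_t query_len)
{
    char escaped[256];
    if (sql_escape_quotes(name, escaped, sizeof escaped) != 0)
        return -1;
    int n = snprintf(query, query_len,
                     "SELECT * FROM suppliers WHERE name = '%s';", escaped);
    return (n < 0 || (size_t)n >= query_len) ? -1 : 0;
}
```

With the input `Bob'; drop table suppliers`, the query becomes `... name = 'Bob''; drop table suppliers';`. The whole input stays inside one string literal and is looked up as a (nonexistent) name instead of being executed.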
Code Injection
A common variant in which the input includes code that is then executed by the attacked system.
Figure 11.4a shows the start of a vulnerable PHP calendar script. The flaw results from the use of a variable to construct the name
of a file that is then included into the script. Note that this script was not intended to be called directly. Rather, it is a
component of a larger, multifile program. The main script set the value of the $path variable to refer to the main directory
containing the program and all its code and data files.
Using this variable elsewhere in the program meant that customizing and installing the program required changes to just a few
lines. Unfortunately, attackers do not play by the rules. Just because a script is not supposed to be called directly does not
mean it is not possible. The access protections must be configured in the Web server to block direct access to prevent this.
Otherwise, if direct access to such scripts is combined with two other features of PHP, a serious attack is possible.
The first is that PHP originally assigned the value of any input variable supplied in the HTTP request to global variables with the
same name as the field. This made the task of writing a form handler easier for inexperienced programmers. Unfortunately,
there was no way for the script to limit just which fields it expected. Hence a user could specify values for any desired global
variable and they would be created and passed to the script. In this example, the variable $path is not expected to be a form
field. The second PHP feature concerns the behavior of the include command. Not only could local files be included, but if a
URL is supplied, the included code can be sourced from anywhere on the network. Combine all of these elements, and the
attack may be implemented using a request similar to that shown in Figure 11.4b.
This results in the $path variable containing the URL of a file containing the attacker’s PHP code. It also defines another
variable, $cmd, which tells the attacker’s script what command to run. In this example, the extra command simply lists files in
the current directory. However, it could be any command the Web server has the privilege to run. This specific type of attack
is known as a PHP remote code injection or PHP file inclusion vulnerability.
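The root problem is letting request data choose which code gets loaded. The script in the figure is PHP, but for consistency with the other examples this is a C sketch of the allow-list alternative. The loader, the `allowed_modules` table, and its entries are all hypothetical. The caller supplies only a short module name, never a path or URL, and anything not in the fixed table is refused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed allow-list of loadable modules (entries are illustrative). */
static const char *const allowed_modules[] = { "calendar", "events", "admin_menu" };

/* Return the canonical module name if it is allowed, else NULL. The
   caller then builds the local file path itself, so input can never
   inject a URL or a "../" path into an include. */
static const char *resolve_module(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof allowed_modules / sizeof allowed_modules[0]; i++)
        if (strcmp(name, allowed_modules[i]) == 0)
            return allowed_modules[i];
    return NULL;
}
```

This complements, but does not replace, blocking direct access to component scripts in the Web server configuration as described above.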
Cross Site Scripting (XSS) Attacks
• Attacks where input provided by one user is subsequently output to another user
• Commonly seen in scripted Web applications
  • Vulnerability involves the inclusion of script code in the HTML content
  • Script code may need to access data associated with other pages
  • Browsers impose security checks and restrict data access to pages originating from the same site
• Exploit the assumption that all content from one site is equally trusted and hence is permitted to
  interact with other content from the site
XSS reflection vulnerability
• Attacker includes the malicious script content in data supplied to a site
Given that the programmer cannot control the content of input data, it is necessary to ensure that such
data conform with any assumptions made about the data before subsequent use.
If the data are textual, these assumptions may be that the data contain only printable characters, have
certain HTML markup, are the name of a person, a userid, an e-mail address, a filename, and/or a URL.
Alternatively, the data might represent an integer value. A program using such input should confirm that it
meets these assumptions. Input data should be compared against what is wanted, accepting only valid
input. The alternative is to compare the input data with known dangerous values.
The problem with this approach is that new problems and methods of bypassing existing checks continue to
be discovered. If a program only tries to block known dangerous input, an attacker using a new encoding may
succeed. By accepting only known safe data, the program is more likely to remain secure.
This type of comparison is commonly done using regular expressions. It may be explicitly coded by the
programmer or may be implicitly included in a supplied input processing routine. A regular expression is a
pattern composed of a sequence of characters that describe allowable input variants.
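An allow-list comparison with a regular expression can be sketched using the POSIX `regcomp`/`regexec` interface (POSIX, not ISO C). The userid policy in the pattern is an assumed example, not a standard, and `is_valid_userid` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Allow-list check for a userid: a lowercase letter followed by 2-15
   lowercase letters, digits or underscores (assumed example policy).
   Returns 1 if the input matches, 0 if not, -1 if the pattern fails
   to compile. */
static int is_valid_userid(const char *input)
{
    regex_t re;
    assert(input != NULL);
    if (regcomp(&re, "^[a-z][a-z0-9_]{2,15}$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The anchors `^` and `$` force the whole input to match. Without them, a valid-looking prefix would let arbitrary trailing characters through.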
If the input data fail the comparison, a suitable error message should be sent to the source of the input to
allow it to be corrected and reentered. Alternatively, the data may be altered to conform. This generally
involves escaping metacharacters to remove any special interpretation, thus rendering the input safe.
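For the XSS case, "escaping metacharacters" means replacing the HTML-significant characters with entity references before user data is written into a page. This is a minimal sketch with a hypothetical `html_escape` helper. It covers text placed in element content or quoted attribute values, not URLs or inline script contexts, which need different handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escape & < > " ' so that user-supplied text is displayed as text
   instead of being parsed as markup or script. Returns 0 on success,
   -1 if out is too small (the output is then unusable). */
static int html_escape(const char *in, char *out, size_t out_len)
{
    assert(in != NULL && out != NULL);
    size_t j = 0;
    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep;
        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = one;      break;   /* ordinary character, copied as-is */
        }
        size_t n = strlen(rep);
        if (j + n >= out_len)               /* keep room for the final '\0' */
            return -1;
        memcpy(out + j, rep, n);
        j += n;
    }
    if (j >= out_len)
        return -1;
    out[j] = '\0';
    return 0;
}
```

After escaping, `<script>alert('x')</script>` is shown literally on the page rather than run by the victim's browser.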
Alternate Encodings
The issue of alternative encodings of the input data could occur because the data are encoded
in HTML or some other structured encoding that allows multiple representations of characters.
Given the growing requirement to support users around the globe, Unicode is now widely used; its 16-bit
encoding (UTF-16) is the native character representation in the Java language and the Windows OS.
However, many programs, databases, and other applications assume an 8-bit character representation, with the
first 128 values corresponding to ASCII. To accommodate this, a Unicode character can be
encoded as a 1- to 4-byte sequence using the UTF-8 encoding.
Any specific character is supposed to have a unique encoding. However, if the strict limits in the
specification are ignored, common ASCII characters may have multiple encodings. For example, the
forward slash character “/”, used to separate directories, has hexadecimal value 2F in both ASCII and UTF-8,
but a lax decoder also accepts the longer, redundant (“overlong”) encodings C0 AF and E0 80 AF.
Consider the consequences of multiple encodings when validating input. There is a class of attacks
that attempt to supply an absolute pathname for a file to a script that expects only a simple local
filename. The common check to prevent this is to ensure that the supplied filename does not start
with “/” and does not contain any “../” parent directory references.
If this check only assumes the correct, shortest UTF-8 encoding of slash, then an attacker using one
of the longer encodings could avoid this check. It was used against Microsoft’s IIS Web server in
the late 1990s.
Canonicalization - Transforming input data into a single, standard, minimal representation. Once
this is done the data can be compared with a single representation of acceptable input values
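The contrast between a naive byte check and canonicalize-then-check can be sketched in C. All names are hypothetical. The strict decoder follows the UTF-8 rules of rejecting overlong forms, UTF-16 surrogates, and values above U+10FFFF. It shows why the IIS-style check fails: the overlong bytes C0 AF contain no 0x2F byte, yet decode to “/”.

```c
#include <assert.h>
#include <stddef.h>

/* Naive check: look for the single byte 0x2F. Misses overlong forms. */
static int naive_has_slash(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == 0x2F)
            return 1;
    return 0;
}

/* Strictly decode one UTF-8 sequence. Returns the code point and stores
   its byte length in *len, or returns -1 for malformed input. */
static long utf8_decode_strict(const unsigned char *s, size_t n, size_t *len)
{
    /* Smallest code point that legitimately needs 2, 3 or 4 bytes. */
    static const long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need;
    long cp;

    if (n == 0)
        return -1;
    if (s[0] < 0x80) {                  /* plain ASCII */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0)      { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else return -1;                     /* continuation or invalid lead byte */

    if (n < need)
        return -1;                      /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                  /* expected a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[need])
        return -1;                      /* overlong, e.g. C0 AF for '/' */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                      /* out of range or a surrogate */
    *len = need;
    return cp;
}

/* Canonicalize, then check: decode every character strictly and reject
   malformed UTF-8 or any decoded '/'. Returns 1 if safe, 0 otherwise. */
static int name_is_safe(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t len;
        long cp = utf8_decode_strict(s + i, n - i, &len);
        if (cp < 0 || cp == '/')
            return 0;
        i += len;
    }
    return 1;
}
```

The naive check passes the bytes `..` C0 AF `etc`, but the strict version rejects them. Legitimate non-ASCII names such as “café” are still accepted.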
Input Fuzzing
• Software testing technique that uses randomly
generated data as inputs to a program
• Range of inputs is very large
• Intent is to determine if the program or function correctly handles
abnormal inputs
• Simple, free of assumptions, cheap
• Assists with reliability as well as security
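A minimal fuzzing loop can be sketched as follows. Every name is hypothetical, and real fuzzers are far more capable. It feeds randomly generated messages to a function under test and checks one invariant: never plan to copy more payload than the message contains, which is the Heartbleed-style bug. A fixed-seed PRNG (xorshift32) makes any failing input reproducible. The buggy target only *reports* how much it would copy, so the harness itself never reads out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32: tiny deterministic PRNG, so failures can be replayed. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* A planner reads a message whose first byte claims a payload length and
   returns how many payload bytes it would copy. */
typedef size_t (*copy_planner)(const unsigned char *msg, size_t n);

/* Buggy target: trusts the claimed length. */
static size_t plan_trusting(const unsigned char *msg, size_t n)
{
    return n == 0 ? 0 : msg[0];
}

/* Fixed target: discards a claim larger than the n - 1 bytes present. */
static size_t plan_checked(const unsigned char *msg, size_t n)
{
    if (n == 0)
        return 0;
    return ((size_t)msg[0] <= n - 1) ? msg[0] : 0;
}

/* Returns the first failing iteration, or -1 if all iterations pass. */
static long fuzz_planner(copy_planner target, uint32_t seed, long iterations)
{
    unsigned char msg[32];
    uint32_t state = seed != 0 ? seed : 1;   /* xorshift must not start at 0 */
    assert(target != NULL);
    for (long it = 0; it < iterations; it++) {
        size_t n = xorshift32(&state) % (sizeof msg + 1);   /* 0..32 bytes */
        for (size_t i = 0; i < n; i++)
            msg[i] = (unsigned char)xorshift32(&state);
        size_t planned = target(msg, n);
        if (n > 0 && planned > n - 1)
            return it;                        /* invariant violated */
    }
    return -1;
}
```

Random inputs find the trusting version's flaw almost immediately, while the checked version survives every iteration. This illustrates "simple, free of assumptions, cheap".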
Privilege escalation
•Exploiting flaws may give an attacker
greater privileges
Least privilege
•Run programs with least privilege needed to
complete their function
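A common least-privilege pattern for a service that must start as root is to drop to an unprivileged account as soon as root is no longer needed. This is a Linux/BSD-style sketch: `setgroups` is not part of POSIX, `drop_privileges` is a hypothetical helper, and the IDs 65534 ("nobody" on many Linux systems) are an assumption.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <grp.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* If running as root, permanently switch to an unprivileged uid/gid.
   Supplementary groups and the gid are dropped first, while root can
   still change them; the uid is dropped last. Returns 0 on success (or
   when there is no root privilege to drop) and -1 on error, in which
   case the caller should exit rather than continue as root. */
static int drop_privileges(uid_t uid, gid_t gid)
{
    assert(uid != 0 && gid != 0);
    if (getuid() != 0 && geteuid() != 0)
        return 0;                          /* nothing to drop */
    if (setgroups(0, NULL) != 0)           /* clear supplementary groups */
        return -1;
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    if (setuid(0) != -1)                   /* regaining root must now fail */
        return -1;
    return 0;
}
```

Even if a later buffer overflow is exploited, the injected code then runs with the limited rights of the unprivileged account rather than as root.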