
3 Programs & Programming.pptx

Programs and Programming



Introduction
• Programs are just strings of 0s and 1s, representing elementary machine
commands such as move one data item, compare two data items, or
branch to a different command.

• The Intel 32- and 64-bit instruction set has about 30 basic primitives (such
as move, compare, branch, increment and decrement, logical
operations, arithmetic operations, trigger I/O, generate and service
interrupts, push, pop, call, and return) and specialized instructions to
improve performance on computations such as floating-point operations
or cryptography.

• Security failures can result from intentional or non-malicious causes; both
can cause harm.

• Program flaws can have two kinds of security implications: They can
cause integrity problems leading to harmful output or action, and they
offer an opportunity for exploitation by a malicious actor.

Memory
• Memory is a limited but flexible resource; any memory location can hold any
piece of code or data. To make managing computer memory efficient,
operating systems jam one data element next to another, without regard for
data type, size, content, or purpose.

• Users and programmers seldom know, much less have any need to know,
precisely which memory location a code or data item occupies.

• Computers use a pointer or register known as a program counter that indicates
the next instruction. As long as program flow is sequential, hardware bumps up
the value in the program counter to point just after the current instruction as
part of performing that instruction.

• Instructions and data are all binary strings; only the context of use says whether
a byte, for example 0x41, represents the letter A, the number 65, or the instruction
to move the contents of register 1 to the stack pointer.

• If you happen to put the data string “A” in the path of execution, it will be
executed as if it were an instruction.

Memory Allocation

Code and data separated, with the heap growing up toward high addresses & the stack
growing down from the high addresses.

Instructions move from the bottom (low addresses) of memory up; left unchecked,
execution would proceed through the local data area and into the heap and stack.

Data vs. Instructions

The same hex value in the same spot in memory can either be a meaningful data
value or a meaningful instruction depending on whether the computer treats it as
code or data.

Buffer Overflows
• Occur when data is written beyond the space allocated for it,
such as a 10th byte in a 9-byte array

• In a typical exploitable buffer overflow, an attacker’s inputs
are expected to go into regions of memory allocated for
data, but those inputs are instead allowed to overwrite
memory holding executable code

• The trick for an attacker is finding buffer overflow opportunities
that lead to overwritten memory being executed, and finding
the right code to input

char sample[10];
int i;

for (i=0; i<=9; i++)
    sample[i] = 'A';
sample[10] = 'B';

This is a very simple buffer overflow. Character B is placed in memory
that wasn’t allocated by or for this procedure.
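A minimal sketch of how the write to sample[10] could be caught, assuming two hypothetical helpers (put_checked and fill_checked are illustration names, not part of the original slide). Every write goes through one range check, and the loop bound comes from the buffer's capacity instead of a hand-typed constant:

```c
#include <stddef.h>

/* Hypothetical helper: store c in buf[index] only when index is in range.
   Returns 1 on success, 0 when the write would fall outside the buffer. */
int put_checked(char *buf, size_t capacity, size_t index, char c)
{
    if (buf == NULL || index >= capacity)
        return 0;            /* sample[10] in a 10-byte array ends up here */
    buf[index] = c;
    return 1;
}

/* Fill the whole buffer with c and return the number of bytes written.
   The bound is the capacity itself, which avoids off-by-one mistakes. */
size_t fill_checked(char *buf, size_t capacity, char c)
{
    size_t written = 0;
    for (size_t i = 0; i < capacity; i++)
        written += (size_t)put_checked(buf, capacity, i, c);
    return written;
}
```

With char sample[10], fill_checked(sample, sizeof sample, 'A') writes ten bytes, and put_checked(sample, sizeof sample, 10, 'B') is refused instead of touching memory outside the array.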

Memory Organization
The operating system’s code and
data coexist with a user’s code and
data. The heavy line between
system and user space is only to
indicate a logical separation
between those two areas; in
practice, the distinction is not so
solid.

Remember that every program is
invoked by an operating system
that may run with higher privileges
than those of a regular program.

Thus, if the attacker can gain
control by masquerading as the
operating system, the attacker can
execute commands in a powerful
role.

Buffer Overflow
• An intruder may wander into areas called the stack and the heap. Subprocedure calls
are handled with a stack, a data structure in which the most recent item inserted is the
next one removed (last arrived, first served).

• This structure works well because procedure calls can be nested, with each return
causing control to transfer back to the immediately preceding routine at its point of
execution.

• Each time a procedure is called, its parameters, the return address (the address
immediately after its call), and other local values are pushed onto a stack. An old stack
pointer is also pushed onto the stack, and a stack pointer register is reloaded with the
address of these new values. Control is then transferred to the sub-procedure.

• As the sub-procedure executes, it fetches parameters that it finds by using the address
pointed to by the stack pointer. Typically, the stack pointer is a register in the processor.
Therefore, by causing an overflow into the stack, the attacker can change either the
old stack pointer (changing the context for the calling procedure) or the return
address (causing control to transfer where the attacker intends when the
sub-procedure returns).

• Changing the context or return address allows the attacker to redirect execution to
code written by the attacker.

• The assailant must experiment a little to determine where the overflow is and how to
control it.
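The effect described above can be illustrated without corrupting a real stack. The sketch below is a toy model only: it assumes a frame made of an 8-byte buffer, a 4-byte saved stack (frame) pointer, and a 4-byte return address, all laid out in one ordinary byte array. Real layouts depend on the compiler and platform, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of one stack frame held in a plain byte array:
   [ 8-byte local buffer | 4-byte saved frame pointer | 4-byte return address ]
   The sizes and offsets are assumptions made for this illustration. */
enum { BUF_OFF = 0, BUF_LEN = 8, FP_OFF = 8, RET_OFF = 12, FRAME_LEN = 16 };

/* Copy n input bytes into the local buffer with no check against BUF_LEN,
   as an unsafe routine would. The copy is clamped to the 16-byte model so
   the demonstration itself stays well defined. */
void unchecked_copy(unsigned char frame[FRAME_LEN], const unsigned char *in, size_t n)
{
    if (n > FRAME_LEN)
        n = FRAME_LEN;
    memcpy(frame + BUF_OFF, in, n);
}

/* Report what an n-byte input reaches:
   0 = buffer only, 1 = saved frame pointer, 2 = return address. */
int overflow_reach(size_t n)
{
    if (n > RET_OFF) return 2;
    if (n > FP_OFF)  return 1;
    return 0;
}
```

In this model a 10-byte input overwrites part of the saved pointer but leaves the return address alone, while a 16-byte input replaces both. That is the kind of experimenting the last bullet refers to.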

Where a Buffer Can Overflow


(a) If the extra character overflows into
the user’s data space, it simply overwrites
an existing variable value (or it may be
written into an as-yet unused location),
perhaps affecting the program’s result
but affecting no other program or data.

(b) In the second case, the ‘B’ goes
into the user’s program area. If it
overlays an already executed
instruction (which will not be executed
again), the user should perceive no
effect. If it overlays an instruction that is
not yet executed, the machine will try
to execute an instruction with
operation code 0x42, the internal
code for the character ‘B’. If there is
no instruction with operation code
0x42, the system will halt on an illegal
instruction exception.

A buffer (or array or string) is a space in which data can be held. A buffer resides in memory.

Because memory is finite, a buffer’s capacity is finite. For this reason, in many programming languages the
programmer must declare the buffer’s maximum size so that the compiler can set aside that amount of space.

The Stack after Procedure Calls

When procedure A calls procedure B, procedure B gets added to the
stack along with a pointer back to procedure A.

In this way, when procedure B is finished running, it can get popped off
the stack, and procedure A will just continue executing where it left off.

Compromised Stack

Instead of pointing at procedure B in this case, the program counter is
pointing at code that’s been placed on the stack as a result of an
overflow.

Harm from Buffer Overflows
• Overwrite:
• Another piece of your program’s data
• An instruction in your program
• Data or code belonging to another program
• Data or code belonging to the operating system

• Overwriting a program’s instructions gives attackers
that program’s execution privileges

• Overwriting operating system instructions gives
attackers the operating system’s execution privileges

• Overflows can also be used as a form of denial of service (DoS)

Stack Buffer Overflows - Shellcode
Note in this program that the buffers are both the same size. This is a
quite common practice in C programs.

The standard C I/O library has a defined constant BUFSIZ, which is the
default size of the input buffers it uses.

The problem that may result, as it does in this example, occurs when
data are being merged into a buffer that includes the contents of
another buffer, such that the space needed exceeds the space
available.

For the first run, the value read is small enough that the merged
response didn’t corrupt the stack frame.

For the second run, the supplied input was much too large. However,
because a safe input function was used, only 15 characters were read,
as shown in the following line. When this was then merged with the
response string, the result was larger than the space available in the
destination buffer.

It overwrote the saved frame pointer, but not the return address. So the
function returned, as shown by the message printed by the main()
function. But when main() tried to return, because its stack frame had
been corrupted and was now some random value, the program
jumped to an illegal address and crashed.

In this case the combined result was not long enough to reach the
return address, but this would be possible if a larger buffer size had
been used.
Unsafe C Standard Library Routines
This shows that when looking for buffer overflows, all possible places where externally
sourced data are copied or merged have to be located.

Note that these do not even have to be in the code for a particular program; they can
(and indeed do) occur in library routines used by programs, including both standard
libraries and third-party application libraries.

Thus, for both attacker and defender, the scope of possible buffer overflow locations is
very large.

A list of some of the most common unsafe standard C library routines is given in Table
10.2. These routines are all suspect and should not be used without checking the total
size of data being transferred in advance or, better still, should be replaced with safer
alternatives.
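As a hedged illustration of "checking the total size in advance", the sketch below merges two strings with snprintf, which never writes more than the stated destination size and reports how much room the full result would have needed. The helper name merge_checked is an assumption made for this example; it is not taken from the program discussed earlier.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build "<prefix><input>" in dst without ever writing
   past dst_size bytes. Returns 1 if the whole result fitted, 0 if it was
   truncated or an argument was invalid. */
int merge_checked(char *dst, size_t dst_size, const char *prefix, const char *input)
{
    if (dst == NULL || dst_size == 0 || prefix == NULL || input == NULL)
        return 0;
    /* snprintf writes at most dst_size bytes, terminator included, and
       returns the length the complete result would have had. */
    int needed = snprintf(dst, dst_size, "%s%s", prefix, input);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

A caller can treat a return value of 0 as "input too large" and reject the request, instead of letting the merged string spill into the saved frame pointer.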

Some Common Unsafe C Standard Library Routines


Shellcode
Code supplied by attacker
• Often saved in buffer being overflowed
• Traditionally transferred control to a user command-line interpreter (shell)

Machine code
• Specific to processor and operating system
• Traditionally needed good assembly language skills to create

Metasploit Project
• Provides useful information to people who perform penetration and exploit research

There are several generic restrictions on the content of shellcode. First, it has to
be position independent. That means it cannot contain any absolute
address referring to itself, because the attacker generally cannot determine
in advance exactly where the targeted buffer will be located in the stack
frame of the function in which it is defined.

Because it is machine code, shellcode is specific to a processor architecture, and indeed
usually to an OS, as it needs to run on the targeted system and interact with its
system functions.

This is the major reason why buffer overflow attacks are usually targeted at
a specific piece of software running on a specific operating system.

Because shellcode is machine code, writing it traditionally required a good
understanding of the assembly language & operation of the targeted system.

However, more recently a number of sites and tools have been developed
that automate this process, thus making the development of shellcode
exploits available to a much larger potential audience.
Stack Overflow Variants

Target program can be:
• A trusted system utility
• Network service daemon
• Commonly used library code

Shellcode functions
• Launch a remote shell when connected to
• Create a reverse shell that connects back to the hacker
• Use local exploits that establish a shell
• Flush firewall rules that currently block other attacks
• Break out of a chroot (restricted execution) environment, giving full access to
the system

The targeted program need not be a trusted system utility. Another possible target is a program providing a
network service; that is, a network daemon. A common approach for such programs is listening for connection
requests from clients and then spawning a child process to handle that request. The child process typically has the
network connection mapped to its standard input and output. This means the child program’s code may use the
same type of unsafe input or buffer copy code as we’ve seen already.

This was the case with the stack overflow attack used by the Morris Worm in 1988. It targeted the use of gets() in the
fingerd daemon, which handled requests for the UNIX finger network service (which provided information about system users).

Yet another possible target is a program, or library code, which handles common document formats (e.g., the
library routines used to decode and display GIF or JPEG images). In this case, the input is not from a terminal or
network connection, but from the file being decoded and displayed.

It can be triggered as the file contents are read, with the details encoded in a specially corrupted image. This
attack file would be distributed via e-mail, instant messaging, or as part of a Web page. The shellcode would
typically open a network connection back to a system under the attacker’s control, to return information and
possibly receive additional commands to execute.
Packetstorm

Packet Storm includes a large collection of packaged shellcode, including code that can

• Set up a listening service to launch a remote shell when connected to.


• Create a reverse shell that connects back to the hacker.
• Use local exploits that establish a shell or execve a process.
• Flush firewall rules (such as IPTables and IPChains) that currently block other attacks.
• Break out of a chrooted (restricted execution) environment, giving full access to the system.

30 cybersecurity search tools

1. Dehashed: View leaked credentials.
2. SecurityTrails: Extensive DNS data.
3. DorkSearch: Really fast Google dorking.
4. ExploitDB: Archive of various exploits.
5. ZoomEye: Gather information about targets.
6. Pulsedive: Search for threat intelligence.
7. GrayHatWarfare: Search public S3 buckets.
8. PolySwarm: Scan files and URLs for threats.
9. Fofa: Search for various threat intelligence.
10. LeakIX: Search publicly indexed information.
11. DNSDumpster: Search for DNS records.
12. FullHunt: Search & discovery attack surfaces.
13. AlienVault: Extensive threat intelligence feed.
14. ONYPHE: Collects cyber-threat intelligence.
15. Grep App: Search a half million git repos.
16. URL Scan: Free service to scan and analyse websites.
17. Vulners: Search vulnerabilities in a large database.
18. WayBackMachine: View content from deleted websites.
19. Shodan: Search for devices connected to the internet.
20. Netlas: Search and monitor internet connected assets.
21. CRT sh: Search for certs that have been logged by CT.
22. Wigle: Database of wireless networks, with statistics.
23. PublicWWW: Marketing and affiliate marketing research.
24. Binary Edge: Scans the internet for threat intelligence.
25. GreyNoise: Search for devices connected to the internet.
26. Hunter: Search for email addresses belonging to a website.
27. Censys: Assessing attack surface for connected devices.
28. IntelligenceX: Search Tor, I2P, leaks, domains, and emails.
29. Packet Storm Security: Browse vulnerabilities and exploits.
30. SearchCode: Search 75 billion lines of code from 40 million projects.
Buffer Overflow Defenses
Two broad defense approaches, broadly classified into two categories:

• Compile-time defenses, which aim to harden programs to resist attacks in new programs

• Run-time defenses, which aim to detect and abort attacks in existing programs

Run-time defenses can be deployed in operating systems and their updates, and can provide some
protection for existing vulnerable programs.

Compile-time defenses aim to prevent or detect buffer overflows by instrumenting programs when
they are compiled. The possibilities for doing this range from choosing a high-level language that
does not permit buffer overflows, to encouraging safe coding standards, using safe standard
libraries, or including additional code to detect corruption of the stack frame.
Compile-Time Defenses: Programming Language
• Use a modern high-level language
• Not vulnerable to buffer overflow attacks
• Compiler enforces range checks and permissible operations on variables

Disadvantages
• Additional code must be executed at run time to impose checks
• Flexibility and safety comes at a cost in resource use
• Distance from the underlying machine language and architecture means that access
to some instructions and hardware resources is lost
• Limits their usefulness in writing code, such as device drivers, that must interact with
such resources

Compile-Time Defenses: Safe Coding Techniques


• C designers placed much more emphasis on space efficiency and
performance considerations than on type safety
• Assumed programmers would exercise due care in writing code
• Programmers need to inspect the code and rewrite any unsafe coding
• An example of this is the OpenBSD project
• Programmers have audited the existing code base, including the
operating system, standard libraries, and common utilities
• This has resulted in what is widely regarded as one of the safest operating systems in widespread use
Compile-Time Defenses: Language Extensions/Safe Libraries
• Handling dynamically allocated memory is more problematic because the size information is
not available at compile time
• Requires an extension and the use of library routines
• Programs and libraries need to be recompiled
• Likely to have problems with third-party applications

• Concern with C is use of unsafe standard library routines
• One approach has been to replace these with safer variants
• Libsafe is an example
• Library is implemented as a dynamic library arranged to load before the existing standard libraries

Compile-Time Defenses: Stack Protection


• Add function entry and exit code to check stack for signs of corruption

• Use random canary
o Value needs to be unpredictable
o Should be different on different systems

• Stackshield and Return Address Defender (RAD)
o GCC extensions that include additional function entry and exit code
• Function entry writes a copy of the return address to a safe region of memory
• Function exit code checks the return address in the stack frame against the saved copy
• If change is found, aborts the program
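The canary idea can be sketched on a toy frame. This is an illustration under stated assumptions, not how any particular compiler lays out its frames: the layout (8-byte buffer, 4-byte canary, 4-byte return address) and the function names are hypothetical, and a real canary would be chosen at random at program start.

```c
#include <stddef.h>
#include <string.h>

/* Toy frame: [ 8-byte buffer | 4-byte canary | 4-byte return address ].
   Layout and sizes are assumptions made for this illustration. */
enum { CANARY_OFF = 8, CANARY_LEN = 4, RET_ADDR_OFF = 12, GUARDED_LEN = 16 };

/* "Function entry": clear the frame and place the canary between the
   buffer and the return address. */
void frame_enter(unsigned char *frame, const unsigned char canary[CANARY_LEN])
{
    memset(frame, 0, GUARDED_LEN);
    memcpy(frame + CANARY_OFF, canary, CANARY_LEN);
}

/* "Function exit": returns 1 if the canary is intact and 0 if it was
   overwritten. A real implementation would abort the program on 0. */
int frame_exit_ok(const unsigned char *frame, const unsigned char canary[CANARY_LEN])
{
    return memcmp(frame + CANARY_OFF, canary, CANARY_LEN) == 0;
}
```

Because the canary sits between the buffer and the return address, a contiguous overflow that reaches the return address has to pass through the canary first, which is why the exit check catches it.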
Run-Time Defenses: Executable Address Space Protection
Use virtual memory support to make some regions of memory non-executable
• Requires support from memory management unit (MMU)
• Long existed on SPARC / Solaris systems
• Recent on x86 Linux/Unix/Windows systems

Issues
• Support for executable stack code
• Special provisions are needed

Run-Time Defenses: Address Space Randomization


• Manipulate location of key data structures
o Stack, heap, global data - Using random shift for each process
o Large address range on modern systems means wasting some has negligible impact
• Randomize location of heap buffers
• Random location of standard library functions

Run-Time Defenses: Guard Pages


• Place guard pages between critical regions of memory
o Flagged in MMU as illegal addresses
o Any attempted access aborts process
• Further extension places guard pages between stack frames and heap buffers
o Cost in execution time to support the large number of page mappings necessary

Shadow Stack
• In computer security, a shadow stack is a mechanism for protecting a procedure's stored return
address such as from a stack buffer overflow.

• The shadow stack itself is a second, separate stack that "shadows" the program call stack. In the
function prologue, a function stores its return address to both the call stack and the shadow stack.

• In the function epilogue, a function loads the return address from both the call stack and the shadow
stack, and then compares them. If the two records of the return address differ, then an attack is
detected; the typical course of action is simply to terminate the program or alert system
administrators about a possible intrusion attempt.

• A shadow stack is similar to stack canaries in that both mechanisms aim to maintain the control-flow
integrity of the protected program by detecting attacks that tamper with the stored return address
during an exploitation attempt.

• Shadow stacks can be implemented by recompiling programs with modified prologues and
epilogues, by dynamic binary rewriting techniques to achieve the same effect, or with hardware
support. Unlike the call stack, which also stores local program variables, passed arguments, spilled
registers and other data, the shadow stack typically just stores a second copy of a function's return
address.

• Shadow stacks provide more protection for return addresses than stack canaries, which rely on the
secrecy of the canary value and are vulnerable to non-contiguous write attacks. Shadow stacks
themselves can be protected with guard pages or with information hiding, such that an attacker
would also need to locate the shadow stack to overwrite a return address stored there.
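The prologue and epilogue behaviour described above can be sketched as a small data structure. This is a software model only, with hypothetical names and an arbitrary fixed depth. Return addresses are represented as plain integers, and the caller decides what to do on a mismatch.

```c
#include <stddef.h>

enum { SHADOW_DEPTH = 64 };   /* arbitrary capacity chosen for the sketch */

typedef struct {
    unsigned long slots[SHADOW_DEPTH];
    size_t top;
} shadow_stack;

void shadow_init(shadow_stack *s) { s->top = 0; }

/* Prologue: record a second copy of the return address.
   Returns 0 if the shadow stack is full, 1 otherwise. */
int shadow_push(shadow_stack *s, unsigned long ret_addr)
{
    if (s->top >= SHADOW_DEPTH)
        return 0;
    s->slots[s->top++] = ret_addr;
    return 1;
}

/* Epilogue: pop the saved copy and compare it with the address found on
   the call stack. Returns 1 when they match, 0 on mismatch or underflow. */
int shadow_check_pop(shadow_stack *s, unsigned long ret_addr_on_stack)
{
    if (s->top == 0)
        return 0;
    return s->slots[--s->top] == ret_addr_on_stack;
}
```

A return value of 0 from shadow_check_pop corresponds to the "attack is detected" case, where the program would typically be terminated.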
Input Size & Buffer Overflow
• Programmers often make assumptions about the maximum expected size
of input
• Allocated buffer size is not confirmed
• Resulting in buffer overflow

• Testing may not identify vulnerability


• Test inputs are unlikely to include large enough inputs to trigger the overflow

• Safe coding treats all input as dangerous

Interpretation of Program Input


• Program input may be binary or text
o Binary interpretation depends on encoding and is usually application specific

• There is an increasing variety of character sets being used


o Care is needed to identify just which set is being used and what characters are being read

• Failure to validate may result in an exploitable vulnerability

• The 2014 Heartbleed OpenSSL bug is a recent example of a failure to check the
validity of a binary input value
Heartbeat Protocol
• A periodic signal generated by hardware or software
to indicate normal operation or to synchronize other
parts of a system
• Typically used to monitor the availability of a protocol
entity
• Runs on top of the TLS Record Protocol
• Use is established during Phase 1 of the Handshake
Protocol
• Each peer indicates whether it supports heartbeats
• Serves two purposes:
• Assures the sender that the recipient is still alive
• Generates activity across the connection during idle periods
Heartbleed
• The Heartbleed bug allows anyone on the Internet to read the memory of the systems
protected by the vulnerable versions of the OpenSSL software.

• OpenSSL is the most popular open-source cryptographic library and TLS


implementation used to encrypt traffic on the Internet.

• The most notable software using OpenSSL are the open-source web servers like
Apache and nginx. The combined market share of just those two out of the active
sites on the Internet was over 66% at the time.

• This compromises the secret keys used to identify service providers & to encrypt
traffic, the names & passwords of users & actual content. This allows attackers to
eavesdrop on communications, steal data directly from the services & to
impersonate services and users.

• OpenSSL was patched quickly, but OS vendors, appliance vendors, and independent
software vendors have to adopt the fix and notify their users.

• This bug has left a large number of private keys & other secrets exposed on the Internet.
THE CODE SEGMENT
The Heartbleed bug is in OpenSSL’s TLS heartbeat implementation.

When a TLS heartbeat is sent, it comes with a couple of notable pieces of information:

• Some arbitrary payload data. This is intended to be repeated back to the sender so the
sender can verify the connection is still alive and the right data is being transmitted through
the communication channel.

• The length of that data, in bytes: len_payload.

The OpenSSL implementation used to do the following:

• Allocate a heartbeat response, using len_payload as the intended payload size

• memcpy() len_payload bytes from the payload into the response.

• Send the heartbeat response (with len_payload bytes) happily back to the original sender.

The problem is that the OpenSSL implementation never bothered to check that len_payload
is actually correct, and that the request actually has that many bytes of payload. So, a malicious
person could send a heartbeat request indicating a payload length of up to 2^16 - 1 (65535), but
actually send a shorter payload.

What happens in this case is that memcpy ends up copying beyond the bounds of the
payload into the response, giving up to 64 KB of OpenSSL’s memory contents to an attacker.
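The missing check can be sketched as follows. This is not OpenSSL's code: the function name, the return convention, and the simplified message layout (one type byte, a two-byte big-endian len_payload, then the payload, with the padding of the real heartbeat message left out) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Simplified request layout assumed for this sketch:
   byte 0      message type
   bytes 1-2   len_payload, big-endian
   bytes 3..   payload
   The real TLS heartbeat message also carries padding, omitted here. */
enum { HB_HEADER_LEN = 3 };

/* Copies the payload into out only if the claimed length fits inside the
   bytes that were actually received. Returns the number of payload bytes
   copied, or -1 if the request must be discarded. */
long heartbeat_echo(const unsigned char *req, size_t req_len,
                    unsigned char *out, size_t out_cap)
{
    if (req == NULL || out == NULL || req_len < HB_HEADER_LEN)
        return -1;
    size_t len_payload = ((size_t)req[1] << 8) | (size_t)req[2];
    /* The check that was missing: claimed length vs. bytes really present. */
    if (len_payload > req_len - HB_HEADER_LEN)
        return -1;
    if (len_payload > out_cap)
        return -1;
    memcpy(out, req + HB_HEADER_LEN, len_payload);
    return (long)len_payload;
}
```

A request that claims 65535 bytes but carries only four is rejected before memcpy runs, so nothing beyond the received message is ever copied into the response.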
Countermeasures
• Staying within bounds
• Check lengths before writing
• Confirm that array subscripts are within limits
• Double-check boundary condition code for off-by-one errors
• Limit input to the number of acceptable characters
• Limit programs’ privileges to reduce potential harm

• Many languages have overflow protections

• Code analyzers can identify many overflow vulnerabilities

• Canary values in stack to signal modification
Malware
• Programs planted by an agent with malicious intent
to cause unanticipated or undesired effects

• Virus
• A program that can replicate itself and pass on malicious
code to other nonmalicious programs by modifying them

• Worm
• A program that spreads copies of itself through a network

• Trojan horse
• Code that, in addition to its stated effect, has a second,
nonobvious, malicious effect

Harm from Malicious Code


• Harm to users and systems:
• Sending email to user contacts
• Deleting or encrypting files
• Modifying system information, such as the Windows registry
• Stealing sensitive information, such as passwords
• Attaching to critical system files
• Hiding copies of malware in multiple complementary locations

• Harm to the world:


• Some malware has been known to infect millions of systems,
growing at a geometric rate
• Infected systems often become staging areas for new
infections

Transmission and Propagation


• Setup and installer program
• Attached file
• Document viruses
• Autorun
• Using non-malicious programs:
• Appended viruses
• Viruses that surround a program

Malware Activation
• One-time execution (implanting)
• Boot sector viruses
• Memory-resident viruses
• Application files
• Code libraries

Virus Effects

Countermeasures for Users


• Use software acquired from reliable sources
• Test software in an isolated environment
• Only open attachments when you know them to be safe
• Treat every website as potentially harmful
• Create and maintain backups

• Virus scanners look for signs of malicious code infection using
signatures in program files and memory

• Traditional virus scanners have trouble keeping up with new
malware and detect about 45% of infections

• Detection mechanisms:
• Known string patterns in files or memory
• Execution patterns
• Storage patterns
Virus Signatures
SQL Injection
• Consider an excerpt of PHP code from a CGI script that takes a name provided as input to the script,
typically from a form field

• It uses this value to construct a request to retrieve the records relating to that name from the database. The
vulnerability in this code is very similar to that in the command injection example.

• The difference is that SQL metacharacters are used, rather than shell metacharacters. If a suitable name is
provided, for example, Bob, then the code works as intended, retrieving the desired record.

• However, an input such as Bob'; drop table suppliers results in the specified record being retrieved, followed by
deletion of the entire table.

• To prevent this type of attack, the input must be validated before use. Any metacharacters must either be
escaped, canceling their effect, or the input rejected entirely.
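A minimal sketch of the "reject entirely" option, written in C for consistency with the earlier examples; the script under discussion is PHP. The function name and the accepted character set are assumptions: a real application would pick the set to suit its own data, and parameterized queries are commonly recommended as well.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { MAX_NAME_LEN = 64 };   /* arbitrary limit chosen for the sketch */

/* Accept only letters, spaces, and hyphens. Anything else, including
   quotes and semicolons, causes the whole input to be rejected.
   Returns 1 for an acceptable name and 0 otherwise. */
int is_safe_name(const char *name)
{
    if (name == NULL)
        return 0;
    size_t len = strlen(name);
    if (len == 0 || len > MAX_NAME_LEN)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];
        if (!isalpha(ch) && ch != ' ' && ch != '-')
            return 0;
    }
    return 1;
}
```

With this check, Bob is accepted, while Bob'; drop table suppliers is rejected before any query is constructed.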
Code Injection
Common variant where the input includes code that is executed by the attacked system.

Figure 11.4a shows start of a vulnerable PHP calendar script. The flaw results from the use of a variable to construct the name
of a file that is then included into the script. Note that this script was not intended to be called directly. Rather, it is a
component of a larger, multifile program. The main script set the value of the $path variable to refer to the main directory
containing the program and all its code and data files.

Using this variable elsewhere in the program meant that customizing and installing the program required changes to just a few
lines. Unfortunately, attackers do not play by the rules. Just because a script is not supposed to be called directly does not
mean it is not possible. The access protections must be configured in the Web server to block direct access to prevent this.
Otherwise, if direct access to such scripts is combined with two other features of PHP, a serious attack is possible.

The first is that PHP originally assigned the value of any input variable supplied in the HTTP request to global variables with the
same name as the field. This made the task of writing a form handler easier for inexperienced programmers. Unfortunately,
there was no way for the script to limit just which fields it expected. Hence a user could specify values for any desired global
variable and they would be created and passed to the script. In this example, the variable $path is not expected to be a form
field.

The second PHP feature concerns the behavior of the include command. Not only could local files be included, but if a
URL is supplied, the included code can be sourced from anywhere on the network. Combine all of these elements, and the
attack may be implemented using a request similar to that shown in Figure 11.4b.

This results in the $path variable containing the URL of a file containing the attacker’s PHP code. It also defines another
variable, $cmd, which tells the attacker’s script what command to run. In this example, the extra command simply lists files in
the current directory. However, it could be any command the Web server has the privilege to run. This specific type of attack
is known as a PHP remote code injection or PHP file inclusion vulnerability.
Cross Site Scripting (XSS) Attacks
Attacks where input provided by one user is subsequently output to another user

Commonly seen in scripted Web applications
• Vulnerability involves the inclusion of script code in the HTML content
• Script code may need to access data associated with other pages
• Browsers impose security checks and restrict data access to pages originating
from the same site

Exploit assumption that all content from one site is equally trusted
and hence is permitted to interact with other content from the site

XSS reflection vulnerability
• Attacker includes the malicious script content in data supplied to a site

Cross-site scripting attacks exploit this assumption and
attempt to bypass the browser’s security checks to gain
elevated access privileges to sensitive data belonging to
another site. These data can include page contents,
session cookies & other objects.

The attacker includes the malicious script content in data
supplied to a site. If this content is subsequently displayed
to other users without sufficient checking, they will execute
the script assuming it is trusted to access any data
associated with that site.

Consider the widespread use of guestbooks allowing
comments, which are subsequently viewed by other users.

Unless the contents of these comments are checked and
any dangerous code removed, the attack is possible.
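One common way to neutralize dangerous content in a comment is to escape the HTML metacharacters before the text is displayed. The sketch below is a hedged illustration with a hypothetical function name. It handles only the five characters listed and does not address other output contexts such as URLs or script blocks.

```c
#include <stddef.h>
#include <string.h>

/* Writes an HTML-escaped copy of in to out. Returns 1 on success, or 0 if
   out is too small, in which case out is left holding an empty string. */
int html_escape(const char *in, char *out, size_t out_size)
{
    size_t used = 0;
    if (in == NULL || out == NULL || out_size == 0)
        return 0;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep;
        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = single;   break;
        }
        size_t n = strlen(rep);
        if (used + n >= out_size) {     /* keep room for the terminator */
            out[0] = '\0';
            return 0;
        }
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return 1;
}
```

After escaping, a comment containing a script tag is shown to other users as literal text instead of being run by their browsers.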
Validating Input Syntax
• It is necessary to ensure that data conform with any assumptions made about the data before subsequent use
• Input data should be compared against what is wanted
• Alternative is to compare the input data with known dangerous values
• By only accepting known safe data the program is more likely to remain secure

Given that the programmer cannot control the content of input data, it is necessary to ensure that such
data conform with any assumptions made about the data before subsequent use.

If the data are textual, these assumptions may be that the data contain only printable characters, have
certain HTML markup, are the name of a person, a userid, an e-mail address, a filename, and/or a URL.

Alternatively, the data might represent an integer value. A program using such input should confirm that it
meets these assumptions. Input data should be compared against what is wanted, accepting only valid
input. The alternative is to compare the input data with known dangerous values.

The problem with this alternative approach is that new problems and methods of bypassing existing checks
continue to be discovered. If the program only tries to block known dangerous input data, an attacker using
a new encoding may succeed. By only accepting known safe data, the program is more likely to remain secure.

This type of comparison is commonly done using regular expressions. It may be explicitly coded by the
programmer or may be implicitly included in a supplied input processing routine. A regular expression is a
pattern composed of a sequence of characters that describe allowable input variants.

If the input data fail the comparison, a suitable error message should be sent to the source of the input to
allow it to be corrected and reentered. Alternatively, the data may be altered to conform. This generally
involves escaping metacharacters to remove any special interpretation, thus rendering the input safe.
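
A minimal Python sketch of the "accept only known safe data" approach follows. The userid policy (3 to 16 characters, a letter first, then letters, digits, or underscores) and the name is_valid_userid are assumptions chosen for illustration, not rules taken from these slides.

```python
import re

# Hypothetical policy: a userid is 3-16 characters, starting with a letter,
# followed by letters, digits, or underscores.
USERID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,15}")


def is_valid_userid(value: str) -> bool:
    """Accept the input only if the whole string matches the pattern."""
    # fullmatch() anchors at both ends, so a trailing newline or an
    # appended shell command cannot slip through.
    return USERID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["alice_01", "a", "bob; rm -rf /"]:
        print(repr(candidate), is_valid_userid(candidate))
```

The pattern describes what is wanted rather than listing what is dangerous, so an input using a new encoding trick is rejected by default.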
Alternate Encodings
The issue of alternative encodings of the input data could occur because the data are encoded
in HTML or some other structured encoding that allows multiple representations of characters.

Given the growing requirement to support users around the globe, the Unicode character set is now
widely used. Its 16-bit encoding form, UTF-16, is the native string representation in the Java language and the Windows OS.

Many programs, databases, and other applications assume an 8-bit character representation, with the
first 128 values corresponding to ASCII. To accommodate this, a Unicode character can be
encoded as a 1- to 4-byte sequence using the UTF-8 encoding.

Any specific character is supposed to have a unique encoding. However, if the strict limits in the
specification are ignored, common ASCII characters may have multiple encodings. For example, the
forward slash character “/”, used to separate directories, has hexadecimal value “2F” in both ASCII and
UTF-8, but a lenient decoder may also accept the redundant, longer encodings “C0 AF” and “E0 80 AF”.

Consider the consequences of multiple encodings when validating input. There is a class of attacks
that attempt to supply an absolute pathname for a file to a script that expects only a simple local
filename. The common check to prevent this is to ensure that the supplied filename does not start
with “/” and does not contain any “../” parent directory references.

If this check only assumes the correct, shortest UTF-8 encoding of slash, then an attacker using one
of the longer encodings could avoid this check. This flaw was exploited against Microsoft’s IIS Web
server in the late 1990s.

Canonicalization - Transforming input data into a single, standard, minimal representation. Once
this is done, the data can be compared with a single representation of acceptable input values.
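
The "canonicalize first, then check" order can be sketched in Python as below. The sketch assumes the filename arrives percent-encoded (as in a URL) and uses POSIX-style paths; the function name canonical_local_filename is hypothetical. Python's strict UTF-8 decoder rejects non-shortest sequences such as C0 AF (a redundant encoding of "/"), so those inputs fail before the path check runs.

```python
import posixpath
from urllib.parse import unquote_to_bytes


def canonical_local_filename(raw: str) -> str:
    """Canonicalize a percent-encoded filename, then validate it.

    Raises ValueError if the input is not strict UTF-8 or does not
    reduce to a simple local filename.
    """
    data = unquote_to_bytes(raw)            # undo %XX escapes first
    try:
        text = data.decode("utf-8")         # strict: rejects overlong forms
    except UnicodeDecodeError as exc:
        raise ValueError("input is not valid UTF-8") from exc
    path = posixpath.normpath(text)         # collapse "./", "//" and "x/../"
    # Only now compare against the single canonical form.
    if "/" in path or path in (".", ".."):
        raise ValueError("not a simple local filename")
    return path


if __name__ == "__main__":
    for name in ["report.txt", "%2e%2e%2fsecret", "%c0%afetc%c0%afpasswd"]:
        try:
            print(name, "->", canonical_local_filename(name))
        except ValueError as err:
            print(name, "-> rejected:", err)
```

Running the check on the raw string instead would miss "%2e%2e%2f", which only becomes "../" after decoding.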
Input Fuzzing
• Software testing technique that uses randomly
generated data as inputs to a program
• Range of inputs is very large
• Intent is to determine if the program or function correctly handles
abnormal inputs
• Simple, free of assumptions, cheap
• Assists with reliability as well as security

• Can also use templates to generate classes of known
problem inputs
• Disadvantage is that bugs triggered by other forms of input would
be missed
• Combination of approaches is needed for reasonably
comprehensive coverage of the inputs
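
A very small random fuzzer might look like the Python sketch below. The function under test (parse_age) and the harness name fuzz are hypothetical; real fuzzing tools are far more capable, and this only illustrates the idea of feeding random input and watching for unexpected failures.

```python
import random


def parse_age(text: str) -> int:
    """Hypothetical function under test: parse an age in the range 0-150."""
    value = int(text)
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value


def fuzz(target, runs: int = 1000, max_len: int = 20, seed: int = 0):
    """Feed random ASCII strings to target and collect unexpected exceptions.

    ValueError is treated as the documented way to reject bad input;
    any other exception is recorded as a finding.
    """
    rng = random.Random(seed)               # seeded so runs are repeatable
    findings = []
    for _ in range(runs):
        length = rng.randint(0, max_len)
        data = "".join(chr(rng.randint(0, 0x7F)) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass                            # expected rejection
        except Exception as exc:            # anything else is a bug candidate
            findings.append((data, exc))
    return findings


if __name__ == "__main__":
    print("unexpected failures:", len(fuzz(parse_age)))
```

Purely random strings rarely reach deep code paths, which is why the slide recommends combining random inputs with template-generated ones.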
Use of Least Privilege

Privilege escalation
• Exploit of flaws may give attacker greater privileges

Least privilege
• Run programs with least privilege needed to complete their function

Determine appropriate user and group privileges required
• Decide whether to grant extra user or just group privileges

Ensure that privileged program can modify only those files and directories necessary
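
On POSIX systems, one common way to apply least privilege is for a program that must start as root to give up root as soon as its privileged setup is done. The Python sketch below assumes a POSIX platform and a hypothetical helper name, drop_privileges; the target uid and gid are supplied by the caller.

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Permanently switch the process to an unprivileged uid and gid.

    Order matters: supplementary groups and the gid must be changed while
    the process is still root, because after setuid() it no longer has
    permission to change them.
    """
    os.setgroups([])    # clear supplementary groups inherited from root
    os.setgid(gid)      # set the group id while still privileged
    os.setuid(uid)      # give up root last; this cannot be undone
    if os.getuid() == 0 or os.geteuid() == 0:
        raise RuntimeError("failed to drop root privileges")
```

A typical use is to open a privileged resource (for example, bind a low-numbered port) first, call drop_privileges, and only then start handling untrusted input.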
Today’s Lab – Web Application
Vulnerabilities and Passwords
Summary
• Buffer overflow attacks can take advantage of the fact that
code and data are stored in the same memory in order to
maliciously modify executing programs

• Programs can have a number of other types of vulnerabilities,
including off-by-one errors, incomplete mediation, and race
conditions

• Malware can have a variety of harmful effects depending on its
characteristics, including resource usage, infection vector, and
payload

• Developers can use a variety of techniques for writing and
testing code for security
