Unit V
Email (SMTP, MIME, IMAP, POP3) – HTTP – DNS – SNMP – Telnet – FTP – Security – PGP – SSH
SMTP: Simple Mail Transfer Protocol is used to exchange electronic mail.
MIME (Multipurpose Internet Mail Extensions) defines the format of email messages.
Internet Message Access Protocol (IMAP) is one of the two most prevalent Internet standard protocols for
e-mail retrieval, the other being the Post Office Protocol (POP).
RFC-822: The internet standard format for electronic mail message headers.
To understand how email works, two distinctions are important:
1. The user interface (i.e., your mail reader) is distinct from the underlying message transfer
protocol (in this case, SMTP).
2. This transfer protocol is distinct from a companion protocol (RFC 822 and MIME) that
defines the format of the messages being exchanged.
Message Format
RFC 822 defines messages to have two parts: a header and a body. Both parts are represented in ASCII
text.
The message header is a series of <CRLF>-terminated lines. (<CRLF> stands for carriage-return + line-
feed, which are a pair of ASCII control characters often used to indicate the end of a line of text.)
Many of these header lines are familiar to users since they are asked to fill them out when they compose
an email message.
For example, the To: header identifies the message recipient, and the Subject: header says something
about the purpose of the message. Other headers are filled in by the underlying mail delivery system.
Examples include Date: (when the message was transmitted), From: (what user sent the message), and
Received: (each mail server that handled this message).
RFC 822 was extended in 1993 (and updated again in 1996) to allow email messages to carry many
different types of data: audio, video, images, Word documents, and so on. This extension is MIME.
The first piece of MIME is a collection of header lines that augment the original set defined by RFC 822.
These header lines describe, in various ways, the data being carried in the message body.
They include MIME-Version, Content-Type, Content-Transfer-Encoding, Content-Id, and
Content-Description.
The second piece is definitions for a set of content types (and subtypes).
For example, MIME defines two different still image types, denoted image/gif and image/jpeg,
each with the obvious meaning.
As another example, text/plain refers to simple text you might find in a vanilla 822-style message,
while text/richtext denotes a message that contains “marked up” text (e.g., text using special fonts,
italics, etc.).
As a third example, MIME defines an application type, where the subtypes correspond to the output of
different application programs (e.g. application/postscript and application/msword).
MIME also defines a multipart type that says how a message carrying more than one data type is
structured. This is like a programming language that defines both base types (e.g., integers and
floats) and compound types (e.g., structures and arrays).
5.1.1 MIME
MIME defines five headers that can be added to the original e-mail header section to define
the transformation parameters.
1. MIME-Version
2. Content-Type
3. Content-Transfer-Encoding
4. Content-Id
5. Content-Description
MIME header
MIME-Version: This header defines the version of MIME used. The current version is 1.0.
Content-Type: This header defines the type of data used in the body of the message. The content type
and the content subtype are separated by a slash. Depending on the subtype, the header may contain other
parameters.
Content-Transfer-Encoding: This header defines the method used to encode the message body into ASCII
for transport (for example, 7bit, base64, or quoted-printable).
Content-Id: This header uniquely identifies the whole message in a multiple-message environment.
Content-Description: This header defines whether the body is image, audio, or video. Its general form is
Content-Description: <description>
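As a rough illustration of how these headers appear in practice, the sketch below uses Python's standard email package to build a message with a text part and an image part. The addresses, subject, file name, and attachment bytes are made-up placeholders.

```python
from email.message import EmailMessage


def build_message():
    """Build a small two-part message; every name and value is a placeholder."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Monthly report"
    # A text/plain body; the library also adds the MIME-Version header.
    msg.set_content("The chart is attached.")
    # Adding an attachment turns the message into multipart/mixed and
    # encodes the binary part with base64 so it can travel as ASCII text.
    msg.add_attachment(b"GIF89a placeholder bytes",
                       maintype="image", subtype="gif",
                       filename="chart.gif")
    return msg


if __name__ == "__main__":
    print(build_message().as_string())
```

Printing the message shows the MIME-Version, Content-Type, and Content-Transfer-Encoding header lines in front of each part.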
5.1.2 SMTP
Message Transfer
SMTP is the protocol used to transfer messages from one host to another. Two kinds of programs are
involved.
First, users interact with a mail reader when they compose, file, search, and read their email.
Most Web browsers now include a mail reader.
Second, there is a mail daemon (or process) running on each host.
You can think of this process as playing the role of a post office:
Mail readers give the daemon messages they want to send to other users, the daemon uses SMTP
running over TCP to transmit the message to a daemon running on another machine, and the daemon
puts incoming messages into the user’s mailbox (where that user’s mail reader can later find it).
While it is certainly possible that the sendmail program on a sender’s machine establishes an
SMTP/TCP connection to the sendmail program on the recipient’s machine, in many cases the mail
traverses one or more mail gateways on its route from the sender’s host to the receiver’s host.
Like the end hosts, these gateways also run a sendmail process. It’s not an accident that these
intermediate nodes are called “gateways”, since their job is to store and forward email messages,
much like an “IP gateway” (which we have referred to as a router) stores and forwards IP datagrams.
Sequence of mail gateways store and forward email messages
Mail gateways are useful for two reasons. One reason is that the forwarding gateway maintains a
database that maps users into the machine on which they currently want to receive their mail; the
sender need not be aware of this specific name.
Another reason is that the recipient’s machine may not always be up, in which case the mail
gateway holds the message until it can be delivered.
Each SMTP session involves a dialog between the two mail daemons, with one acting as the client
and the other acting as the server. Multiple messages might be transferred between the two hosts during
a single session.
Mail Reader
The final step is for the user to actually retrieve his or her messages from the mailbox, read them,
reply to them, and possibly save a copy for future reference. The user performs all these actions by
interacting with a mail reader.
In many cases, this reader is just a program running on the same machine on which the user’s mailbox
resides, in which case it simply reads and writes the file that implements the mailbox.
In other cases, the user accesses his or her mailbox from a remote machine using yet another
protocol, such as the Post Office Protocol (POP) or the Internet Message Access Protocol (IMAP).
The actual mail transfer is done through message transfer agents. To send mail, a system must have the
client MTA, and to receive mail, a system must have a server MTA.
The formal protocol that defines the MTA client and server in the Internet is called the Simple Mail
Transfer Protocol (SMTP).
SMTP range
SMTP simply defines how commands and responses must be sent back and forth.
Commands and Responses
SMTP uses commands and responses to transfer messages between an MTA client and an MTA server.
Each command or reply is terminated by a two-character (carriage return and line feed) end-of-line
token.
SMTP defines 14 commands. The first five are mandatory; every implementation must support these
five commands. The next three are often used and highly recommended.
Responses: Responses are sent from the server to the client. A response is a three digit code that may be
followed by additional textual information.
Mail Transfer Phases The process of transferring a mail message occurs in three phases: connection
establishment, mail transfer, and connection termination.
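The command/response exchange and the three phases can be sketched as follows. This is a simplified illustration, not a working mail client: it only builds the client's command lines and parses a single-line reply, and the host name and addresses in the usage example are placeholders.

```python
CRLF = "\r\n"  # every command and reply ends with carriage return + line feed


def parse_reply(line):
    """Split a one-line reply such as '250 OK' into (code, text)."""
    line = line.rstrip("\r\n")
    return int(line[:3]), line[4:]


def session_commands(client_host, sender, recipient, body_lines):
    """Return the client's command lines for one message, in order."""
    commands = [
        f"HELO {client_host}",    # phase 1: connection establishment
        f"MAIL FROM:<{sender}>",  # phase 2: mail transfer
        f"RCPT TO:<{recipient}>",
        "DATA",
        *body_lines,
        ".",                      # a line holding only a dot ends the message
        "QUIT",                   # phase 3: connection termination
    ]
    return [c + CRLF for c in commands]


if __name__ == "__main__":
    for line in session_commands("client.example.com", "alice@example.com",
                                 "bob@example.com", ["Subject: hi", "", "Hello"]):
        print(repr(line))
    print(parse_reply("250 OK\r\n"))
```

In a real session the client waits for the server's three-digit reply after each command before sending the next one; that interleaving is left out here.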
5.1.3 MESSAGE ACCESS AGENT: POP AND IMAP
The third stage of mail delivery uses a message access agent.
Message access protocols are pull protocols: the client must pull messages from the server, so the
direction of the bulk data is from the server to the client.
Currently two message access protocols are available: Post Office Protocol, version 3 (POP3) and
Internet Message Access Protocol, version 4 (IMAP4).
IMAP
IMAP is similar to SMTP in many ways.
It is a client/server protocol running over TCP, where the client (running on the user’s desktop
machine) issues commands in the form of <CRLF>-terminated ASCII text lines and the mail server
(running on the machine that maintains the user’s mailbox) responds in kind.
In this diagram, LOGIN, AUTHENTICATE, SELECT, EXAMINE, CLOSE, and LOGOUT are example
commands that the client can issue, while OK is one possible server response.
Other common commands include FETCH, STORE, DELETE, and EXPUNGE, with the obvious
meanings. Additional server responses include NO (client does not have permission to perform that
operation) and BAD (command is ill formed).
When the user asks to FETCH a message, the server returns it in MIME format and the mail reader
decodes it. In addition to the message itself, IMAP also defines a set of message attributes that are
exchanged as part of other commands, independent of transferring the message itself.
Message attributes include information like the size of the message, but more interestingly, various flags
associated with the message (e.g., Seen, Answered, Deleted, and Recent).
• These flags are used to keep the client and server synchronized; that is, when the user deletes a message
in the mail reader, the client needs to report this fact to the mail server.
• Later, should the user decide to expunge all deleted messages, the client issues an EXPUNGE command
to the server, which knows to actually remove all earlier deleted messages from the mailbox.
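The two-step delete described above (flag first, expunge later) can be modelled with a toy in-memory mailbox. The class and method names are invented for illustration and only mimic the effect of the STORE and EXPUNGE commands; no real IMAP traffic is involved.

```python
class Mailbox:
    """Toy server-side mailbox: each message has a body and a set of flags."""

    def __init__(self, bodies):
        self.messages = [{"body": b, "flags": {"\\Recent"}} for b in bodies]

    def store(self, seq, flag):
        """Add a flag to message number seq (1-based), like a STORE command."""
        self.messages[seq - 1]["flags"].add(flag)

    def expunge(self):
        """Remove every message flagged as deleted; return how many were removed."""
        before = len(self.messages)
        self.messages = [m for m in self.messages
                         if "\\Deleted" not in m["flags"]]
        return before - len(self.messages)


if __name__ == "__main__":
    box = Mailbox(["first", "second", "third"])
    box.store(2, "\\Deleted")   # the user deletes a message in the mail reader
    print(len(box.messages))    # the message is still on the server
    print(box.expunge())        # now it is really removed
```

Keeping the flag on the server is what lets several mail readers see the same state for a mailbox.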
IMAP4
Another mail access protocol is Internet Message Access Protocol, version 4 (IMAP4).
IMAP4 is similar to POP3, but it has more features; IMAP4 is more powerful and more complex.
POP3 is deficient in several ways. It does not allow the user to organize her mail on the server; the
user cannot have different folders on the server. (Of course, the user can create folders on her own
computer.)
In addition, POP3 does not allow the user to partially check the contents of the mail before
downloading.
IMAP4 provides extra functions, for example:
A user can search the contents of the e-mail for a specific string of characters prior to downloading.
A user can partially download e-mail. This is especially useful if bandwidth is limited and the e-mail
contains multimedia with high bandwidth requirements.
POP3
Mail access starts with the client when the user needs to download e-mail from the mailbox on the
mail server.
The client opens a connection to the server on TCP port 110.
It then sends its user name and password to access the mailbox.
The user can then list and retrieve the mail messages, one by one.
POP3 has two modes: the delete mode and the keep mode.
In the delete mode, the mail is deleted from the mailbox after each retrieval.
In the keep mode, the mail remains in the mailbox after retrieval. The delete mode is normally used
when the user is working at her permanent computer and can save and organize the received mail after
reading or replying.
The keep mode is normally used when the user accesses her mail away from her primary computer (e.g., a
laptop). The mail is read but kept in the system for later retrieval and organizing.
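The difference between the two modes can be shown with a small simulation in which the server mailbox is just a Python list. The function name and the mode strings are illustrative assumptions; this is not how a POP3 client library is actually driven.

```python
def retrieve_all(mailbox, mode="keep"):
    """Download every message; in delete mode also empty the server mailbox."""
    if mode not in ("keep", "delete"):
        raise ValueError("mode must be 'keep' or 'delete'")
    downloaded = list(mailbox)  # copy the messages to the client
    if mode == "delete":
        mailbox.clear()         # the server copy is removed after retrieval
    return downloaded


if __name__ == "__main__":
    server_mailbox = ["msg 1", "msg 2"]
    print(retrieve_all(server_mailbox, mode="keep"), server_mailbox)
    print(retrieve_all(server_mailbox, mode="delete"), server_mailbox)
```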
5.2 HTTP
Most files on the Web contain images and text, and some have audio and video clips.
They also include URLs that point to other files, and your Web browser will have some way in which
you can recognize URLs and ask the browser to open them. These embedded URLs are called hypertext
links.
When you select to view a page, your browser (the client) fetches the page from the server using HTTP
running over TCP. Like SMTP, HTTP is a text-oriented protocol. Each HTTP message consists of a start
line (START LINE), zero or more message header lines, a blank line, and a message body.
Instead of giving the full set of possible header types, though, we just give a handful of representative
examples.
Finally, after the blank line come the contents of the message (MESSAGE BODY); this part of the
message is typically empty for request messages.
Request Messages
The first line of an HTTP request message specifies three things: the operation to be performed, the
Web page the operation should be performed on, and the version of HTTP being used.
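The two most common operations are GET (fetch the specified Web page) and HEAD (fetch status
information about the specified Web page).
The former is obviously used when your browser wants to retrieve and display a Web page.
The latter is used to test the validity of a hypertext link or to see if a particular page has been modified
since the browser last fetched it.
After the start line, a request can carry header lines; for example, the Host header names the server
being contacted:
Host: www.cs.princeton.edu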
Response Messages
Like request messages, response messages begin with a single START LINE.
In this case, the line specifies the version of HTTP being used, a three-digit code indicating whether or
not the request was successful, and a text string giving the reason for the response.
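The start lines of the two message kinds can be illustrated with a short sketch. It only builds and parses text; it does not open a connection. The page path /index.html in the usage example is an assumed value, not one taken from the notes.

```python
CRLF = "\r\n"


def build_request(operation, page, host, version="HTTP/1.1"):
    """Return a request: start line, one Host header, blank line, empty body."""
    start_line = f"{operation} {page} {version}"
    headers = [f"Host: {host}"]
    return CRLF.join([start_line, *headers]) + CRLF + CRLF


def parse_status_line(line):
    """Split a response start line into (version, code, reason)."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(repr(build_request("GET", "/index.html", "www.cs.princeton.edu")))
    print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```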
TCP Connections
The original version of HTTP (1.0) established a separate TCP connection for each data item retrieved
from the server.
It’s not too hard to see how this was a very inefficient mechanism: Connection setup and teardown
messages had to be exchanged between the client and server even if all the client wanted to do was verify
that it had the most recent copy of a page.
Thus, retrieving a page that included some text and a dozen icons or other small graphics would result
in 13 separate TCP connections being established and closed.
The most important improvement in HTTP version 1.1 is to allow persistent connections: the client
and server can exchange multiple request/response messages over the same TCP connection.
Persistent connections have two advantages.
First, they obviously eliminate the connection setup overhead, thereby reducing the load on the server,
the load on the network caused by the additional TCP packets, and the delay perceived by the user.
Second, because a client can send multiple request messages down a single TCP connection, TCP’s
congestion window mechanism is able to operate more efficiently.
Caching
From the server’s perspective, having a cache intercept and satisfy a request reduces the load on the
server.
Caching can be implemented in many different places:
User’s browser
A user’s browser can cache recently accessed pages, and simply display the cached copy if the user
visits the same page again.
Site-wide cache
A site can support a single site-wide cache. One machine caches pages on behalf of the site, and users
configure their browsers to connect directly to the caching host. This node is sometimes called a proxy.
ISP router
This router can peek inside the request message and look at the URL for the requested page. If it has the
page in its cache, it returns it.
If not, it forwards the request to the server and watches for the response to fly by in the other direction.
When it does, the router saves a copy in the hope that it can use it to satisfy a future request.
Wherever pages are cached, the cache needs to make sure it is not responding with an out-of-date
version of the page.
To help with this, the server assigns an expiration date (the Expires header field) to each page it sends
back to the client (or to a cache between the server and client).
The cache remembers this date and knows that it need not reverify the page each time it is requested
until after that expiration date has passed.
After that time (or if that header field is not set) the cache can use the HEAD or conditional GET
operation (GET with If-Modified-Since header line) to verify that it has the most recent copy of the page.
More generally, there is a set of “cache directives” that must be obeyed by all caching mechanisms
along the request/response chain.
These directives specify whether or not a document can be cached, how long it can be cached, how
fresh a document must be, and so on.
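The freshness rule based on the Expires header can be sketched in a few lines. The function names are illustrative, and the sketch assumes that dates use the usual HTTP form ending in GMT and that the caller passes timezone-aware UTC datetimes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_fresh(expires_header, now):
    """True if the cached copy may be served without asking the server."""
    if expires_header is None:
        return False  # no Expires header: the cache must revalidate
    return now < parsedate_to_datetime(expires_header)


def revalidation_headers(fetched_at):
    """Header for a conditional GET of a page last fetched at fetched_at."""
    return {"If-Modified-Since": format_datetime(fetched_at, usegmt=True)}


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(is_fresh("Mon, 01 Jan 2024 13:00:00 GMT", now))
    print(revalidation_headers(now))
```

When is_fresh returns False, the cache would send the conditional GET and keep its copy if the server reports that the page has not been modified.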
5.3 DNS
A naming service can be developed to map user-friendly names into router-friendly addresses.
Name services are sometimes called middleware because they fill a gap between applications and the
underlying network.
Host names differ from host addresses in two important ways. First, they are usually of variable length
and mnemonic, thereby making them easier for humans to remember.
(In contrast, fixed-length numeric addresses are easier for routers to process.)
Second, names typically contain no information that helps the network locate (route packets toward)
the host. Addresses, in contrast, sometimes have routing information embedded in them; flat addresses
(those not divisible into component parts) are the exception.
A name space defines the set of possible names. It can be either flat (names are not divisible into
components) or hierarchical (Unix file names are the obvious example). The naming system maintains a
collection of bindings of names to values.
The value can be anything we want the naming system to return when presented with a name; in many
cases it is an address.
A resolution mechanism is a procedure that, when invoked with a name, returns the corresponding
value. A name server is a specific implementation of a resolution mechanism that is available on a network
and that can be queried by sending it a message.
Because of its large size, the Internet has a particularly well-developed naming system in place— the
domain name system (DNS).
Early in its history, when there were only a few hundred hosts on the Internet, a central authority called
the Network Information Center (NIC) maintained a flat table of name-to-address bindings; this table was
called hosts.txt.
Whenever a site wanted to add a new host to the Internet, the site administrator sent email to the NIC
giving the new host’s name/address pair.
This information was manually entered into the table, the modified table was mailed out to the various
sites every few days, and the system administrator at each site installed the table on every host at the site.
It should come as no surprise that the hosts.txt approach to naming did not work well as the number of
hosts in the Internet started to grow.
DNS employs a hierarchical name space rather than a flat name space, and the table of bindings that
implements this name space is partitioned into disjoint pieces and distributed throughout the Internet.
Names translated into addresses, where the numbers 1–5 show the sequence of steps in the
process.
Domain Hierarchy
DNS names are processed from right to left and use periods as the separator.
The DNS hierarchy can be visualized as a tree, where each node in the tree corresponds to a domain and
the leaves in the tree correspond to the hosts being named.
There are domains for each country, plus the “big six” domains: edu, com, gov, mil, org, and net.
The first step in implementing the hierarchy is to partition it into subtrees called zones.
Each zone can be thought of as corresponding to some administrative authority that is responsible for
that portion of the hierarchy.
For example, the top level of the hierarchy forms a zone that is managed by the NIC.
The information contained in each zone is implemented in two or more name servers.
Each name server, in turn, is a program that can be accessed over the Internet.
Clients send queries to name servers, and name servers respond with the requested information.
Sometimes the response contains the final answer that the client wants, and sometimes the response
contains a pointer to another server that the client should query next.
Hierarchy of name servers
Each name server implements the zone information as a collection of resource records. In essence, a
resource record is a name-to-value binding, or more specifically, a 5-tuple that contains the following
fields:
Name, Value, Type, Class, TTL
The Name and Value fields are exactly what you would expect, while the Type field specifies how the
Value should be interpreted.
For example, Type = A indicates that the Value is an IP address. Thus, A records implement the name-
to-address mapping we have been assuming.
NS: The Value field gives the domain name for a host that is running a name server that knows how to
resolve names within the specified domain.
CNAME: The Value field gives the canonical name for a particular host; it is used to define aliases.
MX: The Value field gives the domain name for a host that is running a mail server that accepts
messages for the specified domain.
The Class field was included to allow entities other than the NIC to define useful record types.
To date, the only widely used Class is the one used by the Internet; it is denoted IN. Finally, the TTL
field shows how long this resource record is valid.
It is used by servers that cache resource records from other servers; when the TTL expires, the server
must evict the record from its cache.
First, the root name server contains an NS record for each second-level server. It also has an A record
that translates this name into the corresponding IP address.
Taken together, these two records effectively implement a pointer from the root name server to each of
the second-level servers.
Note that some of these records give the final answer (e.g., the address for host
saturn.physics.princeton.edu), while others point to third-level name servers.
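The way NS and A records together act as pointers from one server to the next can be sketched with a small in-memory model. All server names and addresses below are invented for illustration (the addresses come from a range reserved for documentation), and each "server" is simply a list of 5-tuples; no real DNS queries are made.

```python
from typing import NamedTuple


class RR(NamedTuple):
    """A resource record: the 5-tuple (Name, Value, Type, Class, TTL)."""
    name: str
    value: str
    type: str
    rr_class: str
    ttl: int


# Hypothetical zone data, keyed by the address of the server that holds it.
SERVERS = {
    "root": [
        RR("princeton.edu", "dns.princeton.edu", "NS", "IN", 3600),
        RR("dns.princeton.edu", "192.0.2.1", "A", "IN", 3600),
    ],
    "192.0.2.1": [
        RR("physics.princeton.edu", "ns.physics.princeton.edu", "NS", "IN", 3600),
        RR("ns.physics.princeton.edu", "192.0.2.2", "A", "IN", 3600),
    ],
    "192.0.2.2": [
        RR("saturn.physics.princeton.edu", "192.0.2.10", "A", "IN", 3600),
    ],
}


def resolve(name, server="root"):
    """Follow NS/A pointers from server to server until an A record answers."""
    while True:
        records = SERVERS[server]
        for rr in records:
            if rr.type == "A" and rr.name == name:
                return rr.value                      # final answer
        referral = next((rr for rr in records if rr.type == "NS"
                         and (name == rr.name or name.endswith("." + rr.name))),
                        None)
        if referral is None:
            return None                              # nobody knows this name
        # The A record for the name server tells us where to ask next.
        server = next((rr.value for rr in records
                       if rr.type == "A" and rr.name == referral.value), None)
        if server is None:
            return None


if __name__ == "__main__":
    print(resolve("saturn.physics.princeton.edu"))
```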
Name Resolution
Resolving a name actually involves a client querying the local server, which in turn acts as a client
that queries the remote servers on the original client’s behalf.
Name resolution in practice, where the numbers 1–8 show the sequence of steps in the process.
Advantages
All the hosts in the Internet do not have to be kept up-to-date on where the current root servers are
located; only the servers have to know about the root.
The local server gets to see the answers that come back from queries that are posted by all the local
clients. The local server caches these responses and is sometimes able to resolve future queries
without having to go out over the network.
5.4 SNMP
Network management requires a protocol that allows us to read, and possibly write, various pieces of
state information on different network nodes. The most widely used protocol for this purpose is the Simple
Network Management Protocol (SNMP).
A client program uses SNMP to request a piece of information from the node in question.
An SNMP server running on that node receives the request, locates the appropriate piece of information,
and returns it to the client program, which then displays it to the user.
There is only one complication: exactly how does the client indicate which piece of information
it wants to retrieve, and likewise, how does the server know which variable in memory to read to satisfy the
request?
SNMP depends on a companion specification called the management information base (MIB).
The MIB defines the specific pieces of information (the MIB variables) that you can retrieve from a
network node.
The current version of MIB, called MIB-II, organizes variables into 10 different groups, including the
following:
System: general parameters of the system (node) as a whole, including where the node is located, how
long it has been up, and the system’s name.
Interfaces: information about all the network interfaces (adaptors) attached to this node, such as the
physical address of each interface and how many packets have been sent and received on each interface.
Address translation: information about the Address Resolution Protocol (ARP), and in particular, the
contents of its address translation table.
IP: variables related to IP, including its routing table, how many datagrams it has successfully
forwarded, and statistics about datagram reassembly. It includes counts of how many times IP drops a
datagram for one reason or another.
TCP: information about TCP connections, such as the number of passive and active opens, the number
of resets, the number of timeouts, default timeout settings, and so on.
Per-connection information persists only as long as the connection exists.
UDP: information about UDP traffic, including the total number of UDP datagrams that have been sent
and received.
Two issues remain. First, we need a precise syntax for the client to use to state which of the MIB
variables it wants to fetch.
Second, we need a precise representation for the values returned by the server. Both problems are
addressed using ASN.1.
Abstract Syntax Notation One (ASN.1) is an ISO standard that defines, among other things, a
representation for data sent over a network. The representation-specific part of ASN.1 is called the Basic
Encoding Rules (BER). ASN.1 also defines an object identification scheme.
The MIB uses this identification system to assign a globally unique identifier to each MIB variable.
These identifiers are given in a “dot” notation, not unlike domain names.
For example, 1.3.6.1.2.1.4.3 is the unique ASN.1 identifier for the IP-related MIB variable
ipInReceives; this variable counts the number of IP datagrams that have been received by this node.
Thus, network management works as follows. The SNMP client puts the ASN.1 identifier for the MIB
variable it wants to get into the request message, and it sends this message to the server.
The server then maps this identifier into a local variable (i.e., into a memory location where the value
for this variable is stored), retrieves the current value held in this variable, and uses ASN.1 BER to encode
the value it sends back to the client.
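To make the identifier encoding concrete, the sketch below converts a dotted identifier into its BER byte string. It covers only the object identifier type with a short (single-byte) length, which is enough for identifiers like the one above; the function names are illustrative and this is not a full ASN.1 implementation.

```python
def encode_arc(n):
    """Encode one identifier component in base 128, high bit set on all but the last byte."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def encode_oid(dotted):
    """BER-encode a dotted object identifier as tag, length, value."""
    arcs = [int(a) for a in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an object identifier needs at least two components")
    # The first two components share one byte: 40 * first + second.
    body = bytes([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        body += encode_arc(arc)
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + body  # 0x06 is the OBJECT IDENTIFIER tag


if __name__ == "__main__":
    print(encode_oid("1.3.6.1.2.1.4.3").hex(" "))  # identifier of ipInReceives
```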
5.5 TELNET
TELNET is the standard TCP/IP protocol for virtual terminal service as proposed by the International
Organization for Standardization (ISO).
Timesharing Environment
TELNET was designed at a time when most operating systems, such as UNIX, were operating in a
timesharing environment.
In such an environment, a large computer supports multiple users. The interaction between a user and
the computer occurs through a terminal, which is usually a combination of keyboard, monitor, and mouse.
Logging
In a timesharing environment, users are part of the system with some right to access resources.
Each authorized user has an identification and probably a password. The user identification defines the
user as part of the system.
To access the system, the user logs into the system with a user id or log-in name.
The system also includes password checking to prevent an unauthorized user from accessing the
resources.
LOCAL LOG IN
When a user logs into a local timesharing system, it is called local log-in. As a user types at a terminal
or at a workstation running a terminal emulator, the keystrokes are accepted by the terminal driver.
The terminal driver passes the characters to the operating system. The operating system, in turn,
interprets the combination of characters and invokes the desired application program or utility.
REMOTE LOG IN
When a user wants to access an application program or utility located on a remote machine, then it is
called remote log-in.
Here the TELNET client and server programs come into use. The user sends the keystrokes to the
terminal driver, where the local operating system accepts the characters but does not interpret them.
The characters are sent to the TELNET client, which transforms the characters to a universal character
set called network virtual terminal (NVT) characters and delivers them to the local TCP/IP protocol stack.
The commands or text, in NVT form, travel through the Internet and arrive at the TCP/IP stack at the
remote machine.
Here the characters are delivered to the operating system and passed to the TELNET server, which
changes the characters to the corresponding characters understandable by the remote computer.
However, the characters cannot be passed directly to the operating system, because the remote operating
system is designed to receive characters from a terminal driver, not from a TELNET server.
The solution is to add a piece of software called a pseudoterminal driver, which pretends that the
characters are coming from a terminal.
The operating system then passes the characters to the appropriate application program.
Network Virtual Terminal
We are dealing with heterogeneous systems.
If we want to access any remote computer in the world, we must first know what type of computer we
will be connected to, and we must also install the specific terminal emulator used by that computer.
TELNET solves this problem by defining a universal interface called the network virtual terminal
(NVT) character set.
Via this interface, the client TELNET translates characters (data or commands) that come from the
local terminal into NVT form and delivers them to the network.
The server TELNET, on the other hand, translates data and commands from NVT form into the form
acceptable by the remote computer.
Concept of NVT
NVT uses two sets of characters, one for data and the other for control. Both are 8-bit bytes. For data,
NVT is an 8-bit character set in which the 7 lowest-order bits are the same as ASCII and the
highest-order bit is 0.
To send control characters between computers (from client to server or vice versa), NVT uses an 8-bit
character set in which the highest-order bit is set to 1.
Embedding
TELNET uses only one TCP connection. The server uses the well-known port 23, and the client uses an
ephemeral port. The same connection is used for sending both data and control characters.
TELNET accomplishes this by embedding the control characters in the data stream. However, to distinguish
data from control characters, each sequence of control characters is preceded by a special control character
called interpret as control (IAC).
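The embedding idea can be illustrated by a function that separates one incoming byte stream into data and control sequences. The numeric codes (IAC = 255, WILL/WONT/DO/DONT = 251 to 254) and the doubling of IAC to send a literal byte 255 come from the TELNET specification rather than from these notes; subnegotiation sequences are ignored in this sketch.

```python
IAC = 255                                  # "interpret as control"
WILL, WONT, DO, DONT = 251, 252, 253, 254  # option negotiation commands


def split_stream(stream):
    """Separate a received byte stream into (data, control commands)."""
    data = bytearray()
    commands = []
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != IAC:
            data.append(byte)              # ordinary data character
            i += 1
        elif i + 1 >= len(stream):
            break                          # incomplete sequence at the end
        elif stream[i + 1] == IAC:
            data.append(IAC)               # doubled IAC means a literal 255
            i += 2
        elif stream[i + 1] in (WILL, WONT, DO, DONT):
            if i + 2 >= len(stream):
                break
            commands.append((stream[i + 1], stream[i + 2]))  # command + option
            i += 3
        else:
            commands.append((stream[i + 1], None))           # two-byte command
            i += 2
    return bytes(data), commands


if __name__ == "__main__":
    incoming = b"ls" + bytes([IAC, DO, 1]) + b"\r\n"
    print(split_stream(incoming))
```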
Options
TELNET lets the client and server negotiate options before or during the use of the service.
Options are extra features available to a user with a more sophisticated terminal.
Mode of Operation
TELNET operates in one of three modes: default mode, character mode, or line mode.
Default Mode The default mode is used if no other modes are invoked through option negotiation. In
this mode, the echoing is done by the client. The user types a character, and the client echoes the character
on the screen (or printer) but does not send it until a whole line is completed.
Character Mode In the character mode, each character typed is sent by the client to the server. The
server normally echoes the character back to be displayed on the client screen.
Line Mode A new mode has been proposed to compensate for the deficiencies of the default mode and the
character mode. In this mode, called the line mode, line editing (echoing, character erasing, line erasing, and so
on) is done by the client. The client then sends the whole line to the server.
5.6 FILE TRANSFER PROTOCOL(FTP):
Transferring files from one computer to another is one of the most common tasks expected from a
networking or internetworking environment. As a matter of fact, the greatest volume of data exchange in the
Internet today is due to file transfer. In this section, we discuss one popular protocol involved in transferring
files: File Transfer Protocol (FTP).
File Transfer Protocol (FTP)
File Transfer Protocol (FTP) is the standard mechanism provided by TCP/IP for copying a file from one
host to another. Although transferring files from one system to another seems simple and straightforward,
some problems must be dealt with first. For example, two systems may use different file name conventions.
Two systems may have different ways to represent text and data. Two systems may have different directory
structures. All these problems have been solved by FTP in a very simple and elegant approach.
FTP differs from other client/server applications in that it establishes two connections between the hosts.
One connection is used for data transfer, the other for control information (commands and responses).
Separation of commands and data transfer makes FTP more efficient. The control connection uses very
simple rules of communication. We need to transfer only a line of command or a line of response at a time.
The data connection, on the other hand, needs more complex rules due to the variety of data types
transferred. However, the difference in complexity is at the FTP level, not TCP. For TCP, both connections
are treated the same.
FTP uses two well-known TCP ports: port 21 is used for the control connection, and port 20 is used for the
data connection.
Figure 26.21 shows the basic model of FTP. The client has three components: user interface, client control
process, and the client data transfer process. The server has two components: the server control process and
the server data transfer process. The control connection is made between the control processes. The data
connection is made between the data transfer processes.
The control connection remains connected during the entire interactive FTP session. The data connection is
opened and then closed for each file transferred. It opens each time commands that involve transferring files
are used, and it closes when the file is transferred. In other words, when a user starts an FTP session, the
control connection opens.
While the control connection is open, the data connection can be opened and closed multiple times if
several files are transferred.
Communication over Control Connection
Communication over Data Connection
ftp>quit
221 Goodbye.
The example session proceeds as follows:
1. After the control connection is created, the FTP server sends the 220 (service ready) response on the
control connection.
2. The client sends its name.
3. The server responds with 331 (user name is OK, password is required).
4. The client sends the password (not shown).
5. The server responds with 230 (user log-in is OK).
6. The client sends the list command (ls reports) to find the list of files in the directory named reports.
7. Now the server responds with 150 and opens the data connection.
8. The server then sends the list of the files or directories (as a file) on the data connection. When the
whole list (file) is sent, the server responds with 226 (closing data connection) over the control
connection.
9. The client now has two choices. It can use the QUIT command to request the closing of the control
connection, or it can send another command to start another activity (and eventually open another
data connection). In our example, the client sends a QUIT command.
10. After receiving the QUIT command, the server responds with 221 (service closing) and then
closes the control connection.
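The reply codes in the steps above can be traced with a short Python sketch. It is not an FTP client: the transcript is hard-coded, the reply texts are placeholders, and the names TRANSCRIPT, parse_reply, and data_connection_events are illustrative assumptions. Only the three-digit codes come from the walk-through.

```python
# Control-connection replies from the example session, in order.
# Reply texts are placeholders; only the codes come from the steps above.
TRANSCRIPT = [
    "220 Service ready",
    "331 User name OK, password required",
    "230 User logged in",
    "150 Opening data connection",
    "226 Closing data connection",
    "221 Service closing",
]


def parse_reply(line):
    """Split one control-connection reply line into (code, text)."""
    code, _, text = line.partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP reply: {line!r}")
    return int(code), text


def data_connection_events(lines):
    """Return the open/close events of the data connection implied by the replies."""
    events = []
    for line in lines:
        code, _ = parse_reply(line)
        if code == 150:      # server is about to open the data connection
            events.append("open")
        elif code == 226:    # transfer finished, data connection closed
            events.append("close")
    return events


if __name__ == "__main__":
    for line in TRANSCRIPT:
        print(parse_reply(line))
    print(data_connection_events(TRANSCRIPT))
```

The control connection carries all six replies, while the data connection opens and closes once for the single directory listing.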
Anonymous FTP
5.7 NETWORK SECURITY
The sender applies an encryption function to the original plaintext message, and the resulting ciphertext
message is sent over the network.
The receiver applies a reverse function (called decryption) to recover the original plaintext.
The encryption/decryption process generally depends on a secret key shared between the sender and the
receiver.
This familiar use of cryptography is designed to ensure privacy—preventing the unauthorized release of
information.
Cryptography can also provide other services, including authentication (verifying the identity of the
remote participant) and integrity (making sure that the message has not been altered).
The three most common cryptographic algorithms are the Data Encryption Standard (DES); Rivest, Shamir, and
Adleman (RSA); and Message Digest 5 (MD5).
Cryptographic Algorithms
There are three types of cryptographic algorithms: secret key algorithms, public key algorithms, and
hashing algorithms.
Secret key algorithms are symmetric in the sense that both participants in the communication share a
single key.
DES (Data Encryption Standard) is the best-known example of a secret key encryption function, while
IDEA (International Data Encryption Algorithm) is another.
Secret key encryption
In contrast to a pair of participants sharing a single secret key, public key cryptography involves each
participant having a private key that is shared with no one else and a public key that is published so
everyone knows it.
To send a secure message to this participant, you encrypt the message using the widely known public
key.
The participant then decrypts the message using his or her private key.
Rivest, Shamir, and Adleman (RSA) is the best-known public key encryption algorithm.
The third type of cryptographic algorithm is called a hash or message digest function.
The idea is to map a potentially large message into a small fixed-length number, analogous to the way a
regular hash function maps values from a large space into values from a small space.
The best way to think of a cryptographic hash function is that it computes a cryptographic checksum
over a message. That is, just as a regular checksum protects the receiver from accidental changes to the
message, a cryptographic checksum protects the receiver from malicious changes to the message.
The most widely used cryptographic checksum algorithm is Message Digest version 5 (MD5).
Taxonomy of network security
Requirements
The basic requirement for an encryption algorithm is that it be able to turn plaintext into
ciphertext in such a way that only the intended recipient—the holder of the decryption key—can recover
the plaintext.
DES encrypts a 64-bit block of plaintext using a 64-bit key. The key actually contains only 56 usable
bits: the last bit of each of the 8 bytes in the key is a parity bit for that byte.
DES has three distinct phases:
1. The 64 bits in the block are permuted (shuffled).
2. Sixteen rounds of an identical operation are applied to the resulting data and the key.
3. The inverse of the original permutation is applied to the result.
During each round, the 64-bit block is broken into two 32-bit halves, and a different 48 bits are selected
from the 56-bit key. If we denote the left and right halves of the block at round i as L_i and R_i, respectively,
and the 48-bit key at round i as K_i, then these three pieces are combined during round i according to the
following rule:
L_i = R_{i−1}
R_i = L_{i−1} ⊕ F(R_{i−1}, K_i)
where F is a combiner function described below and ⊕ is the exclusive-OR (XOR) operator.
Key generation
We now need to define function F and show how each K_i is derived from the 56-bit key. We start with
the key. Initially, the 56-bit key is permuted according to the following table.
Note that every eighth bit is ignored (i.e., bit 64 is missing from the table), reducing the key from 64
bits to 56 bits. Then for each round, the current 56 bits are divided into two 28-bit halves and each half is
independently rotated left either one or two bit positions, depending on the round.
DES key rotation amount per round.
DES compression permutation
The 56 bits that result from this shift are used both as input for the next round (i.e., the preceding shift is
repeated) and to select the 48 bits that make up the key for the current round.
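The per-round rotation of the two 28-bit halves can be sketched in Python. The rotation table itself did not survive in these notes, so the ROTATIONS list below is an assumption taken from the published DES standard, and the helper names are illustrative. The compression permutation that picks the 48 key bits is not shown.

```python
# Left-rotation amount for each of the 16 rounds.
# Assumption: values as listed in the DES standard, not in the text above.
ROTATIONS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

MASK28 = (1 << 28) - 1


def rotl28(half, n):
    """Rotate a 28-bit value left by n bit positions."""
    return ((half << n) | (half >> (28 - n))) & MASK28


def rotated_halves(c0, d0):
    """Yield the two 28-bit halves after each round's rotation."""
    c, d = c0, d0
    for n in ROTATIONS:
        # Each half is rotated independently; the result feeds the next round.
        c, d = rotl28(c, n), rotl28(d, n)
        yield c, d


if __name__ == "__main__":
    for rnd, (c, d) in enumerate(rotated_halves(0x0F0F0F0, 0x5555555), start=1):
        print(f"round {rnd:2d}: C={c:07x} D={d:07x}")
```

The rotation amounts add up to 28, so after the sixteenth round both halves are back at their starting values.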
First, function F expands R from 32 bits into 48 bits so that it can be combined with the 48-bit K. It does
this by breaking R into eight 4-bit chunks and expanding each chunk into 6 bits by stealing the rightmost
and leftmost bit from the left and right adjacent 4-bit chunks, respectively.
Next, the 48-bit K is divided into eight 6-bit chunks, and each chunk is XORed with the corresponding
chunk that resulted from the previous expansion of R. Finally, each resulting 6-bit value is fed through
something called a substitution box (S box), which reduces each 6-bit chunk back into 4 bits.
Expansion phase of DES.
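The expansion step described above can be written as a few lines of Python. This is a sketch that works on a list of 32 values rather than a packed integer. It assumes the chunks wrap around, so the first chunk borrows from the last and vice versa, and the function name expand is illustrative.

```python
def expand(r_bits):
    """Expand a list of 32 bits into 48 bits, one 4-bit chunk at a time."""
    assert len(r_bits) == 32
    out = []
    for j in range(8):
        start = 4 * j
        out.append(r_bits[(start - 1) % 32])   # rightmost bit of the chunk to the left
        out.extend(r_bits[start:start + 4])    # the 4-bit chunk itself
        out.append(r_bits[(start + 4) % 32])   # leftmost bit of the chunk to the right
    return out


if __name__ == "__main__":
    # Feeding in the bit positions 1..32 shows which input bit lands where.
    positions = expand(list(range(1, 33)))
    for j in range(8):
        print(positions[6 * j:6 * j + 6])
```

Each 4-bit chunk becomes 6 bits, so the eight chunks together give the 48 bits that are XORed with K.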
Notice that the preceding description does not distinguish between encryption and decryption. One of
the nice features of DES is that both sides of the algorithm work exactly the same.
The only difference is that the keys are applied in the reverse order, that is, K16, K15, . . . , K1.
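The round rule and the reversed key order can be checked with a small Python sketch. It is not DES. The round function toy_f is a placeholder for the real F, the sixteen round keys are made up, and the initial and final permutations are left out. The swap of the two halves at the end is a standard detail of this construction that the text above does not spell out.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right, key):
    """Placeholder combiner; the real DES F uses expansion and S boxes."""
    return ((right * 0x9E3779B1) ^ key) & MASK32


def feistel(block, round_keys):
    """Run a 64-bit block through one pass of rounds with the given keys."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for key in round_keys:
        # L_i = R_{i-1};  R_i = L_{i-1} XOR F(R_{i-1}, K_i)
        left, right = right, left ^ toy_f(right, key)
    # Swap the halves so that the same routine also decrypts.
    return (right << 32) | left


KEYS = [0x1000 + i for i in range(16)]   # 16 made-up round keys K1..K16


def encrypt(block):
    return feistel(block, KEYS)


def decrypt(block):
    # Same algorithm, keys applied in the order K16, K15, ..., K1.
    return feistel(block, KEYS[::-1])


if __name__ == "__main__":
    plaintext = 0x0123456789ABCDEF
    ciphertext = encrypt(plaintext)
    print(hex(ciphertext), hex(decrypt(ciphertext)))
```

Decryption works for any choice of toy_f, because each round only XORs the output of F into one half and the reverse pass XORs the same value out again.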
To encrypt a longer message using DES, a technique known as cipher block chaining (CBC) is typically
used.
The idea of CBC is simple: The ciphertext for block i is XORed with the plaintext for block i + 1 before
running it through DES.
An initialization vector (IV) is used in lieu of the nonexistent ciphertext for block 0.
This vector IV, which is a random number generated by the sender, is sent along with the message so
that the first block of plaintext can be retrieved.
Cipher block chaining (CBC) for large messages
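The chaining itself is independent of the block cipher, so it can be sketched in Python with a stand-in cipher. Here toy_encrypt and toy_decrypt are deliberately trivial placeholders for DES (an assumption made to keep the example short), and all function names are illustrative.

```python
import os

BLOCK = 8  # bytes, matching the 64-bit DES block


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(block, key):
    """Stand-in for DES: XOR with the key, then rotate the bytes left by one."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt(block, key):
    unrotated = block[-1:] + block[:-1]
    return xor(unrotated, key)


def cbc_encrypt(blocks, key, iv):
    prev, out = iv, []
    for plain in blocks:
        # XOR the previous ciphertext (or the IV) into the plaintext first.
        cipher = toy_encrypt(xor(plain, prev), key)
        out.append(cipher)
        prev = cipher
    return out


def cbc_decrypt(blocks, key, iv):
    prev, out = iv, []
    for cipher in blocks:
        out.append(xor(toy_decrypt(cipher, key), prev))
        prev = cipher
    return out


if __name__ == "__main__":
    key = b"8bytekey"
    iv = os.urandom(BLOCK)                  # random IV, sent along with the message
    message = [b"AAAAAAAA", b"AAAAAAAA"]
    ciphertext = cbc_encrypt(message, key, iv)
    print(ciphertext, cbc_decrypt(ciphertext, key, iv))
```

Two identical plaintext blocks produce different ciphertext blocks, because each one is mixed with a different previous block before encryption.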
We conclude by noting that there is no published mathematical proof that DES is secure.
What security it achieves it does through the application of two techniques: confusion and diffusion.
What we can say is that the only known way to break DES is to exhaustively search all possible 2^56
keys, although on average you would expect to have to search only half of the key space, or
2^55 = 3.6×10^16 keys. A search of this size is within reach of modern hardware.
For this reason, many applications now use triple-DES (3DES), that is, encrypt the data three times.
This can be done with three separate keys, or with two keys: The first is used, then the second, and finally
the first key is used again.
Public Key Encryption (RSA)
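RSA involves different keys for encryption (public key) and decryption (private key).
To generate the two keys, choose two large prime numbers p and q, and multiply them together to get n.
Then choose the encryption key e such that e and (p − 1) × (q − 1) are relatively prime, and compute the
decryption key d such that d = e^−1 mod ((p − 1) × (q − 1)). The public key is the pair (e, n) and the
private key is the pair (d, n).
Given these two keys, encryption is defined by the following formula:
c = m^e mod n
and decryption is defined by
m = c^d mod n
where m is the plaintext message and c is the resulting ciphertext.
Note that m must be less than n, which means that for a 1024-bit n it can be no more than 1024 bits long.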
Example
Suppose we pick p = 7 and q = 11. Then n = 7 × 11 = 77 and (p − 1) × (q − 1) = 60,
so we need to pick a value of e that is relatively prime to 60. We choose e = 7; 7 and 60 have no common
factor except 1.
d = 7^−1 mod ((7 − 1) × (11 − 1)), which is to say 7 × d = 1 mod 60.
It turns out that d = 43, since 7 × 43 = 301 = 1 mod 60.
So now we have the public key (e, n) = (7, 77) and the private key (d, n) = (43, 77).
Now consider a simple encryption operation. Suppose we want to encrypt a message containing the value 9.
Following the encryption algorithm above:
c = m^e mod n
= 9^7 mod 77
= 37
So 37 is the ciphertext that we would send.
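The arithmetic of this example can be reproduced directly in Python. The sketch uses the built-in three-argument pow, which also computes modular inverses from Python 3.8 onward. The numbers are the toy values from the example and are far too small for real use.

```python
# Toy RSA key generation with the primes from the worked example.
p, q = 7, 11
n = p * q                      # modulus
phi = (p - 1) * (q - 1)        # (p - 1) x (q - 1)
e = 7                          # public exponent, relatively prime to phi
d = pow(e, -1, phi)            # private exponent: e * d = 1 mod phi

# Encrypt and decrypt the message value 9.
m = 9
c = pow(m, e, n)               # c = m^e mod n
recovered = pow(c, d, n)       # m = c^d mod n

if __name__ == "__main__":
    print("public key:", (e, n), "private key:", (d, n))
    print("ciphertext:", c, "recovered plaintext:", recovered)
```

Decrypting the ciphertext 37 with the private key gives back the original value 9.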
There are a number of popular message digest algorithms known as MDn for various values of n. MD5
is the most widely used at the time of writing.
The secure hash algorithm (SHA) is another well-known message digest function.
These functions all have the same purpose, which is to compute a fixed-length cryptographic checksum
from an arbitrarily long input message.
These algorithms operate on a message 512 bits at a time, so the first step is to pad the message to a
multiple of 512 bits.
Overview of message digest operation
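Both properties, the fixed-length output and the padding to a multiple of 512 bits, can be observed with Python's standard hashlib module. The padded_length_bits helper is an illustrative name, and it assumes the padding rule of the MD5 specification (at least one padding bit plus a 64-bit length field), which is a detail not stated in the text above. On some restricted Python builds hashlib.md5 may be unavailable.

```python
import hashlib


def md5_hex(message):
    """Return the MD5 checksum of a byte string as hexadecimal text."""
    return hashlib.md5(message).hexdigest()


def padded_length_bits(n_bytes):
    """Length in bits after padding a message of n_bytes to a multiple of 512.

    Assumption: at least one padding bit and a 64-bit length field are
    always appended, as in the MD5 specification.
    """
    bits = 8 * n_bytes
    return ((bits + 64) // 512 + 1) * 512


if __name__ == "__main__":
    for message in (b"", b"abc", b"x" * 10_000):
        digest = md5_hex(message)
        # The digest is always 128 bits (32 hex digits), whatever the input size.
        print(len(message), padded_length_bits(len(message)), digest)
```

A 55-byte message still fits in one 512-bit block after padding, while a 56-byte message needs two.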
The second pass looks pretty much the same as the first pass (especially if your eyes are glazing over). The
differences are the following:
• The constants T1 through T16 are replaced by another set (T17 through T32).
• The amount of the left rotation is {5, 9, 14, 20, 5, 9, . . .} at each step.
• Instead of taking the bytes of the message in order m0 through m15, the message byte that is used at
stage i is m_j with j = (5i + 1) mod 16.
In the third pass:
• G is replaced by yet another function H, which is just the XOR of its arguments.
• The amount of the left rotation is {4, 11, 16, 23, 4, 11, . . .} at each step.
• The message byte that is used at stage i is m_j with j = (3i + 5) mod 16.
The fourth pass has the following properties:
• H is replaced by the function I, which is a combination of bitwise XOR, OR, and NOT on its arguments.
• The amount of the left rotation is {6, 10, 15, 21, 6, 10, . . .} at each step.
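The message orderings of the second and third passes are easy to tabulate in Python. This short sketch only computes the indices, with stages numbered from 0, and the function name message_order is illustrative.

```python
def message_order(a, b):
    """Index of the message piece used at stages 0..15: (a*i + b) mod 16."""
    return [(a * i + b) % 16 for i in range(16)]


SECOND_PASS = message_order(5, 1)   # (5i + 1) mod 16
THIRD_PASS = message_order(3, 5)    # (3i + 5) mod 16

if __name__ == "__main__":
    print("second pass:", SECOND_PASS)
    print("third pass: ", THIRD_PASS)
```

Because 5 and 3 have no common factor with 16, each ordering visits all sixteen pieces of the block exactly once.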
Authentication Protocols
This section describes three common protocols for implementing authentication.
The first two use secret key cryptography (e.g., DES), while the third uses public key cryptography
(e.g., RSA).
A simple authentication protocol is possible when the two participants who want to authenticate each other
(think of them as a client and a server) already share a secret key.
Three-Way Handshake
The client first selects a random number x and encrypts it using its secret key, which we denote as
CHK (client handshake key).
The client then sends E(x, CHK), along with an identifier for itself (ClientId), to the server.
The server uses the key it thinks corresponds to client ClientId (call it SHK for server handshake
key) to decrypt the random number. The server adds 1 to the number it recovers and sends the result
back to the client.
It also sends back a random number y that has been encrypted with SHK.
Next, the client decrypts the first half of this message and if the result is 1 more than the random
number x that it sent to the server, it knows that the server possesses its secret key.
At this point, the client has authenticated the server. The client also decrypts the random number
the server sent it (this should yield y),
encrypts this number plus 1, and sends the result to the server. If the server is able to recover y + 1,
then it knows the client is legitimate.
After the third message, each side has authenticated itself to the other. The fourth message
corresponds to the server sending the client a session key (SK), encrypted using SHK (which is equal
to CHK).
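The four messages can be simulated in a single Python function. This is a sketch under stated assumptions: E and D are a toy 64-bit cipher built from SHA-256 purely for illustration (the text suggests DES), both parties run inside one process, and the function and variable names are made up.

```python
import hashlib
import secrets

MASK64 = (1 << 64) - 1


def _f(half, key, rnd):
    data = key + bytes([rnd]) + half.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def E(value, key):
    """Toy keyed cipher on 64-bit integers (illustration only, not secure)."""
    value &= MASK64
    left, right = value >> 32, value & 0xFFFFFFFF
    for rnd in range(4):
        left, right = right, left ^ _f(right, key, rnd)
    return (left << 32) | right


def D(value, key):
    """Inverse of E for the same key."""
    left, right = value >> 32, value & 0xFFFFFFFF
    for rnd in reversed(range(4)):
        left, right = right ^ _f(left, key, rnd), left
    return (left << 32) | right


def handshake(chk, shk):
    """Return the session key the client ends up with, or None on failure."""
    x = secrets.randbits(32)
    msg1 = E(x, chk)                                  # 1. client -> server: ClientId, E(x, CHK)
    y = secrets.randbits(32)
    msg2 = (E(D(msg1, shk) + 1, shk), E(y, shk))      # 2. server -> client: E(x + 1), E(y)
    if D(msg2[0], chk) != x + 1:
        return None                                   # client rejects the server
    msg3 = E(D(msg2[1], chk) + 1, chk)                # 3. client -> server: E(y + 1)
    if D(msg3, shk) != y + 1:
        return None                                   # server rejects the client
    session_key = secrets.randbits(64)
    return D(E(session_key, shk), chk)                # 4. server -> client: E(SK, SHK)


if __name__ == "__main__":
    print("matching keys:", handshake(b"shared", b"shared") is not None)
    print("mismatched keys:", handshake(b"shared", b"wrong") is not None)
```

When CHK and SHK differ, the value the client recovers from the second message does not equal x + 1, so the exchange stops before any session key is delivered.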
The second protocol relies on a trusted third party, an authentication server. The one we describe
is the one used in Kerberos, a TCP/IP-based security system developed at MIT.
In the following, we denote the two participants who want to authenticate each other as A and B,
and we call the trusted authentication server S.
The Kerberos protocol assumes that A and B each share a secret key with S; we denote these two
keys as KA and KB, respectively.
Third-party authentication in Kerberos
As illustrated in the figure, participant A first sends a message to server S that identifies both itself and
B.
The server then generates a timestamp T, a lifetime L, and a new session key K. Timestamp T is
going to serve much the same purpose as the random number in the simple three-way handshake
protocol given above, plus it is used in conjunction with L to limit the amount of time that session key
K is valid.
Participants A and B will have to go back to server S to get a new session key when this time
expires. The idea here is to limit the vulnerability of any one session key.
Server S then replies to A with a two-part message. The first part encrypts the three values T, L, and K,
along with the identifier for participant B, using the key that the server shares with A.
The second part encrypts the three values T, L, and K, along with participant A’s identifier, but this
time using the key that the server shares with B.
Clearly, when A receives this message, it will be able to decrypt the first part but not the second
part. A simply passes this second part on to B, along with the encryption of A and T using the new
session key K.
Finally, B decrypts the part of the message from A that was originally encrypted by S, and in so
doing, recovers T, K, and A.
It uses K to decrypt the half of the message encrypted by A and, upon seeing that A and T are
consistent in the two halves of the message, replies with a message that encrypts T + 1 using the new
session key K.
A and B can now communicate with each other using the shared secret session key K to ensure
privacy.
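The message flow can be traced with a small Python model. Encryption is represented by a Sealed wrapper that can only be opened with the key that sealed it, which is a modelling shortcut rather than real cryptography. All class and function names here are illustrative and are not part of Kerberos.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for E(payload, key): open() only succeeds with the same key."""
    key: bytes
    payload: object

    def open(self, key):
        if key != self.key:
            raise PermissionError("wrong key")
        return self.payload


def authentication_server(a, b, shared_keys, now, lifetime):
    """S creates session key K and seals one part for A and one part for B."""
    k = secrets.token_bytes(8)
    part_for_a = Sealed(shared_keys[a], (now, lifetime, k, b))
    part_for_b = Sealed(shared_keys[b], (now, lifetime, k, a))
    return part_for_a, part_for_b


def run_exchange(now=1000, lifetime=300):
    ka, kb = secrets.token_bytes(8), secrets.token_bytes(8)
    shared_keys = {"A": ka, "B": kb}            # known only to S and each owner
    # A -> S: "A, B";  S -> A: the two sealed parts
    part_for_a, part_for_b = authentication_server("A", "B", shared_keys, now, lifetime)
    t, _lifetime, k, _peer = part_for_a.open(ka)
    # A -> B: E((A, T), K) together with the part sealed for B
    from_a = Sealed(k, ("A", t))
    t_b, _l, k_b, claimed = part_for_b.open(kb)
    sender, t_seen = from_a.open(k_b)
    if (sender, t_seen) != (claimed, t_b):
        return False                             # the two halves are inconsistent
    # B -> A: E(T + 1, K)
    reply = Sealed(k_b, t_b + 1)
    return reply.open(k) == t + 1


if __name__ == "__main__":
    print("exchange succeeded:", run_exchange())
```

A never opens the part sealed for B. It only forwards it, and B learns K and A's identity from that part alone.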
Public Key Authentication
Our final authentication protocol uses public key cryptography.
The public key protocol is a useful one because the two sides need not share a secret key; they only
need to know the other side’s public key.
Participant A encrypts a random number x using participant B’s public key, and B proves it knows
the corresponding private key by decrypting the message and sending x back to A. A could authenticate
itself to B in exactly the same way.
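This challenge and response fits in a few lines of Python. The key pair below is a textbook-sized toy (n = 61 × 53 = 3233, e = 17, d = 2753), chosen here only for illustration, and the function names are assumptions.

```python
import secrets

# Toy RSA key pair for participant B (far too small for real use).
B_PUBLIC = (17, 3233)      # (e, n)
B_PRIVATE = (2753, 3233)   # (d, n)


def challenge(public):
    """A picks a random x and encrypts it with B's public key."""
    e, n = public
    x = secrets.randbelow(n - 2) + 2
    return x, pow(x, e, n)


def respond(ciphertext, private):
    """B decrypts the challenge with its private key and returns the result."""
    d, n = private
    return pow(ciphertext, d, n)


if __name__ == "__main__":
    x, ciphertext = challenge(B_PUBLIC)
    answer = respond(ciphertext, B_PRIVATE)
    print("B authenticated:", answer == x)
```

A accepts B only if the returned value equals the x it chose, which requires knowledge of the private key.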
The plaintext message plus the MIC would be transmitted to the receiver, with the MIC acting as a sort of
checksum: if the receiver could not reproduce the attached MIC using the secret key it shares with the
sender, then either the message was not sent by the sender, or it was modified since it was transmitted.
The second and third approaches use MD5.
Digital Signature Using RSA
A digital signature is a special case of a message integrity code, where the code can have been
generated only by one participant.
Since a given participant is the only one that knows its own private key, the participant uses this key to
produce the signature. Any other participant can verify this signature using the corresponding public key.
To sign a message, you encrypt it using your private key, and to verify a signature, you decrypt it using
the public key of the purported sender.
Keyed MD5
MD5 produces a cryptographic checksum for a message.
Suppose that we can arrange for the sender and receiver of a message to share a secret key k. This might be
done by preconfiguration of the key, or by some more dynamic mechanism such as Kerberos.
The sender then runs MD5 over the concatenation of the message (denoted m) and this key. In practice,
the key k is attached to the end of the message for the purpose of running MD5; k is then removed from the
message once MD5 is finished.
m + MD5(m + k)
The sender picks k at random, encrypts it using RSA and the receiver’s public key, and
then encrypts the result with its own private key. The result can now be sent to the receiver along with the
original message and the MD5 checksum.
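The checksum part of keyed MD5, m + MD5(m + k), can be sketched with Python's hashlib. The sketch assumes the key k is already shared and leaves out the RSA-protected transfer of k. The function names are illustrative.

```python
import hashlib
import hmac


def keyed_md5(m, k):
    """MD5 over the message with the key attached to its end."""
    return hashlib.md5(m + k).digest()


def send(m, k):
    """Return what goes on the wire: the message and its keyed checksum."""
    return m, keyed_md5(m, k)


def verify(m, checksum, k):
    """Recompute the checksum with the shared key and compare."""
    return hmac.compare_digest(keyed_md5(m, k), checksum)


if __name__ == "__main__":
    key = b"shared secret"
    message, checksum = send(b"transfer 10 units", key)
    print("intact:", verify(message, checksum, key))
    print("altered:", verify(b"transfer 99 units", checksum, key))
```

The key itself is never transmitted with the message, so someone who alters m cannot recompute a matching checksum.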
The third approach combines MD5 with an RSA signature. The sender does not sign the entire message; it
just signs the checksum. The original message, the MD5 checksum, and the RSA signature for the checksum
are then transmitted.
m + E(MD5(m), private)
The receiver verifies the message by running the MD5 algorithm on the received message, decrypting
the signed checksum with the sender’s public key, and comparing the two checksums.
If they match, this means that the message was not modified since the time the sender computed the
MD5 checksum and signed it.
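Signing and verifying a checksum can be sketched in Python with a toy RSA key pair (n = 3233, e = 17, d = 2753). Because this modulus is tiny, the 128-bit MD5 value is reduced mod n before signing. That reduction is an assumption made only to keep the example small, and the function names are illustrative.

```python
import hashlib

PUBLIC = (17, 3233)      # sender's public key (e, n), toy size
PRIVATE = (2753, 3233)   # sender's private key (d, n)


def checksum_as_int(m, n):
    """MD5 of the message, reduced mod n so it fits the toy key."""
    return int.from_bytes(hashlib.md5(m).digest(), "big") % n


def sign(m, private):
    """E(MD5(m), private): encrypt the checksum with the private key."""
    d, n = private
    return pow(checksum_as_int(m, n), d, n)


def verify(m, signature, public):
    """Decrypt the signature with the public key and compare checksums."""
    e, n = public
    return pow(signature, e, n) == checksum_as_int(m, n)


if __name__ == "__main__":
    message = b"pay 10 units"
    signature = sign(message, PRIVATE)
    print("valid signature:", verify(message, signature, PUBLIC))
    print("tampered signature:", verify(message, (signature + 1) % 3233, PUBLIC))
```

Only the short checksum goes through RSA, which is the point of this approach: the expensive public key operation is applied to a few bytes, not to the entire message.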
Suppose participant A wants to convey his public key to participant B. He can’t just use email or a
bulletin board to send it, because without A’s public key, B has no way to authenticate the key as having
really come from A.
Some third party could send a public key to B and claim that the message came from A.
If A and B are individuals who know each other, then they can get together in the same room and
A can give his public key to B directly, perhaps on a business card. However, there are clear shortcomings to
this approach, such as the inability to receive a key from someone unless you can be in the same room with
them.
The basic solution to the problem relies on the use of digital certificates. The following sections explain
what certificates are and some issues that arise in using them to achieve widespread key distribution.
Certificates
A digital signature proves that the data was generated by the owner of a certain key and that it has not
been modified since it was signed.
A certificate is a special type of digitally signed document.
The document says, in effect, “I certify that the public key in this document belongs to the entity
named in this document, signed X.” X in this case could be anyone with a public key.
It is commonly the case that X would be a certification authority (CA), that is, an administrative entity
that is in the business of issuing certificates.
It should be clear that this certificate is only useful to a participant who already holds the public key for
X because that key is needed to verify the signature.
Thus, certificates do not in themselves solve the key distribution problem, but they give us a way to
make inroads on it.
Clearly, once you have a public key for one entity X, you can start to accumulate more public keys from
other participants if those participants can get certificates issued by X.
If X certifies that a certain public key belongs to Y, and then Y goes on to certify that another public key
belongs to Z, then there exists a chain of certificates from X to Z, even though X and Z may have never met.
If Z wants to provide his public key to A, he can provide the complete chain of certificates—the
certificate for Y’s public key issued by X, and the certificate for Z’s key issued by Y.
If A has the public key for X, he can use the chain to verify that the public key of Z is legitimate.
With this idea of building chains of trust, public key distribution becomes somewhat more tractable.
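Walking a chain from X through Y to Z can be sketched in Python. The certificates are plain dictionaries, the signatures use toy RSA keys with an MD5 checksum reduced mod n, and every name and key value is an illustrative assumption. Real certificates carry more fields and use much larger keys.

```python
import hashlib

# Toy key pairs as ((e, n), (d, n)); Z only needs a public key here.
X_PUBLIC, X_PRIVATE = (17, 3233), (2753, 3233)
Y_PUBLIC, Y_PRIVATE = (7, 77), (43, 77)
Z_PUBLIC = (3, 55)


def checksum(subject, public_key, n):
    data = f"{subject}:{public_key}".encode()
    return int.from_bytes(hashlib.md5(data).digest(), "big") % n


def issue(subject, subject_public, issuer, issuer_private):
    """The issuer certifies that subject_public belongs to subject."""
    d, n = issuer_private
    signature = pow(checksum(subject, subject_public, n), d, n)
    return {"subject": subject, "public": subject_public,
            "issuer": issuer, "signature": signature}


def check(cert, issuer_public):
    e, n = issuer_public
    return pow(cert["signature"], e, n) == checksum(cert["subject"], cert["public"], n)


def verify_chain(chain, root_name, root_public):
    """Return the last subject's public key, or None if any link fails."""
    name, key = root_name, root_public
    for cert in chain:
        if cert["issuer"] != name or not check(cert, key):
            return None
        name, key = cert["subject"], cert["public"]
    return key


if __name__ == "__main__":
    chain = [issue("Y", Y_PUBLIC, "X", X_PRIVATE), issue("Z", Z_PUBLIC, "Y", Y_PRIVATE)]
    print("Z's key as verified by A:", verify_chain(chain, "X", X_PUBLIC))
```

A starts with nothing but X's public key. Each verified certificate hands over the key needed to check the next one.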
There are still significant issues with building chains of trust. First of all, even if you are certain that
you have the public key of the root CA, you need to be sure that every CA from the root on down is doing
its job properly.
If some CA is willing to issue certificates to individuals without verifying their identity, then what looks
like a valid chain of certificates becomes meaningless.
A different approach to this problem, used by PGP, lets chains of trust form arbitrary meshes rather than a
rigid tree.
One of the major standards for certificates is known as X.509. This standard leaves a lot of details open,
but specifies a basic structure for certificates, which includes a digital signature.
Certificate Revocation
One issue that arises with certificates is how to revoke, or undo, a certificate.
The basic solution to the problem is simple enough. A certification authority can issue a certificate
revocation list (CRL), which is a digitally signed list of certificates that have been revoked.
Because it is digitally signed, it can just be posted on a bulletin board. Now, when participant A receives
a certificate for B that he wants to verify, A will first consult the latest CRL issued by the CA. As long as the
certificate has not been revoked, it is valid.
In between these are a number of protocols that operate at the transport layer, notably the IETF’s
Transport Layer Security (TLS) standard and the older protocol from which it derives, SSL (Secure Socket
Layer).
The following sections describe the salient features of each of these approaches.
PGP has become quite popular in the networking community, and PGP key signing parties are a regular
feature of IETF meetings. At these gatherings, an individual can
• collect public keys from others whose identity he knows;
• provide his public key to others;
• get his public key signed by others, thus collecting certificates that will be persuasive to an
increasingly large set of people;
• sign the public key of other individuals, thus helping them build up their set of certificates that
they can use to distribute their public keys;
• collect certificates from other individuals whom he trusts enough to sign keys.
Example of how PGP works
Now suppose user A wants to send a message to user B and prove to B that it truly
came from A.
Encryption of a message is equally straightforward and is summarized in the following. A randomly picks a
per-message key that is used to encrypt the message using a symmetric algorithm such as DES.
The per-message key is encrypted using the public key of the recipient. PGP obtains this key from A’s key
ring and notifies A of the level of trust he has assigned to this key.
The message is encoded to prevent damage by mail gateways and sent to B. On receipt, B uses his private
key to decrypt the per-message key, and then uses the appropriate algorithm to decrypt the message.
The Secure Shell (SSH) is intended to replace the less secure Telnet and rlogin programs used in the early
days of the Internet. SSH is most often used to provide strong client/server authentication.
The SSH client runs on the user’s desktop machine and the SSH server runs on some remote machine
that the user wants to log into.
When users log in, both their passwords and all the data they send or receive potentially pass
through countless untrusted networks.
SSH provides a way to encrypt the data sent over these connections and to improve the strength of the
authentication mechanism they use to log in.
Any time a user uses SSH to log onto a remote machine, the first step is to set up an
SSH-TRANS channel between those two machines.
The two machines establish this secure channel by first having the client authenticate the server
using RSA.
Once authenticated, the client and server establish a session key that they will use to encrypt
any data sent over the channel.
Also, SSH-TRANS includes a message integrity check of all data exchanged over the channel.
The server tells the client its public key at connection time.
The first time a client connects to a particular server, SSH warns the user that it has never talked
to this machine before and asks if the user wants to continue.
Although it is a risky thing to do, because SSH is effectively not able to authenticate the server,
users often say “yes” to this question.
SSH then remembers the server’s public key, and the next time the user connects to that same
machine, it compares this saved key with the one the server responds with.
If they are the same, SSH authenticates the server.
If they are different, however, SSH again warns the user that something is amiss, and the user is
then given an opportunity to abort the connection.
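The decision SSH makes about a server's public key can be summarized in a short Python function. This is a sketch of the logic described above, not of any real SSH implementation. The dictionary standing in for the saved keys, the function name, and the returned strings are all illustrative.

```python
def check_host_key(known_hosts, host, presented_key, accept_new):
    """Decide what to do with the public key a server presents.

    known_hosts maps host names to the key saved on an earlier connection.
    accept_new is the user's answer when the host has never been seen before.
    """
    saved = known_hosts.get(host)
    if saved is None:
        # First contact: SSH cannot authenticate the server and asks the user.
        if accept_new:
            known_hosts[host] = presented_key
            return "accepted-new"
        return "aborted"
    if saved == presented_key:
        return "authenticated"
    # The key changed since last time: warn and let the user abort.
    return "mismatch"


if __name__ == "__main__":
    known = {}
    print(check_host_key(known, "hostB", "key-1", accept_new=True))
    print(check_host_key(known, "hostB", "key-1", accept_new=False))
    print(check_host_key(known, "hostB", "key-2", accept_new=False))
```

Only the first connection relies on the user's judgment. Every later connection is checked against the saved key.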
SSH-AUTH
The next step is for the user to actually log onto the machine, or more
specifically, authenticate him- or herself to the server.
Finally, SSH has proven so useful as a system for securing remote login that it has been extended to also
support other insecure TCP-based applications.
The idea is to run these applications over a secure “SSH tunnel.” This capability is called port
forwarding, and it uses the SSH-CONN protocol.
The idea is illustrated in the figure, where we see a client on host A indirectly communicating with a server
on host B by forwarding its traffic through an SSH connection.
Using SSH port forwarding to secure other TCP-based applications. The mechanism is called port
forwarding because when messages arrive at the well known SSH port on the server,
SSH first decrypts the contents, and then “forwards” the data to the actual port at which the server
is listening.