CN Mod3
CN Mod3
IntroductiontoNetworkLayer
T he network layer in the TCP/IP protocol suite is responsible for the host-to-
hostdelivery of datagrams. It provides services to the transport layer and receives ser-
vices from the data-link layer. In this chapter, we introduce the general concepts and
issuesinthenetworklayer.Thischapteralsodiscussestheaddressingmechanismused
inthenetwork layer,asbrieflymentionedinChapter2.Thischapterpreparestheway
fordiscussionofothernetwork-layerissues,whichfollowsinthenextfourchapters.
Thechapterisdividedintofivesections.
❑ Thefirstsectionintroducesthenetworklayerbydefiningtheservicesprovidedby this
layer. It first discusses packetizing. It then describes forwarding and routing and
compares the two. The section then briefly explains the other services such as
flow, error, and congestion control.
❑ Thesecondsectiondiscussespacketswitching,whichoccursatthenetworklayer. The
datagram approach and the virtual-circuit approach of packet switching are
described in some detail in this section.
❑ The third section discusses network-layer performance. It describes different
delays that occur in network-layer communication. It also mentions the issue of
packet loss. Finally, it discusses the issue of congestion control at the network
layer.
❑ ThefourthsectiondiscussesIPv4addressing,probablythemostimportantissuein the
network layer. It first describes the address space. It then briefly discusses classful
addressing,whichbelongsto thepast butis usefulin understanding class- less
addressing. The section then moves to classless addressing and explains sev- eral
issues related to this topic. It then discusses DHCP, which can be used to
dynamically assign addresses in an organization. Finally, the section discusses
NAT, which can be used to relieve the shortage of addresses to some extent.
❑ Thefifthsectiondiscussesforwardingofnetwork-layerpackets.Itfirstshowshow
forwarding can be done based on the destination address in a packet. It then dis-
cusses how forwarding can be done using a label.
511
512 PARTIVNETWORKLAYER
NETWORK-LAYERSERVICES
Before discussing the network layer in the Internet today, let’s briefly discuss the
network-layer services that, in general, are expected from a network-layer protocol.
Figure 18.1 shows the communication between Alice and Bob at the network layer.
This is the same scenario we used in Chapters 3 and 9 to show the communication at
the physical and the data-link layers, respectively.
Figure18.1Communicationatthenetworklayer
SkyResearch
Alice
Application
TransportN
Alice etworkData
-
linkPhysica
l
R2
Network
Toother Data-
ISPs linkPhysi
R1 R2 cal
R4
Toother
Network
ISPs
R3 R4 Data-
linkPhysi
cal
I III II
SwitchedWAN
National ISP R5
Network
R5
Data-
linkPhysi
cal
ISP
Toother R7
ISPs Network
Data-
Legend R6 R7 linkPhysi
cal
Point-to-pointWANLANswitch
WANswitch Bob
Application
Router TransportN
etworkData
-
BobScient linkPhysica
ificBooks l
CHAPTER18INTRODUCTIONTONETWORKLAYER 513
The figure shows that the Internet is made of many networks (or links) connected
throughtheconnectingdevices.Inotherwords,theInternetisaninternetwork,a
514 PARTIVNETWORKLAYER
combinationofLANsandWANs.Tobetterunderstandtheroleofthenetworklayer(or the
internetwork layer), we need to think about the connecting devices (routers or
switches) that connect the LANs and WANs.
As the figure shows, the network layer is involved at the source host, destination
host, and all routers in the path (R2, R4, R5, and R7). At the source host (Alice), the
networklayeracceptsapacketfromatransportlayer,encapsulatesthepacketinadata- gram,
and delivers the packet to the data-link layer. At the destination host (Bob), the
datagram is decapsulated, and the packet is extracted and delivered to the correspond-
ing transport layer. Although the source and destination hosts are involved in all five
layers of the TCP/IP suite, the routers use three layers if they are routing packets only;
however, they may need the transport and application layers for control purposes. A
routerin thepathis normally shown with twodata-link layersand two physical layers,
because it receives a packet from one network and delivers it to another network.
Packetizing
Thefirstdutyofthenetworklayerisdefinitelypacketizing:encapsulatingthepayload
(datareceivedfromupperlayer)inanetwork-layerpacketatthesourceanddecapsulat- ing
the payload from the network-layer packet at the destination. In other words, one duty
of the network layer is to carry a payload from the source to the destination with- out
changing it or using it. The network layer is doing the service of a carrier such as the
postal office, which is responsible for delivery of packages from a sender to a receiver
without changing or using the contents.
The source host receives the payload from an upper-layer protocol, adds a header
that contains the source and destination addresses and some other information that is
required by the network-layer protocol (as discussed later) and delivers the packet to
the data-link layer. The source is not allowed to change the content of the payload
unless it is too large for delivery and needs to be fragmented.
The destination host receives the network-layer packet from its data-link layer,
decapsulatesthepacket,anddeliversthepayloadtothecorrespondingupper-layerpro-
tocol.Ifthepacketisfragmentedatthesourceoratroutersalongthepath,thenetwork layer is
responsible for waiting until all fragments arrive, reassembling them, and delivering
them to the upper-layer protocol.
The routers in the path are not allowed to decapsulate the packets they received
unlessthepacketsneedtobefragmented.Theroutersarenotallowedtochangesource and
destinationaddresseseither.Theyjustinspecttheaddressesfor thepurposeoffor-
wardingthepackettothenextnetworkonthepath.However,ifapacketisfragmented,
theheaderneedstobecopiedtoallfragmentsandsomechangesareneeded,aswedis- cuss in
detail later.
RoutingandForwarding
Other duties of the network layer, which are as important as the first, are routing and
forwarding, which are directly related to each other.
Routing
The network layer is responsible for routing the packet from its source to the destina-
tion. A physical network isa combination of networks (LANs and WANs) and routers
CHAPTER18INTRODUCTIONTONETWORKLAYER 515
that connect them. This means that there is more than one route from the source to the
destination.Thenetworklayerisresponsibleforfindingthebestoneamongthesepos- sible
routes. The network layer needs to have some specific strategies for defining the
bestroute.IntheInternettoday,thisisdoneby running someroutingprotocolstohelp the
routers coordinate their knowledge about the neighborhood and to come up with
consistenttablestobeusedwhenapacketarrives.Theroutingprotocols,whichwedis- cuss in
Chapters 20 and 21, should be run before any communication occurs.
Forwarding
If routing is applying strategies and running some routing protocols to create the
decision-makingtablesforeachrouter,forwardingcanbedefinedastheactionapplied
byeachrouterwhenapacketarrivesatoneofitsinterfaces.Thedecision-makingtable
arouternormallyusesforapplyingthisactionissometimescalledtheforwardingtable and
sometimes the routing table. When a router receives a packet from one of its attached
networks, it needs to forward the packet to another attached network (in
unicastrouting)ortosomeattachednetworks(inmulticastrouting).Tomakethisdeci-
sion,therouterusesapieceofinformationinthepacketheader,whichcanbethedesti- nation
address or a label, to find the corresponding output interface number in the forwarding
table. Figure 18.2 shows the idea of the forwarding process in a router.
Figure18.2Forwardingprocess
Forwarding table
Forwardingvalue
Outputinterface
Note:
AB 1
BandCcanbethesame or different.
2 the packetout of interface 2
Send
2CData
Forwardingvalue
4
B Data 1
3
Other Services
Letusbrieflydiscussotherservicesexpectedfromthenetworklayer.
ErrorControl
InChapter10,wediscussederrordetectionandcorrection.Althougherrorcontrolalso can be
implemented in the network layer, the designers of the network layer in the Internet
ignored this issue for the data being carried by the network layer. One reason for this
decision is the fact that the packet in the network layer may be fragmented at each
router, which makes error checking at this layer inefficient.
516 PARTIVNETWORKLAYER
The designers of the network layer, however, have added a checksum field to the
datagram to control any corruption in the header, but not in the whole datagram. This
checksum may prevent any changes or corruptions in the header of the datagram.
We need to mention that although the network layer in the Internet does not
directly provide error control, the Internet uses an auxiliary protocol, ICMP, that
provides some kind of error control if the datagram is discarded or has some unknown
information in the header. We discuss ICMP in Chapter 19.
FlowControl
Flowcontrolregulatestheamountofdataasourcecansendwithoutoverwhelmingthe
receiver. If the upper layer at the source computer produces data faster than the upper
layer at the destination computer can consume it, the receiver will be overwhelmed
with data. To control the flow of data, the receiver needs to send some feedback to the
sender to inform the latter that it is overwhelmed with data.
ThenetworklayerintheInternet,however,doesnotdirectlyprovideanyflowcon-
trol.Thedatagramsaresentbythesenderwhentheyareready,withoutanyattentionto the
readiness of the receiver.
Afewreasonsforthelackofflowcontrolinthedesignofthenetworklayercanbe
mentioned. First, since there is no error control in this layer, the job of the network
layer at the receiver is so simple that it may rarely be overwhelmed. Second, the upper
layers that use the service of the network layer can implement buffers to receive data
fromthenetworklayerastheyarereadyanddonothavetoconsumethedataasfastas it
isreceived.Third,flowcontrolisprovidedformostof theupper-layerprotocolsthat use the
services of the network layer, so another level of flow control makes the net- work
layer more complicated and the whole system less efficient.
CongestionControl
Another issue in a network-layer protocolis congestion control. Congestioninthenet-
work layer is a situation in which too many datagrams are present in an area of the
Internet.Congestionmayoccurifthenumberofdatagramssentbysourcecomputersis
beyond the capacity of the network or routers. In thissituation,some routersmay drop
some of the datagrams. However, as more datagrams are dropped, the situation may
become worse because, due to the error control mechanism at the upper layers, the
sender may send duplicates of the lost packets. If the congestion continues, sometimes
a situation may reach a point where the system collapses and no datagrams are deliv-
ered.Wediscusscongestioncontrolatthenetworklayerlaterinthechapteralthoughit is not
implemented in the Internet.
QualityofService
As the Internet has allowed new applications such as multimedia communication (in
particularreal-timecommunicationofaudioandvideo),thequalityofservice(QoS)of the
communication has become more and more important. The Internet has thrived by
providing better quality of service to support these applications. However, to keep the
network layer untouched, these provisions are mostly implemented in the upper layer.
We discuss this issue in Chapter 30 after we have discussed multimedia.
CHAPTER18INTRODUCTIONTONETWORKLAYER 517
Security
Another issue related to communication at the network layer is security. Security was
not a concern when the Internet was originally designed because it was used by asmall
number of users at universities for research activities; other people had noaccess to the
Internet. The network layer was designed with no security provision. Today, however,
security is a big concern. To provide security for a connectionless network layer, we
need to have another virtual level that changes the connectionless service to a
connection-oriented service. This virtual layer, called IPSec, is discussedin Chapter
32.
PACKETSWITCHING
From the discussion of routing and forwarding in the previous section, we infer that a
kindofswitchingoccursatthenetworklayer.Arouter,infact,isaswitchthatcreatesa
connection between an input port and an output port (or a set of output ports), just as
an electrical switch connects the input to the output to let electricity flow.
Although in data communication switching techniques are divided into two broad
categories, circuit switching and packetswitching, onlypacket switching isused at the
network layer because the unit of data at this layer is a packet. Circuit switching is
mostly used at the physical layer; the electrical switch mentioned earlier is a kind of
circuit switch. We discussed circuit switching in Chapter 8; we discuss packet switch-
ing in this chapter.
At the network layer, a message from the upper layer is divided into manageable
packets and each packet is sent through the network. The source of the message sends
thepacketsonebyone;thedestinationofthemessagereceivesthepacketsonebyone. The
destination waits for all packets belonging to the same message to arrive before
deliveringthemessagetotheupperlayer.Theconnectingdevicesinapacket-switched
network still need to decide how to route the packets to the final destination. Today, a
packet-switched network can use two different approaches to route the packets: the
datagramapproachandthevirtualcircuitapproach.Wediscussbothapproachesinthe next
section.
DatagramApproach:ConnectionlessService
WhentheInternetstarted,tomakeitsimple,thenetworklayerwasdesignedtoprovide a
connectionless service in which the network-layer protocol treats each packet inde-
pendently, with each packet having no relationship to any other packet. The idea was
thatthenetworklayerisonlyresponsiblefordeliveryofpacketsfromthesourcetothe
destination. In this approach, the packets in a message may or may not travel the same
path to their destination. Figure 18.3 shows the idea.
Whenthenetworklayerprovidesaconnectionlessservice,eachpackettravelingin
theInternetisanindependententity;thereisnorelationshipbetweenpacketsbelonging to the
same message. The switches in this type of network are called routers. A packet
belongingtoamessagemaybefollowedbyapacketbelongingtothesamemessageor
toadifferentmessage.Apacketmaybefollowedbyapacketcomingfromthesameor from a
different source.
518 PARTIVNETWORKLAYER
Figure18.3Aconnectionlesspacket-switchednetwork
Legend
4321Packets
Network A connectionless (datagram)packet-switchednetwork
R1R2
4321 2
Sender1
4 Network
2
3 1
3 R4
3
4 1342
R3 R5Outoforder
Receiver
Eachpacketisroutedbasedontheinformationcontainedinitsheader:sourceand
destination addresses. The destination address defines where it should go; the source
address defines where it comes from. The router in this case routes the packet based
only on the destination address. The source address may be used to send an error mes-
sage to the source if the packet is discarded. Figure 18.4 shows the forwarding process
in a router in this case. We have used symbolic addresses such as A and B.
Figure18.4Forwardingprocessinarouterwhenusedinaconnectionlessnetwork
Forwardingtable
Legend
Destination Output
address interface SA:Sourceaddress
AB 1 DA:Destinationaddress
2
H
Destination 3 Send the
addressB
packetout of
interface 2
SADA
2 SADA
Data 1 Data
3 4
Virtual-CircuitApproach:Connection-OrientedService
Inaconnection-orientedservice(alsocalledvirtual-circuitapproach),thereisarelation-
CHAPTER18INTRODUCTIONTONETWORKLAYER 519
shipbetweenallpacketsbelongingtoamessage.Beforealldatagramsinamessagecan
besent,avirtualconnectionshouldbesetuptodefinethepathforthedatagrams.After
connectionsetup,thedatagramscanallfollowthesamepath.Inthistypeofservice,not
520 PARTIVNETWORKLAYER
onlymustthepacketcontainthesourceanddestinationaddresses,itmustalsocontaina
flowlabel,avirtualcircuitidentifierthatdefinesthevirtualpaththepacketshouldfollow.
Shortly,wewillshowhowthisflowlabelisdetermined,butforthemoment,weassume that
the packet carries this label. Although it looks as though the use of the label may make
the source and destination addresses unnecessary during the data transfer phase,
partsoftheInternetatthenetworklayerstillkeeptheseaddresses.Onereasonisthatpart
ofthepacketpathmaystillbeusingtheconnectionlessservice.Anotherreasonisthatthe
protocol at the network layer is designed with these addresses, and it may take a while
before they can be changed. Figure 18.5 shows the concept of connection-oriented
service.
Figure18.5Avirtual-circuitpacket-switchednetwork
Legend
Network 432
1Packets
Aconnection-orientedpacket-switchednetwork Virtual circuit
R1R2
4 3 21
Sender 4
3 R5
2 Network
1
4321 4 321
R3R4
Receiver
Each packet is forwarded based on the label in the packet. To follow the idea of
connection-oriented design to be used in the Internet, we assume that the packet has a label
whenitreachestherouter.Figure18.6showstheidea.Inthiscase,theforwardingdeci-
sionisbasedonthevalueofthelabel,orvirtualcircuitidentifier,asitissometimescalled. To
create a connection-oriented service, a three-phase process is used: setup, data transfer,
and teardown. In the setup phase, the source and destination addresses of the sender
and receiver are used to make table entries for the connection-oriented service. Inthe
teardownphase, the source and destinationinformthe router todelete the corre-
spondingentries.Datatransferoccursbetweenthesetwophases.
SetupPhase
In the setup phase, a router creates an entry for a virtual circuit. For example, suppose
sourceAneedstocreateavirtualcircuittodestinationB.Twoauxiliarypacketsneedto
beexchangedbetweenthesenderandthereceiver:therequestpacketandtheacknowl-
edgment packet.
CHAPTER18INTRODUCTIONTONETWORKLAYER 519
Figure18.6Forwardingprocessinarouterwhenusedinavirtual-circuitnetwork
Forwarding table
Legend
Incoming Outgoing
SA:Source
Port Label Port Label addressDA:Destinationa
1 L1 2 L2 ddressL1, L2: Labels
1 2 L2 Data
L1 SADA Data SADA
Incoming 3 4
label Outgoinglabe
l
Requestpacket
Arequestpacketissentfromthesourcetothedestination.Thisauxiliarypacketcarries the
source and destination addresses. Figure 18.7 shows the process.
Figure18.7Sendingrequestpacketinavirtual-circuitnetwork
AtoB Legend
Incoming Outgoing
AtoB Requestpacket
Network Port Label Port Label
Virtualcircuit
1 14 3
1
AtoB R1 2 R2
1 4
A 3
R5 Network
2 Ato B
12 3 2 3 4
AtoB AtoB
R33 1R4 4
B
1. SourceAsendsarequestpackettorouterR1.
2. Router R1 receives the request packet. It knows that a packet going from A to B
goes out through port 3. How the router has obtained this information is a point
covered later. For the moment, assume that it knows the output port. The router
creates an entry in its table for this virtual circuit, but it is only able to fill three of
the four columns. The router assigns the incoming port (1) and chooses an avail-
ableincominglabel(14)andtheoutgoingport(3).Itdoesnotyetknowtheoutgo- ing
label, which will be found during the acknowledgment step. The router then
forwards the packet through port 3 to router R3.
3. Router R3 receives the setup request packet. The same events happen here as at
routerR1;threecolumnsofthetablearecompleted:inthiscase,incomingport(1),
incoming label (66), and outgoing port (3).
4. Router R4 receives the setup request packet. Again, three columns are completed:
incoming port (1), incoming label (22), and outgoing port (4).
5. Destination B receives the setup packet, and if it is ready to receive packets from
A, it assigns a label to the incoming packets that come from A, in this case 77, as
shown in Figure 18.8. This label lets the destination know that the packets come
from A, and not from other sources.
AcknowledgmentPacket
Aspecialpacket,calledtheacknowledgmentpacket,completestheentriesinthe switching
tables. Figure 18.8 shows the process.
Figure18.8Sendingacknowledgmentsinavirtual-circuitnetwork
AtoB Legend
Incoming Outgoing Acknowledgment packetVirtualcircuit
Network Port Label Port Label
1 14 3 66
4
R1 2
14 R2
1 4
A 3
R5
3 66 Network
12 2
2 3 1
22 77
3 1 4
R3 R4
B
1. ThedestinationsendsanacknowledgmenttorouterR4.Theacknowledgmentcar-
riestheglobalsourceanddestinationaddressessotherouterknowswhichentryin the
table is to be completed. The packet also carries label 77, chosen by the desti-
nationastheincominglabelforpacketsfrom A.RouterR4usesthislabeltocom- plete
the outgoing label column for this entry. Note that 77 is the incoming label for
destination B, but the outgoing label for router R4.
2. Router R4 sends an acknowledgment to router R3 that contains its incoming label
inthetable,choseninthesetupphase.RouterR3usesthisastheoutgoinglabelin the table.
3. Router R3 sends an acknowledgment to router R1 that contains its incoming label
inthetable,choseninthesetupphase.RouterR1usesthisastheoutgoinglabelin the table.
4. FinallyrouterR1sendsanacknowledgmenttosourceAthatcontainsitsincoming label
in the table, chosen in the setup phase.
5. The source uses this as the outgoing label for the data packets to be sent to
destination B.
Data-TransferPhase
Thesecondphaseiscalledthedata-transferphase.Afterallroutershavecreated their
forwarding table for a specific virtual circuit, then the network-layer packets belonging
to one message can be sent one after another. In Figure 18.9, we show the flow of a
single packet, but the process is the same for 1, 2, or 100 packets.
Thesourcecomputerusesthelabel14,whichithasreceivedfromrouterR1inthesetup
Figure18.9Flowofonepacketinanestablishedvirtualcircuit
AtoB
Legend
Incoming Outgoing
ABData Datagram
Network Port Label Port Label
1 14 3 66 Virtualcircuit
R1 R2
14 ABData 2
1 4
A 3
R5 Network
66 ABData
12 2 3
22 AB Data 77ABData
3 1 4
R3 R4 B
phase.RouterR1forwardsthepackettorouterR3,butchangesthelabelto66. Router R3
forwards the packet to router R4, but changes the label to 22. Finally,router R4
delivers the packet to its final destination with the label 77. All the packetsin the
message follow the same sequence of labels, and the packets arrive in order at the
destination.
TeardownPhase
In the teardown phase, source A, after sending all packets to B, sends a special packet
calledateardownpacket.DestinationBrespondswithaconfirmation packet.Allrout- ers
delete the corresponding entries from their tables.
NETWORK-LAYERPERFORMANCE
The upper-layer protocols that use the service of the network layer expect to receivean
ideal service, but the network layer is not perfect. The performance of a networkcan be
measured in terms of delay, throughput, and packet loss. Congestion control is an
issue that can improve the performance.
Delay
Allofusexpectinstantaneousresponsefromanetwork,butapacket,fromitssourceto its
destination, encounters delays. The delays in a network can be divided into four
types:transmissiondelay, propagationdelay, processingdelay,andqueuingdelay.Let
usfirstdiscusseachofthesedelaytypesandthenshowhowtocalculateapacketdelay from the
source to the destination.
TransmissionDelay
A source host or a router cannot send a packet instantaneously. A sender needs to put
thebitsinapacketonthelineonebyone.Ifthefirstbitofthepacketisputontheline
attimet1andthelastbitisputonthelineattimet2,transmissiondelayofthepacketis (t2t1).
Definitely, the transmission delay is longer for a longer packet and shorter if the
sender can transmit faster. In other words, the transmission delay is
Delaytr(Packetlength)/(Transmissionrate).
Forexample,inaFastEthernetLAN(seeChapter13)withthetransmissionrateof
100millionbitspersecondandapacketof10,000bits,ittakes(10,000)/(100,000,000) or 100
microseconds for all bits of the packet to be put on the line.
PropagationDelay
PropagationdelayisthetimeittakesforabittotravelfrompointAtopointBinthetrans- mission
media. The propagation delay for a packet-switched network depends on the
propagationdelayofeachnetwork(LANorWAN).Thepropagationdelaydependson the
propagation speed of the media, which is 3 108 meters/second in a vacuum and
normally much less in a wired medium; it also depends on the distance of the link. In
other words, propagation delay is
Delaypg(Distance)/(Propagationspeed).
528 PARTIVNETWORKLAYER
ChokePacketAchokepacketisapacketsentbyanodetothesourcetoinformitof congestion.
Note the difference between the backpressure and choke-packet methods.
Inbackpressure,thewarningisfromonenodetoitsupstreamnode,althoughthewarn-
ingmayeventuallyreachthesourcestation.Inthechoke-packetmethod,thewarningis from
the router, which has encountered congestion, directly to the source station. The
intermediate nodes through which the packet has traveled are not warned. We will see
anexampleofthistypeofcontrolinICMP(discussedinChapter19).Whenarouterin the
Internet is overwhelmed with IP datagrams, it may discard some of them, but it
informs the source host, using a source quench ICMP message. The warning message
goesdirectlytothesourcestation;theintermediateroutersdonottake anyaction.Fig- ure
18.15 shows the idea of a choke packet.
Figure18.15Chokepacket
-
Choke
packet
I II III IV
Dataflow
IPV4ADDRESSES
The identifier used in the IP layer of the TCP/IP protocol suite to identify the connec-
tion of each device to the Internet is called the Internet address or IP address. An IPv4
address is a 32-bit address that uniquely and universally defines the connection of a
host orarouterto theInternet. TheIP addressis theaddress oftheconnection,notthe
CHAPTER18INTRODUCTIONTONETWORKLAYER 529
host or the router, because if the device is moved to another network, the IP address
may be changed.
IPv4addressesareuniqueinthesensethateachaddressdefinesone,andonlyone,
connection to the Internet. If a device has two connections to the Internet, via two
networks, it has two IPv4 addresses. IPv4 addresses are universal in the sense that the
addressing system must be accepted by any host that wants to be connected to the
Internet.
AddressSpace
A protocol like IPv4 that defines addresses has an address space. An address space is
thetotalnumberofaddressesusedbytheprotocol.Ifaprotocolusesbbitstodefinean
address,theaddressspaceis2bbecauseeachbitcanhavetwodifferentvalues(0or1). IPv4uses
32-bit addresses, whichmeansthattheaddressspaceis2 32 or4,294,967,296
(morethanfourbillion).Iftherewerenorestrictions,morethan4billiondevicescould be
connected to the Internet.
Notation
There are three common notations to show an IPv4 address: binary notation (base 2),
dotted-decimal notation (base 256), and hexadecimal notation (base 16). In binary
notation, anIPv4addressisdisplayedas32bits.Tomaketheaddressmorereadable,one or
more spaces are usually inserted between each octet (8 bits). Each octet is often
referredtoasabyte.TomaketheIPv4addressmorecompactandeasiertoread,itisusu-
allywrittenindecimalformwithadecimalpoint(dot)separatingthebytes.Thisformatis
referredtoasdotted-decimalnotation.Notethatbecauseeachbyte(octet)isonly8bits,
eachnumberinthedotted-decimalnotationisbetween0and255.Wesometimesseean
IPv4addressinhexadecimalnotation.Eachhexadecimaldigitisequivalenttofourbits.
Thismeansthata32-bitaddresshas8hexadecimaldigits.Thisnotationisoftenusedin
networkprogramming.Figure18.16showsanIPaddressinthethreediscussednotations.
Figure18.16ThreedifferentnotationsinIPv4addressing
Binary 10000000000010110000001100011111
. ..
Dotted decimal128 11 3 31
Hexadecimal80 0B03 1F
HierarchyinAddressing
Inanycommunicationnetworkthatinvolvesdelivery,suchasatelephonenetworkora postal
network, the addressing system is hierarchical. In a postal network, the postal
address(mailingaddress)includesthecountry,state,city,street,housenumber,andthe
530 PARTIVNETWORKLAYER
name of the mail recipient. Similarly, a telephone number is divided into the country
code, area code, local exchange, and the connection.
A32-bitIPv4addressisalsohierarchical,butdividedonlyintotwoparts.Thefirst part of
the address, called the prefix, defines the network; the second part of the
address,calledthesuffix,definesthenode(connectionofadevicetotheInternet).Fig-
ure18.17showstheprefixandsuffixofa32-bitIPv4address.Theprefixlengthis n bits and
the suffix length is (32 n) bits.
Figure18.17Hierarchyinaddressing
32 bits
n bits (32–n)bits
Prefix Suffix
Definesnetwork
Network
Definesconnection
to the node
A prefix can be fixed length or variable length. The network identifier in the IPv4
was first designed as a fixed-length prefix. This scheme, which is now obsolete, is
referred to as classful addressing. The new scheme, which is referred to as classless
addressing, uses a variable-length network prefix. First, we briefly discuss classful
addressing; then we concentrate on classless addressing.
ClassfulAddressing
WhentheInternetstarted,anIPv4addresswasdesignedwithafixed-lengthprefix,but to
accommodate both small and large networks, three fixed-length prefixes were
designed instead of one (n 8, n 16, and n 24). The whole address space was
divided into five classes (class A, B, C, D, and E), as shown in Figure 18.18. This
scheme is referred to as classful addressing.Although classful addressing belongs to
the past, it helps us to understand classless addressing, discussed later.
In class A, the network length is 8 bits, but since the first bit, which is 0, defines
the class, we can have only seven bits as the network identifier. This means there are
only 27 128 networks in the world that can have a class A address.
In class B, the network length is 16 bits, but since the first two bits, which are
(10)2, define the class, we can have only 14 bits as the network identifier. This means
there are only 214 16,384 networks in the world that can have a class B address.
All addresses that start with (110) 2 belong to class C. In class C, the network
length is 24 bits, but since three bits define the class, we can have only 21 bits as the
networkidentifier.Thismeansthereare2212,097,152networksintheworldthatcan have a
CHAPTER18INTRODUCTIONTONETWORKLAYER 531
class C address.
532 PARTIVNETWORKLAYER
Figure18.18 Occupationoftheaddressspaceinclassfuladdressing
Addressspace:4,294,967,296 addresses
A B C DE
Class
A Prefixes
n = 8 bits First byte
0 to 127
ClassA 0Prefix Suffix
B n = 16 bits 128 to 191
ClassB 10 Prefix Suffix
110 Prefix Suffix C n = 24 bits 192 to 223
ClassC
1110 Multicast addresses D Not applicable 224 to 239
ClassD
1111 Reservedforfutureuse E Not applicable 240 to 255
Class E
organizations that need more than the 256 addresses available in a class C block. This
idea did not work either because it makes the routing of packets more difficult.
AdvantageofClassfulAddressing
Although classful addressing had several problems and became obsolete, it had one
advantage: Given an address, we can easily find the class of the address and, since the
prefixlengthforeachclassisfixed,wecanfindtheprefixlengthimmediately.Inother
words,theprefixlengthinclassfuladdressingisinherentintheaddress;noextrainfor- mation
is needed to extract the prefix and the suffix.
ClasslessAddressing
Subnetting and supernetting in classful addressing did not really solve the address
depletion problem. With the growth of the Internet, it was clear that a larger address
space was needed as a long-term solution. The larger address space, however, requires
that the length of IP addresses also be increased, which means the format of the IP
packets needs to be changed. Although the long-range solution has already been
devised and is called IPv6 (discussed later), a short-term solution was also devised to
use the same address space but to change the distribution of addresses to provide a fair
share to each organization. The short-term solution still uses IPv4 addresses, but it is
called classless addressing. In other words, the classprivilege was removed from the
distribution to compensate for the address depletion.
There was another motivation for classless addressing. During the 1990s, Internet
Service Providers (ISPs) came into prominence. An ISP is an organization that pro-
vides Internet access for individuals, small businesses, and midsize organizations that
do not want to create an Internet site and become involved in providing Internet ser-
vices (such as electronic mail) for their employees. An ISP can provide these services.
An ISP is granted a large range of addresses and then subdivides the addresses (in
groups of 1, 2, 4, 8, 16, and so on), giving a range of addresses to a household or a
small business. The customers are connected via a dial-up modem, DSL, or cable
modem to the ISP. However, each customer needs some IPv4 addresses.
In1996,theInternetauthoritiesannouncedanewarchitecturecalledclassless
addressing. In classless addressing, variable-length blocks are used that belong to no
classes.Wecanhaveablockof1address,2addresses,4addresses,128addresses,andsoon.
Inclasslessaddressing,thewholeaddressspaceisdividedintovariablelength
blocks.Theprefixinanaddressdefinestheblock(network);thesuffixdefinesthenode
(device).Theoretically,we can have a block of 20,21, 22,232 addresses. One of the
restrictions,aswediscusslater,isthatthenumberofaddressesinablockneedstobea
powerof2.Anorganizationcanbegrantedoneblockofaddresses.Figure18.19shows
thedivisionofthewholeaddressspaceintononoverlappingblocks.
Figure18.19Variable-lengthblocksinclasslessaddressing
Unlikeclassfuladdressing,theprefixlengthinclasslessaddressingisvariable.We can
have a prefix length that ranges from 0 to 32. The size of the network is inversely
proportional to the length of the prefix. A small prefix means a larger network; a large
prefix means a smaller network.
Weneedtoemphasizethattheideaofclasslessaddressingcanbeeasilyappliedto classful
addressing. An address in class A can be thought of as a classless address in which the
prefix length is 8. An address in class B can be thought of as a classless
addressinwhichtheprefixis16,andsoon.Inotherwords,classfuladdressingisaspe- cial case
of classless addressing.
PrefixLength:SlashNotation
Thefirstquestionthatweneedtoanswerinclasslessaddressingishowtofindthepre- fix
length if an address is given. Since the prefix length is not inherent in the address, we
need to separately give the length of the prefix. In this case, the prefix length, n, is
added to the address, separated by a slash. The notation is informally referred to as
slash notation and formally as classless interdomain routing or CIDR (pronounced
cider) strategy. An address in classless addressing can then be represented as shown in
Figure 18.20.
Figure18.20Slashnotation(CIDR)
Examples:
byte byte byte byte / n 12.24.76.8/8
23.14.67.92/12
Prefixlengt 220.8.24.255/25
h
Inotherwords,anaddressinclasslessaddressingdoesnot,perse,definetheblock or
network to which the address belongs; we need to give the prefix length also.
ExtractingInformationfromanAddress
Given any address in the block, we normally like to know three pieces of information
abouttheblocktowhichtheaddressbelongs:thenumberofaddresses,thefirstaddress in the
block, and the last address. Since the value of prefix length, n, is given, we can easily
find these three pieces of information, as shown in Figure 18.21.
1. ThenumberofaddressesintheblockisfoundasN232n.
2. To find the first address, we keep thenleftmost bits and set the (32 n) rightmost
bits all to 0s.
3. To find the last address, we keep the n leftmost bits and set the (32 n) rightmost
bits all to 1s.
Example18.1
A classless address is given as 167.199.170.82/27. We can find the above three pieces of infor-
mation as follows. The number of addresses in the network is 232n 25 32 addresses.
CHAPTER18INTRODUCTIONTONETWORKLAYER 535
Figure18.21Informationextractioninclasslessaddressing
n bits (32–n)bits
Numberofaddresses:N=232–n
Thefirstaddresscanbefoundbykeepingthefirst27bitsandchangingtherestofthebitsto0s. Address:
167.199.170.82/27 10100111110001111010101001010010
Firstaddress:167.199.170.64/27 10100111110001111010101001000000
Thelastaddresscanbefoundbykeepingthefirst27bitsandchangingtherestofthebits to 1s.
Address:167.199.170.82/27 10100111110001111010101001011111
Lastaddress: 167.199.170.95/2710100111110001111010101001011111
AddressMask
Anotherway to find the first and last addresses in the blockis to usethe address mask.
The address mask is a 32-bit number in which the n leftmost bits are set to 1s and the
rest of the bits (32 n) are set to 0s. A computer can easily find the address mask
becauseitisthecomplementof(232n1).Thereasonfordefiningamaskinthisway
isthatitcanbeusedbyacomputerprogramtoextracttheinformationinablock,using the three
bit-wise operations NOT, AND, and OR.
1. ThenumberofaddressesintheblockNNOT(mask)1.
2. Thefirstaddressintheblock(Anyaddressintheblock)AND(mask).
3. Thelastaddressintheblock(Anyaddressintheblock)OR[(NOT(mask)].
Example18.2
We repeatExample18.1usingthe mask.The maskin dotted-decimal notationis
256.256.256.224. The AND, OR, and NOT operations can be applied to individual bytes using
calculators and applets at the book website.
Number of addresses in the block: NNOT(mask)10.0.0.31132addresses First
address: First (address) AND (mask) 167.199.170.82
Lastaddress: Last(address)OR(NOTmask)167.199.170.255
536 PARTIVNETWORKLAYER
Example18.3
In classless addressing, an address cannot per se define the block the address belongs to. For
example, the address 230.8.24.56 can belong to many blocks. Some of them are shown below
with the value of the prefix associated with that block.
Prefixlength:16 Block: 230.8.0.0 to 230.8.255.255
Prefixlength:20 Block: 230.8.16.0 to 230.8.31.255
Prefixlength:26 Block: 230.8.24.0 to 230.8.24.63
Prefixlength:27 Block: 230.8.24.32 to 230.8.24.63
Prefixlength:29 Block: 230.8.24.56 to 230.8.24.63
Prefixlength:31 Block: 230.8.24.56 to 230.8.24.57
NetworkAddress
The above examples show that, given any address, we can find all information about
the block. The first address, the network address, is particularly important because it
is used in routing a packet to its destination network. For the moment, let us assume
that an internet is made of m networks and a router with m interfaces. When a packet
arrives at the router from any source host, the router needs to know to which network
thepacketshouldbesent:fromwhichinterfacethepacketshouldbesentout.Whenthe packet
arrives at the network, it reaches its destination host using another strategy that
wediscusslater.Figure18.22showstheidea.Afterthenetworkaddresshasbeen
Figure18.22Networkaddress
2
1 m
Router
bmcmdmemm
found, the router consults its forwarding table to find the corresponding interface from
which the packet should be sent out. The network address is actually the identifier of
the network; each network is identified by its network address.
CHAPTER18INTRODUCTIONTONETWORKLAYER 537
BlockAllocation
Thenextissueinclasslessaddressingisblockallocation.Howaretheblocksallocated? The
ultimate responsibility of block allocation is given to a global authority called the
Internet Corporation for Assigned Names and Numbers (ICANN). However, ICANN
does not normally allocate addresses to individual Internet users. It assigns a large
block of addresses to an ISP (or a larger organization that is considered an ISP in this
case). For the proper operation of the CIDR, two restrictions need to be applied to the
allocated block.
1. Thenumberofrequestedaddresses,N,needstobeapowerof2.Thereasonisthat N
232nor n 32 log2N. If N is not a power of 2, we cannot have an integer value
for n.
2. The requested block needs to be allocated where there is an adequate number of
contiguous addresses available in the address space. However, there is a restric-
tion on choosing the first address in the block. The first address needs to be
divisible by the number of addresses in the block. The reason is that the first
address needs to be the prefix followed by (32 n) number of 0s. The decimal
value of the first address is then
firstaddress(prefixindecimal)232n(prefixindecimal)N.
Example18.4
An ISP has requested a blockof 1000 addresses. Since 1000 is not a power of 2, 1024 addresses
are granted. The prefix length is calculated as n 32 log21024 22. An available block,
18.14.12.0/22, is granted to the ISP. It can be seen that the first address in decimal is
302,910,464, which is divisible by 1024.
Subnetting
More levels of hierarchy can be created using subnetting. An organization (or an ISP)
that is granted a range of addresses may divide the range into several subranges and
assigneachsubrangetoasubnetwork(orsubnet).Notethatnothingstopstheorganization
fromcreatingmorelevels.Asubnetworkcanbedividedintoseveralsub-subnetworks. A
sub-subnetwork can be divided into several sub-sub-subnetworks, and so on.
DesigningSubnets
Thesubnetworksinanetworkshouldbecarefullydesignedtoenabletheroutingofpack-
ets.WeassumethetotalnumberofaddressesgrantedtotheorganizationisN,theprefix
lengthisn,theassignednumberofaddressestoeachsubnetworkisNsub,andtheprefix
lengthforeachsubnetworkisnsub.Thenthefollowingstepsneedtobecarefullyfollowed to
guarantee the proper operation of the subnetworks.
❑ Thenumberofaddressesineachsubnetworkshouldbeapowerof 2.
❑ Theprefix length foreach subnetwork should befound using thefollowing formula:
nsub 32log2Nsub
538 PARTIVNETWORKLAYER
Example18.5
An organization is granted a block of addresses with the beginning address 14.24.74.0/24. The
organizationneedstohave3subblocksofaddressestouseinitsthreesubnets:onesubblockof10
addresses,onesubblockof60addresses,andonesubblockof120addresses.Designthesubblocks.
Solution
Thereare232–24256addressesinthisblock.Thefirstaddressis14.24.74.0/24;thelastaddress is
14.24.74.255/24. To satisfy the third requirement, we assign addresses to subblocks, starting
with the largest and ending with the smallest one.
a. The number of addresses in the largest subblock, which requires 120 addresses, is not a
powerof2.Weallocate128 addresses.Thesubnetmaskforthissubnetcanbefoundas
n132log212825. The first address in this block is 14.24.74.0/25; the last address is
14.24.74.127/25.
b. Thenumberofaddressesinthesecondlargestsubblock,whichrequires60addresses,isnot
apowerof2either.Weallocate64addresses.Thesubnetmaskforthissubnetcanbefound as
n232log26426.Thefirstaddressinthisblockis14.24.74.128/26;thelastaddress is
14.24.74.191/26.
c. The number of addresses in the smallest subblock, which requires 10 addresses, is not a
powerof2either.Weallocate16addresses.Thesubnetmaskforthissubnetcanbefoundas n332
log216 28. The first address in this block is 14.24.74.192/28; the last address is
14.24.74.207/28.
If we add all addresses in the previous subblocks, the result is 208 addresses, which
means48addressesareleftinreserve.Thefirstaddressinthisrangeis14.24.74.208.The last
address is 14.24.74.255. We don’t know about the prefix length yet. Figure 18.23
showstheconfigurationofblocks.Wehaveshownthefirstaddressineachblock.
AddressAggregation
OneoftheadvantagesoftheCIDRstrategyisaddressaggregation(sometimescalled
address summarization or route summarization). When blocks of addresses are com-
bined to create a larger block, routing can be done based on the prefix of the larger
block.ICANNassignsalargeblockofaddressestoanISP.EachISPinturndividesits assigned
block into smaller subblocks and grants the subblocks to its customers.
Example18.6
Figure 18.24 shows how four small blocks of addresses are assigned to four organizations by an
ISP.TheISP combines thesefour blocks into one single block and advertises the larger block to
therestoftheworld.AnypacketdestinedforthislargerblockshouldbesenttothisISP.Itisthe
responsibilityoftheISPtoforwardthepackettotheappropriateorganization.Thisissimilarto
CHAPTER18INTRODUCTIONTONETWORKLAYER 539
Figure18.23SolutiontoExample18.5
N = 256 addresses
n = 24
14.24.74.0/24 14.24.74.255/24
First address Lastaddress
a. Original block
N = 128 64 16 48
b. Subblocks
Figure18.24Exampleofaddressaggregation
160.70.14.0/26
Block 1 to160.70.14.63/26
All packets
withdestinationaddre
160.70.14.64/26 sses160.70.14.0/24
Block 2 to160.70.14.127/26 ISP to160.70.14.255/2
4
aresent to ISP.
Internet
160.70.14.128/26
Block 3 to160.70.14.191/26
Largerblock
160.70.14.192/26
Block 4 to160.70.14.255/26
routingwecanfindinapostalnetwork.Allpackagescomingfromoutsideacountryaresentfirst to the
capital and then distributed to the corresponding destination.
SpecialAddresses
Before finishing the topic of addresses in IPv4, we need to mention five special
addresses that are used for special purposes: this-host address, limited-broadcast
address, loopback address, private addresses, and multicast addresses.
This-hostAddress
Theonlyaddressintheblock0.0.0.0/32iscalledthethis-hostaddress.Itisusedwhen-
everahostneedstosendanIPdatagrambutitdoesnotknowitsownaddresstouseas the source
address. We will see an example of this case in the next section.
540 PARTIVNETWORKLAYER
Limited-broadcastAddress
Theonlyaddressintheblock255.255.255.255/32iscalledthelimited-broadcastaddress.
Itisusedwheneverarouterorahostneedstosendadatagramtoalldevicesinanetwork.
Theroutersinthenetwork,however,blockthepackethavingthisaddressasthedestina-
tion;thepacketcannottraveloutsidethenetwork.
LoopbackAddress
The block 127.0.0.0/8 is called the loopback address. A packet with one of the
addressesinthisblockasthedestinationaddressneverleavesthehost;itwillremainin
thehost.Anyaddressintheblockisusedtotestapieceofsoftwareinthemachine.For
example,wecanwriteaclientandaserverprograminwhichoneoftheaddressesinthe
blockisusedastheserveraddress.Wecantesttheprogramsusingthesamehosttosee if they
work before running them on different computers.
PrivateAddresses
Four blocks are assigned as private addresses: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16,
and 169.254.0.0/16. We will see the applications of these addresses when we discuss
NAT later in the chapter.
MulticastAddresses
The block 224.0.0.0/4 is reserved for multicast addresses. We discuss these addresses
later in the chapter.
DynamicHostConfigurationProtocol(DHCP)
We have seen that a large organization or an ISP can receive a block of addresses
directlyfromICANNandasmallorganizationcanreceiveablockofaddressesfroman
ISP.Afterablockofaddressesareassignedtoanorganization,thenetworkadministra-
tioncanmanuallyassignaddressestotheindividualhostsorrouters.However,address
assignment in an organization can be done automatically using the Dynamic Host
Configuration Protocol (DHCP). DHCP is an application-layer program, using the
client-server paradigm, that actually helps TCP/IP at the network layer.
DHCP has found such widespread use in the Internet that it is often called a plug-
and-playprotocol.Incanbeusedinmanysituations.Anetworkmanagercanconfigure
DHCPtoassignpermanentIPaddressestothehostandrouters.DHCPcanalsobecon- figured
to provide temporary, on demand, IP addresses to hosts. The second capability can
provide a temporary IP address to a traveller to connect her laptop to the Internet while
she is staying in the hotel. It also allows an ISP with 1000 granted addresses to provide
services to 4000 households, assuming not more than one-forth of customers use the
Internet at the same time.
InadditiontoitsIP address, acomputeralsoneedstoknow thenetworkprefix(or
address mask). Most computers also need two other pieces of information, such as the
addressofadefaultroutertobeabletocommunicatewithothernetworksandtheaddress
ofanameservertobeabletousenamesinsteadofaddresses,aswewillseeinChapter26.
Inotherwords,fourpiecesofinformationarenormallyneeded:thecomputeraddress,the
prefix,theaddressofarouter,andtheIPaddressofanameserver.DHCPcanbeusedto provide
these pieces of information to the host.
CHAPTER18INTRODUCTIONTONETWORKLAYER 541
DHCPMessageFormat
DHCP is a client-server protocol in which the client sends a request message and the
server returns a response message. Before we discuss the operation of DHCP, let us
show the general format of the DHCP message in Figure 18.25. Most of the fields are
explained in the figure, but we need to discuss the option field, which plays a very
important role in DHCP.
Figure18.25DHCPmessageformat
0 8 16 24 31
Opcode Htype HLen HCount Fields:
TransactionID Opcode:Operationcode,request(1)orreply(2)Htyp
Timeelapsed Flags e: Hardware type (Ethernet, ...)
ClientIPaddress HLen:Length ofhardware address
YourIPaddress HCount:Maximumnumber ofhopsthe packetcan travel
ServerIP address Transaction ID:An integer set by the client and repeated by the
GatewayIPaddress serverTime elapsed: The number of seconds since the client started to
Clienthardwareaddress bootFlags:Firstbitdefinesunicast(0)ormulticast(1);other15bitsnotusedCli
ent IP address: Set to 0 if the client does not know it
YourIPaddress:TheclientIPaddresssentbythe server
Servername ServerIPaddress:AbroadcastIPaddressifclientdoesnotknowitGateway IP
address: The address of default router
Bootfilename Servername:A64-byte domainname oftheserver
Bootfilename:A128-
Options bytefilenameholdingextrainformationOptions: A 64-byte field
with dual purpose described in text
The64-byteoptionfieldhasadualpurpose.Itcancarryeitheradditionalinforma- tion or
some specific vendor information. The server uses a number, called a magic
cookie,intheformatofanIPaddresswiththevalueof99.130.83.99.Whentheclientfin-
ishesreadingthemessage,itlooksforthismagiccookie.Ifpresent,thenext60bytesare
options.An optioniscomposedof three fields:a 1-bytetagfield, a 1-bytelengthfield, and a
variable-length value field. There are several tag fields that are mostly used by
vendors.Ifthetagfieldis53,thevaluefielddefinesoneofthe8messagetypesshownin
Figure18.26.WeshowhowthesemessagetypesareusedbyDHCP.
Figure18.26Optionformat
1 DHCPDISCOVER
2 DHCPOFFER 5 DHCPACK
3 DHCPREQUEST 6 DHCPNACK
7 DHCPRELEASE
4 DHCPDECLINE
8 DHCPINFORM
53 1
TagLengthValue
DHCPOperation
Figure18.27showsasimplescenario.
542 PARTIVNETWORKLAYER
Figure18.27OperationofDHCP
Client ServerIPAddress:181.14.16.170
IPAddress:?
DHCPDISCOVER Legend
TransactionID:1001Leasetime: Application
Clientaddress:
Youraddress:Serveraddress:
UDP
IP
Sourceport:68Destinationport:67 DHCPOFFER Note:
Sourceaddress:0.0.0.0 TransactionID:1001 Onlypartiali
Destinationaddress:255.255.255.255. Leasetime:3600Clientaddress: nformationi
Youraddress:181.14.16.182 sgiven.
Serveraddress:181.14.16.170
Sourceport:67Destinationport:68
DHCPREQUEST Sourceaddress:181.14.16.170
TransactionID:1001 Destinationaddress:255.255.255.255.
Leasetime:3600
Clientaddress:181.14.16.182Youraddress:
Serveraddress:181.14.16.170
Sourceport:68Destinationport:67
Sourceaddress:181.14.16.182 DHCPACK
Destinationaddress:255.255.255.255. TransactionID:1001
Leasetime:3600Clientaddress:
Youraddress:181.14.16.182
Serveraddress:181.14.16.170
Sourceport:67Destinationport:68
Sourceaddress:181.14.16.170
Destinationaddress:255.255.255.255.
Time Time
1. ThejoininghostcreatesaDHCPDISCOVERmessageinwhichonlythetransaction- ID
field is set to a random number. No other field can be set because the host has no
knowledge with which to do so. This message is encapsulated in a UDP user
datagram with the source port set to 68 and the destination port set to 67. We will
discussthereasonforusingtwowell-knownportnumberslater.Theuserdatagram is
encapsulated in an IP datagram with the source address set to 0.0.0.0 (“this host”)
and the destination address set to 255.255.255.255 (broadcast address).The reason
is that the joining host knows neither its own address nor the server address.
2. The DHCP server or servers (if more than one) responds with a DHCPOFFER
messageinwhichtheyouraddressfielddefinestheofferedIPaddressforthejoin-
inghostandtheserveraddressfieldincludestheIPaddressoftheserver.Themes- sage
also includes the lease time for which the host can keep the IP address. This
messageisencapsulatedinauserdatagramwiththesameportnumbers,butinthe reverse
order. The user datagram in turn is encapsulated in a datagram with the server
address as the source IP address, but the destination address is a broadcast
address, in which the server allows other DHCP servers to receive the offer and
give a better offer if they can.
CHAPTER18INTRODUCTIONTONETWORKLAYER 543
TransitionStates
ThepreviousscenarioswediscussedfortheoperationoftheDHCPwereverysimple.To
provide dynamic address allocation, the DHCP client acts as a state machine that
performstransitionsfromonestatetoanotherdependingonthemessagesitreceivesor
sends.Figure18.28showsthetransitiondiagramwiththemainstates.
Figure18.28FSMfortheDHCPclient
Join
INIT
_/ DHCPDISCOVER
DHCPOFFER
DHCPNACK
SELECTING
SelectOffer/DHCPREQUEST
REQUESTING
Leasetimeexpiredor
Leasetime50%expired / DHCPACK Lease cancelled/ DHCPNACK
DHCPREQUEST DHCPRELEASE
BOUND
DHCPACK DHCPACK
RENEWING REBINDING
Leasetime75%expired /
DHCPREQUEST
When the DHCP client first starts, it is in the INIT state (initializing state). The
client broadcasts a discover message. When it receives an offer, the client goes to the
SELECTINGstate.Whileitisthere,itmayreceivemoreoffers.Afteritselectsanoffer,it
sendsarequestmessageandgoestotheREQUESTINGstate.IfanACKarriveswhilethe
clientisinthisstate,itgoestotheBOUNDstateandusestheIPaddress.Whentheleaseis
50percentexpired,theclienttriestorenewitbymovingtotheRENEWINGstate.Ifthe server
renews the lease, the client moves to the BOUND state again. If the lease is not
renewedandtheleasetimeis75percentexpired,theclientmovestotheREBINDING
state.Iftheserveragreeswiththelease(ACKmessagearrives),theclientmovestothe
BOUNDstateandcontinuesusingtheIPaddress;otherwise,theclientmovestotheINIT
stateandrequestsanotherIPaddress.NotethattheclientcanusetheIPaddressonlywhen
itisintheBOUND,RENEWING,orREBINDINGstate.Theaboveprocedurerequires
thattheclientusesthreetimers:renewaltimer(setto50percentoftheleasetime),rebind-
ingtimer(setto75percentoftheleasetime),andexpirationtimer(settotheleasetime).
NetworkAddressResolution(NAT)
The distribution of addresses through ISPshascreated a new problem. Assume that an
ISP has granted a small range of addresses to a small business or a household. If the
business grows or the household needs a larger range, theISPmay not beable to grant
the demand because the addresses before and after the range may have already been
CHAPTER18INTRODUCTIONTONETWORKLAYER 545
allocatedtoothernetworks.Inmostsituations,however,onlyaportionofcomputersin
546 PARTIVNETWORKLAYER
asmallnetworkneedaccesstotheInternetsimultaneously.Thismeansthatthenumber
ofallocatedaddressesdoesnothavetomatchthenumberofcomputersinthenetwork.
Forexample,assumethatinasmallbusinesswith20computersthemaximumnumber of
computers that access the Internet simultaneously is only 4. Most of the computers are
either doing some task that does not need Internet access or communicating with each
other. This small business can use the TCP/IP protocol for both internal and uni- versal
communication. The business can use 20 (or 25) addresses from the private block
addresses (discussed before) for internal communication; five addresses for uni- versal
communication can be assigned by the ISP.
A technology that can provide the mapping between the private and universal
addresses, and at the same time support virtual private networks, which we discuss in
Chapter 32, is Network Address Translation (NAT). The technology allows a site to
use a set of private addresses for internal communication and a set of global Internet
addresses (at least one) for communication with the rest of the world. The site must
haveonlyoneconnectiontotheglobalInternetthroughaNAT-capablerouterthatruns NAT
software. Figure 18.29 shows a simple implementation of NAT.
Figure18.29NAT 172.18.3.30
172.18.3.1
200.24.5.8
172.18.3.2
Internet
NAT
router
172.18.3.20
Siteusingprivate addresses
As the figure shows, the private network uses private addresses. The router that
connects the network to the global address uses one private address and one global
address.TheprivatenetworkisinvisibletotherestoftheInternet;therestoftheInter- net sees
only the NAT router with the address 200.24.5.8.
AddressTranslation
All of the outgoing packets go through the NAT router, which replaces the source
address in the packet with the global NAT address. All incoming packets also pass
throughtheNATrouter,whichreplacesthedestinationaddressinthepacket(theNAT
routerglobaladdress)withtheappropriateprivateaddress.Figure18.30showsanexam- ple of
address translation.
TranslationTable
The reader may have noticed that translating the source addresses for an outgoing
packet is straightforward. But how does the NAT router know the destination address
forapacketcomingfromtheInternet?TheremaybetensorhundredsofprivateIP
CHAPTER18INTRODUCTIONTONETWORKLAYER 547
Figure18.30Addresstranslation
172.18.3.1
Source:172.18.3.1Source:200.24.5.8
172.18.3.2
Internet
172.18.3.20
Destination:172.18.3.1Destination:200.24.5.8
Siteusingprivate addresses
addresses,eachbelongingtoonespecifichost.TheproblemissolvediftheNATrouter has a
translation table.
UsingOneIPAddress
In its simplest form, a translation table has only two columns: the private address and
the external address (destination address of the packet). When the router translates the
source address of the outgoing packet, it also makes note of the destination address—
where the packet is going. When the response comes back from the destination, the
routerusesthe source addressofthepacket(asthe externaladdress)tofindtheprivate
address of the packet. Figure 18.31 shows the idea.
Figure18.31Translation
Privatenetwork
S: S:
172.18.3.1
172.18.3.1 2 S: S:
200.24.5.8
200.24.5.8
D: D:25.8.2.10
25.8.2.10 D: D:25.8.2.10
25.8.2.10
Data Data Data Data
Legend
1 TranslationTable S:Source address
Private Universal D:Destination address
Maketableentry
172.18.3.1 25.8.2.10
Change source address
Access table
Change destination address
4
Privatenetwork 3
S: 25.8.2.10
D:S:172.18.3.1
25.8.2.10 S:S:25.8.2.10
25.8.2.10
D: 172.18.3.1 D: 200.24.8.5
D:200.24.8.5
Data Data Data
Data
Aswewillsee,NATisusedmostlybyISPsthatassignasingleaddresstoacustomer. The
customer, however, may be a member of a private network that has many private
addresses. In this case, communication with the Internet is always initiated from the
customer site, using a client program such as HTTP, TELNET, or FTP to access the
corresponding server program. For example, when e-mail that originates from outside
the network site is received by the ISP e-mail server, it is stored in the mailbox of the
customer until retrieved with a protocol such as POP.
UsingaPoolofIPAddresses
TheuseofonlyoneglobaladdressbytheNATrouterallowsonlyoneprivate-network
hosttoaccessagivenexternalhost.Toremovethisrestriction,theNATroutercanuse a pool of
global addresses. For example, instead of using only one global address
(200.24.5.8),the NAT router can use four addresses(200.24.5.8, 200.24.5.9,
200.24.5.10,and200.24.5.11).Inthiscase,fourprivate-networkhostscancommunicate
with the same external host at the same time because each pair of addresses defines a
separate connection. However, there are still some drawbacks. Nomorethanfourcon-
nections can be made to the same destination. No private-network host can access two
external server programs (e.g., HTTP and TELNET) at the same time. And, likewise,
twoprivate-networkhostscannotaccessthesameexternalserverprogram(e.g.,HTTP or
TELNET) at the same time.
UsingBothIPAddressesandPortAddresses
To allow a many-to-many relationship between private-network hosts and external
server programs, we need more information in the translation table. For example, sup-
posetwohostsinsideaprivatenetworkwithaddresses172.18.3.1and172.18.3.2need to
access the HTTP server on external host 25.8.3.2. If the translation table has five
columns, instead of two, that include the source and destination port addresses and the
transport-layer protocol, the ambiguity is eliminated. Table 18.1 shows an example of
such a table.
Table18.1Five-columntranslationtable
Private Private External External Transport
address port address port protocol
172.18.3.1 1400 25.8.3.2 80 TCP
172.18.3.2 1401 25.8.3.2 80 TCP
Note that when the response from HTTP comes back, the combination of source
address (25.8.3.2) and destination port address (1401) defines the private network host
towhichtheresponseshouldbedirected.Notealsothatforthistranslationtowork,the
ephemeral port addresses (1400 and 1401) must be unique.
FORWARDINGOFIPPACKETS
We discussed the concept of forwarding at the network layer earlier in this chapter. In
this section, we extend the concept to include the role of IP addresses in forwarding.As
we discussed before, forwarding means to place the packet in its route toits destination.
CHAPTER18INTRODUCTIONTONETWORKLAYER 549
ForwardingBasedonDestinationAddress
We first discuss forwarding based on the destination address. This is a traditional
approach,whichisprevalenttoday.Inthiscase,forwardingrequiresahostorarouterto
haveaforwardingtable.Whenahosthasapackettosendorwhenarouterhasreceiveda
packettobeforwarded,itlooksatthistabletofindthenexthoptodeliverthepacketto.
Inclassless addressing, the whole addressspace is one entity; thereareno classes.
This means that forwarding requires one row of information for each block involved.
The table needs to be searched based on the network address (first address in the
block).Unfortunately,thedestinationaddressinthepacketgivesnoclueaboutthenet- work
address. To solve the problem, we need to include the mask (/n) in the table. In
otherwords,aclasslessforwardingtableneedstoincludefourpiecesofinformation:the mask,
the network address, the interface number, and the IP address of the next router
(needed to find the link-layer address of the next hop, as we discussed in Chapter 9).
However, we often see in the literature that the first two pieces are combined. For
example,ifnis26andthenetworkaddressis180.70.65.192,thenonecancombinethe two as
one piece of information: 180.70.65.192/26. Figure 18.32 shows a simple for- warding
module and forwarding table for a router with only three interfaces.
Figure18.32Simplifiedforwardingmoduleinclasslessaddress
m1
m0
m2
Forwardingtable
Networkaddressincluding
Next-hopIPaddress
mask Interface
Extractdestinationaddress Searchtable
Packet x0.y0.z0.t0/n0 ............. m0m1m2
x1.y1.z1.t1/n1 .............
x2.y2.z2.t2/n1 .............
Interfacenumberandnext-hopaddress
Toothermodulesorprotocols
The job of the forwarding module is to search the table, row by row. In each row,
the n leftmost bits of the destination address (prefix) are kept and the rest of the bits
(suffix) are set to 0s. If the resulting address (which we call the network address),
matcheswiththeaddressinthefirstcolumn,theinformationinthenexttwocolumnsis
550 PARTIVNETWORKLAYER
extracted;otherwisethesearchcontinues.Normally,thelastrowhas a defaultvaluein
thefirstcolumn(notshowninthefigure),whichindicatesalldestinationaddressesthat did not
match the previous rows.
Sometimes,theliteratureexplicitlyshowsthevalueofthenleftmostbitsthatshould
bematchedwiththenleftmostbitsofthedestinationaddress.Theconceptisthesame,
butthepresentationisdifferent.Forexample,insteadofgivingtheaddress-maskcombi-
nationof180.70.65.192/26,wecangivethevalueofthe26leftmostbitsasshownbelow.
10110100 01000110 01000001 11
Note that we still need to use an algorithm to find the prefix and compare it with
the bit pattern. In other words, the algorithm is still needed, but the presentation is dif-
ferent. We use this format in our forwarding tables in the exercises when we use
smaller address spaces just for practice.
Example18.7
MakeaforwardingtableforrouterR1usingtheconfigurationinFigure18.33.
Figure18.33ConfigurationforExample18.7
180.70.65.128/25
180.70.65.135/25
201.4.16.0/22 m0 201.4.22.0/24
m1m3
201.4.16.2/22 201.4.22.3/24
m2R1
180.70.65.194/26
180.70.65.192/26
180.70.65.200/26
RestoftheInternet
Solution
Table18.2showsthecorrespondingtable.
Table18.2ForwardingtableforrouterR1inFigure18.33
Networkaddress/mask Nexthop Interface
180.70.65.192/26 m2
180.70.65.128/25 m0
201.4.22.0/24 m3
201.4.16.0/22 m1
Default 180.70.65.200 m2
CHAPTER18INTRODUCTIONTONETWORKLAYER 551
Example18.8
InsteadofTable18.2,wecanuseTable18.3,inwhichthenetworkaddress/maskisgiveninbits.
Table18.3ForwardingtableforrouterR1inFigure18.33usingprefixbits
Leftmostbitsinthedestinationaddress Nexthop Interface
10110100010001100100000111 m2
1011010001000110010000011 m0
110010010000010000011100 m3
1100100100000100000100 m1
Default 180.70.65.200 m2
When a packet arrives whose leftmost 26 bits in the destination address match the
bits in the first row, the packet is sent out from interface m2. When a packet arrives
whoseleftmost25bitsintheaddressmatchthebitsinthesecondrow,thepacketissent out from
interface m0, and so on. The table clearly shows that the first row has the longest
prefix and the fourth row has the shortest prefix. The longer prefix means a smaller
range of addresses; the shorter prefix means a larger range of addresses.
Example18.9
Show the forwarding process if a packet arrives at R1 in Figure 18.33 with the destination
address 180.70.65.140.
Solution
Therouterperformsthefollowingsteps:
1. Thefirstmask(/26)isappliedtothedestinationaddress.Theresultis180.70.65.128,which does
not match the corresponding network address.
2. The second mask (/25) is applied to the destination address. The result is 180.70.65.128,
which matches the corresponding network address. The next-hop address and the interface
number m0 are extracted for forwarding the packet.
AddressAggregation
When we use classful addressing, there is only one entry in the forwarding table for
each site outside the organization. The entry defines the site even if that site is sub-
netted. When a packet arrives at the router, the router checks the corresponding entry
andforwardsthepacketaccordingly.Whenweuseclasslessaddressing,itislikelythat
thenumberofforwardingtableentrieswillincrease.Thisisbecausetheintentofclass- less
addressing is to divide up the whole address space into manageable blocks. The
increasedsizeofthetableresultsinanincreaseintheamountoftimeneededtosearch the table.
To alleviate the problem, the idea of address aggregation was designed. In Figure
18.34 we have two routers.
R1isconnectedtonetworksoffourorganizationsthateachuse64addresses.R2is
somewherefarfromR1.R1hasalongerforwardingtablebecauseeachpacketmustbe
correctlyroutedtotheappropriateorganization.R2,ontheotherhand,canhaveavery
smallforwardingtable.ForR2,anypacketwithdestination140.24.7.0to140.24.7.255
552 PARTIVNETWORKLAYER
Figure18.34Addressaggregation
140.24.7.0/26
Organization1
140.24.7.64/26
Organization2
m1m0
m4 m0 m1
140.24.7.128/26 R1 R2
Organization3 m2m3 Somewhere
in the Internet
140.24.7.192/26
Organization4
is sent out from interface m0 regardless of the organization number. This is called
address aggregation because the blocks of addresses for four organizations are aggre-
gated into one larger block.R2 would have a longer forwarding table if each organiza-
tion had addresses that could not be aggregated into one block.
LongestMaskMatching
What happens if one of the organizations in the previous figure is not geographically
close to the other three? For example, if organization 4 cannot be connected to router
R1 for some reason, can we still use the idea of address aggregation and still assign
block 140.24.7.192/26 to organization 4? The answer is yes, because routing in class-
less addressing uses another principle, longest mask matching. This principle states
thattheforwardingtableissortedfromthelongestmasktotheshortestmask.In other words, if
there are three masks, /27, /26, and /24, the mask /27 must be the first entry and /24
must be the last. Let us see if this principle solves the situation in which
organization4isseparatedfromtheotherthreeorganizations.Figure18.35shows the
situation.
Suppose a packet arrives at router R2 for organization 4 with destination address
140.24.7.200. The first mask at router R2 is applied, which gives the network address
140.24.7.192. The packet is routed correctly from interface m1 and reaches organiza-
tion 4. If, however, the forwarding table was not stored with the longest prefix first,
applying the /24 mask would result in the incorrect routing of the packet to router R1.
CHAPTER18INTRODUCTIONTONETWORKLAYER 553
Figure18.35Longestmaskmatching
140.24.7.0/26
Forwarding table for R2
Organization1
Networkadd Next-
Interface
ress/mask hopaddr
140.24.7.64/26 ess
140.24.7.192/26 ---------- m1
Organization2
140.24.7.0/24 addressofR1 m0
0.0.0.0/0 default router m2
m1m0
m3 m0 m2
140.24.7.128/26 R1 R2m1
Organization3 m2
Organization4
Forwarding table for R1
140.24.7.192/26
Networkaddr Next-
Interface
ess/mask hopaddr
ess
140.24.7.0/26 ---------- m0
140.24.7.64/26 ---------- m1
140.24.7.128/26 ---------- m2
0.0.0.0/0 default router m3
HierarchicalRouting
To solve the problem of gigantic forwarding tables, we can create a sense of hierarchy
intheforwardingtables.InChapter2,wementionedthattheInternettodayhasasense of
hierarchy. We said that the Internet is divided into backbone and national ISPs.
National ISPs are divided into regional ISPs, and regional ISPs are divided into local
ISPs. If the forwarding table has a sense of hierarchy like the Internet architecture, the
forwarding table can decrease in size.
LetustakethecaseofalocalISP.AlocalISPcanbeassignedasingle,butlarge,block
ofaddresseswithacertainprefixlength.ThelocalISPcandividethisblockintosmaller
blocksofdifferentsizes,andassignthesetoindividualusersandorganizations,bothlarge
andsmall.IftheblockassignedtothelocalISPstartswitha.b.c.d/n,theISPcancreate
blocksstartingwithe.f.g.h/m,wheremmayvaryforeachcustomerandisgreaterthann.
Howdoesthisreducethesizeoftheforwardingtable?TherestoftheInternetdoes not
have to be aware of this division. All customers of the local ISP are defined as
a.b.c.d/n to the rest of the Internet. Every packet destined for one of the addresses in
thislargeblockisroutedtothelocal ISP.There isonly one entry in everyrouterinthe world
for all of these customers. They all belong to the same group. Of course, inside the
local ISP, the router must recognize the subblocks and route the packet to the des-
tinedcustomer.Ifoneofthecustomersisalargeorganization,italsocancreateanother level of
hierarchy by subnetting and dividing its subblock into smaller subblocks (or sub-
subblocks).Inclasslessrouting,thelevelsofhierarchyareunlimitedaslongaswe follow the
rules of classless addressing.
554 PARTIVNETWORKLAYER
Example18.10
As an example of hierarchical routing, let us consider Figure 18.36. A regional ISP is granted
16,384 addresses starting from 120.14.64.0. The regional ISP has decided to divide this block
into four subblocks, each with 4096 addresses. Three of these subblocks are assigned to three
localISPs;thesecondsubblockisreservedforfutureuse.Notethatthemaskforeachblockis/20 because
the original block with mask /18 is divided into 4 blocks.
Figure18.36HierarchicalroutingwithISPs
SOrg01 120.14.112.0/24
Local 120.14.112.0/20
SOrg16 ISP3 Total4096
ThefirstlocalISPhasdivideditsassignedsubblockinto8smallerblocksandassignedeach to a
small ISP. Each small ISP provides services to 128 households (H001 to H128), each using four
addresses. Note that the mask for each small ISP is now /23 because the block is further divided
into 8 blocks. Each household has a mask of /30, because a household has only 4 addresses(2 32–
30
4).ThesecondlocalISPhasdivideditsblockinto4blocksandhasassigned
theaddressesto4largeorganizations(LOrg01toLOrg04).Notethateachlargeorganizationhas 1024
addresses and the mask is /22.
The third local ISP has divided its block into 16 blocks and assigned each block to a small
organization(SOrg01toSOrg16).Eachsmallorganizationhas256addressesandthemaskis/24. There
is a sense of hierarchy in this configuration. All routers in the Internet send a packet with
destination address 120.14.64.0 to 120.14.127.255 to the regional ISP. The regional ISP sends
every packet with destination address 120.14.64.0 to 120.14.79.255 to Local ISP1. Local ISP1
sends every packet with destination address 120.14.64.0 to 120.14.64.3 to H001.
GeographicalRouting
Todecreasethesizeoftheforwardingtableevenfurther,weneedtoextendhierarchical
routingtoincludegeographicalrouting.Wemustdividetheentireaddressspaceintoa few
large blocks. We assign a block to America, a block to Europe, a block to Asia, a
blocktoAfrica,andsoon.TheroutersofISPsoutsideofEuropewillhaveonlyoneentry
forpacketstoEuropeintheirforwardingtables.TheroutersofISPsoutsideofAmerica
willhaveonlyoneentryforpacketstoAmericaintheirforwardingtables,andsoon.
CHAPTER18INTRODUCTIONTONETWORKLAYER 555
ForwardingTableSearchAlgorithms
In classless addressing, there isno network information inthedestinationaddress.The
simplest,butnotthemostefficient,searchmethodiscalledthelongestprefixmatch(as we
discussed before). The forwarding table can be divided into buckets, one for each
prefix.Therouterfirsttriesthelongestprefix.Ifthedestinationaddressisfoundinthis bucket,
the search is complete. If the address is not found, the next prefix is searched, and so
on. It is obvious that this type of search takes a long time.
One solution is to change the data structure used for searching. Instead of a list,
otherdatastructures(suchasatreeorabinarytree)canbeused.Onecandidateisatrie (a special
kind of tree). However, this discussion is beyond the scope of this book.
ForwardingBasedonLabel
In the 1980s, an effort started to somehow change IP to behave like a connection-
orientedprotocolinwhichtheroutingisreplacedbyswitching.Aswediscussedearlier in the
chapter, in a connectionless network (datagram approach), a router forwards a packet
based on the destination address in the header of the packet. On the other hand,
inaconnection-orientednetwork(virtual-circuitapproach),aswitchforwardsapacket based
on the label attached to the packet. Routing is normally based on searching the
contentsofatable; switching canbedone byaccessing atableusinganindex.Inother words,
routing involves searching; switching involves accessing.
Example18.11
Figure18.37showsasimpleexampleofsearchinginaforwardingtableusingthelongest mask
algorithm. Although there are some more efficient algorithms today, the principle is the same.
Figure18.37Example18.11:Forwardingbasedondestinationaddress
Forwardingtable
Mask(/n) Networkaddress
Next-hopaddress
Interface
32 NFNF Legend
32
31 NF :CompareNF: Not found
31 NFNFNFNFF F: Found
31
31
31 y 2
30 Interfaceandnext-hopaddress
29
Destinationaddress
0
x 1
y
2 x
556 PARTIVNETWORKLAYER
Example18.12
Figure18.38showsasimpleexampleofusingalabeltoaccessaswitchingtable.Sincethelabels are used as
the index to the table, finding the information in the table is immediate.
Figure18.38Example18.12:Forwardingbasedonlabel
Labelused
asindex SwitchingTable
0000 Interface Nextlabel
0001
0002
0003
0004
0005 2 0012
0006
Label
1000 Interfaceandl
abel address
0
Switch 1
0004
2 0012
Multi-ProtocolLabelSwitching(MPLS)
During the 1980s, several vendors created routers that implement switching technol-
ogy.LaterIETFapprovedastandardthatiscalledMulti-ProtocolLabelSwitching. In this
standard, some conventional routers in the Internet can be replaced by MPLS routers,
which can behave like a router and a switch. When behaving like a router, MPLS can
forward the packet based on the destination address; when behaving like a switch, it
can forward a packet based on the label.
ANew Header
Tosimulateconnection-orientedswitchingusingaprotocollikeIP,thefirstthingthatis
needed is to add a field to the packet that carries the label discussed later. The IPv4
packet format does not allow this extension (although this field is provided in the IPv6
packetformat,aswewillseelater).ThesolutionistoencapsulatetheIPv4packetinan
MPLSpacket(asthoughMPLSwerealayerbetweenthedata-linklayerandthe
CHAPTER18INTRODUCTIONTONETWORKLAYER 557
networklayer).ThewholeIPpacketisencapsulatedasthepayloadinanMPLSpacket and an
MPLS header is added. Figure 18.39 shows the encapsulation.
Figure18.39MPLSheaderaddedtoanIPpacket
MPLS IP IP
Header Header Payload
TheMPLSheaderisactuallyastackofsubheadersthatisusedformultilevelhier- archical
switching, as we will discuss shortly. Figure 18.40 shows the format of an MPLS
header in which each subheader is 32 bits (4 bytes) long.
Figure18.40MPLSheadermadeofastackof labels
0 20 24 31
Label Exp S TTL
Label Exp S TTL
Thefollowingisabriefdescriptionofeachfield:
❑ Label. This 20-bit field defines the label that is used to index the forwarding table
in the router.
❑ Exp.This3-bitfieldisreservedforexperimentalpurposes.
❑ S.Theone-bitstackfielddefinesthesituationofthesubheaderinthestack.When the bit
is 1, it means that the header is the last one in the stack.
❑ TTL. This 8-bit field is similar to the TTL field in the IP datagram. Each visited
router decrements the value of this field. When it reaches zero, the packet is dis-
carded to prevent looping.
HierarchicalSwitching
AstackoflabelsinMPLSallowshierarchicalswitching.Thisissimilartoconventional
hierarchicalrouting.Forexample,apacketwithtwolabelscanusethetoplabeltofor-
wardthepacketthroughswitchesoutsideanorganization;thebottomlabelcanbeused to
route the packet inside the organization to reach the destination subnet.
RoutersasPacketSwitches
Aswemayhaveguessedbynow,thepacketswitchesthatareusedinthenetworklayer are
called routers. Routers can be configured to act as either a datagram switch or a
virtual-circuitswitch.Wehavediscussedthestructureofapacket-switchinChapter8. The
discussion in that chapter can be applied to any router used in the Internet.
C H A P T E R 19
Network-LayerProtocols
I n the previous chapter we introduced the network layer and discussed the services
provided by this layer. We also discussed the logical addresses used in this layer. In
this chapter, we show how the network layer is implemented in the TCP/IP
protocolsuite.Theprotocolsinthenetworklayerhavegonethroughafewversions;inthis
chapter, we concentrateon thecurrent version (v4).Thenextgeneration,whichison the
horizon,isdiscussedinChapter22.
❑ The first section discusses the IPv4 protocol. It first describes the IPv4 datagram
format. It then explains the purpose of fragmentation in a datagram. The section
then briefly discusses options fields and their purpose in a datagram. The section
finally mentions some security issues in IPv4, which are addressed in Chapter 32.
❑ The second section discusses ICMPv4, one of the auxiliary protocols used in the
network layer to help IPv4. First, it briefly discusses the purpose of each option.
The section then shows how ICMP can be used as a debugging tool. The section
finally shows how the checksum is calculated for an ICMPv4 message.
❑ Thethirdsectiondiscusses the mobile IP,whoseuseisincreasing everydaywhen
people temporarily move their computers from one place to another. The section
first describes the issue of address change in this situation. It then shows the three
phases involved in the process. The section finally explains the inefficiency
involved in this process and some solutions.
561
562 PARTIVNETWORKLAYER
INTERNETPROTOCOL (IP)
Thenetworklayerinversion4can bethoughtofasonemainprotocol andthreeauxil-
iaryones.Themainprotocol,InternetProtocolversion4(IPv4),isresponsibleforpack-
etizing,forwarding,anddeliveryofapacketatthenetworklayer.TheInternetControl
MessageProtocolversion4(ICMPv4)helpsIPv4tohandlesomeerrorsthatmayoccur in the
network-layer delivery. The Internet Group Management Protocol (IGMP) is used to
help IPv4 in multicasting. The Address Resolution Protocol (ARP) is used to
gluethenetworkanddata-linklayersinmappingnetwork-layeraddressestolink-layer
addresses.Figure19.1showsthepositionsofthesefourprotocolsintheTCP/IPproto- col
suite.
Figure19.1PositionofIPandothernetwork-layerprotocolsinTCP/IPprotocolsuite
IGMP ICMP
Networklaye IP
r ARP
Data- UnderlyingLANorWAN
linklayer
Physicallaye technology
r
We will discuss IPv4 and ICMPv4 in this chapter. IGMP will be discussed when
we talk about multicasting in Chapter 21. We have discussed ARP in Chapter 9 when
we talked about link-layer addresses.
IPv4 is an unreliable datagram protocol—a best-effort delivery service. The term
best-effort means that IPv4 packets can be corrupted, be lost, arrive out of order, or be
delayed, and may create congestion for the network. If reliability is important, IPv4
must be paired with a reliable transport-layer protocol such as TCP. An example of a
more commonly understood best-effort delivery service is the post office. The post
officedoesitsbesttodelivertheregularmailbutdoesnotalwayssucceed.Ifanunreg- istered
letter is lost or damaged, it is up to the sender or would-be recipient to discover
this.Thepostofficeitselfdoesnotkeeptrackofeveryletterandcannotnotifyasender of loss
or damage of one.
IPv4isalsoaconnectionlessprotocolthatusesthedatagramapproach.Thismeans
thateachdatagramishandledindependently, andeachdatagramcanfollow adifferent route
to the destination. This implies that datagrams sent by the same source to the
samedestinationcouldarriveoutoforder.Again,IPv4reliesonahigher-levelprotocol to take
care of all these problems.
CHAPTER19NETWORK-LAYERPROTOCOLS 563
DatagramFormat
In this section, we begin by discussing the first service provided by IPv4, packetizing.
We show how IPv4 defines the format of a packet in which the data coming from the
upperlayerorotherprotocolsareencapsulated.PacketsusedbytheIParecalleddata- grams.
Figure 19.2 shows the IPv4 datagram format. A datagram is a variable-length packet
consisting of two parts: header and payload (data).Theheader is20 to 60bytes in length
and contains information essential to routing and delivery. It is customary in TCP/IP to
show the header in 4-byte sections.
Figure19.2IP datagram
20–65,535bytes
20–60bytes Legend
VER:versionnumber
HLEN:headerlengthb
Header Payload yte: 8 bits
a. IP datagram Flags D M
0 4 8 16 31
VER HLEN Service type Totallength16bits
4 bits 4 bits 8 bits Fragmentationoffset13bits
IdentificationFlags16 bits3 bits Headerchecksum16bits
Time-to-liveProtocol8bits8bits
b. Header
Discussing the meaning and rationale for the existence of each field is essential to
understanding the operation of IPv4; a brief description of each field is in order.
❑ VersionNumber.The4-bitversionnumber(VER)fielddefinestheversionofthe IPv4
protocol, which, obviously, has the value of 4.
❑ Header Length. The 4-bit header length (HLEN) field defines the total length of
the datagram header in 4-byte words. The IPv4 datagram has a variable-length
header.Whenadevicereceivesadatagram,itneedstoknowwhentheheaderstops and
the data, which is encapsulated in the packet, starts. However, to make the value
of the header length (number of bytes) fit in a 4-bit header length, the total length
of the header iscalculated as 4-bytewords.The total length is divided by 4 and the
value is inserted in the field. The receiver needs to multiply the value of this field
by 4 to find the total length.
❑ Service Type. In the original design of the IP header, this field was referred to as
type of service (TOS), which defined how the datagram should be handled. In the
late 1990s, IETF redefined the field to provide differentiated services (DiffServ).
564 PARTIVNETWORKLAYER
WhenwediscussdifferentiatedservicesinChapter30,wewillbeinabettersitua-
tiontodefinethebitsinthisfield.Theuseof4-bytewordsforthelengthheaderis
alsologicalbecausetheIPheaderalwaysneedstobealignedin4-byteboundaries.
❑ Total Length.This 16-bitfielddefines the totallength (header plus data) ofthe IP
datagraminbytes.A16-bitnumbercandefineatotallengthofupto65,535(whenall
bitsare1s).However,thesizeofthedatagramisnormallymuchlessthanthis.This
fieldhelpsthereceivingdevicetoknowwhenthepackethascompletelyarrived.To find
the length ofthe datacoming from the upper layer, subtract the headerlength
fromthetotallength.Theheaderlengthcanbefoundbymultiplyingthevalueinthe HLEN
field by 4.
Lengthofdatatotallength(HLEN)4
Thoughasizeof65,535bytesmightseemlarge,thesizeoftheIPv4datagram may
increase in the near future as the underlying technologies allow even more
throughput (greater bandwidth).
Onemayaskwhyweneedthisfieldanyway.Whenamachine(routerorhost)
receives a frame, it drops the header and the trailer, leaving the datagram. Why
include an extra field that is not needed? The answer is that in many cases we
reallydonotneedthevalueinthisfield.However,thereareoccasionsinwhichthe
datagram is not the only thing encapsulated in a frame; it may be that padding has
been added. For example, the Ethernet protocol has a minimum and maximum
restrictiononthesizeofdatathatcanbeencapsulatedinaframe(46to1500bytes).
IfthesizeofanIPv4datagramislessthan46bytes,somepaddingwillbeaddedto meet this
requirement. In this case, when a machine decapsulates the datagram, it
needstocheckthetotallengthfieldtodeterminehowmuchisreallydataandhow much is
padding.
❑ Identification, Flags, and Fragmentation Offset. These three fields are related to
the fragmentation of the IP datagram when the size of the datagram is larger than
theunderlyingnetworkcancarry.Wediscussthecontentsandimportanceofthese fields
when we talk about fragmentation in the next section.
❑ Time-to-live.Duetosomemalfunctioningofroutingprotocols(discussedlater)adata- gram
may be circulating in the Internet, visiting some networks over and over without
reachingthedestination.ThismaycreateextratrafficintheInternet.Thetime-to-live
(TTL)fieldisusedtocontrolthemaximumnumberofhops(routers)visitedbythe
datagram.Whenasourcehostsendsthedatagram,itstoresanumberinthisfield.This
valueisapproximatelytwotimesthemaximumnumberofroutersbetweenanytwo
hosts.Eachrouterthatprocessesthedatagramdecrementsthisnumberbyone.Ifthis
value,afterbeingdecremented,iszero,therouterdiscardsthedatagram.
❑ Protocol. In TCP/IP, the data section of a packet, called the payload, carries the
whole packet from another protocol. A datagram, for example, can carry a packet
belongingtoanytransport-layerprotocolsuchasUDPorTCP.Adatagramcanalso carry
a packet from other protocols that directly use the service of the IP, such as
someroutingprotocolsorsomeauxiliaryprotocols.TheInternetauthorityhasgiven
CHAPTER19NETWORK-LAYERPROTOCOLS 565
anyprotocolthatusestheserviceofIPaunique8-bitnumberwhichisinsertedinthe
protocolfield.WhenthepayloadisencapsulatedinadatagramatthesourceIP,the
correspondingprotocolnumberisinsertedinthisfield;whenthedatagramarrivesat
thedestination,thevalueofthisfieldhelpstodefinetowhichprotocolthepayload
shouldbedelivered.Inotherwords,thisfieldprovidesmultiplexingatthesourceand
demultiplexing at the destination, as shown in Figure 19.3. Note that the protocol
fields at the network layer play the same role as the port numbers at the transport
layer(Chapters23and24).However,weneedtwoportnumbersinatransport-layer
packetbecausetheportnumbersatthesourceanddestinationaredifferent,butwe
needonlyoneprotocolfieldbecausethisvalueisthesameforeachprotocolnomat- ter
whether it is located at the source or the destination.
Figure19.3Multiplexinganddemultiplexingusingthevalueoftheprotocolfield
Datagram
❑ Headerchecksum.IPisnotareliableprotocol;itdoesnotcheckwhetherthepay-
loadcarriedbyadatagramiscorruptedduringthetransmission.IPputstheburden of error checking of the
payload on the protocol that owns the payload, such as
UDPorTCP.Thedatagramheader,however,isaddedbyIP,anditserror-checking
istheresponsibilityofIP.ErrorsintheIPheadercanbeadisaster.Forexample,if the destination IP address is
corrupted, the packet can be delivered to the wrong host. If the protocol field is corrupted, the
payload may be delivered to the wrong
protocol.Ifthefieldsrelatedtothefragmentationarecorrupted,thedatagramcan- not be reassembled
correctly at the destination, and so on. For these reasons, IP
addsaheaderchecksumfieldtochecktheheader,butnotthepayload.Weneedto remember that, since the
value of some fields, such as TTL, which are related to fragmentation and options, may change from
router to router, the checksum needs to be recalculated at each router. As we discussed in Chapter 10,
checksum in the Internet normally uses a 16-bit field, which is the complement of the sum of other
fields calculated using 1s complement arithmetic.
❑ Source and Destination Addresses. These 32-bit source and destination address fields define the IP
address of the source and destination respectively. The source host shouldknowitsIPaddress.The
destination IPaddressiseitherknown by the protocol that uses the service of IP or is provided by the
DNS as described in Chapter 26. Note that the value of these fields must remain unchanged during
the
566 PARTIVNETWORKLAYER
time the IP datagram travels from the source host to the destination host. IP
addresses were discussed in Chapter 18.
❑ Options. A datagram header can have up to 40 bytes of options. Options can be
used for network testing and debugging. Although options are not a required part
oftheIPheader,optionprocessingisrequiredoftheIPsoftware.Thismeans that all
implementations must be able to handle options if they are present in the header.
The existence of options in a header creates some burden on the data-
gramhandling;someoptionscanbechangedbyrouters,whichforceseach router to
recalculate the header checksum. There are one-byte and multi-byte options that
we will briefly discuss later in the chapter. The complete discussionis posted at
the book website.
❑ Payload. Payload, or data, is the main reason for creating a datagram. Payload is
the packet coming from other protocols that use the service of IP. Comparing a
datagram to a postal package, payload is the content of the package; the header is
only the information written on the package.
Example19.1
AnIPv4packethasarrivedwiththefirst8bitsas(01000010)2Thereceiverdiscardsthepacket. Why?
Solution
Thereisanerrorinthispacket.The4leftmostbits(0100)2showtheversion,whichiscorrect.The next 4
bits (0010)2 show an invalid header length (2 4 8). The minimum number of bytes in the
header must be 20. The packet has been corrupted in transmission.
Example19.2
InanIPv4packet,thevalueofHLENis(1000)2.Howmanybytesofoptionsarebeingcarriedby this
packet?
Solution
The HLEN value is 8, which means the total number of bytes in the header is 8 4, or 32 bytes.
The first 20 bytes are the base header, the next 12 bytes are the options.
Example19.3
InanIPv4packet,thevalueofHLENis5,andthevalueofthetotallengthfieldis(0028)16.How many bytes of
data are being carried by this packet?
Solution
TheHLENvalueis5,whichmeansthetotalnumberofbytesintheheaderis54,or20bytes(no
options).Thetotallengthis(0028)16or40bytes,whichmeansthepacketiscarrying20bytesof data (40
20).
Example19.4
AnIPv4packethasarrivedwiththefirstfewhexadecimaldigitsasshown.
(45000028000100000102)16
CHAPTER19NETWORK-LAYERPROTOCOLS 567
Howmanyhopscanthispackettravelbeforebeingdropped?Thedatabelongtowhatupper-layer
protocol?
Solution
To find the time-to-live field, we skip 8 bytes (16 hexadecimal digits). The time-to-live field is
theninthbyte,whichis(01)16.Thismeansthepacketcantravelonlyonehop.Theprotocolfield is the
next byte (02)16, which means that the upper-layer protocol is IGMP.
Example19.5
Figure19.4showsanexampleofachecksumcalculationforanIPv4headerwithoutoptions.The header
is divided into 16-bit sections. All the sections are added and the sum is complemented after
wrapping the leftmost digit. The result is inserted in the checksum field.
Figure19.4ExampleofchecksumcalculationinIPv4
4 5 0 28
49.153 0 0
4 17 0
10.12.14.5
12.6.7.9
4, 5, and 0 4 5 0 0
28 0 0 1C
1 C0 0 1
0 and 0 0 0 0 0
4 and 17 0 4 1 1
0 0 0 0 0
10.12 Replaces 0
0A0C
14.5 0E0 5
12.6 0C0 6
7.9 0 7 0 9
Sum 134 4E
Wrapped sum 34 4 F
Checksum CBB0
Notethatthecalculationofwrappedsumandchecksumcanalsobedoneasfollowsin hexadecimal:
Wrapped Sum = Sum mod FFFF
Checksum=FFFFWrappedSum
Fragmentation
Adatagramcantravelthroughdifferentnetworks.EachrouterdecapsulatestheIPdata-
gramfromtheframeitreceives,processesit,andthenencapsulatesitinanotherframe. The
format and size of the received frame depend on the protocol used by the physical
network through which the frame has just traveled. The format and size of the sent
framedependontheprotocolusedbythephysicalnetworkthrough which theframeis
goingtotravel.Forexample,ifarouterconnectsaLANtoaWAN,itreceivesaframe in the
LAN format and sends a frame in the WAN format.
568 PARTIVNETWORKLAYER
MaximumTransferUnit(MTU)
Eachlink-layerprotocolhasitsownframeformat.Oneofthefeaturesofeachformatis the
maximum size of the payload that can be encapsulated. In other words, when a
datagram is encapsulated in a frame, the total size of the datagram must be less than
this maximum size, which is defined by the restrictions imposed by the hardware and
software used in the network (see Figure 19.5).
Figure19.5Maximumtransferunit(MTU)
The value of the MTU differs from one physical network protocol to another. For
example, the value for a LAN is normally 1500 bytes, but for a WAN it can be larger
or smaller.
In order to make the IP protocol independent of the physical network, the design-
ers decided to make the maximum length of the IP datagram equal to 65,535 bytes.
Thismakestransmission moreefficient ifone daywe usealink-layerprotocolwith an
MTU of this size. However, for other physical networks, we must divide the datagram
to make it possible for it to pass through these networks. This is called fragmentation.
When a datagram is fragmented, each fragment has its own header with most of
thefieldsrepeated,butsomehavebeenchanged.Afragmenteddatagrammayitselfbe
fragmented if it encounters a network with an even smaller MTU. In other words, a
datagram may be fragmented several times before it reaches the final destination.
A datagram can be fragmented by the source host or any router in the path. The
reassembly of the datagram, however, is done only by the destination host, because
each fragment becomes an independent datagram. Whereas the fragmented datagram
cantravelthroughdifferentroutes,andwecannevercontrolorguaranteewhichroutea
fragmented datagram may take, all of the fragments belonging to the same datagram
should finally arrive at the destination host. So it is logical to do the reassembly at the
finaldestination.Anevenstrongerobjectionforreassemblingpacketsduringthetrans-
mission is the loss of efficiency it incurs.
Whenwetalkaboutfragmentation,wemeanthatthepayloadoftheIPdatagramis
fragmented. However, most parts of the header, with the exception of some options,
must be copied by all fragments. The host or router that fragments a datagram must
changethe valuesofthreefields: flags,fragmentation offset, andtotal length.The rest
CHAPTER19NETWORK-LAYERPROTOCOLS 569
ofthefieldsmustbecopied.Ofcourse,thevalueofthechecksummustberecalculated
regardless of fragmentation.
FieldsRelatedtoFragmentation
WementionedbeforethatthreefieldsinanIPdatagramarerelatedtofragmentation:
identification,flags,andfragmentationoffset.Letusexplainthesefieldsnow.
The 16-bit identification field identifies a datagram originating from the source
host. The combination of the identification and source IP address must uniquelydefine
a datagram as it leaves the source host. To guarantee uniqueness, the IP proto- col uses
a counter to label the datagrams. The counter is initialized to a positive num- ber.
When the IP protocol sends a datagram, it copies the current value of the counter to the
identification field and increments the counter by one. As long as the counter is kept in
the main memory, uniqueness is guaranteed. When a datagram is fragmented, the
value in the identification field is copied into all fragments. In other words, all
fragments have the same identification number, which is also the same as the original
datagram. The identification number helps the destination in reassembling the data-
gram. It knows that all fragments having the same identification value should be
assembled into one datagram.
The3-bitflagsfielddefinesthreeflags.Theleftmostbitisreserved(notused).The second
bit (D bit) is called the do not fragment bit. If its value is 1, the machine must not
fragment the datagram. If it cannot pass the datagram through any available physi- cal
network, it discards the datagram and sends an ICMP error message to the source
host(discussedlater).Ifitsvalueis0,thedatagramcanbefragmentedifnecessary.The third
bit (M bit) is called the more fragment bit. If its value is 1, it means the datagram is not
the last fragment; there are more fragments after this one. If its value is 0, it means this
is the last or only fragment.
The 13-bit fragmentation offset field shows the relative position of this fragment
with respect to the whole datagram. It is the offset of the data in the original datagram
measuredinunitsof8bytes.Figure19.6showsadatagramwithadatasizeof4000bytes
fragmented into three fragments. The bytes in the original datagram are numbered 0 to
3999.Thefirstfragmentcarriesbytes0to1399.Theoffsetforthisdatagramis0/80. The
second fragment carries bytes 1400 to 2799; the offset value for this fragment is
1400/8 175. Finally, the third fragment carries bytes 2800 to 3999. The offset value
for this fragment is 2800/8 350.
Figure19.6Fragmentationexample
Rememberthatthevalueoftheoffsetismeasuredinunitsof8bytes.Thisis done
becausethelengthoftheoffsetfieldisonly13bitslongandcannotrepresentasequence
ofbytesgreaterthan8191.Thisforceshostsorroutersthatfragmentdatagramstochoose
thesizeofeachfragmentsothatthefirstbytenumberisdivisibleby8.
Figure 19.7 shows an expanded view of the fragments in the previous figure. The
original packet starts at the client; the fragments are reassembled at the server. The
value of the identification field is the same in all fragments, as is the value of the flags
fieldwiththemorebitsetforallfragmentsexceptthelast.Also,thevalueoftheoffset fieldfor
each fragment isshown.Note that although thefragmentsarrivedout of order at the
destination, they can be correctly reassembled.
Figure19.7Detailedfragmentationexample
F2.2
F2.1 F2.1
F2.2
1420
14,567 000
1offset
Bytes0000–1399
820
F1 14,567 1 175 offset
4020
14,567 000
0offset 1420
14,567 1offset
175 Bytes1400–2199
F2.1
Bytes2800–3999
F3
Thefigurealsoshowswhathappens if afragmentitselfisfragmented.Inthiscase
thevalue of the offsetfield is alwaysrelative to the original datagram. For
example,inthefigure,thesecondfragmentisitselffragmentedlaterintotwofragmentsof
572 PARTIVNETWORKLAYER
800 bytes and 600 bytes, but the offset shows the relative position of the fragments to
the original data.
It is obvious that even if each fragment follows a different path and arrives out of
order, the final destination host can reassemble the original datagram from the frag-
ments received (if none of them is lost) using the following strategy:
a. Thefirstfragmenthasanoffsetfieldvalueofzero.
b. Divide the length of the first fragment by 8. The second fragment has an offset
value equal to that result.
c. Dividethetotallengthofthefirstandsecondfragmentby8.Thethirdfragment has an
offset value equal to that result.
d. Continuetheprocess.ThelastfragmenthasitsMbitsetto0.
e. Continuetheprocess.Thelastfragmenthasamorebitvalueof0.
Example19.6
A packet has arrived with an M bit value of 0. Is this the first fragment, the last fragment, or a
middle fragment? Do we know if the packet was fragmented?
Solution
IftheMbitis0,itmeansthattherearenomorefragments;thefragmentisthelastone.However, we cannot
say if the original packet was fragmented or not. A nonfragmented packet is consid- ered the last
fragment.
Example19.7
A packet has arrived with an M bit value of 1. Is this the first fragment, the last fragment, or a
middle fragment? Do we know if the packet was fragmented?
Solution
If the M bit is 1, it means that there is at least one more fragment. This fragment can be the first
one or a middleone,butnotthe lastone.Wedon’tknowifitisthe first oneor amiddleone;we need
more information (the value of the fragmentation offset).
Example19.8
A packet has arrived with an M bit value of 1 and a fragmentation offset value of 0. Is this the
first fragment, the last fragment, or a middle fragment?
Solution
BecausetheMbitis1,itiseitherthefirstfragmentoramiddleone.Becausetheoffsetvalueis0, it is the
first fragment.
Example19.9
Apackethasarrivedinwhichtheoffsetvalueis100.Whatisthenumberofthefirstbyte?Dowe know the
number of the last byte?
Solution
To find the number of the first byte, we multiply the offset value by 8. This means that the first
bytenumberis800.We cannotdetermine the numberof the last byteunlessweknowthelength of the
data.
CHAPTER19NETWORK-LAYERPROTOCOLS 573
Example19.10
Apackethasarrivedinwhichtheoffsetvalueis100,thevalueofHLENis5,andthevalueofthe total
length field is 100. What are the numbers of the first byte and the last byte?
Solution
The first byte number is 100 8 800. The total length is 100 bytes, and the header length
is20bytes(5 4), which meansthatthere are80 bytes inthisdatagram. Ifthefirstbyte number is 800,
the last byte number must be 879.
Options
The header of the IPv4 datagram is made of two parts: a fixed part and a variable part.
The fixed part is 20 bytes long and was discussed in the previous section. The variable
partcomprisestheoptionsthatcanbeamaximumof40bytes(inmultiplesof4-bytes) to
preserve the boundary of the header.
Options,asthenameimplies,arenotrequiredforadatagram.Theycanbeusedfor
network testing and debugging. Although options are not a required part of the IPv4
header, option processing is required of the IPv4 software. This means that all imple-
mentationsmustbeabletohandleoptionsiftheyarepresentintheheader.Optionsare divided
into two broad categories: single-byte options and multiple-byte options. We give a
brief description of options here; for a complete description, see the book website
under Extra Materials.
Single-ByteOptions
Therearetwosingle-byteoptions.
NoOperation
Ano-operationoptionisa1-byteoptionusedasafillerbetweenoptions.
Endof Option
An end-of-option option is a 1-byte option used for padding at the end of the option
field. It, however, can only be used as the last option.
Multliple-ByteOptions
Therearefourmultiple-byteoptions.
RecordRoute
Arecordrouteoption is usedtorecordtheInternetroutersthathandlethedatagram.It can list
up to nine router addresses. It can be used for debugging and management purposes.
StrictSource Route
A strict source route option is used by the source to predetermine a route for the data-
gram as it travels through the Internet. Dictation of a route by the source can be useful
forseveralpurposes.Thesendercanchoosearoutewithaspecifictypeofservice,such
asminimumdelayormaximumthroughput.Alternatively,itmaychoosearoutethat is
574 PARTIVNETWORKLAYER
saferormorereliableforthesender’spurpose.Forexample,asendercanchoosearoute so that
its datagram does not travel through a competitor’s network.
If a datagram specifies a strict source route, all the routers defined in the option
must be visited by the datagram. A router must not be visited if its IPv4 address is not
listedinthedatagram.Ifthedatagramvisitsarouterthatisnotonthelist,thedatagram is
discarded and an error message is issued. If the datagram arrives at the destination and
someof the entrieswere not visited, it will also be discardedand anerror message
issued.
LooseSourceRoute
Aloosesourcerouteoptionissimilartothestrictsourceroute,butitislessrigid.Each router in
the list must be visited, but the datagram can visit other routers as well.
Timestamp
A timestamp option is used to record the time of datagram processing by a router. The
time is expressed in milliseconds from midnight, Universal time or Greenwich mean
time.Knowingthetimeadatagramisprocessedcan helpusersandmanagerstrack the
behavioroftheroutersintheInternet.Wecanestimatethetimeittakesforadatagram
togofromoneroutertoanother.Wesayestimatebecause,althoughallroutersmayuse
Universal time, their local clocks may not be synchronized.
SecurityofIPv4Datagrams
The IPv4 protocol, as well as the whole Internet, was started when the Internet users
trustedeachother.NosecuritywasprovidedfortheIPv4protocol.Today,however,the
situation is different; the Internet is not secure anymore.Although we will discuss net-
work security in general and IP security in particular in Chapters 31 and 32, here we
give a brief idea about the security issues in IP protocol and the solutions. There are
three security issues that are particularly applicable to the IP protocol: packet sniffing,
packet modification, and IP spoofing.
PacketSniffing
An intruder may intercept an IP packet and make a copy of it. Packet sniffing is a pas-
sive attack, in which the attacker does not change the contents of the packet. This type
ofattackisverydifficulttodetectbecausethesenderandthereceivermayneverknow that the
packet has been copied. Although packet sniffing cannot be stopped, encryp-
tionofthepacketcanmaketheattacker’seffortuseless.Theattackermaystillsniffthe packet,
but the content is not detectable.
PacketModification
The second type of attack is to modify the packet. The attacker intercepts the packet,
changes its contents, and sends the new packet to the receiver. The receiver believes
that the packet is coming from the original sender. This type of attack can be detected
using a data integrity mechanism. The receiver, before opening and using the contents
of the message, can use this mechanism to make sure that the packet has not been
changed during the transmission. We discuss packet integrity in Chapter 32.
CHAPTER19NETWORK-LAYERPROTOCOLS 575
IP Spoofing
AnattackercanmasqueradeassomebodyelseandcreateanIPpacketthatcarries the source
address of another computer. An attacker can send an IP packet to a bank pretending
that it is coming from one of the customers. This type of attack can be prevented using
an origin authentication mechanism (see Chapter 32).
IPSec
The IP packets today can be protected from the previously mentioned attacks using a
protocol called IPSec (IP Security). This protocol, which is used in conjunction with
the IP protocol, creates a connection-oriented service between two entities in which
theycanexchangeIPpacketswithoutworryingaboutthethreeattacksdiscussedabove. We
will discuss IPSec in detail in Chapter 32; here it is enough to mention that IPSec
provides the following four services:
❑ DefiningAlgorithmsandKeys.Thetwoentitiesthatwanttocreateasecurechan- nel
between themselves can agree on some available algorithms and keys to be used
for security purposes.
❑ Packet Encryption. The packets exchanged between two parties can be encrypted
forprivacyusingoneoftheencryptionalgorithmsandasharedkeyagreeduponin the
first step. This makes the packet sniffing attack useless.
❑ DataIntegrity.Dataintegrityguaranteesthatthepacketisnotmodifiedduringthe
transmission. If the received packet does not pass the data integrity test, it is dis-
carded. This prevents the second attack, packet modification, described above.
❑ Origin Authentication. IPSec can authenticate the origin of the packet to be sure
that the packet is not created by an imposter. This can prevent IP spoofing attacks
as described above.
ICMPv4
TheIPv4hasnoerror-reportingorerror-correctingmechanism.Whathappensifsome- thing
goes wrong? Whathappens if a router mustdiscard a datagram because it cannot find a
route to the final destination, or because the time-to-live field has a zero value? What
happens if the final destination host must discard the received fragments of a datagram
because it has not received all fragments within a predetermined time limit?
TheseareexamplesofsituationswhereanerrorhasoccurredandtheIPprotocolhasno built-in
mechanism to notify the original host.
The IP protocol also lacks a mechanism for host and management queries. A host
sometimesneedstodetermineifarouteroranotherhostisalive.Andsometimesanet- work
manager needs information from another host or router.
The Internet Control Message Protocol version 4 (ICMPv4) hasbeen designed
to compensate for the above two deficiencies. It is a companion to the IP protocol.
ICMPitself is a network-layer protocol. However, its messages are not passed directly
tothedata-linklayeraswouldbeexpected.Instead,themessagesarefirstencapsulated
insideIPdatagramsbeforegoingtothelowerlayer.WhenanIPdatagramencapsulates
576 PARTIVNETWORKLAYER
C H A P T E R 20
UnicastRouting
I naninternet,thegoalofthenetworklayeristodeliveradatagramfromitssourceto
itsdestinationordestinations.Ifadatagramisdestinedforonlyonedestination (one-to-
onedelivery),wehaveunicastrouting.Ifthedatagramisdestinedforseveral
destinations(one-to-manydelivery),wehavemulticastrouting.
Inthepreviouschapters,wehaveshownthattheroutingcanbepossibleifarouter
hasaforwardingtabletoforwardapackettotheappropriatenextnodeonitswaytothe
finaldestinationordestinations.Tomaketheforwardingtablesoftherouter,theInter-
netneedsroutingprotocolsthatwillbeactiveallthetimeinthebackgroundandupdate the
forwarding tables.
Inthischapterwediscussonlyunicastrouting;multicastroutingwillbediscussed in the
next chapter. This chapter is divided into three sections:
❑ The first section introduces the concept of unicast routing and describes the gen-
eral ideas behind it. The section then describes least-cost routing and least-cost
trees.
❑ ThesecondsectiondiscussescommonroutingalgorithmsusedintheInternet.The
section first describes distance-vector routing. It then describes link-state routing.
Finally, it explains path-vector routing.
❑ The third section explores unicast-routing protocols corresponding to the unicast-
routing algorithms discussed in the second section. This section first defines the
structure of the Internet as seen by the unicast-routing protocols. It then describes
RIP,aprotocolthatimplementsthedistance-vectorroutingalgorithm.Thesection next
describes OSPF, a protocol that implements the link-state routing algorithm.
Finally, the section describes the BGP, a protocol that implements the path-vector
routing algorithm.
595
596 PARTIVNETWORKLAYER
INTRODUCTION
Unicast routing in the Internet, with a large number of routers and a huge number of
hosts,canbedoneonlybyusinghierarchicalrouting:routinginseveralstepsusingdif-
ferentroutingalgorithms.Inthissection,wefirstdiscussthegeneralconceptofunicast routing
in an internet: an internetwork made of networks connected by routers. After the
routing concepts and algorithms are understood, we show how we can apply them to
the Internet using hierarchical routing.
General Idea
In unicast routing, a packet is routed, hop by hop, from its source to its destination by
the help of forwarding tables. The source host needs no forwarding table because it
delivers its packet to the default router in its local network. The destination host needs
no forwarding table either because it receives the packet from its default router in its
local network. This means that only the routers that glue together the networks in the
internet need forwarding tables. With the above explanation, routing a packet from its
source to its destination means routing the packet from a source router (the default
routerofthesourcehost)toadestinationrouter(therouterconnectedtothedestination
network). Although a packet needs to visit the source and the destination routers, the
question is what other routers the packet should visit. In other words, there are several
routes that a packet can travel from the source to the destination; what must be deter-
mined is which route the packet should take.
AnInternetasaGraph
Tofindthebestroute,aninternetcanbemodeledasagraph.Agraphincomputersci- ence isa
set of nodes and edges (lines) that connect the nodes.To model an internet as
agraph,wecanthinkofeachrouterasanodeandeachnetworkbetweenapairofrout-
ersasanedge.Aninternetis,infact,modeledasaweightedgraph,inwhicheachedge is
associated with a cost. If a weighted graph is used to represent a geographical area,
thenodescanbecitiesandtheedgescanberoadsconnectingthecities;theweights,in this case,
are distances between cities. In routing, however, the cost of an edge has a different
interpretation in different routing protocols, which we discuss in a later sec-
tion.Forthemoment,weassumethatthereisacostassociatedwitheachedge.Ifthere is no
edge between the nodes, the cost is infinity. Figure 20.1 shows how an internet can be
modeled as a graph.
Least-Cost Routing
When an internet is modeled asa weighted graph, one of the ways to interpret the best
route from the source router to the destination router is to find the least cost between
the two. In other words, the source router chooses a route to the destination router in
suchawaythatthetotalcostfortherouteistheleastcostamongallpossibleroutes.In Figure
20.1, the best route between A and E is A-B-E, with the cost of 6. This means
thateachrouterneedstofindtheleast-costroutebetweenitselfandalltheotherrouters to be
able to route a packet using this criteria.
CHAPTER20UNICASTROUTING 597
Figure20.1Aninternetanditsgraphicalrepresentation
Router LANEdge
Legend Node WAN2, 3, ...Costs
2 5
A B C
3
3 4 4 G
1
D E F
5 2
a.Aninternet b.Theweighted graph
Least-CostTrees
IfthereareNroutersinaninternet,thereare(N1)least-costpathsfromeachrouterto
anyotherrouter.ThismeansweneedN(N1)least-costpathsforthewholeinternet.If
wehaveonly10routersinaninternet,weneed90least-costpaths.Abetterwaytoseeall
ofthesepathsistocombinetheminaleast-costtree.Aleast-costtreeisatreewiththe
sourcerouterastherootthatspansthewholegraph(visitsallothernodes)andinwhich
thepathbetweentherootandanyothernodeistheshortest.Inthisway,wecanhaveonly
oneshortest-pathtreeforeachnode;wehaveNleast-costtreesforthewholeinternet.We show
howtocreatealeast-costtree foreachnodelaterinthissection;for themoment,
Figure20.2showsthesevenleast-costtreesfortheinternetinFigure20.1.
Figure20.2Least-costtreesfornodesintheinternetofFigure20.1
0 2 7 2 0 5 7 5 0
A B C A B C A B C
G 9 G 7 G3
D E F D E F D E F
3 6 8 5 4 6 10 6 4
3 5 10 6 4 6 8 6 4
A B C A B C A B C
G 8 G 3 G1
D E F D E F D E F
0 5 7 5 0 2 7 2 0
9 7 3 Legend
A B C
Root of the treeIntermediateorendnode
G0 1,2,...Totalcostfromthe root
D E F
8 3 1
598 PARTIVNETWORKLAYER
The least-cost trees for a weighted graph can have several properties if they are
created using consistent criteria.
1. The least-cost route from X to Y in X’s tree is the inverse of the least-cost route
from Y to X in Y’s tree; the cost in both directions is the same. For example, in
Figure 20.2, the route from A to F in A’s tree is (A B E F), but the route
fromFtoAinF’streeis(FEBA),whichistheinverseofthefirstroute. The cost is 8
in each case.
2. Instead of travelling from X to Z using X’s tree, we can travel from X to Y using
X’s tree and continue from Y to Z using Y’s tree. For example, in Figure 20.2, we
cangofromAtoGinA’streeusingtheroute(ABEFG).Wecanalso go from A
to E in A’s tree (A B E) and then continue in E’s tree using the route (E F
G). The combination of the two routes in the second case is the same route asin
the firstcase.Thecost inthe firstcase is9; the cost inthesecond case is also 9 (6 3).
ROUTINGALGORITHMS
After discussing the general idea behind least-cost trees and the forwarding tables that
canbemadefromthem,nowweconcentrateontheroutingalgorithms.Severalrouting
algorithms have been designed in the past. The differences between these methods are
in the way they interpret the least cost and the way they create the least-cost tree for
each node. In this section, we discuss the common algorithms; later we show how a
routing protocol in the Internet implements one of these algorithms.
Distance-VectorRouting
The distance-vector (DV) routing uses the goal we discussed in the introduction, to
find the best route. In distance-vector routing, the first thing each node creates is its
own least-cost tree with the rudimentary information it has about its immediate neigh-
bors. The incomplete trees are exchanged between immediate neighbors to make the
trees more and more complete and to represent the whole internet. We can say that in
distance-vector routing, a router continuously tells all of its neighbors what it knows
about the whole internet (although the knowledge can be incomplete).
Before we show how incomplete least-cost trees can be combined to make com-
pleteones,weneedtodiscusstwoimportanttopics:theBellman-Fordequationandthe
concept of distance vectors, which we cover next.
Bellman-FordEquation
Theheartofdistance-vectorroutingisthefamousBellman-Fordequation.Thisequation
isusedtofindtheleastcost(shortestdistance)betweenasourcenode,x,andadestina-
tionnode,y,throughsomeintermediarynodes(a, b, c,)whenthecostsbetweenthe
sourceandtheintermediarynodesandtheleastcostsbetweentheintermediarynodesand
thedestinationaregiven.ThefollowingshowsthegeneralcaseinwhichD ijistheshort- est
distance and cij is the cost between nodes i and j.
Dxymin{(cxaDay),(cxbDby),(cxcDcy),}
CHAPTER20UNICASTROUTING 599
Indistance-vectorrouting,normallywewanttoupdateanexistingleastcostwitha least
cost through an intermediary node, such as z, if the latter is shorter. In this case, the
equation becomes simpler, as shown below:
Dxymin{Dxy,(cxzDzy)}
Figure20.3showstheideagraphicallyforbothcases.
Figure20.3GraphicalideabehindBellman-Fordequation
a z
cxb Dby
x b y x y
Dxy
c
a. General case with three intermediate nodes b.Updatingapathwithanew route
WecansaythattheBellman-Fordequationenablesustobuildanewleast-costpath
frompreviously establishedleast-costpaths.InFigure20.3,wecan think of(ay),
(by),and(cy)as previously established least-cost paths and (xy) as the new least-
costpath.Wecaneventhinkofthisequationasthebuilderofanewleast-costtree from
previously established least-cost trees if we use the equation repeatedly. In other
words, the use of this equation in distance-vector routing is a witness that this method
also uses least-cost trees, but this use may be in the background.
We will shortly show how we use the Bellman-Ford equation and the concept of
distance vectors to build least-cost paths for each node in distance-vector routing, but
first we need to discuss the concept of a distance vector.
DistanceVectors
Theconcept ofadistance vectoristherationaleforthe namedistance-vectorrouting.
Aleast-costtreeisacombinationofleast-costpathsfromtherootofthetreetoalldes- tinations.
These paths are graphically glued together to form the tree. Distance-vector routing
unglues these paths and creates a distance vector, a one-dimensional array to represent
the tree. Figure 20.4 shows the tree for node A in the internet in Figure 20.1 and the
corresponding distance vector.
Notethatthenameofthedistancevectordefinestheroot,theindexesdefinethedes-
tinations,andthevalueofeachcelldefinestheleastcostfromtheroottothedestination.
Adistancevectordoesnotgivethepathtothedestinationsastheleast-costtreedoes;it
givesonlytheleastcoststothedestinations.Laterweshowhowwecanchangeadistance
vectortoaforwardingtable,butwefirstneedtofindalldistancevectorsforaninternet.
We know that a distance vector can represent least-cost paths in a least-cost tree,
but the question is how each node in an internet originally creates the corresponding
vector.Eachnodeinaninternet,whenitisbooted,createsaveryrudimentarydistance vector
with the minimum information the node can obtain from its neighborhood. The node
sends some greeting messages out of its interfaces and discovers the identity of
theimmediateneighborsandthedistancebetweenitselfandeachneighbor.Itthen
600 PARTIVNETWORKLAYER
Figure20.4Thedistancevectorcorrespondingtoatree
A
A 0
B 2
0 2 7
C 7
A B C 3
D
G 9 6
E
F 8
D E F
G 9
3 6 8
a.Treefornode A b.Distancevectorfornode A
makesasimpledistancevectorbyinsertingthediscovereddistancesinthecorrespond-
ingcellsandleavesthevalueofothercellsasinfinity.Dothesedistancevectorsrepre- sent
least-cost paths? They do, considering the limited information a node has. When we
know only one distance between two nodes, it is the least cost. Figure 20.5 shows
alldistancevectorsforourinternet.However,weneedtomentionthatthesevectorsare made
asynchronously, when the corresponding node has been booted; the existence of all of
them in a figure does not mean synchronous creation of them.
Figure20.5Thefirstdistancevectorforaninternet
A 0 A 2 A
8
B 2 B 0 B 5
C C 5 C 0
D 3 D D
8
88
E E 4 E
8 888
F F F 4
88
3
G G G
2 5 A
A B C 3 B
C 3
8888
3 4 4 GD
1 E
D E F F 1
5 2 G 0
A3 A
8
B B 4 ABCD
88
C C EFG
88
D0 D 5 4
E 5 E 0
8
F F 2 2
0
88
G G 1
Theserudimentaryvectorscannothelptheinternettoeffectivelyforwardapacket.
Forexample,nodeAthinksthatitisnotconnectedtonodeGbecausethecorresponding cell
shows the least cost ofinfinity. To improve these vectors, the nodes in the internet
needtohelpeachotherbyexchanginginformation.Aftereachnodehascreateditsvec- tor,
itsendsacopy ofthevector toall itsimmediateneighbors.Afteranodereceivesa distance
vector from a neighbor, it updates its distance vector using the Bellman-Ford
equation(secondcase).However,weneedtounderstandthatweneedtoupdate,not
CHAPTER20UNICASTROUTING 601
onlyoneleastcost,butNoftheminwhichNisthenumberofthenodesintheinternet. If we are
using a program, we can do this using a loop; if we are showing the concept on paper,
we can show the whole vector instead of the N separate equations. We show the whole
vector instead of seven equations for each update in Figure 20.6. The figure
showstwoasynchronousevents,happeningoneafteranotherwithsometimein
Figure20.6Updatingdistancevectors
8
0 B[]= min (B[ ],20+A[ ]) 2 0 B[ ]= min (B[ ],
0 4 + E[ ]) 4
5 5 5 5
8
8
5 3 5 5 5
8
4 4 4 4 0
8 8 8
8 8 6 2
8 8
8 8
8
8
a.Firstevent:BreceivesacopyofA’svector. b.Secondevent:BreceivesacopyofE’svector.
Note:
X[]: thewhole vector
between. In the first event, node A has sent its vector to node B. Node B updates its
vectorusingthecostcBA 2.Inthesecondevent,nodeEhassentitsvectortonodeB. Node B
updates its vector using the cost cEA 4.
Afterthefirstevent,nodeBhasoneimprovementinitsvector:itsleastcostto
nodeDhaschangedfrominfinityto5(vianodeA).Afterthesecondevent,nodeBhas one more
improvement in its vector; its least cost to node F has changed from infinity to 6 (via
node E). We hope that we have convinced the reader that exchanging vectors
eventually stabilizes the system and allows all nodes to find the ultimate least cost
between themselves and any other node. We need to remember that after updating a
node, it immediately sends its updated vector to all neighbors. Even if its neighbors
have received the previous vector, the updated one may help more.
Distance-Vector RoutingAlgorithm
Nowwecangiveasimplifiedpseudocodeforthedistance-vectorroutingalgorithm,as
showninTable20.1.Thealgorithmisrunbyitsnodeindependentlyandasynchronously.
Table20.1Distance-VectorRoutingAlgorithmforaNode
Distance_Vector_Routing()
1
{ 2
//Initialize(createinitialvectorsforthenode)
3
4
D[myself]0
602 PARTIVNETWORKLAYER
Table20.1Distance-VectorRoutingAlgorithmforaNode(continued)
5 for(y 1toN)
6 {
7 if(yisaneighbor)
8 D[y] c[myself][y]
9 else
10 D[y]
11 }
12 sendvector{D[1],D[2],,D[N]}toallneighbors
13 //Update(improvethevectorwiththevectorreceivedfromaneighbor) repeat
14 (forever)
15 {
16 wait(foravectorDwfromaneighborworanychangeinthe link)
17 for(y1 to N)
18 {
19 D[y] min[D[y],(c[myself][w] Dw[y])] //Bellman-Fordequation
20 }
21 if(anychangeinthe vector)
22 sendvector{D[1],D[2],,D[N]}toall neighbors
23 }
24 }//EndofDistanceVector
Lines4to11initializethevectorforthenode.Lines14to23showhowthevector can be
updated after receiving a vector from the immediate neighbor. The for loop in lines17
to 20allowsallentries(cells)inthevectortobeupdatedafter receivinga new vector. Note
that thenodesendsits vector inline12,afterbeinginitialized,andin line 22, after it is
updated.
CounttoInfinity
Aproblemwithdistance-vectorroutingisthatanydecreaseincost(goodnews)propa- gates
quickly, but any increase in cost (bad news) will propagate slowly. For a routing
protocoltoworkproperly,ifalinkisbroken(costbecomesinfinity),everyotherrouter
shouldbeawareofitimmediately,butindistance-vectorrouting,thistakessometime.
Theproblemisreferredtoascounttoinfinity.Itsometimestakesseveralupdatesbefore the
cost for a broken link is recorded as infinity by all routers.
Two-NodeLoop
Oneexampleofcounttoinfinityisthetwo-nodeloopproblem.Tounderstandtheprob- lem, let
us look at the scenario depicted in Figure 20.7.
The figure shows a system with three nodes. We have shown only the portions of
theforwardingtableneededforourdiscussion.Atthebeginning,bothnodesAandB
CHAPTER20UNICASTROUTING 603
Figure20.7Two-nodeinstability
X3 A X2 A
X
A B A B A B
a. Before failure b. After link failure c. AfterA is updated byB
X3A X4A X∞ X∞
X X
A B A B
d. After B is updated byA e. Finally
know how to reach node X. But suddenly, the link between A and X fails. Node A
changesitstable.IfAcansenditstabletoBimmediately,everythingisfine.However, the
system becomes unstable if B sends its forwarding table to A before receiving A’s
forwarding table. Node Areceives the updateand,assuming thatBhas found awayto
reach X, immediately updates its forwarding table. Now A sends its new update to B.
Now B thinks that something has been changed around A and updates its forwarding
table. The cost of reaching X increases gradually until it reaches infinity. At this
moment, both A and B know that X cannot be reached. However, during this time the
system is not stable. Node A thinks that the route to X is via B; node B thinks that the
routetoXisviaA.IfAreceivesapacketdestinedforX,thepacketgoestoBandthen comes back
to A. Similarly, if B receives a packet destined for X, it goes to A and
comesbacktoB.PacketsbouncebetweenAandB,creatingatwo-nodeloopproblem. A few
solutions have been proposed for instability of this kind.
SplitHorizon
One solution to instability is called split horizon. In this strategy, instead of flooding
the table through each interface, each node sends only part of its table through each
interface. If, according to its table, node B thinks that the optimum route to reach X is
via A, it does not need to advertise this piece of information to A; the information has
come from A (A already knows). Taking information from node A, modifying it, and
sending it back to node A is what creates the confusion. In our scenario, node B elimi-
nates the last line of its forwarding table before it sends it to A. In this case, node A
keeps the value of infinity as the distance to X. Later, when node A sends its forward-
ing table to B, node B also corrects its forwarding table. The system becomes stable
after the first update: both node A and node B know that X is not reachable.
PoisonReverse
Usingthesplit-horizonstrategyhasonedrawback.Normally,thecorrespondingproto-
colusesatimer,andifthereisnonewsaboutaroute,thenodedeletestheroutefromits table.
When node B in the previous scenario eliminates the route to X from its adver-
tisementtoA,nodeAcannotguesswhetherthisisduetothesplit-horizonstrategy(the
sourceofinformationwasA)orbecauseBhasnotreceivedanynewsaboutXrecently.
604 PARTIVNETWORKLAYER
InthepoisonreversestrategyBcanstilladvertisethevalueforX,butifthesourceof
CHAPTER20UNICASTROUTING 605
Link-State Routing
Aroutingalgorithmthatdirectlyfollowsourdiscussionforcreatingleast-costtreesand
forwarding tables is link-state (LS) routing. This method uses the term link-state to
define the characteristic of a link (an edge) that represents a network in the internet. In
this algorithm the cost associated with an edge defines the state of the link. Links with
lower costs are preferred to links with higher costs; if the cost of a link is infinity, it
means that the link does not exist or has been broken.
Link-StateDatabase(LSDB)
Tocreatealeast-costtreewiththismethod,eachnodeneedstohaveacompletemapof
thenetwork,whichmeansitneedstoknowthestateofeachlink.Thecollectionofstates for all
links is called the link-state database (LSDB). There is only one LSDB for the
wholeinternet;eachnodeneedstohaveaduplicateofittobeabletocreatetheleast-cost
tree.Figure20.8showsanexampleofanLSDBforthegraphinFigure20.1.TheLSDB can be
represented as a two-dimensional array(matrix) in which the value of each cell defines
the cost of the corresponding link.
Figure20.8Exampleofalink-statedatabase
ABCD EFG
A 0 2 3
8
8
8 8
8 8
2 B 2 0 5 4
8 8
A B 5 C 5 0 4 3
3 C
8
D 3 0 5
8
8
8 8
8 8
3 4 4 4 5 0 2
8 8 8
G E
1 F 4 2 0 1
8 8
8 8
D 3 1 0
8
E F G
5 2
a.The weighted graph b.Linkstate database
NowthequestionishoweachnodecancreatethisLSDBthatcontainsinformation
aboutthewholeinternet.Thiscanbedonebyaprocesscalledflooding.Eachnodecan send
some greeting messages to all its immediate neighbors (those nodes to which it is
connecteddirectly)tocollecttwo piecesofinformationforeachneighboringnode: the
identity of the node and the cost of the link. The combination of these two pieces of
information is called the LS packet (LSP); the LSP is sent out of each interface, as
shown in Figure 20.9 for our internet in Figure 20.1. When a node receives an LSP
fromoneofitsinterfaces,itcomparestheLSPwiththecopyitmayalreadyhave.Ifthe newly
arrived LSP is older than the one it has (found by checking the sequence num-
ber),itdiscardstheLSP.Ifitisnewerorthefirstonereceived,thenodediscardstheold LSP (if
there is one) and keeps the received one. It then sends a copy of it out of each
606 PARTIVNETWORKLAYER
interface except the one from which the packet arrived. This guarantees that flooding
stopssomewhereinthenetwork(whereanodehasonlyoneinterface).Weneedtocon-
vinceourselvesthat,afterreceivingallnewLSPs,eachnodecreatesthecomprehensive
LSDB as shown in Figure 20.9. This LSDB is the same for each node and shows the
whole map of the internet. In other words, a node can make the whole map if it needs
to, using this LSDB.
Figure20.9LSPscreatedandsentoutbyeachnodetobuildLSDB
We can compare the link-state routing algorithm with the distance-vector routing
algorithm.Inthedistance-vectorroutingalgorithm,eachroutertellsitsneighborswhat it
knows about the whole internet; in the link-state routing algorithm, each router tells
the whole internet what it knows about its neighbors.
FormationofLeast-CostTrees
Tocreatealeast-costtreeforitself,usingthesharedLSDB,eachnodeneedstorunthe famous
Dijkstra Algorithm. This iterative algorithm uses the following steps:
1. The node chooses itself as the root of the tree, creating a tree with a single node,
and sets the total cost of each node based on the information in the LSDB.
2. The node selects one node, among all nodes not in the tree, which is closest to the
root,andaddsthistothetree.Afterthisnodeisaddedtothetree,thecostofallother
nodesnotinthetreeneedstobeupdatedbecausethepathsmayhavebeenchanged.
3. Thenoderepeatsstep2untilallnodesareaddedtothetree.
We need to convince ourselves that the above three steps finally create the least-cost
tree. Table 20.2 shows a simplified version of Dijkstra’s algorithm.
Table20.2Dijkstra’sAlgorithm
1
Dijkstra’sAlgorithm( )
2
{
3
// Initialization
4
Tree={root}
//Treeismadeonlyoftheroot
CHAPTER20UNICASTROUTING 607
Table20.2Dijkstra’sAlgorithm(continued)
5 for(y 1toN) //Nisthenumberof nodes
6 {
7 if(yistheroot)
8 D[y]0 //D[y]isshortestdistancefromroottonodey
9 elseif(yisa neighbor)
10 D[y]c[root][y] //c[x][y] is cost between nodes xand y inLSDB
11 else
12 D[y]
13 }
14 // Calculation
15 repeat
16 {
17 findanodew,withD[w]minimumamongallnodesnotintheTree Tree
18 Tree {w} // Add w to tree
19 //Updatedistancesforallneighborsofw
20 for(everynodex,whichisaneighborofwandnotinthe Tree)
21 {
22 D[x]min{D[x],(D[w]c[w][x])}
23 }
24 }until(allnodesincludedintheTree)
25 }//Endof Dijkstra
Path-VectorRouting
Both link-state and distance-vector routing are based on the least-cost goal. However,
thereareinstanceswherethisgoalisnotthepriority.Forexample,assumethatthereare
someroutersintheinternetthatasenderwantstopreventitspacketsfromgoingthrough.
Forexample,aroutermaybelongtoanorganizationthatdoesnotprovideenoughsecu-
rityoritmaybelongtoacommercialrivalofthesenderwhichmightinspectthepackets for
obtaining information. Least-cost routing does not prevent a packet from passing
throughanareawhenthatareaisintheleast-costpath.Inotherwords,theleast-costgoal, applied
by LS or DV routing, does not allow a sender to apply specific policies to the
routeapacketmaytake.Asidefromsafetyandsecurity,thereareoccasions,asdiscussed
inthenextsection,inwhichthegoalofroutingismerelyreachability:toallowthepacket
toreachitsdestinationmoreefficientlywithoutassigningcoststotheroute.
608 PARTIVNETWORKLAYER
Figure20.10Least-costtree
Initialization
Legend
0 2 Root node
8
A B Nodeinthepath
C NodenotyetinthepathPotentialpath
Path
G
8
D E F
3
8
Iteration 1 Iteration 2
0 2 7 0 2 7
A B C A B C
G G
8
8
D E F D E F
3 6 3 6
8
8
Iteration 3 Iteration 4
0 2 7 0 2 7
A B C A B C
G G10
8
D E F D E F
3 6 8 3 6 8
Iteration 5 Iteration 6
0 2 7 0 2 7
A B C A B C
G9 G9
D E F D E F
3 6 8 3 6 8
the tree determined by the source when it imposes its own policy. If there is more than
one route to a destination, the source can choose the route that meets its policy best. A
source may apply several policies at the same time. One of the common policies uses
the minimum number of nodes to be visited (something similar to least-cost). Another
common policy is to avoid some nodes as the middle node in a route.
Figure 20.11 shows a small internet with only five nodes. Each source has created
its own spanning tree that meets its policy. The policy imposed by all sources is to use
the minimum number of nodes to reach a destination. The spanning tree selected by A
and E is such that the communication does not pass through D as amiddle node. Simi-
larly, the spanning tree selected by B is such that the communication does not pass
through C as a middle node.
Figure20.11Spanningtreesinpath-vectorrouting
A B E A B E A B E
D D D
A B E A B E A B E
D D D
CreationofSpanningTrees
Path-vector routing, like distance-vector routing, is an asynchronous and distributed
routingalgorithm.Thespanningtreesaremade,graduallyandasynchronously,byeach node.
When a node is booted, it creates a path vector based on the information it can obtain
about its immediate neighbor. A node sends greeting messages to its immediate
neighbors to collect these pieces of information. Figure 20.12 shows all of these path
vectors for ourinternet in Figure20.11.Note,however, thatwe do notmean thatallof
thesetablesarecreatedsimultaneously;theyarecreatedwheneachnodeisbooted.The figure
also shows how these path vectors are sent to immediate neighbors after they have
been created (arrows).
Each node, after the creation ofthe initial path vector, sends it to all its immediate
neighbors. Each node, when it receives a path vector from a neighbor, updates its path
vector using an equation similar to the Bellman-Ford, but applying its own policy
instead of looking for the least cost. We can define this equation as
Path(x,y)best{Path(x, y),[(xPath(v, y)]}for all v’s intheinternet.
In this equation, the operator () means to add x to the beginning of the path. We
alsoneedtobecautioustoavoidaddinganodetoanemptypathbecauseanemptypath means
one that does not exist.
CHAPTER20UNICASTROUTING 609
Figure20.609Pathvectorsmadeatbooting time
A
A B C, B
B
C B,A C D C
B C, D
D B, C C E C, E
E B, D
AA A
B A,B B
C A B E C E, C
D D E, D
E E E
D A
B D,B
C D,C
D D
E D,E
The policy is defined by selecting the best of multiple paths. Path-vector routing
also imposes one more condition on thisequation: If Path (v, y) includesx, that path is
discarded to avoid a loop in the path. In other words, x does not want to visit itself
when it selects a path to y.
Figure 20.13 shows the path vector of node C after two events. In the first event,
node C receives a copy of B’s vector, which improves its vector: now it knows how to
reachnodeA.Inthesecondevent,nodeCreceivesacopyofD’svector,whichdoesnot
changeitsvector.Asamatteroffactthevectorfor nodeCafterthe firstevent isstabi- lized and
serves as its forwarding table.
Figure20.13Updatingpathvectors
Note:
X[]:vectorXY: node Y
Event1:CreceivesacopyofB’svector Event2:CreceivesacopyofD’svector
610 PARTIVNETWORKLAYER
Path-VectorAlgorithm
Based on the initialization process and the equation used in updating each forwarding
table after receiving path vectors from neighbors, we can write a simplified version of
the path vector algorithm as shown in Table 20.3.
Table20.3Path-vectoralgorithmforanode
1 Path_Vector_Routing()
2 {
3 //Initialization
4 for(y1toN)
5 {
6 if(yis myself)
7 Path[y]myself
8 elseif(yisa neighbor)
9 Path[y]myselfneighbornode
10 else
11 Path[y]empty
12 }
13
Sendvector{Path[1],Path[2],,Path[y]}toallneighbors
14
15
//Update
16 repeat(forever)
17 {
18 wait(foravectorPathwfromaneighborw)
19 for(y1toN)
20 {
21 if(Pathwincludesmyself)
22 discardthe path//Avoidanyloop
23 else
24 Path[y]best{Path[y],(myselfPathw[y])}
25 }
26 If(thereisachangeinthe vector)
27 Sendvector{Path[1],Path[2],,Path[y]}toallneighbors
28 }
}//EndofPathVector
Lines 4 to 12 show the initialization for the node. Lines 17 to 24 show how the
node updates its vector after receiving a vector from the neighbor. The update process
is repeated forever. We can see the similarities between this algorithm and the DV
algorithm.