9 - 2. TCP
[Figure: TCP header flag bits (URG, ACK, PSH, RST, SYN, FIN) and the pseudo-header fields Source IP address and Destination IP address]
[Figure: TCP three-way handshake between Client and Server: SYN carrying ISN_C; SYN + ACK carrying ISN_S and ACK = ISN_C+1; ACK = ISN_S+1]
[Figure: TCP connection termination: FIN with SeqA, ACK = SeqA+1; remaining Data and ACKs; FIN with SeqB, ACK = SeqB+1]
TCP Flow Control
TCP uses a sliding window protocol without selective or
negative acknowledgments.
Selective acknowledgments would let the protocol say it’s
missing a range of bytes. TCP can only say that it has
received “up to byte N”.
The protocol has no way to specify a negative
acknowledgment. It can only say what has been received.
The concepts are the same as those discussed earlier, with some differences.
Sliding Window Based Flow Control
The TCP sliding window works at the byte level
So we talk about how many bytes are sent and ack'ed, up to what byte can be sent, etc.
NOT how many segments are sent, etc.
Sender maintains a window of size n and start of window X
Sender can send up to n bytes starting from byte X without
receiving an acknowledgement
When the first p bytes of data are acknowledged then the window
slides forward by p bytes to X+p. Sender can now send n bytes
starting from X+p
As usual, the window size determines how much unacknowledged data the sender can have outstanding
Sender Side Window
[Figure: Sender-side view of the byte stream: bytes sent and acked | bytes sent but not acked | bytes not yet sent, with the window covering the unacked and not-yet-sent bytes and a "next to be sent" pointer. The receiver has a corresponding receive buffer with its own window of free space.]
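A minimal sketch of this sender-side bookkeeping (Python; the class and variable names are illustrative, not taken from any real TCP implementation):

```python
class SenderWindow:
    """Byte-level sliding window on the sender side (illustrative only)."""

    def __init__(self, window_size):
        self.window_size = window_size   # n: max unacknowledged bytes
        self.window_start = 0            # X: first unacknowledged byte
        self.next_to_send = 0            # next byte to transmit

    def bytes_sendable(self):
        # The sender may have at most window_size bytes outstanding,
        # counted from window_start.
        return self.window_start + self.window_size - self.next_to_send

    def send(self, nbytes):
        nbytes = min(nbytes, self.bytes_sendable())
        self.next_to_send += nbytes
        return nbytes

    def on_ack(self, ack_no):
        # ack_no = next byte expected by the receiver;
        # the window slides forward to ack_no.
        if ack_no > self.window_start:
            self.window_start = ack_no

w = SenderWindow(window_size=4000)
w.send(3000)               # 3000 bytes outstanding, 1000 still sendable
w.on_ack(2000)             # first 2000 bytes acked: window slides to byte 2000
print(w.bytes_sendable())  # 3000 bytes may now be sent
```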
Solution
The receiver tells the sender its current window size in every segment it transmits to the sender (in the Window field of the header)
These can be data segments sent from the receiver to the sender in the other direction, or acks for the data received from the sender
The sender uses this advertised window size instead of a fixed value
The window size (also called the advertised window) changes continuously; it equals the current free buffer space at the receiver
Can go to zero; sender not allowed to send anything!
Naïve implementations can cause silly window syndrome
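With an advertised window, the sender's usable window shrinks by whatever is already in flight. A small sketch (Python; function and parameter names are my own):

```python
def usable_window(advertised_window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still transmit without overrunning the
    receiver's advertised window (illustrative only)."""
    bytes_in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - bytes_in_flight)

# Receiver advertised 8000 bytes; 5000 bytes are sent but unacked.
print(usable_window(8000, last_byte_sent=15000, last_byte_acked=10000))  # 3000

# If the advertised window drops to 0, the sender must stop sending.
print(usable_window(0, last_byte_sent=15000, last_byte_acked=15000))     # 0
```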
Silly Window Syndrome
The problem of TCP sending very small segments
Small segments are bad
Too much overhead, since headers have to go with each segment
Can be caused by both sender-side and receiver-side applications
Sender-side application sending data very slowly
Receiver-side application reading data from the receive buffer very slowly
Receiver Side Silly Window Syndrome
Suppose the sender has a lot of data to send, and sends one window full of data
The application on the receiver side does not read the data yet, so the receiver's buffer is full; the receiver advertises a window size of 0 and the sender blocks (but has more data to send)
The application now reads very slowly, say 1 byte at a time
For each 1 byte read, the receiver advertises a window of size
1
Sender sends 1 more byte of data
This repeats, causing a lot of 1-byte segments to be sent
Solutions
Do not advertise small-sized windows
Acknowledge segments that arrive, but keep advertising a 0-sized window until either (i) half of the receive buffer is free, or (ii) the receive buffer has at least an MSS worth of free space
Delay the acknowledgements for segments that arrive
The sender-side window cannot move, so the sender cannot transmit more data than the current window allows
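A hedged sketch of the first solution (Python; the buffer model, MSS value, and names are assumptions made only for illustration):

```python
MSS = 1460  # assumed maximum segment size in bytes

def advertised_window(buffer_size, bytes_buffered):
    """Window the receiver advertises back to the sender.

    Small amounts of free space are hidden (advertised as 0) until at
    least MSS bytes, or half of the buffer, are free (illustrative only).
    """
    free = buffer_size - bytes_buffered
    if free >= MSS or free >= buffer_size // 2:
        return free
    return 0

print(advertised_window(buffer_size=8192, bytes_buffered=8191))  # 0, not 1
print(advertised_window(buffer_size=8192, bytes_buffered=4000))  # 4192
```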
Sender Side Silly Window Syndrome
Caused when the application generates data slowly
Say 1 byte at a time
If data is sent immediately, lots of small segments are sent
If it is to be delayed, how long to delay? There is no clue when the sender-side application will generate data again
Nagle’s Algorithm:
If there is previous data that is sent but not acknowledged, place
any further data to be sent in the send buffer but do not send
until:
Either an acknowledgement is received, or
MSS sized data is available for sending
Nagle's algorithm decides when segments are sent by a TCP sender, subject to window restrictions
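A minimal sketch of Nagle's algorithm as stated above (Python; names are my own, and real implementations also split the send buffer into MSS-sized segments, which is omitted here):

```python
MSS = 1460  # assumed maximum segment size in bytes

class NagleSender:
    def __init__(self):
        self.send_buffer = b""       # data queued but not yet sent
        self.unacked_data = False    # is there sent-but-unacked data?
        self.segments_sent = []      # segments handed to the network

    def _transmit(self, data):
        self.segments_sent.append(data)
        self.unacked_data = True

    def app_write(self, data):
        self.send_buffer += data
        # Send immediately only if nothing is outstanding, or a full
        # MSS worth of data has accumulated.
        if not self.unacked_data or len(self.send_buffer) >= MSS:
            self._flush()

    def on_ack(self):
        self.unacked_data = False
        if self.send_buffer:         # ack arrived: release buffered data
            self._flush()

    def _flush(self):
        self._transmit(self.send_buffer)
        self.send_buffer = b""

s = NagleSender()
s.app_write(b"a")        # sent right away (nothing outstanding)
s.app_write(b"b")        # buffered: previous byte not yet acked
s.app_write(b"c")        # still buffered
s.on_ack()               # ack for "a": "bc" goes out as one segment
print(s.segments_sent)   # [b'a', b'bc']
```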
When are ACKs sent?
Suppose A is sending data to B
TCP acks are cumulative: an ack acknowledges the longest contiguous byte sequence from the start that has been received
If the ISN of A = 1000, A has sent 4 segments with starting sequence numbers 1001 (why not 1000?), 1600, 2800, and 3100, and B has received the 1st, 2nd and 4th segments only, then
The longest contiguous sequence from the start received is byte numbers 1001 to 2799
TCP will ack with ack no. = 2800 (the next byte expected)
Segments 1 and 2 are received in order, segment 3 is a missing segment, and segment 4 is a segment received out of order
Note that the receiver does not know there is exactly one missing segment; it just knows there is at least one
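A tiny worked snippet (Python, illustrative only; segment lengths are inferred from the starting sequence numbers, and the length of the 4th segment is an arbitrary assumption) reproducing this cumulative-ack computation:

```python
# Data bytes start at ISN + 1 because the SYN itself consumes sequence
# number ISN (here 1000).
# Segments B has received, as (first_byte, last_byte): the segment
# starting at 2800 is missing; the 4th segment's length (300) is assumed.
received = [(1001, 1599), (1600, 2799), (3100, 3399)]

def cumulative_ack(isn, received):
    """Next byte expected: one past the longest contiguous run from ISN+1."""
    next_expected = isn + 1
    for first, last in sorted(received):
        if first > next_expected:       # gap found: stop here
            break
        next_expected = max(next_expected, last + 1)
    return next_expected

print(cumulative_ack(1000, received))   # 2800
```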
B sends an acknowledgement if/when
If B has data to send to A, it always piggybacks the ack for the data received from A (ACK flag set, number of the next byte expected put in the Acknowledgement Number field)
If B has no data to send and receives a segment from A in order (sequence number = next sequence number expected), and all previous in-order segments are acknowledged, it delays sending the acknowledgement until one more segment arrives or a timer expires (typically 500 milliseconds), then sends
If B gets an out-of-order segment with a higher than expected sequence number, it sends an ack with the next sequence number expected
If the receiver gets a missing segment which extends the longest contiguous byte stream from the start that it has received, it sends an ack with the next sequence number to expect
If a duplicate segment arrives, it discards the segment but sends an ack with the next sequence number expected
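A compact sketch of this receiver-side ACK policy (Python; piggybacking and actual data buffering are omitted, and the class and field names are my own):

```python
DELAYED_ACK_TIMEOUT = 0.5   # seconds; typical delayed-ack timer

class Receiver:
    """Sketch of when a TCP receiver sends ACKs (no piggybacking shown)."""

    def __init__(self, next_expected):
        self.next_expected = next_expected   # next in-order byte expected
        self.out_of_order = {}               # seq -> length of buffered segments
        self.ack_delayed = False             # an unacked in-order segment exists

    def on_segment(self, seq, length):
        if seq == self.next_expected:
            # In-order segment: advance past it and past any buffered
            # segments that now extend the contiguous stream.
            self.next_expected += length
            filled_gap = False
            while self.next_expected in self.out_of_order:
                self.next_expected += self.out_of_order.pop(self.next_expected)
                filled_gap = True
            if filled_gap or self.ack_delayed:
                self.ack_delayed = False
                return f"ACK {self.next_expected} now"
            self.ack_delayed = True          # delay, hoping to coalesce/piggyback
            return f"delay ACK {self.next_expected} (<= 500 ms)"
        if seq > self.next_expected:
            # Out-of-order segment: buffer it, send a duplicate ACK immediately.
            self.out_of_order[seq] = length
            return f"duplicate ACK {self.next_expected} now"
        # Old/duplicate segment: discard, but still ACK.
        return f"duplicate segment discarded, ACK {self.next_expected} now"

r = Receiver(next_expected=1001)
print(r.on_segment(1001, 599))   # delay ACK 1600 (<= 500 ms)
print(r.on_segment(1600, 1200))  # ACK 2800 now (second in-order segment)
print(r.on_segment(3100, 300))   # duplicate ACK 2800 now (gap at 2800)
print(r.on_segment(2800, 300))   # ACK 3400 now (missing segment filled the gap)
```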
Ensuring Reliable, In-order Transfer
Checksum (mostly) guarantees end-to-end data integrity
Sequence numbers detect packet sequencing problems:
Duplicate: ignore
Reordered: reorder or drop
Lost: retransmit
Lost segments detected by sender
Use timeout to detect lack of acknowledgment
Retransmission requires that the sender keep a copy of the data
The copy is discarded when the ack is received
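A sketch of this bookkeeping (Python; names are illustrative): the sender keeps a copy of each unacknowledged segment together with a deadline, drops the copies covered by a cumulative ack, and reports which copies must be retransmitted when their timers expire (per-segment timers and the choice of timeout are discussed next):

```python
import time

class RetransmissionQueue:
    """Keep a copy of every unacknowledged segment, with its own deadline."""

    def __init__(self, rto):
        self.rto = rto                 # retransmission timeout, in seconds
        self.pending = {}              # seq -> (copy of segment, deadline)

    def on_send(self, seq, segment):
        # A copy is kept until the segment is acknowledged.
        self.pending[seq] = (segment, time.monotonic() + self.rto)

    def on_ack(self, ack_no):
        # Cumulative ack: everything before ack_no has been received,
        # so those copies can be discarded.
        for seq in [s for s in self.pending if s < ack_no]:
            del self.pending[seq]

    def expired(self):
        # Segments whose timers have expired and must be retransmitted.
        now = time.monotonic()
        return [seq for seq, (_, deadline) in self.pending.items()
                if deadline <= now]
```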
Ensuring Reliability
TCP keeps a separate timer for each unacknowledged segment
It retransmits segments whose timers expire
The retransmission timeout depends on the round-trip delay
The round-trip delay varies with the path followed, network conditions, etc. So how should the timeout be set?
Solution: estimate the RTT dynamically
Estimating Round-trip Delay
[Figure: an RTT sample is measured as the time from sending a data segment to receiving the corresponding ack]
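The estimator itself is not shown in these notes; as an illustration, the standard approach (a simplified version of the exponentially weighted moving average of RTT samples specified in RFC 6298, which is an assumption about what the course uses) can be sketched as:

```python
ALPHA, BETA = 0.125, 0.25   # standard smoothing factors from RFC 6298

class RttEstimator:
    def __init__(self):
        self.srtt = None    # smoothed RTT estimate (seconds)
        self.rttvar = None  # smoothed RTT variation

    def on_sample(self, sample_rtt):
        if self.srtt is None:               # first sample
            self.srtt = sample_rtt
            self.rttvar = sample_rtt / 2
        else:
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample_rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample_rtt

    def rto(self):
        # Retransmission timeout: the estimate plus a safety margin.
        return self.srtt + 4 * self.rttvar

est = RttEstimator()
for s in (0.100, 0.120, 0.080):   # three RTT samples, in seconds
    est.on_sample(s)
print(round(est.rto(), 3))        # about 0.25 s for these samples
```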
Slow start is not so slow!!
[Figure: during slow start cwnd doubles every RTT: cwnd = 2, 4, 8, ...]
The change in the congestion window is per ack
So a cumulative ack will still increase the window by 1 only
Exercise: given ss_thresh = 8, work out how many RTTs it will take for slow start to reach the threshold if there is one ack per 2 segments
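A small simulation that can be used to check the answer (Python, purely illustrative; it counts cwnd in segments and assumes that a lone final segment in a round still gets acked when the delayed-ack timer fires, which is only one possible model):

```python
import math

def rtts_to_reach(ss_thresh, segments_per_ack=2):
    """RTTs of slow start until cwnd >= ss_thresh, counting cwnd in
    segments and growing it by 1 per ack received."""
    cwnd, rtts = 1, 0
    while cwnd < ss_thresh:
        # One ack per two segments; an odd trailing segment is assumed to
        # be acked anyway when the delayed-ack timer expires.
        acks = math.ceil(cwnd / segments_per_ack)
        cwnd += acks
        rtts += 1
    return rtts

print(rtts_to_reach(8))   # 4 under this model: cwnd goes 1 -> 2 -> 3 -> 5 -> 8
```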
Congestion Avoidance
Slow start sets a “good” congestion window fast
Congestion avoidance slows down the increase in cwnd
Why do we need to slow down even if there is no timeout?
If cwnd > ss_thresh, then each time a segment is acknowledged,
increment cwnd by 1/cwnd (cwnd += 1/cwnd)
So cwnd increases by one only after a full window of segments has been acknowledged
Increases by 1 per RTT, vs. doubling per RTT in slow start
Additive Increase
[Figure: additive increase during congestion avoidance: cwnd grows by one per RTT: cwnd = 1, 2, 3, 4, 5, ...]
On Detecting Congestion
On a timeout
ss_thresh is set to half the current size of the congestion
window:
ss_thresh = cwnd / 2
cwnd is reset to one
cwnd = 1
Slow-start phase is entered
This is called multiplicative decrease
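Putting slow start, congestion avoidance, and the reaction to a timeout together, a Tahoe-style sketch might look like the following (Python; cwnd is measured in segments and the names are my own):

```python
class CongestionControl:
    """Tahoe-style congestion control sketch (cwnd in segments)."""

    def __init__(self, ss_thresh=8):
        self.cwnd = 1.0
        self.ss_thresh = ss_thresh

    def on_ack(self):
        if self.cwnd < self.ss_thresh:
            self.cwnd += 1              # slow start: +1 per ack (doubles per RTT)
        else:
            self.cwnd += 1 / self.cwnd  # congestion avoidance: +1 per RTT overall

    def on_timeout(self):
        self.ss_thresh = max(self.cwnd / 2, 2)  # multiplicative decrease
        self.cwnd = 1.0                         # re-enter slow start

cc = CongestionControl(ss_thresh=4)
for _ in range(8):
    cc.on_ack()
print(round(cc.cwnd, 2))          # past ss_thresh, now growing slowly
cc.on_timeout()
print(cc.cwnd, cc.ss_thresh)      # back to 1, threshold halved
```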
Example
[Figure: cwnd vs. time, annotated with ss_thresh = 16, a Timeout, a Congestion Avoidance phase, cwnd = 8, and ss_thresh = 4; after the timeout cwnd restarts from 1 with slow start.]
Fast Retransmit
On receiving 3 duplicate ACKs (i.e., 4 identical ACKs in total) for the same segment,
Retransmit the segment without waiting for timeout
Set ss_thresh = cwnd/2
Set cwnd = cwnd/2 + 3
Why the +3?
TCP Reno
Adds Fast Recovery
On 3 duplicate ACKs, go back to congestion avoidance
Fast Retransmit and Fast Recovery are implemented
together
On receiving 3 duplicate ACKs for the same segment,
Retransmit the segment without waiting for timeout
Set ss_thresh = cwnd /2
Set cwnd = cwnd/2 + 3
On each subsequent duplicate ACK, cwnd = cwnd + 1
On a new ACK, cwnd = ss_thresh (exit fast recovery and resume congestion avoidance)
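A Reno-style sketch extending the earlier congestion-control skeleton with fast retransmit and fast recovery (Python; hypothetical names, cwnd in segments):

```python
class RenoCongestionControl:
    """TCP Reno-style fast retransmit + fast recovery sketch (cwnd in segments)."""

    def __init__(self, ss_thresh=16):
        self.cwnd = 1.0
        self.ss_thresh = ss_thresh
        self.dup_acks = 0
        self.in_fast_recovery = False

    def on_new_ack(self):
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.in_fast_recovery = False
            self.cwnd = self.ss_thresh      # deflate; resume congestion avoidance
        elif self.cwnd < self.ss_thresh:
            self.cwnd += 1                  # slow start
        else:
            self.cwnd += 1 / self.cwnd      # congestion avoidance

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1                  # one more segment has left the network
        elif self.dup_acks == 3:            # fast retransmit threshold
            self.retransmit_missing_segment()
            self.ss_thresh = self.cwnd / 2
            # +3 because the three duplicate ACKs indicate that three
            # segments have already left the network.
            self.cwnd = self.ss_thresh + 3
            self.in_fast_recovery = True

    def on_timeout(self):
        self.ss_thresh = self.cwnd / 2      # multiplicative decrease
        self.cwnd = 1.0                     # back to slow start
        self.dup_acks = 0
        self.in_fast_recovery = False

    def retransmit_missing_segment(self):
        pass    # placeholder: resend the segment the duplicate ACKs ask for

cc = RenoCongestionControl(ss_thresh=8)
cc.cwnd = 10.0
for _ in range(3):
    cc.on_duplicate_ack()       # third dup ACK triggers fast retransmit
print(cc.cwnd, cc.ss_thresh)    # 8.0 5.0
cc.on_new_ack()                 # new ACK: cwnd = ss_thresh, leave fast recovery
print(cc.cwnd)                  # 5.0
```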
TCP Sawtooth Behavior
TCP Variants
Many TCP implementations have been developed since Reno, some to address specific problems, some suited to specific scenarios (e.g., links with a high bandwidth-delay product, i.e., long fat links)
TCP New Reno (1999)
TCP Vegas (1995)
TCP SACK (1996)
TCP Westwood (2001)
Fast TCP (2006)
HSTCP (High Speed TCP) (2003)
TCP Bic (2004)
CUBIC TCP
The default congestion control algorithm in Linux
Transport Layer Protocols:
UDP (User Datagram Protocol)
User Datagram Protocol (UDP)
A transport layer protocol like TCP, with the notion of a port to identify the application-layer service
Provides multiplexing/demultiplexing of applications
Connectionless (no connection setup/termination)
So demultiplexing is done on the basis of one endpoint (the <IP, port> pair)
Each block of data given by the user is sent independently and separately, as a UDP datagram
Datagrams can be lost, or arrive in different order from
the order sent
No flow control
So no acknowledgement, window maintenance etc.
No error control
So no acknowledgement, timeout, retransmission etc.
Unreliable service
No congestion detection or control
So UDP provides only multiplexing/demultiplexing of applications
But it is simple and fast, as there are no complex operations like flow control, error control, etc.
So there is also no need to maintain any connection state
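Because UDP needs no connection setup or state, using it from an application is just a matter of sending and receiving datagrams. A minimal sketch with Python's standard socket API (the address and port are arbitrary choices for the example):

```python
import socket

SERVER_ADDR = ("127.0.0.1", 9999)   # arbitrary example address/port

# Receiver: bind to a port and wait for datagrams (no accept(), no connection).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(SERVER_ADDR)

# Sender: each sendto() is an independent datagram; no connect() required.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", SERVER_ADDR)

data, addr = server.recvfrom(2048)  # one datagram, with the sender's <IP, port>
print(data, addr)

client.close()
server.close()
```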
UDP Message Format
 0               16               32
 +---------------+----------------+
 |  Source port  |   Dest. port   |
 +---------------+----------------+
 |    Length     |    Checksum    |
 +---------------+----------------+
 |              Data              |
 +--------------------------------+
Header Fields
Source port, destination port: Identify the UDP applications at the two ends
Length: Size of the datagram in bytes, including the header
Checksum: Checksum of the pseudo-header + UDP datagram
The pseudo-header is constructed the same way as for TCP
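A sketch of the checksum computation (Python): the 16-bit one's-complement Internet checksum taken over the IPv4 pseudo-header followed by the UDP datagram, the same scheme TCP uses; the helper names and addresses are my own:

```python
import socket
import struct

def ones_complement_sum16(data):
    """16-bit one's-complement sum of the data (padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return total

def udp_checksum(src_ip, dst_ip, udp_header_and_data):
    """Checksum over IPv4 pseudo-header + UDP datagram (checksum field = 0)."""
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                          # zero byte
        17,                         # protocol number for UDP
        len(udp_header_and_data),   # UDP length
    )
    return (~ones_complement_sum16(pseudo + udp_header_and_data)) & 0xFFFF

# UDP header (source port, dest port, length, checksum=0), then payload "hi".
header = struct.pack("!HHHH", 12345, 53, 8 + 2, 0)
print(hex(udp_checksum("10.0.0.1", "10.0.0.2", header + b"hi")))
```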
Simple, fast protocol used by many applications when
reliability is not a big issue
DNS (53) – Domain Name System
TFTP (69) – Trivial File Transfer Protocol
NTP (123) – Network Time Protocol
SNMP (161) – Simple Network Management Protocol
RIP (520) – Routing Information Protocol
DHCP (67, 68; DHCPv6 uses 546, 547) – Dynamic Host Configuration Protocol
Many other well-known applications