Which factors make TCP reliable

Transport layer

Above the Internet layer is the transport layer (host-to-host transport layer). Its two most important protocols are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). The role of TCP is to provide reliable end-to-end transport of data through a network. In contrast, UDP is a connectionless transport protocol that lets applications send encapsulated raw IP packets without establishing a connection.

Transmission Control Protocol (TCP)

The Transmission Control Protocol (TCP) is a reliable, connection-oriented byte stream protocol. Its main role is to provide reliable transport of data through the network. TCP is defined in RFC 793. Over time, errors and inconsistencies in this definition have been corrected (RFC 1122) and some extensions have been added (RFC 1323).

The properties just mentioned (reliable, connection-oriented, byte stream) are considered in more detail below.

TCP achieves reliable data transfer using a mechanism called Positive Acknowledgment with Re-Transmission (PAR). This simply means that the sending system repeats the transmission of the data until the recipient positively acknowledges receipt. The data units exchanged between the sending and receiving TCP units are called segments. A TCP segment consists of a protocol header of at least 20 bytes (see below, The TCP header) and the data to be transmitted. Each segment contains a checksum that the recipient can use to check whether the data is free of errors. If the transmission was error-free, the receiver sends an acknowledgment to the sender. Otherwise the segment is discarded and no acknowledgment is sent. If the sender has not received an acknowledgment after a certain period of time (the timeout period), it sends the segment again. For more information on time monitoring see [Sa94].
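The PAR idea described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: the function name send_with_par, the loss_rate parameter and the random loss model are hypothetical, acknowledgments are assumed never to get lost, and a real TCP sends several segments at once instead of waiting for each one.

```python
import random

def send_with_par(segments, loss_rate=0.3, max_tries=50, seed=1):
    """Simulate Positive Acknowledgment with Re-Transmission (PAR).

    Each segment is sent again and again until the (simulated) receiver
    acknowledges it. Returns the delivered segments and the total number
    of transmissions that were needed.
    """
    rng = random.Random(seed)          # seeded so that runs are repeatable
    delivered = []
    transmissions = 0
    for segment in segments:
        for _attempt in range(max_tries):
            transmissions += 1
            lost = rng.random() < loss_rate
            if not lost:
                # Receiver verified the checksum and sent an acknowledgment.
                delivered.append(segment)
                break
            # No acknowledgment before the timeout: the loop retransmits.
        else:
            raise TimeoutError("segment was never acknowledged")
    return delivered, transmissions
```

With loss_rate=0.0 every segment is sent exactly once; with a higher loss rate the number of transmissions grows while the delivered data stays the same.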

TCP is a connection-oriented protocol. Connections are established via a three-way handshake, which exchanges the control information needed to set up the logical end-to-end connection. To establish a connection, a host (host 1) sends the host it wants to connect to (host 2) a segment in which the SYN flag (see below, The TCP header, Flags) is set. With this segment, host 1 informs host 2 that it wants to open a connection. The sequence number of this segment also tells host 2 which initial sequence number host 1 will use for data transmission. Sequence numbers are necessary to ensure that the data from the sender arrives at the receiver in the correct order. Host 2 can now accept or reject the connection. If it accepts, it sends an acknowledgment segment in which both the SYN bit and the ACK bit are set. In the acknowledgment number field, host 2 confirms the sequence number of host 1 by sending that number increased by one. The sequence number of this acknowledgment segment tells host 1 the initial sequence number that host 2 will use for the data it sends. Finally, host 1 confirms receipt of the acknowledgment segment with a segment in which the ACK flag is set and the acknowledgment number field contains the sequence number of host 2 increased by one. This segment can already carry the first data for host 2. After exchanging this information, host 1 has confirmation that host 2 is ready to receive data, and the data transfer can begin. A TCP connection always consists of exactly two endpoints (point-to-point connection).


Three-way handshake (here connection establishment).
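The numbers exchanged during connection establishment can be traced with a short sketch. The function three_way_handshake and the dictionary layout are hypothetical and only model the three segments described above; no real packets are sent.

```python
SEQ_MODULUS = 2 ** 32   # sequence numbers are 32-bit values and wrap around

def three_way_handshake(isn_host1, isn_host2):
    """Return the three handshake segments as simple dictionaries.

    isn_host1 and isn_host2 are the initial sequence numbers chosen
    by host 1 and host 2.
    """
    # 1. Host 1 -> host 2: SYN carrying host 1's initial sequence number.
    syn = {"flags": {"SYN"}, "seq": isn_host1, "ack": None}
    # 2. Host 2 -> host 1: SYN+ACK, acknowledging host 1's number plus one.
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": isn_host2,
        "ack": (syn["seq"] + 1) % SEQ_MODULUS,
    }
    # 3. Host 1 -> host 2: ACK, acknowledging host 2's number plus one.
    ack = {
        "flags": {"ACK"},
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % SEQ_MODULUS,
    }
    return [syn, syn_ack, ack]
```

For example, three_way_handshake(100, 300) yields acknowledgment numbers 101 in the second segment and 301 in the third.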

To terminate the connection, the two hosts exchange a handshake of segments in which the FIN bit (see below, The TCP header, Flags) is set. Of course, establishing a connection does not always go smoothly. A number of interesting considerations can be found in [Ta96].

TCP accepts data streams from applications and divides them into segments of at most 64 KB (around 1,500 bytes is common). Each segment is sent as an IP datagram. When IP datagrams carrying TCP data arrive at a machine, they are passed to TCP and reassembled into the original byte streams. However, the IP layer does not guarantee that datagrams are delivered correctly. As already mentioned, it is therefore the task of TCP to ensure that lost data is retransmitted. It is also possible that the IP datagrams arrive intact but in the wrong order. In this case, TCP must put the data back into the correct order. For this, TCP uses a sequence number and an acknowledgment number (see The TCP header, Sequence Number, Acknowledgment Number).
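How sequence numbers allow the receiver to restore the original order can be shown with a minimal sketch. The helper names segment_stream and reassemble, the segment size of 1,460 bytes and the tuple representation are assumptions for illustration; sequence number wrap-around and partially overlapping segments are ignored.

```python
def segment_stream(data, mss=1460, isn=0):
    """Split a byte stream into (sequence number, payload) pairs."""
    return [(isn + offset, data[offset:offset + mss])
            for offset in range(0, len(data), mss)]

def reassemble(segments, isn=0):
    """Rebuild the byte stream from segments that arrive in any order.

    Exact duplicates (e.g. from retransmissions) are dropped; a missing
    segment raises an error instead of silently producing wrong data.
    """
    expected = isn
    stream = bytearray()
    for seq, payload in sorted(set(segments)):
        if seq != expected:
            raise ValueError(f"gap or overlap at sequence number {seq}")
        stream += payload
        expected += len(payload)
    return bytes(stream)
```

Because each payload is labeled with its position in the stream, sorting by sequence number is enough to undo any reordering that happened on the way.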

Port numbers

TCP is also responsible for forwarding the received data to the correct application. Applications are addressed using so-called port numbers (channel numbers). Port numbers are 16 bits long, so their values range from 0 to 65,535; theoretically, a host thus has up to 65,535 usable TCP ports (port 0 is reserved). UDP also uses port numbers for addressing. Port numbers are not unique across the transport protocols: each transport protocol has its own address space. TCP and UDP can therefore use the same port numbers, and port number 53 in TCP is not identical to port number 53 in UDP. The scope of a port number is limited to one host.

----- That doesn't quite fit yet ...
An IP address together with a port number specifies a communication endpoint, a so-called socket. The pair of sockets at the source and destination (socket1, socket2) uniquely identifies a connection. For example, if host A wants to establish a connection to a remote host B, e.g. to display the content of a website, port number 80 for the Hypertext Transfer Protocol (HTTP) is specified at the TCP layer as the destination port. Host A, which wants to use the service on port 80 of host B, chooses a dynamic port number (see below) from the range 49,152 to 65,535 as the source port, so that host B can return the requested data to it. Together with the IP addresses, the source and destination ports form the two sockets that uniquely identify the communication between host A and host B.
-----
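The socket pair can be observed directly with Python's standard socket module. The following sketch assumes that the loopback interface 127.0.0.1 is available; the function name loopback_connection_endpoints is hypothetical, and binding to port 0 simply asks the operating system for any free port.

```python
import socket

def loopback_connection_endpoints():
    """Open one TCP connection over loopback and return both views of it.

    Each view is a pair (local socket, remote socket), where a socket
    is an (IP address, port number) tuple.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a port
        server.listen(1)
        server_address = server.getsockname()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server_address)  # source port is chosen by the OS
            connection, _peer = server.accept()
            with connection:
                client_view = (client.getsockname(), client.getpeername())
                server_view = (connection.getsockname(),
                               connection.getpeername())
    return client_view, server_view
```

The two views contain the same two sockets in mirrored order: what the client sees as its local socket is the remote socket from the server's point of view.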

Until 1992, port numbers below 256 were reserved for well-known ports. Well-known ports are used for standard services such as Telnet, FTP etc. Ports between 256 and 1023 were generally used for UNIX-specific services. An example of the difference between an Internet-wide service and a UNIX-specific service is the difference between Telnet and rlogin. Both services allow you to log in to a remote host via the network. Telnet is a TCP/IP standard with port number 23 and can be implemented by almost all operating systems. In contrast, rlogin is a UNIX-specific service whose port number is 513.
The administration of port numbers has since been taken over by the Internet Assigned Numbers Authority (IANA) [http://www.iana.org]. Port numbers are now divided into three ranges: well-known ports, registered ports and dynamic ports.

On UNIX systems, port numbers are defined in the file /etc/services. Extract from this file on a Linux system:

heiko@phoenix:~> more /etc/services
#
# Network services, Internet style
#
# Note that it is presently the policy of IANA to assign a single well-known
# port number for both TCP and UDP; hence, most entries here have two entries
# even if the protocol doesn't support UDP operations.
# Updated from RFC 1340, ``Assigned Numbers'' (July 1992).  Not all ports
# are included, only the more common ones.
#
# from: @(#)services 5.8 (Berkeley) 5/9/91
# $Id: services,v 1.
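The entries that follow such a comment header have the form "name port/protocol", optionally followed by aliases and a comment. A minimal parser for this format might look as follows; the function parse_services and the sample entries in SAMPLE are illustrative assumptions and are not taken from the extract above.

```python
# Illustrative sample in /etc/services format (not from the extract above).
SAMPLE = """\
# name   port/protocol   aliases
telnet   23/tcp
domain   53/tcp   nameserver   # name-domain server
domain   53/udp   nameserver
"""

def parse_services(text):
    """Map (service name, protocol) to the port number."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        name, port_and_protocol = fields[0], fields[1]
        port, protocol = port_and_protocol.split("/")
        table[(name, protocol)] = int(port)
    return table
```

Using the protocol as part of the key reflects the fact that TCP and UDP have separate port number spaces.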

 

The TCP header

The following figure shows the structure of the TCP protocol header.


The TCP header.

The sending and receiving TCP units exchange data in the form of segments. A segment is simply the data to be transmitted, prefixed with control information. Each segment begins with a 20-byte header, which can be followed by header options. The options are followed by the data to be transferred. The segment size is limited by two factors: first, each segment, including the TCP header, must fit into an IP datagram (at most 65,535 bytes including the IP header); second, every network has a Maximum Transfer Unit (MTU) into which the segment must fit. As a rule, the MTU is a few thousand bytes and sets the upper limit of the segment size (e.g. 1,500 bytes for Ethernet). If a segment passes through a series of networks and encounters one with a smaller MTU, the router must split it into smaller pieces (fragmentation). Regardless of the size of the MTU, the TCP header and the options can be followed by a maximum of 65,535 - 20 - 20 = 65,495 data bytes (the first 20 bytes refer to the IP header, the second 20 to the TCP header; the length of the options is counted among the data bytes). TCP segments without data are permitted and are used to transmit acknowledgments and control messages.
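The size arithmetic above can be written down as a small helper. The constant and function names are hypothetical; both headers are assumed to have their minimum length of 20 bytes, without options.

```python
IP_MAX_DATAGRAM = 65_535   # largest possible IP datagram in bytes
IP_HEADER = 20             # minimum IP header length in bytes
TCP_HEADER = 20            # minimum TCP header length in bytes

def max_tcp_data(limit=IP_MAX_DATAGRAM):
    """Largest number of TCP data bytes that fits into `limit` bytes.

    `limit` is either the maximum IP datagram size or the MTU of a
    particular network.
    """
    return limit - IP_HEADER - TCP_HEADER
```

Called without arguments this gives the 65,495 bytes mentioned above; for an Ethernet MTU of 1,500 bytes it gives 1,460 bytes.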

The fields of the TCP header have the following meaning:

Source / destination port:
The fields Source Port and Destination Port address the endpoints of the connection. The size for the two fields is 16 bits (see also the section on port numbers).

Sequence Number, Acknowledgment Number:
The sequence number and the acknowledgment number are each 32-bit numbers. They indicate the position of the segment's data within the data stream exchanged over the connection. The sequence number applies in the sending direction, the acknowledgment number to receipts. When the connection is established, each of the two TCP connection partners generates an initial sequence number, which must not repeat during the lifetime of the connection. The large number range of 2^32 should make this sufficiently unlikely. These numbers are exchanged and mutually acknowledged when the connection is established. When data is transmitted, the sender increases the sequence number by the number of bytes already sent. The receiver uses the acknowledgment number to indicate up to which byte it has received the data correctly. However, the number does not indicate which byte was last received correctly, but which byte is expected next.

Offset:
The field Offset (or Header Length) specifies the length of the TCP header in 32-bit words. This corresponds to the beginning of the data in the TCP segment. The field is necessary because the header has a variable length due to the option field.

Flags:
With the six 1-bit flags in the Flags field, certain actions are activated in the TCP protocol:
URG
If the URG flag is set to 1, the Urgent Pointer field is in use.
ACK
The ACK bit is set to indicate that the acknowledgment number in the Acknowledgment Number field is valid. If the bit is set to 0, the TCP segment does not contain an acknowledgment and the Acknowledgment Number field is ignored.
PSH
If the PSH bit is set, the data in the segment is made available to the addressed application immediately upon arrival, without being buffered.
RST
The RST bit is used to reset a connection if an error occurred during transmission. This can be the case if an invalid segment was transmitted, a host has crashed or an attempt to establish a connection is to be rejected.
SYN
The SYN flag (Synchronize Sequence Numbers) is used to establish connections. Together with the Acknowledgment Number and the ACK bit, it is used to set up the connection in the form of a three-way handshake (see above).
FIN
The FIN bit is used to terminate a connection. If the bit is set, this indicates that the sender has no further data to transmit. The segment with the FIN bit set must be acknowledged.

Window:
The Window field contains the number of bytes that the recipient can accept, counted from the byte already acknowledged. TCP uses this window size for flow control. The protocol works on the principle of a sliding window of variable size. Each side of a connection may send the number of bytes specified in the window size field without waiting for an acknowledgment from the receiving side. While it is sending, acknowledgments from the other side for data already received can arrive at the same time (and these acknowledgments can in turn set new window sizes). A window size of 0 means that the bytes up to and including the acknowledgment number minus one have been received, but the receiver cannot accept any further data at the moment. Permission to send further data is granted by sending a segment with the same acknowledgment number and a nonzero window size.
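The sender's side of this flow control comes down to one calculation, sketched below. The function bytes_allowed and its parameter names are hypothetical, and mechanisms outside the scope of this section, such as congestion control, are left out.

```python
SEQ_MODULUS = 2 ** 32   # sequence numbers are 32-bit values and wrap around

def bytes_allowed(last_ack, window, next_seq):
    """Number of bytes the sender may still transmit right now.

    last_ack -- latest acknowledgment number received from the peer
    window   -- window size announced together with that acknowledgment
    next_seq -- sequence number of the next byte the sender would send
    """
    in_flight = (next_seq - last_ack) % SEQ_MODULUS   # sent, not yet acknowledged
    return max(0, window - in_flight)
```

With a window of 4,096 bytes and 2,048 bytes still unacknowledged, the sender may transmit another 2,048 bytes; with a window of 0 it has to wait for a new window announcement.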

Checksum:
The checksum covers the protocol header, the data and the pseudo header (see figure).


The pseudo header in the checksum.

The algorithm for computing the checksum is simple: all 16-bit words are added in one's complement arithmetic, and the one's complement of the sum is stored as the checksum. During the calculation, the Checksum field is set to zero, and the data field is padded with a zero byte if its length is odd. If the recipient of the segment carries out the calculation for the entire segment, including the Checksum field, the result should be 0 [Ta96]. The pseudo header contains the 32-bit IP addresses of the source and destination machines as well as the protocol number (6 for TCP) and the length of the TCP segment. Including the fields of the pseudo header in the checksum calculation helps to detect packets that IP has delivered to the wrong destination. However, the use of IP addresses at the transport level is a violation of the protocol hierarchy.
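The calculation can be sketched in a few lines using Python's standard struct and socket modules. The function names ones_complement_sum and tcp_checksum are hypothetical, and the pseudo header layout (source address, destination address, a zero byte, the protocol number and the TCP length) is assumed for IPv4.

```python
import socket
import struct

def ones_complement_sum(data):
    """Add all 16-bit words of `data` in one's complement arithmetic."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def tcp_checksum(source_ip, destination_ip, segment):
    """Checksum over pseudo header, TCP header and data.

    `segment` must have its Checksum field set to zero when the checksum
    is generated. Applied to a segment that already carries a correct
    checksum, the function returns 0.
    """
    pseudo_header = (socket.inet_aton(source_ip)
                     + socket.inet_aton(destination_ip)
                     + struct.pack("!BBH", 0, 6, len(segment)))  # 6 = TCP
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF
```

A sender would write the returned value into bytes 16 and 17 of the TCP header; a receiver runs the same function over the received segment and accepts it only if the result is 0.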

Urgent pointer:
The Urgent pointer together with the sequence number results in a pointer to a data byte. This corresponds to a byte offset to a point at which urgent data is found. With this, TCP signals that there is important data at a certain point in the data stream that should be read immediately. The field is only read if the Urgent flag (see above) is also set.

Options:
The Options field is intended to provide a way of providing functions that are not provided in the normal TCP protocol header. Three options are defined in TCP: End of Option List, No-Operation and Maximum Segment Size. The most important of these three options is the maximum segment size. With this option, a host can transmit the maximum amount of user data that it wants or can accept. While a connection is being established, each side can transmit its maximum of user data; the smaller of the two numbers is used as the maximum user data size for transmission. If this option is not supported by a host, the default of 536 bytes is used.

Padding:
The field Padding is used to ensure that the header ends on a 32-bit boundary and the data begins on a 32-bit boundary. The fill field is filled with zeros.
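The header fields described above can be unpacked from the first 20 bytes of a segment with Python's standard struct module. The function parse_tcp_header and the dictionary keys are hypothetical names chosen for this sketch; options are returned as raw bytes and not interpreted.

```python
import struct

# Flag names ordered from the least significant bit upwards.
FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")

def parse_tcp_header(segment):
    """Split a raw TCP segment into its header fields, options and data."""
    (source_port, destination_port, sequence, acknowledgment,
     offset_byte, flag_bits, window, checksum, urgent_pointer) = struct.unpack(
        "!HHIIBBHHH", segment[:20])
    header_length = (offset_byte >> 4) * 4   # Offset counts 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES)
             if flag_bits & (1 << bit)}
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "sequence": sequence,
        "acknowledgment": acknowledgment,
        "header_length": header_length,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent_pointer,
        "options": segment[20:header_length],
        "data": segment[header_length:],
    }
```

The Offset field is what separates options from data: everything between byte 20 and the header length belongs to the options, everything after it is user data.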

User Datagram Protocol (UDP)

The User Datagram Protocol (UDP) is defined in RFC 768. UDP is an unreliable, connectionless protocol. As mentioned before, unreliable in this context does not mean that the data may arrive corrupted at the target computer, but that the protocol provides no mechanisms to ensure that the data actually arrives there. If the data does arrive at the target computer, however, it is correct (provided the optional checksum is used). Compared to TCP, UDP offers the advantage of a low protocol overhead. Many applications in which only a small amount of data is transmitted (e.g. client/server applications based on a single request and response) use UDP as the transport protocol, because the effort of establishing a connection and providing reliable data transmission may be greater than that of simply retransmitting the data.

A UDP segment consists of an 8-byte header followed by the data. The header is shown in the following figure:


The UDP header.

The source and destination port numbers serve the same purpose as in the Transmission Control Protocol: they identify the endpoints on the source and destination machines. The length field contains the length of the entire datagram, including the protocol header. The checksum is the Internet checksum over the UDP data, the protocol header and the pseudo header. The checksum field is optional. If it contains 0, the sender did not compute a checksum and the recipient performs no verification.
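Because the UDP header consists of only four 16-bit fields, building and parsing it takes very little code. The function names build_udp_datagram and parse_udp_datagram are hypothetical; the checksum defaults to 0, which, as described above, means that no checksum was computed.

```python
import struct

UDP_HEADER_LENGTH = 8   # four 16-bit fields

def build_udp_datagram(source_port, destination_port, payload, checksum=0):
    """Prefix `payload` with an 8-byte UDP header."""
    length = UDP_HEADER_LENGTH + len(payload)   # header plus data
    header = struct.pack("!HHHH", source_port, destination_port,
                         length, checksum)
    return header + payload

def parse_udp_datagram(datagram):
    """Return (source port, destination port, length, checksum, payload)."""
    source_port, destination_port, length, checksum = struct.unpack(
        "!HHHH", datagram[:UDP_HEADER_LENGTH])
    return (source_port, destination_port, length, checksum,
            datagram[UDP_HEADER_LENGTH:length])
```

Compared with the 20-byte TCP header there are no sequence numbers, acknowledgment numbers, flags or window fields, which is where the low overhead of UDP comes from.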

In addition to the services of the Internet Protocol, the User Datagram Protocol only provides port numbers for addressing the communication endpoints and an optional checksum. The protocol does not contain any transport acknowledgments or other mechanisms for providing a reliable end-to-end connection. However, this makes UDP very efficient, so it is particularly suitable for applications in which transmission speed is the primary concern (e.g. distributed file systems such as NFS).