The Internet and TCP/IP
Brian Bramer
Department of Computing Sciences
DeMontfort University
Leicester UK
TCP/IP Introduction
1 The Evolution of TCP/IP (and the Internet)
2 The TCP/IP Protocol Architecture
3 The Internet Protocol Layer
3.1 IP Addresses
3.1.1 The subnet mask
3.2 DHCP (Dynamic Host Configuration Protocol)
3.3 IP Domains and Host Names
3.3.1 Domain Name Servers (DNS)
3.4 Problems with IP addressing and the IPv6 proposal
4 The Transport Layer – TCP and UDP protocols
4.1 Clients and servers and TCP and UDP ports
5 TCP/IP Applications
7 NAT (Network Address Translation Protocol)
Appendix A The IP datagram packet
Appendix B Routing IP Datagrams
B.1 IP address classes
B.2 The routing process
TCP/IP Introduction
The Internet (whose best-known service is the World Wide Web, WWW) interconnects many different types of computers running different operating systems attached to numerous types of networks.
Where did it originate and how did it evolve into the world spanning communication system we have today?
1 The Evolution of TCP/IP (and the Internet)
In the 1960s many governments and large organisations had a number of (expensive) computer systems and were having serious problems transferring information between them because of incompatible operating systems, hardware, character codes, number systems, etc.
- The U.S. Department of Defense funded an experiment to interconnect research sites
- December 1968, the Advanced Research Projects Agency (ARPA) awarded a contract to design and deploy a packet switching network.
- September 1969, the first ARPANET node installed at UCLA.
- End of 1969 - four nodes installed, i.e. the Internet started with four nodes
- 1971 - ARPANET spanned the continental U.S.
- 1973 - had connections to Europe.
- ARPANET developed a user-network protocol - that has become the standard interface between users and packet switched networks, i.e. ITU-T (formerly CCITT) X.25.
- In 1974 a suite of communications protocols was proposed and implemented throughout the ARPANET based upon the Transmission Control Protocol (TCP) and Internet Protocol (IP) - usually referred to simply as TCP/IP.
- In 1983, the US DoD mandated that all of its computer systems would use the TCP/IP protocol suite for long-haul communications.
- In 1983, the ARPANET was split into two components:
ARPANET - interconnects research, development and academic sites;
MILNET - carries military traffic and became part of the Defense Data Network.
- 1983 - University of California's UNIX implementation, 4.2BSD (Berkeley Software Distribution) UNIX, included TCP/IP.
- 1986 - the NSF built a backbone network to interconnect four NSF-funded regional supercomputer centres and the National Center for Atmospheric Research (NCAR).
- NSFNET was originally intended as a backbone for other networks and limited traffic to non-commercial use.
Thus
- ARPANET started with four nodes in 1969
- 600 nodes before it was split in 1983
- 160,000 in 1989
- 19,500,000 in 1997
- today the number is unknown but is in the hundreds of millions
2 The TCP/IP Protocol Architecture
- Communication over the Internet is mostly TCP/IP (Transmission Control Protocol/Internet Protocol)
- The TCP/IP "stack" is software which allows applications to communicate over a network
- TCP/IP support is either built into the OS (e.g. UNIX) or available as an add-on
The diagram on the left shows the TCP/IP architecture layers in outline (discussed below) and the diagram on the right shows the protocols which make up the layers.
Application is what the users see, e.g. programs such as ftp, email, web browser, telnet, etc.
TCP (transmission control protocol) takes messages from the application, breaks them up into packets and sends them to the remote system where the message is put back together and passed to the application - TCP corrects for errors in transmission (e.g. due to noise) and looks after flow control (a slow system talking to a fast one).
IP (Internet protocol) looks after addressing of machines (each machine has its own unique address) and routing the packets over the underlying network.
The Network Interface Layer is the low-level layer connecting to the underlying network(s) - TCP/IP was developed by the US Department of Defense to operate over multiple unreliable local or wide area networks connecting many different types of computer systems.
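The way TCP breaks a message into packets and puts it back together at the far end can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names segment and reassemble are hypothetical, packets are numbered one by one (real TCP numbers individual bytes), and there are no acknowledgements, retransmissions or flow control.

```python
import random


def segment(message, size):
    """Cut a message into (sequence number, chunk) packets of at most `size` bytes."""
    return [(seq, message[offset:offset + size])
            for seq, offset in enumerate(range(0, len(message), size))]


def reassemble(packets):
    """Put packets back in sequence-number order and join the chunks."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"TCP breaks messages up into packets and reassembles them."
packets = segment(message, 8)
random.shuffle(packets)  # the network may deliver packets out of order
restored = reassemble(packets)
print(restored == message)
```

The sequence numbers are what allow the receiver to restore the original order whatever route each packet took.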
3 The Internet Protocol Layer
The Internet Protocol (RFC 791) provides services that are roughly equivalent to the OSI Network Layer.
IP provides a datagram (connectionless) transport service across the network. This service is sometimes referred to as unreliable because the network does not guarantee delivery nor notify the end host system about packets lost due to errors or network congestion, i.e. TCP/IP was assumed to run over unreliable networks.
IP datagrams contain a message, or one fragment of a message, that may be up to 65,535 bytes (octets) in length. IP does not provide a mechanism for flow control (the TCP layer can provide this if necessary).
The above diagram shows two host computers communicating using TCP/IP via two nodes, e.g. routers. For example, consider a user on a home network using a web browser to communicate with a server at their ISP (Internet Service Provider):
- Host 1 is connected via Subnet 1 (the home network, e.g. Ethernet or wireless) to Gateway 1 which is a combined router and ADSL modem
- The Gateway 1 ADSL modem communicates via Subnet 2 (telephone lines, etc.) with the ISP’s ADSL modem/router, Gateway 2
- Gateway 2 communicates with Host 2 via the ISP’s internal network Subnet 3.
IP interfaces with each subnet in turn using it to transmit datagrams (DG) to the next node (using whatever physical, datalink and network layer that subnet has) until they reach the destination. The TCP layer can create a virtual circuit (correcting for errors, flow control, etc.) between the hosts for the applications which exchange messages.
3.1 IP Addresses
Every machine on a TCP/IP network requires a unique address so it can be identified and packets routed to it. IP addresses are 32 bits in length and are typically written as a sequence of four 8-bit numbers (range 0 to 255) giving the decimal value of each of the address bytes, e.g. 199.182.20.17.
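The relationship between the dotted form and the underlying 32-bit number can be sketched in Python as follows. The function names dotted_to_int and int_to_dotted are hypothetical, chosen for this illustration only.

```python
def dotted_to_int(address):
    """Pack a dotted-decimal IPv4 address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("not a valid IPv4 address: %r" % address)
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift in one byte at a time
    return value


def int_to_dotted(value):
    """Unpack a 32-bit integer back into dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


packed = dotted_to_int("199.182.20.17")
print(format(packed, "032b"))  # the 32 address bits
print(hex(packed))             # the same value, one byte per two hex digits
print(int_to_dotted(packed))   # back to dotted decimal
```

Each of the four decimal numbers is simply one byte of the address, which is why none of them can exceed 255.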
IP addresses are hierarchical for routing purposes and are subdivided into two subfields
- The Network Identifier (NET_ID) subfield identifies the TCP/IP subnetwork connected to the Internet and is used for high-level routing between networks, much as the country code, city code, or area code is used in the telephone network.
- The Host Identifier (HOST_ID) subfield indicates the specific host within a subnetwork.
To accommodate different size networks, IP defines several address classes.
Class A addresses have a 7-bit NET_ID and 24-bit HOST_ID - intended for very large networks and can address up to 16,777,216 (2^24) hosts per network. The first number of a Class A address will be between 1 and 126. Relatively few Class A addresses have been assigned; examples include 9.0.0.0 (IBM) and 35.0.0.0 (Merit).
Class B addresses have a 14-bit NET_ID and 16-bit HOST_ID - intended for moderate sized networks and can address up to 65,536 (2^16) hosts per network. The first number of a Class B address will be between 128 and 191. The Class B address space is most in danger of being exhausted of any of the classes and it is very difficult to get a Class B address assigned at this time; examples include 128.138.0.0 (Colorado SuperNet) and 147.225.0.0 (AINSNET).
Class C addresses have a 21-bit NET_ID and 8-bit HOST_ID - intended for small networks and can address only up to 256 hosts per network. The first number of a Class C address will be between 192 and 223. Most addresses assigned to networks today are Class C; examples include 192.100.81.0 (Netcom) and 192.80.64.0 (St. Michael’s College, Colchester, VT).
The remaining two address classes are used for special functions only and are not commonly assigned to individual hosts.
Class D addresses may begin with a value between 224 and 239, and are used for IP multicasting (i.e., sending a single datagram to multiple hosts).
Class E addresses begin with a value between 240 and 255 and are reserved for experimental use.
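Since the class is determined entirely by the first number of the address, the rules above can be sketched as a small Python function. The name address_class is hypothetical. The values 0 and 127 fall outside the ranges listed above (127 is the loopback range), so the sketch reports them as "reserved".

```python
def address_class(address):
    """Return the classful address class (A to E) from the first number."""
    first = int(address.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 239:
        return "D"  # multicast
    if 240 <= first <= 255:
        return "E"  # experimental
    return "reserved"  # 0 and 127 are not in any of the ranges above


for example in ("9.0.0.0", "128.138.0.0", "192.100.81.0", "224.0.0.1"):
    print(example, address_class(example))
```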
3.1.1 The subnet mask
An additional addressing tool is the subnet mask, which is used to distinguish the portion of the address that identifies the network from the portion that identifies the individual hosts.
The subnet mask is written in dotted decimal and the number of 1s indicates the significant NET_ID bits.
A Class B address, for example, would typically have a subnet mask of 255.255.0.0 since the first 16 bits are NET_ID.
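Applying a mask is a bitwise AND, as the following Python sketch shows using the standard ipaddress module. The host address 128.138.20.17 is a made-up example inside the Class B network 128.138.0.0, and the function names split_address and prefix_length are hypothetical.

```python
import ipaddress


def split_address(address, mask):
    """Split an IPv4 address into its network and host parts using a mask."""
    addr_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    net_part = addr_bits & mask_bits                 # keep bits where the mask is 1
    host_part = addr_bits & ~mask_bits & 0xFFFFFFFF  # keep bits where the mask is 0
    return (str(ipaddress.IPv4Address(net_part)),
            str(ipaddress.IPv4Address(host_part)))


def prefix_length(mask):
    """Count the 1 bits in a dotted-decimal mask."""
    return bin(int(ipaddress.IPv4Address(mask))).count("1")


net_id, host_id = split_address("128.138.20.17", "255.255.0.0")
print(net_id, host_id, prefix_length("255.255.0.0"))
```

With the mask 255.255.0.0 the first 16 bits survive as the network part and the remaining 16 bits identify the host.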
3.2 DHCP (Dynamic Host Configuration Protocol)
A machine can have a static IP address, which is the same each time it connects, or a dynamic address, which is assigned when it connects to the Internet (and can be different each time). DHCP is the protocol for assigning dynamic IP addresses – the ISP has a range of IP addresses available which are assigned when devices connect and become free on disconnection.
Machines permanently attached to a TCP/IP network (e.g. in DMU labs) are permanently allocated an IP address by the network manager. When you dial in from home to access the Internet via an ISP (Internet Service Provider) dial-in service, your machine needs an IP address.
This is usually assigned by the ISP’s DHCP server at dial-in time using dynamic IP address assignment:
- the server has a number of addresses available for dial-in clients and assigns the next free one
- at disconnection the IP address is then available for another client
Some ISPs offer static IP addresses to broadband users, e.g. if you run servers on your home PC.
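The bookkeeping behind dynamic assignment (hand out the next free address, take it back at disconnection) can be sketched in Python. This models only the address pool, not the DHCP message exchange or lease timers; the class name AddressPool, the client names and the 192.0.2.x addresses (a range set aside for documentation) are all assumptions made for the illustration.

```python
class AddressPool:
    """A toy pool of addresses handed out on connection and reclaimed afterwards."""

    def __init__(self, addresses):
        self.free = list(addresses)  # addresses not currently in use
        self.assigned = {}           # client id -> address

    def connect(self, client_id):
        """Assign the next free address to a connecting client."""
        if client_id in self.assigned:
            return self.assigned[client_id]
        if not self.free:
            raise RuntimeError("no free addresses left in the pool")
        address = self.free.pop(0)
        self.assigned[client_id] = address
        return address

    def disconnect(self, client_id):
        """Return a client's address to the pool for another client."""
        self.free.append(self.assigned.pop(client_id))


pool = AddressPool(["192.0.2.10", "192.0.2.11"])
first = pool.connect("pc-a")
second = pool.connect("pc-b")
pool.disconnect("pc-a")
third = pool.connect("pc-c")  # reuses the address released by pc-a
print(first, second, third)
```

This is why the same machine can receive a different address each time it connects: it simply gets whichever address happens to be free.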
3.3 IP Domains and Host Names
While IP addresses are 32 bits in length, most users do not memorize the numeric addresses of the hosts to which they attach; instead, people are more comfortable with host names. Most IP hosts, then, have both a numeric IP address and a name.
Internet hosts use a hierarchical naming structure comprising a top-level domain (TLD), domain and subdomain (optional), and host name, e.g. = 146.227.1.23
The domain name structure is best read from right to left; Internet host names end with a top-level domain name.
World-wide generic top-level domains include:
.com: Commercial organizations
.edu: Educational institutions, although today usually limited to 4-year colleges and universities
.net: Network providers
.org: Non-profit organizations
.int: Organizations established by international treaty
.gov: U.S. Federal government agencies
.mil: U.S. military
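Reading a host name from right to left can be sketched in Python by splitting it at the dots. The example uses the host name parkway.dmu.ac.uk, which appears later in these notes; the function name labels_right_to_left is hypothetical.

```python
def labels_right_to_left(host_name):
    """Split a host name into its labels, top-level domain first."""
    return list(reversed(host_name.rstrip(".").split(".")))


labels = labels_right_to_left("parkway.dmu.ac.uk")
print(labels)  # the first element is the top-level domain, the last is the host
```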
3.3.1 Domain Name Servers (DNS)
Domain names are convenient for people; however, the name must be translated back to a numeric address for routing purposes:
- names and numbers are stored by a "domain name server" (DNS)
- Client programs may query the DNS to find a number before making a connection, e.g. UNIX nslookup command
e.g. = 146.227.1.23
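A client program can make the same kind of query through the operating system's resolver. The Python sketch below uses the standard library call socket.gethostbyname; the wrapper name lookup is hypothetical, and whether any particular name resolves depends on the network and DNS configuration in use.

```python
import socket


def lookup(host_name):
    """Ask the system resolver (which normally queries DNS) for an IPv4 address."""
    return socket.gethostbyname(host_name)


# A numeric address needs no translation and is returned unchanged.
print(lookup("127.0.0.1"))
```

Calling lookup with a host name rather than a number would trigger a real query, and raises socket.gaierror if the name cannot be resolved.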
3.4 Problems with IP addressing and the IPv6 proposal
There are a number of problems with the current IP addressing (called IPv4) based on a 32-bit number:
- Exhaustion of address space – the Internet is running out of IP addresses, e.g. class A and B sites were allocated 16 million and 65536 addresses respectively even if they only used a small proportion of them
- Poor routability – IPv4 network addresses bear no relation to physical location, making routing difficult and creating large routing tables
- Limited support for multicast – IPv4 has class D for multicast which is very limited for commercial applications
- Inefficient/inflexible header – all IP packets carry the same header overhead no matter what their requirements are.
IPv6 has a 128-bit address space which is usually written as eight 16-bit numbers in hexadecimal, e.g. FEDC:BA98:7654:3210:FEDC:BA98:7654:3210. Thus IPv6 addresses have four times as many bits as IPv4 addresses (128 vs. 32). The IPv6 address space is therefore 4 billion times 4 billion times 4 billion (2^96) times the size of the IPv4 address space (2^32). This works out to 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses, an extremely large address space. In a theoretical sense this is approximately 665,570,793,348,866,943,898,599 addresses per square meter of the surface of the planet Earth (assuming the Earth's surface is 511,263,971,197,990 square meters).
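Python's arbitrary-precision integers make it easy to check this arithmetic. The sketch below takes the Earth surface area from the text above as given; the variable names are chosen for the illustration only.

```python
import ipaddress

ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

# 128 bits = 32 bits + 96 bits, so the spaces differ by a factor of 2^96.
print(ipv6_space == ipv4_space * 2 ** 96)
print(format(ipv6_space, ","))

earth_surface_m2 = 511263971197990  # surface area assumed in the text
per_square_meter = ipv6_space // earth_surface_m2
print(format(per_square_meter, ","))

# The example address really does occupy 128 bits.
example = ipaddress.IPv6Address("FEDC:BA98:7654:3210:FEDC:BA98:7654:3210")
print(int(example).bit_length())
```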
In addition to extended addressing IPv6 offers
- improved security: encryption along with authentication and security headers
- improved multicasting
- the ability for a sender and a receiver to establish a high-quality, high-bandwidth secure path between each other
4 The Transport Layer – TCP and UDP protocols
The TCP/IP protocol suite includes two protocols that correspond roughly to the OSI Transport and Session Layers: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).
TCP provides a virtual circuit (connection-oriented) communication service across the network. TCP includes rules for formatting messages, establishing and terminating virtual circuits, sequencing, flow control, and error correction.
Most of the applications in the TCP/IP suite operate over the reliable transport service provided by TCP.
UDP provides an end-to-end datagram (connectionless) service. Some applications, such as those that involve a simple query and response, are better suited to the datagram service of UDP because there is no time lost to virtual circuit establishment and termination.
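A simple query and response over UDP can be sketched in Python using the standard socket module. Both ends run in one program on the loopback address 127.0.0.1 so that the example needs no real network; the message contents are made up, and port 0 asks the operating system to pick any free port.

```python
import socket

# "Server": a UDP socket bound to a free port on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.settimeout(2.0)
server.bind(("127.0.0.1", 0))
server_address = server.getsockname()

# "Client": there is no connection set-up, the query is simply sent.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"time?", server_address)

# The server reads one datagram and replies to wherever it came from.
query, client_address = server.recvfrom(1024)
server.sendto(b"reply to " + query, client_address)

reply, _ = client.recvfrom(1024)
print(reply)

client.close()
server.close()
```

Note that nothing here guarantees delivery: over a real network either datagram could be lost, and it would be up to the application to time out and ask again.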
4.1 Clients and servers and TCP and UDP ports
Consider a server program running on a machine in a TCP/IP network and providing a service (mail, WWW, etc) to remote clients.
To send a message to the server (e.g. to collect email) the client has to send a packet to
a) a particular program, e.g. the email server
b) running on a particular machine, e.g. DMU’s email server
Requirement b) is satisfied by knowing the machine’s Domain Name or IP address, e.g. DMU’s email server is helios.dmu.ac.uk on IP address 146.227.1.2.
However, a particular machine may be running several servers (email, ftp, www, etc.) so how is a packet delivered to the correct program, i.e. requirement a) above? This is achieved by ‘ports’ via which programs communicate.
When TCP/IP is running on a particular machine (with a particular IP address) TCP and UDP each have 65536 ports numbered 0 to 65535 many of which are reserved for standard services.
For example, consider a student accessing the DMU web page on
- When a server is started it attaches to a TCP or UDP port (it is said to ‘listen’ to that port), e.g. a HTTP (WWW) server listens on TCP port 80
- The student would run a web client program (a ‘web browser’ such as MS Internet Explorer)
- The student would enter the URL
- The web client would contact a DNS server to obtain the web server’s IP address; at the time of writing DMU’s web server was running on a machine called parkway.dmu.ac.uk with IP address 146.227.1.23
- The web client would send a packet to IP address 146.227.1.23 port 80 requesting a web page
- The packet would specify the client’s IP address (e.g. 146.227.23.9 if the student is in one of DMU’s labs) and a spare port on the client machine (e.g. 5023) for the reply to be sent to – the client now ‘listens’ on this port
- the web server receives the packet, gets the web page off disk, constructs a packet and sends it to the client’s IP address and specified port, e.g. 146.227.23.9 port 5023
A server program is usually capable of processing requests from many simultaneous clients (see for more details).
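The exchange between a listening server and a client with its own spare port can be sketched in Python over TCP. Everything runs on the loopback address 127.0.0.1, and the server asks the operating system for any free port (port 0) where a real web server would listen on port 80; the request and reply contents are made up, and the helper name serve_once is hypothetical.

```python
import socket
import threading


def serve_once(listener):
    """Accept one client, read its request and send back a reply."""
    connection, client_address = listener.accept()
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"page for " + request)


# The server attaches to a port and listens on it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.settimeout(5.0)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_ip, server_port = listener.getsockname()
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# The client connects; the operating system picks a spare port for the reply.
client = socket.create_connection((server_ip, server_port), timeout=2.0)
client_ip, client_port = client.getsockname()
client.sendall(b"GET /")
reply = client.recv(1024)
client.close()

worker.join()
listener.close()
print("server port", server_port, "client port", client_port, reply)
```

The pair of IP addresses and the pair of port numbers together identify this one conversation, which is how a single server port can serve many clients at once.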
Port numbers below 1024 are reserved for ‘standard’ services and may not be used by users, e.g.
- E-mail - SMTP (you send the email to) port 25 and POP3 (you get the email from) port 110
- Remote login (telnet) - port 23
- finger - port 79
- File transfer (FTP) - 20 (FTP data transfer), 21 (FTP control)
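The port numbers mentioned in this section can be collected into a small lookup table. The Python sketch below is illustrative only: the table holds just the services named in these notes, and the names WELL_KNOWN_PORTS, is_reserved and service_on are hypothetical.

```python
# Standard TCP ports for the services named in this section.
WELL_KNOWN_PORTS = {
    "ftp-data": 20,
    "ftp": 21,
    "telnet": 23,
    "smtp": 25,
    "finger": 79,
    "http": 80,
    "pop3": 110,
}


def is_reserved(port):
    """Ports below 1024 are reserved for standard services."""
    return 0 <= port < 1024


def service_on(port):
    """Return the service name for a port in the table, or None."""
    for name, number in WELL_KNOWN_PORTS.items():
        if number == port:
            return name
    return None


print(service_on(25), is_reserved(80), is_reserved(5023))
```

A spare client port such as 5023 from the earlier example lies above 1024, so it does not clash with any standard service.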
In Summary
Applications and utilities reside in host, or end-communicating, systems. TCP provides a reliable, virtual circuit connection between the two hosts.
UDP provides an end-to-end datagram connection at this layer.
IP provides a datagram (DG) transport service over any intervening subnetworks, including local and wide area networks.
The underlying subnetwork may employ nearly any common local or wide area network technology.
5 TCP/IP Applications
The Application Layer protocols include:
Telnet: Short for Telecommunication Network, a virtual terminal protocol allowing a user logged on to one TCP/IP host to access other hosts on the network
FTP: The File Transfer Protocol allows a user to transfer files between local and remote host computers.
SMTP: The Simple Mail Transfer Protocol is the standard protocol for the exchange of electronic mail.
HTTP: The Hypertext Transfer Protocol is the basis for exchange of information over the World Wide Web (WWW).
Finger: Used to determine the status of other hosts and/or users.
POP: The Post Office Protocol defines a simple interface between a user’s mail reader software and an electronic mail server; the current version is POP3.
DNS: The Domain Name System defines the structure of Internet names and their association with IP addresses, as well as the association of mail, name, and other servers with domains.
SNMP: The Simple Network Management Protocol defines procedures and management information databases for managing TCP/IP-based network devices.
Ping: A utility that allows a user at one system to determine the status of other hosts and the latency in getting a message to that host. Uses ICMP Echo messages.
Whois/NICNAME: Utilities that search databases for information about Internet domain and domain contact information.
Traceroute: A tool that displays the route that packets will take when travelling to a remote host.
The above diagram shows some common TCP/IP servers attached to their TCP and UDP ports at the TCP layer which pass data to the IP layer which transmits it as datagrams over the underlying network.
7 NAT (Network Address Translation Protocol)
An organisation may be assigned one IP address (or a small number) yet have many machines, e.g. a domestic ADSL line is assigned one IP address yet a house may have four or five PCs.