Book Organization
This book delves into many particulars of network programming. It is organized into five specific parts, each part building upon earlier parts:
- Part I: Network Programming from the Client Perspective
This part introduces sockets and defines terms. It describes the different types of sockets, addressing schemes, and network theory.
- Part II: The Server Perspective and Load Control
Part II expands socket programming with servers, multitasking techniques, I/O control, and socket options.
- Part III: Looking at Sockets Objectively
C is not the only programming language that provides access to sockets. This part presents some object-oriented approaches and describes the advantages and limitations of object technology in general.
- Part IV: Advanced Sockets—Adding Value
This part introduces the large realm of advanced network programming techniques, including security, broadcast and multicasting, IPv6, and raw sockets.
- Part V: Appendixes
The appendixes consolidate much of the resource material relevant to sockets. The first appendix includes tables and listings too large for the chapters. The second and third appendixes describe the Socket and Kernel APIs.
The companion Web site contains all the source code for the book examples, the book appendixes in HTML and Adobe's Portable Document Format (PDF), and socket programming-related RFCs in HTML format.
Conventions Used in This Book
The following typographic conventions are used in this book:
- Code lines, commands, statements, variables, and any text you type or see onscreen appear in a mono typeface. Bold mono typeface is often used to represent the user's input.
- Placeholders in syntax descriptions appear in an italic mono typeface. Replace the placeholder with the actual filename, parameter, or whatever element it represents.
- Italic highlights technical terms when they're being defined.
- The ➥ icon is used before a line of code that is really a continuation of the preceding line. Sometimes a line of code is too long to fit as a single line on the page. If you see ➥ before a line of code, remember that it's part of the line immediately above it.
- The text also includes references to the Internet standards documents called Requests for Comments (RFCs). The reference citations are enclosed in brackets with the RFC number, for example [RFC875].
1 Introducing the Cookbook Network Client
2 TCP/IP Network Language Fluency
3 Different Types of Internet Packets
4 Sending Messages Between Peers
5 Understanding the Network Layering Model
Chapter 1. Introducing the Cookbook Network Client
IN THIS CHAPTER
- Connecting the World with Sockets
- Talking the Talk: TCP/IP Addressing Overview
- Hearing the Server: The Client's Basic Algorithm
- Summary: What's Going On Behind the Scenes?
That blasted CMOS RAM battery! Okay, what time is it? No clocks visible, I'll just call Time. 1-614-281-8211. Ring. "...The time is eight twenty-three and forty seconds." Click. Hurrumph! a.m. or p.m.? Do they expect me to look outside?
The computer you use probably is connected to a network of some kind. It could be a full corporate intranet with firewalls into the Internet; it could be a couple of computers you connected in your spare time. The network connects workstations, servers, printers, disk arrays, faxes, modems, and so forth. And each network connection uses or provides a service. Some services provide information without any interaction. Just as in our example of calling Time, a basic network client connects with and listens to a server.
What kinds of services do servers provide? Many. All services fit into four resource categories: common, limited or expensive, shared, and delegated. Here are some examples of each:
Common / Disk space (centrally backed up)
Limited / Printers, modems, disk arrays
Shared / Databases, project control, documentation
Delegated / Remote programs, distributed queries
This chapter steps you through writing a basic client that connects to some server. This process helps you understand all that is involved in writing network programs. The client initially connects to the server's correct time service (or some other service that does not expect input first). Along the way, the chapter explains the different calls, their parameters, and common errors.
The client program needs a send/receive interface and an address to connect to a server. Both clients and servers use sockets to connect and send messages independently of location. Consider the telephone example again: The handset has two parts, a microphone (transmission) and a speaker (reception). Sockets also have these two channels. In addition, the telephone number is essentially the unique address for the phone.
The socket likewise has two parts or channels: one for listening and one for sending (like the read/write mode for file I/O). The client (or caller) connects with the server (or answerer) to start a network conversation. Each host offers several standard services (see /etc/services on the file system), like the correct time telephone number.
NOTE
You can run most of the book's program examples without being connected to a network, if you have networking enabled in the kernel and the inetd network server daemon running. In fact, many examples use the local (or loopback) address of 127.0.0.1. Even if you do not have network drivers up and running, most Linux distributions include everything you need for at least loopback networking.
Your client program must take several steps to communicate with a peer or server. These steps have to follow a particular sequence. Of course, you could ask: "Why not replace all the steps with fewer calls?" Between each step, your program can select from many options. Still, some steps are optional. If your client skips some steps, usually the operating system fills in the blanks for you with default settings.
You can follow a few basic steps to create a socket, set up the destination host, establish the channel to another network program, and shut down. Figure 1.1 graphically shows the steps the client takes to connect to a server.
Figure 1.1. Each client interfaces with the operating system by making several calls in succession.
The following list describes each step:
- Create a socket. Select from the various network domains (such as the Internet) and socket types (stream).
- Set socket options (optional). You have many options that affect the behavior of the socket. The program can change these options anytime the socket is open. (See Chapter 9, "Breaking Performance Barriers," for more detail.)
- Bind to address/port (optional). Accept connections from all or a single IP address, and establish port service. If you skip this, the operating system assumes any IP address and assigns a random port number. (Chapter 2, "TCP/IP Network Language Fluency," discusses addresses and ports in much greater detail.)
- Connect to peer/server (optional). Reach out and establish a bidirectional channel between your program and another network program. If you skip this, your program uses directed or connectionless communication.
- Partially close the connection (optional). Limit the channel to either sending or receiving. You may want to use this step after duplicating the channel.
- Send/receive messages (optional). One reason to opt out of any I/O might include checking host availability.
- Close the connection. Of course this step is important: Long-running programs may eventually run out of available file descriptors if the programs do not close defunct connections.
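The steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the function names open_client and finish_client are hypothetical, SO_KEEPALIVE is an arbitrary example of a socket option, and the explicit bind() simply requests the same defaults the operating system would choose if the step were skipped.

```c
#include <string.h>      /* memset() */
#include <unistd.h>      /* read(), close() */
#include <sys/types.h>
#include <sys/socket.h>  /* socket(), setsockopt(), bind(), connect(), shutdown() */
#include <netinet/in.h>  /* struct sockaddr_in, htons(), htonl(), INADDR_ANY */
#include <arpa/inet.h>   /* inet_pton() */

/* Create, set an option, bind, and connect.
 * Returns a connected descriptor, or -1 on any failure. */
int open_client(const char *ip, unsigned short port)
{
    struct sockaddr_in local, dest;
    int on = 1;
    int sd = socket(PF_INET, SOCK_STREAM, 0);   /* create a socket */

    if (sd < 0)
        return -1;

    /* (optional) set a socket option; SO_KEEPALIVE is just an example */
    if (setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        goto fail;

    /* (optional) bind; any address and port 0 mirror the defaults */
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(0);
    if (bind(sd, (struct sockaddr *)&local, sizeof(local)) != 0)
        goto fail;

    /* connect to the peer/server */
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dest.sin_addr) != 1)
        goto fail;
    if (connect(sd, (struct sockaddr *)&dest, sizeof(dest)) != 0)
        goto fail;
    return sd;

fail:
    close(sd);
    return -1;
}

/* Partially close, receive one message, and close.
 * Returns the number of bytes read, or -1 on error. */
ssize_t finish_client(int sd, char *buf, size_t size)
{
    ssize_t n = -1;

    /* (optional) partial close: this client will only receive */
    if (shutdown(sd, SHUT_WR) == 0)
        n = read(sd, buf, size);                /* receive a message */
    close(sd);                                  /* close the connection */
    return n;
}
```

A caller might use open_client("127.0.0.1", 13) followed by finish_client(sd, buf, sizeof(buf)), assuming a time-of-day service is listening on the loopback address.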
The following sections describe some of these steps, defining the system calls and providing examples.
Connecting the World with Sockets
Several years ago, networking involved a dedicated serial line from one computer to another. No other computers shared the same circuit, and UNIX used UUCP (UNIX-to-UNIX Copy) to move files around. As line transmission technology improved, the concept of sharing the transmission line became feasible. This meant that each computer needed to identify itself uniquely and take turns transmitting. There are several different methods for sharing time on the network, and many work rather well. At times, computers transmit simultaneously, causing a packet collision.
The hardware and low-level drivers handle issues such as collisions and retransmission, now an artifact of past programming. This frees up your design to focus on transmission and reception of messages. The Socket API (Application Programming Interface) provides designers the conduit to receive or send messages.
Socket programming differs from typical application or tool programming, because you work with concurrently running programs and systems. This means that you need to consider synchronization, timing, and resource management.
Sockets link asynchronous tasks with a single bidirectional channel. This could lead to problems like deadlock and starvation. With awareness and planning, you can avoid most of these problems. You can read how to handle multitasking issues in Chapter 7, "Dividing the Load: Multitasking," and how to build robust sockets in Chapter 10, "Designing Robust Linux Sockets."
Typically, an overloaded server slows down the Internet's perceived responsiveness. Timing and resource management reduce the server's burden, increasing network performance. You can find many ideas for improving performance in Part II, "The Server Perspective and Load Control."
The Internet was designed to be entirely packet switched. Each and every packet has to have all the necessary information it needs to get to the destination. Like a letter, a packet must include source and destination addresses. The packet switches from one computer to the next along the connections (or links). If the network loses a link while passing a message, the packet finds another route (packet switching), or the router bounces an error back to the originator if it fails to reach the host. This ensures a form of data reliability. Broken paths in the network result in network outages. You probably have encountered a few network outages yourself.
Talking the Talk: TCP/IP Addressing Overview
Networks support many different types of protocols. Programmers have geared some protocols to address specific issues such as radio/microwave; others attempt to solve the network reliability problems. TCP/IP (Transmission Control Protocol/Internet Protocol) focuses on the packet and the potential of lost communication channels. At any time, the protocol attempts to find a new route when a network segment fails.
Packet tracking, loss detection, and retransmission are difficult algorithms, because timing is not the only indicator. Luckily, industry experience has proven the algorithms used in the protocol. Usually, you can ignore those issues during design, because the solutions are hidden deep in the protocol.
TCP/IP is layered: Higher-level protocols provide more reliability but less flexibility, and lower levels offer greater flexibility but sacrifice reliability. With all the different levels of flexibility and reliability, the Socket API offers all the needed interfaces. This is a departure from the standard UNIX approach of every level having its own set of calls.
The standard file I/O likewise uses a layered approach. Computers connected via TCP/IP use sockets predominantly to communicate with each other. This may seem strange, considering all the different protocol layers available to a program and having been taught that open() (which yields a file descriptor) and fopen() (which yields a file reference) are different and almost incompatible. All protocol layers are available through one call: socket(). This single call abstracts away all the implementation details of the different networks (TCP/IP, IPX, Rose).
Fundamentally, each packet has the data, the originator address, and the destination address. Every layer in the protocol adds its own signature and other data (wrapper) to the transmission packet. When transmitted, the wrapper helps the receiver forward the message to the appropriate layer to await reading.
Every computer connected to the Internet has an Internet Protocol (IP) address, a unique 32-bit number. Without the uniqueness, there is no way to know the proper destination for packets.
TCP/IP takes the addressing one step further with the concept of ports. Like the 3- to 5-digit telephone extensions, each computer address has several ports through which the computers communicate. These are not physical; rather, they are abstractions of the system. All information still goes through the network address like the primary telephone number.
The standard written format for IP addresses is [0-255].[0-255].[0-255].[0-255]; for example, 123.45.6.78. Note that zero and 255 are special numbers used in network masks and broadcasting, so be careful how you use them (Chapter 2 discusses IP numbering in greater detail). A port number is usually appended to the address, separated by either a colon or a period:
[0-255].[0-255].[0-255].[0-255]:[0-65535]
For example, 128.34.26.101:9090 (IP=128.34.26.101, port=9090).
[0-255].[0-255].[0-255].[0-255].[0-65535]
For example, 64.3.24.24.9999 (IP=64.3.24.24, port=9999).
NOTE
The colon notation is more common for ports than the period notation.
Each IP address effectively offers 65,536 port numbers (0 through 65535) that a socket may connect to. See Chapter 2 for more information.
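To make the colon notation concrete, here is a minimal sketch that splits a string such as 128.34.26.101:9090 into an address and a port and stores both in a struct sockaddr_in. The function name parse_endpoint is hypothetical; inet_pton(), htons(), and strtoul() are standard calls, and the sketch handles only IPv4 and only the colon form.

```c
#include <stdlib.h>      /* strtoul() */
#include <string.h>      /* strrchr(), memcpy(), memset() */
#include <netinet/in.h>  /* struct sockaddr_in, INET_ADDRSTRLEN */
#include <arpa/inet.h>   /* inet_pton(), htons() */

/* Parse "a.b.c.d:port" into addr.
 * Returns 0 on success, -1 if the text is malformed or out of range. */
int parse_endpoint(const char *text, struct sockaddr_in *addr)
{
    char ip[INET_ADDRSTRLEN];
    const char *colon = strrchr(text, ':');
    char *end;
    unsigned long port;
    size_t ip_len;

    if (colon == NULL)
        return -1;

    /* Copy the part before the colon so it can be NUL-terminated. */
    ip_len = (size_t)(colon - text);
    if (ip_len == 0 || ip_len >= sizeof(ip))
        return -1;
    memcpy(ip, text, ip_len);
    ip[ip_len] = '\0';

    /* The port must be plain decimal digits in the range 0-65535. */
    if (colon[1] < '0' || colon[1] > '9')
        return -1;
    port = strtoul(colon + 1, &end, 10);
    if (*end != '\0' || port > 65535)
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons((unsigned short)port);   /* network byte order */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;                                  /* e.g. a field > 255 */
    return 0;
}
```

Both the port and the address end up in network byte order inside the structure, which is the form connect() and bind() expect.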
Hearing the Server: The Client's Basic Algorithm
The simplest client-socket connection is one that opens a connection to a server, sends a request, and accepts the response. Some of the standard services don't even expect any prompting. One example is the time-of-day service found on port 13. Unfortunately, many Linux distributions do not have that service open without revising the /etc/inetd.conf file. If you have access to a BSD, HP-UX, or Solaris machine, you can try that port.
There are several services available to play with safely. You may try running Telnet on your machine to connect to the FTP port (21):
% telnet 127.0.0.1 21
After connecting, the program gets the welcome message from the server. Using Telnet to connect with the FTP server does not work very well, but you can see the basic interaction. The simple client example in Listing 1.1 connects to the server, reads the welcome, and then disconnects.
Example 1.1. A Basic TCP Client Algorithm
/************************************************************/
/*** A basic client algorithm. ***/
/************************************************************/
Create a socket
Create a destination address for server
Connect to server
Read & display any messages
Close connection.
The algorithm in Listing 1.1 may seem overly simplified, and perhaps it is. However, connecting to and communicating with a server is really that simple. The following sections describe each of these steps. You can find the complete source for this program at the end of the book and on the accompanying CD-ROM.
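As a rough preview, the algorithm might translate into C along these lines. This is a hedged sketch and not the book's complete program: the function names connect_to_server and display_message are hypothetical, the address is assumed to be a dotted IPv4 string, and only a single read is performed, matching the idea of reading the welcome and disconnecting.

```c
#include <stdio.h>       /* FILE, fwrite(), perror(), fprintf() */
#include <string.h>      /* memset() */
#include <unistd.h>      /* read(), close() */
#include <sys/socket.h>  /* socket(), connect() */
#include <netinet/in.h>  /* struct sockaddr_in, htons() */
#include <arpa/inet.h>   /* inet_pton() */

/* Create a socket, build the destination address, and connect.
 * Returns the connected descriptor, or -1 on failure. */
int connect_to_server(const char *ip, unsigned short port)
{
    struct sockaddr_in dest;
    int sd = socket(PF_INET, SOCK_STREAM, 0);

    if (sd < 0) {
        perror("socket");
        return -1;
    }

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dest.sin_addr) != 1) {
        fprintf(stderr, "bad IPv4 address: %s\n", ip);
        close(sd);
        return -1;
    }
    if (connect(sd, (struct sockaddr *)&dest, sizeof(dest)) != 0) {
        perror("connect");
        close(sd);
        return -1;
    }
    return sd;
}

/* Read one message, display it on out, and close the connection.
 * Returns the number of bytes read (0 at end of stream, -1 on error). */
long display_message(int sd, FILE *out)
{
    char buf[256];
    ssize_t n = read(sd, buf, sizeof(buf));

    if (n > 0)
        fwrite(buf, 1, (size_t)n, out);
    close(sd);
    return (long)n;
}
```

For example, a program could call connect_to_server("127.0.0.1", 21) and, if the result is not negative, pass it to display_message() with stdout to print the FTP server's welcome line.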
The Socket System Call: Procedures and Caveats
The single tool that creates your effective message receiver and starts the whole process of sending and receiving messages from other computers is the socket() system call. This call is the common interface between all protocols available on a Linux/UNIX operating system. Just like the system call open() creates a file descriptor to access files and devices on your system, socket() creates a descriptor to access computers on your network. It requires information that determines what layers you want to access. The syntax is as follows:
#include <sys/socket.h>
#include <resolv.h>
int socket(int domain, int type, int protocol);
The socket() system call accepts several different values. For a complete list, see Appendix A, "Data Tables." For now, you'll find a few in Table 1.1.
Table 1.1. Selected socket() System Call Parameter Values
Parameter / Value / Description
domain / PF_INET / Internet IPv4 protocols; TCP/IP stack.
domain / PF_LOCAL / BSD-style locally named pipes. Typically used in the system logger or a print queue.
domain / PF_IPX / Novell protocols.
domain / PF_INET6 / Internet IPv6 protocols; TCP/IP stack.
type / SOCK_STREAM / Reliable, sequential data flow (byte stream) [Transmission Control Protocol (TCP)].
type / SOCK_RDM / Reliable, packetized data (not yet implemented in most operating systems).
type / SOCK_DGRAM / Unreliable, packetized data (datagram) [User Datagram Protocol (UDP)].
type / SOCK_RAW / Unreliable, low-level packetized data.
protocol / (integer) / This is a 32-bit integer in network byte order (see the section on network byte-ordering in Chapter 2). Most connection types support only protocol = 0 (zero). SOCK_RAW requires specifying a protocol value between 0 and 255.
For now, the only parameters the example uses are domain=PF_INET, type=SOCK_STREAM, and protocol=0 (zero).
NOTE
This book uses the PF_* (protocol family) domains in the socket call, because PF_* domain constants are the proper form. However, most programs use the AF_* (address family) constants interchangeably. Be careful not to get confused when you see source code that uses the AF style. (The C-header files define the AF_* constants as PF_*.) If you want, using AF_* works just as well. However, this may cause incompatibilities in the future.
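If you want to confirm that the two constant families agree on your own system, a small check such as the following can help. It is a minimal sketch; the function name af_matches_pf is hypothetical, and it assumes the headers define both the AF_LOCAL and PF_LOCAL names, as Linux and the BSDs do.

```c
#include <sys/socket.h>  /* AF_* and PF_* constants */

/* Return 1 if the address-family and protocol-family constants have the
 * same values for the domains mentioned in this chapter, 0 otherwise. */
int af_matches_pf(void)
{
    return AF_INET == PF_INET
        && AF_INET6 == PF_INET6
        && AF_LOCAL == PF_LOCAL;
}
```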
An example for a streaming TCP/IP call looks like this:
int sd;
sd = socket(PF_INET, SOCK_STREAM, 0);
The sd is the socket descriptor. It functions the same way as the file descriptor fd:
int fd;
fd = open(...);
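To illustrate how closely a socket descriptor resembles a file descriptor, the following sketch passes data through a pair of connected local sockets using only write(), read(), and close(). It relies on socketpair(), a standard call the chapter has not introduced, purely so the example can run without a network; the function name echo_through_socketpair is hypothetical.

```c
#include <string.h>      /* strlen() */
#include <unistd.h>      /* write(), read(), close() */
#include <sys/socket.h>  /* socketpair(), PF_LOCAL, SOCK_STREAM */

/* Write text into one end of a connected local socket pair and read it
 * back from the other end, using ordinary file-descriptor calls.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t echo_through_socketpair(const char *text, char *buf, size_t size)
{
    int sv[2];
    size_t len = strlen(text);
    ssize_t n = -1;

    if (len == 0)
        return 0;                 /* nothing to send; avoid blocking */
    if (socketpair(PF_LOCAL, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The same calls used on files work on socket descriptors. */
    if (write(sv[0], text, len) == (ssize_t)len)
        n = read(sv[1], buf, size);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```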
The socket() call returns a negative value (-1) when an error occurs and places the error code in errno (the standard global variable for library errors). Here are some common errors you can get:
- EPROTONOSUPPORT The protocol type or the specified protocol is not supported within this domain. This occurs when the domain does not support the requested protocol. Except for SOCK_RAW, most domain types support only a protocol of zero.
- EACCES Permission to create a socket of the specified type or protocol is denied. Your program may not have adequate privileges to create a socket. SOCK_RAW and PF_PACKET require root privileges.
- EINVAL Unknown protocol, or protocol family not available. This occurs when a value in either the domain or type field is invalid. For a complete list of valid values, refer to Appendix A.
Of course, you need to know the important header files to include. For Linux, these are