“The HTTP protocol is used to send HTML documents through the Internet. The HTTP protocol sends the HTML documents in packets, using TCP/IP. With each packet, the HTTP protocol attaches a header, which contains information such as the name and location of the page being requested, the name and IP address of the remote server that contains the Web page, the IP address of the local client, the HTTP version number, and the URL of the referring page. This information is referred to as the server variables. Internet programmers are able to retrieve the values in the header.

It is important to know that HTTP version 1.0 is a stateless protocol. This means that when a client requests a document from the Web server, the server will return the Web page to the client and end all communications with the client. If the client requests another page, the Web server normally has no way of knowing that the client has previously visited the Web site. However, by using methods such as cookies, session variables, text files and databases the server can maintain state – that is, recognize the client over multiple transactions – and thereby remember information from each transaction and link it with the specific client.
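The cookie approach mentioned above can be sketched as follows. This is a minimal illustration, not a production session mechanism: the `sessions` dictionary, the `session_id` cookie name, and the `handle_request` function are hypothetical names introduced here, and a real server would also need expiry and security handling.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical in-memory session store: session id -> data kept for that client.
sessions = {}

def handle_request(cookie_header):
    """Recognize a returning client from its Cookie header.

    Returns (session_id, visit_count, set_cookie), where set_cookie is the
    value for a Set-Cookie response header, or None if the client was
    already known.
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is not None and morsel.value in sessions:
        session_id = morsel.value          # returning client
        set_cookie = None
    else:
        session_id = uuid.uuid4().hex      # new client: issue an identifier
        sessions[session_id] = {"visits": 0}
        out = SimpleCookie()
        out["session_id"] = session_id
        set_cookie = out["session_id"].OutputString()
    sessions[session_id]["visits"] += 1
    return session_id, sessions[session_id]["visits"], set_cookie
```

The protocol itself stays stateless; the client simply presents the identifier again on each request, and the server uses it to look up what it remembered.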

“HTTP 1.0 is documented in the informational RFC 1945; it is not an official Internet standard because it was primarily developed outside the IETF by early browser and server vendors. HTTP 1.1 is a proposed standard being developed by the W3C and the HTTP working group of the IETF. It provides for much more flexible and powerful communication between the client and the server. It’s also a lot more scalable.

The primary improvement in HTTP 1.1 is state. HTTP 1.0 opens a new connection for every request. In practice, the time taken to open and close all the connections opened in a typical web session can outweigh the time taken to transmit the data, especially for sessions with many small documents. HTTP 1.1 allows a browser to send many different requests over a single connection; the connection remains open until it is explicitly closed. The requests and responses are all asynchronous. A browser doesn’t need to wait for a response to its first request before sending a second or a third. However, it remains tied to the basic pattern of a client request, followed by a server response that consists of a series of headers, followed by a blank line, followed by MIME-encoded data.
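The single-connection behavior described above can be observed with Python's standard library. The sketch below is only illustrative: it starts a throwaway local server (the `Handler` class and the `/a.html` and `/b.html` paths are made up for the example), sends two requests one after the other over one `HTTPConnection`, and records the client port the server sees for each. The requests are sequential, not overlapped.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # client source port seen by the server for each request

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = ("you asked for " + self.path).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def fetch_twice():
    """Send two GET requests over one TCP connection and return both bodies."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    bodies = []
    try:
        for path in ("/a.html", "/b.html"):
            conn.request("GET", path)
            bodies.append(conn.getresponse().read().decode("ascii"))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies
```

If both entries in `client_ports` are equal after calling `fetch_twice()`, both requests arrived over the same TCP connection.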

There are a lot of smaller improvements in HTTP 1.1:

  • Requests include a Host MIME header so that one web server can easily serve different sites at different URLs.
  • Servers and browsers can exchange compressed files and particular byte ranges of a document, both of which can decrease network traffic.
  • HTTP 1.1 is designed to work much better with proxy servers.
  • HTTP 1.1 is a strict superset of HTTP 1.0, so HTTP 1.1 web servers have no trouble interacting with older browsers that speak only HTTP 1.0.”

Java Network Programming, published by O’Reilly
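To illustrate the Host header from the list above, here is a rough sketch of how a server might pick a site for an incoming request. The `SITES` table, its host names, its directory paths, and the `document_root_for` function are all hypothetical; real servers configure this mapping in their own ways.

```python
# Hypothetical mapping from host name to the directory holding that site.
SITES = {
    "www.example.com": "/srv/example-com",
    "www.example.org": "/srv/example-org",
}

def document_root_for(request_text, sites=SITES):
    """Return the directory selected by the request's Host header, or None."""
    head = request_text.split("\r\n\r\n", 1)[0]
    header_lines = head.split("\r\n")[1:]       # skip the request line
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":      # header names are case-insensitive
            host = value.strip().split(":")[0].lower()  # drop an optional :port
            return sites.get(host)
    return None
```

Because the host name travels inside the request, two sites can share one IP address and one server process.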

In HTTP version 1.1, the Web server and the client can maintain their connection across Web pages. The NT Web server known as Internet Information Server can be configured to support this “keep alive” HTTP 1.1 feature.”

Internet Programming with VBScript and JavaScript by Kathleen Kalata

The TCP/IP protocols are used to establish connections between machines, but Berners-Lee also had to develop a set of procedures for identifying the page being requested and returning that page to the user. These procedures are called the Hypertext Transfer Protocol (HTTP), and this is the protocol whose name appears at the beginning of most URLs.

Simple example.

Imagine that you are browsing a Web page and have just clicked on a link to a page named faculty.html on a remote machine. The following sequence of events will take place to let you access that page:

  1. Your Web browser will determine the URL associated with the link and will extract the name of the machine to which it must connect.
  2. The browser will use the TCP/IP protocols to establish a connection across the Internet between your computer and that machine.
  3. When the connection between these two machines has been established, your browser will send a special HTTP message called GET, which indicates that it wants the destination machine to retrieve a page. The GET command contains the name of the desired page, in this case “faculty.html.”
  4. The remote machine locates the file name in the GET message, reads it, copies it, and returns the copy to your browser, again using TCP/IP and the Internet.
  5. Your browser receives the page and displays its contents on your screen.
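Steps 1 and 3 of the sequence above can be sketched in a few lines. The URL used in the usage note is a made-up placeholder, since the notes do not preserve the original one, and `plan_fetch` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def plan_fetch(url):
    """Work out where to connect and what to ask for, given a link's URL.

    Returns (host, port, request_line).
    """
    parts = urlsplit(url)
    host = parts.hostname          # step 1: the machine to connect to
    port = parts.port or 80        # HTTP uses port 80 unless the URL says otherwise
    path = parts.path or "/"       # the page named in the GET message
    return host, port, f"GET {path} HTTP/1.0"
```

For example, `plan_fetch("http://www.example.edu/faculty.html")` yields the host `www.example.edu`, port 80, and the request line `GET /faculty.html HTTP/1.0`.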

[Figure: (a) Using TCP/IP to establish a connection to the destination machine. (b) Sending an HTTP GET message to the destination to fetch the desired page, faculty.html. (c) Returning a copy of the page to the requesting node and displaying it using the Web browser.]

An Invitation to Computer Science Second Edition by G. Michael Schneider & Judith L. Gersting

HTTP, the Hypertext Transfer Protocol, is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and server establish a connection, how the client requests data from the server, how the server responds to that request, and finally how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer.

HTTP 1.0 is the currently accepted version of the protocol. It uses MIME to encode data. The basic protocol defines a sequence of four steps for each request from a client to the server:

  1. Making the connection. The client establishes a TCP connection to the server, on port 80 by default; other ports may be specified in the URL.
  2. Making a request. The client sends a message to the server requesting the page at a specified URL. The format of this request is typically something like:

GET /index.html HTTP/1.0

GET is a keyword. /index.html is a relative URL to a file on the server. The file is assumed to be on the machine that receives the request, so there is no need to prefix it with the protocol and host name. HTTP/1.0 is the version of the protocol that the client understands. The request is terminated with two carriage return/linefeed pairs (\r\n\r\n in Java parlance) regardless of how lines are terminated on the client or server platform.

Although the GET line is all that is required, a client request can include other information as well. This takes the following form:

Keyword: Value

The most common such keyword is Accept, which tells the server what kinds of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME types, corresponding to HTML documents, plain text, and JPEG and GIF images.

Accept: text/html, text/plain, image/gif, image/jpeg

User-Agent is another common keyword that lets the server know what browser is being used. This allows the server to send files optimized for the particular browser type. The line below says that the request comes from Version 2.4 of the Lynx browser:

User-Agent: Lynx/2.4 libwww/2.1.4

Finally the request is terminated with a blank line; that is, two carriage return/linefeed pairs, \r\n\r\n. A complete request might look like:

GET /index.html HTTP/1.0
Accept: text/html
Accept: text/plain
User-Agent: Lynx/2.4 libwww/2.1.4
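As a rough sketch, the complete request above can be assembled into the exact bytes sent over the connection. `build_request` is a hypothetical helper, not part of any library. Note that on the wire the version is written with a slash, as HTTP/1.0, per RFC 1945.

```python
def build_request(method, path, headers, version="HTTP/1.0"):
    """Build the bytes of an HTTP request.

    headers is a list of (keyword, value) pairs; a list is used so the same
    keyword, such as Accept, may appear more than once.
    """
    lines = [f"{method} {path} {version}"]
    lines += [f"{keyword}: {value}" for keyword, value in headers]
    # Each line ends in \r\n; one extra \r\n forms the terminating blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request(
    "GET",
    "/index.html",
    [
        ("Accept", "text/html"),
        ("Accept", "text/plain"),
        ("User-Agent", "Lynx/2.4 libwww/2.1.4"),
    ],
)
```

The result ends in `\r\n\r\n` whatever line terminator the local platform uses, because the terminators are written explicitly.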

In addition to GET, there are several other request types. HEAD retrieves only the header for the file, not the actual data. This is commonly used to check the modification date of a file, to see whether a copy stored in the local cache is still valid. POST sends form data to the server, and PUT uploads a file to the server.

  3. The response. The server sends a response to the client. The response begins with a response code, followed by MIME header information, then a blank line, then the requested document or an error message. Assuming the requested file is found, a typical response looks like this:

HTTP/1.0 200 OK
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Content-length: 107

<html>
<Head>
<Title>
A Sample HTML file
</Title>
</Head>
<body>
The rest of the document goes here
</body>
</html>

The first line indicates the protocol the server is using (HTTP 1.0), followed by a response code. 200 OK is the most common response code, indicating that the request was successful.

2xx Successful ----- Response codes from 200-299 indicate that the request was received, understood, and accepted.

3xx Redirection ----- Response codes from 300-399 indicate that the web browser needs to go to a different page.

4xx Client Error ----- Response codes from 400-499 indicate that the client has erred in some fashion, though this may as easily be the result of an unreliable network connection as it is of a buggy or nonconforming web browser. The browser should stop sending data to the server as soon as it receives a 4xx response. Unless it is responding to a HEAD request, the server should explain the error status in the body of its response.

5xx Server Error ----- Response codes from 500-599 indicate that something has gone wrong with the server, and the server cannot fix the problem.
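The four ranges listed above depend only on the first digit of the code, which a short sketch makes explicit. `status_class` is a hypothetical helper; codes outside the four ranges described here are reported as "Unknown".

```python
def status_class(code):
    """Map a numeric response code to the class names used in these notes."""
    classes = {
        2: "Successful",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    return classes.get(code // 100, "Unknown")
```

For instance, `status_class(200)` gives "Successful" and `status_class(404)` gives "Client Error".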

The other header lines identify the server software (the NCSA server, version 1.4.2), the version of MIME in use, the MIME content type, and the length of the document delivered (not counting this header) -- in this case, 107 bytes.

  4. Closing the connection. Either the client or the server or both close the connection. Thus, a separate network connection is used for each request. If the client reconnects, the server retains no memory of the previous connection or its results. A protocol that retains no memory of past requests is called stateless; in contrast, a stateful protocol such as FTP can process many requests before the connection is closed. The lack of state is both a strength and a weakness of HTTP.
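The four steps can be walked through end to end with a raw socket. This sketch avoids any real web site by starting a throwaway local server from Python's standard library; the `OnePage` handler, its page content, and `one_exchange` are invented for the illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class OnePage(BaseHTTPRequestHandler):
    # The handler's default protocol_version is HTTP/1.0, so the server
    # closes the connection after each response.
    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def one_exchange():
    """Perform one request/response cycle and return the raw response bytes."""
    server = HTTPServer(("127.0.0.1", 0), OnePage)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Step 1: make the connection.
        with socket.create_connection(server.server_address) as sock:
            # Step 2: make the request, ending with a blank line.
            sock.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")
            # Step 3: read the response until the server stops sending.
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break   # Step 4: the server closed the connection.
                chunks.append(data)
        return b"".join(chunks)
    finally:
        server.shutdown()
        server.server_close()
```

Reading until `recv` returns no data works here only because an HTTP 1.0 server closes the connection when the response is complete.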

MIME stands for Multipurpose Internet Mail Extensions. It’s an open standard for sending multipart, multimedia data through Internet email. The data may be binary, or it may use multiple ASCII and non-ASCII character sets. Although MIME was originally intended for email, it has become a widely used technique to describe a file’s contents so that client software can tell the difference between different kinds of data.

For example, a web browser uses MIME to tell whether a file is a GIF image or a printable PostScript file.

MIME supports almost a hundred predefined types of content. Content types are classified at two levels: a type and a subtype. The type shows very generally what kind of data is contained: is it a picture, is it text, is it a movie? The subtype identifies the specific type of data: GIF image, JPEG image, TIFF image.

For example, HTML’s content type is text/html; the type is text, and the subtype is html. The content type for a GIF image is image/gif; the type is image, and the subtype is gif.
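The two-level classification is easy to pull apart in code. In this sketch `split_content_type` and `content_type_for` are hypothetical helpers; the second relies on Python's standard `mimetypes` module, which guesses a content type from a file name's extension.

```python
import mimetypes

def split_content_type(content_type):
    """Split a content type such as 'text/html' into (type, subtype).

    Any parameters after a semicolon, such as a charset, are ignored.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    type_, _, subtype = media_type.partition("/")
    return type_, subtype

def content_type_for(filename):
    """Guess a file's content type from its extension, or None if unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed
```

For example, `split_content_type("image/gif")` returns the pair `("image", "gif")`.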

The data returned by an HTTP 1.0 or 1.1 web server is sent in MIME format. Most web servers and clients understand at least two MIME text content types, text/html and text/plain, and two image formats, image/gif and image/jpeg. The web uses MIME for posting forms to web servers, a common way for an applet to communicate with a server. Java relies on MIME types to pick the appropriate content handler for a particular stream of data.

“The HTTP protocol is based on a request/response paradigm. A client establishes a connection with a server and sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible body content.

Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection (v) between the user agent (UA) and the origin server (O).

request chain ------>

UA ------v------O

<------response chain

A more complicated situation occurs when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or parts of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

request chain ------>

UA -----v----- A -----v----- B -----v----- C -----v----- O

<------response chain

The figure above shows three intermediaries (A, B, and C) between the user agent and origin server. A request or response message that travels the whole chain must pass through four separate connections. This distinction is important because some HTTP communication options may apply only to the connection with the nearest, non-tunnel neighbor, only to the end-points of the chain, or to all connections along the chain. Although the diagram is linear, each participant may be engaged in multiple, simultaneous communications. For example, B may be receiving requests from many clients other than A, and/or forwarding requests to servers other than C, at the same time that it is handling A's request.

Any party to the communication which is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request. The following illustrates the resulting chain if B has a cached copy of an earlier response from O (via C) for a request which has not been cached by UA or A.

request chain ------>

UA -----v----- A -----v----- B - - - - - - C - - - - - - O

<------response chain

Not all responses are cachable, and some requests may contain modifiers which place special requirements on cache behavior. Some HTTP/1.0 applications use heuristics to describe what is or is not a "cachable" response, but these rules are not standardized.
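The shortened chain can be imitated with a toy model. This is only a sketch under simplifying assumptions: every node caches every response it relays (real caches are selective, as noted above), and the `Node` class, the node names, and the `/faculty.html` URI are invented for the illustration.

```python
class Node:
    """A participant in the request chain: user agent, intermediary, or origin."""

    def __init__(self, name, upstream=None, content=None):
        self.name = name
        self.upstream = upstream        # next hop toward the origin; None at the origin
        self.content = content or {}    # origin server only: URI -> body
        self.cache = {}                 # responses this node has relayed before

    def fetch(self, uri, path=None):
        """Return (body, path), where path lists the nodes the request visited."""
        path = [] if path is None else path
        path.append(self.name)
        if uri in self.cache:
            return self.cache[uri], path        # answered from this node's cache
        if self.upstream is None:
            return self.content[uri], path      # this node is the origin server
        body, path = self.upstream.fetch(uri, path)
        self.cache[uri] = body                  # simplification: cache everything
        return body, path

# Build the chain UA - A - B - C - O.
origin = Node("O", content={"/faculty.html": "<html>...</html>"})
c = Node("C", upstream=origin)
b = Node("B", upstream=c)
a = Node("A", upstream=b)
ua = Node("UA", upstream=a)

# Another client of B fetches the page first, so B ends up holding a copy.
other = Node("other", upstream=b)
_, first_path = other.fetch("/faculty.html")

# The later request from UA stops at B and never reaches C or O.
_, second_path = ua.fetch("/faculty.html")
```

The first request travels all the way to O, while the second visits only UA, A, and B, which matches the shortened chain in the diagram.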

On the Internet, HTTP communication generally takes place over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used, and the mapping of the HTTP/1.0 request and response structures onto the transport data units of the protocol in question is outside the scope of this specification.

Except for experimental applications, current practice requires that the connection be established by the client prior to each request and closed by the server after sending the response. Both clients and servers should be aware that either party may close the connection prematurely, due to user action, automated time-out, or program failure, and should handle such closing in a predictable fashion. In any case, the closing of the connection by either or both parties always terminates the current request, regardless of its status.”