In Introduction To HTTP Basics
Introduction
The WEB
The Internet (or the Web) is a massive distributed client/server information system, as depicted in the following diagram.
Many applications run concurrently over the Web, such as web browsing/surfing, email, file transfer, audio and video streaming, and so on. For proper communication to take place between the client and the server, these applications must agree on a specific application-level protocol such as HTTP, FTP, SMTP, or POP.
HTTP is a stateless protocol. In other words, the current request does not know what has been done in previous requests.
HTTP permits negotiation of data type and representation, so that systems can be built independently of the data being transferred.
Quoting from RFC 2616: "The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers."
Browser
Whenever you issue a URL from your browser to get a web resource using HTTP, e.g. http://www.test101.com/index.html, the browser turns the URL
into a request message and sends it to the HTTP server. The HTTP server interprets the request message, and returns you an appropriate response message,
https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html
11/13/2015
which is either the resource you requested or an error message. This process is illustrated below:
HTTP Protocol
As mentioned, whenever you enter a URL in the address box of the browser, the browser translates the URL into a request message according to the
specified protocol; and sends the request message to the server.
For example, the browser translates the URL http://www.test101.com/docs/index.html into the following request message:
GET /docs/index.html HTTP/1.1
Host: www.test101.com
Accept: image/gif, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
(blank line)
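The translation from URL to request message can be sketched in a few lines of Java. This is only an illustration: the class name RequestBuilder and the method buildGet are hypothetical, and only the request line and the Host header are produced (a real browser adds many more headers).

```java
import java.net.URI;

// Hypothetical helper (not part of the original notes): turns a URL into a
// minimal HTTP/1.1 GET request message.
class RequestBuilder {

    static String buildGet(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";   // a URL without a path refers to the document root
        }
        // Each line ends with CRLF; an empty line terminates the header.
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + uri.getHost() + "\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(buildGet("http://www.test101.com/docs/index.html"));
    }
}
```

The path part of the URL becomes the request-URI, and the host name moves into the Host header.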
When this request message reaches the server, the server can take one of these actions:
1. The server interprets the request received, maps the request into a file under the server's document directory, and returns the file requested to the client.
2. The server interprets the request received, maps the request into a program kept in the server, executes the program, and returns the output of the program to the client.
3. If the request cannot be satisfied, the server returns an error message.
An example of the HTTP response message is as shown:
HTTP/1.1 200 OK
Date: Sun, 18 Oct 2009 08:56:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Sat, 20 Nov 2004 07:16:26 GMT
ETag: "10000000565a52c3e94b66c2e680"
Accept-Ranges: bytes
Content-Length: 44
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug

<html><body><h1>It works!</h1></body></html>
The browser receives the response message, interprets it, and displays the contents of the message in the browser's window according to the media type of the response (as given in the Content-Type response header). Common media types include "text/plain", "text/html", "image/gif", "image/jpeg", "audio/mpeg", "video/mpeg", "application/msword", and "application/pdf".
In its idle state, an HTTP server does nothing but listen on the IP addresses and ports specified in the configuration for incoming requests. When a request arrives, the server analyzes the message header, applies the rules specified in the configuration, and takes the appropriate action. The webmaster's main control over the actions of the web server is via the configuration, which will be dealt with in greater detail in the later sections.
TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of transport and network-layer protocols for machines to communicate with each other over the network.
IP (Internet Protocol) is a network-layer protocol that deals with network addressing and routing. In an IP network, each machine is assigned a unique IP address (e.g., 165.1.2.3), and the IP software is responsible for routing a message from the source IP to the destination IP. In IPv4 (IP version 4), the IP address consists of 4 bytes, each ranging from 0 to 255, separated by dots, which is called the quad-dotted form. This numbering scheme supports up to 4G addresses on the network. The newer IPv6 (IP version 6) supports more addresses. Since memorizing numbers is difficult for most people, an English-like domain name, such as www.test101.com, is used instead. The DNS (Domain Name System) translates the domain name into the IP address via distributed lookup tables. The special IP address 127.0.0.1 always refers to your own machine. Its domain name is "localhost", and it can be used for local loopback testing.
TCP (Transmission Control Protocol) is a transport-layer protocol, responsible for establishing a connection between two machines. The transport layer offers two protocols: TCP and UDP (User Datagram Protocol). TCP is reliable: each packet has a sequence number, and an acknowledgement is expected. A packet will be re-transmitted if it is not received by the receiver, so packet delivery is guaranteed in TCP. UDP does not guarantee packet delivery, and is therefore not reliable. However, UDP has less network overhead and can be used for applications such as video and audio streaming, where reliability is not critical.
TCP multiplexes applications within an IP machine. For each IP machine, TCP supports up to 65536 ports (or sockets), from port number 0 to 65535. An application, such as HTTP or FTP, runs (or listens) at a particular port number for incoming requests. Ports 0 to 1023 are pre-assigned to popular protocols, e.g., HTTP at 80, FTP at 21, Telnet at 23, SMTP at 25, NNTP at 119, and DNS at 53. Ports 1024 and above are available to the users.
Although TCP port 80 is pre-assigned to HTTP as the default HTTP port number, this does not prohibit you from running an HTTP server at another user-assigned port number (1024-65535) such as 8000 or 8080, especially for a test server. You could also run multiple HTTP servers on the same machine on different port numbers. When a client issues a URL without explicitly stating the port number, e.g., http://www.test101.com/docs/index.html, the browser will connect to the default port number 80 of the host www.test101.com. You need to explicitly specify the port number in the URL, e.g., http://www.test101.com:8000/docs/index.html, if the server is listening at port 8000 and not the default port 80.
In brief, to communicate over TCP/IP, you need to know (a) the IP address or hostname, and (b) the port number.
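As a small illustration of these two pieces of information, the following Java sketch extracts the hostname and port from an HTTP URL, falling back to port 80 when the URL does not state one. The class UrlTarget and its method names are hypothetical, chosen only for this example.

```java
import java.net.URI;

// Hypothetical helper (not part of the original notes): finds the host and
// port that a client must connect to for a given HTTP URL.
class UrlTarget {
    static final int DEFAULT_HTTP_PORT = 80;

    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    static int portOf(String url) {
        int port = URI.create(url).getPort();   // -1 when no port is given in the URL
        return (port == -1) ? DEFAULT_HTTP_PORT : port;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.test101.com/docs/index.html",
            "http://www.test101.com:8000/docs/index.html"
        };
        for (String url : urls) {
            System.out.println(hostOf(url) + " -> port " + portOf(url));
        }
    }
}
```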
HTTP Specifications
The HTTP specification is maintained by W3C (World Wide Web Consortium) and available at http://www.w3.org/standards/techs/http. There are currently two versions of HTTP, namely, HTTP/1.0 and HTTP/1.1. The original version, HTTP/0.9 (1991), written by Tim Berners-Lee, is a simple protocol for transferring raw data across the Internet. HTTP/1.0 (1996), defined in RFC 1945, improved the protocol by allowing MIME-like messages. HTTP/1.0 does not address the issues of proxies, caching, persistent connections, virtual hosts, and range download. These features were provided in HTTP/1.1 (1999), defined in RFC 2616.
Request Line
The first line of the header is called the request line, followed by optional request headers.
The request line has the following syntax:
request-method-name request-URI HTTP-version
request-method-name: The HTTP protocol defines a set of request methods, e.g., GET, POST, HEAD, and OPTIONS. The client can use one of these methods to send a request to the server.
request-URI: specifies the resource requested.
HTTP-version: Two versions are currently in use: HTTP/1.0 and HTTP/1.1.
Examples of request lines are:
GET /test.html HTTP/1.1
HEAD /query.html HTTP/1.0
POST /index.html HTTP/1.1
Request Headers
The request headers are in the form of name:value pairs. Multiple values, separated by commas, can be specified.
request-header-name: request-header-value1, request-header-value2, ...
Example
The following shows a sample HTTP request message:
Status Line
The first line is called the status line, followed by optional response headers.
The status line has the following syntax:
HTTP-version status-code reason-phrase
HTTP-version: The HTTP version used in this session, either HTTP/1.0 or HTTP/1.1.
status-code: a 3-digit number generated by the server to reflect the outcome of the request.
reason-phrase: gives a short explanation of the status code.
Common status codes and reason phrases are "200 OK", "404 Not Found", "403 Forbidden", and "500 Internal Server Error".
Examples of status lines are:
HTTP/1.1 200 OK
HTTP/1.0 404 Not Found
HTTP/1.1 403 Forbidden
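A client has to split the status line into its three parts before it can act on the response. The following Java sketch shows one way to do so; the class StatusLine is hypothetical, and it assumes the parts are separated by single spaces. Because the reason phrase may itself contain spaces, the line is split into at most three pieces.

```java
// Hypothetical helper (not part of the original notes): parses an HTTP
// status line of the form "HTTP-version status-code reason-phrase".
class StatusLine {
    final String version;
    final int code;
    final String reason;

    StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    static StatusLine parse(String line) {
        // Limit of 3 keeps a multi-word reason phrase ("Not Found") in one piece.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = (parts.length == 3) ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }

    public static void main(String[] args) {
        StatusLine s = parse("HTTP/1.0 404 Not Found");
        System.out.println(s.version + " | " + s.code + " | " + s.reason);
    }
}
```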
Response Headers
The response headers are in the form of name:value pairs:
response-header-name: response-header-value1, response-header-value2, ...
Example
The following shows a sample response message:
Telnet
"Telnet" is a very useful networking utility. You can use telnet to establish a TCP connection with a server and issue raw HTTP requests. For example, suppose that you have started your HTTP server on the localhost (IP address 127.0.0.1) at port 8000:
> telnet
telnet> help
... telnet help menu ...
telnet> open 127.0.0.1 8000
Connecting To 127.0.0.1...
GET /index.html HTTP/1.0
(Hit enter twice to send the terminating blank line ...)
... HTTP response message ...
Telnet is a character-based protocol. Each character you enter on the telnet client will be sent to the server immediately. Therefore, you cannot make typing errors when entering your raw command, as delete and backspace will be sent to the server. You may have to enable the "local echo" option to see the characters you enter. Check the telnet manual (search Windows' help) for details on using telnet.
Network Program
You could also write your own network program to issue a raw HTTP request to an HTTP server. Your network program must first establish a TCP/IP connection with the server. Once the TCP connection is established, you can issue the raw request.
An example of a network program written in Java is shown below (assuming that the HTTP server is running on the localhost, IP address 127.0.0.1, at port 8000):
import java.io.*;
import java.net.*;

public class HttpClient {
   public static void main(String[] args) throws IOException {
      // The host and port to be connected.
      String host = "127.0.0.1";
      int port = 8000;
      // Create a TCP socket and connect to the host:port, then create the
      // input and output streams for the network socket.
      // try-with-resources closes the streams and the socket on exit.
      try (Socket socket = new Socket(host, port);
           BufferedReader in = new BufferedReader(
                 new InputStreamReader(socket.getInputStream()));
           PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
         // Send request to the HTTP server. HTTP lines end with CRLF, so
         // write "\r\n" explicitly instead of relying on println().
         out.print("GET /index.html HTTP/1.0\r\n");
         out.print("\r\n");   // blank line separating header & body
         out.flush();
         // Read the response and display on console.
         String line;
         // readLine() returns null if the server closes the network socket.
         while ((line = in.readLine()) != null) {
            System.out.println(line);
         }
      }
   }
}
<html><body><h1>It works!</h1></body></html>
Connection to host lost.
In this example, the client issues a GET request to ask for a document named "/index.html", and negotiates to use the HTTP/1.0 protocol. A blank line is needed after the request header. This request message does not contain a body.
The server receives the request message, interprets it, and maps the request-URI to a document under its document directory. If the requested document is available, the server returns the document with a response status code "200 OK". The response headers provide the necessary description of the document returned, such as the last-modified date (Last-Modified), the MIME type (Content-Type), and the length of the document (Content-Length). The response body contains the requested document. The browser will format and display the document according to its media type (e.g., plain text, HTML, JPEG, GIF) and other information obtained from the response headers.
Notes:
The request method name "GET" is case sensitive, and must be in uppercase.
If the request method name is incorrectly spelt, the server returns an error message "501 Method Not Implemented".
If the request method name is not allowed, the server returns an error message "405 Method Not Allowed". E.g., DELETE is a valid method name, but may not be allowed (or implemented) by the server.
If the request-URI does not exist, the server returns an error message "404 Not Found". You have to issue a proper request-URI, beginning from the document root "/". Otherwise, the server returns an error message "400 Bad Request".
If the HTTP-version is missing or incorrect, the server returns an error message "400 Bad Request".
In HTTP/1.0, by default, the server closes the TCP connection after the response is delivered. If you use telnet to connect to the server, the message "Connection to host lost" appears immediately after the response body is received. You could use an optional request header "Connection: Keep-Alive" to request a persistent (or keep-alive) connection, so that another request can be sent through the same TCP connection to achieve better network efficiency. On the other hand, HTTP/1.1 uses keep-alive connections by default.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>501 Method Not Implemented</title>
</head><body>
<h1>Method Not Implemented</h1>
<p>get to /index.html not supported.<br />
</p>
</body></html>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /t.html was not found on this server.</p>
</body></html>
Note: The latest Apache 2.2.14 ignores this error and returns the document with status code "200 OK".
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
HTTP/1.1 200 OK
Date: Sun, 18 Oct 2009 10:47:06 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Sat, 20 Nov 2004 07:16:26 GMT
ETag: "10000000565a52c3e94b66c2e680"
Accept-Ranges: bytes
Content-Length: 44
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

<html><body><h1>It works!</h1></body></html>
Notes:
The message "Connection to host lost" for telnet appears after the keep-alive timeout.
Before the "Connection to host lost" message appears (i.e., before the keep-alive timeout), you can send another request through the same TCP connection.
The header "Connection: Keep-alive" is not case sensitive. The space is optional.
If an optional header is misspelled or invalid, it will be ignored by the server.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /forbidden/index.html
on this server.</p>
</body></html>
<html><body><h1>It works!</h1></body></html>
Request".
GET /index.html HTTP/1.1
(blank line)
HTTP/1.1 400 Bad Request
Date: Sun, 18 Oct 2009 12:13:46 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
Request Headers
This section describes some of the commonly-used request headers. Refer to the HTTP Specification for more details. Header names are written as initial-capitalized words joined by dashes (-), e.g., Content-Length, If-Modified-Since.
Host: domain-name HTTP/1.1 supports virtual hosts. Multiple DNS names (e.g., www.test101.com and www.test102.com) can reside on the same physical server, each with its own document root directory. The Host header is mandatory in HTTP/1.1 to select one of the hosts.
The following headers can be used for content negotiation by the client to ask the server to deliver the preferred type of the document (in terms of the media type, e.g., JPEG vs. GIF, or the language used, e.g., English vs. French) if the server maintains multiple versions of the same document.
Accept: mime-type-1, mime-type-2, ... The client can use the Accept header to tell the server the MIME types it can handle and prefers. If the server has multiple versions of the document requested (e.g., an image in GIF and PNG, or a document in TXT and PDF), it can check this header to decide which version to deliver to the client. E.g., PNG is more advanced than GIF, but not all browsers support PNG. This process is called content-type negotiation.
Accept-Language: language-1, language-2, ... The client can use the Accept-Language header to tell the server which languages it can handle or prefers. If the server has multiple versions of the requested document (e.g., in English, Chinese, French), it can check this header to decide which version to return. This process is called language negotiation.
Accept-Charset: Charset-1, Charset-2, ... For character set negotiation, the client can use this header to tell the server which character sets it can handle or prefers. Examples of character sets are ISO-8859-1, ISO-8859-2, ISO-8859-5, BIG5, UCS2, UCS4, UTF-8.
Accept-Encoding: encoding-method-1, encoding-method-2, ... The client can use this header to tell the server the types of encoding it supports. If the server has an encoded (or compressed) version of the document requested, it can return an encoded version supported by the client. The server can also choose to encode the document before returning it to the client to reduce the transmission time. The server must set the response header "Content-Encoding" to inform the client that the returned document is encoded. The common encoding methods are "x-gzip (.gz, .tgz)" and "x-compress (.Z)".
Connection: Close|Keep-Alive The client can use this header to tell the server whether to close the connection after this request, or to keep the connection alive for another request. HTTP/1.1 uses persistent (keep-alive) connections by default. HTTP/1.0 closes the connection by default.
Referer: referer-URL The client can use this header to indicate the referrer of this request. If you click a link from web page 1 to visit web page 2, web page 1 is the referrer for the request to web page 2. All major browsers set this header, which can be used to track where the request comes from (for web advertising or content customization). Nonetheless, this header is not reliable and can be easily spoofed. Note that "Referrer" is misspelled as "Referer" in the header name; unfortunately, you have to follow this spelling too.
User-Agent: browser-type Identifies the type of browser used to make the request. The server can use this information to return different documents depending on the type of browser.
Content-Length: number-of-bytes Used by a POST request to inform the server of the length of the request body.
Content-Type: mime-type Used by a POST request to inform the server of the media type of the request body.
Cache-Control: no-cache|... The client can use this header to specify how the pages are to be cached by a proxy server. "no-cache" requires the proxy to obtain a fresh copy from the original server, even though a locally cached copy is available. An HTTP/1.0 server does not recognize "Cache-Control: no-cache". Instead, it uses "Pragma: no-cache". Include both request headers if you are not sure about the server's version.
Authorization: Used by the client to supply its credentials (username/password) to access protected resources. This header will be described in a later chapter on authentication.
Cookie: cookie-name-1=cookie-value-1; cookie-name-2=cookie-value-2; ... The client uses this header to return cookies to the server, which were set by this server earlier for state management. This header will be discussed in a later chapter on state management.
If-Modified-Since: date Tells the server to send the page only if it has been modified after the specified date.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://127.0.0.1:8000/testdir/">here</a>.</p>
</body></html>
Most browsers will follow up with another request to "/testdir/". For example, if you issue http://127.0.0.1:8000/testdir without the trailing "/" from a browser, you will notice that a trailing "/" is added to the address after the response is received. The moral of the story is: you should include the "/" in a directory request to save yourself an additional GET request.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved
<A HREF="http://www.amazon.com:80/exec/obidos/subst/home/home.html">
here</A>.<P>
</BODY></HTML>
Example
HEAD /index.html HTTP/1.0
(blank line)
HTTP/1.1 200 OK
Date: Sun, 18 Oct 2009 14:09:16 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Sat, 20 Nov 2004 07:16:26 GMT
ETag: "10000000565a52c3e94b66c2e680"
Accept-Ranges: bytes
Content-Length: 44
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug
Notice that the response consists of the header only without the body, which contains the actual document.
"*" can be used in place of a requestURI to indicate that the request does not apply to any particular resource.
Example
For example, the following OPTIONS request is sent through a proxy server:
OPTIONS http://www.amazon.com/ HTTP/1.1
Host: www.amazon.com
Connection: Close
(blank line)
HTTP/1.1 200 OK
Date: Fri, 27 Feb 2004 09:42:46 GMT
Content-Length: 0
Connection: close
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix)
Allow: GET, HEAD, POST, OPTIONS, TRACE
Connection: close
Via: 1.1 xproxy (NetCache NetApp/5.3.1R4D5)
(blank line)
All servers that allow GET request will allow HEAD request. Sometimes, HEAD is not listed.
Example
The following example shows a TRACE request issued through a proxy server.
TRACE http://www.amazon.com/ HTTP/1.1
Host: www.amazon.com
Connection: Close
(blank line)
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Fri, 27 Feb 2004 09:44:21 GMT
Content-Type: message/http
Connection: close
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix)
Connection: close
Via: 1.1 xproxy (NetCache NetApp/5.3.1R4D5)

9d
TRACE / HTTP/1.1
Connection: keep-alive
Host: www.amazon.com
Via: 1.1 xproxy (NetCache NetApp/5.3.1R4D5)
X-Forwarded-For: 155.69.185.59, 155.69.5.234
Special characters are not allowed inside the query string. They must be replaced by a "%" followed by the ASCII code in hex. E.g., "~" is replaced by "%7E", "#" by "%23", and so on. Since blank is rather common, it can be replaced by either "%20" or "+" (a literal "+" character must then be replaced by "%2B"). This replacement process is called URL-encoding, and the result is a URL-encoded query string. For example, suppose that there are 3 fields inside a form, with name/value of "name=Peter Lee", "address=#123 Happy Ave" and "language=C++"; the URL-encoded query string is:
name=Peter+Lee&address=%23123+Happy+Ave&language=C%2B%2B
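The same encoding can be reproduced in Java with the standard java.net.URLEncoder class, which implements the form encoding described here (blank becomes "+", other special characters become %xx). The wrapper class QueryStringEncoder and its varargs method are hypothetical, introduced only for this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of the original notes): builds a URL-encoded
// query string from alternating field names and values.
class QueryStringEncoder {

    static String encode(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name/value pairs");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            if (sb.length() > 0) {
                sb.append('&');   // pairs are separated by '&'
            }
            sb.append(enc(namesAndValues[i])).append('=').append(enc(namesAndValues[i + 1]));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);   // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("name", "Peter Lee",
                                  "address", "#123 Happy Ave",
                                  "language", "C++"));
    }
}
```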
The query string can be sent to the server using either the HTTP GET or POST request method, which is specified in the <form>'s attribute "method".
<form method="get|post" action="url">
If the GET request method is used, the URL-encoded query string will be appended behind the request-URI after a "?" character, i.e.,
GET request-URI?query-string HTTP-version
(other optional request headers)
(blank line)
(optional request body)
Using a GET request to send the query string has the following drawbacks:
The amount of data you can append behind the request-URI is limited. If this amount exceeds a server-specific threshold, the server returns an error "414 Request URI too Large".
The URL-encoded query string appears in the address box of the browser.
The POST method overcomes these drawbacks. If the POST request method is used, the query string will be sent in the body of the request message, where the amount is not limited. The request headers Content-Type and Content-Length are used to notify the server of the type and the length of the query string. The query string will not appear in the browser's address box. The POST method will be discussed later.
Example
The following HTML form is used to gather the username and password in a login menu.
<html>
<head><title>Login</title></head>
<body>
<h2>LOGIN</h2>
<form method="get" action="/bin/login">
Username: <input type="text" name="user" size="25" /><br />
Password: <input type="password" name="pw" size="10" /><br /><br />
<input type="hidden" name="action" value="login" />
<input type="submit" value="SEND" />
</form>
</body>
</html>
The HTTP GET request method is used to send the query string. Suppose the user enters "Peter Lee" as the username and "123456" as the password, and clicks the submit button. The following GET request is generated:
GET /bin/login?user=Peter+Lee&pw=123456&action=login HTTP/1.1
Accept: image/gif, image/jpeg, */*
Referer: http://127.0.0.1:8000/login.html
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Host: 127.0.0.1:8000
Connection: Keep-Alive
Note that although the password that you enter does not show on the screen, it is shown clearly in the address box of the browser. You should never send your password without proper encryption.
http://127.0.0.1:8000/bin/login?user=Peter+Lee&pw=123456&action=login
Encoded URL
A URL cannot contain special characters, such as blank or '~'. Special characters are encoded in the form of %xx, where xx is the ASCII hex code. For example, '~' is encoded as %7e; '+' is encoded as %2b. A blank can be encoded as %20 or '+'. The URL after encoding is called the encoded URL.
The request parameters, in the form of name=value pairs, are separated from the URL by a '?'. The name=value pairs are separated by a '&'.
The #nameAnchor identifies a fragment within the HTML document, defined via the anchor tag <a name="anchorName">...</a>.
URL rewriting can be used for session management, e.g., "...;sessionID=xxxxxx".
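On the receiving side, the request parameters have to be split and decoded again. The following Java sketch does this for the part of the URL after the '?', using the standard java.net.URLDecoder class. The class QueryStringParser is hypothetical, and the sketch assumes each name appears only once.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the original notes): splits a URL-encoded
// query string into decoded name/value pairs, keeping their original order.
class QueryStringParser {

    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {        // pairs are separated by '&'
            int eq = pair.indexOf('=');
            String name = (eq >= 0) ? pair.substring(0, eq) : pair;
            String value = (eq >= 0) ? pair.substring(eq + 1) : "";
            params.put(dec(name), dec(value));
        }
        return params;
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");     // '+' -> blank, %xx -> character
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);       // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("user=Peter+Lee&pw=123456&action=login"));
    }
}
```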
(URL-encoded query string)
The request headers Content-Type and Content-Length are necessary in the POST request to inform the server of the media type and the length of the request body.
<h2>LOGIN</h2>
<form method="post" action="/bin/login">
Username: <input type="text" name="user" size="25" /><br />
Password: <input type="password" name="pw" size="10" /><br /><br />
<input type="hidden" name="action" value="login" />
<input type="submit" value="SEND" />
</form>
</body>
</html>
Suppose the user enters "Peter Lee" as username and "123456" as password, and clicks the submit button, the following POST request would be generated
by the browser:
POST /bin/login HTTP/1.1
Host: 127.0.0.1:8000
Accept: image/gif, image/jpeg, */*
Referer: http://127.0.0.1:8000/login.html
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Content-Length: 37
Connection: Keep-Alive
Cache-Control: no-cache

user=Peter+Lee&pw=123456&action=login
Note that the Content-Type header informs the server that the data is URL-encoded (with the special MIME type application/x-www-form-urlencoded), and the Content-Length header tells the server how many bytes to read from the message body.
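The relationship between the body and the Content-Length header can be checked with a short Java sketch that assembles a minimal POST request message. The class PostRequestBuilder is hypothetical, and only the headers needed to describe the body are included.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the original notes): assembles a minimal
// POST request carrying a URL-encoded query string in its body.
class PostRequestBuilder {

    static String build(String path, String host, String body) {
        // Content-Length counts the bytes of the body, not the characters.
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"                         // blank line separating header & body
             + body;
    }

    public static void main(String[] args) {
        System.out.println(build("/bin/login", "127.0.0.1:8000",
                                 "user=Peter+Lee&pw=123456&action=login"));
    }
}
```

For the login example, the body "user=Peter+Lee&pw=123456&action=login" is 37 bytes long, which matches the Content-Length value in the request.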
Example
The following HTML form can be used for file upload:
<html>
<head><title>File Upload</title></head>
<body>
<h2>Upload File</h2>
<form method="post" enctype="multipart/form-data" action="servlet/UploadServlet">
Who are you: <input type="text" name="username" /><br />
Choose the file to upload:
<input type="file" name="fileID" /><br />
<input type="submit" value="SEND" />
</form>
</body>
</html>
When the browser encounters an <input> tag with attribute type="file", it displays a text box and a "browse..." button to allow the user to choose the file to be uploaded.
When the user clicks the submit button, the browser sends the form data and the content of the selected file(s). The old encoding type "application/x-www-form-urlencoded" is inefficient for sending binary data and non-ASCII characters. A new media type "multipart/form-data" is used instead.
Each part identifies the input name within the original HTML form, and the content type if the media type is known, or application/octet-stream otherwise.
The original local file name could be supplied as a "filename" parameter, or in the "Content-Disposition: form-data" header.
7d41b838504d8
Content-Disposition: form-data; name="username"

Peter Lee
7d41b838504d8
Content-Disposition: form-data; name="fileID"; filename="C:\temp.html"
Content-Type: text/plain

<h1>Home page on main server</h1>
7d41b838504d8
Servlet 3.0 provides builtin support for processing file upload. Read "Uploading Files in Servlet 3.0".
Content Negotiation
As mentioned earlier, HTTP supports content negotiation between the client and the server. A client can use additional request headers (such as Accept, Accept-Language, Accept-Charset, Accept-Encoding) to tell the server what it can handle or which content it prefers. If the server possesses multiple versions of the same document in different formats, it will return the format that the client prefers. This process is called content negotiation.
Content-Type Negotiation
The server uses a MIME configuration file (called "conf\mime.types") to map file extensions to media types, so that it can ascertain the media type of a file by looking at its file extension. For example, the file extensions ".htm" and ".html" are associated with the MIME media type "text/html", and the file extensions ".jpg" and ".jpeg" are associated with "image/jpeg". When a file is returned to the client, the server has to include a Content-Type response header to inform the client of the media type of the data.
For content-type negotiation, suppose that the client requests a file called "logo" without specifying its type, and sends a header "Accept: image/gif, image/jpeg, ...". Suppose the server has 2 formats of the "logo", "logo.gif" and "logo.jpg", and the MIME configuration file has the following entries:
image/gif gif
image/jpeg jpeg jpg jpe
The server will return "logo.gif" to the client, based on the client's Accept header and the MIME type/file mapping. The server will include a "Content-type: image/gif" header in its response.
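The selection step can be sketched in Java as follows. This is a deliberately simplified illustration and not the algorithm used by any particular server: the class TypeNegotiator is hypothetical, quality values (q=...) in the Accept header are ignored, and the first media range listed by the client that matches an available variant wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the original notes): picks a file variant
// for a request, given the client's Accept header.
class TypeNegotiator {

    // variants maps a media type (e.g. "image/gif") to a file name (e.g. "logo.gif").
    // Returns the chosen file name, or null if nothing is acceptable.
    static String choose(String acceptHeader, Map<String, String> variants) {
        for (String item : acceptHeader.split(",")) {
            String range = item.trim();
            int semi = range.indexOf(';');
            if (semi >= 0) {
                range = range.substring(0, semi).trim();   // drop parameters such as q=0.8
            }
            for (Map.Entry<String, String> v : variants.entrySet()) {
                if (matches(range, v.getKey())) {
                    return v.getValue();
                }
            }
        }
        return null;
    }

    static boolean matches(String range, String type) {
        if (range.equals("*/*")) {
            return true;                                   // any type
        }
        if (range.endsWith("/*")) {                        // e.g. "image/*"
            return type.startsWith(range.substring(0, range.length() - 1));
        }
        return range.equalsIgnoreCase(type);
    }

    public static void main(String[] args) {
        Map<String, String> variants = new LinkedHashMap<>();
        variants.put("image/gif", "logo.gif");
        variants.put("image/jpeg", "logo.jpg");
        System.out.println(choose("image/gif, image/jpeg, */*", variants));
    }
}
```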
The message trace is shown:
GET /logo HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
  application/x-shockwave-flash, application/vnd.ms-excel,
  application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Host: test101:8080
Connection: Keep-Alive
(blank line)
HTTP/1.1 200 OK
Date: Sun, 29 Feb 2004 01:42:22 GMT
Server: Apache/1.3.29 (Win32)
Content-Location: logo.gif
Vary: negotiate,accept
TCN: choice
Last-Modified: Wed, 21 Feb 1996 19:45:52 GMT
ETag: "0916312b7670;404142de"
Accept-Ranges: bytes
Content-Length: 2326
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/gif
(blank line)
(body omitted)
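The selection step in this exchange can be sketched as follows. This is a simplified illustration, not Apache's actual negotiation algorithm: it honours q-values and, as an assumption of this sketch, breaks ties in favour of the media range listed first in the Accept header. The function names and the `variants` mapping are hypothetical.

```python
def parse_accept(header):
    """Split an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0                                   # default quality value
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """True if a range such as "image/*" or "*/*" covers the given media type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def choose_variant(accept, variants):
    """variants maps file name -> media type; return the preferred file name or None."""
    best = None                                   # (q, -position, file name)
    for pos, (media_range, q) in enumerate(parse_accept(accept)):
        if q <= 0:
            continue                              # q=0 means "not acceptable"
        for name, media_type in variants.items():
            if matches(media_range, media_type):
                candidate = (q, -pos, name)
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
    return best[2] if best else None


if __name__ == "__main__":
    logo = {"logo.gif": "image/gif", "logo.jpg": "image/jpeg"}
    print(choose_variant("image/gif, image/jpeg, */*", logo))
```

A real server applies further rules when several variants tie, so the result for a bare "Accept: */*" depends on the server and should not be inferred from this sketch.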
However, if the server has 3 "logo.*" files, "logo.gif", "logo.html" and "logo.jpg", and "Accept: */*" was used:
GET /logo HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Host: test101:8080
Connection: Keep-Alive
(blank line)
HTTP/1.1 200 OK
Date: Sun, 29 Feb 2004 01:48:16 GMT
Server: Apache/1.3.29 (Win32)
Content-Location: logo.html
Vary: negotiate,accept
TCN: choice
Last-Modified: Fri, 20 Feb 2004 04:31:17 GMT
ETag: "01040358d95;404144c1"
Accept-Ranges: bytes
Content-Length: 16
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
(blank line)
(body omitted)
The AddType directive can be used to include additional MIME type mappings in the configuration file:
AddType mime-type extension1 [extension2]
The DefaultType directive gives the MIME type to use in the Content-Type response header for a file with an unknown extension:
DefaultType text/plain
Suppose that the client requests "index.html" and sends "Accept-Language: en-us". If the server has "index.html", "index.html.en" and
"index.html.cn", then "index.html.en" will be returned, based on the client's preference. The language "en" includes "en-us".
A message trace is as follows:
GET /index.html HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Host: test101:8080
Connection: Keep-Alive
(blank line)
HTTP/1.1 200 OK
Date: Sun, 29 Feb 2004 02:08:29 GMT
Server: Apache/1.3.29 (Win32)
Content-Location: index.html.en
Vary: negotiate
TCN: choice
Last-Modified: Sun, 29 Feb 2004 02:07:45 GMT
ETag: "01340414971;40414964"
Accept-Ranges: bytes
Content-Length: 19
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en
(blank line)
(body omitted)
The AddLanguage directive is needed to associate a language code with a file extension, similar to the MIME type/file mapping.
Note that the "Options All" directive does not include the "MultiViews" option. That is, you have to turn on MultiViews explicitly.
The directive LanguagePriority can be used to specify the language preference in case of a tie during content negotiation or if the client does not express
a preference. For example:
<IfModule mod_negotiation.c>
LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br
</IfModule>
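As a rough illustration of how a language preference and a priority list interact, here is a Python sketch. It is not Apache's implementation: the function name `choose_language`, the `available` mapping and the tie-breaking details are assumptions. It treats a variant tagged "en" as satisfying a request for "en-us", and consults the priority list only when the client states no preference or when several variants score equally.

```python
def choose_language(accept_language, available, priority=()):
    """available maps language code -> file name; return the chosen file name or None."""
    # Parse e.g. "en-us, fr;q=0.8" into {"en-us": 1.0, "fr": 0.8}.
    wanted = {}
    for item in accept_language.split(","):
        tag, _, param = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        param = param.strip()
        wanted[tag] = float(param[2:]) if param.startswith("q=") else 1.0

    scores = {}
    for lang in available:
        if not wanted:
            scores[lang] = 1.0            # no preference stated: all variants acceptable
            continue
        # A variant "en" satisfies the requested tags "en" and "en-us".
        qs = [q for tag, q in wanted.items()
              if tag == lang or tag.startswith(lang + "-")]
        if qs and max(qs) > 0:
            scores[lang] = max(qs)
    if not scores:
        return None                       # nothing acceptable to the client

    def rank(lang):
        # Higher q first; ties go to the language listed earlier in the priority list.
        pos = priority.index(lang) if lang in priority else len(priority)
        return (-scores[lang], pos)

    return available[min(scores, key=rank)]


if __name__ == "__main__":
    files = {"en": "index.html.en", "cn": "index.html.cn"}
    print(choose_language("en-us", files, priority=("en", "da", "nl")))
```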
The commonly encountered character sets include: ISO-8859-1 (Latin-1), ISO-8859-2, ISO-8859-5, BIG5 (Chinese Traditional), GB2312 (Chinese Simplified),
UCS-2 (2-byte Unicode), UCS-4 (4-byte Unicode), UTF-8 (Encoded Unicode), etc.
Similarly, the AddCharset directive is used to associate the file extension with the character set. For example:
AddCharset ISO-8859-8 .iso8859-8
AddCharset ISO-2022-JP .jis
AddCharset Big5 .Big5 .big5
AddCharset WINDOWS-1251 .cp-1251
AddCharset CP866 .cp866
AddCharset ISO-8859-5 .iso-ru
AddCharset KOI8-R .koi8-r
AddCharset UCS-2 .ucs2
AddCharset UCS-4 .ucs4
AddCharset UTF-8 .utf8
Encoding Negotiation
A client can use the Accept-Encoding header to tell the server the types of encoding it supports. The common encoding schemes are: "x-gzip (.gz,
.tgz)" and "x-compress (.Z)".
Accept-Encoding: encoding-method-1, encoding-method-2, ...
Similarly, the AddEncoding directive is used to associate a file extension with an encoding scheme. For example:
AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz
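The client side of this negotiation can be illustrated with a small Python sketch that checks the Accept-Encoding header before compressing a response body. Note that this compresses on the fly with the standard `gzip` module, which is a different technique from serving pre-compressed ".gz" files mapped by AddEncoding. The function names are hypothetical, and wildcard ("*") entries are ignored for brevity.

```python
import gzip


def accepts(accept_encoding, coding):
    """True if the Accept-Encoding header lists the coding with a non-zero q-value."""
    for item in accept_encoding.split(","):
        name, _, param = item.strip().partition(";")
        name = name.strip().lower()
        param = param.strip()
        q = float(param[2:]) if param.startswith("q=") else 1.0
        # "x-gzip" is the older alias of "gzip".
        if name in (coding, "x-" + coding) and q > 0:
            return True
    return False


def encode_body(body, accept_encoding):
    """Return (headers, body), gzip-compressing the body only if the client allows it."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts(accept_encoding, "gzip"):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


if __name__ == "__main__":
    page = b"<h1>Home page on main server</h1>" * 50
    headers, data = encode_body(page, "gzip, deflate")
    print(headers, len(page), "->", len(data))
```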
The MaxKeepAliveRequests directive sets the maximum number of requests that can be sent through a persistent connection. Set it to 0 to allow
an unlimited number of requests. A high number is recommended for better performance and network efficiency.
MaxKeepAliveRequests 200
The KeepAliveTimeout directive sets the time, in seconds, that a persistent connection waits for the next request before it is closed.
KeepAliveTimeout 10
Range Download
Accept-Ranges: bytes
Transfer-Encoding: chunked
Under Construction
Cache Control
The client can send a request header "Cache-Control: no-cache" to tell the proxy to get a fresh copy from the original server, even though there is a
locally cached copy. Unfortunately, HTTP/1.0 servers do not understand this header, and use the older request header "Pragma: no-cache" instead. You could
include both headers in your request.
Pragma: no-cache
Cache-Control: no-cache
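For illustration, a client could attach both headers as in the following Python sketch, which uses the standard `urllib.request` module. The function name `fresh_copy_request` is hypothetical, and the URL is this tutorial's example host, used as a placeholder. The request object is only built and inspected here; nothing is sent over the network.

```python
from urllib.request import Request

# Placeholder URL taken from the tutorial's examples; it is never contacted here.
URL = "http://www.test101.com/index.html"


def fresh_copy_request(url):
    """Build a GET request that asks caches along the way for a fresh copy."""
    req = Request(url)
    req.add_header("Cache-Control", "no-cache")   # understood by HTTP/1.1
    req.add_header("Pragma", "no-cache")          # fallback for HTTP/1.0
    return req


if __name__ == "__main__":
    req = fresh_copy_request(URL)
    # urllib normalises the capitalisation of header names; HTTP header
    # names are case-insensitive, so this does not change their meaning.
    for name, value in sorted(req.header_items()):
        print(f"{name}: {value}")
```

Passing the request to `urllib.request.urlopen` would actually send it.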