2_URL_HTTP

The document provides an overview of the World Wide Web, distinguishing it from the Internet, and outlines the semantic components including URIs, URLs, and HTTP. It explains the structure of URLs, the differences between URIs, URLs, and URNs, and details the HTTP protocol's request-response model, including various request methods and status codes. Additionally, it discusses the role of cookies and HTTP proxies in web communication.


URL & HTTP

Dr. Alekha Kumar Mishra


Web Definition
● The World Wide Web, or simply the Web, is the universe of information (documents and pages) accessible via networked computers.
● The Internet is different from the Web. It is a network of computers in which a computer does not necessarily act as a web client or web server.
● An internet (little "i") is simply any network made up of multiple smaller networks using the same internetworking protocols. It isn't necessarily connected to the Internet (big "I"), nor does it necessarily use TCP/IP as its internetworking protocol.


Semantic components of Web
● Three main semantic components of the Web are:
– A naming infrastructure (URI)
– A document language (HTML)
– A message exchange protocol (HTTP)



Uniform Resource Identifier (URI)
● A URI is a universal naming mechanism for identifying a resource on the Web, independent of its current location or value.
● A URI can be thought of as a pointer to a black box to which a request method can be applied to generate different responses at different times.
● A request method is a simple operation such as fetching, changing, or deleting a resource.
– For example, at a high level, a string such as http://www.foo.com/coolpic.gif is a URI.
● How is it different from a URL?



Uniform Resource Locator (URL)
● A uniform resource locator, also called a universal resource locator (URL), is a specific character string that constitutes a reference to an Internet resource (e.g., its network “location”).
● A URL is technically a type of uniform resource identifier (URI), but in many technical documents and verbal discussions URL is used as a synonym for URI.
● URLs most commonly reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications.



URL structure analysis
● scheme://host[:port#]/path/. . ./[;url-params][?query-string][#anchor]

● scheme: this portion of the URL designates the underlying protocol to be used.
● host: this is either the name or the IP address of the web server being accessed.
● port#: this optional portion of the URL designates the port number that the target web server listens on.
● path: logically, this is the file system path from the ‘root’ directory of the server to the desired document.
● url-params: this portion includes optional ‘URL parameters’. Nowadays it is used most often for session identifiers in web servers supporting the Java Servlet API.
● query-string: this optional portion of the URL contains other dynamic parameters associated with the request. Usually, these parameters are produced as the result of user-entered variables in HTML forms. Equal signs (=) separate the parameters from their values, and ampersands (&) mark the boundaries between parameter-value pairs.
● anchor: this optional portion of the URL is a reference to a positional marker within the requested document, like a bookmark.
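
As a rough illustration of the components listed above, the sketch below splits a made-up URL with Python's standard `urllib.parse` module. The URL itself, the `jsessionid` value, and the variable names are assumptions chosen for the example; they do not come from the slides.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL that contains every component of the template above.
url = ("http://www.example.com:8080/products/list;jsessionid=abc123"
       "?sort=price_ascending&page=2#reviews")

parts = urlparse(url)

scheme = parts.scheme          # underlying protocol
host = parts.hostname          # server name or IP address
port = parts.port              # optional port number, as an int (None if absent)
path = parts.path              # path from the server's root
url_params = parts.params      # text after ';' in the last path segment
query = parse_qs(parts.query)  # parameter name -> list of values
anchor = parts.fragment        # positional marker within the document

for label, value in [("scheme", scheme), ("host", host), ("port", port),
                     ("path", path), ("url-params", url_params),
                     ("query-string", query), ("anchor", anchor)]:
    print(f"{label:13} {value}")
```

Note that `parse_qs` returns a list for each parameter because the same name may appear more than once in a query string.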



URL : an example



URL parameters
● Active parameters: These parameters indicate differences between an original page and a page that contains largely the same content but has been slightly altered by some function. This parameter type can show that a page has been affected in the following ways:
– Sorts: Changes the order of the content. (for example, sort=price_ascending)
– Narrows: Filters the content on the page. (for example, t-shirt_size=XS)
– Specifies: Determines the set of content displayed on a page. (for example, store=women)
– Translates: Displays a translated version of the content. (for example, lang=fr)
– Paginates: Displays a specific page of a long listing or article. (for example, page=2)
– Other:


URL parameters
● Passive parameters: Passive URL parameters are often used to track visits and referrers, but have no effect on the actual content of the page.
– http://www.example.com/products/women/dresses?sessionid=12345
– http://www.example.com/products/women/dresses?sessionid=34567&source=google.com
– Some examples of passive parameters include: sessionid, affiliateid.
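
Because passive parameters do not change the page content, two URLs that differ only in those parameters refer to the same content. The sketch below shows one possible way to strip them. The function name `strip_passive_params` and the set `PASSIVE_PARAMS` are hypothetical, and treating `source` as passive is an assumption based on the example URL above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of passive parameter names; "source" is inferred from the example URL.
PASSIVE_PARAMS = {"sessionid", "affiliateid", "source"}

def strip_passive_params(url):
    """Return the URL with passive (tracking-only) query parameters removed."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in PASSIVE_PARAMS]
    # Active parameters keep their original order; an empty query drops the '?'.
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_passive_params(
    "http://www.example.com/products/women/dresses?sessionid=34567&source=google.com"))
print(strip_passive_params(
    "http://www.example.com/products/women/dresses?sessionid=12345&sort=price_ascending&page=2"))
```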



Uniform Resource Name (URN)
● A uniform resource name (URN) is the historical name for a
uniform resource identifier (URI) that uses the urn scheme
● URN does not imply availability of the identified resource.
● Both URNs (names) and URLs (locators) are URIs, and a
particular URI may be a name and a locator at the same time.
● The syntax of a URN is represented in Backus-Naur form
● <URN> ::= "urn:" <NID> ":" <NSS>
● This renders as: urn:<NID>:<NSS>
● <NID> is the namespace identifier, which determines the
syntactic interpretation of <NSS>, the namespace-specific string



URN Example
● urn:isbn:0451450523 identifies the 1968 book The Last Unicorn by its book number.
● urn:isan:0000-0000-9E59-0000-O-0000-0000-2 identifies the 2002 film Spider-Man by its audiovisual number.
● urn:ISSN:0167-6423 identifies the scientific journal Science of Computer Programming by its serial number.
● urn:ietf:rfc:2648 identifies the IETF's RFC 2648.
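
The urn:<NID>:<NSS> syntax can be checked against the examples above with a few lines of Python. This is a minimal sketch: the function name `parse_urn` is hypothetical, and it only separates the namespace identifier from the namespace-specific string without validating the characters allowed in each part.

```python
def parse_urn(urn):
    """Split a URN of the form urn:<NID>:<NSS> into (NID, NSS)."""
    prefix, sep1, rest = urn.partition(":")
    nid, sep2, nss = rest.partition(":")
    if prefix.lower() != "urn" or not sep1 or not sep2 or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The NID is case-insensitive, so normalize it; the NSS is left untouched
    # because its interpretation depends on the namespace.
    return nid.lower(), nss

for example in ["urn:isbn:0451450523",
                "urn:ISSN:0167-6423",
                "urn:ietf:rfc:2648"]:
    print(example, "->", parse_urn(example))
```

Only the first two colons act as separators, so the NSS of `urn:ietf:rfc:2648` is `rfc:2648`.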



URI-URL-URN Comparison
● A URN is similar to a person's name, while a URL is
like a street address.
● The URN defines something's identity, while the URL
provides a location. Essentially, "what" vs. "where".
● A URI can be a URL, a URN, or both.
● URL is a subset of URI. It identifies a resource using
one of the URI schemes.
● URN is a subset of URI. It identifies a resource
independent of its location.



HTTP



HTML and HTTP
● HTML: hypertext markup language
– Definitions of tags that are added to Web documents to control their appearance
● HTTP: hypertext transfer protocol
– The rules governing the conversation between a Web client and a Web server
– Follows a request-response model
– An application-level protocol in the TCP/IP protocol suite, using TCP as the underlying transport-layer protocol for transmitting messages
● The HTTP protocol used for Web applications was invented by Tim Berners-Lee



HTTP Conversation
• Client: I would like to open a connection
• Server: OK
• Client: GET <file location>
• Server: Send page or error message
• Client: Display response
• Client: Close connection
• Server: OK



HTTP
● A client/server, request/response architecture
– You request a page
   ● Say, http://www.msn.com/default.asp
   ● HTTP request
– The Web server responds with data in the form of a Web page
   ● HTTP response
   ● Web page is expressed as HTML
– Pages are identified by URLs




HTTP Transaction
● The client browser obtains the server's machine address from an Internet Domain Name System (DNS) server.
● Client and server open a TCP/IP socket connection.
● Server waits for a request.
● Browser sends a verb and an object:
– GET XYZ.HTM or POST form
– If there is an error, the server can send back an HTML-based explanation.
● Server applies headers to a returned HTML file and delivers it to the browser.
● Client and server close the connection.
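
The transaction steps can be tried out on a single machine. The sketch below is only an illustration under stated assumptions: it starts a throwaway server from Python's standard `http.server` module on the loopback address (so no DNS lookup is needed), and the names `HelloHandler` and `fetch`, the path `/XYZ.HTM`, and the page content are all made up for the example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<HTML>Some data to display</HTML>"
        self.send_response(200)                      # status line
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                           # blank line
        self.wfile.write(body)                       # body

    def log_message(self, format, *args):
        pass  # keep the example output quiet

def fetch(host, port, path):
    """Client side: open a TCP connection, send a verb and an object, read the reply."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:        # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

raw = fetch("127.0.0.1", server.server_address[1], "/XYZ.HTM")
server.shutdown()
server.server_close()

status_line = raw.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

A raw socket is used on the client side on purpose, so that the request text and the connection close are visible instead of being hidden inside an HTTP library.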


HTTP proxies
● HTTP proxies are programs that act as both servers and clients.
● HTTP proxies make requests to web servers on behalf of other clients.
● Proxies enable HTTP transfers across firewalls.
● They also provide support for caching of HTTP messages and filtering of HTTP requests.
● They also fill a variety of other interesting roles in complex environments.
● When we refer to HTTP clients, the statements we make are applicable to browsers, proxies, and other custom HTTP client programs.



HTTP connection as virtual circuit



HTTP is Stateless

● HTTP is a stateless protocol.
● Each HTTP request is independent of previous and subsequent requests.
● When a protocol supports ‘state’, it provides for the interaction between client and server to contain a sequence of commands.
● The sequence of transmitted and executed commands is often called a session.
● FTP, SMTP, POP and many more are ‘stateful’ protocols.
● HTTP/1.1 made persistent (keep-alive) connections the default for efficiency; requests sent over the same connection are still independent of one another.



Cookies
● A mechanism to store a small amount of
information (up to 4KB) on the client
● A cookie is associated with a specific web site
● The server sets a cookie in an HTTP response header (Set-Cookie)
● The client sends the cookie back in a request header (Cookie) with each HTTP request to that site
● Can last for only one session (until browser is
closed) or can persist across sessions
● Can expire some time in the future
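
A cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `sessionid`, its value, and the one-hour lifetime are assumptions made for this example; a real browser also applies domain, path, and expiry rules before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["sessionid"] = "12345"            # hypothetical name and value
outgoing["sessionid"]["path"] = "/"
outgoing["sessionid"]["max-age"] = 3600    # persists one hour; omit for a session-only cookie
set_cookie_value = outgoing["sessionid"].OutputString()

# Client side: parse what the server sent and build the Cookie request header.
received = SimpleCookie()
received.load(set_cookie_value)
cookie_header = "; ".join(f"{name}={morsel.value}"
                          for name, morsel in received.items())

print("Set-Cookie:", set_cookie_value)
print("Cookie:", cookie_header)
```

Only the name and value travel back to the server; attributes such as the path and lifetime stay with the client.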



HTTP Request Message

GET /default.asp HTTP/1.0
Host: www.msn.com
Accept: image/gif, image/x-bitmap, image/jpeg, */*
Accept-Language: en
User-Agent: Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)
Connection: Keep-Alive
If-Modified-Since: Wednesday, 17-Apr-96 04:32:58 GMT
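
The layout of a request message (request line, headers, blank line, optional body) can be expressed as a small helper. This is a sketch, not a complete HTTP client; the function name `build_request` and its parameters are hypothetical.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.0"):
    """Assemble the bytes of an HTTP request message."""
    lines = [f"{method} {path} {version}"]                  # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"                  # blank line ends the headers
    return head.encode("ascii") + body

message = build_request(
    "GET", "/default.asp",
    {"Host": "www.msn.com", "Accept-Language": "en"},
)
print(message.decode("ascii"))
```

Each line ends with a carriage return and line feed, and the empty line after the last header tells the server where the headers stop.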



HTTP Response Message

HTTP/1.0 200 OK
Date: Sun, 21 Apr 1996 02:20:42 GMT
Server: Microsoft-Internet-Information-Server/5.0
Connection: keep-alive
Content-Type: text/html
Last-Modified: Thu, 18 Apr 1996 17:39:05 GMT
Content-Length: 2543

<HTML> Some data to display </HTML>
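
A response message has the mirror-image layout: status line, headers, blank line, body. The sketch below splits a shortened copy of the response above into those parts. The function name `parse_response` is hypothetical, and the parser assumes a well-formed message with one header per line.

```python
def parse_response(raw):
    """Split raw response bytes into (version, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")       # blank line separates headers from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body

sample = (b"HTTP/1.0 200 OK\r\n"
          b"Date: Sun, 21 Apr 1996 02:20:42 GMT\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n"
          b"<HTML> Some data to display </HTML>")

version, code, reason, headers, body = parse_response(sample)
print(version, code, reason)
print(headers["content-type"])
print(body.decode("ascii"))
```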




HTTP Request Methods
● Request methods impose constraints on message structure. Specifications that define how servers should process requests, including CGI and the Java Servlet API, describe how each request method should be treated.
● The most basic ones defined in HTTP/1.1:
– GET, HEAD, and POST
● Less commonly used:
– PUT, DELETE, TRACE, OPTIONS and CONNECT



GET Request Method
● When a user enters a URL in the browser or clicks on a hyperlink, the browser uses the GET method when making the request to the web server.
● GET requests had no body until HTTP/1.1.
● Consider the following form



Example : GET Request Method
● The URL constructed by the browser is
http://www.finance.yahoo.com/q?s=YHOO and
the submitted request looks as follows:



Example : GET Request Method
● The response would look something like this:
HTTP/1.0 200 OK
Date: Sat, 03 Feb 2001 22:48:35 GMT
Connection: close
Content-Type: text/html
Set-Cookie: B=9ql5kgct7p2m3&b=2; expires=Thu, 15 Apr 2010 20:00:00 GMT; path=/; domain=.yahoo.com

<HTML>
<HEAD><TITLE>Yahoo! Finance - YHOO</TITLE></HEAD>
<BODY>
...
</BODY>
</HTML>



POST Request Method
● POST requests have a body: content that
follows the block of headers, with a blank line
separating the headers from the body.
● Consider the same example, but using POST
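
As a rough sketch of the difference between the two methods (it is not the slide's original figure), the code below encodes the same form field once as a GET query string and once as a POST body. The field `s=YHOO` and the host come from the GET example; the POST target `/q`, the HTTP/1.1 version, and the `Content-Type` shown are assumptions about how such a form might be submitted.

```python
from urllib.parse import urlencode

form_fields = {"s": "YHOO"}           # the single field from the GET example
encoded = urlencode(form_fields)      # name=value pairs joined with '&'

# GET: the parameters travel in the URL and there is no body.
get_request = (
    f"GET /q?{encoded} HTTP/1.1\r\n"
    "Host: www.finance.yahoo.com\r\n"
    "\r\n"
)

# POST: the parameters travel in the body, after the blank line.
post_request = (
    "POST /q HTTP/1.1\r\n"
    "Host: www.finance.yahoo.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```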



Post Request Example
● http://www.idomail.com/eco/Auth?dom=sitech



HEAD Request Method
● Requests that use the HEAD method operate similarly to requests that use the GET method, except that the server sends back only headers in the response.
● The body of the requested resource is not transmitted, and only the response metadata found in the headers is available to the client.
● Why do we need the HEAD method, then?



Need of HEAD Method
● Suppose we want to look at a page that we
visit regularly in our browser
● So, the browser will have a copy of the page
stored in its cache.
● On a request for the same page, the browser can determine whether it needs to re-retrieve the page from the server by first submitting a HEAD request
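
One way a client could make that decision is to compare the Last-Modified header returned by the HEAD request with the date stored alongside the cached copy. The sketch below assumes both dates are available as HTTP date strings; the function name `needs_refetch` and the sample dates are made up for the example.

```python
from email.utils import parsedate_to_datetime

def needs_refetch(head_headers, cached_last_modified):
    """Decide from HEAD response headers whether the cached page is stale."""
    server_value = head_headers.get("Last-Modified")
    if server_value is None:
        return True  # nothing to compare against, so fetch the page again
    server_time = parsedate_to_datetime(server_value)
    cached_time = parsedate_to_datetime(cached_last_modified)
    return server_time > cached_time

cached = "Sat, 03 Feb 2001 22:48:35 GMT"
print(needs_refetch({"Last-Modified": "Sun, 11 Feb 2001 22:28:31 GMT"}, cached))
print(needs_refetch({"Last-Modified": "Sat, 03 Feb 2001 22:48:35 GMT"}, cached))
```

The first call reports a stale cache because the server copy is newer; the second reports a fresh one because the dates match.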



Example : HEAD Request Method

● The response to the above request would be



HTTP Request Method Summary
● GET request-URI HTTP/1.1
– Retrieve entity specified in request-URI as body of response message
● POST request-URI HTTP/1.1
– Sends data in message body to the entity specified in request-URI
● PUT request-URI HTTP/1.1
– Sends entity in message body to become newly created entity specified by request-URI
● HEAD request-URI HTTP/1.1
– Same as GET except the server does not send specified entity in response message
● DELETE request-URI HTTP/1.1
– Request to delete entity specified in request-URI.
● TRACE request-URI HTTP/1.1
– Request for each host node to report back



Status Codes
Classes:
● 1xx: Informational - not used, reserved for future (in HTTP/1.0; HTTP/1.1 defines 100 Continue and 101 Switching Protocols)
● 2xx: Success - action was successfully received, understood, and accepted
– 200 OK
– 201 Created
– 202 Accepted
– 204 No Content
● 3xx: Redirection - further action needed to complete request
– 301 Moved Permanently
– 302 Moved Temporarily
– 304 Not Modified
● 4xx: Client Error - request contains bad syntax or cannot be fulfilled
– 400 Bad Request
– 401 Unauthorized
– 403 Forbidden
– 404 Not Found
● 5xx: Server Error - server failed to fulfill an apparently valid request
– 500 Internal Server Error
– 501 Not Implemented
– 502 Bad Gateway
– 503 Service Unavailable
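
Since the class of a status code is given by its first digit, a client can classify any code with integer division. The sketch below pairs that with the reason phrases from Python's standard `http.HTTPStatus` enumeration; the names `STATUS_CLASSES` and `describe_status` are hypothetical.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe_status(code):
    """Return 'code phrase (class)' for a numeric HTTP status code."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:            # code not known to the standard library
        phrase = "Unrecognized"
    return f"{code} {phrase} ({status_class})"

for code in (200, 304, 404, 503):
    print(describe_status(code))
```

Reason phrases are informational and have changed over time: current specifications, and therefore `HTTPStatus`, call 302 "Found" rather than "Moved Temporarily".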



Information Through Header
● Headers help to establish and maintain sessions, set caching policies, control authentication, implement business logic, etc.
● HTTP categorizes headers as:
– General Header
– Request Header
– Response Header
– Entity Header



General Header Example
● Apply to both request and response messages, but do not
describe the body of the message.
● Date: Sun, 11 Feb 2001 22:28:31 GMT
– specifies the time and date that this message was created.
● Connection: Close
– indicates whether or not the client or server that generated the
message intends to keep the connection open.
● Warning: Danger, Will Robinson!
– This header stores text for human consumption, something that
would be useful when tracing a problem.



Request Headers Example
● User-Agent: Mozilla/4.75 [en] (WinNT; U)
– Identifies the software (e.g. a web browser) responsible for making the request.
● Host: www.neurozen.com
– It is introduced to support virtual hosting
● Referer: http://www.cs.rutgers.edu/~shklar/index.html
– provides the server with context information about the request.
● Authorization: Basic [encoded-credentials]
– This header is transmitted with requests for resources that are restricted only to
authorized users.
– Browsers will include this header after being notified of an authorization
challenge via a response with a ‘401’ status code.
– They consequently prompt users for their credentials (i.e. userid and password ).
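
For the Basic scheme, the [encoded-credentials] placeholder is the Base64 encoding of the userid and password joined by a colon. The sketch below builds the header value; the function name `basic_auth_header` is hypothetical, and the sample credentials are the well-known illustration from the HTTP authentication RFCs, not real ones.

```python
import base64

def basic_auth_header(userid, password):
    """Return the value of an Authorization header for the Basic scheme."""
    credentials = f"{userid}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"

print("Authorization:", basic_auth_header("Aladdin", "open sesame"))
```

Base64 is an encoding, not encryption, so Basic credentials are only protected when the connection itself is encrypted.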



Response Header Examples
● Location: http://www.mywebsite.com/relocatedPage.html
– specifies a URL towards which the client should redirect its original request.
– It always accompanies the ‘301’ and ‘302’ status codes that direct clients to try
a new location.
● WWW-Authenticate: Basic realm="KremlinFiles"
– This header accompanies the ‘401’ status code that indicates an authorization
challenge.
– In the case of web browsers, the combination of the ‘401’ status code and the
WWW-Authenticate header causes users to be prompted for ids and
passwords.
● Server: Apache/1.2.5
– This header is not tied to a particular status code. It is an optional header that
identifies the server software.



Entity Header Examples
● Entity headers describe either message bodies or
target resources.
● Content-Type: mime-type/mime-subtype
– specifies the MIME type of the message body’s content.
● Content-Length: xxx
– optional header provides the length of the message
body.
● Last-Modified: Sun, 11 Feb 2001 22:28:31 GMT
– This header provides the last modification date of the
content that is transmitted in the body of the message


Caching
● Web servers and server-side applications are in the best
position to judge whether clients should be allowed to
cache their responses
● If the content of a response is relatively static, browsers,
proxies, and other clients may be instructed to cache the
response.
● If the content is highly static, the response can be cached
for an arbitrarily long amount of time.
● If the content has a limited lifetime, then clients may be
instructed to cache the response but only for that limited
period.
● If the content is dynamic, the server can make the decision that its clients can ‘tolerate’ a cached response for a specified time period.


Caching
● There are two headers for establishing caching rules.
● Cache-Control: public, private, and no-cache (HTTP/1.1)
– The public setting removes all restrictions and authorizes both
shared and non-shared caching mechanisms to cache the response.
– The private setting indicates that the response is directed at a single
user and should not be stored in a shared cache.
– The no-cache setting indicates that neither browsers nor proxies are
allowed to cache the response.
● Pragma: no-cache (for HTTP/1.0)
– Pragma, when used with the Cache-Control header, prevents HTTP/1.0 browsers and proxies from caching the response.
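
A server-side application could choose these headers with a helper along the following lines. This is a sketch under stated assumptions: the function name `caching_headers` is hypothetical, and it covers only the three Cache-Control settings named above.

```python
ALLOWED_POLICIES = {"public", "private", "no-cache"}

def caching_headers(policy):
    """Return the response headers that express the given caching policy."""
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported caching policy: {policy!r}")
    headers = {"Cache-Control": policy}       # understood by HTTP/1.1 clients
    if policy == "no-cache":
        headers["Pragma"] = "no-cache"        # fallback for HTTP/1.0 clients and proxies
    return headers

for policy in ("public", "private", "no-cache"):
    print(policy, "->", caching_headers(policy))
```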



Appendix
HTTP Message Headers



Request Message
Request methods: DELETE, GET, HEAD, POST, PUT, TRACE

request line    GET /pub/index.html HTTP/1.0
headers         Date: Wed, 20 Mar 2002 10:00:02 GMT
                Pragma: no-cache
                From: amer@udel.edu
                User-Agent: Mozilla/4.03
blank line
body



Response Message
status line     HTTP/1.1 200 OK
headers         Date: Tue, 08 Oct 2002 00:31:35 GMT
                Server: Apache/1.3.27 tomcat/1.0
                Last-Modified: Mon, 07 Oct 2002 23:40:01 GMT
                ETag: "20f-6c4b-3da21b51"
                Accept-Ranges: bytes
                Content-Length: 27723
                Keep-Alive: timeout=5, max=300
                Connection: Keep-Alive
                Content-Type: text/html
blank line
body



Headers

Request message      Response message
Request Line         Status Line
General Headers      General Headers
Request Headers      Response Headers
Entity Headers       Entity Headers
A Blank Line         A Blank Line
Body                 Body



Headers
General Headers
– Headers present in HTTP/1.0 & HTTP/1.1: Date, Pragma
– New headers added in HTTP/1.1: Cache-Control, Connection, Trailer, Transfer-Encoding, Upgrade, Via, Warning

Request Headers
– Headers present in HTTP/1.0 & HTTP/1.1: Authorization, From, If-Modified-Since, Referer, User-Agent
– New headers added in HTTP/1.1: Accept, Accept-Charset, Accept-Encoding, Accept-Language, Expect, Host, If-Match, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, TE



Headers

Response Headers
– Headers present in HTTP/1.0 & HTTP/1.1: Location, Server, WWW-Authenticate
– New headers added in HTTP/1.1: Accept-Ranges, Age, ETag, Proxy-Authenticate, Retry-After, Vary

Entity Headers
– Headers present in HTTP/1.0 & HTTP/1.1: Allow, Content-Encoding, Content-Length, Content-Type, Expires, Last-Modified, extension-header
– New headers added in HTTP/1.1: Content-Language, Content-Location, Content-MD5, Content-Range
