Web technologies: HTTP

Page 1: Web technologies: HTTP

HTTP

Web [email protected]

Page 2: Web technologies: HTTP

HTTP

• HyperText Transfer Protocol
• Application-level protocol for the exchange of hypertext documents
• Standardizes
  – Resource names (URL)
  – Requests
  – Responses
• Versions: HTTP/0.9, 1.0, 1.1
• Ref: Tim Berners-Lee, Request for Comments 1945, HTTP/1.0
  – http://www.w3.org/Protocols/rfc1945/rfc1945

Page 3: Web technologies: HTTP

HTTP as a client-server system

• Client
  – An application program that establishes connections for the purpose of sending requests
• Server
  – An application program that accepts connections in order to service requests by sending back responses
• User agent
  – The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end-user tools
• Origin server
  – The server on which a given resource resides or is to be created
• Resource
  – A network data object or service which can be identified by a URI

Page 4: Web technologies: HTTP

The HTTP browser

• Sends HTTP requests to a server
• Receives and interprets responses
• Visualizes resources
• Timeline
  – http://meyerweb.com/eric/browsers/timeline-structured.html

Page 5: Web technologies: HTTP

Browser features

• Version of the document description languages supported (HTML, CSS)
• Native programming language support (JavaScript)
• Extension mechanisms
  – Plug-in interface
    • Content viewers (e.g., Adobe Acrobat for PDF, Microsoft Silverlight, Apple QuickTime)
    • Programming language interpreters (e.g., Java)

Page 6: Web technologies: HTTP

The HTTP server

• Functionality
  – Network access with HTTP for handling requests
  – Access to resources in secondary storage
  – Delivery of HTTP responses
  – Access control
  – Server-side program execution
  – Logging
  – Monitoring and administration
  – Virtual hosting
  – URL mapping
  – Connection to application servers

Page 7: Web technologies: HTTP

HTTP server vs application server

[Diagram labels: Client; Web server; Application server (App. Servers, hosting Applications); Database (with pooled connections)]

Page 8: Web technologies: HTTP

Example

Page 9: Web technologies: HTTP

HTTP limitations

• HTTP is stateless
  – Every HTTP request-response cycle is independent
  – No data are preserved between two connections of the same client or of different clients
  – HTTP is thus sessionless
  – HTTP 1.0 also closes the TCP connection between the client and the server host at each round trip (fixed in HTTP 1.1)

Page 10: Web technologies: HTTP

Application server features

• The application server can be stateful (e.g., a resident process)
• It can preserve the user’s session across multiple request-response cycles
• Can preserve session data
• Can handle shared resources (e.g., a pool of database connections)
• Can be optimized (multi-threading, multi-processing, multi-host distribution)
• Can be multi-protocol (e.g., CORBA IIOP, COM/DCOM)

Page 11: Web technologies: HTTP

HTTP Proxy

• An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients
• Main usage:
  – Access control (inbound, outbound)
  – Resource caching

Page 12: Web technologies: HTTP

HTTP Gateway

• A server which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
• Usage
  – Protocol translators for access to resources stored on non-HTTP systems

Page 13: Web technologies: HTTP

Uniform Resource Locator (URL)

• Structured string
  – http_URL = "http:" "//" host [ ":" port ] [ abs_path ]
  – http://www.elet.polimi.it:8080/people/fraterna.html
• Protocol: http, but also ftp, file
• Host address:
  – symbolic: www.elet.polimi.it
  – numeric (IP): 131.175.21.1
• Can include a port number (e.g., :8080)
• Path: directory sequence
• Resource name: file id
  – If the resource is an HTML file, it can include an internal fragment address (e.g., fraterna.html#curriculum)
• More on the URL when introducing dynamic Web resources
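
The URL structure above can be explored with Python's standard `urllib.parse` module. This is a minimal sketch, not part of the original slides: the helper name `describe_http_url` and the fallback to port 80 when no port is given are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def describe_http_url(url):
    """Split a URL into the parts named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the directory sequence; the rest is the file id.
    directory, _, resource = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # Assumption: fall back to 80, the usual default port for http.
        "port": parts.port if parts.port is not None else 80,
        "path": directory + "/",
        "resource": resource,
        "fragment": parts.fragment,
    }


info = describe_http_url(
    "http://www.elet.polimi.it:8080/people/fraterna.html#curriculum"
)
for name, value in info.items():
    print(f"{name}: {value}")
```

Note that the fragment is interpreted by the browser only; it is not sent to the server in the request.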

Page 14: Web technologies: HTTP

HTTP request

• full-request :- request-line *(general-header | request-header | entity-header) CRLF [entity-body]
• request-line :- method SP URL SP version CRLF
• method :- GET | POST | HEAD | others...
• Example of request-line: GET /pub/papers/pap101.html HTTP/1.0
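
To make the grammar concrete, the following sketch serializes a request by hand: request-line, header lines, an empty line (CRLF), then the optional entity body. The function name `build_request` and the `User-Agent` value are hypothetical, chosen only for illustration.

```python
def build_request(method, url, headers=None, body=b"", version="HTTP/1.0"):
    """Serialize a full-request following the grammar on the slide."""
    # request-line :- method SP URL SP version CRLF
    lines = [f"{method} {url} {version}"]
    # Each header is written as "Name: value" on its own line.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line (CRLF) separates the headers from the entity body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


message = build_request(
    "GET", "/pub/papers/pap101.html", {"User-Agent": "sketch/0.1"}
)
print(message.decode("ascii"))
```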

Page 15: Web technologies: HTTP

HTTP Response

• full-response :- status-line *(general-header | response-header | entity-header) CRLF [entity-body]
• status-line :- version SP status SP message CRLF
• status: status codes
  – 1XX (informational), 2XX (success), 3XX (redirection), 4XX (client error), 5XX (server error)
• Example of status-line: HTTP/1.0 404 Not Found
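
As an illustration of the status-line grammar, the sketch below splits a status line into version, status code, and message, and maps the code to its class by its first digit. The names `parse_status_line` and `STATUS_CLASSES` are hypothetical.

```python
# Status classes keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'version SP status SP message CRLF' into its parts."""
    # The message may itself contain spaces, so split at most twice.
    version, status, message = line.rstrip("\r\n").split(" ", 2)
    code = int(status)
    return version, code, message, STATUS_CLASSES.get(code // 100, "unknown")


result = parse_status_line("HTTP/1.0 404 Not Found\r\n")
print(result)
```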

Page 16: Web technologies: HTTP

Headers

entity-header = Allow | Content-Encoding | Content-Language | Content-Length | Content-Location | Content-MD5 | Content-Range | Content-Type | Expires | Last-Modified

general-header = Cache-Control | Connection | Date | Pragma | Trailer | Transfer-Encoding | Upgrade | Via | Warning

Page 17: Web technologies: HTTP

Headers

request-header = Accept | Accept-Charset | Accept-Encoding | Accept-Language | Authorization | Expect | From | Host | If-Match | If-Modified-Since | If-None-Match | If-Range | If-Unmodified-Since | Max-Forwards | Proxy-Authorization | Range | Referer | TE | User-Agent

response-header = Accept-Ranges | Age | ETag | Location | Proxy-Authenticate | Retry-After | Server | Vary | WWW-Authenticate

Quick reference to HTTP headers: http://www.cs.tut.fi/~jkorpela/http.html

Test for the headers sent by the browser: http://www.tipjar.com/cgi-bin/test

Page 18: Web technologies: HTTP

HTTP headers in a request (examples)

Field name | Description | Example
Accept | Content-Types that are acceptable | Accept: text/plain
Accept-Charset | Character sets that are acceptable | Accept-Charset: utf-8
Accept-Encoding | Acceptable encodings (see HTTP compression) | Accept-Encoding: gzip, deflate
Accept-Language | Acceptable human languages for the response | Accept-Language: en-US
Authorization | Authentication credentials for HTTP authentication | Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Cache-Control | Directives that MUST be obeyed by all caching mechanisms along the request/response chain | Cache-Control: no-cache
Connection | What type of connection the user agent would prefer | Connection: keep-alive
Cookie | An HTTP cookie previously sent by the server with Set-Cookie | Cookie: $Version=1; Skin=new;
Content-Length | The length of the request body in octets (8-bit bytes) | Content-Length: 348
Content-MD5 | A Base64-encoded binary MD5 sum of the content of the request body | Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Type | The MIME type of the body of the request (used with POST and PUT requests) | Content-Type: application/x-www-form-urlencoded
Date | The date and time that the message was sent | Date: Tue, 15 Nov 1994 08:12:31 GMT
Expect | Indicates that particular server behaviors are required by the client | Expect: 100-continue
User-Agent | The user agent string of the user agent | User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0

....

Page 19: Web technologies: HTTP

HTTP headers in a response (examples)

Field name | Description | Example
Accept-Ranges | What partial content range types this server supports | Accept-Ranges: bytes
Age | The time the object has been in a proxy cache, in seconds | Age: 12
Cache-Control | Tells all caching mechanisms from server to client whether they may cache this object; max-age is measured in seconds | Cache-Control: max-age=3600
Connection | Options that are desired for the connection | Connection: close
Content-Encoding | The type of encoding used on the data (see HTTP compression) | Content-Encoding: gzip
Content-Language | The language the content is in | Content-Language: da
Content-Length | The length of the response body in octets (8-bit bytes) | Content-Length: 348
Content-Location | An alternate location for the returned data | Content-Location: /index.htm
Content-MD5 | A Base64-encoded binary MD5 sum of the content of the response | Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Range | Where in a full body message this partial message belongs | Content-Range: bytes 21010-47021/47022
Content-Type | The MIME type of this content | Content-Type: text/html; charset=utf-8
Date | The date and time that the message was sent | Date: Tue, 15 Nov 1994 08:12:31 GMT
Expires | Gives the date/time after which the response is considered stale | Expires: Thu, 01 Dec 1994 16:00:00 GMT
Last-Modified | The last modified date for the requested object, in RFC 2822 format | Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

Page 20: Web technologies: HTTP

HTTP security

• Resources are grouped into domains at the server (called realms)
• Realms can be protected
• An HTTP request for a protected resource must provide an Authorization header
  – Credentials are transmitted in the clear, merely base64-encoded (not encrypted)
• If the credentials are wrong, the server sends a response with status code 401 (Unauthorized) and a WWW-Authenticate header, which causes the browser to show the dialog for entering credentials
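
The sketch below shows why base64 encoding gives no confidentiality: the Authorization value is built from `user:password` and can be decoded back by anyone who sees it. The helper names are hypothetical; the credentials are the ones behind the `Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==` example in the request header table.

```python
import base64


def basic_authorization(user, password):
    """Build the value of the Authorization header for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover user and password: base64 is an encoding, not encryption."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization value")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_authorization("Aladdin", "open sesame")
print(header)
print(decode_basic(header))
```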

Page 21: Web technologies: HTTP

HTTP 1.1

• Calendar
  – Jan 1997: HTTP/1.1 becomes Proposed Standard (RFC 2068)
  – June 1999: improvements and updates published as RFC 2616
• Main innovations
  – Tunnels
  – Chunked encoding
  – Multi-request connections
  – Content negotiation
  – Advanced cache management
  – New methods (OPTIONS, PUT, DELETE, TRACE, CONNECT, extension-method)

Page 22: Web technologies: HTTP

Tunnels

• Tunnel = an intermediary program which is acting as a blind relay between two connections.
• A tunnel is not a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. It does not change the messages.
• Tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

Page 23: Web technologies: HTTP

Chunked transfer encoding

Behavior
• A data transfer mechanism in which data is sent in blocks called "chunks"
• It uses the Transfer-Encoding header in place of the Content-Length header, so the sender does not need to know the length of the content before it starts transmitting a response to the receiver (useful for dynamically generated content)
• The size is sent before each chunk so that the receiver can tell when it has finished receiving data for that chunk
• Data transfer is terminated by a final chunk of length zero

Benefits
• Allows a server to maintain an HTTP persistent connection for dynamically generated content
• Allows the sender to send header fields after the message body, in cases where values cannot be known until the content has been produced (e.g., a digital signature)
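
The framing described above can be sketched in a few lines: each chunk is preceded by its size in hexadecimal, and a zero-length chunk ends the body. The function names are hypothetical, and the sketch deliberately ignores trailer header fields, so it is a simplification rather than a complete implementation.

```python
def encode_chunked(blocks):
    """Frame each block as: size in hex, CRLF, data, CRLF; end with a zero chunk."""
    out = b""
    for block in blocks:
        if block:  # an empty block would look like the terminating chunk
            out += f"{len(block):x}".encode("ascii") + b"\r\n" + block + b"\r\n"
    # Final chunk of length zero, followed by an empty line (no trailers here).
    return out + b"0\r\n\r\n"


def decode_chunked(data):
    """Reassemble the body; assumption: no trailer header fields are present."""
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size line may carry extensions after ";", which are ignored.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)
print(decode_chunked(wire))
```

The sender never needs the total length: it only has to know the size of the block it is about to write.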

Page 24: Web technologies: HTTP

Persistent connection

Behavior
• HTTP 1.0 required opening a new connection for every single request/response pair
• The Connection: Keep-Alive header was used in HTTP 1.0 to avoid dropping the connection
• When the client sends another request, it uses the same connection. This continues until either the client or the server decides that the conversation is over, and one of them drops the connection
• In HTTP 1.1 all connections are persistent, unless otherwise specified

Benefits
• Lower CPU and memory usage (because fewer connections are open simultaneously)
• Enables HTTP pipelining of requests and responses
• Reduced network congestion (fewer TCP connections)
• Reduced latency in subsequent requests (no handshaking)
• Errors can be reported without the penalty of closing the TCP connection
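
Connection reuse can be observed locally with Python's standard library. In this sketch, a throwaway HTTP/1.1 server is started on the loopback interface and two requests are sent over one `http.client.HTTPConnection`; the client's local TCP port stays the same, which indicates that a single TCP connection served both requests. The handler, the function name, and the request paths are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 makes the connection persistent by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client find the end of the body
        # without the server closing the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def local_ports_for_requests(paths):
    """Send every request on one connection; record the client's TCP port."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    ports = []
    try:
        for path in paths:
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before the next request
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


ports = local_ports_for_requests(["/first", "/second"])
print("same TCP connection:", ports[0] == ports[1])
```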

Page 25: Web technologies: HTTP

Content negotiation

Behavior
• Server-driven: the request contains headers (e.g., Accept-Encoding) and the server picks the corresponding version (the client must include the headers in each request)
• Agent-driven: the response contains the URIs of the alternative versions (Alternates) and the client chooses (requires 2 requests)
• Transparent: managed by the proxy cache

Benefits
• Makes it possible to serve different versions of a resource at the same URI, so that user agents can obtain the version that best fits their capabilities
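
Server-driven negotiation can be sketched as follows: the server parses an Accept-* header, orders the client's preferences by their quality value `q`, and returns the first one it can serve. The function names and the sample header values are assumptions for illustration; wildcards such as `*` and prefix matching of language tags are not handled.

```python
def parse_accept(header):
    """Return (value, q) pairs from an Accept-* header, highest q first."""
    prefs = []
    for item in header.split(","):
        value, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        if value:
            prefs.append((value, q))
    # sorted() is stable, so equal q values keep the order sent by the client.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_variant(header, available):
    """Pick the most preferred variant the server can serve, or None."""
    for value, q in parse_accept(header):
        if q > 0 and value in available:  # q=0 means "not acceptable"
            return value
    return None


chosen = choose_variant("da, en-gb;q=0.8, en;q=0.7", ["en", "it"])
print(chosen)
```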

Page 26: Web technologies: HTTP

Cache management

• Goal: minimize network traffic and bandwidth usage
• Mechanism: storing a duplicate of the resource in a location closer to the client and serving that in response to a request
• Semantic transparency:
  – The client must be unaware of the cache
  – A warning must be given to the client if the duplicate may be out of sync with the original resource

Page 27: Web technologies: HTTP

Cache operations

• Expiration
  – The server can declare how long a resource remains valid (Cache-Control and Expires headers)
  – Requires computing the age of a resource (in the Age header) in the presence of time zones, clock differences, and multiple responses
• Validation
  – The cache can check the validity of an expired copy (e.g., based on Date and Last-Modified time, or on explicit entity tags, i.e., version control numbers)
  – Requires conditional requests and validation headers
  – May produce the Warning general-header, when the response contains a possibly stale entity
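
The two operations can be sketched with plain functions. Expiration compares the age of the copy with its max-age; validation sends a conditional request carrying the cached validators, and the origin server answers 304 (Not Modified) if the copy is still current. All names and sample values here are hypothetical, and the origin server is reduced to a single comparison of entity tags.

```python
def is_fresh(age, max_age):
    """Expiration: a copy is fresh while its age is below max-age (seconds)."""
    return age < max_age


def validation_headers(cached):
    """Validation: build the conditional request headers from the cached copy."""
    headers = {}
    if "etag" in cached:
        headers["If-None-Match"] = cached["etag"]
    if "last_modified" in cached:
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def origin_reply(current_etag, request_headers):
    """Toy origin server: 304 if the entity tag still matches, else 200."""
    return 304 if request_headers.get("If-None-Match") == current_etag else 200


cached = {
    "etag": '"v1"',
    "last_modified": "Tue, 15 Nov 1994 12:45:26 GMT",
    "age": 4000,      # seconds spent in the cache
    "max_age": 3600,  # from Cache-Control: max-age=3600
}

if not is_fresh(cached["age"], cached["max_age"]):
    conditional = validation_headers(cached)
    print(conditional)
    print(origin_reply('"v1"', conditional))  # copy still valid: reuse it
    print(origin_reply('"v2"', conditional))  # resource changed: fetch again
```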

Page 28: Web technologies: HTTP

References

• HTTP/1.0: Tim Berners-Lee, Request for Comments 1945, HTTP/1.0
• HTTP/1.1: Internet Draft <draft-ietf-http-v11-spec-rev-06> (November 18, 1998) http://www.w3.org/Protocols/History.html#HTTP11
• HTTP status codes: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
• HTTP intro: http://jmarshall.com/easy/http/
• Web info: http://www.webopedia.com