HTTP Here, we examine the hypertext transfer protocol (http) – originally introduced around 1990 but not standardized until 1997 (version 1.0) – protocol

HTTP
• Here, we examine the hypertext transfer protocol (HTTP)
  – originally introduced around 1990 but not standardized until 1997 (version 1.1; version 1.0 had been documented in 1996)
  – the protocol permits the transfer of hypertext documents
• The request is usually generated by clicking on a hyperlink in a browser
  – the server responds to the request and sends back the requested document

[Figure: a PC running Explorer and a Mac running Navigator each send an HTTP request to a server running the Apache web server, which returns an HTTP response to each]

Different types of machines can request the same resource

Apache is just one of many web servers
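The request/response exchange in the figure can be reproduced on a single machine. The following is a minimal sketch using only Python's standard library: a tiny server (standing in for Apache) answers a GET sent by a client (standing in for the browser). The handler class, the page contents, and the use of the loopback address 127.0.0.1 are illustrative assumptions.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<html><body><h1>Hello</h1></body></html>"  # illustrative document


class HelloHandler(BaseHTTPRequestHandler):
    """Plays the web-server role: answers every GET with the same page."""

    def do_GET(self):
        self.send_response(200)  # also adds Server and Date headers
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()       # blank line ends the response header
        self.wfile.write(PAGE)   # the requested document follows

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch(path="/index.html"):
    """Start the server, send one GET as the client, return the response."""
    # Port 0 lets the operating system pick any free port on loopback.
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)   # the HTTP request
        resp = conn.getresponse()   # the HTTP response
        result = (resp.status, resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, content_type, body = fetch()
print(status, content_type, body.decode())
```

Any client that speaks HTTP (a browser, curl, nc) could replace the http.client half; the server does not care what kind of machine sent the request.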

Some Definitions
• Client – the machine requesting a resource
  – often through a web browser
• Server – the machine that responds to requests and transfers documents to fulfill them
  – usually a dedicated machine running some server software
• Request – a message, sent from client to server, that contains an HTTP method (we will cover these shortly)
• Response – the requested document/file along with a message, or the message alone if the document/file does not exist or the request was ill-formed or not understood
• Header – both the request and the response carry headers
  – headers are usually not visible to the user
  – a request header starts with the method (e.g., GET), the resource requested, and the protocol/version
  – we explore headers in more detail in a few slides
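To make the structure of a request concrete, here is a sketch that assembles one as plain text and then splits its first line back into method, resource, and protocol/version. The helper names and the host www.example.com are hypothetical; the CRLF line endings, the blank line that ends the header, and the Host header (required in HTTP/1.1) follow the protocol.

```python
CRLF = "\r\n"  # HTTP lines end with carriage return + line feed


def build_request(method, path, host, headers=None, version="HTTP/1.1"):
    """Assemble a request: request line, Host, other headers, blank line."""
    lines = [f"{method} {path} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The empty line (CRLF CRLF) tells the server the header is complete.
    return CRLF.join(lines) + CRLF + CRLF


def parse_request_line(message):
    """Split the first line into (method, resource, protocol/version)."""
    first_line = message.split(CRLF, 1)[0]
    method, resource, version = first_line.split(" ")
    return method, resource, version


request = build_request("GET", "/index.html", "www.example.com",
                        {"User-Agent": "demo-client/0.1", "Accept-Language": "en"})
print(request)
print(parse_request_line(request))
```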

HTTP Methods
• The method is the action that the client wishes the server to perform
  – GET – request a resource, to be displayed in the web browser (if possible, else saved to disk)
  – Conditional GET adds a condition header such as
    • If-Modified-Since – comes with a specified date; the server returns the requested item only if it has been modified since that date
    • If-Unmodified-Since – the opposite test: return the item only if it has not been modified since that date
    • If-Match – comes with one or more entity tags (ETags); the server returns the resource only if its current ETag matches one of them
    • If-None-Match – return the resource only if its current ETag matches none of those given
    • If-Range – if the resource is unchanged (judged by an ETag or date), return only the part asked for in the Range header; otherwise return the whole resource
  – For example:
    • GET /index.html HTTP/1.1
    • If-Modified-Since: Mon, 11 Jan 2010 12:30:15 GMT
  – GET is the most common method
  – Conditional GETs are used to avoid spending server time and Internet bandwidth when resending the resource is not necessary/desired
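The server's side of an If-Modified-Since check reduces to a single decision. This sketch assumes the server already knows the resource's last-modification time as a timezone-aware datetime; the function name is hypothetical. When the resource has not changed, the server answers 304 (Not Modified) with no body, which is where the savings come from.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 200 (send the resource) or 304 (Not Modified, no body).

    last_modified: timezone-aware datetime of the resource's last change.
    if_modified_since: raw header value from the request, or None.
    """
    if if_modified_since is None:
        return 200  # ordinary GET: always send the resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    # HTTP dates carry whole seconds only, so drop any sub-second part.
    if last_modified.replace(microsecond=0) > since:
        return 200  # modified after the given date: send it
    return 304      # unchanged: the client's copy is still good


header = "Mon, 11 Jan 2010 12:30:15 GMT"
changed = datetime(2010, 1, 12, 9, 0, 0, tzinfo=timezone.utc)
unchanged = datetime(2009, 12, 1, 8, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status(changed, header))    # 200
print(conditional_get_status(unchanged, header))  # 304
```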

Other Methods
• HEAD – return the header portion only, not the actual page
• PUT – used to upload a page (or other content) to the given URL, creating or replacing it
  – must be sent with the content to be uploaded
  – can only be used if either the user has been authenticated or the server does not require authentication (allowing PUT without authentication would be a security flaw)
• POST – like PUT, sends data to the server, but the data is handed to the target resource to process rather than replacing it
  – for example, to append a message to a bulletin/posting board or add a record to a database
• OPTIONS – queries the web server to find out what methods are available for use
• DELETE – used to delete the specified resource
• TRACE – used for troubleshooting: the server echoes back the request it received, revealing any changes made by intermediaries along the route
• CONNECT – used with a proxy server to open a tunnel through it (e.g., for HTTPS)
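As an illustration of how HEAD, OPTIONS, and an unsupported method might be handled, the sketch below assumes a hypothetical read-only server that permits only GET, HEAD, and OPTIONS. It returns (status, headers, body) tuples instead of talking to a real socket. A 405 (method not allowed) response is expected to carry an Allow header listing the permitted methods.

```python
PAGE = b"<html><body>Hello</body></html>"   # illustrative resource
ALLOWED = ("GET", "HEAD", "OPTIONS")         # assumption: read-only server


def handle(method):
    """Return (status, headers, body) for one request to PAGE."""
    allow = ", ".join(ALLOWED)
    if method == "OPTIONS":
        return 200, {"Allow": allow}, b""   # just report what is supported
    if method not in ALLOWED:
        # PUT, POST, DELETE, ... are refused; 405 must say what IS allowed.
        return 405, {"Allow": allow}, b""
    headers = {"Content-Type": "text/html", "Content-Length": str(len(PAGE))}
    if method == "HEAD":
        return 200, headers, b""            # GET's headers, without the page
    return 200, headers, PAGE


for m in ("GET", "HEAD", "OPTIONS", "DELETE"):
    status, headers, body = handle(m)
    print(m, status, headers, len(body))
```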

Headers
• The header is a portion of the message transmitted
  – in a request, the header is the request itself (apart from any content uploaded with POST or PUT)
  – in a response, it precedes the resource being returned
• Request headers will include
  – the method, resource location, protocol and version
  – host name
  – user agent (browser) if sent by a browser, including the browser version and preferred language (e.g., English)
  – what form(s) of encoding are preferred
  – how long the connection should be kept open (Keep-Alive)
• Response headers will include
  – protocol and version, status of the request (see the Status Codes slide)
  – date/time
  – server name
  – last modification date/time
  – content-type
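The response-header fields listed above can be assembled into a status line plus header lines. This is a sketch: the server name is made up, and the Last-Modified timestamp was chosen to match the one in the example response in these slides. Python's email.utils.formatdate with usegmt=True produces dates in the format HTTP uses.

```python
from email.utils import formatdate


def build_response_head(status, reason, headers, version="HTTP/1.1"):
    """Status line, one 'Name: value' line per header, then a blank line."""
    lines = [f"{version} {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


head = build_response_head(200, "OK", {
    "Date": formatdate(usegmt=True),                      # now, in HTTP date format
    "Server": "ExampleServer/0.1",                        # hypothetical name
    "Last-Modified": formatdate(981490588, usegmt=True),  # 2001-02-06 20:16:28 UTC
    "Content-Type": "text/html",
})
print(head)
```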

More on Headers
• Four classes of headers: general, request, response, and entity
  – general headers apply to both requests and responses and include
    • Connection – indicates whether the TCP connection should close at the end of the request or response or be persistent (the default in HTTP/1.1)
    • Date – date/time when the message was sent
    • Transfer-Encoding – what type of encoding, if any, has been applied to the message for transfer
    • Warning – additional information about the status of the message
  – request headers are sent when a browser makes a request of a server and may contain the following
    • Accept – what types of media are acceptable to the client, given as MIME types, e.g., text/html, image/png, etc.
    • Accept-Charset – what character sets are acceptable
    • Accept-Encoding – what types of encoding *can* be applied
    • Accept-Language – what language(s) is (are) preferred
    • From (the user's e-mail address) and Host (the server's host name)
    • Conditions – If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, and Range
    • User-Agent – the type of browser

Continued
• Response headers – sent by the server to the requester (which may be a proxy server, a web browser, another program such as a web crawler, or a command issued via nc or curl, for instance) and may contain
  – Accept-Ranges – whether the server accepts range requests for the resource (e.g., bytes)
  – ETag – an identifier for the current version of the resource (Apache, for example, has traditionally generated it from the file's inode, size, and modification time)
  – Server – information about the server (web server software and version, platform)
• Entity headers – describe the document (entity) carried in the message, e.g., a document being returned or one being sent via POST, PUT, etc.
  – Allow – lists the set of methods available for the resource
  – Content-Encoding, Content-Language, Content-Length, Content-Range, Content-Type – information about the document being sent
  – Last-Modified – if the item being sent already existed, when it was last modified
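Servers are free to choose how they build an ETag. Apache has traditionally combined inode, size, and modification time; the sketch below swaps in a different, commonly used technique, hashing the content, and then checks a client's If-None-Match header against it. The function names and the 16-hex-digit truncation are illustrative assumptions.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Assumption: tag = first 16 hex digits of the SHA-256 of the bytes."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag  # weak comparison


def if_none_match_status(content: bytes, if_none_match=None) -> int:
    """For a GET: 304 if any tag the client sent matches, otherwise 200."""
    if if_none_match is None:
        return 200
    tags = {_opaque(t) for t in if_none_match.split(",")}
    if "*" in tags or make_etag(content) in tags:
        return 304
    return 200


v1 = b"<html>version 1</html>"
etag = make_etag(v1)  # sent to the client in the ETag response header
print(etag)
print(if_none_match_status(v1, etag))                          # 304
print(if_none_match_status(b"<html>version 2</html>", etag))   # 200
```

Hashing means two servers holding the same file produce the same tag, which an inode-based tag does not guarantee; the cost is reading the whole file to compute it.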

Examples
Example GET header:
GET / HTTP/1.1
Host: www.alcpress.com
User-agent: Mozilla/5.0...
Accept: */*
Accept-Language: en
Accept-Encoding: gzip,deflate,compress,identity
Keep-Alive: 300
Connection: keep-alive

Example response from a GET request:
HTTP/1.1 200 OK
Date: Tue, 07 Aug 2001 23:06:18 GMT
Server: Apache 1.3.20
Cache-Control: max-age=604800
Expires: Tue, 14 Aug 2001 23:06:18 GMT
Last-Modified: Tue, 06 Feb 2001 20:16:28 GMT
Etag: 1033e-607-3a7fd5d0
Accept-Ranges: bytes
Content-Length: 2357
Keep-Alive: timeout=15, max=100
Connection: keep-alive
Content-Type: text/html

[data]
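The example response can also be read programmatically. The sketch below copies a few of its header lines, parses them into a dictionary, and checks that Date plus the Cache-Control max-age (604800 seconds, i.e., seven days) equals the Expires time, which is how a cache would judge how long the page stays fresh. The parser is deliberately simplified (no folded or repeated headers) and its function names are hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# A few header lines copied from the example response.
RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 07 Aug 2001 23:06:18 GMT\r\n"
    "Server: Apache 1.3.20\r\n"
    "Cache-Control: max-age=604800\r\n"
    "Expires: Tue, 14 Aug 2001 23:06:18 GMT\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)


def parse_response_head(raw):
    """Return (version, status, reason, headers) with lower-cased header names."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers


def max_age(cache_control):
    """Extract the max-age directive (seconds) from Cache-Control, if any."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value)
    return None


version, status, reason, headers = parse_response_head(RAW)
sent = parsedate_to_datetime(headers["date"])
fresh_until = sent + timedelta(seconds=max_age(headers["cache-control"]))
print(status, reason, fresh_until)
print(fresh_until == parsedate_to_datetime(headers["expires"]))  # True
```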

Status Codes
• See Appendix A for the complete list
  – 100 codes – informational
    • 100 – continue, 101 – switching protocols
  – 200 codes – success
    • 200 – request succeeded, 201 – resource created, 202 – request accepted (processing not yet complete), 204 – request succeeded but no content sent back, 205 – reset content
  – 300 codes – redirection (the URL is redirected to a different resource)
    • 300 – multiple choices, 301 – resource permanently moved, 302 – resource temporarily moved, 305 – use proxy
  – 400 codes – client error codes
    • 400 – bad request, 401 – unauthorized, 402 – payment required, 403 – forbidden, 404 – not found, 405 – method not allowed, 406 – not acceptable, 408 – request timeout, 410 – gone
  – 500 codes – server error codes
    • 500 – internal server error, 501 – not implemented, 503 – service unavailable, 504 – gateway timeout
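Python's standard library already knows the registered reason phrases, so the classes above can be checked in a few lines. The describe helper and its wording are illustrative; http.HTTPStatus raises ValueError for a code it does not know.

```python
from http import HTTPStatus

CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe(code):
    """E.g. 404 -> '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase  # ValueError for unregistered codes
    return f"{code} {phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 404, 503):
    print(describe(code))
```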

URLs
• The URL is the specification of the resource
  – [protocol:]//host[:port][path/file][?query]
    • protocol is typically http but could be https, ftp, or another
    • port defaults to 80 for http (443 for https) but can be overridden, for instance if the client knows that a different port should be used to fulfill the given request
    • path specifies where to look in the web server's document space; servers may have defaults if the file is omitted (e.g., index.html, index.php, index.cgi)
    • query passes parameters to the server, e.g., to select a given database record
  – URI is a more generic form of identifier, used in the semantic web (the book will use URL & URI interchangeably)
  – apart from reserved characters that have a special meaning (such as :, /, ?, #, &, =), URLs consist only of letters, digits, $, -, _, ., +, !, *, ', ( and ); any other character must be percent-encoded (e.g., a space becomes %20)
  – URLs may be case sensitive (true for Linux/Unix servers, not necessarily true for Windows servers)
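The parts of a URL can be pulled apart with urllib.parse.urlsplit, and characters outside the allowed set can be percent-encoded with urllib.parse.quote. In this sketch the example URLs and the describe_url helper are hypothetical, and the default-port table covers only http and https.

```python
from urllib.parse import quote, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # only the two schemes used here


def describe_url(url):
    """Break a URL into the protocol, host, port, path and query parts."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # No explicit :port means the protocol's default port.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
        "query": parts.query,
    }


print(describe_url("http://www.example.com/catalog/item.php?id=42"))
print(describe_url("http://www.example.com:8080/index.html")["port"])
print(quote("my page.html"))  # a space is not allowed, so it becomes %20
```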

Negotiation
• In some cases, a request does not precisely match a resource, in which case negotiation may take place
  – Language negotiation – if a file exists in multiple languages and the client has specified a preference, the server will respond with the document in the most preferred language available
    • Accept-Language: de, en-us;q=0.7,en;q=0.3
    • request German first; if that is not available, then American English, and finally any other English (each q value, from 0 to 1, gives a relative preference; a missing q means 1)
  – Content negotiation – the client states its preferred types as a prioritized list of MIME types
    • Accept: image/png,image/jpeg;q=0.8,image/gif;q=0.5
  – Content coding – lists what type(s) of encoding can be used to help reduce the message traffic over the Internet
    • These may include gzip (or x-gzip), compress (or x-compress), deflate and identity (no encoding)
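Preference lists with q values can be handled by one routine for Accept-Language, Accept, and Accept-Encoding alike. This sketch (hypothetical function names) sorts by q and picks the first variant the server has. It uses exact, case-insensitive matching only, whereas real servers also let en match en-us and handle wildcards such as image/*.

```python
def parse_q_list(header):
    """Parse e.g. 'de, en-us;q=0.7, en;q=0.3' into [(value, q), ...], best first."""
    items = []
    for part in header.split(","):
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # a missing q means q=1
        for param in params:
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                q = float(number)
        if value:
            items.append((value.lower(), q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose(header, available):
    """Pick the most preferred variant the server actually has (or None).

    available: set of lower-case variant names the server can send.
    """
    for value, q in parse_q_list(header):
        if q > 0 and value in available:
            return value
    return None


print(choose("de, en-us;q=0.7,en;q=0.3", {"en", "en-us"}))
print(choose("image/png,image/jpeg;q=0.8,image/gif;q=0.5", {"image/gif", "image/jpeg"}))
```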

Other Topics
• Caching – to reduce Internet traffic, caching can take place in three different locations
  – web browser (client) caching
  – server caching
  – proxy caching
    • we cover proxy caching in chapter 11
• Cookies – HTTP is a stateless form of communication: each request is handled independently, so the protocol itself keeps no record of what is going on in the ongoing exchange
  – a cookie is a small piece of data, stored by the browser (traditionally in a file), that records the state (e.g., login/session information, preferred pages, contents of shopping carts) and is sent back to the server with later requests
  – since cookie information is meant to be transmitted to a server, cookies can represent security holes – what if a cookie is set up by server1 but server2 asks for that information? Cookies can also violate privacy
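Python's http.cookies module can show both halves of a cookie exchange: the server emits a Set-Cookie response header, and the client returns the name=value pair in a Cookie request header. The cookie name cart and its value are hypothetical. HttpOnly is a standard attribute that keeps page scripts from reading the cookie, one small mitigation of the security concerns above.

```python
from http.cookies import SimpleCookie

# Server side: record some state in a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["cart"] = "3-items"          # hypothetical cookie name and value
outgoing["cart"]["path"] = "/"        # send it back for every path on the site
outgoing["cart"]["httponly"] = True   # hide the cookie from page scripts
set_cookie_header = outgoing.output() # one "Set-Cookie: ..." line
print(set_cookie_header)

# Client side: on a later request the browser sends back only name=value.
incoming = SimpleCookie()
incoming.load("cart=3-items")         # value of the request's Cookie header
print(incoming["cart"].value)
```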