1
Lecture #7-8: HTTP – HyperText Transfer Protocol
HAIT
Summer 2005
Shimrit Tzur-David
2
Common Protocols
• In order for two remote machines to “understand” each other they should
– “speak the same language”
– coordinate their “talk”
• The solution is to use protocols
• Examples:
– FTP – File Transfer Protocol
– SMTP – Simple Mail Transfer Protocol
– HTTP – HyperText Transfer Protocol
3
[Figure: HTTP requests and HTTP responses passing between a client, a proxy server, and the Web server www.cs.huji.ac.il:80 (backed by its file system) for the URL http://www.cs.huji.ac.il/~dbi]
4
[Figure: a chain of proxies (department proxy server, university proxy server, Israel proxy server) between the client and the Web server www.w3.org:80]
5
Terminology
• User agent: client which initiates a request (browser, editor, Web robot, …)
• Origin server: the server on which a given resource resides (Web server a.k.a. HTTP server)
• Proxy: acts as both a server and a client
6
Resources
• A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator)
• A resource can be
– A file
– A dynamically created page
• What we see in the browser can be a combination of several resources
7
Uniform Resource Locator
protocol://host:port/path?parameters#anchor
http://www.cs.huji.ac.il/~dbi/index.html#info
http://www.google.com/search?hl=en&q=blabla
• There are other types of URLs
– mailto:<account@site>
– news:<newsgroup-name>
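As a rough illustration of the URL parts named on this slide, the sketch below splits the two example URLs with Python's standard urllib.parse module. The helper name describe_url and the dictionary keys are invented for this example; they only mirror the slide's vocabulary.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into the parts named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,                    # None when the URL names no port
        "path": parts.path,
        "parameters": parse_qs(parts.query),   # each value is a list of strings
        "anchor": parts.fragment,
    }


if __name__ == "__main__":
    print(describe_url("http://www.cs.huji.ac.il/~dbi/index.html#info"))
    print(describe_url("http://www.google.com/search?hl=en&q=blabla"))
```

When the port is missing, as in both examples, an HTTP client falls back to the default port 80.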
8
In a URL
• In the parameters part, spaces are represented by “+”
• Characters such as &, +, % are encoded in the form “%xx”, where xx is the ASCII value in hexadecimal; for example, “&” = “%26”
• The inputs to the parameters are given as a list of pairs of a parameter and a value:
var1=value1&var2=value2&var3=value3
9
Search query typed into Google: war&peace Tolstoy
10
http://www.google.com/search?hl=en&q=war%26peace+Tolstoy
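The encoding rules can be tried out with Python's standard urllib.parse module. This is only a sketch: the parameter names hl and q come from the example URL, and the variable names are arbitrary.

```python
from urllib.parse import urlencode, parse_qs

# Parameter/value pairs as they appear in the example URL.
params = {"hl": "en", "q": "war&peace Tolstoy"}

# urlencode joins the pairs with "&", turns spaces into "+",
# and escapes reserved characters such as "&" as "%xx".
query = urlencode(params)
url = "http://www.google.com/search?" + query

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(query)

if __name__ == "__main__":
    print(url)
    print(decoded)
```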
11
Web Servers
• A Web server is an implementation of HTTP
– It runs on some machine
• Serving dynamic Web content requires some server-side programming
• The programmer must understand HTTP, and the code must manipulate HTTP messages
12
Important Features of HTTP
• Persistent connection (in HTTP 1.1)
• Stateless
• Proxy caching
• Content negotiation
– For example, the client and server can agree on a gzip encoding of the HTML page
13
An HTTP 1.0 Session
• A basic HTTP session has four phases:
1. Client opens the connection (a TCP connection)
2. Client makes a request
3. Server sends a response
4. Server closes the connection
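The four phases can be sketched with a raw socket. To keep the example self-contained, it talks to a tiny local server built from Python's http.server module (which answers in HTTP/1.0 style by default) rather than to a real site; HelloHandler, http10_session and run_demo are names invented for this illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for a Web server; it speaks HTTP/1.0 by default."""

    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def http10_session(host, port, path="/"):
    # Phase 1: the client opens a TCP connection.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Phase 2: the client makes a request (request line, then a blank line).
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        # Phase 3: the server sends a response; read until the stream ends.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # Phase 4: the server has closed the connection.
            chunks.append(data)
    return b"".join(chunks)


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return http10_session("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo().decode("latin-1"))
```

The client never inspects Content-Length here: in an HTTP 1.0 session the end of the response is simply the moment the server closes the connection.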
14
Nesting in Page
[Figure: Index.html is split into a left frame and a right frame; the embedded images are “Jumping fish”, “Fairy icon” and “HUJI icon”]
What we see on the browser can be a combination of several resources
15
Persistent Connections in HTTP 1.1
• If a page has 10 inline images, then 11 HTTP 1.0 sessions are needed to display the page completely in a browser
– Each session requires opening a new TCP/IP connection
• In HTTP 1.1, one persistent TCP/IP connection is sufficient
– It takes less time to see the whole page
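A minimal sketch of connection reuse, assuming a local test server instead of a real site: the handler below opts into HTTP/1.1, and the client fetches several resources over one http.client.HTTPConnection. KeepAliveHandler and fetch_all are hypothetical names, and the resource paths are made up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        body = f"resource {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends,
        # so the same connection can carry the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_all(paths):
    """Fetch every path over one connection; return bodies and socket count."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    bodies, local_ports = [], set()
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            bodies.append(response.read().decode("ascii"))
            # The client-side port stays the same while the TCP connection is reused.
            local_ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies, len(local_ports)


if __name__ == "__main__":
    print(fetch_all(["/index.html", "/a.gif", "/b.gif"]))
```

With HTTP 1.0 sessions, each of the three fetches would have needed its own TCP connection.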
16
Stateless Protocol
• HTTP is a stateless protocol
– Once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is persistent)
• Server-side programming tools must provide a mechanism for maintaining state
17
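The Format of HTTP Requests and Responses
• An initial line
– In a request, the initial line names the method (followed by the URL and the HTTP version)
– In a response, the initial line carries the status code (preceded by the HTTP version and followed by a reason phrase)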
• Zero or more header lines
• A blank line, and
• An optional message body (e.g., a file, query data, or query output)
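The message layout above (initial line, header lines, blank line, optional body) can be sketched as a small parser. This is a simplification for illustration, not a complete HTTP parser: parse_message is a made-up name, repeated headers overwrite each other, and the body is returned untouched.

```python
def parse_message(raw):
    """Split a raw HTTP message into (initial line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 5\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"hello"
    )
    print(parse_message(raw))
```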
18
Headers
• HTTP 1.0 defines 16 headers
– None are required
• HTTP 1.1 defines 46 headers
– One header (Host:) is required in requests that are sent to Web servers
– A response does not have to include any header
How do we know who is the host when there is no host header?
19
Sending a Request
> telnet www.cs.huji.ac.il 80
> GET /~dbi/index.html HTTP/1.0
[blank line]
20
The Response
HTTP/1.1 200 OK
Date: Sun, 11 Mar 2001 21:42:15 GMT
Server: Apache/1.3.9 (Unix)
Last-Modified: Sun, 25 Feb 2001 21:42:15 GMT
Content-Length: 479
Content-Type: text/html

<html> (html code …)</html>
21
GET /~dbi/index.html HTTP/1.0
HTTP/1.1 200 OK
HTML code
22
GET /~dbi/no-such-page.html HTTP/1.0
HTTP/1.1 404 Not Found
HTML code
23
GET /index.html HTTP/1.1
HTTP/1.1 400 Bad Request
HTML code
Why is it a Bad Request?
Because it is an HTTP/1.1 request without a Host header
24
HTTP Requests
25
The Format of a Request
method sp URL sp version cr lf          (request line)
header : value cr lf                    (header lines)
...
header : value cr lf
cr lf
Entity Body
26
Request Example
GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: www.cs.huji.ac.il:80 [CRLF]
Connection: Keep-Alive [CRLF]
[CRLF]
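A minimal sketch of how such a request could be assembled byte for byte. build_request is a hypothetical helper, not part of any library; it also enforces the rule that an HTTP/1.1 request carries a Host header.

```python
def build_request(method, url, headers, version="HTTP/1.1", body=b""):
    """Assemble a raw HTTP request: request line, headers, blank line, body."""
    has_host = any(name.lower() == "host" for name in headers)
    if version == "HTTP/1.1" and not has_host:
        raise ValueError("HTTP/1.1 requests must carry a Host header")
    lines = [f"{method} {url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an extra CRLF (the blank line) ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    request = build_request(
        "GET",
        "/index.html",
        {
            "Accept": "image/gif, image/jpeg",
            "User-Agent": "Mozilla/4.0",
            "Host": "www.cs.huji.ac.il:80",
            "Connection": "Keep-Alive",
        },
    )
    print(request.decode("ascii"))
```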
27
Request Example
GET /index.html HTTP/1.1
User-Agent: Mozilla/4.0
Host: www.cs.huji.ac.il:80
Connection: Keep-Alive
[blank line here]
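The first line consists of the method (GET), the request URL (/index.html) and the version (HTTP/1.1); the lines that follow it are the headers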
28
Common Request Methods
• GET returns the contents of the indicated URL
• HEAD returns the header information for the indicated URL
– Useful for finding out info about a URL without actually retrieving it (less time)
• POST treats the URL as an application and sends some data to it
– Could be used to process a form
29
GET Request
• A request to get a resource from the Web
• The most frequently used method
• The request has no message body, but parameters can be sent in the request URL
30
HEAD Request
• A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)
• This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth
• Can be used for testing hypertext links for validity, accessibility and recent modification
31
POST Request
• A POST request can send data to the server
• POST is mostly used in form-filling
– The data filled into the form are translated by the browser into some special format and sent to a program on the server using the POST command
32
POST Request (cont.)
• There is a block of data sent with the request, in the message body
• There are usually extra headers to describe this message body, like Content-Type: and Content-Length:
• The request URL is a URL of a program to handle the sent data, not a file
• The HTTP response is normally the output of a program, not a static file
33
POST Example
• Here's a typical form submission, using POST:
POST /path/register.cgi HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35

home=Ross+109&favorite+flavor=flies
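The body and the Content-Length value of such a submission can be derived from the form fields. The sketch below uses Python's urllib.parse.urlencode; build_form_post is a hypothetical helper, and the field values are taken from the example above.

```python
from urllib.parse import urlencode


def build_form_post(path, fields, extra_headers=None):
    """Build a raw HTTP/1.0 POST request carrying URL-encoded form data."""
    body = urlencode(fields).encode("ascii")
    headers = dict(extra_headers or {})
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    headers["Content-Length"] = str(len(body))  # size of the body in bytes
    head = [f"POST {path} HTTP/1.0"]
    head += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the message body.
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body


if __name__ == "__main__":
    message = build_form_post(
        "/path/register.cgi",
        {"home": "Ross 109", "favorite flavor": "flies"},
        {"User-Agent": "HTTPTool/1.0"},
    )
    print(message.decode("ascii"))
```

The encoded body is 35 bytes long, which matches the Content-Length header in the example.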
34
HTTP 1.1 Request Headers
• The common request headers of HTTP 1.1 are described in the following slides
– Accept
– Accept-Encoding
– Authorization
– Connection
– Cookie
– Host
– If-Modified-Since
– Referer
– User-Agent
35
Accept Request Headers
• Accept
– Specifies the MIME types that the client can handle (e.g., text/html, image/gif)
– Server can send different content to different clients
• Accept-Encoding
– Indicates encodings (e.g., gzip) client can handle
36
Authorization Request Header
• Authorization
– User identification for password-protected pages
– Instead of HTTP authorization, use HTML forms to send username/password and store in state (e.g., session object)
37
Connection Request Header
• Connection
– Connection: keep-alive means that the browser can handle persistent connections
– Keep-alive is the default in HTTP 1.1
– In a persistent connection, the server can reuse the same socket over again for requests that are very close together from the same client
– Connection: close means that the connection is closed after each request
38
Content-Length Request Header
• This header applies to requests that carry a message body, typically POST requests
• It specifies the size of the POST data in bytes
39
Cookie Request Header
• Returns to the server the cookies that it previously sent to the client
40
Host Request Header
• Indicates host and port as given in the original URL
– Required in HTTP 1.1
41
If-Modified-Since Request Header
• This header indicates that the client wants the page only if it has been changed after the specified date
• If-Unmodified-Since is the reverse of If-Modified-Since
– It is used for PUT requests (“update this document only if nobody else has changed it since I generated it”)
42
Referer Request Header
• URL of referring Web page
• Useful for tracking traffic
• It is logged by many servers
• Can be easily spoofed
• Note the spelling error – correct spelling is Referrer, but use Referer
43
User-Agent Request Header
• The value of this header is a string identifying the browser making the request
• Use sparingly
• Again, can be easily spoofed
44
HTTP Responses
45
The Format of a Response
version sp status code sp phrase cr lf   (status line)
header : value cr lf                     (header lines)
...
header : value cr lf
cr lf
Entity Body
46
The Initial Line of a Response
• The initial line of a response is also called the status line
• The initial line consists of
– HTTP version
– response status code
– reason phrase that describes the status code
47
Response Example
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Hello World</h1>
(more file contents) . . .
</body>
</html>
48
Response Example
HTTP/1.0 200 OK                        (version, status code, reason phrase)
Date: Fri, 31 Dec 1999 23:59:59 GMT    (headers)
Content-Type: text/html
Content-Length: 1354

<html>                                 (message body)
<body>
<h1>Hello World</h1>
(more file contents) . . .
</body>
</html>
49
Status Codes in Responses
• The status code is a three-digit integer, and the first digit identifies the general category of response:
– 1xx indicates an informational message
– 2xx indicates success of some kind
– 3xx redirects the client to another URL
– 4xx indicates an error on the client's part
   (yes, the system blames it on the client if a resource is not found, i.e., 404)
– 5xx indicates an error on the server's part
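The first-digit rule can be sketched in a few lines. status_category is a made-up helper and its category labels paraphrase the slide; the reason phrases come from the HTTPStatus enum in Python's standard http module.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code):
    """Return the general category of a three-digit status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CATEGORIES[code // 100]  # the first digit selects the category


if __name__ == "__main__":
    for code in (200, 302, 404, 503):
        print(code, HTTPStatus(code).phrase, "->", status_category(code))
```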
50
Status Codes 1xx
• The 100 (Continue) Status
– Allows a client to determine if the server is willing to accept the request (based on the request headers) before the client sends the request body
– The client’s request must have the header
Expect: 100-continue
What is it good for?
51
Status Codes 2xx
Status codes 2xx – Success
• The action was successfully received, understood, and accepted
• Usually upon success a status code 200 and a message OK are sent
• This is the default
52
More 2xx Codes
• 201 (Created)
– Location header gives the URL
• 202 (Accepted)
– Processing is not yet complete
• 204 (No Content)
– Browser should keep displaying previous document
53
More 2xx Codes
• 205 (Reset Content)
– No new document, but the browser should reset the document view
– It is used to force browsers to clear fields of forms
– New in HTTP 1.1
54
Status Codes 3xx
Status codes 3xx – Redirection
• Further action must be taken in order to complete the request
• The client is redirected to get the resource from another URL
55
More 3xx Codes
• 301 – Moved Permanently
– The new URL is given in the Location header
– Browsers should automatically follow the link to the new URL
• 302 – Moved Temporarily
– In HTTP 1.1 “Found” instead of “Moved Temporarily” (but “Moved Temporarily” is still used)
– Similar to 301, except that the URL given in the Location header is temporary
– Most browsers treat 301 and 302 in the same way
56
More 3xx Codes
• 303 – See Other
– Similar to 301 and 302, except that if the original request was POST, the new document (given in the Location header) should be retrieved with GET
– New in HTTP 1.1
57
More 3xx Codes
• 304 – Not Modified
– This is a response to the If-Modified-Since request header, indicating that the page has not changed since the given date
– If the page has been modified, then it should be returned with a 200 (OK) status code
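The server-side decision between 200 and 304 can be sketched as a date comparison. conditional_status is a hypothetical helper; it assumes both dates use the HTTP date format with a GMT zone, and it parses them with email.utils.parsedate_to_datetime from the Python standard library.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Pick the status code for a GET that may carry If-Modified-Since."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the page
    modified = parsedate_to_datetime(last_modified)
    since = parsedate_to_datetime(if_modified_since)
    # Send the page only if it changed after the date the client named.
    return 200 if modified > since else 304


if __name__ == "__main__":
    page_date = "Sun, 25 Feb 2001 21:42:15 GMT"
    print(conditional_status(page_date, "Sun, 11 Mar 2001 21:42:15 GMT"))
    print(conditional_status(page_date))
```

A 304 response carries no message body, so the client keeps using its cached copy.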
58
More 3xx Codes
• 307 – Temporary Redirect
– New URL is given in the Location header
– Only GET requests, not POST requests, should follow the new URL automatically
– In 303 (See Other), both GET and POST requests follow the new URL
– New in HTTP 1.1
59
Status Codes 4xx
Status codes 4xx – Client error
• The request contains bad syntax or cannot be fulfilled
404 File not found
60
4xx Codes
• 400 – Bad Request
– Syntax error in the request
• 401 – Unauthorized
• 403 – Forbidden
– “permission denied” to the server to access the page
• 404 – Not Found
61
Status Codes 5xx
Status codes 5xx – Server error
• The server failed to fulfill an apparently valid request
For example, 502 Bad Gateway
62
5xx Codes
• 500 – Internal Server Error
• 501 – Not Implemented
• 502 – Bad Gateway
• 503 – Service Unavailable
– The response may include a Retry-After header to indicate when the client might try again
• 505 – HTTP Version Not Supported
– New in HTTP 1.1
63
Response Headers
64
The Purposes of Response Headers
• Give forwarding location
• Specify cookies
• Supply the page modification date
• Instruct the browser to reload the page after a designated interval
• Give the document size so that persistent (keep-alive) connection can be used
• Designate the type of document being generated
• Etc.
65
Cache-Control (1.1) and Pragma (1.0) Response Header
• A no-cache value prevents proxies and browsers from caching the page
• More on this header later, when we talk about caching
• Don’t use the Pragma header in responses
– The meaning of “Pragma: no-cache” is only specified for requests
• A safer approach is to use both the Pragma header and the Cache-Control header with the no-cache value
66
Connection Response Header
• A value of close instructs the client not to use persistent HTTP connections
• In HTTP 1.1, persistent connections are the default
67
Content-Length Response Header
• It specifies the number of bytes in the response
• It is needed only if a persistent (keep-alive) connection is used
68
Content-Type Response Header
• It gives the MIME (Multipurpose Internet Mail Extensions) type of the response document
• MIME types are of the form:
– maintype/subtype for officially registered types
– maintype/x-subtype for unregistered types
• Examples: text/html, image/jpeg, application/x-gzip
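One way a server might choose the Content-Type value is by file extension. The sketch below relies on Python's standard mimetypes module; content_type_for is a made-up helper, and the fallback to application/octet-stream is an assumption of this example rather than something the slides prescribe.

```python
import mimetypes


def content_type_for(filename):
    """Guess a Content-Type value from a file name (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


if __name__ == "__main__":
    for name in ("index.html", "logo.jpeg", "data.unknownext"):
        print(name, "->", content_type_for(name))
```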
69
Expires Response Header
• It gives the time at which the document should be considered out-of-date and thus should no longer be cached
• It can be used, for example, if the document is valid only for a short time
70
Last-Modified Response Header
• This header gives the time when the document was last changed
• The date that is given in the Last-Modified response header can be used in later requests in the If-Modified-Since request header
71
Location Response Header
• This header should be included in all responses whose 3xx status code redirects the client (e.g., 301, 302, 303, 307)
• The browser automatically retrieves the document from the new location that is given as the value of this header
72
Refresh Response Header
• The number of seconds until the browser should reload the page
• Can also include the URL of a document that should be loaded (instead of the original document)
• This header is not part of HTTP 1.1 but is an extension supported by Netscape and Internet Explorer
73
Retry-After Response Header
• This header can be used in conjunction with a 503 (Service Unavailable) response to tell the client how soon it can repeat its request
74
Set-Cookie Response Header
• This header specifies a cookie associated with the page; it has several fields:
Set-Cookie: name=value; expires=value; path=value; domain=value; secure
• Each cookie requires a separate header
• Servlets should use the special-purpose addCookie method of HttpServletResponse instead of setting the value of this header directly
• This header is not part of HTTP 1.1 but is widely supported
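A Set-Cookie header with these fields can be produced with Python's standard http.cookies module. This is only a sketch: make_set_cookie is a hypothetical helper, and the cookie name, value and domain in the usage lines are placeholders.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path="/", domain=None, secure=False):
    """Render one Set-Cookie header line (one header per cookie)."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path
    if domain:
        cookie[name]["domain"] = domain
    if secure:
        cookie[name]["secure"] = True
    return cookie.output()  # text of the form "Set-Cookie: name=value; ..."


if __name__ == "__main__":
    header = make_set_cookie(
        "session", "abc123", domain="example.org", secure=True
    )
    print(header)
    # On later requests the browser returns the pair in a Cookie header,
    # which the same module can parse:
    returned = SimpleCookie("session=abc123")
    print(returned["session"].value)
```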
75
WWW-Authenticate Response Header
• This header is always included with a 401 (Unauthorized) status code
• It gives the authentication scheme(s) and parameters applicable to the URL that was requested
76
Server Response Header
• Indicates the name of the vendor of the HTTP server