DBI Representation and Management of Data on the Internet


HTTP

HyperText Transfer Protocol

In the Beginning… The Internet

FTP – File Transfer Protocol

SMTP – Simple Mail Transfer Protocol

NNTP – Network News Transfer Protocol

HTTP – HyperText Transfer Protocol

Let there be a Web

Tim Berners-Lee

The Creation of the Web

• Tim Berners-Lee implemented the HTTP protocol in 1990-91 at CERN, the European laboratory for high-energy (particle) physics in Geneva, Switzerland.

• The World-Wide Web is based upon:
  – information representation in HTML (HyperText Markup Language) documents
  – resource transmission in HTTP (HyperText Transfer Protocol)

Previous HTTP Versions

• HTTP/0.9: used by the WWW since 1990

• HTTP/1.0 [RFC 1945]
  – Supports MIME (Multipurpose Internet Mail Extensions) messages [RFC 1341]
    • MIME transmits non-textual files by encoding them
  – Content negotiation

• HTTP/1.1 [RFC 2068, later revised as RFC 2616]
  – Persistent connections
  – Caching

General Features

• Lightness and speed (response time of 100 ms for a hypertext jump)

• Client-Server protocol

• Stateless object-oriented protocol

• Open-ended set of methods and headers

• Typing and negotiation of data representation

Terminology

• User agent: the client that initiates a request (browser, editor, web robot, …)

• Origin server: the server on which a given resource resides

• Proxy: acts as both a server and a client

• Gateway: a server that acts as an intermediary for other servers

• Tunnel: acts as a blind relay between two connections

Client-Server Protocol

• The browser is the client

• The client sends requests to an HTTP Server

Client-Server Sessions

• The HTTP protocol supports a short conversation between browser and server

• The request line, the status line and the headers are exchanged as plain ASCII text; the message body may carry arbitrary binary data (e.g., images)

• The standard (and default) port for HTTP servers to listen on is 80, though they can use any port

HTTP Session

• A basic HTTP session has four phases:
  1. Client opens the connection (a TCP connection)
  2. Client makes the request
  3. Server sends a response
  4. Server closes the connection
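The four phases map one-to-one onto socket operations. The following is a minimal Python sketch, assuming an HTTP/1.0 server that closes the connection after its response; `http10_get` is a hypothetical helper name, not a library function:

```python
import socket


def http10_get(host: str, path: str, port: int = 80) -> bytes:
    """Run one HTTP/1.0 session and return the raw response bytes (sketch)."""
    # Phase 1: the client opens a TCP connection to the server.
    with socket.create_connection((host, port), timeout=10) as sock:
        # Phase 2: the client sends the request: initial line, then a blank line.
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        # Phase 3: the server sends its response, which we read in pieces.
        chunks = []
        while True:
            data = sock.recv(4096)
            # Phase 4: an empty read means the server has closed the connection.
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Because the end of the response is signalled by the close in phase 4, the client simply reads until end-of-file; nothing in the response itself needs to announce where it stops.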

Nested Objects

• Suppose a client accesses a page containing 10 inline images; displaying the page completely would require 11 HTTP sessions (one for the page and one for each image)

• Some browsers and servers support a feature called keep-alive, which keeps the connection open until it is explicitly closed

[Figure: index.html, a page split into a left frame and a right frame, with embedded images (jumping fish, fairy icon, HUJI icon)]

Stateless Protocol

• HTTP is a stateless protocol: each request is handled independently, and once the server has delivered the requested data to the client, it retains no memory of what has just taken place (in HTTP/1.0 the connection is also closed at that point)

Resources

• A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator)
  – The most common kind of resource is a file, but a resource may also be
    • a dynamically generated query result,
    • the output of a CGI script, or
    • an active server page

URL

• Uniform Resource Identifiers (URIs) [RFC 2396] are used to specify the object of a method
  – as an address (URL)
  – as a name (URN)

URL = "http://" host [":" port] [path]

IP addresses in URLs should be avoided [RFC 1900]
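Python's standard `urllib.parse.urlsplit` breaks a URL into the components of this grammar. A short sketch (the helper `url_parts` is hypothetical; it falls back to port 80 as described earlier and ignores any query string):

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> tuple[str, int, str]:
    """Return (host, port, path) for an http URL, defaulting the port to 80."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url}")
    host = parts.hostname or ""
    port = parts.port or 80      # no ":port" in the URL -> default port 80
    path = parts.path or "/"     # an empty path means the root "/"
    return host, port, path


print(url_parts("http://www.cs.huji.ac.il/~dbi/index.html"))
# ('www.cs.huji.ac.il', 80, '/~dbi/index.html')
```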

Different URLs

• There are different types of URLs:
  – http://<host>:<port>/<path>?<searchpart>
  – mailto:<account@site>
  – news:<newsgroup-name>

In a URL

• In the query part, spaces are represented by "+"

• Characters such as &, + and % are encoded in the form "%xx", where xx is the character's ASCII value in hexadecimal; for example, "&" = "%26"

• The inputs to the parameters are given as a list of the following form:

var1=value1&var2=value2&var3=value3

[Figure: the query "War&peace Tolstoy" typed into a search form]

The resulting request URL:

http://www.google.com/search?lr=&safe=off&q=war%26peace+Tolstoy
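The encoding rules above are what Python's `urllib.parse.urlencode` applies by default. A sketch that rebuilds the URL from the form fields (the field names `lr`, `safe` and `q` are read off the URL itself):

```python
from urllib.parse import parse_qs, urlencode

# Form fields as the browser collects them from the search page.
fields = {"lr": "", "safe": "off", "q": "war&peace Tolstoy"}

# urlencode turns spaces into "+", "&" into "%26",
# and joins the pairs as name=value&name=value.
query = urlencode(fields)
url = "http://www.google.com/search?" + query
print(url)  # http://www.google.com/search?lr=&safe=off&q=war%26peace+Tolstoy

# Decoding reverses the process; keep_blank_values keeps the empty "lr=".
decoded = parse_qs(query, keep_blank_values=True)
print(decoded["q"])  # ['war&peace Tolstoy']
```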

Format of Request and Response

• Both requests and responses consist of:
  – an initial line,
  – zero or more header lines,
  – a blank line (i.e., a CRLF by itself), and
  – an optional message body (e.g., a file, query data, or query output)

Note: CRLF = "\r\n" (ASCII 13 followed by ASCII 10)

Request

• A request consists of:
  – Initial line
  – Headers
  – Blank line
  – Message body

Initial Line of a Request

• The initial line consists of three parts, separated by spaces:
  – Method
  – Path
  – HTTP version

Request Format

[Figure: general format of an HTTP request]

Request Example

GET /courses/dbi/index.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
[blank line here]

The initial line gives the method (GET), the path and the HTTP version; it is followed by the headers and a blank line.

Do Not Forget CRLF

GET /courses/dbi/index.html HTTP/1.0 [CRLF]
From: [email protected] [CRLF]
User-Agent: HTTPTool/1.0 [CRLF]
[CRLF]
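Building the request programmatically makes the CRLF rule hard to forget. A minimal sketch; `build_request` is a hypothetical helper, the address `someuser@example.com` is a placeholder, and computing Content-Length for a body is left to the caller:

```python
def build_request(method: str, path: str, headers: dict[str, str],
                  body: bytes = b"", version: str = "HTTP/1.0") -> bytes:
    """Serialize an HTTP request; every line ends with CRLF (sketch)."""
    lines = [f"{method} {path} {version}"]                           # initial line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    # The last header line needs its own CRLF, and the blank line is one more CRLF.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


req = build_request("GET", "/courses/dbi/index.html",
                    {"From": "someuser@example.com", "User-Agent": "HTTPTool/1.0"})
print(req)
```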

Request Methods

• GET returns the contents of the indicated document
  – The most frequently used method

• HEAD returns the header information for the indicated document
  – Useful for finding out about a resource without retrieving it

• POST treats the document as a script and sends some data to it

More Methods

• PUT replaces the contents of the document with some data

• DELETE deletes the indicated document

• TRACE invokes a remote loop-back of the request; the final recipient SHOULD reflect the message back to the client

• Servers usually do not allow these methods

GET Method

• GET is the most common HTTP method

• It says “give me this resource”

GET Requests With a Proxy

[Figure: fetching http://www.cs.huji.ac.il/~dbi/index.html directly (client to the web server www.cs.huji.ac.il) and through a proxy server (client to proxy server to web server)]

HEAD Request

• A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)

• Same as GET, but without the message body

• This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth

• Used for testing hypertext links for validity, accessibility and recent modification

POST

• A POST request sends data to the server

• POST is mostly used in form filling
  – The browser translates the contents of a form into a special format and sends it to a script on the server using the POST method

POST (cont.)

• There is a block of data sent with the request, in the message body

• There are usually extra headers to describe this message body, like Content-Type: and Content-Length:

• The request URI is a program to handle the sent data, not a resource to retrieve

• The HTTP response is normally the output of a program, not a static file

POST Example

• Here's a typical form submission, using POST:

POST /path/script.cgi HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35

home=Ross+109&favorite+flavor=flies

The body is 35 characters long, which is the value given in Content-Length.
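The body and its Content-Length can be derived from the form fields rather than counted by hand. A sketch, assuming the form's fields are `home` = "Ross 109" and `favorite flavor` = "flies" (inferred by decoding the body above):

```python
from urllib.parse import urlencode

# Form fields inferred from the encoded body in the example.
form = {"home": "Ross 109", "favorite flavor": "flies"}

body = urlencode(form).encode("ascii")  # b"home=Ross+109&favorite+flavor=flies"
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),   # number of bytes in the body
}
request = (
    "POST /path/script.cgi HTTP/1.0\r\n"
    + "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    + "\r\n"                            # blank line before the body
).encode("ascii") + body
print(headers["Content-Length"])  # 35
```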

Headers

• HTTP 1.0 defines 16 headers
  – none are required

• HTTP 1.1 defines 46 headers
  – one header (Host:) is required in requests

Headers

• From:
  – gives the email address of whoever is making the request or running the program doing so

• User-Agent:
  – identifies the program that's making the request, in the form "Program-name/x.xx", where x.xx is the (mostly) alphanumeric version of the program
  – For example, Netscape 3.0 sends the header "User-agent: Mozilla/3.0Gold"

Headers (cont.)

• Server:
  – analogous to the User-Agent: header; it identifies the server software in the form "Program-name/x.xx"
  – For example, one beta version of Apache's server returns "Server: Apache/1.2b3-dev"

Headers (cont.)

• If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:

• Content-Type:
  – gives the MIME type of the data in the body, such as text/html or image/gif

• Content-Length:
  – gives the number of bytes in the body

Headers (cont.)

• Last-Modified:
  – gives the modification date of the resource that's being returned
  – used in caching and other bandwidth-saving activities
  – Greenwich Mean Time should be used, in the format
    Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT

Initial Line of a Response

• The initial line of a response is also called the status line.

• The initial line consists of:
  – the HTTP version
  – a response status code
  – a reason phrase that describes the status code

Response Format

[Figure: general format of an HTTP response]

Response Example

HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Hello World</h1>
(more file contents)
  . . .
</body>
</html>

The initial line holds the version (HTTP/1.0), the status code (200) and the reason phrase (OK); it is followed by the headers, a blank line and the message body.
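A sketch that splits a response like this one into its parts, assuming the whole response is already in memory and well formed (status line with a reason phrase, no folded header lines); `parse_response` is a hypothetical name:

```python
def parse_response(raw: bytes):
    """Split a raw response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")          # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)   # the reason may contain spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")            # split at the first colon only
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return version, int(code), reason, headers, body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Date: Fri, 31 Dec 1999 23:59:59 GMT\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<html><body><h1>Hello World</h1></body></html>")
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])  # 200 OK text/html
```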

Status Code

• The status code is a three-digit integer, and the first digit identifies the general category of response:
  – 1xx indicates an informational message only
  – 2xx indicates success of some kind
  – 3xx redirects the client to another URL
  – 4xx indicates an error on the client's part
    • Yes, the system blames the client if a resource is not found (i.e., 404)
  – 5xx indicates an error on the server's part
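Since only the first digit determines the category, a client can classify any status code, including ones it has never seen, with integer division. A small sketch (`status_class` is a hypothetical name):

```python
def status_class(code: int) -> str:
    """Name the category given by the first digit of a status code."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return classes[code // 100]  # first digit selects the category


print(status_class(404))  # client error
```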

Status Code 1xx

• 100 (Continue)
  – Allows a client to determine whether the server is willing to accept the request (based on the request headers) before the client sends the request body
  – The client's request must include the header Expect: 100-continue

• 101 (Switching Protocols)

Status Code 2xx

Status codes 2xx -- Success

• The action was successfully received, understood, and accepted
  – 200 OK
  – 201 Created (e.g., a new resource created by a POST)
  – 202 Accepted (for processing, not yet completed)
  – 203 Non-authoritative information
  – 204 No content

Status Code 3xx

Status codes 3xx -- Redirection

• Further action must be taken in order to complete the request
  – 300 Resource found at multiple locations
  – 301 Resource moved permanently
  – 302 Resource moved temporarily
  – 304 Resource has not been modified (since the given date)

Status Code 4xx

Status codes 4xx -- Client error

• The request contains bad syntax or cannot be fulfilled
  – 400 Bad request from client
  – 401 Unauthorized request
  – 402 Payment required for request
  – 403 Resource access forbidden
  – 404 Resource not found
  – 405 Method not allowed for resource
  – 406 Resource type not acceptable

Status Code 5xx

Status codes 5xx -- Server error

• The server failed to fulfill an apparently valid request
  – 500 Internal server error
  – 501 Method not implemented
  – 502 Bad gateway (invalid response from an upstream server)
  – 503 Service unavailable (e.g., server overload)
  – 504 Gateway timeout (no timely response from an upstream server)

Response Information

• Description of information:
  – Server: type of server
  – Date: date and time
  – Content-Length: number of bytes
  – Content-Type: MIME type
  – Content-Language: English, for example
  – Content-Encoding: data compression
  – Last-Modified: date when last modified
  – Expires: date when the file becomes invalid

Manually Experimenting with HTTP

>host www
www.cs.huji.ac.il is a nickname for vafla.cs.huji.ac.il
vafla.cs.huji.ac.il has address 132.65.80.39
vafla.cs.huji.ac.il mail is handled (pri=10) by cs.huji.ac.il

>telnet www.cs.huji.ac.il 80
Trying 132.65.80.39…
Connected to vafla.cs.huji.ac.il.
Escape character is '^]'.

Sending a Request

>GET /~dbi/index.html HTTP/1.0
[blank line]

The Response

HTTP/1.1 200 OK
Date: Sun, 11 Mar 2001 21:42:15 GMT
Server: Apache/1.3.9 (Unix)
Last-Modified: Sun, 25 Feb 2001 21:42:15 GMT
Content-Length: 479
Content-Type: text/html

<html> (html code …)</html>

Request: GET /~dbi/index.html HTTP/1.0
Response: HTTP/1.1 200 OK, followed by the HTML code

Request: GET /~dbi/no-such-page.html HTTP/1.0
Response: HTTP/1.1 404 Not Found, followed by HTML code

Request: GET /index.html HTTP/1.1
Response: HTTP/1.1 400 Bad Request, followed by HTML code

Why is it a Bad Request?

It is an HTTP/1.1 request without a Host header.

HTTP 1.1

HTTP/1.1 has replaced HTTP/1.0 as the new Web protocol

Improvements

• Faster response
  – allowing multiple transactions to take place over a single persistent connection
  – adding cache support

• Faster response for dynamically generated pages
  – supporting chunked encoding, which allows a response to be sent before its total length is known

• Efficient use of IP addresses
  – allowing multiple domains to be served from a single IP address

Improvements over HTTP 1.0

• HTTP/1.1 has a number of features and improvements over HTTP/1.0, including:
  – Persistent TCP connections
  – Partial document transfers
  – Conditional fetch
  – Support for nonstandard HTTP/1.0 extensions
  – Better support for alternative character sets
  – More flexible authentication
  – Faster response and great bandwidth savings
  – Efficient use of IP addresses (virtual hosting)

Non-Persistent Connections

1. Browser opens a TCP connection to port 80 of the server (handshake)
2. Browser sends an HTTP request message
3. Server receives the request, locates the object, and sends the response
4. Server closes the TCP connection
5. Client receives the response and parses the object
6. Steps 1-5 are repeated for each embedded object

Persistent Connection

1. Browser opens a TCP connection to port 80 of the server (handshake)
2. Browser sends an HTTP request message
3. Server receives the request, locates the object, and sends the response
4. Client receives the response and parses the object
5. Steps 2-4 are repeated for each embedded object
6. The TCP connection is closed on demand or after a timeout

Advantages of Persistent Connections

• CPU time is saved in routers and hosts (fewer TCP connections are opened and closed)

• HTTP requests and responses can be pipelined on a connection

• Network congestion is reduced

• Latency on subsequent requests is reduced

Pipelines

• There are 2 types of persistent connections:
  – without pipelining
    • the client issues a new request only after the previous response has arrived
  – with pipelining
    • the client sends a request as soon as it encounters a reference
    • multiple requests/responses may travel
      – in the same IP packet, or
      – in back-to-back packets
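A rough way to compare the three schemes is to count round-trip times (RTTs) for the 10-image page from the Nested Objects slide. The model below is a simplifying assumption made for illustration, not part of HTTP: a TCP handshake costs 1 RTT, each request/response costs 1 RTT, transmission time is ignored, and only one connection is used at a time.

```python
def page_load_rtts(embedded: int, mode: str) -> int:
    """Round trips to fetch a page plus `embedded` inline objects.

    Assumed model: handshake = 1 RTT, request/response = 1 RTT,
    transmission time ignored, no parallel connections.
    """
    objects = 1 + embedded                 # the HTML page itself + inline objects
    if mode == "non-persistent":
        return 2 * objects                 # new connection (1) + request (1), per object
    if mode == "persistent":
        return 2 + embedded                # one handshake + page, then 1 RTT per object
    if mode == "pipelined":
        return 2 + (1 if embedded else 0)  # all inline requests sent back to back
    raise ValueError(f"unknown mode: {mode}")


for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, page_load_rtts(10, mode))
# non-persistent 22
# persistent 12
# pipelined 3
```

Real browsers also open several connections in parallel, which this model deliberately leaves out.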

Virtual Hosts

• With HTTP 1.1, one server at one IP address can be multi-homed:
  – "www.cs.huji.ac.il" and "www.math.huji.ac.il" can live on the same server
  – These are called virtual hosts
  – Without this mechanism, we would have to use 2 different IP addresses

• It is like several people sharing one phone

• An HTTP request must specify the host name (and possibly port) for which the request is intended

Example

• The request specifies the host:

GET /path/file.html HTTP/1.1

Host: www.host1.com:80
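On the server side, virtual hosting amounts to choosing a site by the Host header. A minimal sketch; the document roots, the default site, and the helper `select_root` are all hypothetical, and header names are assumed to be already lower-cased:

```python
from typing import Optional

# Hypothetical document roots for two virtual hosts sharing one IP address.
DOC_ROOTS = {
    "www.cs.huji.ac.il": "/var/www/cs",
    "www.math.huji.ac.il": "/var/www/math",
}
DEFAULT_HOST = "www.cs.huji.ac.il"  # assumed fallback site


def select_root(version: str, headers: dict[str, str]) -> tuple[int, Optional[str]]:
    """Choose a document root from the Host header; return (status, root)."""
    host = headers.get("host")
    if host is None:
        if version == "HTTP/1.1":
            return 400, None            # HTTP/1.1 requests must carry Host
        host = DEFAULT_HOST             # HTTP/1.0 clients may omit it
    name = host.split(":", 1)[0].lower()  # drop an optional ":port"
    return 200, DOC_ROOTS.get(name, DOC_ROOTS[DEFAULT_HOST])


print(select_root("HTTP/1.1", {"host": "www.math.huji.ac.il:80"}))  # (200, '/var/www/math')
print(select_root("HTTP/1.1", {}))                                   # (400, None)
```

The second call mirrors the 400 Bad Request seen earlier for an HTTP/1.1 request without a Host header.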

Virtual Hosting (cont.)

• Virtual hosting
  – reduces hardware expenditures
  – extends the ability to support additional servers
  – makes load balancing and capacity planning much easier

• Without it
  – each host name requires a unique IP address, and we are quickly running out of IP addresses with the explosion of new domains

The Date Header

• In HTTP 1.1, servers must include the generation time of the response in the Date: header

• Time values use Greenwich Mean Time (GMT) and have the format
  Date: Fri, 31 Dec 1999 23:59:59 GMT

• Date is omitted only in a few cases, e.g., status code 100 (Continue) and some server errors

• Servers should synchronize their clocks with a reliable external standard
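Python's `email.utils` module formats and parses dates in exactly this GMT format. A short sketch (the timestamp is simply the example date above expressed in seconds since the Unix epoch):

```python
from email.utils import formatdate, parsedate_to_datetime

# Seconds since the Unix epoch for 1999-12-31 23:59:59 UTC.
stamp = 946684799
header_value = formatdate(stamp, usegmt=True)
print("Date: " + header_value)  # Date: Fri, 31 Dec 1999 23:59:59 GMT

# Parsing goes the other way and yields a timezone-aware datetime.
when = parsedate_to_datetime(header_value)
print(when.isoformat())  # 1999-12-31T23:59:59+00:00
```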

Caching

Caching improves performance

• Eliminates the need to send requests in many cases (reduces network round-trips), using an expiration mechanism

• Eliminates the need to send full responses in other cases (reduces network bandwidth), using a validation mechanism

Client Caching

[Figure: a client with a local cache, and a server]

• Client sends GET /fruit/apple.gif

• Server responds with the object and
  Last-Modified: ...

• Client caches the object and its last-modified date

• Client later sends
  GET /fruit/apple.gif
  If-Modified-Since: ...

• Server returns either 304 Not Modified or the object
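The client side of this exchange can be sketched as a small cache keyed by URL. `ConditionalCache` is a hypothetical class; it ignores expiration, size limits and header-name case:

```python
class ConditionalCache:
    """Client-side cache keyed by URL (sketch: no size limit, no Expires)."""

    def __init__(self) -> None:
        # url -> (Last-Modified value as sent by the server, cached body)
        self._store: dict[str, tuple[str, bytes]] = {}

    def request_headers(self, url: str) -> dict[str, str]:
        """Extra headers for a GET: If-Modified-Since when a copy is cached."""
        if url in self._store:
            return {"If-Modified-Since": self._store[url][0]}
        return {}

    def handle_response(self, url: str, status: int,
                        headers: dict[str, str], body: bytes) -> bytes:
        """Return the body to display, updating the cache as needed."""
        if status == 304:
            return self._store[url][1]      # Not Modified: reuse the cached copy
        if status == 200 and "Last-Modified" in headers:
            self._store[url] = (headers["Last-Modified"], body)
        return body


cache = ConditionalCache()
url = "/fruit/apple.gif"
print(cache.request_headers(url))  # {}
cache.handle_response(url, 200, {"Last-Modified": "Fri, 31 Dec 1999 23:59:59 GMT"}, b"GIF89a")
print(cache.request_headers(url))
# {'If-Modified-Since': 'Fri, 31 Dec 1999 23:59:59 GMT'}
print(cache.handle_response(url, 304, {}, b""))  # b'GIF89a'
```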

Network Caches

[Figure: three clients each send GET /fruit/apple.gif through a shared proxy server placed in front of the origin servers]

Benefit of Caching

[Figure: three clients on a 10 Mbps LAN with a proxy server (40% hit rate); routers (R) connect the LAN over a 1.5 Mbps link to the Internet and its servers; load: 15 requests/sec, 100 Kbits/request]
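The numbers in the figure show why the proxy matters: 15 requests/sec × 100 Kbits/request is 1.5 Mbps, exactly the capacity of the access link. A sketch of the arithmetic, assuming every cache miss crosses the 1.5 Mbps link and hits never do:

```python
# Figures from the diagram above.
request_rate = 15         # requests per second from the LAN
request_size_kbit = 100   # Kbits per response
link_kbps = 1_500         # 1.5 Mbps access link


def link_utilization(hit_rate: float) -> float:
    """Fraction of the access link used by responses that miss the cache."""
    traffic_kbps = request_rate * request_size_kbit * (1 - hit_rate)
    return traffic_kbps / link_kbps


print(link_utilization(0.0))  # 1.0 (link saturated without a cache)
print(link_utilization(0.4))  # 0.6
```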

Expiration Model

• Servers may provide an expiration time using the Expires header
  – By checking the expiration time, the cache can return a fresh response without contacting the server

• If the expiration time is not specified, the cache can estimate it heuristically (e.g., from other header values, such as the Last-Modified time)
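A sketch of a cache's freshness check. The fallback to 10% of the time since the last modification is an assumed heuristic (a common choice, mentioned as typical in RFC 7234), not something HTTP/1.1 mandates; the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_lifetime(date: datetime,
                       expires: Optional[datetime] = None,
                       last_modified: Optional[datetime] = None) -> timedelta:
    """How long a cached response may be served without asking the server."""
    if expires is not None:
        # Explicit expiration time from the server.
        return max(expires - date, timedelta(0))
    if last_modified is not None:
        # Assumed heuristic: 10% of the time since the last modification.
        return (date - last_modified) / 10
    return timedelta(0)  # nothing to go on: treat the copy as stale


def is_fresh(age: timedelta, lifetime: timedelta) -> bool:
    """A copy is fresh while its age is below its freshness lifetime."""
    return age < lifetime


served = datetime(2001, 3, 11, 21, 42, 15, tzinfo=timezone.utc)
modified = served - timedelta(days=10)
lifetime = freshness_lifetime(served, last_modified=modified)
print(lifetime)                                # 1 day, 0:00:00
print(is_fresh(timedelta(hours=5), lifetime))  # True
```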

The Risk in Caching

• A response might not be "semantically transparent"
  – i.e., the response differs from what the origin server would have returned

• The cache should verify that the copy is fresh (i.e., its expiration time has not passed)

• A copy that is not fresh is stale

Validators

• A validator is any mechanism that may help in determining whether a copy is fresh or stale
  – A strong validator is, for example, a counter that is incremented whenever the resource is changed
  – A weak validator is, for example, a counter that is incremented only when a significant change is made

Using the Cache

• To check whether a copy is fresh, the cache must either
  – use the expiration model, or
  – compare the Last-Modified time or some other validator with the origin server

• In the second case, the origin server either
  – responds with the message 304 (Not Modified), or
  – sends a full response with the entity body

Cache-Control Header

• Cache-Control headers specify directives to the cache
  – They can be included in either requests or responses

• The server can specify "must-revalidate"
  – Once the copy becomes stale, the cache must revalidate it with the origin server before using it

• The client can specify
  – the max-age of an unvalidated response
  – the max-stale time of a stale copy
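Directives are sent as a comma-separated list, some with a value (`max-age=3600`) and some without (`must-revalidate`). A minimal parsing sketch; `parse_cache_control` is a hypothetical name, and it does not handle quoted values that themselves contain commas:

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Parse a Cache-Control value such as 'max-age=60, must-revalidate'."""
    directives: dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives without "=" (e.g. must-revalidate) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("max-age=3600, must-revalidate"))
# {'max-age': '3600', 'must-revalidate': None}
```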

Do Not Use a Cache

• The Pragma: no-cache request header indicates that the request should not be satisfied from a cache

• Same as the no-cache cache directive (Cache-Control: no-cache)

• Include both if the server is not HTTP/1.1 compliant

• The directive applies to any recipient along the request/response chain

If-Modified-Since Header

• The If-Modified-Since: header is used with a GET request

• If the requested resource has been modified since the given date, the server returns the resource as it normally would (i.e., the header is ignored)

• Otherwise, the server returns a 304 Not Modified response, including the Date: header, but with no message body:

HTTP/1.1 304 Not Modified
Date: Fri, 31 Dec 1999 23:59:59 GMT
[blank line here]
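The server's side of the decision can be sketched as follows, assuming the resource's modification time is known as a timezone-aware datetime; `conditional_get_status` is a hypothetical name, and an unparsable date is treated as if the header were absent:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                  # invalid date: ignore the header
    return 304 if last_modified <= since else 200


modified = datetime(1999, 12, 1, tzinfo=timezone.utc)
print(conditional_get_status(modified, "Fri, 31 Dec 1999 23:59:59 GMT"))  # 304
```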

If-Unmodified-Since Header

• The If-Unmodified-Since: header can be used with any method

• If the requested resource has not been modified since the given date, the server processes the request as it normally would

• Otherwise, the server returns a 412 Precondition Failed response:

HTTP/1.1 412 Precondition Failed
[blank line here]
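A typical use is a client that wants to PUT a new version of a document only if nobody else changed it since the client fetched it. A sketch of the server-side check; `unmodified_since_status` is hypothetical, and returning 200 is a simplification that stands for "process the request normally":

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def unmodified_since_status(last_modified: datetime,
                            if_unmodified_since: Optional[str]) -> int:
    """Return 412 if the resource changed after the given date, else 200.

    200 stands for "process the request normally" (a simplification).
    """
    if if_unmodified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_unmodified_since)
    except (TypeError, ValueError):
        return 200                  # invalid date: ignore the header
    return 412 if last_modified > since else 200


changed = datetime(2000, 1, 5, tzinfo=timezone.utc)
print(unmodified_since_status(changed, "Fri, 31 Dec 1999 23:59:59 GMT"))  # 412
```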

Cooperative Caching

• Higher-level caches (e.g., a national cache)
  – serve a larger user population
  – achieve higher hit rates

• Multiple Web caches that cooperate improve overall performance

• Cooperative caches are usually built from clusters, which
  – divide the traffic overhead
  – improve storage capacity

Cooperative Caching (cont.)

• Which caches should be asked for a particular document?

• Hash routing (of URLs): each URL is mapped to exactly one cache, so an object will not be present in more than one cache
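Hash routing can be sketched in a few lines: hash the URL and use the result to pick one cache, so every client asks the same cache for the same document. The cache host names and the choice of SHA-256 with a simple modulo are assumptions for illustration:

```python
import hashlib

# Hypothetical members of a cooperating cache cluster.
CACHES = ["cache0.example.net", "cache1.example.net", "cache2.example.net"]


def cache_for(url: str, caches: list[str] = CACHES) -> str:
    """Route a URL to exactly one cache by hashing it (sketch)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]


print(cache_for("http://www.cs.huji.ac.il/~dbi/index.html"))
```

With plain modulo, adding or removing a cache remaps most URLs; schemes such as consistent hashing are designed to limit that reshuffling.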

Hop by Hop

• HTTP/1.1 introduces the concept of hop-by-hop headers:
  – message headers that apply only to a given connection, and not to the entire path
  – they make proxies (caches) much more powerful

Hop-by-Hop Headers

• Connection
  – options that are desired for that particular connection (e.g., Connection: close)

• Public
  – lists the set of methods supported by the server

• Proxy-Authenticate
  – enables authentication methods between two hops

• Transfer-Encoding
  – the encoding (e.g., chunking or compression) applied between two hops

• Upgrade
  – additional communication protocols

Chunked Encoding

• Chunked encoding sends the message body as a series of chunks, each preceded by its size, so the total length need not be known in advance
  – Useful for transmission of streaming multimedia
    • One frame varies in size and composition from the next
  – Streaming video
    • The entire image is transmitted in the first chunk, and the differences from the previous image are transmitted in the following chunks

Wake up, we are talking about movies on the Internet
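On the wire, each chunk is its size in hexadecimal, a CRLF, the data and another CRLF; a chunk of size 0 ends the body. A sketch of both directions, leaving out the optional chunk extensions and trailer headers (`encode_chunked` and `decode_chunked` are hypothetical names):

```python
def encode_chunked(pieces: list[bytes]) -> bytes:
    """Frame pieces as HTTP/1.1 chunked transfer encoding (no trailers)."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty piece would look like the final chunk
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk: size 0, then an empty trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body (no trailers, no chunk extensions)."""
    body, pos = bytearray(), 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)  # chunk size is hexadecimal
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2         # skip the CRLF after the chunk data


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)                  # b'7\r\nHello, \r\n5\r\nworld\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'Hello, world'
```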

Compression

• Most image formats (GIF, JPEG, MPEG) are precompressed

• Many other data types used on the Web are not precompressed

• Compression could save almost 40% of the bytes sent via HTTP

• There is a need to negotiate the encoding of the compressed resource

Compression (cont.)

• The client sends the Accept-Encoding header
  – It indicates the content encodings that the client can handle and the ones it prefers

• The server sends
  – the Content-Encoding header, for end-to-end encoding indication
  – the Transfer-Encoding header, for hop-by-hop encoding indication (supported only in HTTP/1.1)
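A sketch of the negotiation on the server side: compress with gzip only when the client's Accept-Encoding lists it with a non-zero quality value, and label the result with Content-Encoding. The helpers are hypothetical, and the parsing is simplified (only `q=` parameters, no `*` wildcard):

```python
import gzip


def choose_encoding(accept_encoding: str) -> str:
    """Pick 'gzip' if the client accepts it with q > 0, else 'identity'."""
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        if coding.strip().lower() == "gzip" and q > 0:
            return "gzip"
    return "identity"


def encode_body(body: bytes, accept_encoding: str) -> tuple[dict[str, str], bytes]:
    """Return the extra response headers and the (possibly compressed) body."""
    if choose_encoding(accept_encoding) == "gzip":
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


page = b"<html>" + b"hello world " * 100 + b"</html>"
headers, data = encode_body(page, "gzip, deflate")
print(headers, len(page), len(data))
```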

Content Negotiation

• Content negotiation:
  – the process of selecting the best representation for a given response when there are multiple representations available

• HTTP supports two kinds of content negotiation:
  – Server-driven negotiation
  – Agent-driven negotiation

Server-Driven Negotiation

• The selection is made by the server, based on:
  – header fields in the request (client preferences), e.g., Accept-Language / Accept-Encoding
  – the available representations of the response
  – other information (e.g., the address of the client)

• Disadvantages:
  – It is impossible for the server to determine what is best for the user
  – Inefficiency (clients must describe their capabilities in every request)
  – It complicates the implementation of servers
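A sketch of server-driven selection by language: parse the client's Accept-Language preferences with their quality values and return the most preferred language the server actually has. `best_language` is hypothetical and matches only exact, lower-case tags (no `en-US` to `en` fallback, no `*`):

```python
from typing import Optional


def best_language(accept_language: str, available: list[str]) -> Optional[str]:
    """Highest-q language that the server has a representation for."""
    prefs = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        prefs.append((q, tag.strip().lower()))
    # sorted() is stable, so equal q-values keep the client's order.
    for q, tag in sorted(prefs, key=lambda p: -p[0]):
        if q > 0 and tag in available:
            return tag
    return None  # nothing acceptable: the server must pick a default or fail


print(best_language("he, en;q=0.8", ["en", "fr"]))  # en
```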

Agent-Driven Negotiation

• The selection is made by the client after receiving an initial response from the server
  – based on the available representations specified in the initial response
  – automatic or manual

• Disadvantage:
  – a second request is needed to obtain the best alternative representation

Protocol Switching

• Protocol switching
  – The client can specify another protocol more suited to the data being transferred (e.g., a real-time synchronous protocol)

"I hate HTTP/1.0, I want another protocol"

Authentication

• Many sites require users to provide a username and password in order to access the documents housed on the server

• This requirement also provides a mechanism for keeping track of users (it is more than just a security mechanism)

Authentication

[Figure: a client requests ~/dbi/index.html from the web server www.cs.huji.ac.il; the server asks "Who are you?"; the client repeats the request with "I am Donald, my password is Duck" and receives the response]

Authentication

• How does it work?
  – The client sends
    • an ordinary request message
  – The server responds with
    • a 401 Authorization Required status code
    • a WWW-Authenticate header, which specifies how to perform authentication
  – The client resends
    • the request message, but this time including the Authorization header (e.g., user name & password)
  – The client continues to add this header to each following request to that server
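The slides do not name a scheme; assuming the common Basic scheme (the server answers with `WWW-Authenticate: Basic realm="..."`), the Authorization header carries `user:password` encoded in base64. Base64 is an encoding, not encryption, so the password is effectively sent in the clear. A sketch using the slide's Donald/Duck example (`basic_authorization` is a hypothetical helper):

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Value of the Authorization header under the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


value = basic_authorization("Donald", "Duck")
print("Authorization: " + value)
# Anyone who sees the header can decode it again:
print(base64.b64decode(value.split(" ", 1)[1]))  # b'Donald:Duck'
```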

Cookies

• An alternative way to identify browsers

• The server response includes the Set-Cookie header, which has the attributes
  – name=VALUE
  – expires=DATE STRING
  – domain=DOMAIN NAME
  – path=PATH
  – secure

• The client returns the cookie with requests to matching URLs

Cookies

• Example:
  – The client contacts a web site for the first time
  – The server response includes the header
    Set-Cookie: 1678453
  – The client stores the cookie value and the server name in a special "cookie file"
  – For each further request to that server, the client adds the header
    Cookie: 1678453
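In practice the Set-Cookie header carries a name=value pair plus optional attributes. Python's standard `http.cookies.SimpleCookie` can parse such a header; the cookie name `id` and the path and domain below are hypothetical, since the example above shows only the value:

```python
from http.cookies import SimpleCookie

# The value of a Set-Cookie header (cookie name and attributes are hypothetical).
jar = SimpleCookie()
jar.load("id=1678453; Path=/; Domain=.cs.huji.ac.il")

# On later requests the client sends back only the name=value pairs.
cookie_header = "; ".join(f"{m.key}={m.value}" for m in jar.values())
print("Cookie: " + cookie_header)  # Cookie: id=1678453
print(jar["id"]["domain"])         # .cs.huji.ac.il
```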

Cookies (cont.)

• Usage:
  – a server requires authentication but doesn't want to hassle the user with a user name and password
  – remembering users' preferences for advertising
  – creating a virtual shopping cart

• Problems:
  – users who access the same site from different machines

Are You HTTP Experts Now?

• Not yet

• There are more headers, for example, that this talk did not cover

• To learn more, go to the specifications

Additional Information

• For specifications and additional information:
  – http://www.w3.org/Protocols/
  – http://www.w3.org/Protocols/Specs.html
  – http://www.jmarshall.com/easy/http/
  – http://wdvl.com/Internet/Protocols/HTTP/article.html