COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2007
23 January 2007

Reminders

• Class attendance required!

• Preliminary paper proposal January 29th

• Preliminary project proposal March 5th

• Paper must be individual, projects may be teams of 2-5 students

• See advice about team formation at http://york.cs.columbia.edu/classes/cs6125/team_advice.htm

Class Attendance is Required!

• Attendance will be taken at every class meeting, starting TODAY

• Final grade reduced one notch for first miss (e.g., A- -> B+)

• Final grade reduced full letter grade for second miss (e.g., A- -> B-)

• Fail (or drop) course for third miss

Today’s Topic: Basic Mechanics of the Web

• URI (~URL)

• HTTP

• Client/Server Intermediaries

What is a “URI”?
• Uniform Resource Identifier
• Compact string of characters that conforms to a certain syntax, for identifying an abstract or physical resource
• Simple and extensible format
• Example: http://york.cs.columbia.edu/classes/cs6125

What is a “Resource”?
• Some piece of information that can be identified by a URI
• The most common kind of resource is a file
• But may also be a dynamically-generated query result, the output of a script, a document available in several languages, etc.

Uniform Resource Identifier
• Uniform: aka Universal, same string can be used with the same semantic interpretation, even when mechanisms used to access the resource differ

• Resource: Conceptual mapping to an entity or set of entities - not necessarily the entity which corresponds to that mapping at any particular instance in time, not always network “retrievable”

• Identifier: An object that can act as a reference to something that has identity

Key requirement: Transcribability

• Sequence of characters
• May be transcribed from a non-network source
• Often needs to be remembered by people
• Should consist of characters that are most likely to be typeable into a computer, within the constraints imposed by keyboards (and related input devices) across languages and locales

Why do we usually say URL rather than URI?

• A Uniform Resource Locator (URL) belongs to the subset of URIs that identify resources via a representation of their primary access mechanism (e.g., their network “location”)

• Most popular form of URI

What’s a URI that’s not a URL?

• URN = Uniform Resource Name
• Subset of URIs that denote a resource independent of its current location or the name by which it is known or the mechanism by which it is accessed

• Required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable

• Thus not necessarily retrievable

URN vs. URL Example
• Assume a published book (the resource)
• Its ISBN, assigned by an ISBN registration agency, can serve as the URN (in the urn:isbn: namespace)
• Assume the entire contents of the book were placed on a Web server at http://www.xyz.com/book.gz and an FTP server at ftp://ftp.xyz.com/book.gz - both of these are URLs

URL Notation
• <scheme>://<authority><path>?<query>
• <authority>: typically, an Internet domain name
• <path>: specific to the authority, identifies the resource within the scope of the scheme and authority
• <query>: a string of information to be interpreted by the resource
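The notation above can be tried out with a short sketch. It uses `urlsplit` from Python's standard library; the `describe` helper and the example URL are illustrative assumptions, not part of the lecture.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the four components named in the notation."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # typically an Internet domain name
        "path": parts.path,         # identifies the resource within the authority
        "query": parts.query,       # interpreted by the resource itself
    }


if __name__ == "__main__":
    # Made-up URL, used only to show where each component ends up.
    for name, value in describe("http://www.example.com/classes/search?term=uri").items():
        print(f"{name:>9}: {value}")
```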

What’s a “domain name”?

• Domain Name System (DNS)
– Maps domain names to IP addresses and vice versa
– Hierarchy of DNS servers for top level domains (.com, .edu, .uk, etc.), second level domains (columbia.edu, ibm.com, etc.), and so on
– Eventually finds IP address for individual host (e.g., www.cs.columbia.edu)

• Originated ~1982, for email (gk60@CMUA -> [email protected] -> [email protected])

What is a “scheme”?
• <scheme>:<scheme-specific-part>
• In a URL, the protocol employed for retrieval (http, ftp, file, mailto, etc.)
• More generally, a specification for defining the syntax and semantics of the rest of the URI

• Extensible because new schemes can be defined, with their own scheme-specific format after the colon (:)

Example URLs
• http://www.ietf.org/rfc/rfc3986.txt
• gopher://gopher.quux.org/1/Software/Gopher
• mailto:[email protected]
• news:news.newusers.questions
• telnet:cs.columbia.edu

Example Absolute URIs
• http://somehost/absolute/URI/with/absolute/path/to/resource.txt
• ftp://somehost/resource.txt
• urn:a-rose-by-any-other-name

Example Relative URIs
• http://somehost/absolute/URI/with/absolute/path/to/resource.txt
• /relative/URI/with/absolute/path/to/resource.txt
• relative/path/to/resource.txt
• ../../../resource.txt
• resource.txt
• /resource.txt#frag01
• #frag01
• [empty string]

Relative Addresses

• Allows document trees to be (partially) independent of their location and scheme

• A single set of hypertext documents can be simultaneously traversable via each of the ftp, http and file schemes if the documents refer to each other using relative URIs

• Such document trees can be moved, as a whole, without changing any of the relative references
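To see how relative references resolve, here is a small sketch using `urljoin` from Python's standard library. The base URI is the absolute example from the slides; the `resolve_all` helper and the ftp mirror location are hypothetical, added only for illustration.

```python
from urllib.parse import urljoin

# Base document: the absolute URI used in the slide examples.
BASE = "http://somehost/absolute/URI/with/absolute/path/to/resource.txt"

REFERENCES = [
    "/relative/URI/with/absolute/path/to/resource.txt",
    "relative/path/to/resource.txt",
    "../../../resource.txt",
    "resource.txt",
    "/resource.txt#frag01",
    "#frag01",
    "",  # the empty reference refers back to the base document
]


def resolve_all(base, references):
    """Resolve each relative reference against the base URI."""
    return {ref: urljoin(base, ref) for ref in references}


if __name__ == "__main__":
    for ref, absolute in resolve_all(BASE, REFERENCES).items():
        print(f"{ref!r} -> {absolute}")
    # The same relative link keeps working if the tree moves to another
    # scheme and host (a hypothetical ftp mirror here).
    print(urljoin("ftp://otherhost/mirror/index.html", "images/logo.gif"))
```

Because only the base changes, none of the relative references inside the documents need to be edited when the tree is moved.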

URI “Standard”
• URI is an Internet protocol element defined currently in RFC 3986 (2005)
• Originally RFC 1630 (1994)

What is an “RFC”?
• Request for Comments
• One of a series, begun in 1969, of numbered Internet informational documents and standards widely followed by commercial software and freeware in the Internet and Unix communities

• All Internet standards are recorded in RFCs

Who keeps track of RFCs?

• IETF = Internet Engineering Task Force
• Open, all-volunteer organization, with no formal membership or membership requirements

• Organized into a large number of working groups, each dealing with a specific topic

• April 1st RFCs, e.g., http://www.apps.ietf.org/rfc/rfc3514.html

What is “W3C”?
• World Wide Web Consortium defines data formats and usage conventions as well as Internet protocols relevant to the Web

• Members pay fees depending on country, revenues and non-profit/for-profit status (e.g., $953 vs. $63,500)

• Otherwise organized similarly to the IETF, but writes “Recommendations” instead of “Requests for Comments”

• http://www.w3.org/

Back to URLs
• Most (?) Web documents use the “http” scheme
• What is “http” (HyperText Transfer Protocol)?

HTTP
• The default Internet protocol used to deliver data on the World Wide Web
• Usually through TCP/IP sockets on port 80, but can use any port and can be implemented on top of any reliable networking protocol
• A Web browser (HTTP client) sends requests to a Web server (HTTP server), which sends responses back to the client

What’s “TCP/IP”?
• IP = Internet Protocol

– Delivers individual packets from one host to another, based on their IP address (in IPv4, four 8-bit octets as in 128.59.16.20)

– Network routers direct traffic of IP packets

What’s “TCP/IP”?
• TCP = Transmission Control Protocol

– Provides an abstraction of reliable, bidirectional connections for the delivery of IP packets to a particular port at a given IP address

– The so-called well known ports (< 1024) are reserved for specific protocols

– By default, HTTP uses port 80; a different port can be given in the URL, e.g., http://www.foo.com:2007/doc.html

HTTP History
• HTTP/0.9 (1990) - simple protocol for raw data transfer
• HTTP/1.0 (RFC 1945, 1996) - Allowed MIME-like messages, containing meta-information about the resources transferred and modifiers on the request/response semantics
• HTTP/1.1 (RFC 2616, 1999)
• HTTP Extension Framework (RFC 2774, 2000)

What is “MIME”?
• Multipurpose Internet Mail Extensions
• Standard representation for “complex” message bodies (numerous RFCs since 1993)

• Examples include messages with embedded graphics or audio clips, messages with file attachments, messages in Japanese or Russian, signed messages

MIME Header Fields
• MIME-Version, Content-Type, Content-Transfer-Encoding, Content-Description, Content-ID, Content-Location, Content-Disposition, Part Body

• Discrete (text, image, audio) and Multipart (mixed, digest) content types

HTTP Request/Response

[Figure: HTTP client, HTTP request to port 80, processing, response to other port]

HTTP Requests and Responses

• Consist of a start-line, zero or more headers (one per line), an empty line (CRLF) indicating the end of the header fields, and possibly a message-body

• Message body only allowed with certain request methods and response status codes (200 OK vs. 404 NOT FOUND)
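The message layout described above can be sketched as a small parser. This is a minimal illustration only: the `parse_message` helper and the sample response are assumptions made for the example, and the sketch ignores folded and repeated header fields.

```python
def parse_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) marks the end of the header fields.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical response used only to exercise the parser.
RESPONSE = (
    b"HTTP/1.0 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not here."
)

if __name__ == "__main__":
    start_line, headers, body = parse_message(RESPONSE)
    print(start_line)
    print(headers)
    print(body)
```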

Sample HTTP Exchange
• To retrieve the file at the URL http://www.somehost.com/path/file.html

• First open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL)

Sample
• Then, send something like the following through the socket:

GET /path/file.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Accept: text/html, image/gif, image/jpeg
[blank line here]

• The server should respond with something like the following:

HTTP/1.0 200 OK
Server: Apache/1.3.0 (Linux)
Date: Sun, 31 Dec 2006 23:59:59 GMT
Last-Modified: Sun, 31 Dec 2006 23:59:58 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Happy New Year!</h1>
(more file contents)
  .
  .
  .
</body>
</html>
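The client side of this exchange can be sketched in a few lines. The helper names `build_get_request` and `fetch` are assumptions for illustration; the sketch omits the From header and does no error handling.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url, user_agent="HTTPTool/1.0"):
    """Return (host, port, request_bytes) for an HTTP/1.0 GET of url."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80  # default port when the URL names none
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.0",
        f"User-Agent: {user_agent}",
        "Accept: text/html, image/gif, image/jpeg",
        "",  # the empty line that ends the header fields
        "",
    ]
    return host, port, "\r\n".join(lines).encode("ascii")


def fetch(url):
    """Open a socket, send the request, and read until the server closes."""
    host, port, request = build_get_request(url)
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    host, port, request = build_get_request("http://www.somehost.com/path/file.html")
    print(host, port)
    print(request.decode("ascii"))
```

Reading until the connection closes works here because an HTTP/1.0 server closes the connection after sending its response.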

Some Request Headers
• From: gives the email address of whoever's making the request, or running the program doing so (for bots)
• User-Agent: identifies the program that's making the request, in the form "Program-name/x.xx", where x.xx is the alphanumeric version of the program (e.g., browser)
– User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)

Some Response Headers

• Server: analogous to User-Agent:, identifies the server software in the form "Program-name/x.xx"
– Server: Apache/1.3.12 (Unix)
• Last-Modified: gives the modification date of the resource that's being returned, e.g., for use in caching
– Use Greenwich Mean Time, in the format Last-Modified: Tue, 23 Jan 2007 00:00:01 GMT

Start Line
• HTTP Version (0.9, 1.0, 1.1)
• URI
• Method (request) or Status Code (response)

HTTP URIs
• A server may bound the URI length it accepts (often 255) or leave it “unbounded”; a request URI that is too long gets status code 414 (Request-URI Too Long)
• Equivalence comparison: the following three URIs are equivalent
– http://abc.com:80/~smith/home.html
– http://ABC.com/%7Esmith/home.html
– http://ABC.com:/%7esmith/home.html
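One way to compare such URIs is to normalize them first. The sketch below is a simplified illustration (the helper name `normalize_http_uri` is an assumption, and it ignores userinfo and IPv6 literals): it lowercases the host, drops an empty or default port, and decodes percent-escapes of unreserved characters.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Characters that never need percent-encoding (RFC 3986 "unreserved").
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _normalize_escape(match):
    """Decode %XX if it encodes an unreserved character, else uppercase it."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def normalize_http_uri(uri):
    """Return a canonical form of an http URI for equivalence comparison."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    host, _, port = parts.netloc.partition(":")
    host = host.lower()  # host names are case-insensitive
    # An empty port or the default port 80 is the same as no port at all.
    authority = host if port in ("", "80") else f"{host}:{port}"
    path = re.sub(r"%([0-9A-Fa-f]{2})", _normalize_escape, parts.path) or "/"
    return urlunsplit((parts.scheme, authority, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for uri in (
        "http://abc.com:80/~smith/home.html",
        "http://ABC.com/%7Esmith/home.html",
        "http://ABC.com:/%7esmith/home.html",
    ):
        print(normalize_http_uri(uri))
```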

Request Messages
• Method SP Request-URI SP HTTP-Version CRLF
• GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
• Equivalent to client making TCP connection to www.w3.org on port 80, then sending
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org

• Host field allows for virtual hosts

What is a “virtual host”?

• Enables the same machine to host multiple domain names, sometimes at the same IP address (name-based virtual hosting)

• Important for website hosting (e.g., www.foo.com maps to /www/foo/site1 and www.bar.com maps to /www/bar/site2), but usually there can be only one secure https website per IP address/port
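Name-based virtual hosting amounts to a lookup keyed on the Host header. The sketch below is a toy illustration, not how any particular server implements it; the table reuses the two sites from the slide, while `/www/default` and the helper names are made up.

```python
# Mapping from host name to document root, as in the slide's example.
VIRTUAL_HOSTS = {
    "www.foo.com": "/www/foo/site1",
    "www.bar.com": "/www/bar/site2",
}


def document_root(host_header, default="/www/default"):
    """Pick a document root from the Host header (hypothetical default)."""
    # Drop an optional ":port" suffix; host names are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, default)


def resolve_path(host_header, request_path):
    """Map a request path to a file path under the chosen document root."""
    return document_root(host_header) + request_path


if __name__ == "__main__":
    print(resolve_path("www.foo.com", "/index.html"))
    print(resolve_path("www.bar.com:80", "/index.html"))
```

Both requests can arrive at the same IP address and port; only the Host field tells the server which site is meant.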

GET
• Retrieve whatever information (in the form of an entity) is identified by the URI
• If the URI refers to a data-producing process, it is the produced data (given the input parameters after the “?”, if any) that is returned as the entity in the response - not the source text of the process (unless that text happens to be the output of the process)

• http://foo.com/run.cgi?name1=val1&name2=val2

Conditional and Partial GET

• Conditional if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field

• Partial if the request message includes a Range header field

• Both avoid retrieving data the client doesn’t need (e.g., when a cached copy is already up to date, or when part of the entity is already in the cache)
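A server-side check for the If-Modified-Since case might look like the following sketch. The helper name `conditional_get_status` is an assumption; it returns 304 (Not Modified, the HTTP/1.1 status for an unchanged resource) or 200, and handles only this one conditional header.

```python
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200.

    Both arguments are HTTP date strings such as
    "Tue, 23 Jan 2007 00:00:01 GMT".
    """
    if if_modified_since is None:
        return 200  # unconditional GET: always send the entity
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    current = parsedate_to_datetime(last_modified)
    return 304 if current <= cached else 200


if __name__ == "__main__":
    print(conditional_get_status(
        "Sun, 31 Dec 2006 23:59:58 GMT", "Sun, 31 Dec 2006 23:59:58 GMT"))
    print(conditional_get_status(
        "Tue, 23 Jan 2007 00:00:01 GMT", "Sun, 31 Dec 2006 23:59:58 GMT"))
```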

HEAD
• Identical to GET except that the server must not return a message-body in the response - only returns headers

• Often used for testing hypertext links for validity and modification

• Can mark cache entries as stale if certain header information changes (e.g., length, last-modified)

POST
• Used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line

• Actual function performed by the POST method is determined by the server, usually dependent on the Request-URI

POST supports several functions

• Annotation of an existing resource
• Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles

• Providing a block of data, such as the result of submitting a form, to a data-handling process

• Extending a database through an append operation

POST vs. GET
• GET can be used to send small amounts of data to a server, with the data following the ? character

• The rest of the request-URI (before the ?) refers to some kind of processing program

GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.0
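Building and reading such a query string can be sketched with `urlencode` and `parse_qs` from Python's standard library. The helpers `build_request_line` and `read_fields` are hypothetical, and `read_fields` keeps only the first value of each field.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_request_line(script_path, fields):
    """Encode form fields after the '?' of a GET request line."""
    return f"GET {script_path}?{urlencode(fields)} HTTP/1.0"


def read_fields(request_line):
    """Recover the fields on the server side from the request line."""
    method, target, version = request_line.split(" ")
    query = urlsplit(target).query
    # parse_qs returns a list per field; keep only the first value here.
    return {name: values[0] for name, values in parse_qs(query).items()}


if __name__ == "__main__":
    line = build_request_line(
        "/path/script.cgi", {"field1": "value1", "field2": "value2"})
    print(line)
    print(read_fields(line))
```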

PUT and DELETE
• Often unsupported (501 Not Implemented)
• PUT requests that the enclosed entity be stored under the supplied Request-URI
• May create a new resource at a new URI, or modify an existing resource already at that URI
• DELETE requests that the origin server delete the resource identified by the Request-URI
• May be overridden, e.g., by human intervention, even if the status code indicates successful completion

OPTIONS and TRACE
• OPTIONS allows the client to determine the requirements associated with a resource, or the capabilities of a server (OPTIONS *), without implying a resource action or initiating a resource retrieval
• TRACE is used to invoke an application-layer loop-back of the request message, allowing the client to see what is being received at the other end of the request chain, for testing or diagnostic purposes

HTTP is “Stateless”
• Server doesn’t remember anything about client between connections
• Not even between requests during the same persistent connection, except TCP data

• But some state can be encoded in complex URLs or in forms

• Or saved on client in “cookies”

Cookies
• Opaque string associated with a website, stored at the browser
• Created in an HTTP response with “Set-Cookie: ”
• In all subsequent requests to this site, until the cookie’s expiration, the client sends the HTTP header “Cookie: ”
• Name-value pairs
– Cookie: user="alex"; lastvisit="20070123-11:00"
• Interpretation up to the Web application
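The name-value structure of the Cookie header can be sketched with a deliberately simple parser. The helper names are assumptions, and the sketch skips escaping and cookie attributes; Python's standard `http.cookies` module is the more complete option.

```python
def parse_cookie_header(value):
    """Parse the value of a Cookie header into a dict of name-value pairs."""
    pairs = {}
    for item in value.split(";"):  # pairs are separated by semicolons
        name, sep, val = item.strip().partition("=")
        if sep:
            pairs[name] = val.strip('"')  # drop optional surrounding quotes
    return pairs


def build_cookie_header(pairs):
    """Format name-value pairs as the header a client would send back."""
    return "Cookie: " + "; ".join(f'{n}="{v}"' for n, v in pairs.items())


if __name__ == "__main__":
    cookies = parse_cookie_header('user="alex"; lastvisit="20070123-11:00"')
    print(cookies)
    print(build_cookie_header(cookies))
```

What the values mean is left entirely to the Web application; the browser just stores them and sends them back.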

Response Messages
• HTTP-Version SP Status-Code SP Reason-Phrase CRLF
• Example: HTTP/1.0 404 Not Found
• Status code: 3-digit integer result code of the attempt to understand and satisfy the request
• Reason phrase: short textual description of the Status-Code

Status Codes
• Applications need only understand the class given by the first digit, and may treat any unrecognized code as equivalent to the x00 code of that class
• 1xx: Informational - Request received, continuing process ("100" : Continue, relevant to persistent connections)

• 2xx: Success - The action was successfully received, understood and accepted ("200" : OK)

Status Codes
• 3xx: Redirection - Further action must be taken in order to complete the request ("300" : Multiple Choices)

• 4xx: Client Error - The request contains bad syntax or cannot be fulfilled ("400" : Bad Request)

• 5xx: Server Error - The server failed to fulfill an apparently valid request ("500" : Internal Server Error)
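The first-digit rule can be sketched as follows. The helper names and the set of "known" codes are illustrative assumptions.

```python
# The five classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class name for a 3-digit status code."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def effective_code(code, known):
    """Treat a code the application does not recognize as the x00 of its class."""
    return code if code in known else (code // 100) * 100


if __name__ == "__main__":
    known = {200, 300, 400, 404, 500}  # hypothetical set an application handles
    for code in (200, 404, 418, 503):
        print(code, status_class(code), effective_code(code, known))
```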

HTTP Request/Response

• In HTTP 1.0, a connection is established by the client prior to each request and closed by the server after sending the response

• Either party may close the connection prematurely, due to user action, automated time-out, or program failure

• Closing of the connection by either or both parties always terminates the current request, regardless of its status

• But TCP connections are expensive

HTTP 1.1 “Persistent Connection”

• Many Web pages consist of several files on the same server

• If an HTTP 1.1 client sends multiple (pipelined) requests through a single connection, the server should send responses back in the same order

• Intermediate responses "100" : Continue

How does the connection finally get closed?

• If a request includes the "Connection: close" header, that request is the final one for the connection and the server should close the connection after sending the response

• The server should also close an idle connection after some timeout period

Advantages of Persistent Connections

• Requests and responses can be pipelined - a client makes multiple requests without waiting for each response

• Network congestion reduced by fewer packets caused by TCP opens, and by allowing TCP sufficient time to determine the congestion state of the network

• Latency on subsequent requests is reduced since there is no time spent in TCP's connection opening handshake

Basic HTTP Architecture

Intermediary

• Program sitting in the path between HTTP clients and servers

• Acts as a server to clients and as a client to origin servers or other intermediaries

Proxy

• Forwarding agent
• Receives request, rewrites all or parts of the message, and forwards the reformatted request toward the server identified by the URI

• Used for load balancing, anonymizing clients

Gateway
• Receiving agent
• Acts as a layer above some other server(s) and, if necessary, translates the requests to the underlying server's protocol
• Example: Web mail accessing an IMAP server
– A URL identifies the mail server, mailbox, password
– Converts the HTTP request to an IMAP request, gets the IMAP response, converts it to HTTP response

Tunnel
• Relay point between two connections without changing the message
• Looks at the first line of the HTTP message to locate the host to be contacted and accept the request
• Simply relays bits between the two connection points
• Does not parse or interpret messages
• Used when the communication needs to pass through a firewall

Transcoder
• Modifies data as it passes to clients, e.g., to filter ads
• Particularly useful for wireless and/or constrained devices
– Convert HTML to WML
– Modify content to fit small screen
– Convert modality of interaction, e.g., driving directions from displaying text to playing audio

Caching
• Request/response chain is shortened if one of the participants along the chain has a cached response applicable to the request

• Used to reduce latency and network traffic

HTTP 1.1 Caching Support

• Allows a server to determine caching policies in its response
– Expires xx-xx-xx yy:yy:yy.yy
– Cache-Control: no-store – don’t cache at all
– Cache-Control: no-cache – validate every time or don’t cache
– Cache-Control: private – can’t keep in a public cache

HTTP 1.1 Chunked Encoding

• Faster response for dynamically-generated pages or very large pages

• Allows the beginning of a response to be sent before its total length is known

• Each chunk is prefixed by its size in bytes (written in hexadecimal)
• A zero size chunk indicates the end of the response message
• If a server is using chunked encoding it must set the Transfer-Encoding header to "chunked".
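The framing can be sketched with a pair of helper functions. The names `encode_chunked` and `decode_chunked` are assumptions, and the sketch ignores optional trailer headers after the final chunk.

```python
def encode_chunked(pieces):
    """Frame each piece as '<size in hex>CRLF<data>CRLF', ending with a zero chunk."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty piece would look like the terminating chunk
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # the zero-size chunk marks the end of the message
    return bytes(out)


def decode_chunked(data):
    """Reassemble the body from a chunked message (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ';extension' parameters; drop them.
        size_text = data[pos:line_end].split(b";")[0].decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


if __name__ == "__main__":
    # The sender can emit each piece as soon as it is generated,
    # without knowing the total length in advance.
    pieces = [b"<html>", b"<body>Hello</body>", b"</html>"]
    wire = encode_chunked(pieces)
    print(wire)
    print(decode_chunked(wire))
```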

Reminders

• Class attendance required!

• Preliminary paper proposal January 29th

• Preliminary project proposal March 5th

• Paper must be individual, projects may be teams of 2-5 students

• See advice about team formation at http://york.cs.columbia.edu/classes/cs6125/team_advice.htm
