Clarke, R. J (2001) L909-06: 1 Office Automation & Intranets BUSS 909 Lecture 6 Web Architecture and Standards

Page 1: Office Automation & Intranets

Office Automation & Intranets

BUSS 909

Lecture 6: Web Architecture and Standards

Page 2: Office Automation & Intranets

Notices (1)

Assignment 2 is available from the BUSS909 Intranet- it includes a Marking Criteria sheet

there are files on the intranet that provide information needed for the assignment:

  • Organising Structures and Schemes
  • Media & Content Classification
  • Navigation, Labeling and Searching

Page 3: Office Automation & Intranets

Notices (2)

Additional files have been placed on the BUSS909 Intranet:

  • a fundamentals of ‘Information Theory and Systems Theory’ file called sl909-00.ppt
  • an introduction to different types of services on the internet, available in a file called sl909-03.ppt

Page 4: Office Automation & Intranets

Agenda (1)

  • WWW Basics
  • Web Server Overview
  • Web Documents & Trees
  • Hypertext Transfer Protocol (HTTP)
  • Serving a Web Document- Example

Page 5: Office Automation & Intranets

WWW Basics

Page 6: Office Automation & Intranets

WWW Basics

  • WWW and the Internet
  • Web Client and Web Server Software
  • Uniform Resource Locators (URLs)
  • Hypertext Transfer Protocol (HTTP)
  • Hypertext Markup Language (HTML)

Page 7: Office Automation & Intranets

Uniform Resource Locators

Page 8: Office Automation & Intranets

Uniform Resource Locators (1) Definition

a Uniform Resource Locator (URL) is the address of a network resource

URLs for the WWW contain several components

the first component identifies the URL scheme, or protocol, being used to transfer information

Page 9: Office Automation & Intranets

Uniform Resource Locators (2) Some Popular URL Schemes

Description                                              Scheme
Hypertext Transfer Protocol                              http
HTTP using Secure Sockets Layer (SSL)                    https
E-mail Address                                           mailto
File Transfer Protocol                                   ftp
Finger protocol                                          finger
Gopher protocol                                          gopher
Wide Area Information Server                             wais
Usenet news                                              news
Usenet news via Network News Transfer Protocol (NNTP)    nntp
Usenet news via SSL-encrypted NNTP                       snews
Host-specific filenames                                  file
Internet Relay Chat session                              irc
Telnet interactive session                               telnet

Page 10: Office Automation & Intranets

Uniform Resource Locators (3) Server Name & Resource

the second component identifies the name of a server sitting on the Internet from which a resource is being requested

the third component identifies part of the server’s subdirectory and the file name for a resource- most likely an HTML document

Page 11: Office Automation & Intranets

Uniform Resource Locators (4) ‘Complete URL’ to UOW Home Page

http://www.uow.edu.au/index.html

  • URL scheme: http
  • server name: www.uow.edu.au
  • server’s subdirectory and resource file name: /index.html
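The three components can be pulled apart programmatically. The following is a minimal sketch using Python's standard `urllib.parse` module; the helper name `url_components` and its dictionary keys are illustrative choices, not part of the lecture.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a web URL into the three components named on the slides."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # URL scheme, e.g. "http"
        "server": parts.hostname,    # name of the server
        "resource": parts.path,      # subdirectory and resource file name
    }


components = url_components("http://www.uow.edu.au/index.html")
print(components)
```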

Page 12: Office Automation & Intranets

Uniform Resource Locators (5) Incomplete URL to UOW Home Page

However, the shorter URL

http://www.uow.edu.au/

also points to the ‘home page’ of that server

Web servers have a default filename, often default.html or index.html

Note: either this URL or the previous one enables the user to view the home page for the UOW web site

Page 13: Office Automation & Intranets

Uniform Resource Locators (6) Omitting the Scheme in Web URLs

Because of the popularity of the WWW, the scheme is occasionally omitted

web browsers are able to substitute this part of web URLs

the URL terra.uow.edu.au is interpreted by Netscape as http://terra.uow.edu.au/

Page 14: Office Automation & Intranets

Uniform Resource Locators (7) Partial or Relative Web URLs

a partial or relative URL is one which omits the protocol, host, port, or leading path; the missing parts are taken from the URL of the document that references it

eg. rsch-ss.htm, when referenced by

http://www.uow.edu.au/commerce/buss/research.htm

is a relative form of

http://www.uow.edu.au/commerce/buss/rsch-ss.htm
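How a browser might resolve the relative form can be sketched with `urljoin` from Python's standard library. The two URLs are the ones from the slide; the variable names are illustrative.

```python
from urllib.parse import urljoin

# URL of the document that contains the relative reference
base = "http://www.uow.edu.au/commerce/buss/research.htm"

# The relative URL borrows scheme, host and directory from the base URL
absolute = urljoin(base, "rsch-ss.htm")
print(absolute)
```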

Page 15: Office Automation & Intranets

Uniform Resource Locators (8) Anchors in Web URLs

Web URLs support the use of a # sign after the HTML filename to indicate an anchor

for example, http://www.uow.edu.au/residences/inter_house/#Facilities refers to the “Facilities” section of the document inter_house.htm
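As a small illustration, the anchor can be separated from the rest of the URL with `urldefrag` from Python's standard library. The URL is the one from the slide; the variable names are illustrative.

```python
from urllib.parse import urldefrag

# urldefrag returns the URL without the anchor, and the anchor itself
document_url, anchor = urldefrag(
    "http://www.uow.edu.au/residences/inter_house/#Facilities"
)
print(document_url, anchor)
```

The anchor is normally resolved by the browser once the document has arrived; it is not part of the path requested from the server.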

Page 16: Office Automation & Intranets

Uniform Resource Locators (9) Preserving State Information in URLs ...

WWW is inherently stateless

once a request from a client is answered by an HTTP server, the transaction is effectively concluded

the transaction’s current status is lost; that is, it is normally not recorded for future transactions

Page 17: Office Automation & Intranets

Uniform Resource Locators (10) … Preserving State Information in URLs ...

state information must be available for many uses, like:

  • electronic commerce across internet (shopping carts), extranet (EDI), etc
  • researching on the web with search engines, which generally involves multiple attempts at converging on a small set of useful sources

Page 18: Office Automation & Intranets

Uniform Resource Locators (11) … Preserving State Information in URLs ...

however, state can be preserved for the duration of a user’s session by placing additional information into the URL

this information is typically sent to the CGI-BIN area on the server- the CGI-BIN area is where user-provided executable routines are placed for execution during a user’s session

Page 19: Office Automation & Intranets

Uniform Resource Locators (12) … Preserving State Information in URLs ...

conventions exist for passing state information to CGI routines

search parameters can form state information- for example, the search term “intranets” can be sent as a parameter to the query routine located in the CGI bin of the AltaVista search engine

Page 20: Office Automation & Intranets

Uniform Resource Locators (13) … Preserving State Information in URLs

Everything after the ? is the parameter string that is passed to the query routine located on the AltaVista site

http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=intranets&search=Search
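A sketch of how a CGI routine, or any other program, might decode such a parameter string, using Python's standard `urllib.parse`. The URL is the slide's example written without the stray space; the meaning of the individual parameters (`pg`, `kl`) is not given in the lecture and is not assumed here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=intranets&search=Search"

parts = urlsplit(url)
routine = parts.path            # the query routine in the CGI bin
params = parse_qs(parts.query)  # name -> list of values

# Going the other way: build a parameter string from name/value pairs
rebuilt = urlencode({"pg": "q", "kl": "XX", "q": "intranets", "search": "Search"})

print(routine, params, rebuilt)
```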

Page 21: Office Automation & Intranets

Web Server Overview

Page 22: Office Automation & Intranets

Web Server Overview

  • Web Server Components
  • Relationship to HTTP
  • Limits of Web Servers

Page 23: Office Automation & Intranets

Web Documents & Trees

Page 24: Office Automation & Intranets

Web Documents & Trees

  • MIME file extensions and types
  • Documents, Links and Anchors
  • Document Tree Organisation

Page 25: Office Automation & Intranets

Hypertext Transfer Protocol

Page 26: Office Automation & Intranets

Hypertext Transfer Protocol

browser and server communicate using HTTP

it is a simple set of rules designed to be suitable for hypermedia systems distributed across networks

you must understand this protocol in order to understand the WWW

HTTP defines a simple request-response ‘conversation’

Page 27: Office Automation & Intranets

Hypertext Transfer Protocol

HTTP does define how to correctly format the request and the response

the client- often but not necessarily a browser- is the requesting program and establishes a connection to the receiving program or server

the server replies with a response including the requested information if possible

Page 28: Office Automation & Intranets

Hypertext Transfer Protocol

HTTP does not define:

  • how the network connection is made or managed, or
  • how the information is actually transmitted

(this is done by lower-level protocols such as TCP/IP)

HTTP requests consist of a method, a Uniform Resource Identifier (URI), a protocol version, and other information

Page 29: Office Automation & Intranets

Hypertext Transfer Protocol HTTP Requests: Methods ...

HTTP Methods- commonly supported methods include:

  • GET- returns the object; retrieves the information
  • HEAD- returns only information about the object, but not the object itself
  • POST- sends information to be stored on the server (eg. input to scripts)

Page 30: Office Automation & Intranets

Hypertext Transfer Protocol ... HTTP Requests: Methods

some HTTP methods are not supported by many browsers because they may put the integrity of the server at risk:

  • PUT- send a new copy of an existing object
  • DELETE- permanently remove an object

other methods may be added to the standard in the future- HTTP is extensible and has evolved, slowly

Page 31: Office Automation & Intranets

Hypertext Transfer Protocol HTTP Requests: Information Client -> Server

  • User-Agent: kind of browser making the request
  • If-Modified-Since: the object is returned only if it is newer than a specified date (can save the cost of a retrieval)
  • Accept: the MIME types and formats the browser has been configured to accept (can save the cost of downloading an unreadable document)
  • Authorization: user password etc. as required
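To make the shape of a request concrete, here is a minimal sketch that formats a method, a path, a protocol version and some header fields as text. The function name `build_request`, the browser name and the date are illustrative assumptions; only the header field names come from the slide.

```python
from email.utils import formatdate


def build_request(method, path, headers, version="HTTP/1.0"):
    """Format a request: request line, header lines, then an empty line."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends with CRLF; an empty line marks the end of the headers
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request("GET", "/sample.htm", [
    ("User-Agent", "ExampleBrowser/1.0"),                   # hypothetical browser
    ("If-Modified-Since", formatdate(0, usegmt=True)),      # arbitrary example date
    ("Accept", "text/html"),
])
print(request)
```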

Page 32: Office Automation & Intranets

Serving Documents- Example

Page 33: Office Automation & Intranets

Serving Documents- Example 1: Server waits for a new request

the httpd program waits for a client’s request to arrive from somewhere on the Internet

the server listens on a port and is dormant until a client calls it

Page 34: Office Automation & Intranets

Serving Documents- Example 2: Request arrives from client ...

ultimately a request is sent by a client to the server, when the user either types a URL or selects an HTML anchor

the network software (client) locates the server computer and sets up a 2-way network connection from the client to the server

Page 35: Office Automation & Intranets

Serving Documents- Example ... 2: Request arrives from client

the client uses Internet protocols and the name service (DNS) to locate the server and initiate a connection with it

once the connection is established the client sends the HTTP request:

GET /sample.htm HTTP/1.0

the request is sent over the network in ASCII; the server receives it and saves it

Page 36: Office Automation & Intranets

Serving Documents- Example 3: server parses the request ...

server decodes the request using HTTP protocol to determine what to do

there are three important pieces of information:

  • the method instructs the server as to what action should be taken. The GET method is used to locate and read the file and return it to the client ...

Page 37: Office Automation & Intranets

Serving Documents- Example ... 3: server parses the request

  • the document (/sample.htm) can be fetched by the server because it knows where it is in the document tree, and
  • the browser protocol being used (HTTP/1.0), so that the contents can eventually be returned to the client

the contents are sent back over the same connection as the request. (Note that the server need not find the client on the Internet or make a new connection)
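The parsing step can be sketched as splitting the request line into its three pieces. This is a simplified illustration, not the code of any particular httpd; the function name `parse_request_line` is hypothetical.

```python
def parse_request_line(line):
    """Return (method, document, protocol) from a line such as
    'GET /sample.htm HTTP/1.0'."""
    method, document, protocol = line.strip().split()
    if not protocol.startswith("HTTP/"):
        raise ValueError(f"unrecognised protocol: {protocol!r}")
    return method, document, protocol


method, document, protocol = parse_request_line("GET /sample.htm HTTP/1.0\r\n")
print(method, document, protocol)
```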

Page 38: Office Automation & Intranets

Serving Documents- Example 4: Read other information (if necessary) ...

the httpd program reads the rest of the request, if needed

using HTTP/1.0 the browser is expected to send additional information about itself to the server

this meta-information describes the browser and its capabilities which may be needed by the server to reply to the request

Page 39: Office Automation & Intranets

Serving Documents- Example ... 4: Read other information (if necessary)

for example:

User-agent: Mosaic for X Windows/2.4

Accept: text/plain

Accept: text/html

Accept: image/*

indicates the browser is Mosaic, configured to display plain text, HTML, and any kind of image
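A server could use this meta-information to check whether a document type is acceptable before sending it. The sketch below assumes one MIME pattern per Accept line, as in the example above; real Accept headers can also carry comma-separated lists and quality values, which are ignored here. The function names are illustrative.

```python
from fnmatch import fnmatchcase


def parse_headers(text):
    """Collect header lines as (name, value) pairs, keeping repeated names."""
    pairs = []
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            pairs.append((name.strip().lower(), value.strip()))
    return pairs


def accepts(headers, mime_type):
    """True if any Accept pattern (e.g. image/*) matches the MIME type."""
    patterns = [value for name, value in headers if name == "accept"]
    return any(fnmatchcase(mime_type, pattern) for pattern in patterns)


headers = parse_headers(
    "User-agent: Mosaic for X Windows/2.4\n"
    "Accept: text/plain\n"
    "Accept: text/html\n"
    "Accept: image/*\n"
)
print(accepts(headers, "image/gif"), accepts(headers, "application/pdf"))
```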

Page 40: Office Automation & Intranets

Serving Documents- Example 5: Do the requested method ...

Assuming no errors, the httpd program executes the request

to GET a document requires looking up the file /sample.htm in its document tree, using the standard file services of its operating system

there are two alternative courses of action depending on success or failure

Page 41: Office Automation & Intranets

Serving Documents- Example ... 5: Do the requested method (Success) ...

the httpd daemon sends a result code and the information that describes the type of information expected by the client

as the document is found, a code 200 (everything is OK) is sent and the document will follow

the information is an HTML document so the Content-type: text/html; the document is 1066 bytes long so the Content-length: 1066

the server software and the file date are also included

Page 42: Office Automation & Intranets

Serving Documents- Example ... 5: Do the requested method (Success)

the header sent to the client might look something like this:

HTTP/1.0 200 Document follows

Server: NCSA/1.4

Date: Thu, 20 Jul 1996 22:00:00 GMT

Content-type: text/html

Content-length: 1066

Last-modified: Thu, 20 Jul 1996 20:38:40 GMT

Page 43: Office Automation & Intranets

Serving Documents- Example 5: Do the requested method (Failure) ...

if the requested file could not be found or read then the status code will not be 200

the most common problem is that the name of the requested file is misspelt so the server cannot find it

if the requested file was called smple.htm it would not be found- the server would send a status code 404

Page 44: Office Automation & Intranets

Serving Documents- Example ... 5: Do the requested method (Failure) ...

the response might look like this:

HTTP/1.0 404 Not Found

Server: NCSA/1.4

Date: Thu, 20 Jul 1996 22:00:00 GMT

Content-type: text/html

Content-length: 0
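The success and failure branches of step 5 can be sketched together. The in-memory `DOCUMENT_TREE` dictionary and the function name `respond` are assumptions made to keep the example self-contained; a real server would look the file up on disk and would also send Server and Date fields. The failure branch uses 404, the standard HTTP status code for a resource that cannot be found.

```python
# Hypothetical document tree: request path -> document contents
DOCUMENT_TREE = {"/sample.htm": b"<html><body>Sample</body></html>"}


def respond(path, tree=DOCUMENT_TREE):
    """Build the response bytes for a GET of the given path."""
    body = tree.get(path)
    if body is None:
        status, body = "404 Not Found", b""       # failure branch
    else:
        status = "200 Document follows"           # success branch
    header = (
        f"HTTP/1.0 {status}\r\n"
        "Content-type: text/html\r\n"
        f"Content-length: {len(body)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + body


found = respond("/sample.htm")
missing = respond("/smple.htm")   # misspelt name, as on the slide
print(found.decode("ascii"))
print(missing.decode("ascii"))
```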

Page 45: Office Automation & Intranets

Serving Documents- Example 6: Finish Up

when the file is completely sent or an error message is sent, the httpd server has finished its work- it closes the file if it was open, and closes the network port, which terminates the network connection

the client receives and formats the data- the server knows nothing about how it is displayed

the httpd server listens for another request (go back to step 1)
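Steps 1 to 6 can be put together as a loop over a listening socket. This is a deliberately small sketch with no error handling, serving one hypothetical document from memory one request at a time; `serve`, `fetch` and `PAGE` are illustrative names, and the `max_requests` limit exists only so that the loop ends.

```python
import socket

PAGE = b"<html><body>Sample</body></html>"   # hypothetical document


def serve(listener, max_requests):
    """Handle max_requests requests, one at a time, on a listening socket."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()            # steps 1-2: wait for a client
        with conn:
            data = b""
            while b"\r\n\r\n" not in data:         # step 4: read up to the blank line
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
            method, path, _version = request_line.split()    # step 3: parse
            if method == "GET" and path == "/sample.htm":    # step 5: do the method
                status, body = "200 Document follows", PAGE
            else:
                status, body = "404 Not Found", b""
            header = (f"HTTP/1.0 {status}\r\n"
                      "Content-type: text/html\r\n"
                      f"Content-length: {len(body)}\r\n\r\n")
            conn.sendall(header.encode("ascii") + body)
        # step 6: leaving the with-block closes the connection; loop back to step 1


def fetch(port, path):
    """Tiny client: send one GET to 127.0.0.1 and return the raw response."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

To try it, bind a socket to a free port, call `listen()`, run `serve` in a background thread, and call `fetch` with the chosen port.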

Page 46: Office Automation & Intranets

Web Server Operations

Page 47: Office Automation & Intranets

Web Server Operations

a web server has a collection of information in a document tree and it serves it according to the HTTP protocol

web servers are reactive programs: a server waits until a request is made, attempts to satisfy it, then waits for the next request, and so on

the previous example is only slightly simplified

Page 48: Office Automation & Intranets

Web Server Operations Handling Multiple Requests (1)

if a server processes one request at a time, but can receive many simultaneous requests, then delays will occur:

  • an image may take several seconds to serve
  • without a priority scheme, small jobs that can be serviced quickly take an inordinate amount of time to serve

with a large number of hits servers can go down- the backlog can be too great

Page 49: Office Automation & Intranets

Web Server Operations Handling Multiple Requests (2)

web servers are therefore designed to handle as many requests as possible simultaneously

several strategies are available to do this (the last two are more difficult unless special software is used):

  • clone a copy of the httpd program for each request- very easy under UNIX
  • multithreading the httpd program
  • spreading the work amongst several helper programs

Page 50: Office Automation & Intranets

Web Server Operations Cloning Servers (1)

each request is processed by a new copy of the httpd program

the original server, called the parent, immediately returns to listening for another request

the new copy, called the child, performs the processing

Page 51: Office Automation & Intranets

Web Server Operations Cloning Servers (2)

the parent passes the network connection to the child at the time that it is first spawned

when the child has serviced the request, it terminates

the web server hardware may have many copies of the httpd program running simultaneously

Page 52: Office Automation & Intranets

Web Server Operations Multithreaded Execution

many mechanisms can be used for implementing this approach

the server may monitor the progress of several connections, switching between them as necessary

when a lengthy process is in operation the server may switch to another pending task

when the pending process is complete it can return to the previous lengthy process

server closes the network connections of any finished processes

this can be an extremely efficient method
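One of the many possible mechanisms is to give each connection its own thread and let the operating system do the switching. The sketch below uses `ThreadingTCPServer` from Python's standard `socketserver` module for this; it is an assumption chosen for brevity, not the mechanism of any particular httpd, and the handler always returns the same hypothetical page.

```python
import socketserver

PAGE = b"<html><body>Sample</body></html>"   # hypothetical document


class Handler(socketserver.StreamRequestHandler):
    """Runs in its own thread for each connection."""

    def handle(self):
        self.rfile.readline()                    # request line (ignored in this sketch)
        while self.rfile.readline().strip():     # skip header lines up to the blank line
            pass
        self.wfile.write(
            b"HTTP/1.0 200 Document follows\r\n"
            b"Content-type: text/html\r\n"
            + f"Content-length: {len(PAGE)}\r\n\r\n".encode("ascii")
            + PAGE
        )


def make_server(port=0):
    """Create a thread-per-connection server on 127.0.0.1 (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), Handler)
    server.daemon_threads = True    # do not let worker threads block program exit
    return server
```

Calling `make_server().serve_forever()` starts the accept loop; the main thread keeps listening while each request is handled in a separate thread.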

Page 53: Office Automation & Intranets

Web Server Operations Servers as Cooperating Sets of Programs

the httpd server itself can be made up of a set of cooperating programs specialised to perform particular tasks

one program reads the requests from the network, another allocates them to specialised helper programs

the scheme is very efficient: the number of helpers can be adjusted to suit the number of requests, the type of requests (generally less common), or the size of the system

Page 54: Office Automation & Intranets

Web Server Operations Multiple Web Services on the same Servers

more than one web service can run on the same computer

any number of httpd programs can run on a UNIX machine as long as each has a unique port number

the following web services are on the same computer but different ports (the superuser sets up port 80 servers, but users can own and operate unrestricted ports above 1024):

  • http://www.rods.org/index.htm (port 80)
  • http://www.rods.org:8080/index.htm (port 8080)
  • http://www.rods.org:8081/index.htm (port 8081)
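How a client decides which port to contact can be sketched as follows: use the port written in the URL if there is one, otherwise fall back to the scheme's default. The function name `port_of` and the `DEFAULT_PORTS` table are illustrative.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def port_of(url):
    """Port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly
    return parts.port or DEFAULT_PORTS[parts.scheme]


for url in ("http://www.rods.org/index.htm",
            "http://www.rods.org:8080/index.htm",
            "http://www.rods.org:8081/index.htm"):
    print(url, port_of(url))
```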

Page 55: Office Automation & Intranets

Web Server Operations Establishing a Two-Way Network Connection

client must look up the network address of the server using its name

the client’s system software sends a packet to the server, requesting a connection

the server’s system software sends a packet back to the client, agreeing to set up a connection

the client program is connected to the new network connection

the server program is connected to the new network connection
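These steps can be observed on a single machine with Python's standard `socket` module. In this sketch both ends live in one program for convenience; the function name `connect_pair` is hypothetical, and the packet exchange itself is carried out by the operating system inside `connect()`.

```python
import socket


def connect_pair(host="localhost"):
    """Set up one client/server connection locally and return both ends."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # The client looks up the server's network address from its name
    family, kind, proto, _canonical, address = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_STREAM)[0]

    # connect() asks the system software to request the connection;
    # the server's system software agrees on the listener's behalf
    client = socket.socket(family, kind, proto)
    client.connect(address)

    # accept() attaches the server program to the new connection
    server_side, _peer = listener.accept()
    listener.close()
    return client, server_side
```

Once both ends exist, data written on either socket can be read from the other, which is what makes the connection two-way.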