1
Web Standards and the HyLiFe Project
(including authentication and distributed searching)
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Email Address: [email protected]
URL: http://www.ukoln.ac.uk/
UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.
2
UK Web Focus / W3C
UK Web Focus:
• JISC funded post based at UKOLN (Bath Univ)
• Advises UK HE community on web issues
• Represents JISC on W3C
W3C (World Wide Web Consortium):
• International consortium, with headquarters at MIT, INRIA and Keio University (Japan)
• Coordinates development of web protocols
• Four domains:
  – Architecture
  – Technology & Society
  – User Interface
  – Web Accessibility
3
What Are Your Interests?
What interests do you have in web standards and technologies?
4
Contents
• Introduction
• Web Standards Overview
• Web Standards:
  – Data Formats
  – Transport
  – Addressing
• Metadata
• Distributed Searching
• Authentication
• Deployment Issues
• Questions
Aims of Talk
• To give brief overview of web architecture
• To describe developments to web standards
• To review emerging developments with metadata, distributed searching and authentication
• To briefly address implementation models
5
Standardisation
W3C
• Produces W3C Recommendations on Web protocols
• Managed approach to developments
• Protocols initially developed by W3C members
• Decisions made by W3C, influenced by member and public review
IETF
• Produces Internet Drafts on Internet protocols
• Bottom-up approach to developments
• Protocols developed by interested individuals
• "Rough consensus and working code"
ISO
• Produces ISO Standards
• Can be slow moving and bureaucratic
• Produce robust standards
Proprietary
• De facto standards
• Often initially appealing (cf PowerPoint, PDF)
• May emerge as standards
PNG, HTML, Z39.50, Java?
PNG, HTML, HTTP
HTTP, URN, whois++
HTML extensions, PDF and Java?
6
The Web Vision
Tim Berners-Lee's (and W3C's) vision for the Web:
• Evolvability is critical
• Automation of information management: If a decision can be made by machine, it should
• All structured data formats should be based on XML
• Migrate HTML to XML
• All logical assertions to map onto RDF model
• All metadata to use RDF
See keynote talk at WWW 7 conference at <URL: http://www.w3.org/Talks/1998/0415-Evolvability/slide1-1.htm>
7
Web Protocols
Web initially based on three simple protocols:
• Data Formats: HTML (HyperText Markup Language) provides the data format for native documents
• Addressing: URLs (Uniform Resource Locators) provide an addressing mechanism for web resources
• Transport: HTTP (HyperText Transfer Protocol) defines transfer of resources between client and server
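How addressing and transport fit together can be shown with a short Python sketch. It is only an illustration under stated assumptions: the helper name `build_get_request` is hypothetical, and the request text is a minimal HTTP/1.0-style GET rather than a complete client.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.0 GET request for the given URL.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    parts = urlsplit(url)        # addressing: split the URL into its components
    path = parts.path or "/"     # an empty path means the server root
    # transport: request line, Host header, then a blank line ending the headers
    return f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"


request = build_get_request("http://www.ukoln.ac.uk/metadata/")
print(request)
```

The server would answer such a request with a resource in some data format, typically HTML, which completes the three-part picture.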
8
HTML History
HTML 1.0: Unpublished specification.
HTML 2.0: Spec. based on innovations from NCSA (forms and inline images!)
HTML 3.0: Proposed spec. (renamed from HTML+).
  – Very comprehensive
  – Failed to complete IETF standardisation
  – Little implementation experience
Proprietary: Introduction of proprietary HTML elements by Netscape and Microsoft
HTML 3.2: Spec. based on description of mainstream innovations in marketplace
HTML 4.0: Current recommendation
Dilemma: Proprietary extensions cause problems. But experiments are needed.
Years marked on the slide's timeline: 1992, 1994, 1995, 1997, 1998
9
HTML 4.0, CSS 2.0 and DOM
HTML 4.0 used in conjunction with CSS 2.0 (Cascading Style Sheets) and the DOM provides an architecturally pure, yet functionally rich environment
HTML 4.0 - W3C-Rec
• Improved forms
• Hooks for stylesheets
• Hooks for scripting languages
• Table enhancements
• Better printing
CSS 2.0 - W3C-Rec
• Support for all HTML formatting
• Positioning of HTML elements
• Multiple media support
CSS Problems
• Changes during CSS development
• Netscape & IE incompatibilities
• Continued use of browsers with known bugs
DOM - W3C-Rec
• Document Object Model
• Hooks for scripting languages
• Permits changes to HTML & CSS properties and content
10
HTML Limitations
HTML 4.0 / CSS 2.0 have limitations:
• Difficulties in introducing new elements
  – Time-consuming standardisation process (<ABBREV>)
  – Dictated by browser vendor (<BLINK>, <MARQUEE>)
• Area may be inappropriate for standardisation:
  – Covers specialist area (maths, music, ...)
  – Application-specific (<STUD-NUM>)
• HTML is a display (output) format
• HTML's lack of arbitrary structure limits functionality:
  – Find all memos copied to John Smith
  – How many unique tracks on Jackson Browne CDs
11
XML
XML:
• Extensible Markup Language
• A lightweight SGML designed for network use
• Addresses HTML's lack of evolvability
• Arbitrary elements can be defined (<STUDENT-NUMBER>, <PART-NO>, etc)
• Agreement achieved quickly - XML 1.0 became W3C Recommendation in Feb 1998
• Support from industry (SGML vendors, Microsoft, etc.)
• Support in Netscape 5 and IE 5
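The "find all memos copied to John Smith" query mentioned under HTML Limitations becomes straightforward once documents carry arbitrary structure. The Python sketch below is an illustration only: the `memo`, `to`, `cc` and `subject` element names and the sample data are hypothetical, not taken from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo vocabulary, invented for this illustration.
MEMOS = """
<memos>
  <memo><to>Jane Doe</to><cc>John Smith</cc><subject>Budget</subject></memo>
  <memo><to>John Smith</to><subject>Agenda</subject></memo>
  <memo><to>Ann Lee</to><cc>John Smith</cc><cc>Jane Doe</cc><subject>Minutes</subject></memo>
</memos>
"""


def memos_copied_to(xml_text, name):
    """Return the subjects of all memos whose cc list contains the given name."""
    root = ET.fromstring(xml_text)
    return [
        memo.findtext("subject")
        for memo in root.findall("memo")
        if any(cc.text == name for cc in memo.findall("cc"))
    ]


subjects = memos_copied_to(MEMOS, "John Smith")
print(subjects)
```

The same question cannot be answered reliably from HTML, where "to" and "cc" would only be visible as formatted text.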
12
XML Concepts
Well-formed XML resources:
• Make end-tags explicit: <LI>...</LI>
• Make empty elements explicit: <IMG .../>
• Quote attributes: <IMG SRC="logo" HEIGHT="20"/>
• Use consistent upper/lower case
Valid XML resources:
• Need DTD
XML Namespaces:
• Mechanism for ensuring unique XML elements:
<?xmlns:FOO="http://foo.org/1998-001" prefix="i">
<P>Insert <i:PART>M-471</i:PART></P>
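The well-formedness rules and the namespace idea can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical. Note one assumption: the namespace is declared with an `xmlns:i` attribute, the form used in the Namespaces in XML Recommendation, which differs from the draft-era declaration shown on the slide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML habits that XML rejects: unquoted attribute, missing end-tags.
html_style = "<P><IMG SRC=logo><LI>item</P>"
# The same content following the rules listed above.
xml_style = '<P><IMG SRC="logo" HEIGHT="20"/><LI>item</LI></P>'
# Inconsistent case: <LI> and </li> are different names in XML.
mixed_case = "<P><LI>item</li></P>"

# Namespace declared with an xmlns attribute (an assumption, see lead-in).
NS_DOC = '<P xmlns:i="http://foo.org/1998-001">Insert <i:PART>M-471</i:PART></P>'
part = ET.fromstring(NS_DOC).find("{http://foo.org/1998-001}PART")

print(is_well_formed(html_style), is_well_formed(xml_style), part.text)
```

The parser reports the element by its full namespace name rather than by the `i` prefix, which is what makes the element name globally unique.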
13
XML Deployment
Ariadne issue 15 has article on "What Is XML?"
Describes how XML support can be provided:
• Natively by new browsers
• Back-end conversion of XML to HTML
• Client-side conversion of XML to HTML / CSS
• Java rendering of XML
Examples of intermediaries
See http://www.ariadne.ac.uk/issue15/what-is/
14
XLink, XPointer and XSL
XLink will provide sophisticated hyperlinking missing in HTML:
• Links that lead user to multiple destinations
• Bidirectional links
• Links with special behaviors:
  – Expand-in-place / Replace / Create new window
  – Link on load / Link on user action
• Link databases
XPointer will provide access to arbitrary portions of XML resource
XSL stylesheet language will provide extensibility and transformation facilities (e.g. create a table of contents)
<commentary xml:link="extended" inline="false">
  <locator href="smith2.1" role="Essay"/>
  <locator href="jones1.4" role="Rebuttal"/>
  <locator href="robin3.2" role="Comparison"/>
</commentary>
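To make the "multiple destinations" idea concrete, the following Python sketch reads the extended link shown on the slide and lists its destinations by role. It is an illustration only: the function name `destinations` is hypothetical, and the code follows the draft-era `xml:link` markup used in the example rather than any later XLink syntax.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined and bound to this namespace by XML parsers.
XML_NS = "http://www.w3.org/XML/1998/namespace"

LINK = """
<commentary xml:link="extended" inline="false">
  <locator href="smith2.1" role="Essay"/>
  <locator href="jones1.4" role="Rebuttal"/>
  <locator href="robin3.2" role="Comparison"/>
</commentary>
"""


def destinations(xml_text):
    """Map each locator's role to its href for an extended link element."""
    root = ET.fromstring(xml_text)
    if root.get(f"{{{XML_NS}}}link") != "extended":
        return {}
    return {loc.get("role"): loc.get("href") for loc in root.findall("locator")}


targets = destinations(LINK)
print(targets)
```

A browser supporting such links could offer the user a choice between the Essay, Rebuttal and Comparison destinations from a single link.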
15
XML Update
Data / Schemas
• XML-Data: Submitted to W3C Jan 98 (Obsolete?)
• Document Content Description: Submitted Aug 98
• XSchema: Independent effort
Programming Interface
• DOM level 1: W3C Recommendation, May 98
Style & Presentation
• CSS level 2: W3C Recommendation, May 98
• Extensible Style Language: Working Draft, Aug 98
Relationship to Other Resources
• XLink, XPointer: Working Drafts, Mar 98
• XML Namespaces: Working Draft, Aug 98
Query Languages
• XML Query Language: Submitted to W3C Aug 98
• XQL: Independent effort
16
Addressing
URLs (e.g. http://www.bristol-poly.ac.uk/depts/music/) have limitations:
• Lack of long-term persistency
  – Organisation changes name
  – Department shut down or merged
  – Directory structure reorganised
• Inability to support multiple versions of resources (mirroring)
URNs (Uniform Resource Names):
• Proposed as solution
• Difficult to implement (no W3C activity in this area)
17
Addressing - Solutions
DOIs (Digital Object Identifiers):
• Proposed by publishing industry as a solution
• Aimed at supporting rights ownership
• Business model needed
PURLs (Persistent URLs):
• Provide single level of redirection
Pragmatic Solution:
• URLs don't break - people break them
• Design URLs to have long life-span
Further information:<URL: http://www.ukoln.ac.uk/metadata/resources/urn/>
<URL: http://hosted.ukoln.ac.uk/biblink/wp2/links.html>
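The "single level of redirection" that PURLs provide can be sketched as a lookup table that maps a stable name to the current location. This is a toy model under stated assumptions: the class `PurlResolver` and all example.org addresses are hypothetical and do not describe the real PURL service's software.

```python
class PurlResolver:
    """Toy resolver: one table lookup turns a persistent URL into a redirect."""

    def __init__(self):
        self._table = {}

    def register(self, purl, target):
        """Create or update the mapping; published links keep using the PURL."""
        self._table[purl] = target

    def resolve(self, purl):
        """Return (status, location), like an HTTP redirect; 404 if unknown."""
        target = self._table.get(purl)
        return (302, target) if target else (404, None)


resolver = PurlResolver()
resolver.register("http://purl.example.org/music", "http://old.example.org/depts/music/")
first = resolver.resolve("http://purl.example.org/music")

# The department moves: only the table entry changes, not the published PURL.
resolver.register("http://purl.example.org/music", "http://new.example.org/arts/music/")
second = resolver.resolve("http://purl.example.org/music")
print(first, second)
```

The design keeps persistence a matter of maintaining one table, which is also why it depends on someone continuing to maintain it.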
18
Transport
HTTP/0.9 and HTTP/1.0:
• Design flaws and implementation problems
HTTP/1.1:
• Addresses some of these problems
• 60% server support
• Performance benefits! (60% packet traffic reduction)
• Is acting as fire-fighter
• Not sufficiently flexible or extensible
HTTP/NG:
• Radical redesign using object-oriented technologies
• Undergoing trials
• Gradual transition (using proxies)
• Integration of application (distributed searching?)
19
Metadata
Metadata - the missing architectural component from the initial implementation of the web
[Diagram: Metadata (RDF; PICS, TCN, MCF, DSig, DC, ...) shown alongside Addressing (URL), Data format (HTML) and Transport (HTTP)]
Metadata Needs:
• Resource discovery
• Content filtering
• Authentication
• Improved navigation
• Multiple format support
• Rights management
20
Metadata Examples
DSig (Digital Signatures initiative):
• Key component for providing trust on the web
• DSig 2.0 will be based on RDF and will support signed assertions:
  – This page is from the University of Bath
  – This page is a legally-binding list of courses provided by the University
P3P (Platform for Privacy Preferences):
• Developing methods for exchanging Privacy Practices of Web sites and users
Note that discussions about additional rights management metadata are currently taking place
21
RDF
RDF (Resource Description Framework):• Highlight of WWW 7 conference
• Provides a metadata framework ("machine understandable metadata for the web")
• Based on ideas from content rating (PICS), resource discovery (Dublin Core) and site mapping (MCF)
• Applications include:
  – cataloging resources
  – resource discovery
  – electronic commerce
  – intelligent agents
  – digital signatures
  – content rating
  – intellectual property rights
  – privacy
• See <URL: http://www.w3.org/Talks/1998/0417-WWW7-RDF>
22
RDF Model
RDF:
• Based on a formal data model (directed labelled graphs)
• Syntax for interchange of data
• Schema model
[Figure: RDF Data Model. A Resource is linked to a Value by a PropertyType; the combination is a Property. Example: page.html has Cost £0.05. Labels in the second diagram: page.html, £0.05, 11-May-98, Property, Cost, InstanceOf, ValidUntil, Value, PropObj, PropName.]
Note names may change before release of W3C recommendations
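The data model can be sketched in a few lines of Python by treating each arc of the graph as a (resource, property type, value) triple. This is one possible reading of the slide's diagrams, not the RDF specification: the `Triple` type, the `describe_statement` helper and the node name `stmt1` are hypothetical, and the arc labels (PropObj, PropName, Value, InstanceOf, ValidUntil) are simply reused from the slide, which itself warns that names may change.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str
    property_type: str
    value: str


def describe_statement(node_id, triple):
    """Represent a statement as a node of its own, so it can carry properties."""
    return [
        Triple(node_id, "InstanceOf", "Property"),
        Triple(node_id, "PropObj", triple.resource),
        Triple(node_id, "PropName", triple.property_type),
        Triple(node_id, "Value", triple.value),
    ]


def values(graph, resource, property_type):
    """Return every value attached to a resource by the given property type."""
    return [t.value for t in graph
            if t.resource == resource and t.property_type == property_type]


cost = Triple("page.html", "Cost", "£0.05")
graph = [cost]
graph += describe_statement("stmt1", cost)
# A statement about the statement: the price is valid until a given date.
graph.append(Triple("stmt1", "ValidUntil", "11-May-98"))

print(values(graph, "page.html", "Cost"), values(graph, "stmt1", "ValidUntil"))
```

Turning the statement into a node is what lets the expiry date attach to the price assertion rather than to the page itself.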
23
RDF Example
Example of Dublin Core metadata in RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description rdf:HREF="page.html">
    <dc:Creator>John Smith</dc:Creator>
    <dc:Title>John’s Home Page</dc:Title>
  </rdf:Description>
</rdf:RDF>
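A program can recover the Dublin Core statements from such a description with an ordinary XML parser. The Python sketch below is an illustration under assumptions: it uses the working-draft namespace URIs from the example, writes the `HREF` attribute with the lowercase `rdf` prefix so that the document is namespace-well-formed, and the function name `dublin_core_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/dc/elements/1.0/"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:HREF="page.html">
    <dc:Creator>John Smith</dc:Creator>
    <dc:Title>John's Home Page</dc:Title>
  </rdf:Description>
</rdf:RDF>"""


def dublin_core_triples(xml_text):
    """Return (resource, element name, value) for each Dublin Core element."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}HREF")
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                name = child.tag[len(DC_NS) + 2:]   # strip the "{namespace}" part
                triples.append((subject, name, child.text))
    return triples


triples = dublin_core_triples(DOC)
print(triples)
```

Because each property name is qualified by its namespace, the program can pick out Dublin Core elements without knowing about any other vocabulary mixed into the same description.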
24
Browser Support for RDF
Mozilla (Netscape's source code release) provides support for RDF.
Mozilla supports site maps in RDF, as well as bookmarks and history lists
See Netscape's or HotWired home page for a link to the RDF file.
[Figure labels: Trusted 3rd Party Metadata; Embedded Metadata, e.g. sitemaps]
Image from http://purl.oclc.org/net/eric/talks/www7/devday/
25
RDF Conclusion
• RDF is a general-purpose framework
• RDF provides structured, machine-understandable metadata for the Web
• Metadata vocabularies can be developed without central coordination
• Role for eLib projects in defining schemas?
• RDF Schemas describe the meaning of each property name
• Signed RDF is the basis for trust
26
Distributed Searching
Distributed searching important for the DNER (Distributed National Electronic Resource)
ROADS prototype provides cross-searching using whois++
AHDS prototype provides cross-searching using Z39.50
http://prospero.ahds.ac.uk:8080/ahds_live/
27
Distributed Searching Issues
Providing access to resources by software rather than by humans raises several issues:
• Loss of visibility of service / value-added web services
• Possible performance problems
• Information overload
• Finding the service
Solutions:
• Giving visibility and pointers in results sets
• Service metadata:
  – Service only available for cross-searching by non AC.UK users outside peak hours
• Need for agreed metadata standards (profiles, rights issues, …)
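Two of the solutions above, pointers back to the originating service and machine-readable service metadata, can be sketched together in Python. Everything concrete here is an assumption made for illustration: the service names, the example URLs, the 09:00 to 17:00 peak period and the dictionary layout are hypothetical, and a real deployment would query remote servers rather than in-memory lists.

```python
from datetime import time

# Assumed peak period; the slide only says "outside peak hours".
PEAK_HOURS = (time(9, 0), time(17, 0))

# Hypothetical services with a policy flag as service metadata.
SERVICES = [
    {"name": "Gateway A", "url": "http://gateway-a.example/",
     "restricted_at_peak": True,
     "records": ["Music scores online", "History of printing"]},
    {"name": "Gateway B", "url": "http://gateway-b.example/",
     "restricted_at_peak": False,
     "records": ["Early music archive"]},
]


def may_search(service, user_domain, now):
    """Apply the policy: non AC.UK users may cross-search only outside peak hours."""
    if not service["restricted_at_peak"]:
        return True
    if user_domain.lower().endswith(".ac.uk"):
        return True
    start, end = PEAK_HOURS
    return not (start <= now < end)


def cross_search(term, user_domain, now, services=SERVICES):
    """Search every permitted service and label each hit with its source."""
    hits = []
    for service in services:
        if not may_search(service, user_domain, now):
            continue
        for title in service["records"]:
            if term.lower() in title.lower():
                hits.append({"title": title,
                             "source": service["name"],
                             "source_url": service["url"]})
    return hits


print(cross_search("music", "example.com", time(10, 0)))
```

Carrying `source` and `source_url` in every hit is one simple way to keep each contributing service visible in a merged results set.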
28
Collection Description Work
Collection Description Group:
• UKOLN involvement in producing list of attributes for collection level description (in the library, museum, archival sense), which includes databases of Internet resource descriptions such as SOSIG.
• Work of interest to clumps and hybrid libraries.
• WG membership: Dan Brickley (ROADS), Andy Powell (ROADS), Matthew Dovey (Music Online, MALIBU), Verity Brack (RIDING), Dennis Nicholson (BUBL/CAIRNS) and David Kay (FD)
• See <URL: http://www.ukoln.ac.uk/metadata/cld/>
• Collection Description eLib supporting study due out in Oct. Will define attribute set (cf Dublin Core)
29
Relevant Protocols
A number of formats and protocols could be used to implement distributed searching:
• Z39.50: ISO standard. Well-known in library world, but heavy-weight
• whois++: Lightweight IETF standard. Used in several ANR gateways, but not widely deployed
• LDAP: Lightweight version of X.500 directory service.
• HTTP/NG? Opportunity to develop new solution using object-oriented technologies based on above experiences?
30
Protocols & Collections
Which formats and protocols are relevant to collection descriptions for use by software developers?
• XML: Structured data formats should be based on XML - W3C
• RDF: All metadata applications should be based on RDF - W3C
• IETF WebDav: Requirements for distributed authoring include author metadata and collection definitions.
31
IETF WebDav
WebDav:• Web Distributed Authoring and Versioning
• An IETF Application Area
• Relevant proposals:
  – "WebDAV Advanced Collections Protocol"
  – "Requirements for Advanced Collection Functionality in WebDAV"
  – "Requirements for DAV Searching and Locating"
• See <URL: http://www.ietf.org/html.charters/webdav-charter.html> and <URL: http://www.ietf.org/ids.by.wg/webdav.html>
32
How Metadata Could Be Used
Database Description
• Music resources, including ...
Policy (Terms & Conditions / Resource and Service)
• For licensing reasons, access is restricted to authorised HEIs
• For performance reasons, access restricted to UK HEI between 9.00-17.00
• The service logo must be included in results set, unless results only come from service
• Permission for cross-searching restricted to other eLib projects
• You're only allowed to link to the main entry point
Individual
• Give me HTML or PDF resources, not Word, …
• I'm blind. Include ACSS in results and deliver a sitemap
Client Software
• My browser doesn't support XML, so send me HTML
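The individual and client-software preferences above amount to choosing, from the formats a service holds, the first one the requester will accept. A minimal Python sketch follows; the function name `choose_format` and the lowercase format labels are hypothetical, and real metadata would use an agreed vocabulary.

```python
def choose_format(available, accepted):
    """Return the first accepted format the service can supply, else None.

    `accepted` is ordered by preference, most preferred first.
    """
    for fmt in accepted:
        if fmt in available:
            return fmt
    return None


# "Give me HTML or PDF resources, not Word"
individual = choose_format(available=["word", "pdf", "html"], accepted=["html", "pdf"])

# "My browser doesn't support XML, so send me HTML"
client = choose_format(available=["xml", "html"], accepted=["html"])

print(individual, client)
```

When nothing matches, the function returns `None`, which is the point at which an intermediary offering format conversion could step in.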
33
Deployment Models
Today integration with cross-searching services uses technologies such as CGI on top of HTTP.
It is difficult to provide rich functionality, due to the simplicity of HTML and HTTP.
HTTP/NG may provide closer integration between applications and the web.
NOTE need for open authentication system (public key infrastructure / DSig?)
[Diagram: Distributed searching with loose integration. Labels: Web server, Web Server, Z39.50 server, Explain database, whois++ server, Centroids, and three RDF definitions.]
34
What's Needed?
In order to deploy distributed cross-searching in an open, application-independent way we need:
• Metadata in a machine-readable format - RDF
• Syntax for describing the metadata - see RDF pages at <URL: http://www.w3.org/RDF/>
• Language for processing metadata - see XML-QL, A Query Language for XML at <URL: http://www.w3.org/Submission/1998/12/>
• An open authentication infrastructure
Issues:
• Timescales
• Costs
• Software support
• Protocol support
• Short-term pragmatic solution vs long-term purer solution
35
Authentication
Deployment of an open, scaleable, flexible authentication system is difficult & expensive
Current solutions include:
• Server-based username and password schemes
• IP-based schemes
• Athens - Based on replicated Sybase application See <URL: http://www.athens.ac.uk/>
• W3C DSig work - Digital Signatures Initiative. See <URL: http://www.w3.org/DSig/>
• Other Public Key developments - e.g. reports of Post Office involvement, statements from Tony Blair, EU, ...
"In May 1998 the Commission published its proposal for a "European Parliament and Council Directive on a Common Framework for Electronic Signatures" (COM(1998)297)."
36
Certificates
Should we be looking into using commercially-supported digital ids, such as Verisign's?
• Can purchase server ID for $349
• End user certificates available
http://www.verisign.com/
37
Browser Support
Browsers such as IE provide support for certificates:
Use certificates to positively identify yourself, certificate authorities and publishers
Trust sites, people and publishers with credentials issued by the following Certifying Authorities
You have designated the following software publishers and credential agencies as trustworthy. Windows software can install software .. certified by these publishers without asking you first
38
Using Digital Keys
Diagram taken from a VeriSign White Paper
• Client initiates a connection (Client hello).
• Server responds, sending the client its digital ID (Server Digital ID). The server might also request the client's digital ID for client authentication.
• The client verifies the server's digital ID. If requested, the client sends its digital ID in response to the server's request (Client Digital ID).
• When authentication is complete, the client sends the server a session key encrypted using the server's public key (Session Key).
• Once a session key is established, secure communications commence between client & server.
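The session-key step of this exchange can be illustrated with textbook RSA arithmetic in Python. This is a toy sketch and an assumption throughout: the key pair uses tiny classroom primes and offers no security, the names `server.example` and `handshake` are hypothetical, and the certificate check is reduced to comparing a name instead of validating a chain to a certifying authority.

```python
import secrets

# Toy RSA key pair (textbook numbers, far too small for real use).
P, Q = 61, 53
N = P * Q        # public modulus: 3233
E = 17           # public exponent
D = 2753         # private exponent: E * D = 1 mod (P - 1) * (Q - 1)


def handshake(expected_subject="server.example"):
    """Walk through the steps and return (client's key, server's recovered key)."""
    # Server responds to the client hello with its digital ID.
    server_id = {"subject": "server.example", "public_key": (E, N)}

    # Client verifies the digital ID (stand-in check: subject name only).
    if server_id["subject"] != expected_subject:
        raise ValueError("server digital ID not trusted")

    # Client picks a session key and encrypts it with the server's public key.
    e, n = server_id["public_key"]
    session_key = secrets.randbelow(n - 2) + 2     # integer in 2 .. n-1
    encrypted = pow(session_key, e, n)

    # Server decrypts with its private key; both sides now share the key.
    recovered = pow(encrypted, D, N)
    return session_key, recovered


client_key, server_key = handshake()
print(client_key == server_key)
```

Only the holder of the private exponent can recover the session key, which is why a verified server ID is enough to start a confidential session.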
39
Authentication & EU
A search for "digital signature" at <URL: http://www.open.gov.uk/> provided interesting hits:
• DTI Briefing Paper on "Encryption and Digital Signatures" at <URL: http://www.dti.gov.uk/eurobrief/3encrypt.htm>
• European Internet Forum Policy Papers at <URL: http://www.ispo.cec.be/eif/#digital>
• "Towards A European Framework for Digital Signatures And Encryption" at <URL: http://www.ispo.cec.be/eif/policy/97503toc.html>
Will we see development of an open authentication infrastructure funded through Fifth Framework?
• See http://www.cordis.lu/fifth/src/comm.htm
40
Further Information
Further Reading:
• Microsoft Security Advisor at <URL: http://microsoft.com/security/>
• JISC Reports at <URL: http://www.jisc.ac.uk/pub/index.html#issues>
• WWW Security FAQ at <URL: http://www.w3.org/Security/Faq/>
41
Deployment Issues
More sophisticated deployment techniques can be adopted to overcome deficiencies in simple model
Original Model (HTML resource, Web server, browser):
• Web server simply sends file to client
• File contains redundant information (for old browsers) plus client interrogation support
Sophisticated Model (HTML / XML / database resource, Intelligent Web server, Server proxy, Client proxy, browser):
• Intermediaries can provide functionality not available at client:
  – DOI support
  – XML support
  – Format conversion
Example of an intermediary
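The format-conversion role of an intermediary can be sketched as a function that passes XML through to capable clients and converts it to HTML for the rest. This is an illustration only: the function names, the flat record layout and the `<ul>` rendering are assumptions, and a real proxy would work from HTTP request headers and a proper stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape


def xml_to_html(xml_text):
    """Render each child element of a flat XML record as an HTML list item."""
    root = ET.fromstring(xml_text)
    items = "".join(
        f"<li><b>{escape(child.tag)}</b>: {escape(child.text or '')}</li>"
        for child in root
    )
    return f"<ul>{items}</ul>"


def intermediary(xml_text, client_supports_xml):
    """Return (content type, body) suited to what the client can display."""
    if client_supports_xml:
        return ("text/xml", xml_text)
    return ("text/html", xml_to_html(xml_text))


record = "<part><PART-NO>M-471</PART-NO><NAME>Widget</NAME></part>"
print(intermediary(record, client_supports_xml=False))
```

Placing the conversion in a proxy means that neither the origin server nor older browsers need to change when a new format is introduced.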
42
Conclusions
To conclude:
• Standards are important, especially for national initiatives, such as eLib
• Proprietary solutions are often tempting because:
  – They are available
  – They are often well-marketed and well-supported
  – They may become standardised
  – Solutions based on standards may not be properly supported by applications
• Metadata is an important new protocol area
• Metadata work to support distributed searching is beginning
• Intermediaries may have a role to play in deploying standards-based solutions
43
Question Time
Any questions?