A centre of expertise in digital information management
Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents
Background
Dublin Core is the name given to a standard set of core metadata elements used for resource discovery. Metadata has an important role to play in many digital library applications, and the Dublin Core standard has been widely adopted in this area.
The Problem
Lack of compliance with standards is a well-known problem in Web applications, particularly with HTML. Despite the availability of a range of HTML validation tools, these do not appear to be widely used, and many Web authors appear to check their documents simply by viewing them in Web browsers. There is a danger that Dublin Core metadata embedded in HTML will fail to comply with standards, a possibility made more likely by the fact that browsers do not display Dublin Core metadata.
A Simple Approach To Validation
Use of DC-dot
DC-dot is a popular Web-based tool for creating and managing Dublin Core metadata. DC-dot can also be used to carry out simple validation of Dublin Core embedded in HTML resources.
Findings
Use of DC-dot across a digital library programme showed that the entry points contained various errors in the representation of Dublin Core:
• Use of DC.Author rather than DC.Creator
• Incorrect format of date field
• Incorrect use of delimiters
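The first two kinds of error listed above could be caught by a check along the following lines. This is a minimal Python sketch, not a description of how DC-dot works. The names `DCMetaParser`, `check_dc` and `SAMPLE` are hypothetical, the sample page is invented for illustration, and the date rule assumes a date-only subset of W3CDTF (YYYY, YYYY-MM or YYYY-MM-DD). Delimiter errors are not covered, since the poster does not say which delimiters were misused.

```python
import re
from html.parser import HTMLParser

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumption: only date-only W3CDTF forms are accepted in this sketch.
W3CDTF_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


class DCMetaParser(HTMLParser):
    """Collects (name, content) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.upper().startswith("DC."):
            self.pairs.append((name, attr.get("content") or ""))


def check_dc(html):
    """Return a list of human-readable problems found in the DC meta tags."""
    parser = DCMetaParser()
    parser.feed(html)
    problems = []
    for name, content in parser.pairs:
        # "DC.Date.modified" -> "date"; "DC.Author" -> "author"
        element = name.split(".")[1].lower()
        if element not in DC_ELEMENTS:
            problems.append(f"{name}: not one of the 15 core elements")
        elif element == "date" and not W3CDTF_DATE.match(content):
            problems.append(f"{name}: date {content!r} is not in W3CDTF form")
    return problems


# Invented sample page containing two of the errors described above.
SAMPLE = """
<html><head>
<meta name="DC.Title" content="Example project home page">
<meta name="DC.Author" content="A. N. Other">
<meta name="DC.Date" content="12/03/2004">
</head><body></body></html>
"""

for problem in check_dc(SAMPLE):
    print(problem)
```

Run against the sample page, the sketch reports the DC.Author element and the malformed date, and passes DC.Title.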
Limitations of DC-dot
DC-dot has several limitations:
• It only performs basic validation
• It was not designed primarily as a validation tool
• It cannot be easily extended (e.g. applied with other application profiles)
Using An RDF Validator
Use of an RDF Validator
An alternative approach tested was to use W3C's online Dublin Core to RDF XSLT transformation service together with the RDF validator. This approach chained several online services together:
• Tidy, to convert the project home page to XHTML format
• The Dublin Core to RDF XSLT transformation service, to convert embedded Dublin Core elements to RDF format
• The RDF validation service, to validate the RDF format
Comments
This approach helped by providing a visual display of the Dublin Core metadata. It was noticed, for example, that one page contained an invalid identifier: http:/www.foo.ac.uk/... rather than http://www.foo.ac.uk/... However, since the RDF validation service has no understanding of the semantics of the Dublin Core metadata, this approach has its limitations.
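An identifier error of this kind can also be detected mechanically. The following Python sketch is one possible check, not part of the W3C services described here. The function name `looks_like_http_url` is hypothetical, and the sketch assumes that identifiers are expected to be HTTP URLs.

```python
from urllib.parse import urlsplit


def looks_like_http_url(identifier):
    """True if the identifier has an http(s) scheme and a host part."""
    parts = urlsplit(identifier)
    # "http:/www.foo.ac.uk/..." parses with an empty host (netloc),
    # because the "//" that introduces the host is missing.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in ("http:/www.foo.ac.uk/...", "http://www.foo.ac.uk/..."):
    print(candidate, looks_like_http_url(candidate))
```

The check is syntactic only: it does not confirm that the URL resolves or that it identifies the right resource.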
dcmeta: An XSLT Approach
Use of XSLT
We have pioneered the use of XSLT to validate Dublin Core metadata embedded in HTML resources.
The XSLT approach:
• Creates a report on DC metadata embedded in an XHTML document
• Is designed with knowledge of the Dublin Core semantics, checking the metadata against an application profile of the DC Metadata Element Set
The profile is a set of rules which specify:
• Permitted DC properties (e.g. only the 15 core DC elements are allowed)
• Minimum/maximum permitted occurrences of a specified property (e.g. only one occurrence of DC.Title permitted)
• Permitted encoding schemes (e.g. DC.Subject properties should have the scheme "LCSH")
• Permitted values (e.g. DC.Publisher must have the value "UKOLN")
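The four kinds of rule can be illustrated with a small profile check. This is a Python sketch of the idea only: the dcmeta tool itself is an XSLT stylesheet, and its actual profile format is not shown in this poster. The `PROFILE` structure, the `validate` function and the sample statements are all hypothetical; only the three example rules are taken from the list above.

```python
from collections import Counter

CORE_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Hypothetical profile: any property not listed here is not permitted.
PROFILE = {name: {} for name in CORE_ELEMENTS}
PROFILE["title"] = {"max": 1}                 # only one DC.Title
PROFILE["subject"] = {"schemes": {"LCSH"}}    # scheme must be LCSH
PROFILE["publisher"] = {"values": {"UKOLN"}}  # value must be UKOLN


def validate(statements, profile=PROFILE):
    """Check (property, scheme, value) tuples against the profile.

    `scheme` is None when the meta tag carries no scheme attribute.
    Returns a list of human-readable problems.
    """
    problems = []
    counts = Counter(prop for prop, _, _ in statements)

    for prop, scheme, value in statements:
        rule = profile.get(prop)
        if rule is None:
            problems.append(f"DC.{prop}: property not permitted")
            continue
        if "schemes" in rule and scheme not in rule["schemes"]:
            problems.append(f"DC.{prop}: scheme {scheme!r} not permitted")
        if "values" in rule and value not in rule["values"]:
            problems.append(f"DC.{prop}: value {value!r} not permitted")

    # Occurrence limits are checked once per property, not per statement.
    for prop, rule in profile.items():
        n = counts.get(prop, 0)
        if n < rule.get("min", 0):
            problems.append(f"DC.{prop}: {n} occurrence(s), too few")
        if "max" in rule and n > rule["max"]:
            problems.append(f"DC.{prop}: {n} occurrence(s), too many")
    return problems


# Invented statements that break each kind of rule once.
statements = [
    ("title", None, "Project home page"),
    ("title", None, "Another title"),
    ("author", None, "A. N. Other"),
    ("subject", "MeSH", "Metadata"),
    ("publisher", None, "Example Press"),
]
for problem in validate(statements):
    print(problem)
```

Keeping the rules in a data structure separate from the checking logic is what would allow a different application profile to be swapped in without changing the checker.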
Conclusions
Summary
This poster summarises a number of approaches to validating Dublin Core metadata embedded in HTML resources. The poster also describes initial work in the development of an XSLT-based tool for validation.
Future Work
The XSLT stylesheet is available as open source, and we invite interested parties to develop this work further. Areas in which the tool could be developed include:
• Development of the Web interface to the tool
• Allowing local rules to be included
• Deploying the tool as a bookmarklet
• Deploying the tool as a "Web Service"
Implementation
The service is available at <http://www.ukoln.ac.uk/metadata/dcmeta/>
Contact Details
For further information please contact Pete Johnston, UKOLN by sending email to <[email protected]>