A centre of expertise in digital information management

Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents

Background

Dublin Core is the name given to a standard set of core metadata elements used for resource discovery. Metadata has an important role to play in many digital library applications, and the Dublin Core standard has been widely adopted in them.

The Problem

Lack of compliance with standards is well known in Web applications, particularly with HTML. Despite the availability of a range of HTML validation tools, these do not appear to be widely used, and many Web authors appear to check their documents simply by viewing them in Web browsers. There is a danger that Dublin Core metadata embedded in HTML will fail to comply with standards, a possibility made more likely by the lack of a visual display of Dublin Core metadata.

A Simple Approach To Validation

Use of DC-dot

DC-dot is a popular Web-based tool for creating and managing Dublin Core metadata. DC-dot can also be used to carry out simple validation of Dublin Core embedded in HTML resources.

Findings

Use of DC-dot across a digital library programme showed that the entry points contained various errors in the representation of Dublin Core:

• Use of DC.Author rather than DC.Creator

• Incorrect format of date field

• Incorrect use of delimiters
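The three kinds of error listed above can be illustrated with a minimal Python sketch. It is not how DC-dot works; it only shows the sort of check involved. Everything in it is an assumption made for illustration: the function and class names, the sample page, the reading of "incorrect delimiters" as something other than "." after the DC prefix, and the choice of W3CDTF-style dates (YYYY, YYYY-MM or YYYY-MM-DD) as the expected date format.

```python
import re
from html.parser import HTMLParser

# Assumption: dates are expected in a W3CDTF-style form (time-of-day forms omitted).
W3CDTF = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
# Meta names that look like Dublin Core: "DC" followed by any delimiter character.
DC_NAME = re.compile(r"^dc[^a-z0-9]", re.IGNORECASE)


class DCMetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta> tags that look like Dublin Core."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if DC_NAME.match(name):
            self.pairs.append((name, attributes.get("content") or ""))


def check_dc(html):
    """Return a list of human-readable problems found in embedded DC metadata."""
    collector = DCMetaCollector()
    collector.feed(html)
    problems = []
    for name, content in collector.pairs:
        if name[2] != ".":
            problems.append(f"{name}: expected '.' as the delimiter after the DC prefix")
        elif name.lower() == "dc.author":
            problems.append(f"{name}: use DC.Creator instead")
        elif name.lower() == "dc.date" and not W3CDTF.match(content):
            problems.append(f"{name}: date '{content}' is not in YYYY-MM-DD form")
    return problems


# Hypothetical page fragment containing one error of each kind.
SAMPLE = """<html><head>
<meta name="DC.Author" content="A. N. Other">
<meta name="DC.Date" content="12/03/2004">
<meta name="DC:Title" content="Project home page">
<meta name="DC.Title" content="Project home page">
</head><body></body></html>"""

for problem in check_dc(SAMPLE):
    print(problem)
```

Running the sketch on the sample fragment reports one problem for each of the first three meta tags and none for the last.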

Limitations of DC-dot

DC-dot has several limitations:

• It only performs basic validation

• It was not designed primarily as a validation tool

• It cannot be easily extended (e.g. applied with other application profiles)

Using An RDF Validator

Use of An RDF Validator

An alternative approach tested was to use W3C's online Dublin Core to RDF XSLT transformation service together with the RDF validator. This approach chained together several online services:

• Tidy, to convert the project home page to XHTML format

• The Dublin Core to RDF XSLT transformation service, to convert the embedded Dublin Core elements to RDF format

• The RDF validation service, to validate the RDF format

Comments

This approach helped by providing a visual display of the Dublin Core metadata. It was noticed, for example, that one page contained an invalid identifier: http:/www.foo.ac.uk/... rather than http://www.foo.ac.uk/... However, since the RDF validation service has no understanding of the semantics of the Dublin Core metadata, this approach has its limitations.
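The identifier error mentioned above was spotted by eye, but a check of this kind can also be automated. The following Python sketch is one possible way to do so, using the standard library's urllib.parse; the function name and the decision to examine only http and https identifiers are assumptions made for illustration and are not part of the services described here.

```python
from urllib.parse import urlsplit


def identifier_problem(value):
    """Return a description of the problem with an http(s) identifier, or None."""
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        # Other identifier types (ISBNs, URNs and so on) are out of scope for this sketch.
        return None
    if not parts.netloc:
        # "http:/host/path" parses with an empty host, because only "//" introduces one.
        return f"'{value}' has no host; is a '/' missing after '{parts.scheme}:'?"
    return None


# The two forms quoted in the text, used exactly as written there.
print(identifier_problem("http:/www.foo.ac.uk/..."))
print(identifier_problem("http://www.foo.ac.uk/..."))
```

The first call returns a message about the missing host and the second returns None.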

dcmeta: An XSLT Approach

Use of XSLT

We have pioneered use of XSLT to provide validation of Dublin Core metadata embedded in HTML resources.

The XSLT approach:

• Creates a report on DC metadata embedded in an XHTML document

• Is designed with knowledge of the Dublin Core semantics, checking the metadata against an application profile of the DC Metadata Element Set

The profile is a set of rules which specify:

• Permitted DC properties (e.g. only the 15 core DC elements are allowed)

• Minimum/maximum permitted occurrences of a specified property (e.g. only one occurrence of DC.Title permitted)

• Permitted encoding schemes (e.g. DC.Subject properties should have the scheme "LCSH")

• Permitted values (e.g. DC.Publisher must have the value "UKOLN")
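To make the four kinds of rule concrete, here is a minimal sketch of a profile and a checker. It is written in Python rather than XSLT and is not the dcmeta stylesheet; the data structure, the names PROFILE and validate, the case-insensitive matching of property names, and the minimum of one DC.Title are all assumptions made for illustration. The rule values themselves are taken from the examples above.

```python
from collections import Counter

# The 15 elements of the DC Metadata Element Set.
CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical profile; keys are lower-cased property names.
PROFILE = {
    "permitted": {"dc." + element for element in CORE_ELEMENTS},
    "occurs": {"dc.title": (1, 1)},        # (min, max); the minimum of 1 is an assumption
    "schemes": {"dc.subject": {"LCSH"}},
    "values": {"dc.publisher": {"UKOLN"}},
}


def validate(statements, profile=PROFILE):
    """Check (name, scheme, content) tuples against the profile; return a report."""
    report = []
    counts = Counter(name.lower() for name, _, _ in statements)
    for name, scheme, content in statements:
        key = name.lower()
        if key not in profile["permitted"]:
            report.append(f"{name}: property not permitted by the profile")
            continue
        allowed_schemes = profile["schemes"].get(key)
        if allowed_schemes and scheme not in allowed_schemes:
            report.append(f"{name}: scheme {scheme!r} not in {sorted(allowed_schemes)}")
        allowed_values = profile["values"].get(key)
        if allowed_values and content not in allowed_values:
            report.append(f"{name}: value {content!r} not in {sorted(allowed_values)}")
    for key, (minimum, maximum) in profile["occurs"].items():
        found = counts[key]
        if not minimum <= found <= maximum:
            report.append(f"{key}: found {found}, expected {minimum} to {maximum}")
    return report


# Hypothetical metadata that breaks each kind of rule once.
SAMPLE = [
    ("DC.Title", None, "Home"),
    ("DC.Title", None, "Home again"),
    ("DC.Subject", "DDC", "025.3"),
    ("DC.Publisher", None, "Example Press"),
    ("DC.Author", None, "A. N. Other"),
]

for line in validate(SAMPLE):
    print(line)
```

Keeping the rules in a data structure separate from the checking logic is one way a profile could be swapped for a local one without changing the checker.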

Conclusions

Summary

This poster summarises a number of approaches to validating Dublin Core metadata embedded in HTML resources. The poster also describes initial work in the development of an XSLT-based tool for validation.

Future Work

The XSLT stylesheet is available as open source, and we invite interested parties to develop this work further. Areas in which the tool could be developed include:

• Development of the Web interface to the tool

• Allowing local rules to be included

• Deploying the tool as a bookmarklet

• Deploying the tool as a "Web Service"

Implementation

The service is available at <http://www.ukoln.ac.uk/metadata/dcmeta/>

Contact Details

For further information please contact Pete Johnston, UKOLN, by sending email to <[email protected]>