Archives hub ead 2010_extended

Extended version of Archives Hub presentation

Introduction to EAD (extended version)

Lisa Jeskins and Bethan Ruddock, Archives Hub, Mimas

By the end of today’s session we will have given you an introduction to:

• what interoperability means
• what XML is, what it does and why it is important
• EAD structure and syntax
• EAD and hierarchies
• UK Archives Discovery Network (UKAD)

Objectives

Interoperability

the ability of two or more systems or components to exchange information and to use the information that has been exchanged

(IEEE Standard Computer Dictionary)

What is Interoperability?

the ability to exchange/share data

integration of information resources presented in different formats

within a domain or across domains

advantages of cross-searching

XML facilitates interoperability

About Interoperability

Data exchange standards such as:

◦Z39.50

◦SRU

Types of interoperability

user can easily search across and retrieve resources from a wealth of systems

moving beyond individual websites for individual resources (silo approach)

End result…

http://www.ukoln.ac.uk/interop-focus/

◦to explore, publicise and mobilise the benefits and practice of effective interoperability across diverse information sectors

Interoperability Focus

An Introduction to XML

Extensible Markup Language

XML is a grammatical system for creating languages:
◦ a meta-language

Use XML to design your own markup language, consisting of meaningful tags that describe the data they contain

Create a language for describing…anything

What is XML?

XML does not do anything itself. It is pure information wrapped in XML tags

You must use other means to send, receive or display the data

Something to remember about XML

[Diagram: XML is used by XML technologies to create a detailed description to view in a browser, a summary entry to view in a browser, and a PDF for print]

XML is not about content, though there might be certain restrictions on content

XML is essentially about structure

Creating a consistent structure via XML tagging enables content to be easily identified (by machines) and used flexibly

XML provides structure

XML: elements

<title> Alice in Wonderland </title>

*XML allows you to define your tags*

<book>Alice in Wonderland</book>

<filmtitle>Alice in Wonderland</filmtitle>

<tag> content </tag>

Attributes are simple name/value pairs associated with an element

<tag attribute_name="attribute_value">content</tag>

<language>English</language>

<language langcode="eng">English</language>

<date normal="2004">20 Sept 2004</date>

XML attributes
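As a minimal sketch of how a program sees an element and its attribute, the date example can be read with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# One element from the slides: a tag, an attribute and some content.
date = ET.fromstring('<date normal="2004">20 Sept 2004</date>')

tag_name = date.tag           # the element name
content = date.text           # the text between the opening and closing tags
normal = date.get("normal")   # the value of the "normal" attribute

# An element without the attribute simply returns None for it.
language = ET.fromstring("<language>English</language>")
langcode = language.get("langcode")

print(tag_name, "|", content, "|", normal, "|", langcode)
```

The human-readable date stays in the content, while the attribute carries a normalised value that a machine can sort or search on.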

XML Syntax

<tag attribute_name="attribute_value">content</tag>

<tree>hornbeam</tree>

<tree type="deciduous">hornbeam</tree>

<date normal="2004">20 May 2004</date>

<date>20 May 2004</date>

This is an XML element

<trees>
  <tree type="deciduous">
    <species>oak</species>
    <fruit>acorn</fruit>
  </tree>
  <tree type="coniferous">
    <species>pine</species>
    <fruit>pine cone</fruit>
  </tree>
</trees>

Nested elements

<catalog>
  <cd>
    <title>OK Computer</title>
    <artist type="band">Radiohead</artist>
    <genre>pop</genre>
    <year>1997</year>
  </cd>
  <cd>
    <title>Stanley Road</title>
    <artist type="solo">Paul Weller</artist>
    <genre>pop</genre>
    <year>1995</year>
  </cd>
</catalog>

XML example
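Because the tags say what each piece of content is, a program can pick out exactly what it needs. The sketch below assumes the catalog example with plain (straight) quotation marks and uses Python's standard library; the function name `titles_by_artist_type` is made up for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd>
    <title>OK Computer</title>
    <artist type="band">Radiohead</artist>
    <genre>pop</genre>
    <year>1997</year>
  </cd>
  <cd>
    <title>Stanley Road</title>
    <artist type="solo">Paul Weller</artist>
    <genre>pop</genre>
    <year>1995</year>
  </cd>
</catalog>
"""

def titles_by_artist_type(xml_text, artist_type):
    """Return the titles of all CDs whose <artist> has the given type."""
    root = ET.fromstring(xml_text)
    titles = []
    for cd in root.findall("cd"):
        artist = cd.find("artist")
        if artist is not None and artist.get("type") == artist_type:
            titles.append(cd.findtext("title"))
    return titles

print(titles_by_artist_type(CATALOG, "solo"))
```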

<title>Stanley Road</title><artist>Paul Weller</artist><type>solo</type><genre>pop</genre><year>1995</year>

Alice in Wonderland
Lewis Carroll
1 volume
hardback

Content

Title Alice in Wonderland

Author Lewis Carroll

Extent 1 volume

Format hardback

Content in a database

<books>
  <title>Alice in Wonderland</title>
  <author>Lewis Carroll</author>
  <extent>1 volume</extent>
  <format>hardback</format>
</books>

XML: Structure

a root element is required

<catalog>
  …all your tags and content…
</catalog>

closing tags are required

case matters

XML must be well-formed

elements must be properly nested

Correct: <physdesc><extent>10 boxes</extent></physdesc>

Incorrect: <physdesc><extent>10 boxes</physdesc></extent>

XML must be well-formed (2)

attribute values must be enclosed in quotation marks, e.g. langcode="fre"

element names must obey some basic rules
◦ e.g. cannot start with numbers or punctuation characters, cannot contain spaces
◦ e.g. <cd name> or <?name> would be incorrect

XML must be well-formed (3)
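The well-formedness rules can be tested mechanically, because any XML parser rejects a document that breaks them. This is a minimal sketch using Python's standard library; the helper name `is_well_formed` is an assumption for illustration, and it checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

checks = {
    "properly nested": "<physdesc><extent>10 boxes</extent></physdesc>",
    "badly nested": "<physdesc><extent>10 boxes</physdesc></extent>",
    "missing closing tag": "<title>Alice in Wonderland",
    "case mismatch": "<Title>Alice in Wonderland</title>",
    "unquoted attribute": "<language langcode=fre>French</language>",
    "space in element name": "<cd name>OK Computer</cd name>",
    "two root elements": "<cd></cd><cd></cd>",
}

for label, text in checks.items():
    print(label, "->", is_well_formed(text))
```

Only the first example passes; each of the others breaks one of the rules listed on the slides.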

Marking up a recipe

Look at the following recipe for Chocolate Brownies – How would you use XML to mark this up?

(I’m reliably informed the recipe works!)

375g butter
375g dark chocolate
1 tablespoon vanilla extract
6 eggs
500g sugar
225g plain flour

Preheat the oven to 180°C, 350°F or gas mark 4. Grease a swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm.

Whisk the eggs and sugar into the mixture. Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for 20 to 30 minutes until the brownie is cooked around the edges, but still soft in the middle.

Cool and cut into squares. Makes 48 brownies

Chocolate Brownies

<recipe>
  <title>Chocolate Brownies</title>
  <ingredients>
    <item>375g butter</item>
    <item>375g dark chocolate</item>
    <item>1 tablespoon vanilla extract</item>
    <item>6 eggs</item>
    <item>500g sugar</item>
    <item>225g plain flour</item>
  </ingredients>
  <method>
    <p>Preheat the oven to <temp>180°C, 350°F or gas mark 4</temp>. Grease a swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm. Whisk the eggs and sugar into the mixture.</p>
    <p>Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for <bakingtime>20 to 30 minutes</bakingtime> until the brownie is cooked around the edges, but still soft in the middle.</p>
    <p>Cool and cut into squares.</p>
  </method>
  <serving>Makes 48 brownies</serving>
</recipe>

Possible XML markup for recipe

<ingredient>375 g butter</ingredient>

Or

<ingredient><item>375 g butter</item>

</ingredient>

Or

<ingredient><type>butter</type><quantity>375 g</quantity>

</ingredient>

Exchanging recipes..?

http://www.archiveshub.ac.uk/temp/recipe.xml

Displaying the recipe online

Valid XML: rules specify elements and attributes used and how used

Valid XML provides consistency and facilitates the exchange of data

Valid XML is important for displaying, processing and exchanging XML in a wider environment

Valid XML

A Document Type Definition or Schema defines the building blocks of an XML document

It specifies elements and attributes and defines how they can be used

People can agree to use a common DTD/Schema for interchanging data

Document Type Definitions

<?xml version="1.0" encoding="UTF-16"?>
<!ELEMENT recipe (title, intro?, ingredients+, method, serving*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT intro (#PCDATA)>
<!ELEMENT ingredients (item+)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT method (p+)>
<!ELEMENT p (#PCDATA | temp | bakingtime)*>
<!ELEMENT temp (#PCDATA)>
<!ELEMENT bakingtime (#PCDATA)>
<!ELEMENT serving (#PCDATA)>

Recipe DTD
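One way to read a DTD content model is as a pattern over the names of an element's children: `,` means "followed by", `?` optional, `+` one or more, `*` zero or more. The sketch below hand-translates only the `recipe` rule into a regular expression. It is not a DTD validator (Python's standard library does not include one), and the function name is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# (title, intro?, ingredients+, method, serving*) rewritten as a regular
# expression over child element names, each followed by a space.
RECIPE_MODEL = re.compile(r"(title )(intro )?(ingredients )+(method )(serving )*")

def recipe_children_match(xml_text):
    """Check only the order and number of <recipe>'s direct children."""
    root = ET.fromstring(xml_text)
    if root.tag != "recipe":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return RECIPE_MODEL.fullmatch(sequence) is not None

good = ("<recipe><title>Chocolate Brownies</title>"
        "<ingredients><item>6 eggs</item></ingredients>"
        "<method><p>Cool and cut into squares.</p></method>"
        "<serving>Makes 48 brownies</serving></recipe>")
no_title = "<recipe><ingredients><item>6 eggs</item></ingredients><method><p>Mix.</p></method></recipe>"

print(recipe_children_match(good), recipe_children_match(no_title))
```

A real validating parser applies every rule in the DTD, including those for `ingredients`, `method` and `p`, in the same spirit.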

Schemas perform the same task as DTDs

Schemas use XML syntax

Schemas support complex data types

Easier to describe allowable content

One XML document can point to more than one schema

Schemas

<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Rachel</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget the concert!</body>
</note>

A simple XML document

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Example of a simple Schema

What about display?

[Diagram: XML file + DTD or Schema = valid XML; display examples: "Blue Elephant Papers" record and browse list]

Use XML technologies – for displaying, retrieving, transforming, manipulating

XSLT – Extensible Stylesheet Language for Transformations

Many technologies available to manipulate XML documents

Displaying XML

transformation involves the reading in of an XML file and an XSLT file to a processor, which can then generate some output – typically HTML

Transformation of XML

[Diagram: XML + XSLT → processor → HTML output]
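To make the idea of a transformation concrete, here is a rough stand-in written in Python rather than XSLT. It reads XML, applies a set of display rules and writes HTML. The `<record>` wrapper, the `DISPLAY_RULES` table and the function name are all assumptions for illustration; a real setup would express the rules in an XSLT stylesheet and run them through an XSLT processor.

```python
import html
import xml.etree.ElementTree as ET

RECORD = "<record><title>Papers of Peter Rowe</title><date>21 May 2004</date></record>"

# Hypothetical display rules: which HTML tag each XML element becomes.
DISPLAY_RULES = {"title": "h1", "date": "b"}

def to_html(xml_text):
    """Turn each child of the root element into one line of HTML."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        html_tag = DISPLAY_RULES.get(child.tag, "p")  # default to a paragraph
        text = html.escape(child.text or "")
        lines.append(f"<{html_tag}>{text}</{html_tag}>")
    return "\n".join(lines)

print(to_html(RECORD))
```

The XML stays unchanged; swapping in a different set of rules gives a different display from the same data.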

HTML is ONLY for display, typically in a Web browser

HTML tags do not describe the content

HTML cannot easily be extracted by machines for different purposes

XML tags can be specified by anyone; HTML tags are prescribed

HTML and XML (1)

HTML and XML (2)

HTML: <h1> Papers of Peter Rowe </h1>
XML: <title> Papers of Peter Rowe </title>

HTML: <b> 21 May 2004 </b>
XML: <date> 21 May 2004 </date>

International standard, supported by the W3C

It is open, licence free and platform neutral

It is human and machine readable

XML documents are text documents

Why use XML?

XML does not determine the presentation of the data
◦ use stylesheets to present XML data
◦ with proprietary systems content is inextricably bound up with format

Hierarchical structure – good for archive descriptions!

More reasons to use XML...

XML is the main basis for defining data exchange languages

Meaningful tags facilitate extraction – data can be manipulated as required

...and for data exchange

All publicly funded bodies should use XML for data exchange (e-GIF)

XML has been widely adopted commercially as well as in the public sector

The Government mandates XML

XML is:
◦ simple
◦ flexible
◦ great for data exchange

XML must be:
◦ well-formed
◦ valid

DTDs and Schemas:
◦ to create valid XML
◦ provide tags, attributes and rules

XML requires other XML technologies
◦ e.g. stylesheets can transform XML for display

Summary

EAD: An introduction

EAD = Encoded Archival Description

EAD is XML for finding aids

A data structure standard – not a content standard

A structure that allows finding aids to be indexed, searched, retrieved and navigated

Compatible with ISAD(G)

What is EAD?

EAD is:

Flexible enough to deal with all types of finding aids: single or multi-level, long or short, lists or calendars etc.

Used to create new finding aids as well as converting old ones to standardised form

Used to share data between systems

What is EAD?

EAD is maintained and developed by an international working group

Develops and publishes documentation and tools: tag library, guidelines, EAD Cookbook, websites

EAD Working Group - EADWG

EAD structure

<ead>

<eadheader></eadheader>

<archdesc><did></did>

</archdesc>

</ead>

Basic EAD file structure

<ead>                EAD root element
  <eadheader>        EAD file information wrapper
  </eadheader>
  <archdesc>         Finding aid wrapper
    <did></did>      Core collection information wrapper
  </archdesc>
</ead>

Basic EAD file structure
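A quick structural check of this skeleton can be sketched in Python. It only looks for the wrapper elements named on the slide; it is not a substitute for validating against the EAD DTD or Schema, and the function name `missing_wrappers` is made up for illustration.

```python
import xml.etree.ElementTree as ET

SKELETON = """
<ead>
  <eadheader></eadheader>
  <archdesc>
    <did></did>
  </archdesc>
</ead>
"""

def missing_wrappers(xml_text):
    """List the basic EAD wrapper elements that the file does not contain."""
    root = ET.fromstring(xml_text)
    missing = []
    if root.tag != "ead":
        missing.append("ead")
    # Paths are relative to the root element.
    for path in ("eadheader", "archdesc", "archdesc/did"):
        if root.find(path) is None:
            missing.append(path)
    return missing

print(missing_wrappers(SKELETON))
print(missing_wrappers("<ead><archdesc></archdesc></ead>"))
```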

[Diagram: the "EAD beetle", with <eadheader>, <archdesc>, <did> and sub-fonds descriptions labelled]

<eadheader>                              EAD file information
  <eadid>                                Identifier
  <filedesc><titlestmt><titleproper>     Title
  <profiledesc>                          Creation
  <revisiondesc>                         Revision

<eadheader>

Within <archdesc> there are elements for:

• Description
• Presentation
• Hierarchy

Finding aid elements

<archdesc>         Archival description
<did>              Descriptive information
<scopecontent>     Scope and Content
<bioghist>         Biographical/Admin. History
<arrangement>      Arrangement
<controlaccess>    Access points

<did>              Descriptive information
  <unitid>         Reference
  <unittitle>      Title
  <unitdate>       Covering dates
  <origination>    Creator(s)
  <repository>     Repository
  <physdesc>       Physical description
    <extent>       Extent
    <genreform>    Form
    <physfacet>    Physical Facet
  <physloc>        Location
  <container>      Container type
  <abstract>       Brief description
</did>

<did> elements

<archdesc level="fonds">
  <did>
    <unitid>GB 0001 Foster</unitid>
    <unittitle>Papers of Dr Foster</unittitle>
    <unitdate normal="1820-1833">1820-1833</unitdate>
    <repository>University of Gloucestershire</repository>
    <physdesc>
      <extent>1 box</extent>
      <physfacet>Four folders of letters, 230 folios</physfacet>
    </physdesc>
    <langmaterial><language langcode="eng">English</language></langmaterial>
    <origination>Dr Foster</origination>
  </did>
</archdesc>

Hub <did> EAD2002

<acqinfo>           Acquisition information
<custodhist>        Custodial history
<appraisal>         Appraisal and selection
<processinfo>       Process Information
<accruals>          Accruals information
<altformavail>      Copies
<accessrestrict>    Access restrictions
<userestrict>       User restrictions
<prefercite>        Citation information

Administrative information elements

<bibliography>         Publication note
<fileplan>             Classification scheme
<otherfindaid>         Other finding aids
<relatedmaterial>      Related material
<separatedmaterial>    Separated material
<index>                Keywords

Additional information elements

<controlaccess>    Controlled access headings
  <name>           Names (general)
  <corpname>       Corporate body name
  <persname>       Personal name
  <famname>        Family name
  <geogname>       Place name
  <occupation>     Occupations
  <function>       Functions (administrative)
  <genreform>      Genre and Form
  <subject>        Subject

<controlaccess> elements

<head>                                    Headings
<p>; <lb>                                 Layout
<emph>; <blockquote>                      Italics and quotes
<list><item>; <chronlist><chronitem>      Lists
<ref>; <ptr>; <dao>                       References, pointers and links to digital objects

Presentation elements

NB: EAD is NOT about the presentation of your finding aids, but about their syntax. Separate software will take care of the display of the information.

ISAD(G) (v.2)                                     EAD 2002

3.1.1 Reference code(s)                           <unitid> countrycode and repositorycode attributes
3.1.2 Title                                       <unittitle>
3.1.3 Dates of creation                           <unitdate>
3.1.4 Level of description                        <archdesc> and <c> level attribute
3.1.5 Extent of the unit                          <physdesc>, <extent>
3.2.1 Name of creator                             <origination>
3.2.2 Administrative/Biographical history         <bioghist>
3.2.3 Custodial history                           <custodhist>
3.2.4 Immediate source of acquisition             <acqinfo>
3.3.1 Scope and content                           <scopecontent>
3.3.2 Appraisal, destruction and scheduling       <appraisal>

ISAD(G) to EAD

ISAD(G) (v.2)                                     EAD 2002

3.3.3 Accruals                                    <accruals>
3.3.4 System of arrangement                       <arrangement>
3.4.1 Access conditions                           <accessrestrict>
3.4.2 Copyright/Reproduction                      <userestrict>
3.4.3 Language of material                        <langmaterial>
3.4.4 Physical characteristics                    <phystech>
3.4.5 Finding aids                                <otherfindaid>
3.5.1 Location of originals                       <originalsloc>
3.5.2 Existence of copies                         <altformavail>
3.5.3 Related units of description                <relatedmaterial> and <separatedmaterial>
3.5.4 Publication note                            <bibliography>
3.6.1 Note                                        <odd>

ISAD(G) to EAD
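Part of this crosswalk can be written down as a lookup table, so that a program can present an EAD description under ISAD(G) headings. The sketch below is illustrative: the paths assume the usual EAD 2002 layout, with identifying elements inside `<did>` and narrative elements beside it, and the one-word scope and content note in the sample is invented.

```python
import xml.etree.ElementTree as ET

# A subset of the crosswalk: ISAD(G) element -> path below <archdesc>.
CROSSWALK = {
    "3.1.1 Reference code(s)": "did/unitid",
    "3.1.2 Title": "did/unittitle",
    "3.1.3 Dates of creation": "did/unitdate",
    "3.1.5 Extent of the unit": "did/physdesc/extent",
    "3.2.1 Name of creator": "did/origination",
    "3.3.1 Scope and content": "scopecontent",
}

SAMPLE = """
<archdesc level="fonds">
  <did>
    <unitid>GB 0001 Foster</unitid>
    <unittitle>Papers of Dr Foster</unittitle>
    <unitdate normal="1820-1833">1820-1833</unitdate>
    <physdesc><extent>1 box</extent></physdesc>
    <origination>Dr Foster</origination>
  </did>
  <scopecontent><p>Letters.</p></scopecontent>
</archdesc>
"""

def isadg_view(archdesc_xml):
    """Return the description as a dict keyed by ISAD(G) element names."""
    archdesc = ET.fromstring(archdesc_xml)
    # 3.1.4 is held in the level attribute rather than in an element.
    view = {"3.1.4 Level of description": archdesc.get("level")}
    for label, path in CROSSWALK.items():
        element = archdesc.find(path)
        if element is not None:
            # itertext() also collects text inside child tags such as <p>.
            view[label] = " ".join("".join(element.itertext()).split())
    return view

for label, value in isadg_view(SAMPLE).items():
    print(label, ":", value)
```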

EAD version 1 DTD

EAD 2002 DTD

EAD 2002 Schema

Available from http://www.loc.gov/ead/

Human-readable version: EAD Tag Library (Society of American Archivists)

EAD DTD

Library of Congress Official EAD site: http://www.loc.gov/ead/

Tag Library: http://www.loc.gov/ead/tglib/index.html

EAD Roundtable Help Pages: http://www.archivists.org/saagroups/ead/

EAD Documentation

EAD and hierarchy

ISAD(G) states that to be a conformant archival description a finding aid must:

Be hierarchical
◦ Description from the general to the specific
◦ Information relevant to the level of description
◦ Linking of descriptions (logical sequence)
◦ Non-repetition of information

Contain a minimum set of data elements

EAD and ISAD(G)

Recommended elements for lower level descriptions:
◦ reference code
◦ title
◦ date(s)
◦ extent of the unit of description
◦ level of description

Lower level elements

ISAD(G) levels: Fonds, Sub-fonds, Series, Sub-series, File, Item

EAD levels: <archdesc>, <dsc>, <c01>, <c02>, <c03>, <c04>, <c05>

EAD and Hierarchy

<ead>
  …
  <archdesc>
    [collection level description here]
    <dsc>
      <c01>[series] description 1
        <c02>[file] description 1</c02>
        <c02>[file] description 2
          <c03>[item] 1</c03>
          <c03>[item] 2</c03>
        </c02>
      </c01>
      <c01>[series] description 2....
    </dsc>
  </archdesc>
</ead>

Representing hierarchies

[Diagram: nested components, c03 within c02 within c01]

<c01 level="subfonds">
  <did>
    <unitid>GB 0324 MS 54</unitid>
    <unittitle>Correspondence files</unittitle>
    <unitdate>1920-1945</unitdate>
    <physdesc><extent>4 files</extent></physdesc>
  </did>
  <scopecontent>…</scopecontent>
  <c02 level="series">
    <did>…</did>
    <scopecontent>…</scopecontent>
  </c02>
</c01>

Nesting items

EAD supports two ways of representing levels

<c> is used in A2A, <c0*> on the Hub

Slightly easier to use <c0*>, as the numbers give you more of an idea of the level you are working at

<c> or <c0*>?

<dsc type="combined">
  <c level="series">
    <did>
      <unitid>Series 1</unitid>
      <unittitle>Correspondence</unittitle>
    </did>
    <scopecontent>[...]</scopecontent>
    <c level="subseries">
      <did>
        <unitid>Subseries 1.1</unitid>
        <unittitle>Outgoing Correspondence</unittitle>
      </did>
      <c level="file">
        <did>
          <unittitle>AbbingerAldrich</unittitle>
        </did>
      </c>
    </c>
  </c>
</dsc>

Hierarchy <c> tag
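Whichever style is used, the nesting itself carries the hierarchy, so software can walk it in the same way. This sketch prints an indented outline and accepts both unnumbered `<c>` and numbered `<c01>` to `<c12>` components. The function name `outline` is an assumption for illustration.

```python
import re
import xml.etree.ElementTree as ET

DSC = """
<dsc type="combined">
  <c level="series">
    <did><unitid>Series 1</unitid><unittitle>Correspondence</unittitle></did>
    <c level="subseries">
      <did><unitid>Subseries 1.1</unitid><unittitle>Outgoing Correspondence</unittitle></did>
      <c level="file">
        <did><unittitle>AbbingerAldrich</unittitle></did>
      </c>
    </c>
  </c>
</dsc>
"""

# Matches the unnumbered <c> tag and the numbered tags <c01> to <c12>.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")

def outline(parent, depth=0):
    """Return one indented 'level: title' line per component, depth first."""
    lines = []
    for child in parent:
        if COMPONENT.fullmatch(child.tag):
            title = child.findtext("did/unittitle", default="")
            level = child.get("level", "")
            lines.append("  " * depth + f"{level}: {title}")
            lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(DSC))))
```

With numbered tags the depth is also visible in the tag name, which is the convenience the slide mentions; the walk does not depend on it.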

XML is a meta-language for creating mark-up languages

XML files require other technologies for display, processing, etc.

For archive finding aids EAD is the DTD/Schema to use

Summing-up

It is XML, which is an international standard

It is a simple and effective way of structuring content and providing meaning

Machines can manipulate the content in all sorts of ways

It is a great format to store finding-aids

EAD is a good thing because…

Cross-searching initiatives

Effective cross-searching requires:

◦Interoperability

which requires

◦Common standards

Cross-searching
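A toy illustration of why common standards matter for cross-searching: if every contributor uses the same element for titles, one search routine works across all of their records. The source names "Archive A" and "Archive B" and the function name are invented, and a real service would use an indexing and retrieval system rather than a loop like this.

```python
import xml.etree.ElementTree as ET

# Records from two hypothetical contributors, both using <unittitle>.
RECORDS = [
    ("Archive A", "<archdesc><did><unittitle>Papers of Dr Foster</unittitle></did></archdesc>"),
    ("Archive B", "<archdesc><did><unittitle>Papers of Peter Rowe</unittitle></did>"
                  "<dsc><c01><did><unittitle>Correspondence files</unittitle></did></c01></dsc>"
                  "</archdesc>"),
]

def cross_search(records, term):
    """Return (source, title) pairs whose <unittitle> contains the term."""
    hits = []
    for source, xml_text in records:
        root = ET.fromstring(xml_text)
        for title in root.iter("unittitle"):   # titles at every level
            text = "".join(title.itertext())
            if term.lower() in text.lower():
                hits.append((source, text))
    return hits

print(cross_search(RECORDS, "papers"))
```

If one contributor had tagged titles as `<title>` instead, the same search would silently miss their records.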

UK Archives

UKAD: http://www.ukad.org/

To promote the opening up of data and to offer capacity for cross-searching across the UK archive networks and online repository catalogues

To lead and support resource discovery through the promotion of relevant national and international standards

To support the development and use of name authorities

UK Archives Discovery Network

To advocate for the reduction of cataloguing backlogs and the retro-conversion of hard-copy catalogues

To promote access to digitized and digital archives via cross-searching resource discovery systems.

To work with other domains and potential funders to promote archive discovery

UKAD

Fairly loose structure

Meetings about twice a year

Forum for discussion, sharing, connecting and collaborating

Creating a framework for activities (matrix)
◦ International/national/regional
◦ Meeting UKAD objectives, e.g. open up data; standards-based resource discovery; retro-conversion

UKAD activities

Not many UK archives currently using EAD as a storage format

EAD will increasingly be used as an export format from proprietary database systems like CALM, for use in XML-based gateways such as Aim25 and the Archives Hub

New software becoming available all the time, which makes it easier to create, search and display XML – much of this is open source and often free

EAD in the real world

Differences in how EAD is used

Encourages interoperability but still requires work to ensure seamless cross-searching

EAD is flexible and includes a large number of tags which has advantages and disadvantages

EAD in the Hub and Aim25

XML is an international standard for sharing information

EAD is the XML language for archival finding aids

EAD is not a content standard

Use ISAD(G) for content guidelines and thesauri or authority files for index terms

Summing-up

You have used the Archives Hub’s EAD editor to create EAD records

XML editors, such as XMetaL or XMLSpy, can provide help with validating and with selecting tags and attributes

EAD will become increasingly important

Summing-up

Any Questions?
