LBSC 670
Information Organization
Review
• Information, Organization, Representation
– Documents
– Metadata records
• Standards
– Structure, communication, syntax, value, retrieval
• Dublin Core
– Simple DC (15 elements)
– Qualified DC (95 elements)
Today
• Evaluation (quickly)
• Review
– HTML metadata/encoding
– Dublin Core
• XML
– Encoding
– Schemas
Group Discussion
• http://en.wikipedia.org/wiki/Xml#Critique_of_XML
– Read through the benefits/drawbacks section
– Can you think of some good/bad uses of XML?
– Based on your experience so far with XML, could you ever see a ‘regular’ user feeling comfortable with this technology?
Evaluation
• Metadata evaluation methods
• Greenberg Review (2002)
– Toezer (1999)
  • Accuracy, completeness, consistency, timeliness, and intelligibility
– Rothenberg (1996)
  • Correctness, appropriateness
– Zeng (1993)
  • Specificity, exhaustivity, record completeness
• Completeness, specificity, exhaustivity
– Did the record capture essential elements of the object?
– Does the encoded record differentiate appropriately between elements?
• Document/Index surrogation, retrieval
– Is this a surrogate/abstraction and not a codification of the resource?
– Is the level of surrogation/abstraction appropriate for storage/retrieval/use goals?
Evaluating Representation
• Accuracy, consistency
– Are the details of abstraction correct?
– Is the content represented/encoded accurately?
• Utility, effectiveness, timeliness
– Is the representation appropriate for a given audience and use?
– Does the representation solve an information need?
Evaluating Representation
HTML metadata
• Examples
– <meta> tags
– Content of descriptive tags <title>, <h1>
– Querystring: ?var1=cat&var2=dog
– GET/POST requests
• Uses
– Display, automatic indexing, application development
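The querystring example above can be read by software as key/value metadata. A minimal sketch using Python's standard library, assuming a hypothetical host (example.org) that is not part of the slides:

```python
from urllib.parse import urlsplit, parse_qs


def query_parameters(url):
    """Return the querystring of a URL as a dict of lists of values."""
    # urlsplit isolates the part after '?'; parse_qs splits it on '&' and '='.
    return parse_qs(urlsplit(url).query)


# The host name is an assumption; only the querystring comes from the slide.
params = query_parameters("http://example.org/search?var1=cat&var2=dog")
print(params)
```

Each value is a list because a key may legally repeat in a querystring.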
Encoding Schemes (HTML)
<html>
<head>
  <title>A Sample HTML page</title>
</head>
<body>
  <h1>A page header</h1>
  <p>The content of the body element</p>
</body>
</html>
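As an illustration of automatic indexing, the descriptive tags in a page like this one can be pulled out with a small parser. This is only a sketch built on Python's standard html.parser module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

SAMPLE = """<html><head><title>A Sample HTML page</title></head>
<body><h1>A page header</h1><p>The content of the body element</p></body></html>"""


class DescriptiveTagExtractor(HTMLParser):
    """Collects the text found inside <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self.current = None   # descriptive tag we are currently inside, if any
        self.found = {}       # tag name -> accumulated text

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.current:
            self.found[self.current] = self.found.get(self.current, "") + data


def extract_descriptive_tags(html):
    parser = DescriptiveTagExtractor()
    parser.feed(html)
    parser.close()
    return {tag: text.strip() for tag, text in parser.found.items()}


print(extract_descriptive_tags(SAMPLE))
```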
Qualified Dublin Core (Recap)
• 95 Elements (Registry)
• Expansion of scope/purpose
• Multiple encoding models (HTML/XHTML, XML, RDF)
• Addition of Application Profile concept
Dublin Core Record
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE qualifiedDublinCore PUBLIC '-//OCLC//DTD QDC v.1//EN'>
<qualifiedDublinCore>
  <Creator>Alliance of Baptists</Creator>
  <Title>Alliance of Baptists Records, 1987 - 2001</Title>
  <Source/>
  <Language>en</Language>
  <Coverage>2.1 linear feet</Coverage>
  <Description>[Lots of text omitted]</Description>
  <Rights>Collection is open.</Rights>
  <Subject>Alliance of Baptists</Subject>
  <Subject>Southern Baptist Alliance</Subject>
  <Contributor/>
  <Publisher>Z. Smith Reynolds Library, Wake Forest University</Publisher>
  <Type>text/xml</Type>
  <Relation>Alan P. Neely Papers, Z. Smith Reynolds Library, </Relation>
  <Identifier>http://zsr.wfu.edu/collections/digital/ead/allianceofbaptists.xml</Identifier>
</qualifiedDublinCore>
Encoding Schemes (XML)
• Required syntax
– Document type declaration
  • <!DOCTYPE food SYSTEM "food.dtd">
  • Optional but useful
– XML declaration (often loosely called a processing instruction)
  • <?xml version="1.0"?>
– Elements
  • <elementname></elementname> notation
  • All elements must be closed: <ex></ex> or <ex/>
– Attributes
  • <element attribute="attributevalue"/>
  • Attribute values must be enclosed in straight quotes ("" or '')
– Text
  • <ex>fieldcontent</ex>
Encoding Schemes (XML) - 2
• File attributes
– Elements are repeatable (if allowed by DTD/Schema)
– An XML file can contain multiple “records”
– Certain characters, such as &, <, and >, need to be represented by escape sequences:
  • & = &amp;
  • < = &lt;
  • > = &gt;
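In practice these escape sequences are rarely typed by hand. A minimal sketch using Python's standard xml.sax.saxutils module; the sample string is made up for illustration:

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical field content containing all three reserved characters.
raw = "Research & Development <draft>"

encoded = escape(raw)        # replaces &, < and > with their entity references
decoded = unescape(encoded)  # reverses the substitution

print(encoded)
print(decoded)
```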
Encoding Schemes (XML) - 3
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample RSS File</title>
    <link>http://urltofile.xml</link>
    <description>This is a sample</description>
    <language>en-us</language>
    <pubDate>Tue, 10 Jun 2003 04:00:00 GMT</pubDate>
    <item>…</item>
    <item>…</item>
    <item>…</item>
    <item>…</item>
  </channel>
</rss>
XML Components
• XML data model definition
– Document Type Definitions (DTD)
– XML Schema
– Application Profiles
• XSL processing instructions
– Transformation to new format
– Powerful enough to serve as a simple application development platform
DC recommendations
1. Use Schemas, not DTDs
2. Use Namespaces
3. Encode properties as XML elements and values as the content of those elements.
4. Property names for the 15 DC elements should be all lower-case.
5. Multiple property values should be encoded by repeating the XML element for that property.
DC recommendations (2)
6. Element refinements should be treated in the same way as other properties
7. Encoding schemes should be implemented using the 'xsi:type' attribute
– Example: <dc:identifier xsi:type="dcterms:URI">
8. Element refinements and encoding schemes should use the names specified in the DCMI recommendation
9. Encode language references using the 'xml:lang' attribute
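Several of these recommendations (namespaces, lower-case property names, repeated elements for multiple values, xml:lang) can be sketched with Python's standard ElementTree module. The values are taken from the Alliance of Baptists record shown earlier; the un-namespaced <metadata> wrapper and the function name are assumptions made for illustration, not part of the DCMI guidelines.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace behind the xml: prefix


def build_dc_record(title, subjects, lang="en"):
    """Return a small Dublin Core record as an XML string."""
    ET.register_namespace("dc", DC)               # recommendation 2: namespaces
    root = ET.Element("metadata")                 # hypothetical wrapper element
    title_el = ET.SubElement(root, f"{{{DC}}}title")  # recs 3-4: lower-case element
    title_el.text = title
    title_el.set(f"{{{XML_NS}}}lang", lang)       # recommendation 9: xml:lang
    for subject in subjects:                      # recommendation 5: repeat element
        ET.SubElement(root, f"{{{DC}}}subject").text = subject
    return ET.tostring(root, encoding="unicode")


record = build_dc_record(
    "Alliance of Baptists Records, 1987 - 2001",
    ["Alliance of Baptists", "Southern Baptist Alliance"],
)
print(record)
```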
Exercise – Creating XML
• Tour of Exchanger
• Creating Dublin Core XML from a library catalog record
• Creating a schema based on our XML record
• Validating our XML record
DTDs
• DTD
<?xml version="1.0"?>
<!DOCTYPE dublin [
  <!ELEMENT dublin (title, creator, date, description)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT creator (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
• XML File
<dublin>
  <title>Ambient Findability</title>
  <creator>Peter Morville</creator>
  <date>2005</date>
  <description>A book about information use</description>
</dublin>
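To make the content model concrete, here is a hand-written check that mirrors what the DTD above demands: a dublin root with exactly title, creator, date, description in that order, each holding text only. It is not real DTD validation (Python's standard library does not validate against DTDs); it is only a sketch, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

EXPECTED_CHILDREN = ["title", "creator", "date", "description"]

SAMPLE = """<dublin>
  <title>Ambient Findability</title>
  <creator>Peter Morville</creator>
  <date>2005</date>
  <description>A book about information use</description>
</dublin>"""


def matches_dublin_model(xml_text):
    """Check the (title, creator, date, description) sequence by hand."""
    root = ET.fromstring(xml_text)
    if root.tag != "dublin":
        return False
    children = list(root)
    in_order = [child.tag for child in children] == EXPECTED_CHILDREN
    text_only = all(len(child) == 0 for child in children)  # #PCDATA: no sub-elements
    return in_order and text_only


print(matches_dublin_model(SAMPLE))
```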
XML Schema
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementFormDefault="qualified">
  <xs:element name="dublin">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="creator" type="xs:string"/>
        <xs:element name="date" type="xs:string"/>
        <xs:element name="description" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XML Schema features
• Ability to define datatypes
– string, decimal, integer, boolean, date, time
• Ability to validate/restrict data content
– Enumeration, fraction digits, length, min/max values, min/max length, patterns (date/time), total digits, whitespace
• Ability to add content lists
– Restrict to certain values
– Link to external value schema
XML Namespaces
<?xml version="1.0"?>
<metadata
  xmlns="http://example.org/myapp/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/myapp/ http://example.org/myapp/schema.xsd"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title> UKOLN </dc:title>
  <dcterms:alternative> UK Office for Library and Information Networking </dcterms:alternative>
  <dc:subject> national centre </dc:subject>
</metadata>
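A short sketch of how a program reads such a namespaced record: the prefixes (dc, dcterms) are only shorthand, and the parser matches on the full namespace URIs. This uses Python's standard ElementTree; the schemaLocation attribute is left out of the sample for brevity, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the prefixes here need not match the document's.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

SAMPLE = """<?xml version="1.0"?>
<metadata xmlns="http://example.org/myapp/"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title> UKOLN </dc:title>
  <dcterms:alternative> UK Office for Library and Information Networking </dcterms:alternative>
  <dc:subject> national centre </dc:subject>
</metadata>"""


def read_properties(xml_text):
    """Return the title, alternative title and subject with whitespace trimmed."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.find("dc:title", NS).text.strip(),
        "alternative": root.find("dcterms:alternative", NS).text.strip(),
        "subject": root.find("dc:subject", NS).text.strip(),
    }


print(read_properties(SAMPLE))
```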
XML validation
• XML document validation
– DTDs (document structure)
– XML Schema (document structure, element value & format)
• XML element validation
– Scheme content declaration
Complex examples
• Schema example
– http://www.westinkriebel.com/Public/RSS20.xsd
• Using Namespaces
– http://dublincore.org/documents/dc-xml-guidelines/
Recap
• Assignment 1
• Next Week
MARC metadata
• Definition
– MAchine-Readable Cataloging (MARC) record
– Combination of content, value, and encoding standard
• History
– Created by Henriette Avram in 1968
– Managed by the Library of Congress
MARC metadata
• The encoding standard
– Variable-length record
– Fixed-length leader and directory define the position of fields in the record
– Fixed fields in the leader codify format information
– Variable-length fields provide descriptive content
• Examples
– System-ready example record (LC)
– Uses of MARC fields by OCLC
• More information
– More information from LC
Encoded MARC record
01802cam 22003371a
4500001001800000003000800018005001700026006001900043007001500062008004100077015001400118035002100132035001800153040005100171041001300222043002100235050001700256082001700273245011000290260006600400300002100466505053300487533015701020650002501177651003301202700004401235776003201279830004801311856009201359949001301451ASPS00000161/nwldVaAlASP20061114120112.0m | d | cr |n ---||a|a730321s1955 mnu 000 0 eng aGB56-6680 9(DLC) 55009368 a(OCoLC)585815 aDLCcODaUdOCoLCdMnHidUkdPBfGdDLCdVaAlASP1 aenghnor an-us---ae-no---00aE184.S2bB55 a325.2481097300aLand of their choiceh[electronic resource] :bthe immigrants write home /cedited by Theodore C. Blegen. a[Minneapolis, Minn.] :bUniversity of Minnesota Press,c1955. a463 p. ;c24 cm.0 aThe immigrant image of America -- The "sloopfolk" arrive -- Westward to El-a-noy -- Wisconsin is the place -- The Atlantic crossing -- Scouting the promised land -- Spreading the gospel -- Journeying toward new horizons -- Ordeal and debate -- Appraising the American scene -- The transatlantic gold rush -- Cheerful voices at mid-century -- More than a ballad -- A humorist in Canaan -- A lady grows old in Texas -- In defense of the southwest -- From a frontier parsonage -- The beautiful land -- The glorious new Scandinavia.I0aElectronic reproduction.bAlexandria, VA :cAlexander Street Press,d2002.f(North American women's letters and diaries).nAvailable via World Wide Web. 0aNorwegian Americans. 0aUnited StatesxCivilization.1 aBlegen, Theodore Christian,d1891-1969.1 cOriginalw(DLC) 55009368 0aNorth American women's letters and diaries.40zAccess restricted to subscribers.uhttp://www.aspresolver.com/aspresolver.asp?NWLD;S16101aER_NAWLD
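The long run of digits that follows the leader in the raw record is the directory: one 12-character entry per field, made up of a 3-character tag, a 4-digit field length and a 5-digit starting position. A minimal sketch that decodes the first three entries of the record above (the function name is hypothetical):

```python
# First three directory entries, copied from the encoded record above.
DIRECTORY_EXCERPT = "001001800000" "003000800018" "005001700026"


def parse_directory(directory):
    """Split a MARC directory string into (tag, length, start) tuples."""
    if len(directory) % 12 != 0:
        raise ValueError("directory length must be a multiple of 12")
    entries = []
    for offset in range(0, len(directory), 12):
        entry = directory[offset:offset + 12]
        tag = entry[0:3]            # field tag, e.g. "001"
        length = int(entry[3:7])    # length of the field in characters
        start = int(entry[7:12])    # offset from the base address of data
        entries.append((tag, length, start))
    return entries


for tag, length, start in parse_directory(DIRECTORY_EXCERPT):
    print(tag, length, start)
```

Each starting position equals the previous start plus the previous length, which is how software finds a field without scanning the whole record.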
Text formatted MARC
• =LDR 01802cam 22003371a 4500
• =001 ASPS00000161/nwld
• =003 VaAlASP
• =005 20061114120112.0
• =006 m\\\\|\\\d\|\\\\\\
• =007 cr\|n\---||a|a
• =008 730321s1955\\\\mnu\\\\\\\\\\\000\0\eng\\
• =015 \\$aGB56-6680
• =035 \\$9(DLC) 55009368
• =035 \\$a(OCoLC)585815
• =040 \\$aDLC$cODaU$dOCoLC$dMnHi$dUk$dPBfG$dDLC$dVaAlASP
• =041 1\$aeng$hnor
• =043 \\$an-us---$ae-no---
• =050 00$aE184.S2$bB55
• =082 \\$a325.24810973
• =245 00$aLand of their choice$h[electronic resource] :$bthe immigrants write home /$cedited by Theodore C. Blegen.
• =260 \\$a[Minneapolis, Minn.] :$bUniversity of Minnesota Press,$c1955.
• =300 \\$a463 p. ;$c24 cm.
• =505 0\$aThe immigrant image of America -- The "sloopfolk" arrive -- Westward to El-a-noy -- Wisconsin is the place -- The Atlantic crossing -- Scouting the promised land -- Spreading the gospel -- Journeying toward new horizons -- Ordeal and debate -- Appraising the American scene -- The transatlantic gold rush -- Cheerful voices at mid-century -- More than a ballad -- A humorist in Canaan -- A lady grows old in Texas -- In defense of the southwest -- From a frontier parsonage -- The beautiful land -- The glorious new Scandinavia.
• =533 I0$aElectronic reproduction.$bAlexandria, VA :$cAlexander Street Press,$d2002.$f(North American women's letters and diaries).$nAvailable via World Wide Web.
• =650 \0$aNorwegian Americans.
• =651 \0$aUnited States$xCivilization.
• =700 1\$aBlegen, Theodore Christian,$d1891-1969.
• =776 1\$cOriginal$w(DLC) 55009368
• =830 \0$aNorth American women's letters and diaries.
• =856 40$zAccess restricted to subscribers.$uhttp://www.aspresolver.com/aspresolver.asp?NWLD;S161
• =949 01$aER_NAWLD
MARC leader
http://www.oclc.org/support/documentation/worldcat/records/subscription/1/1.pdf
MARC variable fields
• 245 14 $a The MARC record: $b revealed and detailed
– Field tag: 245
– Indicators: 14
– Subfield: $a, $b
– Contents
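The tag / indicators / subfields breakdown above can be sketched as a small parser for one text-formatted field. It assumes "$" never occurs inside a subfield value and that the field has indicators (so it does not apply to control fields such as 001 to 008); the function name is hypothetical.

```python
def parse_variable_field(line):
    """Split a text-formatted MARC field into (tag, indicators, subfields)."""
    body = line.lstrip("=").strip()   # tolerate the leading '=' of text-formatted MARC
    tag = body[:3]
    rest = body[3:].strip()
    indicators = rest[:2]             # two indicator characters follow the tag
    subfields = []
    # Everything after the indicators is a run of '$' + code + value.
    for piece in rest[2:].split("$")[1:]:
        subfields.append((piece[0], piece[1:].strip()))
    return tag, indicators, subfields


field = parse_variable_field("245 14 $a The MARC record: $b revealed and detailed")
print(field)
```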
MARC value standards
• Fields & Values
– Fields, Indicators, Subfields
– More information from OCLC
• Content standards
– AACR2
– RDA
  • Development started in 2004, slated for release in 2009
  • An enjoyable article on the development of RDA
MARCXML
• Encoding mechanisms (elements, attributes)
• Value? Problems solved/created?
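To show what "elements, attributes" means here, the sketch below expresses one MARC field in the MARCXML vocabulary: the tag and indicators become attributes of a datafield element, and each subfield becomes a subfield element with a code attribute. The field values come from the example record; the helper name and the "marc" prefix are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace


def datafield_to_marcxml(tag, indicators, subfields):
    """Serialize one variable field as a MARCXML <datafield> element."""
    ET.register_namespace("marc", MARCXML_NS)
    # Text-formatted MARC writes a blank indicator as '\'; MARCXML uses a space.
    ind1, ind2 = (ch.replace("\\", " ") for ch in indicators)
    field = ET.Element(
        f"{{{MARCXML_NS}}}datafield",
        {"tag": tag, "ind1": ind1, "ind2": ind2},
    )
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARCXML_NS}}}subfield", {"code": code})
        sub.text = value
    return ET.tostring(field, encoding="unicode")


xml_650 = datafield_to_marcxml("650", "\\0", [("a", "Norwegian Americans.")])
print(xml_650)
```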
Microformats
• Intentionally simple metadata standards designed for interoperability and metadata communication
• Some principles from Technorati's wiki
– a way of thinking about data;
– adapted to current behaviors and usage patterns;
– highly correlated with semantic XHTML;
– a set of simple data formats that many are actively developing and implementing.
• Examples
– hCard
– XFN
– RSS/Atom
– OpenURL
INLS 520 – Fall 2007 Erik Mitchell
MARC fields (1)
• 001-007 Leader/fixed fields
• 010-035 Identifying numbers
• 050-099 Call numbers
• 100-130 Names
• 210-247 Title
• 250-270 Edition, imprint, etc.
• 300-362 Physical, publication info.
MARC fields (2)
• 500-599 Notes & contextual info.
• 600-699 Subject headings, names
• 700-799 Added entries
• 800-830 Series added entries
• 856 Electronic access
• 900-999 Local information
Example MARC fields (1)
• =LDR 01802cam 22003371a 4500
• =001 ASPS00000161/nwld
• =003 VaAlASP
• =005 20061114120112.0
• =006 m\\\\|\\\d\|\\\\\\
• =007 cr\|n\---||a|a
• =008 730321s1955\\\\mnu\\\\\\\\\\\000\0\eng\\
• =015 \\$aGB56-6680
• =035 \\$9(DLC) 55009368
• =035 \\$a(OCoLC)585815
MARC leader (006)
Position   Field                                               Value
00-04      Logical record length                               01802
05         RecStat (record status)                             c
06         Type (type of record)                               a
07         BLvl (bibliographic level)                          m
08         Ctrl (type of control)                              \
09         Character coding scheme
10         Indicator count
11         Subfield code count
12-16      Base address of data
17         ELvl (encoding level)                               1
18         Desc (descriptive cataloging form, AACR2/ISBD)      a
19         Linked record requirement
20         Length of length-of-field
21         Length of starting character
22         Transaction type code in hex
23         Undf
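The positions in this table map directly onto string slices. A minimal sketch that reads the example record's leader; it assumes the two blanks at positions 08 and 09, which the slide text collapses into a single space, are restored so that the leader is the full 24 characters. The function and key names are hypothetical.

```python
# Leader of the example record, with the blanks at positions 08-09 restored (assumption).
LEADER = "01802cam  22003371a 4500"


def parse_leader(leader):
    """Pick named values out of a 24-character MARC leader by position."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is always 24 characters long")
    return {
        "record_length": int(leader[0:5]),   # positions 00-04
        "record_status": leader[5],          # position 05
        "type_of_record": leader[6],         # position 06
        "bibliographic_level": leader[7],    # position 07
        "base_address": int(leader[12:17]),  # positions 12-16
        "encoding_level": leader[17],        # position 17
        "descriptive_form": leader[18],      # position 18
    }


info = parse_leader(LEADER)
print(info)
```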
008 Field (Leader – 2)
Position   Field                                               Value
00–05      Entered (date added to WorldCat)                    730321
06         DtSt (date type)                                    s
07–10      Dates (Date 1)                                      1955
11–14      Dates (Date 2)                                      \\\\
15–17      Ctry (required if available)                        mnu
18–34      Format specific (see Summary of 008 and 006 Field Bytes)
18         Illustrations                                       acde
22         Audience                                            e
23         Form                                                r
24         Nature of contents                                  bcde
28         GPub (government publication)                       \
29         Conf (conference publication)                       0
30         Fest (Festschrift)                                  0
31         Indx (does the resource have an index)              1
33         LitF (literary form)                                m
34         Biog (is the work biographical)                     \
35–37      Lang (mandatory)                                    eng
38         MRec (modified record)                              \
39         Srce (mandatory; cataloging source)                 \
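The same position-based reading applies to the 008 field. The sketch below rebuilds the example record's 008 value with real blanks in place of the "\" placeholders used in text-formatted MARC, then slices out a few of the positions listed above. The function and key names are hypothetical.

```python
# The example record's 008 field, assembled piece by piece so the blank runs
# (written as '\' in text-formatted MARC) are easy to count.
FIELD_008 = (
    "730321"      # 00-05 date entered
    + "s"         # 06    date type
    + "1955"      # 07-10 date 1
    + " " * 4     # 11-14 date 2 (blank)
    + "mnu"       # 15-17 country
    + " " * 11    # 18-28 format-specific positions (blank here)
    + "000"       # 29-31
    + " "         # 32
    + "0"         # 33
    + " "         # 34
    + "eng"       # 35-37 language
    + " " * 2     # 38-39
)


def parse_008(field):
    """Slice the general (non format-specific) positions out of an 008 field."""
    if len(field) != 40:
        raise ValueError("an 008 field is 40 characters long")
    return {
        "entered": field[0:6],
        "date_type": field[6],
        "date1": field[7:11],
        "date2": field[11:15],
        "country": field[15:18],
        "language": field[35:38],
    }


print(parse_008(FIELD_008))
```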
Example MARC fields (2)
• =050 00$aE184.S2$bB55
• =082 \\$a325.24810973
• =245 00$aLand of their choice$h[electronic resource] :$bthe immigrants write home /$cedited by Theodore C. Blegen.
• =260 \\$a[Minneapolis, Minn.] :$bUniversity of Minnesota Press,$c1955.
• =300 \\$a463 p. ;$c24 cm.
Example MARC fields (3)
• =505 0\$aExtracted notes fields.
• =650 \0$aNorwegian Americans.
• =651 \0$aUnited States$xCivilization.
• =700 1\$aBlegen, Theodore Christian,$d1891-
• =830 \0$aNorth American women's letters
• =856 40$zAccess restricted to subscribers.$uhttp://www.aspresolver.com/aspresolver.asp?NWLD;S161
• =949 01$aER_NAWLD
MARC Exercises
• Introduction to MARCEdit
– If you can’t use MARCEdit, use a text editor and follow this standard:
  • =245 04 $a content $b more content
– Tour of the application
– Exercise 1: create a MARC record
– Exercise 2: decompile/compile MARC records, batch edit
Warwick Framework
• Components
– Container
– Package
  • Metadata set
  • Indirect link
  • Another container
• Origins / Definition
– Beginnings: Came out of DC discussions in 1995/6
– Goal: to promote interoperability, define context of the DC metadata, come up with a way of ‘contextualizing’ DC description
– Definition: A general model that describes the various parts of a complex object, including the various categories of metadata. (http://www.cs.cornell.edu/wya/DigLib/MS1999/glossary.html)
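The container/package structure can be pictured as a small recursive data model: a container holds packages, and a package is a metadata set, an indirect link, or another container. The sketch below is an illustration only; the class names, field names, and example URI are assumptions, not part of the Warwick Framework itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MetadataSet:
    scheme: str               # e.g. "Dublin Core" or "MARC"
    record: Dict[str, str]    # the metadata itself


@dataclass
class IndirectLink:
    uri: str                  # reference to metadata stored elsewhere


@dataclass
class Container:
    # A package is any one of the three kinds listed on the slide.
    packages: List[Union[MetadataSet, IndirectLink, "Container"]] = field(default_factory=list)


def count_metadata_sets(container):
    """Count metadata sets in a container, descending into nested containers."""
    total = 0
    for package in container.packages:
        if isinstance(package, MetadataSet):
            total += 1
        elif isinstance(package, Container):
            total += count_metadata_sets(package)
    return total


inner = Container([MetadataSet("MARC", {"245": "Land of their choice"})])
outer = Container([
    MetadataSet("Dublin Core", {"title": "Land of their choice"}),
    IndirectLink("http://example.org/rights-statement"),  # hypothetical URI
    inner,
])
print(count_metadata_sets(outer))
```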