XML Basics for Digital Humanists
Alabama Digital Humanities Center
September 19 & 23, 2011
Instructor: Shawn Averkamp, Metadata Librarian (smaverkamp@ua.edu)

What is XML?

eXtensible

Markup

Language

Language

• XML is a language for structuring data. (Other methods of structuring data: databases, Excel spreadsheets, etc.)

• Not a data model, but a way of encoding a data model or knowledge domain so that it is machine-processable.

• XML is composed of syntax rules (just like any other language).

Markup

• XML uses “markup” to structure data.

• XML uses labels within angle brackets (as in HTML) to “tag” text.

Ingredients

3 avocados
1/4 cup onions
1/4 teaspoon garlic salt
12 corn tortillas
1 bunch fresh cilantro leaves
jalapeno pepper sauce

<ingredients>
  <ingredient qty="3">avocados</ingredient>
  <ingredient qty="1/4" unit="cup">onions, diced</ingredient>
  <ingredient qty="1/4" unit="t">garlic salt</ingredient>
  <ingredient qty="12">corn tortillas</ingredient>
  <ingredient qty="1">fresh cilantro leaves</ingredient>
  <ingredient>jalapeno pepper sauce</ingredient>
</ingredients>

Elements (e.g., <ingredient>) = things we care about
Attributes (e.g., qty, unit) = properties of those things

eXtensible

• You can extend your data model with other XML data models (“schemas”).

<mods>
  <titleInfo>
    <title>Pac-man shaped magnetic tunnel junctions for magnetic flip flops for space applications</title>
  </titleInfo>
  <name type="personal">
    <namePart>Red Ghost</namePart>
    <role>
      <roleTerm>Author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart>Dot Chomper</namePart>
    <role>
      <roleTerm>Advisor</roleTerm>
    </role>
  </name>
  <abstract>Pac-man shaped magnetic tunnel junctions are proposed for CMOS-based magnetic flip flops for space applications…</abstract>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>

The etd schema (the elements prefixed with etd:, shown in red on the original slide) “extends” the mods schema.

Where is XML?

XML drives applications and information you use every day:

• RSS feeds (Really Simple Syndication) for blogs, podcasts, and more

• iTunes stores your music library metadata and usage data in XML

• Google uses XML to display geographic data in Google Maps and Earth (more info: http://code.google.com/apis/kml/documentation/kml_tut.html)
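To make the first item concrete, here is a minimal sketch of what an RSS 2.0 feed can look like. The blog title, URLs, and post are hypothetical and exist only for illustration; real feeds usually carry more fields.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <!-- A feed has one <channel> describing the site as a whole -->
  <channel>
    <title>Example Humanities Blog</title>
    <link>http://example.com/blog/</link>
    <description>Hypothetical feed used only for illustration</description>
    <!-- Each post, episode, or update is one <item> -->
    <item>
      <title>First post</title>
      <link>http://example.com/blog/first-post</link>
      <pubDate>Mon, 19 Sep 2011 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```

A feed reader only needs to know the element names to pull out titles, links, and dates, which is why the same format works for blogs and podcasts alike.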

What’s XML good for?

• Sharing/exchanging data online
• Storing data
• Controlling data display
• Syndication

The XML Family

XML: The document language

XPath: Language for navigating XML documents

XSD: Schema language

XSLT (Extensible Stylesheet Language Transformations): Language for transforming XML into other formats (HTML, text, other XML documents)

XQuery: Language for querying XML (similar to SQL database querying)

XForms: Language for creating web input forms
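As a rough illustration of how two members of the family work together, the sketch below is an XSLT 1.0 stylesheet that uses XPath expressions to turn a list of movies into an HTML page. It assumes an input document whose root is <movies>, containing <movie> elements with <title> and <year> children, like the movies.xml sample used in this workshop. The HTML layout is an arbitrary choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- XPath "/movies" matches the root element of the input -->
  <xsl:template match="/movies">
    <html>
      <body>
        <h1>Movies</h1>
        <ul>
          <!-- XPath "movie" selects every <movie> child, sorted by year -->
          <xsl:for-each select="movie">
            <xsl:sort select="year" data-type="number"/>
            <li>
              <xsl:value-of select="title"/>
              <xsl:text> (</xsl:text>
              <xsl:value-of select="year"/>
              <xsl:text>)</xsl:text>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet is itself an XML document, so the same well-formedness rules apply to it as to the data it transforms.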

XML in the Humanities

• TEI (Text Encoding Initiative)
  – Shakespeare Quartos Archive: http://www.quartos.org/
  – Lewis & Clark Journals: http://lewisandclarkjournals.unl.edu/

• Syriac Reference Portal: http://www.syriac.ua.edu/

Getting Started

• Open Oxygen
• Open the movies.xml example (in the left sample.xpr sidebar) or paste the code below into a new document

<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie id="1">
    <title>The Green Mile</title>
    <year>1999</year>
  </movie>
  <movie id="2">
    <title>Taxi Driver</title>
    <year>1976</year>
  </movie>
  <movie id="3">
    <title>The Matrix: Revolutions</title>
    <year>2004</year>
  </movie>
  <movie id="4">
    <title>Shrek II</title>
    <year>2004</year>
  </movie>
</movies>

Well-formedness

XML documents must be “well-formed” to be machine-readable.

• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted
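The five rules can be seen together in one small document. This is only a sketch: the recipe name and the <vegetarian/> element are made up for illustration, and the comments point to where each rule applies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Rule 1: a single root element (<recipe>) wraps everything else -->
<recipe>
  <!-- Rule 2: every element is closed, either with a closing tag... -->
  <name>Guacamole tacos</name>
  <!-- ...or, if it has no content, with the self-closing shorthand -->
  <vegetarian/>
  <!-- Rule 3: tags are case sensitive, so <Name>...</name> would be an error -->
  <!-- Rule 4: elements nest cleanly; <ingredient> closes before <ingredients> -->
  <ingredients>
    <!-- Rule 5: attribute values are always quoted, even numbers -->
    <ingredient qty="3">avocados</ingredient>
  </ingredients>
</recipe>
```

Breaking any one of these rules should make a parser (or Oxygen's well-formedness check) stop with an error rather than guess at what was meant.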

Exercise 1

Copy and paste the following code into a new XML document in Oxygen. Correct all errors necessary to make this a well-formed XML document.

<movie id=1>
  <title>The Green Mile<title>
  <year>1999</year>
</movie>
<movie id="2">
  <title>Taxi Driver</title>
  <year>1976</year>
</movie>
<movie id="3">
  <title>The Matrix: Revolutions</title>
  <Year>2004</year>
</movie>
<movie id="4">
  <title>Shrek II</title>
  <year>2004</movie>
</year>

<!-- Comments -->

Enclose comments within double-hyphen/angle bracket notation:

<!-- a brief comment -->

<!--
This is a very long block of comments…
… … … more comments… … … comments…
(still more comments here…)
-->
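Comments can also be used to “comment out” markup temporarily. The sketch below hides the second movie from the parser without deleting it. Two constraints are worth keeping in mind: a comment may not contain a double hyphen inside its text, and comments cannot be nested inside other comments.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie id="1">
    <title>The Green Mile</title>
    <year>1999</year>
  </movie>
  <!-- The movie below is "commented out": a parser skips it entirely
  <movie id="2">
    <title>Taxi Driver</title>
    <year>1976</year>
  </movie>
  -->
</movies>
```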

5 special symbols

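To use the following characters in a text value, replace them with these entities. (Strictly, only & and < must always be escaped; the other three need escaping only in certain contexts, such as a quote mark inside an attribute value delimited by the same kind of quote mark. Escaping all five is always safe.)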

&    &amp;
<    &lt;
>    &gt;
"    &quot;
'    &apos;
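A short sketch of the entities in use follows. The movie entry, the <note> and <edition> elements, and their values are hypothetical; each comment shows the text a parser would hand back after decoding the entities.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie id="5">
    <!-- Read by a parser as: Fast & Furious -->
    <title>Fast &amp; Furious</title>
    <!-- Read as: Recommended if age > 13 & age < 99 -->
    <note>Recommended if age &gt; 13 &amp; age &lt; 99</note>
    <!-- Read as: the "theatrical" cut; the quotes are escaped because
         the attribute value itself is delimited by double quotes -->
    <edition label="the &quot;theatrical&quot; cut"/>
  </movie>
</movies>
```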

Exercise 2

In your movies.xml document, add another movie to the collection. Add a comment somewhere in the document (or “comment out” a block of elements). When you’ve finished, check for well-formedness (blue check icon).

XML Schemas

Schemas describe the syntax rules for encoding a data model in XML:

– Allowable elements, attributes, and values
– Element types: simple or complex
   • Simple – contains a value
   • Complex – contains other elements
– Constraints on elements, attributes, and values
   • Repeatability (how many instances of each element are allowed)
   • Obligation (is the element or attribute mandatory?)
– Datatypes of values (integer, string, date, etc.)
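As a sketch of how these rules look in practice, here is one possible XSD for the movies.xml sample from “Getting Started”. It is an illustration, not a schema distributed with the workshop. The type name movieType and the specific choices (at least one movie, a required id, a year datatype) are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Complex element: <movies> contains other elements -->
  <xs:element name="movies">
    <xs:complexType>
      <xs:sequence>
        <!-- Repeatability: at least one <movie>, no upper limit -->
        <xs:element name="movie" type="movieType"
                    minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="movieType">
    <xs:sequence>
      <!-- Simple elements: each contains a value of a given datatype -->
      <xs:element name="title" type="xs:string"/>
      <xs:element name="year" type="xs:gYear"/>
    </xs:sequence>
    <!-- Obligation: the id attribute is mandatory -->
    <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
  </xs:complexType>

</xs:schema>
```

Under this sketch, a <movie> with no id, or a <year> containing text such as “unknown”, would be well-formed but not valid.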

<movies xmlns="http://example.com/schema.xsd">
  <movie id="1">
    <title>The Green Mile</title>
    <year>1999</year>
  </movie>
  <movie id="2">
    <title>Taxi Driver</title>
    <year>1976</year>
  </movie>
  <movie id="3">
    <title>The Matrix: Revolutions</title>
    <year>2004</year>
  </movie>
  <movie id="4">
    <title>Shrek II</title>
    <year>2004</year>
  </movie>
</movies>

XML Schemas

• Schemas are themselves XML files but with a .xsd file extension.

• In our XML document, we reference the schema by using a “namespace.”

Namespaces

The namespace is the unique identifier for the schema.

<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>Pac-man shaped magnetic tunnel junctions for magnetic flip flops for space applications</title>
  </titleInfo>
  ……
</mods>

Namespace prefixes

When two or more schemas are used in an XML document, we use “prefixes” to distinguish between the elements of each.

<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:etd="http://www.ndltd.org/standards/metadata/etdms/1.0/">
  ……
  <dateIssued>2011</dateIssued>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>
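The prefix itself is only a local abbreviation; what identifies each vocabulary is the namespace URI it stands for. The sketch below declares the same two namespaces as the example above but with different prefixes (m and thesis, both chosen arbitrarily here). A namespace-aware parser should treat the elements as belonging to the same two vocabularies. The fragment is meant to show well-formed namespace use, not a complete MODS record.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same two namespaces as above, but here MODS also gets a prefix
     and the ETD-MS prefix is renamed; the meaning is unchanged -->
<m:mods xmlns:m="http://www.loc.gov/mods/v3"
        xmlns:thesis="http://www.ndltd.org/standards/metadata/etdms/1.0/">
  <m:originInfo>
    <m:dateIssued>2011</m:dateIssued>
  </m:originInfo>
  <m:extension>
    <thesis:degree>Ph.D.</thesis:degree>
    <thesis:discipline>Electrical and Computer Engineering</thesis:discipline>
  </m:extension>
</m:mods>
```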

Valid XML

To be “valid,” an XML document must:

• Be well-formed
• Include the schema declaration in the root element (e.g., <mods xmlns="http://www.loc.gov/mods/v3">)
• Conform to the rules of the schema
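A namespace names the vocabulary, but a validator often also needs a hint about where the schema file lives. For documents with a namespace this hint is usually given with xsi:schemaLocation; for documents without one, xsi:noNamespaceSchemaLocation serves the same purpose. The sketch below assumes a hypothetical schema file called movies.xsd sitting in the same folder as the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- movies.xsd is a hypothetical schema file in the same folder -->
<movies xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="movies.xsd">
  <movie id="1">
    <title>The Green Mile</title>
    <year>1999</year>
  </movie>
</movies>
```

Whether this document counts as valid depends entirely on the rules written in that schema file. The same XML can be valid against one schema and invalid against another.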

Exercise 3

Copy and paste the code on the next slide into a new XML document in Oxygen. Add a <name> element to the document, then validate (red check icon). If it validates, then introduce an error into your document to see what error messages Oxygen gives you.

<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:etd="http://www.ndltd.org/standards/metadata/etdms/1.0/"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
      version="3.4">
  <titleInfo>
    <title>Pac-man shaped magnetic tunnel junctions for magnetic flip flops for space applications</title>
  </titleInfo>
  <name type="personal">
    <namePart>Red Ghost</namePart>
    <role>
      <roleTerm>Author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart>Dot Chomper</namePart>
    <role>
      <roleTerm>Advisor</roleTerm>
    </role>
  </name>
  <abstract>Pac-man shaped magnetic tunnel junctions are proposed for CMOS-based magnetic flip flops for space applications…</abstract>
  <originInfo>
    <dateIssued>2011</dateIssued>
  </originInfo>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>

Using and creating schemas

• Always start with the data model!

• Decide what entities and properties are important to you and your project before choosing or creating a schema.

Things to consider

• Are there existing schemas that meet your needs?

• Are there commonly used schemas within your field?

• If you find a schema that almost meets your needs, can you extend it to cover the entire scope of what you want to model?

• Who (or what software applications) will you be sharing the data with?

• What kind of functionality do you want to support? Indexing? Flexible display? Visualizations?

Tailor schemas to meet your needs

• You can make schema rules more strict (but not more lax)

• Extend schemas with other schemas (Your primary schema must allow extensions)

• If you expect use of your XML data to be very limited, you can change the schema. (Not recommended if you plan to share your data widely or beyond your own software applications)
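The first two points can be sketched in XSD. The fragment below is illustrative only and is not taken from the MODS schema: the type name localRoleTerm and the two permitted values are a hypothetical local policy. The first part tightens a general string into a fixed list of values; the second shows one common way a schema can leave room for elements from other schemas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Stricter rule: start from the general xs:string type and
       allow only two values (a hypothetical local policy) -->
  <xs:simpleType name="localRoleTerm">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Author"/>
      <xs:enumeration value="Advisor"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="roleTerm" type="localRoleTerm"/>

  <!-- Extension point: accepts elements from any other namespace;
       "lax" validates them only if their schema is available -->
  <xs:element name="extension">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Data that satisfies the stricter list still satisfies the original, looser rule, which is why tightening is safe for sharing while loosening is not.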

Documentation

• Data dictionaries, markup guidelines, best practices are important, especially if you have assistants entering your data.

• Examples of documentation:
  – MODS guidelines: http://www.loc.gov/standards/mods/userguide/generalapp.html
  – UVa Library TEI guidelines: http://www.lib.virginia.edu/digital/reports/teiPractices/dlpsPractices_postkb.html

Exercise 4

Work together to create a data model for a dictionary (or a knowledge domain of your choosing). What should the root element be? What are the elements that will be contained within the root? What are the attributes* (properties) of each of your elements?

Create an instance of your data model in XML. What adjustments or enhancements would you need to make for your schema to be extensible?

*How do you know when something should be an attribute or an element? There is often no wrong answer to this. Use your best judgment—if you think you will not need to further refine a property (for instance, in our recipe example we would not need to refine quantity or unit any further), an attribute is probably the best choice.
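The trade-off in the footnote can be sketched with the recipe example. Both versions below carry the same information; the child element names (<quantity>, <item>, <preparation>) are hypothetical and chosen only to show the alternative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ingredients>
  <!-- Quantity and unit as attributes: compact, but each is a single
       plain value that cannot hold further structure -->
  <ingredient qty="1/4" unit="cup">onions, diced</ingredient>

  <!-- The same information as child elements: more verbose, but each
       part can later gain its own attributes or children -->
  <ingredient>
    <quantity unit="cup">1/4</quantity>
    <item>onions</item>
    <preparation>diced</preparation>
  </ingredient>
</ingredients>
```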

Resources

• Books, tutorials, and other resources: http://www.lib.ua.edu/digitalhumanities/xml-resources

• http://www.xml.com/