More Text Encoding Initiative (TEI)
6/30
XML + XSLT for Libraries
Today
• Basic anatomy of TEI
• Capturing the structure of source documents
• Capturing more than the structure
• Building personographies
• Using <teiCorpus>
• In class
– continue Assignment 5: Mark up digital texts in TEI
Basic anatomy of TEI
• <TEI> is the root element
• <teiHeader> - where the metadata about the digital document you are creating goes
– this element is similar to <eadheader> in EAD
• <text> - where the transcription of the source document is captured
Required elements of <teiHeader>
• <fileDesc> - a wrapper element for capturing these required elements:
– <titleStmt> - title of your TEI document (not the original document you are transcribing)
– <publicationStmt> - for publication information about your TEI document
– <sourceDesc> - for describing the original document you are transcribing
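Putting the anatomy together, a minimal sketch of a TEI document that contains only the required header elements might look like the following. The title, publication note, and source description are placeholder text invented for illustration, and the xmlns declaration assumes TEI P5.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- title of the TEI document, not of the source (placeholder) -->
        <title>Sample postcard: a digital edition</title>
      </titleStmt>
      <publicationStmt>
        <!-- publication information about the TEI document (placeholder) -->
        <p>Unpublished class exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <!-- description of the original document (placeholder) -->
        <p>Transcribed from a handwritten postcard.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The transcription of the source document goes here.</p>
    </body>
  </text>
</TEI>
```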
<teiHeader> examples
• While there are several required elements inside <teiHeader>, the structure of these elements is pretty flexible
– A less structured example that uses <p> tags: http://slis.uiowa.edu/~jlee/239/sampledocs/sampleTEIbook.xml
– A more structured example that uses more detailed tags such as <msIdentifier>: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml
Capturing the structure of your source document
Determining the level of your markup
• We will be transforming our TEI documents into HTML for web display.
• The more structure you capture in your transcription, the more flexible your display options will be later.
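As a rough illustration of why captured structure pays off at display time, here is a small XSLT 1.0 sketch that maps <div>, <head>, and <p> to HTML. It is not the stylesheet used in this course. It assumes the TEI document declares the TEI P5 namespace (http://www.tei-c.org/ns/1.0), and the class name tei-div is made up for this example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" indent="yes"/>

  <!-- Page shell: reuse the TEI title as the HTML page title -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="//tei:titleStmt/tei:title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="//tei:text/tei:body"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each TEI <div> becomes an HTML <div> (hypothetical class name) -->
  <xsl:template match="tei:div">
    <div class="tei-div"><xsl:apply-templates/></div>
  </xsl:template>

  <!-- A TEI <head> inside the transcription becomes a heading -->
  <xsl:template match="tei:head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- TEI paragraphs map directly to HTML paragraphs -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Any element that was never marked up in the transcription cannot be matched by a template, which is why more detailed markup leaves more display options open.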
The <text> element
• <text> contains a single text of any kind
• You decide the scope of the <text> element
– A poem?
– A play?
– An essay?
– A collection of essays?
The <div> element
• Within <text>, <div> is used to describe some discrete structure of the source document
• You decide what <div> should represent:
– One poem? One stanza of a poem?
– One book? One chapter?
Sample <div> structure
• In this example, <div> represents one chapter:
<text>
  <body>
    <div>
      <head type="chapter">Chapter 1</head>
      <p>In this chapter, we will focus on….</p>
    </div>
    <div>
      <head type="chapter">Chapter 2</head>
      <p>In chapter one, you learned….</p>
    </div>
  </body>
</text>
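<div> elements can also nest, so a book-level <div> can hold chapter-level <div> elements. The following is a sketch only: the @type and @n values and the placeholder text are illustrative choices, not requirements.

```xml
<text>
  <body>
    <!-- outer div: one book -->
    <div type="book" n="1">
      <head>Book 1</head>
      <!-- inner divs: chapters of that book -->
      <div type="chapter" n="1">
        <head>Chapter 1</head>
        <p>Text of the first chapter.</p>
      </div>
      <div type="chapter" n="2">
        <head>Chapter 2</head>
        <p>Text of the second chapter.</p>
      </div>
    </div>
  </body>
</text>
```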
The <group> element
• For more complex source documents, use <group> tags to capture a series of <text> elements
• For example, encoding a book of poems and using <text> for each poem and <div> to capture stanzas
<text>
  <front>
    <!-- biographical notice by editor -->
  </front>
  <group>
    <text>
      <!-- first poem -->
    </text>
    <text>
      <!-- second poem -->
    </text>
  </group>
</text>
The <ab> element
• The anonymous block element, <ab>, is used to encode a discrete chunk of text
• It is generally used to describe paragraph-like elements, like <p> tags in HTML
Encoding line breaks
• To retain original breaks in texts:
– encode them with line break <lb/> elements within anonymous block <ab> elements
<ab>Line one of text <lb/> Line two of text</ab>
– encode them with separate <ab> elements
<ab>This is the first paragraph…</ab>
<ab>This is the second paragraph…</ab>
Encoding more than the structure of your source document…
Capturing images
• To include an image of the source document, use the <facsimile> element, placed after <teiHeader> and before the <text> element:
<facsimile>
<graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/>
</facsimile>
*The URL points to a publicly accessible image file
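To show where <facsimile> sits relative to the rest of the document, here is a skeleton sketch. The header and body contents are left as comments, so this fragment illustrates placement only and would not validate as written. The xmlns declaration assumes TEI P5.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc and other metadata go here -->
  </teiHeader>
  <!-- image(s) of the source document, between header and text -->
  <facsimile>
    <graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/>
  </facsimile>
  <text>
    <body>
      <!-- transcription goes here -->
    </body>
  </text>
</TEI>
```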
Identifying names
• Use <name>, <orgName>, or <persName> element anywhere within the transcription
<div>
  <p>As I haven't time to write a letter I will just drop you a postal. How is <persName>Hattie</persName>? I have got a cold but that's all. this postal is kinda dirty but I got cause it is just what we will do isn't it. Just wait we'll let them know you're not dead. ha ha</p>
  <signed>bye. <persName>Golda</persName></signed>
</div>
Identifying places
• <placeName> for geo-political place names
– <placeName>Rochester, NY</placeName>
– <placeName><settlement type="city">Rochester</settlement>, <region type="state">New York</region></placeName>
• <geogName> for places named in terms of geographic features such as mountains, lakes, or rivers, independently of geo-political units
– <geogName type="river">Mississippi River</geogName>
Identifying dates
• <date> contains a date in any format
• <time> contains a phrase defining a time of day in any format
• the attribute @when normalizes the date or time in a standard form, e.g. yyyy-mm-dd
– <date when="1945-10-24">24 Oct 45</date>
– <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date>
– <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>
Other elements can record date + time information
• Normalized dates and times can be expressed for other elements through attributes
– A complete table of "date-able" elements: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datable.html
• For example:
<birth when="1981-01-23">January 23, 1981</birth>
Expressing date spans and ambiguous dates
• @notBefore specifies the earliest possible date for the event
• @notAfter specifies the latest possible date for the event
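• @from indicates the starting point of the period
• @to indicates the ending point of the period
• Each of these attributes also has an ISO 8601 counterpart with an -iso suffix, used in this example:
<residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>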
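A short sketch of the difference between the two attribute pairs: @from and @to describe a period whose endpoints are known, while @notBefore and @notAfter bracket a single event whose exact date is uncertain. The dates are borrowed from the <residence> example, and the wording inside the elements is invented for illustration.

```xml
<!-- a period with a known start and end -->
<date from="1907-09-09" to="1910-09-06">9 September 1907 to 6 September 1910</date>

<!-- one event that happened at some unknown point within a range -->
<date notBefore="1907-09-09" notAfter="1910-09-06">sometime between 1907 and 1910</date>
```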
Elements applicable to correspondence
• <opener> groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter.
• <closer> groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter.
• <dateline> contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.
• <salute> contains the salutation in the opening/closing of a letter, preface, etc.
• <signed> contains the closing signature
Sample use of <opener> and <closer>
<div type="letter" n="14">
  <head>Letter XIV: Miss Clarissa Harlowe to Miss Howe</head>
  <opener>
    <dateline>Thursday evening, March 2.</dateline>
  </opener>
  <p>On Hannah's depositing my long letter ...</p>
  <p>An interruption obliges me to conclude myself in some hurry, as well as fright, what I must ever be,</p>
  <closer>
    <salute>Yours more than my own,</salute>
    <signed>Clarissa Harlowe</signed>
  </closer>
</div>
• (Taken from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html#DSOC)
Building a personography
• A personography is a list of normalized biographical data about persons tagged in your TEI document
• It can be referenced in multiple TEI documents
• It can be used to enhance search + browse tools
The <listPerson> element
• Personographies are contained within <sourceDesc> in the header
• @xml:id is used to uniquely identify a person
<listPerson>
  <person>
    <persName xml:id="HJ"><forename>Hattie</forename> <surname>Jacobs</surname></persName>
    <sex>female</sex>
    <residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>
  </person>
</listPerson>
Referencing personography data in the transcription
• Use @ref, with a leading #, to refer to the @xml:id you assigned to that person
<address>
  <addrLine>Miss <persName ref="#HJ">Hattie Jacobs</persName></addrLine>
  <settlement>Madrid</settlement>
  <region>Iowa</region>
</address>
Other global lists
• Similarly, you can use @xml:id to create a global list of other elements
– <listPlace>
– <listOrg>
– <listBibl>
– <listEvent>
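The same pattern used for persons carries over to the other lists. As a sketch, here is a one-entry <listPlace> and a reference to it from the transcription. The identifier MadridIA is hypothetical, chosen only for this example.

```xml
<!-- in the header: a normalized entry for the place -->
<listPlace>
  <place xml:id="MadridIA">
    <placeName>
      <settlement type="city">Madrid</settlement>,
      <region type="state">Iowa</region>
    </placeName>
  </place>
</listPlace>

<!-- in the transcription: @ref points back to the entry above -->
<placeName ref="#MadridIA">Madrid</placeName>
```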
Using <teiCorpus>
• <teiCorpus> can be used as a wrapper root element for multiple <TEI> documents
• <teiCorpus> has its own global header for capturing metadata about all of the <TEI> documents it contains
• Example – postcards: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml
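In outline, a corpus file has one shared header followed by the individual <TEI> documents, each of which keeps its own header. The skeleton below sketches that nesting only: the contents are left as comments, so it would not validate as written, and the xmlns declaration assumes TEI P5.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <!-- global header: metadata about the whole collection -->
  <teiHeader>
    <!-- fileDesc for the corpus -->
  </teiHeader>

  <!-- first document, with its own header and text -->
  <TEI>
    <teiHeader>
      <!-- fileDesc for this document -->
    </teiHeader>
    <text>
      <body>
        <!-- transcription -->
      </body>
    </text>
  </TEI>

  <!-- second document -->
  <TEI>
    <teiHeader>
      <!-- fileDesc for this document -->
    </teiHeader>
    <text>
      <body>
        <!-- transcription -->
      </body>
    </text>
  </TEI>
</teiCorpus>
```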
Take a break
In class
• Continue Assignment 5: Mark up digital texts in TEI
• If you have finished encoding the basic structure in your TEI documents:
– try enhancing your markup with name, date, and place information
– try nesting your TEI documents within one <teiCorpus> document
– try building a personography