
TEI Header

INTRODUCTION

At the TEI and XML in Digital Libraries Workshop, held at the Library of Congress in July 1998, several working groups were formed to consider various aspects of the Text Encoding Initiative. Group 1 was charged with recommending best practices for TEI header content and with reviewing the relationship between the Text Encoding Initiative header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October to develop a recommended practice guide. Our work was assisted by similar efforts that had taken place in the United Kingdom under the auspices of the Oxford Text Archive the previous year. The following document represents a draft of those recommended practices. It has been submitted to various constituencies for comment.

Definition:

Text Encoding Initiative: defines a general-purpose scheme that makes it possible to encode different textual views. The TEI “[g]rew out of technology based textual analysis applications employed by Humanities scholars”, e.g., tracing the use of the word ‘love’ in poems of a particular genre within a specific historical period. Its focus has been on text capture (encoding in electronic form a text that already exists in another medium) rather than text creation (producing a text of which no other copy exists). It assumes that texts, and work on texts, share a common core of textual features.

Encoding:

The TEI scheme is based on SGML (ISO 8879) and ISO 646 (the 7-bit character set standard). It provides encodings for different views of a text, alternative encodings for the same textual features, and mechanisms for user-defined extensions to the scheme. The Guidelines make it possible to encode many different views of a text, simultaneously if necessary. The TEI Guidelines are not prescriptive: few features are mandatory, but the Guidelines define a core set of tags, and the scheme is extensible. The focus is on capturing text that already exists in another medium rather than on creating new text.
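Because ISO 646 is a 7-bit standard, characters outside it (accented letters, for example) cannot be stored directly and have to be represented indirectly, typically as entity or character references. The minimal sketch below shows one way to do this in Python, assuming the US-ASCII variant of ISO 646 and numeric character references such as `&#233;`; SGML-era TEI files often used named entity references (e.g. `&eacute;`) instead. The helper names `to_7bit` and `is_7bit` are hypothetical.

```python
def to_7bit(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference.

    Assumes the US-ASCII variant of ISO 646. "xmlcharrefreplace" is one of
    Python's built-in codec error handlers; it writes e.g. é as &#233;.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def is_7bit(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return text.isascii()


sample = "Café, naïve"
escaped = to_7bit(sample)
print(escaped)           # Caf&#233;, na&#239;ve
print(is_7bit(escaped))  # True
```

The sketch deliberately leaves alone characters that are special to the markup itself, such as `&` and `<`; those still need to be written as `&amp;` and `&lt;` in the encoded text.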

The TEI Header is a set of descriptions prefixed to a TEI-encoded document. It specifies four components:
• file description (a full bibliographic description),
• encoding description (the level of detail of the analysis, i.e. the aim or purpose for which an electronic file was encoded, and the editorial principles and practices used during the encoding of the text),
• text profile (classificatory and contextual information such as the text’s subject matter, the languages and sublanguages used, the situation in which it was produced, and the participants and their setting),
• revision history (the history of changes made during the electronic file’s development).

The header contains bibliographic information supporting resource discovery, and data management information supporting use of the resource.
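In TEI markup the four components correspond to the elements `fileDesc`, `encodingDesc`, `profileDesc`, and `revisionDesc` inside `teiHeader`. The following minimal sketch builds such a header with Python's standard `xml.etree.ElementTree`, assuming TEI P5 element names and the P5 namespace (P3 and P4 headers had the same four components, but in SGML or un-namespaced XML). The `tei` and `build_header` helpers and all sample values are hypothetical, and a real header would normally carry far more detail, especially in the file description.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace; P3/P4 files had none
ET.register_namespace("", TEI_NS)         # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_header(title: str, publisher: str, source: str, purpose: str,
                 lang_code: str, lang_name: str, change: str) -> ET.Element:
    """Return a minimal teiHeader holding the four components."""
    header = ET.Element(tei("teiHeader"))

    # 1. File description: bibliographic description of the electronic file.
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("publisher")).text = publisher
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source

    # 2. Encoding description: why and how the text was encoded.
    enc_desc = ET.SubElement(header, tei("encodingDesc"))
    project = ET.SubElement(enc_desc, tei("projectDesc"))
    ET.SubElement(project, tei("p")).text = purpose

    # 3. Text profile: here only the language of the text.
    profile = ET.SubElement(header, tei("profileDesc"))
    lang_usage = ET.SubElement(profile, tei("langUsage"))
    ET.SubElement(lang_usage, tei("language"), ident=lang_code).text = lang_name

    # 4. Revision history: a single change note.
    revision = ET.SubElement(header, tei("revisionDesc"))
    ET.SubElement(revision, tei("change")).text = change
    return header


header = build_header(
    title="Sample poems: an electronic edition",
    publisher="Example Text Center",
    source="Transcribed from a printed edition (hypothetical).",
    purpose="Encoded for full-text searching.",
    lang_code="en",
    lang_name="English",
    change="Header created.",
)
print(ET.tostring(header, encoding="unicode"))
```

The first three components sit alongside the bibliographic description, while the revision history comes last, which matches the order in which the header is usually read: what the file is, how it was made, what it contains, and how it has changed.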

http://libraries.mit.edu/guides/subjects/metadata/standards/tei.html

HISTORY

The TEI was established in 1987 to develop, maintain, and promulgate hardware- and software-independent methods for encoding humanities data in electronic form. Over nearly three decades the TEI has been extraordinarily successful at achieving its objective, and it is now widely used by scholarly projects and libraries around the world.

Although a comprehensive history of the TEI has not yet been written, all known documentary resources about the TEI are stored in the Archive. If you (or others you know) have electronic copies of any original TEI documents not available here, please get in touch. The archive of the TEI-L discussion list is a rich resource for historical information, as is the archive of the now-defunct TEI-TECH mailing list, which can be downloaded in its entirety.

Origins of the TEI

When the Text Encoding Initiative (TEI) was originally established, scholarly projects and libraries attempting to take advantage of digital technology seemed to be faced with an overwhelming obstacle to creating sustainable and shareable archives and tools: the proliferating systems for representing textual material. These systems seemed almost always to be incompatible, often poorly designed, and multiplying at nearly the same rapid rate as the electronic text projects themselves. This situation was inhibiting the development of the full potential of computers to support humanistic inquiry by erecting barriers to access, creating new problems for preservation, making the sharing of data (and theories) difficult, and making the development of common tools impractical.

Part of the problem was simply a lack of opportunity for sustained communication and coordination, but there were more systemic forces at work as well. Longevity and re-usability were clearly not high on the priority lists of software vendors and electronic publishers, and proprietary formats were often part of a business strategy that might benefit a particular company, but at the expense of the broader scholarly and cultural community. At the end of the eighties there was a real concern that the entrepreneurial forces which (then as now) drive information technology forward would impede such integration by the proliferation of mutually incompatible technical standards.

In November 1987 a meeting was convened at Vassar College to address these problems. Sponsored by the Association for Computers and the Humanities and funded by the National Endowment for the Humanities, it brought together a diverse group of scholars from many different disciplines, representing leading professional societies, libraries, archives, and projects in a number of countries in Europe, North America, and Asia. At this meeting the intellectual foundation for the Text Encoding Initiative was articulated. The actual work of developing the TEI Guidelines was then organized by the three TEI sponsoring organizations: the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics. A Steering Committee was organized from representatives of the sponsoring organizations, and an Advisory Board of delegates from various professional societies was formed. To lead the actual work, two editors were chosen and four working committees appointed. By the end of 1989 well over 50 scholars were already directly involved, and the size of the effort was growing rapidly.

The initial phase resulted in the release of the first draft of the Guidelines (known as "P1") in June 1990. A second phase, in which an additional 15 working groups made revisions and extensions, began immediately and released its results throughout 1990–1993. Then, after another round of revisions, extensions, and supplements, the first official version of the Guidelines ("P3") was released in May 1994. Early on in this process, while the Guidelines were still very much a moving target of rapidly changing drafts, a number of leading humanities textbase projects adopted them as their encoding scheme, identifying problems and needs and contributing proposed solutions.

In addition, workshops and seminars were conducted to introduce the wider community to the Guidelines and ensure a steady source of experience to support continuing development. As more scholars became acquainted with the Guidelines, comments, corrections, and requests for extensions arrived from around the world. In the end there were nearly 200 scholars from many disciplines, professions, and countries in the core group that was developing the TEI Guidelines.

The TEI Consortium

In January of 1999, the University of Virginia and the University of Bergen (Norway) presented a proposal to the TEI Executive Committee for the creation of an international membership organization, to be known as the TEI Consortium, which would maintain, continue developing, and promote the TEI. This proposal was accepted by the TEI Executive Committee, and shortly thereafter, Virginia and Bergen added two other host institutions with longstanding ties to the TEI: Brown University and Oxford University.

This group then formulated an Agreement to Establish a Consortium for the Maintenance of the Text Encoding Initiative. On the basis of this agreement, a transition group comprising representatives from the three original sponsoring organizations of the TEI (as custodians of rights in the TEI) and from the incoming Host Organizations set about drafting and incorporating the TEI Consortium during 2000. Incorporation was completed in December 2000, and the first Board members took office in January 2001.

The goal of establishing the TEI Consortium was to maintain a permanent home for the TEI as a democratically constituted, academically and economically independent, self-sustaining, non-profit organization. In addition, the TEI Consortium was intended to foster a broad-based user community with sustained involvement in the future development and widespread use of the TEI Guidelines. In both of these goals the creation of the Consortium has proven a positive step. Just as the original goal of the TEI was to promote collaborative research on electronic texts by ensuring that the encoding system was no longer an obstacle to such work, the Consortium's efforts are directed towards making the TEI encoding system as effective a tool as possible for creating, archiving, and sharing textual data. For its members, the TEI Consortium provides valuable services to assist them in the creation and use of digital resources and to help them stay abreast of rapidly changing technologies and practices.

Following the establishment of the TEI Consortium, a critical priority was the release of an XML version of the TEI Guidelines, updating P3 to enable users to work with the emerging XML toolset. The P4 version of the Guidelines was published in June 2002. It was essentially an XML version of P3, making no substantive changes to the constraints expressed in the schemas apart from those necessitated by the shift to XML, and changing only corrigible errors identified in the prose of the P3 Guidelines. However, given that P3 had by this time been in steady use since 1994, it was clear that a substantial revision of its content was necessary, and work began immediately on the P5 version of the Guidelines. This was planned as a thorough overhaul, involving a public call for features and new development in a set of crucial areas including character encoding, graphics, manuscript description, standoff markup, and the language in which the TEI Guidelines themselves are written. The P5 version of the Guidelines is scheduled to be released at the end of 2007.

OBJECTIVE

1) Review notes and documents prepared by the Manuscript Description workgroup concerning collation.

2) Review the needs and practices of those parts of the TEI community (and relevant parts of the potential TEI community, i.e. those who would use the TEI if it included provision for this kind of encoding) likely to use facilities for encoding collation and physical document structure.

3) Propose a detailed work plan to improve and extend the recommendations currently provided by TEI P4 in these areas. The work plan will be determined by agreement of the working group but is expected to address at least the following:
   • provision for encoding basic structural information about each page in the document (i.e. its identification with respect to the collation of the entire document), this information being associated directly with the individual page;
   • provision for encoding a summary of structural information about the document as a whole (i.e. an equivalent of a collational formula, encoded in the TEI header);
   • provision for several types of commentary on the physical document structure (e.g. information, both structured and unstructured, such as measurements, identification, and description of features of paper or typography; summaries of printing history; identification of cancels, etc.);
   • provision for several types of derived analytical perspectives on the physical document structure (e.g. reconstructions of individual formes, bifolia, and other higher-order structures) using stand-off markup (e.g. <join>), and provision for where this information should be located within the encoded document;
   • in concert with the Manuscript Description workgroup, harmonization of the treatment of collation and physical document structure for printed books and manuscripts, at least to ensure that no redundant or incompatible recommendations are made in either section of the Guidelines.

4) Respond to comments on other relevant work that may be routed to this workgroup by the editors.

FUNCTION

1) A TEI Header can serve many publics. Headers can be created in a text center and reflect the center's standards, or they can serve as the basis for other types of metadata system records produced by other agencies. Headers can function in detached form as records in a catalog, as a title page inherent to the document, or as a source for index displays.

2) In addition, a header may describe a collection of documents, a single item, or a portion of an item. Variations in TEI Header content can result from making different choices about what is being described.

3) A TEI Header may not have a one-to-one correspondence with a MARC record. One TEI Header may have multiple MARC analytic records, or one MARC record may be used to describe a collection of TEI documents with individual headers.

4) A TEI Header serves several purposes. It may contain a history of how the file has been treated. It can extend the information of a classic catalog record. The text center and/or cataloging agency can act as the gatekeeper for creators by providing standards for content.

5) Does the TEI Header act as the electronic title page or as a catalog record? Is it integral to the document it describes, or independent of it? Depending on the community being served, the TEI elements will reflect the interests of that community. Nonetheless, it is possible to describe a set of "best practices" that will produce compatible content while accommodating this variety of purposes. Compatibility of content makes results more understandable when information about assorted items is displayed as a set of search results, a contents list, or an index, and it allows more straightforward conversion of content information from TEI tags to elements of other metadata sets when such conversion seems advisable.

6) It is a traditional practice of librarianship to agree upon where in a document, and in what order of preference, one should look to identify the title, author, etc., of that document. This permits a certain consistency in terminology and allows for a certain amount of authentication of content. We recommend the following preferences to those who create headers and to those who attempt to use headers to create traditional catalog records that are compliant with AACR2 and ISBD(ER) rules.

7) As a member of the academic community, the header creator/editor has a responsibility to verify, whenever humanly possible, the intellectual source of an electronic document that presents itself without any information regarding its source or authorship.

http://www-personal.umich.edu/~jaheim/teiguide.html
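Items 3 and 5 above concern moving header content into other metadata systems. As a rough illustration, the sketch below pulls a few values out of a TEI P5 header and files them under simple Dublin Core element names. The `CROSSWALK` table, the `tei_to_dc` function, and the sample header are hypothetical, chosen only for this example; they are not an official TEI-to-Dublin-Core or TEI-to-MARC mapping. A real conversion would also have to settle, as item 2 notes, whether the header describes a collection, a single item, or part of an item.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # TEI P5 namespace

# Hypothetical mapping from simple Dublin Core elements to TEI header paths.
# These choices are assumptions for the sketch, not an official crosswalk.
CROSSWALK = {
    "title": "tei:fileDesc/tei:titleStmt/tei:title",
    "creator": "tei:fileDesc/tei:titleStmt/tei:author",
    "publisher": "tei:fileDesc/tei:publicationStmt/tei:publisher",
    "date": "tei:fileDesc/tei:publicationStmt/tei:date",
    "language": "tei:profileDesc/tei:langUsage/tei:language",
}


def tei_to_dc(header_xml: str) -> dict[str, list[str]]:
    """Extract Dublin Core-style values from a serialized teiHeader."""
    header = ET.fromstring(header_xml)
    record: dict[str, list[str]] = {}
    for dc_element, path in CROSSWALK.items():
        # Collapse internal whitespace in each matching element's text.
        values = [" ".join("".join(el.itertext()).split())
                  for el in header.findall(path, NS)]
        values = [v for v in values if v]  # drop empty elements
        if values:
            record[dc_element] = values
    return record


SAMPLE = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Sample poems: an electronic edition</title>
      <author>Anonymous</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Text Center</publisher>
      <date>1998</date>
    </publicationStmt>
    <sourceDesc><p>Hypothetical printed source.</p></sourceDesc>
  </fileDesc>
  <profileDesc>
    <langUsage><language ident="en">English</language></langUsage>
  </profileDesc>
</teiHeader>"""

print(tei_to_dc(SAMPLE))
```

Each Dublin Core element maps to a list of values because a header may legitimately contain several authors or titles; how those should be ordered or merged is exactly the kind of question a shared best-practice guide has to answer.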

BENEFITS

There are several tangible benefits of membership in the TEI Consortium, and the TEI is in the process of developing additional benefits as well. One of the most important benefits, though difficult to quantify, is that support for the TEI helps ensure that this important community standard will continue to be available and supported in the future, and that its development keeps pace with the needs of the text encoding community. Other, more specific benefits include the following:

1) TEI annual meeting and conference
The TEI annual meeting and conference is a central event in the TEI community and an excellent opportunity to meet with other TEI projects and users and learn more about new developments in the TEI world. Registration is free to current members and subscribers.

2) Voting in TEI elections
All TEI member institutions have a vote in TEI elections, which is cast by their designated elector at the TEI annual meeting.

3) Discounts on software
The TEI works to negotiate discounts with vendors of software. Currently TEI members and subscribers are entitled to a 20% discount on the popular <oXygen/> XML editor, which comes bundled with TEI schemas and stylesheets. Members and subscribers may obtain a discount code by contacting the TEI at [email protected].

4) Discounts on training and consultation
TEI members and subscribers are entitled to receive discounts from participating institutions on TEI training workshops and consultation.

5) Free printed copy of the TEI Guidelines
All TEI members receive a free copy of each new printed release of the TEI Guidelines.

The TEI continues to explore additional opportunities for membership benefits, such as discounts on vendor rates for digitization services. Any new benefits will be announced on TEI-L and at this site.

http://www.tei-c.org/Membership/benefits.xml?style=printable

CONCLUSION

The above overview hopefully demonstrates the comprehensive nature of the TEI Header as a mechanism for documenting electronic texts. The emergence of the electronic text over the past decade has presented librarians and cataloguers with many new challenges. Existing library cataloguing procedures, while inadequate to document all the features of electronic texts properly, were used as a secure foundation onto which additional features directly relevant to the electronic text could be grafted. Chapter Nine of AACR2 (Anglo-American Cataloguing Rules) requires substantial updating and revision: it assumes that all electronic texts are published through a publishing company, and so it cannot adequately catalogue texts that are published only on the Internet. The TEI Header has proved to be an invaluable tool for those concerned with documenting electronic resources; its standing in this field is reflected in the increasing number of electronic text centres, libraries, and archives that have adopted its framework. The Oxford Text Archive has found it indispensable as a means of managing its large collection of disparate electronic texts, not only as a mechanism for creating its searchable catalogue but also as a means of creating other forms of metadata that can communicate with other information systems.

Ironically, it is the very generality and flexibility of the TEI Guidelines (P3) on creating a header that have hindered progress towards one of the main goals of the TEI and of the electronic text community as a whole, namely the interoperability and interchangeability of metadata. Unlike the Dublin Core element set, which has a defined set of rules governing its content, the TEI Header has a set of guidelines that allow widely divergent approaches to header creation. While this is not a major problem for individual texts, or for texts within a single collection, the varied ways in which the guidelines are interpreted and put into practice make interoperability with other systems using TEI Headers more difficult than first imagined. As with the Dublin Core element set, what is required is the wholesale adoption of a mutually acceptable code of practice that header creators could implement. One final aspect of the TEI Header is a cause of irritation to those creating and managing TEI Headers and texts: the apparent dearth of affordable, user-friendly software aimed specifically at header production. While this has long been a general criticism of SGML applications as a whole, the TEI can in no way be held to blame for this absence, as creating software was not part of the TEI remit. However, it has contributed to the relatively slow uptake and implementation of the TEI Header as the predominant method of providing well-structured metadata to the electronic text community as a whole. Until this situation is adequately resolved, the tools on offer will tend to be either freeware products designed by people within the SGML community itself or large, very expensive, purpose-built SGML-aware products aimed at the commercial market.
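A shared code of practice of the kind called for above only improves interoperability if header creators can check their work against it. The sketch below assumes a small, hypothetical local code of practice expressed as a tuple of required TEI P5 paths (`REQUIRED`) and reports which of them are missing or empty in a given header. Neither the list nor the `missing_fields` function comes from the TEI Guidelines; both are illustrations.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # TEI P5 namespace

# Hypothetical local code of practice: paths every header must fill in.
REQUIRED = (
    "tei:fileDesc/tei:titleStmt/tei:title",
    "tei:fileDesc/tei:titleStmt/tei:author",
    "tei:fileDesc/tei:publicationStmt/tei:publisher",
    "tei:fileDesc/tei:sourceDesc",
    "tei:profileDesc/tei:langUsage/tei:language",
)


def missing_fields(header_xml: str,
                   required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the required paths that are absent or empty in the header."""
    header = ET.fromstring(header_xml)
    missing = []
    for path in required:
        el = header.find(path, NS)
        # Present only if the element exists and has text or child elements.
        if el is None or not ("".join(el.itertext()).strip() or len(el)):
            missing.append(path)
    return missing


SAMPLE = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>Sample poems</title></titleStmt>
    <publicationStmt><publisher>Example Text Center</publisher></publicationStmt>
    <sourceDesc><p>Hypothetical printed source.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""

for path in missing_fields(SAMPLE):
    print("missing:", path)
```

Keeping the rules as data rather than code means that a group of text centres could agree on, publish, and share one list, while each centre keeps its own tooling.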

http://www.slais.ubc.ca/COURSES/libr500/2000-2001-wt1/www/L_Little-Wolfe/tei.htm

1. To specify a common interchange format for machine-readable texts.
2. To provide a set of recommendations for encoding new textual materials. The recommendations would specify both what features are to be encoded and how those features are to be represented.
3. To document the major existing encoding schemes, and to develop a metalanguage in which to describe them.
(from The ACH/ACL/ALLC Text Encoding Initiative: An Overview, by Susan Hockey)