Internationalizing Digital Libraries: Towards A Standards-Based Strategy
Gregory M. Shreve, Kent State University, 11/7/2004


Multilingual Modalities

Digital libraries may contain resources in many languages. Accessible through the Internet, libraries may be consulted by individuals in other cultural/linguistic "locales" seeking resources in their own languages or searching across languages for resources in languages other than their own.

[Diagram labels: USERS, RESOURCES]

Internationalization

In order to enable the efficient and effective acquisition, storage and retrieval of cross-cultural and cross-linguistic resources, a digital library has to be designed from the outset to allow for heterogeneous linguistic and cultural content. The design process is called “internationalization.” The most effective internationalization strategies are standards-based.

Internationalization (I18N): a design process intended to enable subsequent linguistic and cultural adaptation.

Internationalization Strategy

An internationalization strategy for a Digital Library involves:

(1) determining the metadata elements, attributes, value spaces and values that are culturally and linguistically dependent and are to be rendered in multiple languages.

(2) creating a mechanism for internationalization that provides administrative control, cross-language tools capability, authority for keywords (terms), translations and translation equivalents.

(3) providing an internationalization scheme that offers reusability and scalability and interfaces with relevant national and international standards.

Other issues are also important, such as the different writing systems and character sets of resources and different display preferences (interface, resources), but we do not deal with these in this paper.

Internationalization & Localization

Localization is the preparation of locale-specific versions of a digital library resource or collection and consists of the translation of textual material into the language and textual conventions of the target locale and the adaptation of non-textual materials and delivery / display mechanisms to take into account the cultural requirements of that locale.

Internationalization is an “upstream” engineering process that should precede localization. Its aim is to make subsequent localization/translation easier, more efficient, and less costly.

[Diagram labels: internationalization, localization, translation, adaptation]

Internationalization & Localization

[Diagram labels: internationalization, localization]

Document Processes: creation, storing, rendering, distribution, acquisition, retrieval

Standards-Based Internationalization Strategies: controlled language, terminology control, content / display separation, cultural stylesheeting, exchange standards, authority management, concept-orientation

reusability, scalability, authority, control, quality, accessibility, acceptability, accuracy

Internationalization

Foci of Internationalization in a Digital Library:

1. resource content
2. metadata content
3. metadata elements
4. interface elements
5. keywords (terms)
6. vocabularies

[Diagram labels: I18N solution; reusability; scalability; authority / quality; accessibility; accuracy / acceptability; translations; equivalence; cross-language; target culture(s); control target document]


Loci of Internationalization in a Digital Library:

DL resource content (new and existing translations, equivalents)

DL resource metadata & description (element labels, content, vocabularies)

DL interface (localized dialogs, help, messages, menus)

DL tools (x-language: search, glossaries, taxonomies, thesauri)

Parallel Metadata: Inline Parallel

As discussed in my ASIST 2003 presentation, there are two I18N approaches to support localizing a DL. The first approach is inline parallel and involves providing multiple local versions of, for instance, a title or keyword data element in a resource record. The data elements are flagged as “local” versions via the lang attribute. This is the most common localization method. Note that “equivalence” is assumed via adjacency and no authority is provided.
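The inline parallel layout described above can be sketched as follows. This is only a minimal illustration: the `record` and `title` element names, the `titles_by_lang` helper and the German title are assumptions made for the example, not part of any schema discussed in the presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource record: every local version of the title is stored
# inline, next to the original, and flagged only by its lang attribute.
RECORD = """
<record>
  <title lang="en-US">Thermal analysis of anisotropic bodies</title>
  <title lang="de-DE">Thermische Analyse anisotroper Körper</title>
</record>
"""

def titles_by_lang(record_xml):
    """Return {lang: title text} for all inline title versions in a record."""
    root = ET.fromstring(record_xml)
    return {t.get("lang"): t.text for t in root.findall("title")}

titles = titles_by_lang(RECORD)
print(titles["de-DE"])
```

Nothing in the record says where the German title came from or who approved it; the two titles are treated as equivalent only because they sit side by side.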

Inline Parallel: Flawed

reusability: Because this method stores local equivalents of metadata content inline with the original content in the resource record itself, it does not provide for reusability.

scalability: It is not easily scalable because multiple translations of the same item will exist in different places, leading to redundancy and difficulties in maintenance and quality control.

authority / quality: Because there is no schema and system for documenting and managing translations, the source, authority and quality of equivalents and translations cannot be assured.

accessibility, accuracy / acceptability: Because authority and quality cannot be assured, accessibility, accuracy and acceptability cannot be assured.

control: The approach does not provide control.

Parallel Metadata: External Parallel

A more fruitful approach provides references to standards-based external objects. The external objects can be translation memories (for translations of titles, descriptions or other textual content) or standard (e.g. ISO 12620) glossaries (for multilingual equivalents of data element names and their possible restricted vocabulary values).

[Diagram: Digital Library resources reference two standards-based external objects, a Translation Memory (text segments) and an ISO 12620 Glossary (terms)]


TMX-Compliant Translation Memory

<tuv xml:lang="en-US" creationdate="20031012" creationid="Shreve">
  <seg>Thermal analysis of anisotropic bodies</seg>
</tuv>
<tuv xml:lang="zh-CN" creationdate="20031012" creationid="Shreve">
  <seg> </seg>
</tuv>

Optional LOM Attribute

<Title lang="en-US" hastranslation="true">Thermal analysis of anisotropic bodies</Title>

TMX = Translation Memory Exchange. A translation memory is a database of “aligned” text segments that are translations of one another. It maintains linguistically “parallel” texts.
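A minimal sketch of how a DL might resolve a title against an external translation memory is shown below. It assumes a simplified TMX-style fragment (the header is omitted), a hypothetical `lookup_translation` helper, and a German segment invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative TMX-style fragment; the German segment is a made-up example.
TMX = """
<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Thermal analysis of anisotropic bodies</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Thermische Analyse anisotroper Körper</seg></tuv>
    </tu>
  </body>
</tmx>
"""

def lookup_translation(tmx_xml, source_text, source_lang, target_lang):
    """Return the target-language segment aligned with source_text, or None."""
    root = ET.fromstring(tmx_xml)
    for tu in root.iter("tu"):
        # Collect the aligned segments of this translation unit by language.
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        if segs.get(source_lang) == source_text:
            return segs.get(target_lang)
    return None

print(lookup_translation(TMX, "Thermal analysis of anisotropic bodies", "en-US", "de-DE"))
```

The resource record itself only needs to carry the original title; the translation, together with its creation date and creator, lives once in the translation memory and can be reused by any record containing the same segment.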


Translation memories and glossaries are the most common external localizing objects, but the growing use of statistically based corpus linguistics to create language resources will also make it possible to utilize other monolingual and multilingual resources in Digital Libraries. Standards for representing and storing some of these new language resources do not yet exist.

[Diagram labels: Corpus; Ontologies, Thesauri, Taxonomies]

For instance, multilingual ontologies, thesauri and taxonomies could be constructed from term analysis of DL document corpora.

Internationalizing Metadata

Internationalizing a DL involves more than providing and controlling translations of the content and metadata descriptive elements.

Internationalizing a metadata schema also involves determining the elements and element attributes that could affect the schema’s ability to be used for classification, search, retrieval, and reuse of learning objects in multicultural and multilingual contexts.

An internationalization strategy begins with specifying all metadata elements that are culturally and linguistically dependent. Ideally, internationalization is a goal during initial schema development. Unfortunately, as with IEEE-LOM, internationalization may involve existing data elements in a pre-existing schema. Additions and modifications to the elements and element set may be necessary or recommended.

Culturally Dependent Metadata

LOM element 2.3.3 Date is an example of a culturally dependent metadata element. CEN (European Committee for Standardization) suggests extensions to “internationalize” Date:

<DATETIME>2003-12-25</DATETIME>
<DATETIMELOCALE>
  <LOCALE>US</LOCALE>
  <SOURCE>http://standards.org/us/calendarSpecs.pdf</SOURCE>
  <LOCALIZEDDATETIME>12/25/03</LOCALIZEDDATETIME>
</DATETIMELOCALE>
<DATETIMELOCALE>
  <LOCALE>UK</LOCALE>
  <LOCALIZEDDATETIME>25/12/03</LOCALIZEDDATETIME>
</DATETIMELOCALE>
<DATETIMELOCALE>
  <LOCALE>AE</LOCALE>
  <SOURCE>http://standards.org/ae/calendarNumSpecs.pdf</SOURCE>
  <LOCALIZEDDATETIME>1/11/1424</LOCALIZEDDATETIME>
</DATETIMELOCALE>
<DATETIMELOCALE>
  <LOCALE>AE</LOCALE>
  <SOURCE>http://standards.org/ae/calendarTextSpecs.pdf</SOURCE>
  <LOCALIZEDDATETIME>1 Dhu’l-Qa’dah 1424</LOCALIZEDDATETIME>
</DATETIMELOCALE>
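As a rough illustration of how the localized values above relate to the canonical date, the sketch below derives the US and UK forms from the ISO value. The `DATE_PATTERNS` table and `localize_date` function are hypothetical names; the Hijri forms are left out because they require a calendar conversion that the Python standard library does not provide.

```python
from datetime import date

# Hypothetical per-locale patterns; a real system would take them from an
# authoritative locale specification rather than hard-coding them.
DATE_PATTERNS = {
    "US": "%m/%d/%y",  # month/day/year
    "UK": "%d/%m/%y",  # day/month/year
}

def localize_date(iso_date, locale):
    """Render a canonical ISO 8601 date in the short form of the given locale."""
    d = date.fromisoformat(iso_date)
    return d.strftime(DATE_PATTERNS[locale])

print(localize_date("2003-12-25", "US"))  # 12/25/03
print(localize_date("2003-12-25", "UK"))  # 25/12/03
```

Storing the canonical value once and deriving the local renderings from it avoids keeping several hand-typed copies of the same date in step.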

Culturally dependent categories: Addresses, Calendar, Currency, Date, Numbers, Telephone, Time

Culturally Dependent Metadata Values

Some “universal” metadata elements have values that may be very culturally dependent. For instance, LOM 5.6 Educational.Context has a value space [school, higher education, training, other] that is not only extremely limited, but also derives from a single cultural context. Different countries have different educational systems. The LOM values are often not applicable or do not have a real correspondence. [1]


Although CEN has suggested simply “enlarging” the value space for such elements, true internationalization of these “system” dependent elements would involve providing a locale specification for the element so that a specific vocabulary could be retrieved.

<education locale='de-DE'>
  <context> value space </context>
</education>

Value space (de-DE): Kindergarten, Grundschule, Hauptschule, Realschule, Gesamtschule, Gymnasium, …

<education locale='en-US'>
  <context> value space </context>
</education>

Value space (en-US): Kindergarten, Elementary School, Middle School, High School, …

The ISO 639 language codes and the ISO 3166 country codes do not allow for even more “local” localization.

In Germany, for instance, the Bavarian school system differs from the German “norm.”
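One way to picture locale-specific value spaces is a lookup that falls back from the most specific locale to more general ones. The sketch below is an assumption-laden illustration: the `VALUE_SPACES` table, the `value_space` function and the regional tag `de-DE-BY` (standing in for Bavaria) are hypothetical, and the vocabularies simply repeat the examples above.

```python
# Hypothetical value spaces keyed by locale; the lists repeat the examples above.
VALUE_SPACES = {
    "de-DE": ["Kindergarten", "Grundschule", "Hauptschule", "Realschule",
              "Gesamtschule", "Gymnasium"],
    "en-US": ["Kindergarten", "Elementary School", "Middle School", "High School"],
}

# Default LOM 5.6 Educational.Context value space.
DEFAULT = ["school", "higher education", "training", "other"]

def value_space(locale):
    """Return the most specific vocabulary, dropping subtags from the right."""
    parts = locale.split("-")
    while parts:
        key = "-".join(parts)
        if key in VALUE_SPACES:
            return VALUE_SPACES[key]
        parts.pop()
    return DEFAULT

print(value_space("de-DE-BY"))  # no Bavarian vocabulary registered, so de-DE is used
```

A Bavarian vocabulary could later be registered under its own key without touching the records that request it.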

Metadata: Translation?

Creating locale-specific value spaces for more “universal” data elements is a complex task. Localized value spaces cannot be achieved by simply translating the existing or default values.

en-US: Kindergarten, Elementary School, Middle School, High School

de-DE: Kindergarten, Grundschule, Hauptschule, Realschule, Gesamtschule, Gymnasium, …

Some values may have one-to-one equivalence. Others do not. Middle school (junior high) may correspond to one or more of Hauptschule / Realschule / Gymnasium / Gesamtschule. The values imply different age ranges, different educational objectives and values and different social structures.

Restricted Vocabularies

Multilingual / multicultural restricted vocabularies must be developed as standards by in-country domain experts. Equivalence should be controlled, standardized and authoritative.

LOM 5.2 Learning Resource Type element value space: exercise, simulation, questionnaire, diagram, figure, graph, index, slide, table, narrative text, exam, experiment, problem statement, self assessment, lecture

Equivalents of “lecture”: Lecture, Vorlesung, Conferencia, Conferenza, Föreläsningar, Foredrag, διάλεξη

[Diagram labels: authoritative equivalence; validated mapping; European Treasury Browser Controlled Vocabulary]


Multilingual / multicultural restricted vocabularies should be concept-based. For two vocabulary items to be equivalent they should represent the same concept. The concepts should be documented in authoritative multilingual glossaries such as those specified in ISO 12620. Such glossaries provide one of the bases for external parallel metadata methods.

[Diagram: ISO 12620 Glossary. One concept carries the labels lecture, Vorlesung, Conferencia, Conferenza, Föreläsningar, Foredrag, διάλεξη]

Concept Object

Concept objects are the core of terminology glossaries. They organize both monolingual and multilingual data. Organized into terminology glossary databases for computer-assisted translation, they are indispensable in today’s language industry.

KOS, Glossary and Concept

When concepts are documented in authoritative multilingual glossaries they can also provide the basis for KOS (knowledge organization systems) of use in concept-mediated monolingual and multilingual browsing and searching in DLs.

ISO 12620 Terminology Glossary

• A terminology is concept-oriented.
• A terminology is documented in a glossary, not a dictionary.
• A terminology glossary is organized by concept, not by linguistic label.
• A term is the word, lexical string, or linguistic label used to designate a single concept in the language / culture / subculture of a special subject field.
• A glossary documents the multiple words or lexical strings (in a single language or in multiple languages) that designate a single concept.
• A glossary thus organizes synonyms (monolingual) and equivalents (multilingual) of a concept.
• The organization of a terminology system / glossary reflects the knowledge organization system of the domain it describes. It is also a Knowledge Organization System (KOS) document.
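The concept-oriented organization described above can be pictured as a small data structure. The sketch below is a simplification under stated assumptions: the `Concept` class, its field names and the placeholder definition are hypothetical and do not reproduce the ISO 12620 data categories.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One glossary entry: a concept plus the terms that designate it."""
    concept_id: str
    definition: str
    # language code -> list of terms; several terms in one language are synonyms
    terms: dict = field(default_factory=dict)

    def add_term(self, lang, term):
        self.terms.setdefault(lang, []).append(term)

    def synonyms(self, lang):
        """All terms designating this concept in one language."""
        return list(self.terms.get(lang, []))

    def equivalents(self, source_lang):
        """Terms designating this concept in every other language."""
        return {lang: list(ts) for lang, ts in self.terms.items() if lang != source_lang}

# The entry is keyed by the concept, not by any one linguistic label.
lecture = Concept("c-lecture", "placeholder definition for illustration")
lecture.add_term("en", "lecture")
lecture.add_term("de", "Vorlesung")
lecture.add_term("it", "Conferenza")

print(lecture.equivalents("en"))
```

Because the entry is organized around the concept, adding a new language or a new synonym never requires touching the entries for other labels, which is what a dictionary keyed by headword would demand.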

ISO 12620 Data Categories I

Glossary entry annotated with ISO 12620 data categories:

concept
<termEntry id="boundary conditions">

concept description
<descrip type="subjectField">Computational Materials Science</descrip>
<descrip type="definition">Those physical and/or mechanical conditions existing around the surfaces and limits of a structural body.</descrip>
<admin type="source">Composite Materials Dictionary: http://composite.about.com/library/glossary/blglossary-d.htm</admin>

concept relations
<descrip type="superordinateconcept" target="boundary ">boundary</descrip>

administration
<admin type="originatingPerson">Adriana Luchian</admin>

language set
<langSet xml:lang="en-us">

term (label) description

ISO 12620 Data Categories II

language set
<langSet xml:lang="en-us">

term (label) information
<tig>
<term>boundary conditions</term>
<date>4/12/03</date>
<descrip type="context">For solids with spatial discontinuities, such as bounded solids or those containing holes, crack, interfaces, etc., we need to satisfy some prescribed boundary conditions.</descrip>
<admin type="source">Computational Materials Science Corpus, Kent State University, March, 2003</admin>
</tig>

language set
<langSet xml:lang="fr-fr">

term (label) information
<tig>
<term>conditions limites</term>
<date>4/12/03</date>
<descrip type="context">Elles ont été appliquées au cas d'un objet impénétrable " mou " (condition de Dirichlet sur son contour) par C. Rozier et objet " dur " (condition de Neumann sur son contour) par E. Bocly et moi-même immergé dans un guide d'onde dont les parois sont impénétrables (la condition limite à la surface est de Dirichlet et sur le fond de Neumann).</descrip>
<admin type="source">Computational Materials Science Corpus, Kent State University, March, 2003</admin>
</tig>

<termNote type='transferComment'>...</equivalence
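To show how such an entry could be consumed by software, the sketch below reads the terms of each language set from a well-formed, heavily reduced version of the entry. The fragment is an assumption made for the example: closing tags are supplied, most data categories are dropped, and the id is written with a hyphen because XML identifiers cannot contain spaces. `terms_by_language` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Reduced, well-formed version of the entry, kept only for illustration.
ENTRY = """
<termEntry id="boundary-conditions">
  <descrip type="subjectField">Computational Materials Science</descrip>
  <langSet xml:lang="en-us">
    <tig><term>boundary conditions</term></tig>
  </langSet>
  <langSet xml:lang="fr-fr">
    <tig><term>conditions limites</term></tig>
  </langSet>
</termEntry>
"""

def terms_by_language(entry_xml):
    """Return {language: [terms]} for every language set of one concept entry."""
    root = ET.fromstring(entry_xml)
    return {
        lang_set.get(XML_LANG): [term.text for term in lang_set.iter("term")]
        for lang_set in root.findall("langSet")
    }

print(terms_by_language(ENTRY))
```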

ISO TC 37 Glossaries

Thomas Baker, in his discussion of the Dublin Core in multiple languages, laments the lack of “comprehensive dictionaries” for metadata labels and vocabularies. [2]

Many issues in multilingual, multicultural DL development revolve around cultural variation in concept description and concept systems (KOS) and establishing linguistic authority (access to authoritative terms, documentation of authority and availability of authoritative equivalents). What we really need to support DL metadata schemas is not a “dictionary,” but standards-based external internationalization strategies such as TMX translation memories and multilingual terminology glossaries as defined by ISO TC 37’s ISO 12620 and other standards.

ISO TC 37: Standardization of principles, methods and applications relating to terminology and other language resources.

ISO TC 37 Glossaries and Searching

A concept-based multilingual glossary can be implemented to support cross-language searching. A glossary can provide authority for keyword selection where multilingual equivalents are then included in “parallel” in the resource record. Alternatively, a glossary-based DL can make it unnecessary to include more than one local term in the resource record.

[Diagram: concept-mediated multilingual search. A query keyword in L1 passes through the glossary to keywords in L2, L3 and L4]
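Concept-mediated search can be sketched as a query expansion step: the query keyword is looked up in the glossary and replaced by all terms that designate the same concept. Everything named below (`GLOSSARY`, `RECORDS`, `expand_query`, `search`) is hypothetical, and each record deliberately carries only one local keyword.

```python
# Hypothetical glossary: concept id -> {language: term}
GLOSSARY = {
    "c1": {"en": "lecture", "de": "Vorlesung", "es": "Conferencia", "it": "Conferenza"},
}

# Hypothetical resource records, each carrying a single local keyword.
RECORDS = [
    {"id": "r1", "lang": "de", "keyword": "Vorlesung"},
    {"id": "r2", "lang": "it", "keyword": "Conferenza"},
    {"id": "r3", "lang": "en", "keyword": "exam"},
]

def expand_query(keyword):
    """Return every term that shares a concept with keyword (case-insensitive)."""
    expanded = {keyword.lower()}
    for terms in GLOSSARY.values():
        lowered = {t.lower() for t in terms.values()}
        if keyword.lower() in lowered:
            expanded |= lowered
    return expanded

def search(keyword):
    """Return ids of records whose keyword designates the queried concept."""
    wanted = expand_query(keyword)
    return [r["id"] for r in RECORDS if r["keyword"].lower() in wanted]

print(search("lecture"))  # ['r1', 'r2']
```

The English query retrieves the German and Italian records even though neither contains an English keyword, which is the case where the glossary makes parallel keywords in the record unnecessary.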

Data Element Names

A glossary can also be implemented to provide localized labels for data element names. Where there are “local” versions of a schema (a Dublin Core or IEEE-LOM not in English) that need to be equated for software exchange, or data elements that need to be explained (training, help files) or used in an interface (resource submission form), a glossary can provide authoritative multi-language labels for a canonical data element name and its attributes.

[Diagram: the glossary maps a token (canonical element name or identifier) to its labels (L1, L2 and L3 element names)]
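A label glossary of this kind can be sketched as a two-way mapping between a canonical element identifier and its local labels. The `ELEMENT_LABELS` table and both functions are hypothetical, and the non-English labels are illustrative rather than taken from any official translation of a schema.

```python
# Hypothetical label glossary: canonical element identifier -> {language: label}.
# The non-English labels are illustrative, not official translations.
ELEMENT_LABELS = {
    "title": {"en": "Title", "de": "Titel", "fr": "Titre"},
    "creator": {"en": "Creator", "de": "Urheber"},
}

def label_for(element, lang, fallback_lang="en"):
    """Return the local label for an element, falling back to fallback_lang."""
    labels = ELEMENT_LABELS[element]
    return labels.get(lang, labels[fallback_lang])

def canonical_for(label, lang):
    """Map a local label back to the canonical identifier, e.g. for exchange."""
    for element, labels in ELEMENT_LABELS.items():
        if labels.get(lang) == label:
            return element
    return None

print(label_for("title", "de"))       # label shown in a German submission form
print(canonical_for("Titre", "fr"))   # identifier used when exchanging records
```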

Conclusion & References

Adding multilingual and multicultural metadata to a DL involves:

1. Determining the metadata elements, attributes, value spaces and values that are culturally dependent and, if the display and interface are to be localized, those metadata elements that are to be rendered in multiple languages;
2. Providing external parallel strategies for localization;
3. The external parallel system is a more robust localization approach, providing control, administrative tools, authoritative terminology, and authority for translations and equivalents.
4. The external parallel system offers reusability, scalability and leverages the strengths of international standards.

[1] European Committee for Standardization. 2003. CEN Workshop Agreement 14643. Internationalisation of the IEEE Learning Object Metadata. ICS 03.180; 35.060; 35.240.99.

[2] Baker, Thomas. 1997. Metadata Semantics Shared Across Languages: Dublin Core in languages other than English. http://dublincore.org/documents/multilingual-semantics/

[3] European Schoolnet. Recommended data model format to be used as a standard by national systems to include national/local resources in the EU Treasury Browser. http://www.en.eun.org/etb/survey/d4.2.pdf