Linking Text References to Relevant Digital Resources Over The Web

Electronic Corpora for Ancient Languages, Prague, November 16th-17th 2007

Matteo Romanello, University “Ca' Foscari” of Venice




Matteo Romanello Electronic Corpora for Ancient Languages - Prague, November 16th -17th 2007 2/32

A Microformat for Canonical Text References

• Topic: how can secondary sources be linked to corpora of texts in ancient languages?

• Goal: to give scholars reading the primary and secondary sources of the Digital Library more powerful research tools and a richer reading experience

• Focus: references to canonical texts in XHTML

• Scope of the examples: Classical (Greek and Latin) literature

Digital Library on Classics: the State of the Art

• A few online secondary sources (journal articles and monographs) available as (X)HTML

• A few online authoritative and born-digital journals, e.g. Classics@, published by Harvard's Center for Hellenic Studies

• Some online text corpora (Perseus and other minor, scattered collections)

• Some resources for humanists, reviews of electronic resources, book reviews...

• Research blogs

Current e-scholarship scenarios (1)

Scenario 1

John is a scholar of Greek literature and wants to find all online articles or electronic resources related to the verse he is focusing on (Hom. Il. 20.249).

He submits a query like 'Hom. Il. 20.249' to Google, but what Google retrieves is neither pertinent nor interesting. Ordinary search engines are just text-based (no semantics, language-dependent, etc.).

John would need a more precise, specialized search engine, one capable of understanding the semantics of the reference he typed in as a query string.

[Figure: the author of the Iliad and the Odyssey goes by many names: Homer, Homère, Omero, ...]
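The figure's point is that one author has many names, so a plain text query only catches pages that use the same string. A minimal Python sketch of the alternative, assuming a small hand-made alias table (`AUTHOR_ALIASES`, `resolve_author` and `same_author` are hypothetical, for illustration only): every name resolves to one language-independent identifier. Here that identifier is tlg0012, the TLG number for Homer that also appears in the CTS URNs later in this presentation.

```python
from typing import Optional

# Hypothetical alias table: the names of an author in several languages (and
# the usual scholarly abbreviation) all map to one language-independent
# identifier, the TLG author number also used in CTS URNs.
AUTHOR_ALIASES = {
    "homer": "tlg0012",    # English
    "homère": "tlg0012",   # French
    "omero": "tlg0012",    # Italian
    "hom.": "tlg0012",     # scholarly abbreviation
}


def resolve_author(name: str) -> Optional[str]:
    """Return the identifier for a known author name, else None."""
    return AUTHOR_ALIASES.get(name.strip().lower())


def same_author(a: str, b: str) -> bool:
    """True when two names, in any language, denote the same known author."""
    id_a = resolve_author(a)
    return id_a is not None and id_a == resolve_author(b)


print(resolve_author("Homère"))       # tlg0012
print(same_author("Omero", "Homer"))  # True
```

A search engine that indexed identifiers rather than strings would return the Italian, French and English pages for any of these queries.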

Current e-scholarship scenarios (2)

Scenario 2

A colleague points out to John that Gregory Nagy, in a passage of the 2nd chapter of one of his books, mentions the verse John is interested in. John finds an online version of the book and opens it in his browser...

Current e-scholarship scenarios (3)

For a meaningful e-reading experience, John should be able to read the cited verse in its context, compare the text of that verse as recorded in different manuscripts, read the same passage in a given translation, or read a commentary on it.


New e-scholarship scenarios (1)

• Semantic understanding of text references by the web browser

• Search for resources pertinent to the author, the work or the precise text passage referred to


New e-scholarship scenarios (2)

• Value added services (VAS) for scholars:
  – reference linking
  – related resources
  – targeted and semantic-oriented search
  – different exemplars of a work

• Problems:
  1) to build a distributed library
  2) to provide VAS linking secondary to primary sources

From printed to digital libraries

• Find new constructive paradigms that take advantage of the properties of the network

• In a network environment:
  – a library that is universally distributed and has a higher granularity
  – reference linking

• Reference linking to primary sources (from references in secondary sources):
  – e.g. move from the citation Hom. Il. 1.1 to all available translations, compare critical editions and find related resources

The evolution of ancient languages corpora

• TLG (1970s) -> mass storage and rapid retrieval

• Perseus (1980s) -> richer media and higher-level data structures

• DLs + web protocols -> convergence of:
  – XML-related technologies:
    • TEI (encoding)
    • XML DBs (storage of structured data)
    • query capabilities over HTTP
  – web services communicating in REST style
  – success of a distributed architecture (cf. OAI-PMH)

Which protocol? The Canonical Text Services protocol

A new paradigm for building online corpora: the CTS protocol (1)

• CTS web protocol:
  – a new paradigm for building electronic corpora
  – gives hierarchical access to works encoded as XML-TEI files
  – relies on the model described by FRBR
  – developed by Neel Smith et al. at Harvard's CHS
  – built on the Registry Services Protocol (v. 1.0.rc1) -> authority lists

• Some CTS-related projects:
  – Perseus' CTS interface
  – Homer Multitext

A new paradigm for building online corpora: the CTS protocol (2)

• A CTS-compliant text server

• Texts: XML TEI

• Text groups and works are identified by URNs

• Collections are described by authority lists
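The hierarchy behind these slides (a text group, its works, and the editions and translations of each work, following FRBR) can be sketched as a small in-memory catalog standing in for an authority list. This is an illustrative sketch, not the CTS or Registry Services schema: the class names, version identifiers and labels are placeholders; only tlg0012 (Homer), tlg001 (Iliad) and the colon-separated URN layout come from this presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    """An edition or a translation of a work."""
    version_id: str  # hypothetical local identifier
    kind: str        # "edition" or "translation"
    label: str


@dataclass
class Work:
    work_id: str
    title: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class TextGroup:
    group_id: str
    name: str
    works: Dict[str, Work] = field(default_factory=dict)


def work_urn(namespace: str, group_id: str, work_id: str) -> str:
    """Work-level URN in the colon-separated style used in this presentation."""
    return f"urn:cts:{namespace}:{group_id}:{work_id}"


def versions_of(catalog: Dict[str, TextGroup], group_id: str, work_id: str,
                kind: Optional[str] = None) -> List[Version]:
    """All versions of a work, optionally filtered by kind."""
    work = catalog[group_id].works[work_id]
    return [v for v in work.versions if kind is None or v.kind == kind]


# Toy catalog: version ids and labels are placeholders, not real editions.
homer = TextGroup("tlg0012", "Homer")
homer.works["tlg001"] = Work("tlg001", "Iliad", [
    Version("ed1", "edition", "critical edition A"),
    Version("ed2", "edition", "critical edition B"),
    Version("tr1", "translation", "English translation"),
])
catalog = {homer.group_id: homer}

print(work_urn("greekLit", "tlg0012", "tlg001"))
# urn:cts:greekLit:tlg0012:tlg001
print([v.version_id for v in versions_of(catalog, "tlg0012", "tlg001", "translation")])
# ['tr1']
```

Walking from a work to all of its versions is exactly the step needed to move from "Hom. Il. 1.1" to every available translation or critical edition.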


Reference Linking in the Digital Library

Linking primary to secondary sources online: state of the art

• Two very loosely coupled systems

• No born-digital equivalent of printed references

• Most projects use an internal linking system:
  – a considerable degree of hypertextuality
  – fairly closed systems of hard-linked resources

• Digital references == strings:
  – no semantic information
  – no semantically aware information processing
  – disambiguation of abbreviations and implicit statements is left to the reader

A digital companion to printed canonical text references

• Problem: provide a digital companion to printed references:
  – to express references in a simple and semantic way
    • exploiting the opportunities given by the digital medium
    • separating semantics from presentation

• Solution:
  – mapping references to requests compliant with the protocol used to build a distributed library (CTS)
  – embedding chunks of semantic information within XHTML documents

• Implementation: Microformats (from Web 2.0)

• Goal: to design a Microformat for canonical text references
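The first half of the solution, mapping a reference to a protocol request, can be sketched as wrapping the reference's URN in an HTTP GET URL. This is a minimal sketch, not the CTS 1.1 interface: the endpoint `cts.example.org` is a made-up placeholder, and the `request=GetPassage` and `urn` parameter names are an assumption modeled on later published CTS specifications.

```python
from urllib.parse import urlencode

# Placeholder endpoint: not a real CTS server.
CTS_ENDPOINT = "http://cts.example.org/texts"


def passage_request(urn: str, endpoint: str = CTS_ENDPOINT) -> str:
    """Map a canonical reference, given as a URN, to a GetPassage-style URL.

    The query parameter names are assumed, modeled on later CTS versions.
    """
    if not urn.startswith("urn:cts:"):
        raise ValueError(f"not a CTS URN: {urn!r}")
    # urlencode percent-encodes the colons of the URN (':' -> '%3A').
    return f"{endpoint}?{urlencode({'request': 'GetPassage', 'urn': urn})}"


print(passage_request("urn:cts:greekLit:tlg0012:tlg001:20.249"))
# http://cts.example.org/texts?request=GetPassage&urn=urn%3Acts%3AgreekLit%3Atlg0012%3Atlg001%3A20.249
```

Because the request is derived from the URN alone, any CTS-compliant server can answer it, which is what makes the library distributed rather than hard-linked.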

Microformats or RDF?

• MFs = a bottom-up way to the Semantic Web (real-world semantics, or the lower-case semantic web)

• Used within blogs for friendships, geographical data, reviews...

• Firefox 3 -> native support for Microformats (display of microformatted content integrated in the UI)

• Not the only way to embed metadata inside common tag elements:
  – RDFa <http://www.w3.org/TR/xhtml-rdfa-primer/>, proposed by the W3C

[Figure: Microformats vs RDF]


Microformats: definition

• Microformats are not:
  – a new language
  – an attempt to change everyone's current behavior

• Goals:
  – make data reusable and interoperable among web services and mashup applications

• Microformats are:
  – XHTML (POSH) compounds
  – a set of design principles for formats
  – a set of simple open data formats built upon existing and widely adopted standards

Text references: different use cases

1. Politics

2. like Aristotle claims

3. Politics of Aristotle

4. Arist. Pol. 1304b

5. Line 1 of the first book of Homer's Iliad

6. Hom. Il. I 1

7. Α 1 (== upper-case Alpha 1, Hellenistic book notation)
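Cases 5 to 7 all denote the same verse, yet share almost no characters. A minimal sketch of how the Homeric notations could be normalized to one URN, assuming the usual convention that upper-case Greek letters number the books of the Iliad and lower-case letters those of the Odyssey; the `WORKS` abbreviation table and the `normalize` helper are hypothetical, and cases 1 to 4 (prose titles, the Aristotle reference) would need an authority list and are left out.

```python
# Greek letters as book numbers: upper case for the Iliad, lower case for the
# Odyssey. Built from code points to avoid Latin look-alikes; U+03A2 is
# unassigned and U+03C2 is final sigma, so both are skipped.
GREEK_UPPER = "".join(chr(c) for c in range(0x391, 0x3AA) if c != 0x3A2)
GREEK_LOWER = "".join(chr(c) for c in range(0x3B1, 0x3CA) if c != 0x3C2)
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}
# Hypothetical abbreviation table -> (text group, work) identifiers.
WORKS = {"Il.": ("tlg0012", "tlg001"), "Od.": ("tlg0012", "tlg002")}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'XX' or 'XIX' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # Subtractive notation: a smaller numeral before a larger one.
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total


def normalize(citation: str, namespace: str = "greekLit") -> str:
    """Turn 'Hom. Il. I 1', 'Hom. Il. 20.249' or a Greek-letter book
    reference into a URN in the colon-separated style of this presentation."""
    tokens = citation.split()
    if (len(tokens) == 2 and len(tokens[0]) == 1
            and tokens[0] in GREEK_UPPER + GREEK_LOWER):
        letter, line = tokens
        if letter in GREEK_UPPER:
            work, book = "Il.", GREEK_UPPER.index(letter) + 1
        else:
            work, book = "Od.", GREEK_LOWER.index(letter) + 1
        passage = f"{book}.{line}"
    elif len(tokens) in (3, 4) and tokens[0] == "Hom.":
        work = tokens[1]
        if len(tokens) == 3:            # 'Hom. Il. 20.249'
            passage = tokens[2]
        else:                           # 'Hom. Il. I 1'
            book_token = tokens[2]
            book = int(book_token) if book_token.isdigit() else roman_to_int(book_token)
            passage = f"{book}.{tokens[3]}"
    else:
        raise ValueError(f"unsupported citation: {citation!r}")
    group, work_id = WORKS[work]
    return f"urn:cts:{namespace}:{group}:{work_id}:{passage}"


print(normalize("Hom. Il. I 1"))   # urn:cts:greekLit:tlg0012:tlg001:1.1
print(normalize("\u0391 1"))       # upper-case Alpha 1, same URN as above
```

Once the notations collapse to one identifier, the reader no longer has to resolve the abbreviation or the book numbering.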

Designing an MF for Canonical Text References (1)

• Start from a specific problem (principle #1):
  – problem: link secondary to primary sources on the web

• Reuse building blocks from widely adopted standards (principle #4):
  – the canonical texts citation scheme widely used among scholars of Classical literature
  – the Canon of Greek Literature, provided as an authority list compliant with the Registry Services Protocol

• “Paving the Cowpaths”:
  – keep references looking the same as they do now
  – in addition, add semantics to references
  – also allow internal linking systems

Designing an MF for Canonical Text References (2)

• Modularity and embeddability (principle #5):
  1. MF for authors
  2. MF for works
  3. MF for text references
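Modularity means the author and work MFs can also be nested inside a text reference. A minimal sketch of that nesting, assuming the microformats "abbr design pattern" (the machine-readable value sits in the `title` attribute while the visible text is unchanged); the class names `ctsref`, `ctsauthor` and `ctswork` and the helper functions are hypothetical, not the actual CTSreference markup.

```python
from html import escape


def module(cls: str, text: str, urn: str) -> str:
    """One embeddable MF: visible text plus a machine-readable URN.

    The class names passed in are hypothetical.
    """
    return f'<abbr class="{cls}" title="{escape(urn)}">{escape(text)}</abbr>'


def cts_reference(author: str, work: str, passage: str, urn: str) -> str:
    """Nest author and work modules inside a text-reference module."""
    parts = urn.split(":")             # urn:cts:namespace:group:work:passage
    author_urn = ":".join(parts[:4])   # text-group level
    work_urn = ":".join(parts[:5])     # work level
    return (
        f'<span class="ctsref" title="{escape(urn)}">'
        f'{module("ctsauthor", author, author_urn)} '
        f'{module("ctswork", work, work_urn)} '
        f'{escape(passage)}</span>'
    )


markup = cts_reference("Hom.", "Il.", "20.131-137",
                       "urn:cts:greekLit:tlg0012:tlg001:20.131-20.137")
print(markup)
```

The reader still sees "Hom. Il. 20.131-137", while a browser extension can read the URNs at the reference, work and author levels.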

Designing an MF for Canonical Text References (3)

[Figure: a reference as it appears to the reader, and the microformatted content underlying it, e.g. urn:cts:greekLit:tlg0012:tlg001:20.131-20.137]
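The URN in the figure packs namespace, text group, work and passage range into one string. A minimal parsing sketch based only on the colon-separated layout shown there (later CTS versions join text group and work with dots, which this sketch does not handle); `CtsUrn` and `parse_cts_urn` are illustrative names, not part of any CTS library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    namespace: str
    textgroup: str
    work: str
    start: Optional[str] = None   # first cited passage, if any
    end: Optional[str] = None     # last cited passage, if any


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:namespace:textgroup:work[:passage[-passage]]."""
    parts = urn.split(":")
    if parts[:2] != ["urn", "cts"] or len(parts) not in (5, 6):
        raise ValueError(f"unexpected URN layout: {urn!r}")
    namespace, textgroup, work = parts[2:5]
    start = end = None
    if len(parts) == 6:
        start, _, end = parts[5].partition("-")
        end = end or start        # a single passage is a one-line range
    return CtsUrn(namespace, textgroup, work, start, end)


print(parse_cts_urn("urn:cts:greekLit:tlg0012:tlg001:20.131-20.137"))
```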

The Microformat in action

• Get some valid microformatted references

• Tag resources from a popular review with URNs instead of plain tags

• Make the browser aware of microformatted content by adding support for the CTSreference MF to the Operator extension for Firefox

• Add example actions to perform on each MF:
  – find pertinent bookmarks on del.icio.us
  – search for pertinent research articles on CiteULike
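Making a client aware of microformatted content starts with finding the marked-up references in a page. Operator does this inside Firefox in JavaScript; the following is only a rough Python analogue using the standard `html.parser` module, assuming references carry a hypothetical `ctsref` class with the URN in the `title` attribute. The sample page is made up.

```python
from html.parser import HTMLParser
from typing import List


class CtsRefCollector(HTMLParser):
    """Collect the URNs of elements whose class list contains 'ctsref'.

    'ctsref' is a hypothetical class name standing in for the
    CTSreference microformat.
    """

    def __init__(self) -> None:
        super().__init__()
        self.urns: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values may be None (e.g. a bare attribute), so guard.
        classes = (attributes.get("class") or "").split()
        if "ctsref" in classes and attributes.get("title"):
            self.urns.append(attributes["title"])


page = """<p>See
<span class="ctsref" title="urn:cts:greekLit:tlg0012:tlg001:20.249">Hom. Il. 20.249</span>
and <abbr class="ctsref" title="urn:cts:greekLit:tlg0012:tlg001:1.1">Il. I 1</abbr>.</p>"""

collector = CtsRefCollector()
collector.feed(page)
print(collector.urns)
```

Each collected URN can then drive an action, such as the bookmark and article searches listed above, with the URN used as the search tag.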

The Microformat in action

[Screenshot: some microformatted references, the recognized microformats and the available actions; green icons mean that Operator is working.]


Benefits for scholarship on Ancient Languages

• Citations encoded with an MF express references in a form that is:
  – cross-language
  – fully semantic and interoperable
  – reusable

• The resulting reference linking system is:
  – open (client-side based)
  – independent of specific solutions

• Microformatted references allow:
  – targeted search -> more precise information retrieval tools (e.g. Pingerati, the microformats search engine provided by developers at Technorati)
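Targeted search works because a reference is an identifier, not a string. A minimal sketch of John's query from Scenario 1 against a toy index: the document names and the references they contain are made up, passages are assumed to be written as book.line at the same depth on both sides, and `covers` and `search` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def parse_passage(passage: str) -> Tuple[int, ...]:
    """'20.249' -> (20, 249), so passages compare book first, then line."""
    return tuple(int(x) for x in passage.split("."))


def covers(ref_urn: str, query_urn: str) -> bool:
    """True if a cited reference matches a work- or passage-level query."""
    ref, query = ref_urn.split(":"), query_urn.split(":")
    if ref[:5] != query[:5]:
        return False                 # different namespace, text group or work
    if len(query) == 5:
        return True                  # work-level query: any passage matches
    if len(ref) == 5:
        return False                 # whole-work reference, no passage
    start, _, end = ref[5].partition("-")
    target = parse_passage(query[5])
    return parse_passage(start) <= target <= parse_passage(end or start)


# Toy index: hypothetical documents -> URNs found in their microformats.
INDEX: Dict[str, List[str]] = {
    "article-a": ["urn:cts:greekLit:tlg0012:tlg001:20.131-20.137"],
    "article-b": ["urn:cts:greekLit:tlg0012:tlg001:20.200-20.258"],
    "article-c": ["urn:cts:greekLit:tlg0012:tlg002:1.1"],
}


def search(query_urn: str) -> List[str]:
    """Documents citing the queried work or a range containing the passage."""
    return sorted(doc for doc, refs in INDEX.items()
                  if any(covers(r, query_urn) for r in refs))


print(search("urn:cts:greekLit:tlg0012:tlg001:20.249"))  # ['article-b']
print(search("urn:cts:greekLit:tlg0012:tlg001"))         # ['article-a', 'article-b']
```

Unlike the plain text query of Scenario 1, the passage query also finds a document that only cites a range containing 20.249.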

TODOs

• Discussion on the Microformats mailing lists and wiki

• Advocacy and support by real projects

• Support from a digital library built upon the CTS protocol

• URNs as semantic tags and keywords in metadata descriptions

• Tools for easy authoring

• Web services taking advantage of such an MF:
  – an application that manages references and exports them in several output formats to desktop applications
  – a harvester of CTS repositories

References

• John Allsopp, Microformats: Empowering Your Markup for Web 2.0, Berkeley, CA: friends of ED; New York: distributed to the book trade by Springer-Verlag, 2007.

• Neel Smith, “TextServer: Toward a Protocol for Describing Libraries”, Classics@ vol. 2, edition of April 3, 2004.

• G. Crane et al., “Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries”, Proceedings of the 10th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2006), vol. 4172.

• The Canonical Text Services (CTS) Protocol, current version: 1.1 <http://katoptron.holycross.edu/cocoon/diginc/specs/cts>

• The Registry Services Protocol, current version: 1.0.rc1 <http://katoptron.holycross.edu/cocoon/diginc/specs/registry>