Florentina Armaselu – DHLab, Centre virtuel de la connaissance sur l’Europe (CVCE), Luxembourg
www.cvce.eu
From a Small-Scale Digital Edition to a TEI Publication Framework in Modern European History
Text Encoding Initiative (TEI) Conference and Members’ Meeting: Connect, Animate, Innovate. 28 to 31 October 2015, Université Lumière Lyon 2
Summary
1. The WEU-DIPLO pilot project
2. Transviewer, towards a TEI publication framework
3. Discussion
4. References
Part I. The WEU-DIPLO pilot project
Overview of the WEU-DIPLO project
1. Goal: XML-TEI encoding, corpus analysis and Web publication of institutional documents of the Western European Union (WEU):
 • topics: armament production, standardization and control, in the period from 1954 to 1982;
 • source: Archives nationales de Luxembourg, WEU collection.
2. Initial format: digitized versions (JPG) of typewritten materials, one file per page.
3. Size:

Category       | Docs | Docs EN | Docs FR | Docs FR proc.* | Pages | Pages EN | Pages FR | Pages FR proc.*
Note           |   89 |      43 |      46 |             37 |   395 |      191 |      204 |             155
Minutes        |   30 |      15 |      15 |             15 |   256 |      138 |      118 |             118
Memorandum     |    3 |       1 |       2 |              2 |    16 |        7 |        9 |              9
Study          |    2 |       0 |       2 |              1 |    12 |        0 |       12 |              8
Discourse      |    1 |       0 |       1 |              0 |     4 |        0 |        4 |              0
Draft protocol |    2 |       1 |       1 |              0 |     4 |        2 |        2 |              0
Total          |  127 |      60 |      67 |             55 |   687 |      338 |      349 |             290

*proc. = processed
Overview of the WEU-DIPLO project: workflow
Overview of the WEU-DIPLO project: page structure. ©WEU-UEO
[Figure labels: Header, Content, Footer]
Microsoft Word styling – WEU-DIPLO
[Figure labels: headers and footers; headings, line breaks and paragraphs]
Conversion and enrichment (XSLT, manual, NER)
• OxGarage (DOCX to TEI P5)
• oXygen XML Editor:
  o XSLT transformation (metadata, structure);
  o manual enrichment (semantics: discourse of country/institutional representatives).
• GATE (Named Entity Recognition):
  o training phase (Gazetteer List Collector);
  o annotation phase (names of persons, organisations, places, functions, events, products; dates).
• oXygen XML Editor:
  o XSLT (GATE XML to TEI P5 transformation).
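In the project, the last step above was an XSLT transformation of GATE's XML output into TEI P5. The Python sketch below only illustrates the idea: it assumes the GATE annotations have already been reduced to a hypothetical list of non-overlapping `(start, end, type)` character offsets, and the mapping from annotation types to TEI elements (`ENTITY_TO_TEI`) is our own assumption rather than the WEU-DIPLO encoding.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping: NER annotation type -> (TEI element, optional @type value).
ENTITY_TO_TEI = {
    "Person": ("persName", None),
    "Organization": ("orgName", None),
    "Location": ("placeName", None),
    "Function": ("roleName", None),
    "Event": ("name", "event"),
    "Product": ("name", "product"),
    "Date": ("date", None),
}


def annotate(text, spans):
    """Wrap non-overlapping (start, end, type) character spans in TEI elements.

    Text outside the spans is XML-escaped and copied unchanged.
    """
    out = []
    pos = 0
    for start, end, etype in sorted(spans):
        if start < pos:
            raise ValueError(f"overlapping span at offset {start}")
        tag, subtype = ENTITY_TO_TEI[etype]
        attr = f' type="{subtype}"' if subtype else ""
        out.append(escape(text[pos:start]))          # plain text before the entity
        out.append(f"<{tag}{attr}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))                   # trailing plain text
    return "".join(out)


if __name__ == "__main__":
    # An invented sentence, with offsets computed by hand.
    sentence = "The Agency for the Control of Armaments met in Paris on 12 May 1960."
    spans = [(4, 39, "Organization"), (47, 52, "Location"), (56, 67, "Date")]
    print(annotate(sentence, spans))
    # prints: The <orgName>Agency for the Control of Armaments</orgName> met in
    #         <placeName>Paris</placeName> on <date>12 May 1960</date>.
```

Rejecting overlapping spans keeps the output well-formed; nested entities (a place inside an organisation name, for instance) would need a tree-building approach instead of this flat one.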
XML-TEI encoding: WEU-DIPLO – metadata; layout (header). ©WEU-UEO
[Figure labels – Word styles: @@hAuthor, @@hArchNum, @@hStampConfid, @@hDocRef, @@hOrigDate, @@hOrigLang, @@hVersion]
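A minimal sketch of how such header styles could be mapped to TEI metadata, written in Python with the standard `xml.etree.ElementTree` module instead of the XSLT used in the project. It assumes the styled values have already been extracted into a dictionary keyed by style name; the style-to-element mapping (`STYLE_TO_TEI`) and the choice of a `<bibl>` inside `<sourceDesc>` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # global setting: serialize TEI as the default namespace

# Hypothetical mapping: Word header style -> (TEI element, attributes) inside <bibl>.
STYLE_TO_TEI = {
    "@@hAuthor": ("author", {}),
    "@@hArchNum": ("idno", {"type": "archive"}),
    "@@hDocRef": ("idno", {"type": "docRef"}),
    "@@hStampConfid": ("note", {"type": "classification"}),
    "@@hOrigDate": ("date", {}),
    "@@hOrigLang": ("textLang", {}),
    "@@hVersion": ("edition", {}),
}


def tei(tag):
    """Qualify a local name with the TEI namespace (Clark notation)."""
    return f"{{{TEI_NS}}}{tag}"


def build_source_desc(styled_values):
    """Build <sourceDesc><bibl>...</bibl></sourceDesc> from {style_name: text}.

    Elements follow the order of STYLE_TO_TEI; styles not in the mapping are ignored.
    """
    source_desc = ET.Element(tei("sourceDesc"))
    bibl = ET.SubElement(source_desc, tei("bibl"))
    for style, (tag, attrs) in STYLE_TO_TEI.items():
        value = styled_values.get(style)
        if value:
            ET.SubElement(bibl, tei(tag), attrs).text = value
    return source_desc


if __name__ == "__main__":
    # Placeholder values, not taken from the WEU collection.
    header = build_source_desc({"@@hAuthor": "AUTHOR", "@@hDocRef": "REF-0001", "@@hOrigLang": "fr"})
    print(ET.tostring(header, encoding="unicode"))
```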
XML-TEI encoding: WEU-DIPLO – structure (headings, paragraphs, line breaks); semantics (named entities, discourse). ©WEU-UEO
[Figure labels – Word styles: @@Heading2, @@Paragraph, @@LineBreak, @@Names, @@Discourse]
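The structural styles above map onto TEI body elements. This hypothetical Python sketch assumes the document arrives as a list of `(style, text)` pairs in which typewritten line breaks are kept as `\n`; each `@@Heading2` opens a new `<div>`, `@@Paragraph` becomes `<p>`, and each line break becomes `<lb/>`. The TEI namespace is omitted for brevity, and the semantic styles (`@@Names`, `@@Discourse`) are simply skipped here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from structural Word styles to TEI elements.
STYLE_TO_TEI = {"@@Heading2": "head", "@@Paragraph": "p"}


def add_lines(parent, text):
    """Put text into parent, turning each '\\n' into an empty <lb/> element."""
    first, *rest = text.split("\n")
    parent.text = first
    for line in rest:
        lb = ET.SubElement(parent, "lb")
        lb.tail = line  # text after the break lives in the <lb/> tail


def build_body(paragraphs):
    """Turn [(style, text), ...] into <body>; each heading opens a new <div>."""
    body = ET.Element("body")
    div = None
    for style, text in paragraphs:
        tag = STYLE_TO_TEI.get(style)
        if tag is None:
            continue  # other styles (e.g. @@Names, @@Discourse) are skipped in this sketch
        if tag == "head" or div is None:
            div = ET.SubElement(body, "div")
        add_lines(ET.SubElement(div, tag), text)
    return body


if __name__ == "__main__":
    doc = [("@@Heading2", "I. Introduction"), ("@@Paragraph", "line one\nline two")]
    print(ET.tostring(build_body(doc), encoding="unicode"))
```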
XML-TEI Encoding: WEU-DIPLO – transcription features (Pierazzo, 2011)
Part II. Transviewer, towards a TEI publication framework
Context: the CVCE’s ePublications
• Treaties; official declarations and meeting reports; letters; notes; press articles; image, video and audio archives related to the history of European integration.
Transviewer overview
1. Transviewer concept:
 • XML-TEI transformation/visualisation on the fly, in the browser;
 • a flexible framework for the publication of XML-TEI documents in European integration history.
2. Technologies: XML, HTML, XSLT, CSS and JavaScript.
3. Tested platforms:
 • EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
 • KILN: http://kiln.readthedocs.org/en/latest/#
 • TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
 • Versioning Machine: http://v-machine.org/
 • XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/
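Transviewer performs the TEI-to-HTML transformation on the fly with XSLT 2.0 stylesheets run in the browser by Saxon-CE. As an illustration of the same template-rule idea (not Transviewer's code, and in Python rather than XSLT), the sketch below maps each TEI element to an HTML element by its local name, recursing through children and preserving tail text; the rule table `RULES` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical TEI -> HTML rendering rules: local name -> (HTML tag, CSS class or None).
RULES = {
    "head": ("h2", None),
    "p": ("p", None),
    "lb": ("br", None),
    "persName": ("span", "persName"),
    "placeName": ("span", "placeName"),
    "orgName": ("span", "orgName"),
}


def render(tei_el):
    """Recursively map a TEI element to HTML; unknown elements become <div class="name">."""
    local = tei_el.tag.replace(TEI_NS, "")
    html_tag, css_class = RULES.get(local, ("div", local))
    out = ET.Element(html_tag)
    if css_class:
        out.set("class", css_class)
    out.text = tei_el.text
    for child in tei_el:
        rendered = render(child)
        rendered.tail = child.tail  # keep the text that follows the child element
        out.append(rendered)
    return out


if __name__ == "__main__":
    src = '<p xmlns="http://www.tei-c.org/ns/1.0">Met in <placeName>Paris</placeName><lb/>today</p>'
    print(ET.tostring(render(ET.fromstring(src)), encoding="unicode", method="html"))
```

In XSLT, each entry of `RULES` would roughly correspond to a template such as `<xsl:template match="tei:placeName">`; keeping the rules in one table is one way to make the rendering easy to customise per project.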
Transviewer prototype
Implementation (adaptation and in-house development):
 • side-by-side view of digital facsimile and transcription (EVT model);
 • third-party libraries:
   o BookReader: a tool designed to provide online access to scanned books;
   o Saxon-CE: support for XSLT 2.0 transformations in the browser;
 • in-house development (configuration, frame and button layout/actions, transcription rendering, calls to third-party libraries).
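An EVT-style side-by-side view needs, for each page image, the transcription of that page. Assuming page boundaries are encoded with TEI `<pb facs="..."/>` elements (a common TEI convention; the slides do not show how WEU-DIPLO encodes them), a minimal Python sketch of the pairing step could look like this.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def split_pages(root):
    """Return [(facs, text), ...]: one entry per <pb/>, with the text up to the next <pb/>.

    Text before the first <pb/> is ignored; whitespace is normalised.
    """
    pages = []

    def visit(el):
        if el.tag == TEI + "pb":
            pages.append([el.get("facs"), ""])  # start a new page
        elif el.text and pages:
            pages[-1][1] += el.text
        for child in el:
            visit(child)
            if child.tail and pages:
                pages[-1][1] += child.tail  # tail text belongs to the current page

    visit(root)
    return [(facs, " ".join(text.split())) for facs, text in pages]


if __name__ == "__main__":
    doc = ('<text xmlns="http://www.tei-c.org/ns/1.0"><body>'
           '<pb facs="p1.jpg"/><p>First page <lb/>text.</p>'
           '<pb facs="p2.jpg"/><p>Second page.</p></body></text>')
    for facs, text in split_pages(ET.fromstring(doc)):
        print(facs, "|", text)
```

Each `(facs, text)` pair could then feed one facsimile page in the image viewer and one transcription panel.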
Transviewer experiments – digital facsimile/transcription side-by-side view. ©WEU-UEO
Transviewer experiments – digital facsimile/transcription side-by-side view. Werner – handwritten notes
Transviewer experiments (simulation) – video/audio and transcription synchronisation. Werner – interviews
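For audio/video synchronisation, the player has to know which transcription segment to highlight at the current playback time. Assuming the timings have been extracted from the TEI into a sorted list of non-overlapping `(start, end, text)` segments in seconds (a hypothetical structure), a binary search over the start times finds the active segment.

```python
from bisect import bisect_right


def active_segment(segments, t):
    """Return the index of the segment playing at time t (seconds), or None.

    segments: list of (start, end, text), sorted by start and non-overlapping.
    """
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i][1]:
        return i
    return None  # t falls before the first segment or in a gap


if __name__ == "__main__":
    # Invented timings for illustration only.
    segments = [(0.0, 4.5, "Question"), (4.5, 12.0, "Answer"), (15.0, 20.0, "Follow-up")]
    for t in (2.0, 4.5, 13.0):
        print(t, active_segment(segments, t))
```

In a player's time-update handler one would build `starts` once rather than on every call; it is rebuilt here only to keep the sketch short.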
Transviewer features – panel layouts
Transviewer features – transcription format
Transviewer features – panel interlinking
Part III. Discussion
“By teaching an edition how to swim, I mean endowing an edition not only with a store of factual knowledge concerning the work presented, but also with the capability of dealing gracefully with the mutability of the electronic medium, by exploiting the possibilities for reader-controlled changes to the edition’s presentation and by adapting successfully to rapid changes in the hardware and software environment.” (Sperberg-McQueen, 2009)
1. Transviewer prototype questions:
 • flexibility to support different types of documents in European integration history and different user requirements;
 • a modular architecture allowing gradual development and customisation according to the needs of each project;
 • the balance between manual interventions and automatic processing (XSLT, NER);
 • XML transformation on the fly (no intermediary formats or steps are needed, so changes to the XML are immediately part of the publication).
3. Issues:
 • BookReader relies on an older version of the jQuery library;
 • Saxon-CE support for XSLT 2.0 transformations is not uniform across browsers;
 • batch conversion to XML-TEI is needed (potentially by adapting OxGarage for batch processing).
4. Ongoing/future work:
 • evaluation (technology: technical experts; usability tests: experts in European integration studies);
 • development of new modules (multiple panels, audio/video transcription, etc.) and tests with more project samples;
 • integration into the existing architecture of the CVCE’s website:
   o back end;
   o front end.
Scaling up within a publication framework would mean teaching your editions not only “how to swim” but also how to swim together.
Thank you!
References
• BookReader: https://openlibrary.org/dev/docs/bookreader
• EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
• GATE: https://gate.ac.uk/
• KILN: http://kiln.readthedocs.org/en/latest/#
• OxGarage: http://www.tei-c.org/oxgarage/
• Pierazzo, Elena (2011). “A rationale of digital documentary editions”. LLC: The Journal of Digital Scholarship in the Humanities, Vol. 26, No. 4, December 2011, pp. 463-477.
• http://www.scholarlyediting.org/2014/essays/essay.pierazzo.html
• Saxon-CE: http://www.saxonica.com/ce/user-doc/1.1/index.html
• Sperberg-McQueen, C. M. (2009). “How to teach your edition how to swim”. LLC: The Journal of Digital Scholarship in the Humanities, Vol. 24, No. 1, April 2009. Oxford Journals.
• TEI (Text Encoding Initiative): http://www.tei-c.org
• TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
• Versioning Machine: http://v-machine.org/
• XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/