CHLT Integration

Page 1: CHLT Integration

CHLT Integration

Page 2: CHLT Integration

Integration in two directions

  • Interoperability with indexing structures of Perseus Digital Library
  • Integration of parsers into indexing module of search and visualization tool

Page 3: CHLT Integration

Integration with Structure of Perseus Digital Library

Perseus text display system transforms XML and legacy SGML files tagged according to an arbitrary DTD and creates a consistent set of core data files that can be read by any application:

  • Sentences
  • Chunks
  • Lemmatized
  • Inflected
  • Catalog of works (PTEXT DB)
  • Morphological Databases
  • Short Definitions

Page 4: CHLT Integration

File Locations

  • The surrogate files are written to a location that is associated with the unique ID assigned to the document in the PDL.
  • Each chunk or sentence also has a unique identifier.
  • These two pieces of information can be used:
      • To generate URLs to access full text in the DL
      • To generate human-readable citations of the sentences according to scholarly conventions
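The two uses above can be sketched in a few lines of Python. This is only an illustration: the base URL, the ID formats, the directory layout, and the function names are all hypothetical and are not taken from the PDL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode

# Hypothetical values, for illustration only (not the real PDL layout).
BASE_URL = "https://example.org/text"
SURROGATE_ROOT = PurePosixPath("/data/surrogates")


def surrogate_path(doc_id: str, kind: str) -> PurePosixPath:
    """Derive the location of a surrogate file from the document's unique ID."""
    return SURROGATE_ROOT / doc_id / f"{kind}.txt"


def full_text_url(doc_id: str, chunk_id: str) -> str:
    """Combine document ID and chunk ID into a link to the full text."""
    return f"{BASE_URL}?{urlencode({'doc': doc_id, 'chunk': chunk_id})}"


def citation(author: str, work: str, chunk_id: str) -> str:
    """Format a human-readable citation for one chunk or sentence."""
    return f"{author}, {work} {chunk_id}"


path = surrogate_path("doc-0001", "sentences")
url = full_text_url("doc-0001", "1.1")
cite = citation("Caesar", "Gallic War", "1.1")
```

The point of the sketch is that both the link and the citation are derived from the same pair of identifiers, so any application that reads the surrogate files can produce them without consulting the original SGML/XML.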

Page 5: CHLT Integration

WP2 Integration: Word Profile Tool

  • Word Profile tool reads lemmatized files to acquire a complete list of words in IGL corpus
  • All frequency counts, display sentences, human-readable citations, and links to full text are based on surrogate files generated by PDL.
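As a rough sketch of how frequency counts and display sentences might be derived from a lemmatized file: the tab-separated format below (sentence ID, inflected form, lemma) and the sample tokens are assumptions made for illustration; the actual surrogate file format is not described in these slides.

```python
from collections import Counter

# Assumed format, one token per line: "sentence_id<TAB>form<TAB>lemma".
# The sample data is invented for this example.
SAMPLE = "s1\tarma\tarma\ns1\tvirumque\tvir\ns1\tcano\tcano\ns2\tviri\tvir\n"


def lemma_profile(lines):
    """Return (lemma frequency counts, sentence IDs in which each lemma occurs)."""
    counts = Counter()
    sentences = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sentence_id, _form, lemma = line.split("\t")
        counts[lemma] += 1
        seen = sentences.setdefault(lemma, [])
        if sentence_id not in seen:
            seen.append(sentence_id)
    return counts, sentences


counts, sentences = lemma_profile(SAMPLE.splitlines())
```

Here `counts["vir"]` gives the frequency of the lemma, and `sentences["vir"]` gives the sentence identifiers that could be turned into display sentences, citations, and links.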

Page 6: CHLT Integration

WP2 Integration: Multi-Lingual IR Tool

  • Author and language selection routines in the MLIR tool are dynamically generated from the PDL metadata catalog
  • Database of translation equivalents is created directly from SGML/XML and saved as a core data file that is available to other applications in the system
  • Translation Equivalence Program works with any TEI-conformant dictionary. Dictionary selection screen updates dynamically.
  • Translated query is handed off to the current PDL search engine and the visualization tool based on documented APIs
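The following sketch shows one way a table of translation equivalents could be built from a dictionary and used to translate a query. The element names (`entryFree`, `orth`, `tr`), the `key` attribute, and the sample entries are assumptions about how such a dictionary might be tagged; this is not the Translation Equivalence Program itself.

```python
import xml.etree.ElementTree as ET

# Invented sample in an assumed TEI-style dictionary markup.
SAMPLE_TEI = """
<div>
  <entryFree key="hestr"><orth>hestr</orth> <tr>horse</tr>, <tr>stallion</tr></entryFree>
  <entryFree key="konungr"><orth>konungr</orth> <tr>king</tr></entryFree>
</div>
"""


def translation_equivalents(xml_text):
    """Map each English translation to the headwords it translates."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for entry in root.iter("entryFree"):
        headword = entry.get("key") or entry.findtext("orth")
        for tr in entry.iter("tr"):
            if tr.text:
                table.setdefault(tr.text.strip().lower(), []).append(headword)
    return table


def translate_query(query, table):
    """Replace each query word with its source-language equivalents."""
    terms = []
    for word in query.lower().split():
        terms.extend(table.get(word, []))
    return terms


table = translation_equivalents(SAMPLE_TEI)
terms = translate_query("king horse", table)
```

The resulting list of source-language terms is what would then be handed to a search engine; how that handoff works in the PDL is not shown here.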

Page 7: CHLT Integration

WP4 Integration: Old Norse Text and Parser

  • Middleware translates Old Norse Parser output to format used by PDL
  • ISO language tags in texts tell system to use Old Norse morphology and link to Old Norse lexicon
  • PDL short definition program automatically extracts information from Zoega
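A minimal sketch of the middleware idea, assuming a pipe-delimited parser output line and a simple record type; both are hypothetical, as are the function names and the lexicon label. The only detail taken from a standard is `non`, the ISO 639-2 code for Old Norse.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Common record format (hypothetical) that downstream tools would read."""
    form: str
    lemma: str
    features: str


def from_old_norse_parser(line: str) -> Analysis:
    """Convert one parser output line, assumed to be 'form|lemma|features'."""
    form, lemma, features = line.strip().split("|")
    return Analysis(form, lemma, features)


# Language tag -> converter and lexicon to link to (names are illustrative).
RESOURCES = {
    "non": {"converter": from_old_norse_parser, "lexicon": "old-norse-lexicon"},
}


def analyze(lang: str, parser_line: str):
    """Pick morphology converter and lexicon from the text's language tag."""
    resources = RESOURCES.get(lang)
    if resources is None:
        raise ValueError(f"no morphology registered for language tag {lang!r}")
    return resources["converter"](parser_line), resources["lexicon"]


analysis, lexicon = analyze("non", "hesti|hestr|noun dat sg")
```

Supporting another parser in this design would mean adding one converter function and one registry entry, leaving the rest of the system unchanged.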

Page 8: CHLT Integration

WP4 & 6: Corpus Integration

  • TEI makes corpus integration easy
  • Old Norse texts and lexica and Neo-Latin texts are tagged according to TEI standards
  • Documentation of tagging conventions.

Page 9: CHLT Integration

Parser Integration with WP1

  • Similar middleware can link LemLat to PDL
  • WP1 Visualization Tool also includes a parsing/stemming step
  • This program is designed generally to work with many systems, not simply those created by PDL
  • Source code for LemLat and Old Norse so that search/visualization tool can be used to search Old Norse and Latin texts that are not part of PDL

Page 10: CHLT Integration

Next Steps:

  • Implementation of parser integration with WP1
  • Seamless integration of MLIR tool and production deployment
  • Improved documentation of tags required for OAI linking