It Takes a Village: Co-developing VedaWeb, a Digital Research Platform for Old Indo-Aryan Texts
Börge Kiss (IDH), Daniel Kölligan (HVS), Francisco Mondaca (CCeH), Claes Neuefeind (IDH), Uta Reinöhl (ASW), Patrick Sahle (CCeH)
05.03.2019
Research Goals
Traditional research with large corpora
- concordances / word indexes, lexica: make usage
patterns and frequencies visible
- determination of meanings, functions, syntactic
patterns based on researchers' individual
assessments and their "reading experience"
- problems: rather intuitive, subjective; the more texts,
the more intractable
Research Goals
- online platform allowing combined searches of (1)
lexical, (2) morphological, (3) metrical and (4) syntactic
information, e.g.
- (1): lexical fields: differences between words for x, e.g.
'man/woman' [Kazzazi 2001]; 'light' [Roesler 1997] etc.
- (2): use/distribution/functional difference of allomorphs:
e.g. áśv-a- ʻhorseʼ, nom.pl. áśvās / áśvāsas ‘horses’
- (http://ifl.phil-fak.uni-koeln.de/36486.html?&L=1)
- (3): position of forms in verse; word-shapes
- (4): information structure (topic/focus)
Background
Rigveda
- oldest text of Indo-Aryan, part of Indo-European
language family, ca. 1300 / 1000 BC
- ca. 160,000 words (in 1028 hymns grouped into 10
books = "mandalas"); cf. Homer's Iliad + Odyssey =
ca. 190,000 words
- hymns to gods (Indra, Soma, Varuna, Mitra, …) recited
mostly during Soma sacrifice (juice of intoxicating
plant)
Further texts to be integrated: Atharvaveda (c. 170,000
words), Yajurveda; Vedic prose: Aitareya Brahmana (c.
100,000 words), Maitrayani Samhita (c. 120,000 words)
Background
Data
- morphology
- annotation provided by Prof. G. Dunkel, Prof. P.
Widmer et al., University of Zurich
- metre
- Prof. K. Ryan, Harvard University
- syntax
- Prof. H. Hettrich (University of Würzburg), Dr. O.
Hellwig (University of Düsseldorf);
- Dr. U. Reinöhl (University of Cologne/Mainz) using
GRAID (Grammatical Relations and Animacy in
Discourse)
Team
CCeH/DCH
Apl. Prof. Dr. Patrick Sahle, P.I.
Francisco Mondaca, M.A.
Jonathan Blumtritt, M.A.
Martina Gödel, M.A.
IDH - Spinfo
Dr. Claes Neuefeind, P.I.
Börge Kiss, M.A.
ASW/HVS
PD Dr. Daniel Kölligan, P.I.
Dr. Uta Reinöhl, P.I.
Jakob Halfmann
Natalie Korobzow
Felix Rau, M.A.
Co-operation partners
Prof. Dr. Paul Widmer, Universität Zürich
Dr. Salvatore Scarlata, Universität Zürich
Prof. Dr. Kevin Ryan, Harvard University
Dr. Dieter Gunkel, University of Richmond
Prof. Dr. Laurent Romary, Inria/HU Berlin, TEI
Prof. Dr. Nikolaus P. Himmelmann, Universität zu Köln
VedaWeb: A digital platform for working with Old Indic texts
making the RV, translations and morphological glossings available for viewing and export
connecting all word-forms of the annotated RV with the corresponding lexical entries in Grassmann, Böhtlingk / Roth, Monier Williams and vice versa
allowing combinatorial searches of lemmas, word-forms, morphological and metrical information via cascading search index
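A combinatorial search of this kind can be pictured as a chain of filters over annotated tokens. The following is a minimal sketch in Python under stated assumptions, not VedaWeb's actual implementation: the `Token` fields, the `search` helper and the verse positions are hypothetical illustrations. Only the forms áśvās / áśvāsas and their analysis as nom. pl. of áśva- are taken from the Research Goals slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str      # word-form as it appears in the text
    lemma: str     # dictionary headword
    case: str      # morphological case, e.g. "nom"
    number: str    # grammatical number, e.g. "pl"
    position: int  # 1-based position in the verse line (invented here)


# Hypothetical sample data; the positions are made up for illustration.
TOKENS = [
    Token("áśvās", "áśva-", "nom", "pl", 1),
    Token("áśvāsas", "áśva-", "nom", "pl", 3),
    Token("áśvam", "áśva-", "acc", "sg", 1),
]


def search(tokens, **criteria):
    """Return the tokens whose attributes match every given criterion."""
    return [
        t for t in tokens
        if all(getattr(t, key) == value for key, value in criteria.items())
    ]


# Lexical + morphological criteria first, then a metrical one on the result.
nom_pl = search(TOKENS, lemma="áśva-", case="nom", number="pl")
line_initial = search(nom_pl, position=1)

print([t.form for t in nom_pl])
print([t.form for t in line_initial])
```

Each call narrows the previous result, so lexical, morphological and metrical criteria can be combined in any order. A production system would more plausibly delegate this to a search index than to in-memory lists.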
State of the Art
revisions of & additions to the Zurich glossings
development of data model and APIs for dictionaries (Francisco Mondaca)
development of web application (Börge Kiss)
integration of further resources
TEI - Modelling
An appropriate data model is of central importance for consistency, transfer, persistence and presentation
TEI (Text Encoding Initiative) offers the best way for textual data to persist over time, thanks to its active community of scholars and its detailed documentation. It is the de facto standard in Digital Humanities projects.
modelling of texts (RV, translations) and dictionaries (Grassmann; Vedic Index of Names and Subjects)
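To give an idea of what a TEI-modelled text can look like to a program, here is a small sketch in Python. The XML fragment is an assumption made for illustration and is not VedaWeb's actual encoding: it uses the standard TEI elements `lg` (line group), `l` (verse line) and `w` (word) with a `lemma` attribute, and a placeholder value for `n`. The helper `words_with_lemmas` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative fragment only; the real VedaWeb markup may differ.
SAMPLE = """
<lg xmlns="http://www.tei-c.org/ns/1.0" type="stanza" n="placeholder">
  <l n="a">
    <w lemma="áśva-">áśvās</w>
  </l>
  <l n="b">
    <w lemma="áśva-">áśvāsas</w>
  </l>
</lg>
"""


def words_with_lemmas(xml_text):
    """Return (line label, word-form, lemma) triples from a TEI line group."""
    root = ET.fromstring(xml_text)
    triples = []
    for line in root.iter(f"{{{TEI_NS}}}l"):
        for word in line.iter(f"{{{TEI_NS}}}w"):
            triples.append((line.get("n"), word.text, word.get("lemma")))
    return triples


for label, form, lemma in words_with_lemmas(SAMPLE):
    print(label, form, lemma)
```

Because the lemma travels with each word as an attribute, the same file can feed both the reading view and a link to the corresponding dictionary entry.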
VedaWeb App
http://vedaweb.uni-koeln.de
Cooperation within the project
no traditional "chasm" between IT and humanities people, but rather different ranges of competence and overlapping responsibilities:
"family constellation"
Cooperation within the project
overlap of competence areas makes project feasible
regular communication
close feedback loops
gitlab, issue tracking system
regular team meetings (once a month)
simple and challenging issues
different expectations of what is easy and difficult to implement, e.g.
multiple, combinable full-text search
search functions over diversely structured sets of data
complex structure of the base text:
books, hymns, verses, half-verses
different counting systems (by books, by hymns)
different text versions (editions; lemmas and annotations; "padapatha")
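One way to read the point about counting systems is that a hymn can be cited either by book and hymn or by a running hymn number from 1 to 1028. The sketch below converts between the two in Python. It is an illustration under that assumption, not VedaWeb code: the function names are hypothetical, and the per-book hymn counts are the commonly cited figures (they are not on the slides, but they add up to the 1028 hymns mentioned above).

```python
from itertools import accumulate

# Hymns per book (mandala) 1-10; assumption: commonly cited counts.
HYMNS_PER_BOOK = [191, 43, 62, 58, 87, 75, 104, 103, 114, 191]

# OFFSETS[i] = number of hymns in the first i books; OFFSETS[-1] is the total.
OFFSETS = [0, *accumulate(HYMNS_PER_BOOK)]


def to_running(book, hymn):
    """Convert (book, hymn) to a running hymn number."""
    if not 1 <= book <= len(HYMNS_PER_BOOK):
        raise ValueError(f"no such book: {book}")
    if not 1 <= hymn <= HYMNS_PER_BOOK[book - 1]:
        raise ValueError(f"book {book} has no hymn {hymn}")
    return OFFSETS[book - 1] + hymn


def to_book_hymn(running):
    """Convert a running hymn number back to (book, hymn)."""
    if not 1 <= running <= OFFSETS[-1]:
        raise ValueError(f"no such hymn: {running}")
    for book in range(1, len(HYMNS_PER_BOOK) + 1):
        if running <= OFFSETS[book]:
            return book, running - OFFSETS[book - 1]


print(to_running(2, 1))
print(to_book_hymn(1028))
```

Keeping one canonical address internally and deriving the others on demand is one way to let users cite by whichever system they prefer.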
learning from each other
for linguists:
insights into opportunities provided by digital research platforms
getting to know the affordances of data for building an online platform and ensuring data longevity (TEI)
for technical researchers:
complexity of ancient texts (internal structure, variation, different layers of form and meaning)
interests of linguists and other humanities scholars in the data
both:
make one's terminology explicit and clear
make the data consistent
improved collaboration
for DH researchers: a general understanding of the objects studied in various humanities disciplines and of the relevant research questions and methods
for humanities scholars: a general understanding of the different fields and methods in DH (e.g. building a web platform vs. data modelling in TEI)
Future plans: next version
metrical data (D. Gunkel/K. Ryan)
audio & video:
some recordings of A. Daniélou available
complete recording of RV in Copenhagen - not really available
http://www.kb.dk/en/nb/samling/os/Sydasien/veda.html
texts: Atharvaveda, Maitrayani Samhita
annotation layers / user accounts: GRAID etc.
semantic search … (Semantic Web)
C-SALT: Cologne South Asian Languages and Texts
http://c-salt.uni-koeln.de/
overview of projects and digital resources related to South Asian languages, texts, and culture at the University of Cologne (TEI Sanskrit dictionaries, Pali dictionary…)
C-SALT coordinates the activity of these projects and facilitates sustainable development of the diverse resources.
further plans:
Iranian (Avestan corpus + annotation; digital version of Bartholomae's dictionary; Middle Persian texts)
Nuristani (A. Degener [Mainz]: Kalasha-Ala, Prasun)