It Takes a Village: Co-developing VedaWeb, a Digital Research Platform for Old Indo-Aryan Texts

Börge Kiss (IDH), Daniel Kölligan (HVS), Francisco Mondaca (CCeH), Claes Neuefeind (IDH), Uta Reinöhl (ASW), Patrick Sahle (CCeH)

05.03.2019

Research Goals

Traditional research with large corpora

- concordances / word indexes, lexica: make usage patterns and frequencies visible

- determination of meanings, functions and syntactic patterns based on researchers' individual assessments and their "reading experience"

- problems: rather intuitive and subjective; the more texts, the more intractable

Research Goals

- online platform allowing combined searches of (1) lexical, (2) morphological, (3) metrical and (4) syntactic information, e.g.

- (1): lexical fields: differences between words for x, e.g. 'man/woman' [Kazzazi 2001]; 'light' [Roesler 1997] etc.

- (2): use/distribution/functional difference of allomorphs: e.g. áśv-a- ‘horse’, nom.pl. áśvās / áśvāsas ‘horses’

- (http://ifl.phil-fak.uni-koeln.de/36486.html?&L=1)

- (3): position of forms in verse; word-shapes

- (4): information structure (topic/focus)

Background

Rigveda

- oldest text of Indo-Aryan, part of the Indo-European language family, ca. 1300 / 1000 BC

- ca. 160,000 words (in 1028 hymns grouped into 10 books = "mandalas"); cf. Homer's Iliad + Odyssey = ca. 190,000 words

- hymns to gods (Indra, Soma, Varuna, Mitra, …) recited mostly during the Soma sacrifice (juice of an intoxicating plant)

Further texts to be integrated: Atharvaveda (c. 170,000 words), Yajurveda; Vedic prose: Aitareya Brahmana (c. 100,000 words), Maitrayani Samhita (c. 120,000 words)

Background

Data

- morphology

- annotation provided by Prof. G. Dunkel, Prof. P. Widmer et al., University of Zurich

- metre

- Prof. K. Ryan, Harvard University

- syntax

- Prof. H. Hettrich (University of Würzburg), Dr. O. Hellwig (University of Düsseldorf)

- Dr. U. Reinöhl (University of Cologne/Mainz) using GRAID (Grammatical Relations and Animacy in Discourse)

Team

CCeH/DCH

Apl. Prof. Dr. Patrick Sahle, P.I.

Francisco Mondaca, M.A.

Jonathan Blumtritt, M.A.

Martina Gödel, M.A.

IDH - Spinfo

Dr. Claes Neuefeind, P.I.

Börge Kiss, M.A.

ASW/HVS

PD Dr. Daniel Kölligan, P.I.

Dr. Uta Reinöhl, P.I.

Jakob Halfmann

Natalie Korobzow

Felix Rau, M.A.

Co-operation partners

Prof. Dr. Paul Widmer, Universität Zürich

Dr. Salvatore Scarlata, Universität Zürich

Prof. Dr. Kevin Ryan, Harvard University

Dr. Dieter Gunkel, University of Richmond

Prof. Dr. Laurent Romary, Inria/HU Berlin, TEI

Prof. Dr. Nikolaus P. Himmelmann, Universität zu Köln

VedaWeb: A digital platform for working with Old Indic texts

make available RV + translations + morphological glossings for view & export

connecting all word-forms of the annotated RV with the corresponding lexical entries in Grassmann, Böhtlingk / Roth, Monier Williams and vice versa

allowing combinatorial searches of lemmas, word-forms, morphological and metrical information via cascading search index
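
To make the idea of a combined search concrete, here is a minimal sketch in Python. It is not the VedaWeb implementation or its search index; the `Token` class, the `search` function and the tag names are hypothetical, and the three tokens are toy data with placeholder locations rather than real citations. The forms themselves are the allomorphs of áśva- mentioned earlier plus one accusative form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word-form (hypothetical structure, for illustration only)."""
    location: str      # placeholder identifier, not a real citation
    form: str          # word-form as it appears in the text
    lemma: str         # dictionary headword the form belongs to
    morph: frozenset   # morphological tags, e.g. {"NOM", "PL"}


# Toy data: locations are placeholders, tag names are assumptions.
TOKENS = [
    Token("toy-1", "áśvās", "áśva-", frozenset({"NOM", "PL"})),
    Token("toy-2", "áśvāsas", "áśva-", frozenset({"NOM", "PL"})),
    Token("toy-3", "áśvam", "áśva-", frozenset({"ACC", "SG"})),
]


def search(tokens, lemma=None, form=None, morph=()):
    """Return tokens that satisfy every criterion that was supplied."""
    required = frozenset(morph)
    hits = []
    for token in tokens:
        if lemma is not None and token.lemma != lemma:
            continue
        if form is not None and token.form != form:
            continue
        if not required <= token.morph:  # all required tags must be present
            continue
        hits.append(token)
    return hits


# Combined query: lemma AND morphology.
nom_pl = search(TOKENS, lemma="áśva-", morph={"NOM", "PL"})
print([t.form for t in nom_pl])
```

A query of this shape would return both nominative plural allomorphs, which is the kind of result needed to compare their distribution.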

State of the Art

revisions & additions of Zurich glossings

development of data model and APIs for dictionaries (Francisco Mondaca)

development of web application (Börge Kiss)

integration of further resources

Morphological Glossings (Zurich)

Translations: German, English, French, Latin, Russian…

Workflow

TEI - Modelling

An appropriate data model is of central importance for consistency, transfer, persistence and presentation

TEI (Text Encoding Initiative) offers the best way for textual data to persist over time, thanks to its active community of scholars and its detailed documentation. It is the de facto standard in Digital Humanities projects.

modelling of texts (RV, translations) and dictionaries (Grassmann; Vedic Index of Names and Subjects)
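
As a rough illustration of what text modelling in TEI can look like, the sketch below parses a small TEI-style fragment with Python's standard library. The fragment is an assumption, not the project's actual encoding: the choice of `div`, `lg`, `l` and `w` elements, the `n` values and the `lemma` values are illustrative only. The sample line is the opening of RV 1.1.1.

```python
import xml.etree.ElementTree as ET

# Official TEI namespace; the prefix "tei" is just a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical encoding of one line; not the VedaWeb schema.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="hymn" n="1.1">
      <lg type="stanza" n="1.1.1">
        <l n="a">
          <w lemma="agní-">agním</w>
          <w lemma="īḍ-">īḷe</w>
          <w lemma="puróhita-">puróhitaṃ</w>
        </l>
      </lg>
    </div>
  </body></text>
</TEI>"""


def words_by_stanza(tei_xml):
    """Map each stanza identifier to its (form, lemma) pairs in text order."""
    root = ET.fromstring(tei_xml)
    result = {}
    for stanza in root.iterfind(".//tei:lg", TEI_NS):
        words = stanza.iterfind(".//tei:w", TEI_NS)
        result[stanza.get("n")] = [(w.text, w.get("lemma")) for w in words]
    return result


for stanza_id, words in words_by_stanza(SAMPLE).items():
    print(stanza_id, words)
```

Keeping form and lemma together on each word element is one way to support the links between word-forms and dictionary entries; other encodings are equally possible.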

Software Architecture

VedaWeb App

http://vedaweb.uni-koeln.de

Cooperation within the project

no traditional "chasm" between IT and humanities people, but rather different ranges of competence and overlapping responsibilities:

"family constellation"

Cooperation within the project

overlap of competence areas makes project feasible

regular communication

close feedback loops

GitLab, issue tracking system

regular team meetings (once a month)

simple and challenging issues

different expectations of what is easy and difficult to implement, e.g.

multiple, combinable full-text search

search functions over diversely structured sets of data

complex structure of the base text:

books, hymns, verses, half-verses

different counting systems (by books, by hymns)

different text versions (editions; lemmas and annotations; "padapatha")
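
One small example of why the structure of the base text is less trivial than it looks: a hymn can be addressed by book and hymn number or by a running hymn number across the whole text. The Python sketch below converts between the two. Reading the "different counting systems" this way is an assumption, and the function names are hypothetical. The hymn counts per book are the commonly cited ones (book 8 including the Valakhilya hymns) and add up to the 1028 hymns mentioned earlier.

```python
# Number of hymns in books (mandalas) 1 to 10, as commonly cited.
HYMNS_PER_BOOK = (191, 43, 62, 58, 87, 75, 104, 103, 114, 191)
TOTAL_HYMNS = sum(HYMNS_PER_BOOK)  # 1028


def to_absolute(book, hymn):
    """Convert a (book, hymn) address to a running hymn number (1-based)."""
    if not 1 <= book <= len(HYMNS_PER_BOOK):
        raise ValueError(f"no such book: {book}")
    if not 1 <= hymn <= HYMNS_PER_BOOK[book - 1]:
        raise ValueError(f"book {book} has no hymn {hymn}")
    return sum(HYMNS_PER_BOOK[: book - 1]) + hymn


def to_book_hymn(absolute):
    """Convert a running hymn number (1-based) back to (book, hymn)."""
    if not 1 <= absolute <= TOTAL_HYMNS:
        raise ValueError(f"no such hymn: {absolute}")
    remaining = absolute
    for book, count in enumerate(HYMNS_PER_BOOK, start=1):
        if remaining <= count:
            return book, remaining
        remaining -= count


print(to_absolute(2, 1))
print(to_book_hymn(TOTAL_HYMNS))
```

Verses, half-verses and parallel text versions add further levels on top of this, which is where much of the practical complexity comes from.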

learning from each other

for linguists:

insights into opportunities provided by digital research platforms

getting to know the affordances of data for building an online platform and ensuring data longevity (TEI)

for technical researchers:

complexity of ancient texts (internal structure, variation, different layers of form and meaning)

interests of linguists and other humanities scholars in the data

both:

make one's terminology explicit and clear

make the data consistent

improved collaboration

general understanding

for DH researchers:

of the objects studied in various humanities disciplines and the relevant research questions and methods

for humanities scholars:

of the different fields and methods in DH (e.g. building a web-platform vs data modelling in TEI)

Future plans: next version

metrical data (D. Gunkel/K. Ryan)

audio & video:

some recordings of A. Daniélou available

complete recording of the RV in Copenhagen (not really available)

http://www.kb.dk/en/nb/samling/os/Sydasien/veda.html

texts: Atharvaveda, Maitrayani Samhita

annotation layers / user accounts: GRAID etc.

semantic search … (Semantic Web)

C-SALT : Cologne South Asian Languages and Texts

http://c-salt.uni-koeln.de/

overview of projects and digital resources related to South Asian languages, texts, and culture at the University of Cologne (TEI Sanskrit dictionaries, Pali dictionary…)

C-SALT coordinates the activity of these projects and facilitates sustainable development of the diverse resources.

further plans:

Iranian (Avestan corpus + annotation; digital version of Bartholomae's dictionary; Middle Persian texts)

Nuristani (A. Degener [Mainz]: Kalasha-Ala, Prasun)







धन्यवाद

Thank you!
