Laying the Foundations for a Diachronic Dictionary of Tunis Arabic: A First Glance at an Evolving New Language Resource

Karlheinz Mörth (1), Stephan Procházka (2), Ines Dallaji (2)

(1) Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences)
(2) Department of Oriental Studies (University of Vienna)


Page 1: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Laying the Foundations for a Diachronic Dictionary of Tunis Arabic

A First Glance at an Evolving New Language Resource

Karlheinz Mörth (1), Stephan Procházka (2), Ines Dallaji (2)

(1) Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences)
(2) Department of Oriental Studies (University of Vienna)

Page 2: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Introduction: Two projects

• Vienna Corpus of Arabic Varieties (VICAV)
• Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach (TUNICO)

Text technology + Linguistics

Page 3: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Introduction: VICAV

==> Vienna Corpus of Arabic Varieties

Digital language resources of a wide range of spoken Arabic varieties: dictionaries, corpora, bibliographies, language profiles, best practices

Cooperation of University of Vienna and the Austrian Academy of Sciences

http://corpus3.aac.oeaw.ac.at/vicav2/

Page 4: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Introduction: VICAV

Page 5: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Introduction: VICAV

Page 6: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Introduction: VICAV

Page 7: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Introduction: TUNICO

==> Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach

Funded by the Austrian Science Fund (FWF, P 25706-G23)

Main objectives:

• Linguistic exploration of spoken, contemporary Arabic
• Two digital language resources:
  - Corpus of spoken youth language
  - Dictionary of Tunis Arabic

Page 8: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Arabic dialect lexicography

No comprehensive dictionary of the Arabic dialect of Tunis

Basis for diachronic research:

• Nicolas, A. (1911). Dictionnaire français-arabe
• Beaussier, M. (2006). Dictionnaire pratique arabe-français (arabe maghrébin)
• Quéméneur, J. (1961). “Notes sur quelques vocables du parler Tunisien”
• Quéméneur, J. (1962). “Glossaire de dialectal”
• Abdellatif, K. (2010). Dictionnaire «le Karmous» du Tunisien
• Marçais, W., Guîga, A. (1958-61). Textes arabes de Takroûna. II: Glossaire

Page 9: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic

- micro-diachronic and machine-readable
- up-to-date and easily accessible lexical information
- incorporation of:
  a) contemporary data from a digital corpus
  b) various historical sources (e.g. Stumme, H.)
- information added is kept traceable to its origin
- basis: data taken from didactic materials
- 3 other main sources: newly created corpus, interviews and historical publications

Page 10: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Contemporary sources

1) Corpus of spoken youth language (dialogues, narratives):

- an uncommon approach in Arabic dialectology: dialectologists have mostly been interested in the language of older people, so only older forms of particular varieties are known
- focus on modern language, contemporary usage and lexical neologisms

2) Additional interviews to complete the data gained from corpus and historical sources

Page 11: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Historical sources

- 800-page grammar of the Arabic dialect of the Medina of Tunis by Hans-Rudolf Singer (1984): evaluation of the data and integration of the excerpted lexicographic data into the dictionary

- Verification and completion of collected data with other historical resources

- Diachronic dimension helps to understand processes in the development of the lexicon

- The material gathered will allow analysis of recent developments (migration of parents from rural areas, influence of other Arabic varieties, influence of the revolution, foreign elements)

Page 12: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic

Page 13: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Technical issues

• Modelling the data
• Tools

Page 14: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Technical issues

Single schema for a range of dictionaries

LMF, RDF, SKOS, TEI (P5)

Page 15: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Technical issues

Using the TEI dictionary module to encode digitised print dictionaries is a fairly common standard procedure in digital humanities.

The TEI dictionary module needs to be further constrained:

• to enhance interoperability
• to reduce alternate constructs
• to achieve a high degree of compliance with LMF (ISO 24613)

Such constraints are easy to impose when creating born-digital dictionaries.
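To make the idea of a constrained encoding more concrete, here is a minimal Python sketch of a rule checker. The element names (entry, form, orth, gramGrp, gram) come from the TEI dictionary module; the three rules themselves and the function name check_entry are illustrative assumptions, not the project's actual schema.

```python
import xml.etree.ElementTree as ET


def check_entry(entry):
    """Return a list of rule violations for one <entry> element.

    The rules are hypothetical examples of "reducing alternate constructs":
    one lemma per entry, every form spelled out, part of speech always given.
    """
    problems = []

    # Assumed rule 1: exactly one lemma form per entry.
    lemmas = entry.findall("form[@type='lemma']")
    if len(lemmas) != 1:
        problems.append(f"expected exactly 1 lemma form, found {len(lemmas)}")

    # Assumed rule 2: every <form> carries an <orth> child.
    for form in entry.findall("form"):
        if form.find("orth") is None:
            problems.append("form without orth")

    # Assumed rule 3: the part of speech is always recorded in <gramGrp>.
    if entry.find("gramGrp/gram[@type='pos']") is None:
        problems.append("missing part of speech")

    return problems


# Two toy entries, invented purely for this illustration.
GOOD = ET.fromstring(
    '<entry><form type="lemma"><orth>x</orth></form>'
    '<gramGrp><gram type="pos">noun</gram></gramGrp></entry>'
)
BAD = ET.fromstring('<entry><form type="inflected"/></entry>')

print(check_entry(GOOD))
print(check_entry(BAD))
```

In a real setup such rules would more likely live in the schema itself, so that the editor rejects non-conforming entries at input time; the sketch only shows the kind of restriction meant.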

Page 16: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Basic schema

<TEI>
  <teiHeader> ... </teiHeader>
  <text>
    <body>
      <div type="entries">
        <entry>...</entry>
        <entry>...</entry>
        <entry>...</entry>
        ...
      </div>
    </body>
  </text>
</TEI>

Page 17: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Basic schema

<body>
  <div type="entries">
    <entry>...</entry>
    <entry>...</entry>
    <entry>...</entry>
    ...
  </div>
  <div type="examples">
    <cit type="example">...</cit>
    <cit type="example">...</cit>
    <cit type="example">...</cit>
    ...
  </div>
</body>

Page 18: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Basic schema

<entry id="ktaab_001">
  <form type="lemma">
    <orth lang="ar-aeb-x-tunis-vicav">ktāb</orth>
  </form>
  <form type="inflected" ana="#n_pl">
    <orth lang="ar-aeb-x-tunis-vicav">ktub</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en">
      <quote>book</quote>
    </cit>
    <cit type="translation" lang="de">
      <quote>Buch</quote>
    </cit>
    <cit type="translation" lang="fr">
      <quote>livre</quote>
    </cit>
  </sense>
</entry>
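Because the entry is machine-readable, it can be read with any XML library. The following Python sketch, using only the standard library, flattens the entry shown on this slide into a plain dictionary. The function name summarise_entry and the shape of the returned dictionary are assumptions made for illustration; the attribute names are used exactly as they appear on the slide.

```python
import xml.etree.ElementTree as ET

# The entry as shown on the slide (attribute names copied unchanged).
ENTRY_XML = """
<entry id="ktaab_001">
  <form type="lemma">
    <orth lang="ar-aeb-x-tunis-vicav">ktāb</orth>
  </form>
  <form type="inflected" ana="#n_pl">
    <orth lang="ar-aeb-x-tunis-vicav">ktub</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en"><quote>book</quote></cit>
    <cit type="translation" lang="de"><quote>Buch</quote></cit>
    <cit type="translation" lang="fr"><quote>livre</quote></cit>
  </sense>
</entry>
"""


def summarise_entry(entry):
    """Flatten one <entry> element into a plain dictionary (hypothetical helper)."""
    # Inflected forms are keyed by their ana pointer, e.g. "#n_pl".
    inflected = {
        form.get("ana"): form.findtext("orth")
        for form in entry.findall("form[@type='inflected']")
    }
    # Grammatical information is keyed by the type attribute ("pos", "root").
    grammar = {g.get("type"): g.text for g in entry.findall("gramGrp/gram")}
    # Translations are keyed by target language.
    translations = {
        cit.get("lang"): cit.findtext("quote")
        for cit in entry.findall("sense/cit[@type='translation']")
    }
    return {
        "id": entry.get("id"),
        "lemma": entry.findtext("form[@type='lemma']/orth"),
        "inflected": inflected,
        "grammar": grammar,
        "translations": translations,
    }


summary = summarise_entry(ET.fromstring(ENTRY_XML))
print(summary["lemma"], summary["grammar"]["pos"], summary["translations"])
```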

Page 19: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Representing diachrony

…
<bibl>
  <author>Ritt-Benmimoun</author>
  <date>2014</date>
</bibl>

<bibl>
  <author>Singer</author>
  <date>1958</date>
  <biblScope unit="page">56</biblScope>
</bibl>
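Source references of this kind are what keep each piece of information traceable to its origin. As a rough Python sketch of how they could be ordered chronologically: the two bibl elements are copied from the slide, while the wrapper element is not TEI and exists only so that the fragment parses, since the slide does not show which parent element carries the references. The function name attestations and the assumption that date holds a bare year are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# <wrapper> is a hypothetical container, NOT a TEI element; the slide elides
# the real parent of these <bibl> elements.
FRAGMENT = """
<wrapper>
  <bibl>
    <author>Ritt-Benmimoun</author>
    <date>2014</date>
  </bibl>
  <bibl>
    <author>Singer</author>
    <date>1958</date>
    <biblScope unit="page">56</biblScope>
  </bibl>
</wrapper>
"""


def attestations(parent):
    """Collect the <bibl> children of parent, oldest source first."""
    records = []
    for bibl in parent.findall("bibl"):
        records.append({
            "author": bibl.findtext("author"),
            # Assumption: <date> contains a bare four-digit year.
            "year": int(bibl.findtext("date")),
            # findtext returns None when no page reference is given.
            "page": bibl.findtext("biblScope[@unit='page']"),
        })
    return sorted(records, key=lambda record: record["year"])


sources = attestations(ET.fromstring(FRAGMENT))
for source in sources:
    print(source["year"], source["author"], source["page"])
```

Sorting by year gives the oldest attestation first, which is one simple way to read the micro-diachronic dimension off the data.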

Page 20: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Documentation

http://corpus3.aac.ac.at/vicav2/query/tools/dictionary_encoding_guidelines

Page 21: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Tools

Viennese Lexicographic Editor (VLE)

- XML editor providing functionalities typically needed in compiling lexicographic data
- Web-based standalone application
- Designed to process standard-based lexicographic and terminological data such as LMF, TBX, RDF or TEI
- Automating procedures
- Freely configurable visualisation (via XSLT)
- Validation: MSXML Schema
- Client-server architecture (PHP + MySQL)
- Freely available and easy to set up

Page 22: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Tools

Page 23: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Tools

- Corpus – Dictionary interface
- tokenEditor
- Specialised Web-browser

Page 24: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Tools

corpus_shell... a modular framework of reusable software components to access and publish heterogeneous and distributed language resources such as language corpora, dictionaries, encyclopaedic databases, prosopographic databases, bibliographies, metadata, and schemata.

Language Resources Portal
clarin.oeaw.ac.at/ccv/corpus_shell.clarin.oeaw.ac.at/ccv/

Page 25: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

Dictionary of Tunis Arabic: Status and outlook

CLARIN-ERIC (Common Language Resources and Technology Infrastructure).

Open access and open source.

~5000 entries

Page 26: Karlheinz Mörth 1 , Stephan  Procházka 2 , Ines Dallaji 2

شكراً لانتباهكم!

Karlheinz Mörth (1), Stephan Procházka (2), Ines Dallaji (2)

(1) Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences)
(2) Department of Oriental Studies (University of Vienna)

Thank you for your attention!