Multi-lingual & multi-institutional distant learning

Example of an international master programme in Computational Linguistics

14-16 November, Blaubeuren, Germany

Nikolai Vazov, Sofia University

Kiril Simov, LML - BAS, BulTreeBank project

Petya Ossenova, Sofia University, BulTreeBank project




14-16 November 2003 MiLCA Symposium, Blaubeuren, Germany


General goals of the programme

• to bring together linguistics (linguists) and computer technology (computer scientists)

• to bring together foreign and local expertise

• to promote international multi-lingual cooperation in order to develop multi-lingual electronic language resources


Programme participants (years 1-2)

• Two academic partners

– University of Sofia (2 departments)

– University of Paris IV - Sorbonne

• Two project management partners

– French Ministry of Foreign Affairs

– French Cultural Institute in Sofia


Programme participants (year 3)

• Six academic partners

– University of Sofia (3 departments)

– University of Paris IV - Sorbonne

– LML - Bulgarian Academy of Sciences

– University of Montréal (RALI & OLST)

– University of Iaşi (Romania)

– RACAI (Romania)

• Three project management partners

– French Ministry of Foreign Affairs

– French Cultural Institute in Sofia

– Agence Universitaire de la Francophonie


Organisation (educational activities)

• Foreign participants

– 1-3 intensive teaching sessions

– distance follow-up between the sessions

– distance examination

• Local participants

– successive modules (1-3 weeks each) accompanied by web-based courses

– distance tutoring after each course: individual work with students (the ratio of 8 students to 15 professors allows for it)

– on-line personal library for each student (articles to read and discuss with the other participants)

– distance examination


Organisation (research activities)

Carried out as individual tasks with a twofold impact:

• development of personal skills in manipulating electronic text data (using CLaRK, Perl, MySQL, XML, HTML)

• integration of the individual tasks into the main goal of the team: creation of mono- and multi-lingual electronic resources


Organisation (research activities)

Examples:

• writing tokenizers for French (solved in TreeTagger)

• sentence boundary identification (not entirely handled by TreeTagger, but indispensable for parallel corpora)

• named entity recognition

• temporal expression extraction

• abbreviation identification

• parenthetical expression identification

• concordances (Bulgarian & French)
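
To give an idea of what such a task involves, here is a minimal sketch of sentence boundary identification combined with abbreviation handling. It is not the team's implementation: the function name `split_sentences`, the abbreviation list and the boundary rule (punctuation followed by whitespace and an uppercase letter) are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation list (an assumption); a real list would be
# compiled from corpus data for each language.
ABBREVIATIONS = {"M.", "Mme.", "Dr.", "etc.", "cf.", "p."}

# A candidate boundary: ., ! or ? followed by whitespace and an uppercase
# letter (basic Latin, accented Latin-1, or Cyrillic).
BOUNDARY = re.compile(r"[.!?]+(?=\s+[A-ZÀ-ÖØ-ÞА-Я])")

def split_sentences(text):
    """Split text at candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        last_token = text[start:end].split()[-1]
        if last_token in ABBREVIATIONS:
            # The full stop belongs to an abbreviation, not to a sentence end.
            continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

sample = "M. Dupont est arrivé hier. Il a vu le Dr. Martin à Paris! Tout va bien."
for sentence in split_sentences(sample):
    print(sentence)
```

A rule of this kind fails on abbreviations missing from the list, which is one reason why abbreviation identification appears as a separate task above.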


On-line resources and tools

• CLaRK system

• Morphological dictionary for Bulgarian

• Large tagged corpus of Bulgarian

• Concordances (French, Bulgarian)

• Temporal expressions extractor (French)

• Large archive of bilingual (French-Bulgarian) texts

• Large tagged corpus FRANTEXT

• Large bilingual (French-English) aligned corpus Hansard

• Taggers (TreeTagger and LATL)

• Le Monde sur CD-ROM (with integrated search engine)

Legend: developed by the team / other available resources
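
The concordances in the list above can be illustrated with a small keyword-in-context (KWIC) sketch. This is an assumption about the general technique, not a description of the team's tools: the function name `concordance`, the `width` parameter and the simple `\w+` tokenization are chosen only for this example.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    Each context holds up to `width` tokens. Tokenization with \\w+ is a
    simplification; it works for French and Bulgarian because Python's
    \\w matches Unicode letters.
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

sample = "Le chat dort. Le petit chat noir mange."
for left, key, right in concordance(sample, "chat", width=2):
    print(f"{left:>12} [{key}] {right}")
```

A production concordancer would work on the tagged corpus rather than raw text, so that searches can also use lemmas or part-of-speech tags.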


Future work

• New (better targeted) master programme « Electronic language resources »

• Goals of the programme defined the other way around: research needs determine the course content, not vice versa

• Envisaged product: a parallel French-Bulgarian corpus with named entity identification

• So far: collection of parallel texts, development of a proper-name transcription module
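
The slides give no details of the transcription module, so the following is only a sketch of one possible approach: letter-by-letter transcription of Bulgarian proper names into Latin script. The direction (Cyrillic to Latin), the simplified table `BG_TO_LATIN` and the function name `transcribe` are all assumptions; a real module for French-Bulgarian name pairs would need context-dependent rules.

```python
# Simplified letter-by-letter table (an assumption for illustration only;
# not the team's module and not a complete transliteration standard).
BG_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y", "ю": "yu", "я": "ya",
}

def transcribe(name):
    """Transcribe a Bulgarian name character by character."""
    out = []
    for ch in name:
        latin = BG_TO_LATIN.get(ch.lower())
        if latin is None:
            # Keep hyphens, spaces and non-Bulgarian characters unchanged.
            out.append(ch)
        elif ch.isupper():
            # Capitalize only the first letter of a multi-letter equivalent.
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)

for name in ["Живков", "Щерев", "Жан-Пол"]:
    print(name, "->", transcribe(name))
```

A table of this kind ignores context, so it can only serve as a baseline for aligning proper names across the two sides of a parallel corpus.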