Multi-lingual & multi-institutional distance learning
Example of an international master's programme in Computational Linguistics
14-16 November, Blaubeuren, Germany
Nikolai Vazov, Sofia University
Kiril Simov, LML - BAS
BulTreeBank project
Petya Ossenova, Sofia University
BulTreeBank project
14-16 November 2003 MiLCA Symposium, Blaubeuren, Germany
General goals of the programme
• to bring together linguistics (linguists) and computer technologies (computer scientists)
• to bring together foreign and local expertise
• to promote international multi-lingual cooperation in order to develop multi-lingual electronic language resources
Programme participants (years 1-2)
• Two academic partners
– University of Sofia (2 departments)
– University of Paris IV - Sorbonne
• Two project management partners
– French Ministry of Foreign Affairs
– French Cultural Institute in Sofia
Programme participants (year 3)
• Six academic partners
– University of Sofia (3 departments)
– University of Paris IV - Sorbonne
– LML - Bulgarian Academy of Sciences
– University of Montréal (RALI & OLST)
– University of Iaşi (Romania)
– RACAI (Romania)
• Three project management partners
– French Ministry of Foreign Affairs
– French Cultural Institute in Sofia
– Agence Universitaire de la Francophonie
Organisation (educational activities)
• Foreign participants
– 1-3 intensive teaching sessions
– remote follow-up between the sessions
– remote examination
• Local participants
– successive modules (1-3 weeks each) accompanied by web-based courses
– remote tutoring after the course: individual work with students (made possible by the ratio of 8 students to 15 professors)
– on-line personal library for each student (articles to read and discuss with the other participants)
– remote examination
Organisation (research activities)
Carried out as individual tasks with twofold impact:
• development of personal skills in manipulating electronic text data (using CLaRK, Perl, MySQL, XML, HTML)
• integration of individual tasks into the main goal of the team - creation of mono- and multi-lingual electronic resources
Organisation (research activities)
Examples:
• writing tokenizers for French (solved in TreeTagger)
• sentence boundary identification (not entirely handled by TreeTagger, but indispensable for parallel corpora)
• named entity recognition
• temporal expression extraction
• abbreviation identification
• parenthetic expression identification
• concordances (Bulgarian & French)
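To illustrate how two of the tasks above interact, here is a minimal sketch of sentence boundary identification that relies on abbreviation identification. It is not the programme's implementation: the function name `split_sentences`, the tiny `ABBREVIATIONS` set and the boundary rule (punctuation, whitespace, then an uppercase letter) are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (assumption for this sketch);
# a usable list for French would be far larger.
ABBREVIATIONS = {"m.", "mme.", "dr.", "etc.", "cf.", "p."}


def split_sentences(text):
    """Split text at '.', '!' or '?' followed by whitespace and an uppercase
    letter, unless the token ending at that point is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+(?=[A-ZÀ-ÖØ-Þ])", text):
        # Text from the current sentence start up to and including the punctuation mark.
        candidate = text[start:match.start() + 1]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "M. Dupont": the period does not end the sentence
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "M. Dupont est arrivé à Sofia. Il enseigne la linguistique."
    for sentence in split_sentences(sample):
        print(sentence)
```

A rule of this kind never splits after a listed abbreviation, so it misses sentences that genuinely end in "etc."; this is one reason the task is not fully solved by a fixed list.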
On-line resources and tools
• CLaRK system
• Morphological dictionary for Bulgarian
• Large tagged corpus of Bulgarian
• Concordances (French, Bulgarian)
• Temporal expressions extractor (French)
• Large archive of bilingual (French-Bulgarian) texts
• Large tagged corpus FRANTEXT
• Large bilingual (French-English) aligned corpus Hansard
• Taggers (TreeTagger and LATL)
• Le Monde sur CD-ROM (with integrated search engine)
Developed by the team / Other available resources
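As an illustration of what a concordance tool among the resources above produces, the following is a minimal keyword-in-context sketch. It does not reproduce the team's French or Bulgarian concordancers: the function name `concordance`, the `width` parameter and the simple `\w+` tokenisation are assumptions chosen for this example.

```python
import re


def concordance(text, keyword, width=3):
    """Return a (left context, keyword, right context) triple for every
    occurrence of keyword, with up to `width` tokens on each side."""
    # \w+ is Unicode-aware in Python 3, so accented Latin and Cyrillic words
    # are kept whole; apostrophes and hyphens split tokens in this sketch.
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "Le corpus est grand. Ce corpus contient des textes."
    for left, word, right in concordance(sample, "corpus", width=2):
        print(f"{left:>12} [{word}] {right}")
```

Matching is case-insensitive and works on surface forms only; a concordance over a tagged corpus would more likely match on lemmas or tags.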
Future work
• New (better targeted) master's programme « Electronic language resources »
• Goals of the programme defined the other way around: research needs determine the course content and not vice versa
• Envisaged product: parallel French-Bulgarian corpus with named entity identification
• So far: collection of parallel texts, development of a proper-name transcription module