Roadmap for Language Resources and Evaluation in a Multilingual Environment
Minority Languages in the African Context
Justus Roux
Centre for Language and Speech Technology (SU-CLaST)
Stellenbosch University, South Africa
Aim
• Overview of
– proceedings of the LREC2006 workshop on Networking the development of African Languages
– resolutions taken at the meeting
• Remarks on future development and co-operation
Background to the LREC workshop
• African Language Association of Southern Africa Special Interest Group for Language and Speech Technology (ALASA-SIG)
– Special Track on HLT at ALASA International Conference in Johannesburg in 2005
– National and international participants
– Proceedings to appear in SA Journal of African Languages
– Decision to interact with the international community via LREC2006
Why?
• UNESCO Year of African Languages (2006)
• Challenges in bridging the digital divide concerning African languages (connecting Africa)
• R&D activities in relative isolation
• Perceived need to develop resources and capacity for HLT R&D in African languages
• Similar activities in NEMLAR project – Language Technology for Arabic
AIMS of Workshop
• Develop an academic network for sharing ideas
• Promote co-operation in the development of resources and tools (BLARKs for African languages)
• Facilitate capacity building related to African languages in the context of HLT
Programme
• Area surveys
– West Africa
– East Africa
– Central Africa
– Southern Africa
• Projects per area
• Larger projects and infrastructures
• Discussion on networking possibilities
West Africa
– Language Documentation paradigm: specific role of Uni Bielefeld
– Doctoral students at various European universities
– ALT-I: African Language Technology Institute in Ibadan
– Local Language Speech Technology Initiative (Speech synthesis for Ibibio)
– Initiatives in development of morphological parsers (Cologne)
– West African Linguistics Society
East Africa
– Text corpora on Swahili across Europe
– University of Helsinki
• Tools: Open Swahili Localisation Project (OSLP) – spelling checker for Swahili
• Tagging tools
• Localisation Microsoft Windows XP: Swahili
• Morphological analysers
• SALAMA: Machine Translation
– Centre for Science and New Technologies & CNRS (Avignon)
• Speech mining in Somali
– University of Nairobi & University of Antwerp
• Annotated corpora in Gikuyu and applied machine learning
Southern Africa
– Extremely wide range of activities in South Africa primarily by locals (see proceedings)
– University of South Africa
• Morphological analysers for five African languages
• Development of machine readable lexicons
– University of Pretoria
• Text corpora and spelling checkers
• Machine-aided Translation / Localisation
– Stellenbosch University Centre for Language and Speech Technology
• ASR, TTS and natural language understanding in five languages
Southern Africa (Continued)
– University of North West - Centre for Text Technology
• Localisation, spelling checkers
– University of Limpopo & Cape Town
• Speech Synthesis
– Meraka Institute (Pretoria)
• Open source software for language and speech technology applications
– University of the Free State & Province of Flanders
• Interpreting services, data warehousing
Southern Africa (Continued)
• Standardisation:
– ISO/TC 37 mirror Committee (StanSA TC37)
– Terminology training workshops with Termnet
– Workshop on text annotation (Sept 2006)
– ISO-Meetings: Oslo (04), Warsaw (05), Beijing (06)
• AFRILEX:
– International conferences and workshops
• National Language Service:
– National Lexicography Units
– National HLT Resource Centre
Larger Projects
• The African Anaphora Project (Rutgers, USA)
• Building an Infrastructure for Collaborative Development (Taiwan)
Decisions taken
• To consolidate an inventory on tools, resources etc. available in Africa by using the on-line ELRA BLARK website
• To set up a dedicated website (Wiki) to facilitate networking
• The current Organising Committee will be responsible for the activities above as well as for fundraising for training workshops in Africa
• To organise a similar workshop at LREC2008
Concluding impressions
• European countries are playing an active role in the field in West and East Africa – to be welcomed
• International organisations are becoming increasingly involved in Africa:
– ISCA International Affairs Committee for Africa
– ISO
– ELRA??
• International co-operation in EU projects (FP7)?
National Resource Centre for Human Language Technologies (VIRTUAL)
[Diagram: structure of the virtual resource centre; labels recovered from the figure]
• Governance: NRF / DAC / DST
• Secretariat (operations): central planning & co-ordination, consultation, database management, resource backup, marketing
• DAC HLT Unit
• Identification of multilingual digital text and speech resources in different sectors; negotiations / contracts
– All Gov Depts
– Publishing Houses
– NLS Terminology Services
– SITA
– News Media
– SABC Archives
– Nat & Prov Parliaments
• Software development: tools for annotation / mark-up, data management etc.
• Training (non-formal) of annotators / database managers
• Annotation / mark-up of digital text and speech resources (fixed standards)
– Meraka
– University A
– University B
– University C
• Resources and expertise to feed into
– National Lexicographic Units (NLUs)
– Government Depts: HLT products for E-Gateway (via SITA)
– Academic research and development
– Private sector ICT apps: telecoms, e-commerce