Data Collection as an Enabler of Multi-Lingual Services based on Language Technology
Dorota Iskra
Appen
• Provider of speech and language resources, services and consulting
• Customers are typically technology developers or technology users in Speech and Language technologies, Text Analytics/Processing, Machine Translation and Search
• Clients come from both commercial and government sectors
Appen - Company
• Appen (1996) + Butler Hill (1994)
• 150+ employees
– Linguistics and language specialists
– Project and operations management
• Resource pool of over 70,000 linguists, language specialists and data annotators covering 140+ languages, located in 90 countries
• Operational and administrative centres in:
– Sydney, Australia
– USA
– Philippines
– Also Jordan, Pakistan, India
Appen - Organization
• Business Units focused on:
– Language Resources
– Social Instinct and Content Analytics
– Search and Consulting Services
Services – Language Resources
• Data Collection
• Transcription/Annotation
• End-to-End solution: recruiting, training, performance management, supervision
• Tools for managing data flow, post-processing and statistics reporting, delivery
• Typically 10-20 languages in transcription at any time
• Localization
• Dictionaries/Lexicons
• User Testing
• Language model specification, rule creation
Multi-Lingual Europe
Need for affordable resources
• Well-developed processes and tools for automating and supporting tasks
• Crowdsourcing while maintaining high levels of quality
• Off-shore location for reduced labour cost
• Joint collections for clients with similar data needs and no IP requirements
• Flexible revenue models
Language Coverage
Arabic (Egyptian, Gulf, Iraqi, Levantine, MSA, Syrian, Maghrebi – Algerian, Libyan, Moroccan, Tunisian), Bahasa Indonesia, Bahasa Malaysia, (Iran), Basque, Bengali, Bulgarian, Cantonese (China PRC, Hong Kong), Catalan, Croatian, Czech, Danish, Dari, Dutch, English (Australian, Canadian, Gulf, Indian, Irish, New Zealand, Singapore, South African, UK, US), Estonian, Farsi, Finnish, French (Belgian, Canadian, French), Gallego (Galician), German (Austrian, German, Swiss),
Greek, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Italian, Japanese, Kannada, Kazakh, Kermanji (Iran), Korean, Kurdish Sorani, Laki (Iran), Latvian, Lithuanian, Luri (Iran), Malagasy, Malayalam, Mandarin (China, Taiwan), Maori, Marathi, Mazanderani (Iran),
Oriya, Norwegian (Bokmål, Nynorsk), Pashto, Portuguese (Brazilian, European), Romanian, Russian, Serbian, Slovak, Slovenian, Somali, Spanish (Colombian, Costa Rican, European, Mexican, Peruvian, US, Venezuelan),
Swedish, Sylheti, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Wu, Xiang
Thank you!
Contact: Dorota Iskra
diskra@appen.com