Data Collection as an Enabler of Multi-Lingual Services based on Language Technology
Dorota Iskra

meta-net.eu




Appen

•  Provider of speech and language resources, services and consulting
•  Customers are typically technology developers or technology users in speech and language technologies, text analytics/processing, machine translation and search
•  Clients come from both commercial and government sectors


Appen - Company

•  Appen (1996) + Butler Hill (1994)
•  150+ employees
   –  Linguistics and language specialists
   –  Project and operations management
•  Resource pool of over 70,000 linguists, language specialists and data annotators covering 140+ languages, located in 90 countries
•  Operational and administrative centres in:
   –  Sydney, Australia
   –  USA
   –  Philippines
   –  Also Jordan, Pakistan, India


Appen - Organization

•  Business Units focused on:
   –  Language Resources
   –  Social Instinct and Content Analytics
   –  Search and Consulting Services


Services – Language Resources

•  Data Collection
•  Transcription/Annotation
•  End-to-End solution: recruiting, training, performance management, supervision
•  Tools for managing data flow, post-processing and statistics reporting, delivery
•  Typically 10-20 languages in transcription at any time
•  Localization
•  Dictionaries/Lexicons
•  User Testing
•  Language model specification, rule creation


Multi-Lingual Europe

Need for affordable resources
•  Well-developed processes and tools for automating and supporting tasks
•  Crowdsourcing while maintaining high levels of quality
•  Off-shore location for reduced labour cost
•  Joint collections for clients with similar data needs and no IP requirements
•  Flexible revenue models


Language Coverage

Arabic (Egyptian, Gulf, Iraqi, Levantine, MSA, Syrian, Maghrebi – Algerian, Libyan, Moroccan, Tunisian), Bahasa Indonesia, Bahasa Malaysia, (Iran), Basque, Bengali, Bulgarian, Cantonese (China PRC, Hong Kong), Catalan, Croatian, Czech, Danish, Dari, Dutch, English (Australian, Canadian, Gulf, Indian, Irish, New Zealand, Singapore, South African, UK, US), Estonian, Farsi, Finnish, French (Belgian, Canadian, French), Gallego (Galician), German (Austrian, German, Swiss)

Greek, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Italian, Japanese, Kannada, Kazakh, Kermanji (Iran), Korean, Kurdish Sorani, Laki (Iran), Latvian, Lithuanian, Luri (Iran), Malagasy, Malayalam, Mandarin (China, Taiwan), Maori, Marathi, Mazanderani (Iran)

Oriya, Norwegian (Bokmål, Nynorsk), Pashto, Portuguese (Brazilian, European), Romanian, Russian, Serbian, Slovak, Slovenian, Somali, Spanish (Colombian, Costa Rican, European, Mexican, Peruvian, US, Venezuelan), Swedish, Sylheti, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Wu, Xiang


Thank you!

Contact: Dorota [email protected]