Translation by Collaboration among Monolingual Users
Benjamin B. Bederson
www.cs.umd.edu/~bederson
@bederson
Computer Science Department
Human-Computer Interaction Lab
Institute for Advanced Computer Studies
iSchool
University of Maryland
Human Computation
Things HUMANS can do
Things COMPUTERS can do
Translation
Photo tagging
Face recognition
Human detection
Speech recognition
Text analysis
Planning
Human Computation Taxonomy
Social Computing
Data Mining
Collective Intelligence
Crowdsourcing
Human Computation
Languages on Internet by Population

Year   English   Chinese   Spanish   Japanese   The rest
2000   52%       5%        5%        9%         29%
2005   32%       21%       8%        8%         31%
2009   28%       23%       8%        5%         37%

Source: Global Reach, Internet World Stats
A real-world problem: ICDL
Now:
– ~5,000 books
– 55 languages
– Some translations in a few languages
– 3,000 volunteer translators
– 100K unique visitors/month
Goal:
– 10,000 books
– 100 languages
– Every book in every language!
www.childrenslibrary.org
Translation with the Crowd
Wikipedia: 900 translators vs. 1,200,000 contributors
Translate with the Monolingual Crowd
[Figure: Quality vs. Speed / Affordability, positioning Machine Translation, Professional Bilingual Human Participation, Amateur Bilingual Human Participation, and Monolingual Human Participation]
[Diagram: source-language and target-language crowds connected by MT and word alignment]
Source Language: Original Sentence
Target Language: Translation Candidate

Crowd Tasks (target language):
1 Vote
2 Identify translation errors
3 Create new translation candidates

Crowd Tasks (source language):
1 Vote
2 Explain errors
3 Paraphrase source sentence

Passed between the two sides via MT and word alignment: New candidate, Explanation
repeat …
Pierre
Says: En général, on s'entend bien, tous les deux. (lit. In general, we get along together, the two of us.)
Sees: En général, Il est à la fois de nous.
Edits into: En général, nous nous entendons bien. (lit. In general, we get along well.)
Sees: En général, nous sommes de bons amis. (lit. In general, we are good friends.)
Proposes to stop with current translation

Mary
Sees: In general, it means well, both.
Edits into: In general, it is about both of us.
Sees: In general, we get along fine.
Edits into: In general, we are good friends.
Agrees to stop with current translation

[Each message between Pierre and Mary passes through MT; the exchange is labeled "enrichment".]
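The Pierre and Mary exchange can be read as a loop: MT carries a sentence across, the reader edits what they see, and MT carries the edit back, until someone proposes to stop. Below is a minimal sketch of that loop, not the MonoTrans2 implementation. `TOY_MT`, `SCRIPTED_EDITS`, and `run_exchange` are hypothetical names; the lookup tables only replay the sentences from the example and stand in for a real MT system and real human edits.

```python
# Toy stand-in for machine translation: a lookup table that holds only
# the sentences from the Pierre/Mary example (assumption, not a real MT API).
TOY_MT = {
    "En général, on s'entend bien, tous les deux.": "In general, it means well, both.",
    "In general, it is about both of us.": "En général, Il est à la fois de nous.",
    "En général, nous nous entendons bien.": "In general, we get along fine.",
    "In general, we are good friends.": "En général, nous sommes de bons amis.",
}

# Scripted edits standing in for what each monolingual person types after
# reading the MT output. None means "stop with the current translation".
SCRIPTED_EDITS = {
    "In general, it means well, both.": "In general, it is about both of us.",
    "En général, Il est à la fois de nous.": "En général, nous nous entendons bien.",
    "In general, we get along fine.": "In general, we are good friends.",
    "En général, nous sommes de bons amis.": None,
}


def run_exchange(source_sentence, max_rounds=10):
    """Pass a sentence back and forth through MT until a reader proposes to stop.

    Returns (latest English sentence written by Mary, log of events).
    """
    log = []
    readers = ["Mary", "Pierre"]  # Mary reads first, then they alternate
    outgoing = source_sentence
    translation = None
    for turn in range(max_rounds):
        reader = readers[turn % 2]
        seen = TOY_MT[outgoing]  # MT output shown to the reader
        log.append((reader, "sees", seen))
        edited = SCRIPTED_EDITS[seen]
        if edited is None:
            log.append((reader, "proposes to stop", seen))
            break
        log.append((reader, "edits into", edited))
        if reader == "Mary":
            translation = edited  # Mary writes in the target language
        outgoing = edited
    return translation, log


if __name__ == "__main__":
    final, events = run_exchange("En général, on s'entend bien, tous les deux.")
    for who, action, text in events:
        print(f"{who} {action}: {text}")
    print("Final translation:", final)
```

Neither table entry requires a bilingual speaker: each person only ever reads and writes their own language, which is the point of the design.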
Experiment 1
• 60 Spanish / 22 German speakers
• ICDL volunteers
• Worked on
– 4 Spanish books => German
– 1 German book => Spanish
TranslateTheWorld.org
Evaluation
• 2 German-Spanish bilingual evaluators
• Fluency and adequacy: 5-point score
• Compared Google Translate and MonoTrans2
Punchline
                               Google   MonoTrans2
Sentences with fluency = 5     21       112
Sentences with accuracy = 5    17       118
Sentences where BOTH = 5       17       110

Sentences for which both bilingual evaluators agree score = 5
(N=162 sentences worked on in the experiment)
Straight MT: 10% of sentences ready for prime time
MonoTrans2: 68% of sentences ready for prime time
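The 10% and 68% figures follow from the BOTH = 5 counts divided by N = 162. A short sketch of that arithmetic, using only the counts reported for Experiment 1; the names `COUNTS` and `share` are illustrative.

```python
N_SENTENCES = 162  # sentences worked on in Experiment 1

# Sentences for which both bilingual evaluators gave a score of 5.
COUNTS = {
    "fluency = 5": {"Google": 21, "MonoTrans2": 112},
    "accuracy = 5": {"Google": 17, "MonoTrans2": 118},
    "BOTH = 5": {"Google": 17, "MonoTrans2": 110},
}


def share(count, total=N_SENTENCES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


if __name__ == "__main__":
    for criterion, by_system in COUNTS.items():
        for system, count in by_system.items():
            print(f"{criterion:13s} {system:11s} {count:3d} / {N_SENTENCES} = {share(count)}%")
```

The "ready for prime time" numbers are the BOTH = 5 row: 17 of 162 for Google and 110 of 162 for MonoTrans2.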
Experiment 2
• An alternative use case for crowdsourced translation…

Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo
My family in Carrefour, 24 Cote Plage, 41A needs food and water

Moun kwense nan Sakre Kè nan Pòtoprens
People trapped in Sacred Heart Church, PauP

Ti ekipman Lopital General genyen yo paka minm fè 24 è
General Hospital has less than 24 hrs. supplies

Fanm gen tranche pou fè yon pitit nan Delmas 31
Undergoing children delivery Delmas 31

Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.
Punchline
                               Google     MonoTrans2
Sentences with fluency = 5     1 (1%)     22 (30%)
Sentences with adequacy = 5    11 (14%)   29 (38%)
Sentences where BOTH = 5       0 (0%)     14 (18%)

Sentences for which both bilingual evaluators agree score = 5
(N=76 sentences completed)
Straight MT: 0% of sentences preserve all the meaning
MonoTrans2: 38% of sentences preserve all the meaning
Live for one week:
• 137,000 page views
• 1,900 task submissions
• 19 secs per task
Example
Toward a more general architecture
Joining forces with Chris Callison-Burch, Johns Hopkins University
Take-aways
• By combining
– machine translation technology
– human-computer interfaces
– crowdsourcing
it is possible to achieve accurate translation without bilingual human expertise.
Participating Students:
Chang Hu, CS Ph.D. student
Alex Quinn, CS Ph.D. student
Vlad Eidelman, CS Ph.D. student
Yakov Kronrod, Linguistics Ph.D. student
Olivia Buzek, CS/Linguistics undergrad
New Paradigms…
Human Comp.
Comp. Ling.
HCI
TranslateTheWorld.org
Philip Resnik, Professor
Linguistics
Institute for Advanced Computer Studies