Finding Entries in an On-line Arabic Dictionary
27 May 2010
27th Annual HCIL Symposium
Sarah C. Wayland, C. Anton Rytting, David Zajic, Timothy Buckwalter, Jason White, Corey Miller, Jeffrey Carnes, Nathanael Lynn, Paul Rodrigues, Michael Maxwell, Evelyn Browne
LANGUAGE RESEARCH IN SERVICE TO THE NATION
Arabic is not English
• Different sounds (e.g., voiceless uvular /q/, retroflex /l/, voiced velar fricative /gh/, glottal stop / ‘ /)
• Different letters (مباريات)
• Different morphology (templatic vs. affixative)
• Written form doesn’t reflect spoken dialect
• Keyboard has different layout/letters
Many informal texts diverge from Modern Standard Arabic
Texts differ from classroom Arabic in orthography, morphology, and lexical content.
Orthographic differences arise from dialect pronunciations, typographical errors, and ... “style.”
Orthographic Differences
Dialect | SATTS (no vowels) | Native script (no vowels)
MSA (Modern Standard Arabic) | KLB | كلب
Iraqi (with Persian character) | #CLB / J-LB | چلب
Iraqi (with MSA character) | JLB | جلب
Some dialects use non-standard characters
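Because the Iraqi word above is written with either چ or ج, a lookup tool may want to treat the two spellings as equivalent. The minimal sketch below folds a few non-standard code points onto Arabic-block counterparts before lookup. Only the چ → ج pair comes from the slide; the keheh and Farsi-yeh entries and the name `fold_nonstandard` are illustrative assumptions, and folding discards the dialect information that the non-standard letter carries.

```python
# Hypothetical folding table: non-standard code point -> Arabic-block code point.
# U+0686 (چ) -> U+062C (ج): the two Iraqi spellings on this slide.
# U+06A9 (ک, keheh) -> U+0643 (ك) and U+06CC (ی, Farsi yeh) -> U+064A (ي):
# common look-alike substitutions, assumed here rather than taken from the slide.
FOLD = str.maketrans({
    "\u0686": "\u062C",
    "\u06A9": "\u0643",
    "\u06CC": "\u064A",
})


def fold_nonstandard(text: str) -> str:
    """Replace non-standard characters with their Arabic-block counterparts."""
    return text.translate(FOLD)


iraqi_persian = "\u0686\u0644\u0628"  # چلب
iraqi_msa = "\u062C\u0644\u0628"      # جلب
print(fold_nonstandard(iraqi_persian) == iraqi_msa)  # True
```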
Phonetic Differences
ق | Educated Urban (MSA) | قلب | qlb | /qalb/
گ | Iraq | گلب | glb | /gaLub/
غ | Sudan | غلب | ghlb | /ghaLib/
أ | Cairo | ألب | ’lb | /’alb/
Consonants sometimes vary across dialects
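Variation like this suggests expanding a dialect spelling into MSA candidates and keeping only those the dictionary contains. The sketch below is a hypothetical illustration: the `REFLEXES` table is built only from the examples on these slides, and the names `msa_candidates` and `lookup` are invented for this example.

```python
from itertools import product

# Hypothetical reflex table: dialect letter -> MSA letters it may stand for.
# Built only from the slides: Iraqi گ, Sudanese غ, and Cairene أ for MSA ق;
# Iraqi چ and ج for MSA ك. A real table would be larger and weighted.
REFLEXES = {
    "گ": ["ق"],
    "غ": ["ق"],
    "أ": ["ق"],
    "چ": ["ك"],
    "ج": ["ك"],
}


def msa_candidates(word):
    """Yield every spelling obtained by keeping each letter or replacing it
    with one of its possible MSA sources."""
    options = [[ch] + REFLEXES.get(ch, []) for ch in word]
    for letters in product(*options):
        yield "".join(letters)


def lookup(word, dictionary):
    """Return the dictionary entries reachable from a dialect spelling."""
    return sorted({c for c in msa_candidates(word) if c in dictionary})


DICTIONARY = {"قلب", "كلب"}  # toy dictionary: "heart", "dog"
print(lookup("گلب", DICTIONARY))  # ['قلب']
print(lookup("چلب", DICTIONARY))  # ['كلب']
```

With a full MSA dictionary the expansion becomes ambiguous: غلب and جلب are also ordinary MSA words (“to overcome”, “to bring”), which is one reason to weight dialect variants rather than map them deterministically.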
Morphologically Complex
قلب qalb* “heart”
القلب Al-qalb “the-heart”
قلوب quluwb* “hearts”
القلوب Al-quluwb “the-hearts”
قلبي qalb-iy “my-heart”
قلوبنا quluwb-naA “our-hearts”
قلبك qalb-ak “your-heart (to a man)”
قلبك qalb-ik “your-heart (to a woman)”
قليب qulayb “little heart”
* Only these forms are listed in the dictionary.
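Because only the starred forms are dictionary entries, a lookup tool has to strip clitics before searching. The following is a minimal sketch, assuming a toy dictionary of the two starred forms and hand-listed clitics covering exactly the affixes shown; `find_entries` is a hypothetical name, and real Arabic morphological analysis handles many more affixes and spelling changes.

```python
# Hypothetical clitic lists covering only the forms on this slide.
PREFIXES = ["", "ال"]             # Al- "the"
SUFFIXES = ["", "ي", "نا", "ك"]  # -iy "my", -naA "our", -ak/-ik "your"

# Toy dictionary: only the starred (listed) forms.
DICTIONARY = {"قلب": "heart", "قلوب": "hearts"}


def find_entries(word, dictionary=DICTIONARY):
    """Strip at most one prefix and one suffix; return the stems that are
    dictionary entries."""
    found = set()
    for pre in PREFIXES:
        if not word.startswith(pre):
            continue
        for suf in SUFFIXES:
            if not word.endswith(suf):
                continue
            stem = word[len(pre):len(word) - len(suf)]
            if stem in dictionary:
                found.add(stem)
    return sorted(found)


print(find_entries("القلوب"))  # ['قلوب']
print(find_entries("قلبك"))    # ['قلب']
print(find_entries("قليب"))    # []
```

The last call shows the limit of affix stripping: the diminutive قليب changes the word's internal template instead of adding an affix, so it reflects the templatic (rather than affixative) morphology noted earlier and needs a different mechanism.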
The Arabic keyboard makes difficult-to-detect typos likely
• Adjacent letters are often visually similar
• Adjacent letters also often sound similar:
– with contrasts not found in English
– with contrasts subject to place assimilation
– particularly in some dialect pronunciations
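To see how such typos arise, one can list each key's horizontal neighbors and generate single-substitution candidates. This sketch assumes the common Arabic (101) keyboard layout; check the rows against the layout users actually have before relying on them. It ignores vertical adjacency, and `build_neighbors` and `typo_candidates` are hypothetical helpers.

```python
from collections import defaultdict

# Letter rows of the common Arabic (101) layout, left to right (assumed).
# "لا" is a single key that produces two characters.
ROWS = [
    ["ض", "ص", "ث", "ق", "ف", "غ", "ع", "ه", "خ", "ح", "ج", "د"],
    ["ش", "س", "ي", "ب", "ل", "ا", "ت", "ن", "م", "ك", "ط"],
    ["ئ", "ء", "ؤ", "ر", "لا", "ى", "ة", "و", "ز", "ظ"],
]


def build_neighbors(rows):
    """Map each key to the keys immediately left and right of it."""
    neighbors = defaultdict(set)
    for row in rows:
        for i, key in enumerate(row):
            if i > 0:
                neighbors[key].add(row[i - 1])
            if i + 1 < len(row):
                neighbors[key].add(row[i + 1])
    return dict(neighbors)


NEIGHBORS = build_neighbors(ROWS)


def typo_candidates(word, lexicon):
    """Return lexicon words reachable by replacing one letter with an
    adjacent key."""
    found = set()
    for i, ch in enumerate(word):
        for key in NEIGHBORS.get(ch, ()):
            cand = word[:i] + key + word[i + 1:]
            if cand in lexicon:
                found.add(cand)
    return sorted(found)


print(typo_candidates("خرب", {"حرب"}))  # ['حرب']
```

On this layout ح sits between خ and ج, three letters that differ only in their dots, so a one-key slip yields a string that looks almost identical to the intended word.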
Putting DYM…? together
• A query is checked by composing a single-string finite-state automaton (FSA) with:
– weighted keyboard, visual, and sound-based finite-state transducers (FSTs)
– a dictionary FSA (with weights for dialect variants)
• The n-best paths yielding unique strings are calculated
• The corresponding strings are displayed to the user
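[Figure: letter correspondences R ر, H ح, A ا, B ب; the query HARB passes through the sound-based, keyboard, and visual transducers, yielding candidate strings such as HARB, ?ARB, OARB, ...]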
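The pipeline above can be sketched in plain Python. Instead of an FST toolkit, this minimal sketch builds the composition lazily: a best-first (Dijkstra-style) search walks the query, a substitution table, and a trie-shaped dictionary in lockstep, and stops after n unique strings. It is not the system described in the talk. The confusion costs, letter pairs, and the names `CONFUSIONS`, `build_trie`, and `n_best` are hypothetical, and merging the keyboard, visual, and sound-based FSTs into one substitution-only table is a simplification; a production system would more likely use a weighted-FST library such as OpenFst.

```python
import heapq
from itertools import count

# Hypothetical confusion costs (lower = more plausible): typed letter ->
# {intended letter: cost}. One table stands in for the separate keyboard,
# visual, and sound-based FSTs; every edit is a one-for-one substitution.
CONFUSIONS = {
    "خ": {"ح": 1.0, "ج": 1.5},  # neighboring keys, differ only in dots
    "ج": {"ح": 1.0, "خ": 1.5},
    "ف": {"ق": 1.0},            # neighboring keys, one dot vs. two
    "غ": {"ق": 1.5},            # Sudanese pronunciation of MSA ق
}


def build_trie(lexicon):
    """Dictionary FSA as a trie; the None key holds the entry's weight
    (e.g. a penalty for a dialect spelling)."""
    root = {}
    for word, weight in lexicon.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = weight
    return root


def n_best(query, lexicon, n=5):
    """Best-first search over the lazily built composition
    query-FSA o confusion-FST o dictionary-FSA. All costs are
    non-negative, so finished paths leave the heap in cost order."""
    trie = build_trie(lexicon)
    tie = count()  # unique tie-breaker, so heapq never compares dicts
    heap = [(0.0, next(tie), 0, "", trie, False)]
    results, seen = [], set()
    while heap and len(results) < n:
        cost, _, i, out, node, done = heapq.heappop(heap)
        if done:
            if out not in seen:  # keep only unique output strings
                seen.add(out)
                results.append((out, cost))
            continue
        if i == len(query):
            if None in node:  # reached a final dictionary state
                heapq.heappush(
                    heap, (cost + node[None], next(tie), i, out, node, True))
            continue
        typed = query[i]
        options = {typed: 0.0, **CONFUSIONS.get(typed, {})}
        for target, step in options.items():
            if target in node:
                heapq.heappush(heap, (cost + step, next(tie), i + 1,
                                      out + target, node[target], False))
    return results


print(n_best("خرب", {"حرب": 0.0, "جرب": 0.0, "خرب": 0.0}, n=3))
# [('خرب', 0.0), ('حرب', 1.0), ('جرب', 1.5)]
```

Because the search expands paths lazily, only the parts of the composed machine near the query are ever built, which keeps the cost low even with a large dictionary.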
[Screenshots of the dictionary interface, including “Show verbs” and “Show non-verbs” filters and a “Download Results” option]
Arabic is not English!
• One user interface for all languages will not work
• We must customize the user interface to take into account the unique structure of each language