Word Prediction in Hebrew: Preliminary and Surprising Results

Yael Netzer, Meni Adler, Michael Elhadad
Department of Computer Science, Ben Gurion University, Israel

ISAAC 2008, August 6th



Outline

• Objectives and example
• Methods of Word Prediction
• Hebrew Morphology
• Experiments and Results
• Conclusions?

Word Prediction - Objectives

• Ease word insertion in textual software
  – by guessing the next word
  – by giving a list of possible options for the next word
  – by completing a word given a prefix
• General idea: guess the next word given the previous ones
  [Input w1 w2] [guess w3]

Word Prediction Example

• I s_____
  – verb, adverb?
  – verb: sang? maybe. singularized? hopefully
• I saw a _____
  – noun / adjective
• I saw a b____
  – brown? big? bear? barometer?
• I saw a bird in the _____
  – [semantics will do good]
• I saw a bird in the z____
  – obvious (?)

Statistical Methods

• Statistical information
  – Unigrams: probability of isolated words
    • Independent of context, offer the most likely words as candidates
  – More complex language models (Markov models)
    • Given w1..wn, determine the most likely candidate for wn+1
  – The most common method in applications is the unigram (see references in [Garay-Vitoria and Abascal, 2004])
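As a rough illustration of the statistical methods above, the sketch below ranks candidates by bigram counts and falls back to unigram counts, optionally filtering by a typed prefix. The toy corpus and the names `train` and `predict` are assumptions made for this example; they are not taken from the system described in the talk.

```python
from collections import Counter, defaultdict

def train(tokens):
    """Count unigrams and bigrams from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[prev][word] += 1
    return unigrams, bigrams

def predict(unigrams, bigrams, prev, prefix="", k=5):
    """Return up to k candidates starting with prefix, bigram matches first."""
    # Words seen after `prev`, most frequent first.
    ranked = [w for w, _ in bigrams[prev].most_common()]
    # Fall back to context-independent unigram frequency for everything else.
    ranked += [w for w, _ in unigrams.most_common() if w not in bigrams[prev]]
    return [w for w in ranked if w.startswith(prefix)][:k]

# Toy corpus (an assumption, for illustration only).
tokens = "i saw a bird in the zoo and i saw a big bear in the zoo".split()
unigrams, bigrams = train(tokens)
suggestions = predict(unigrams, bigrams, prev="a", prefix="b", k=3)
print(suggestions)
```

Words that actually followed the previous word are offered first; the unigram fallback only fills the remaining menu slots.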

Syntactic Methods

• Syntactic knowledge
  – Consider sequences of part-of-speech tags
    [Article] [Noun] predict [Verb]
  – Phrase structure
    [Noun Phrase] predict [Verb]
  – Syntactic knowledge can be statistical or based on hand-coded rules
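One statistical reading of the tag-sequence idea above can be sketched as follows: count which tag follows each pair of tags in a tagged corpus, then offer words of the most likely next tag. The tagged sentences, the tag names, and `predict_tag` are hypothetical and serve only to illustrate the approach.

```python
from collections import Counter, defaultdict

# Toy tagged sentences (hypothetical data, for illustration only).
tagged = [
    [("the", "Article"), ("bird", "Noun"), ("sang", "Verb")],
    [("a", "Article"), ("bear", "Noun"), ("slept", "Verb")],
    [("the", "Article"), ("big", "Adjective"), ("bear", "Noun"), ("ate", "Verb")],
]

next_tag = defaultdict(Counter)   # (tag1, tag2) -> counts of the following tag
lexicon = defaultdict(set)        # tag -> words seen with that tag
for sentence in tagged:
    tags = [tag for _, tag in sentence]
    for word, tag in sentence:
        lexicon[tag].add(word)
    for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
        next_tag[(t1, t2)][t3] += 1

def predict_tag(t1, t2):
    """Most frequent tag observed after the tag pair (t1, t2), or None."""
    counts = next_tag.get((t1, t2))
    return counts.most_common(1)[0][0] if counts else None

tag = predict_tag("Article", "Noun")
print(tag, sorted(lexicon[tag]))
```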

Semantic Methods

• Semantic knowledge
  – Assign semantic categories to words
  – Find a set of rules which constrain the possible candidates for the next word
    • [eat verb] predict [word of category food]
  – Not widely used in word prediction, mostly because it requires complex hand coding and is too inefficient for real-time operation

Word Prediction Knowledge Sources

• Corpora: texts and frequencies
• Vocabularies (can be domain specific)
• Lexicons with syntactic and/or semantic knowledge
• User's history
• Morphological analyzers
• Unknown-word models

Evaluation of Word Prediction

• Keystroke savings
• Time savings
• Overall satisfaction
  – Cognitive overload (length of choice list vs. accuracy)
• A predictor is considered adequate if its hit ratio is high as the required number of selections decreases.
  Keystroke savings = 1 - (# of actual keystrokes / # of expected keystrokes)
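The keystroke formula above is simple to compute. In this small sketch the function name and the sample numbers are made up for illustration:

```python
def keystroke_savings(actual_keystrokes, expected_keystrokes):
    """Return 1 - (actual / expected); 0.0 means the predictor saved nothing."""
    if expected_keystrokes <= 0:
        raise ValueError("expected_keystrokes must be positive")
    return 1 - actual_keystrokes / expected_keystrokes

# Hypothetical numbers: a 100-character message entered with 40 keystrokes.
savings = keystroke_savings(40, 100)
print(f"{savings:.0%}")
```

Note that the complementary ratio, actual / expected, is the "keystrokes needed" figure, where smaller is better.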

Work in non-English Languages

• Languages with rich morphology:
  – n-gram-based methods offer quite reasonable prediction [Trost et al. 2005] but can be improved with more sophisticated syntactic/semantic tools
• Suggestions for inflected languages (e.g. Basque)
  – Use two lexicons: stems and suffixes
  – Add syntactic information to dictionaries and grammatical rules to the system, offer stems and suffixes
  – Combine these two approaches: offer inflected nouns.

Motivation for Hebrew

• We need word prediction for Hebrew
  – No known previously published research for Hebrew.
• We wanted to test our morphological analyzer in a useful application.

Initial Hypothesis

Word prediction in Hebrew will be complicated; morphological and syntactic knowledge will be needed.

Hebrew Ambiguity

• Unvocalized writing: most vowels are "dropped"
    inherent → inhrnt
• Affixation: prepositions and possessives are attached to nouns
    in her note → inhrnt
    in her net → inhrnt
• Rich morphology
  – 'inhrnt' could be inflected into different forms according to sing/pl, masc/fem properties: inhrnti, inhrntit, inhrntiot
  – Other morphological properties may leave 'inherent' unmodified (construct/absolute forms for noun compounding).

Ambiguity Level

• These variations create a high level of ambiguity:
  – English lexicon: inherent → inherent.adj
  – With Hebrew word formation rules: inhrnt →
      in.prep her.pro.fem.poss note.noun
      in.prep her.pro.fem net.noun
      inherent.adj.masc.absolute
      inherent.adj.masc.construct
• Parts-of-speech tagset:
  – Hebrew: theoretically ~300K, in practice ~3.6K distinct forms
  – English: 45-195 tags
• Number of possible morphological analyses per word:
  – English: 1.4 (average # words / sentence: 12)
  – Hebrew: 2.7 (average # words / sentence: 18)

(Real Hebrew) Morphological Ambiguity

• bzlm בצלם
  – bzelem (name of an association)
  – b-zalem (while taking a picture)
  – bzalam (their onion)
  – b-zila-m (under their shades)
  – b-zalam (in a photographer)
  – ba-zalam (in the photographer)
  – b-zelem (in an idol)
  – ba-zelem (in the idol)

Morphological Analysis

Given a written form, recover the following information:
• Lexical category (part-of-speech)
  – noun, verb, adjective, adverb, preposition…
• Inflectional properties
  – gender, number, person, tense, status…
• Affixes
  – Prefixes: מ ש ה ו כ ל ב (prepositions, conjunctions, definiteness)
  – Pronoun suffix: accusative, possessive, nominative

Morphological Analysis

Example: given the form בצלם, propose the following analyses:
  – proper-noun: בצלם
  – verb, infinitive: בצלם
  – noun, singular, masculine: בצל-ם
  – noun, singular, masculine: ב-צל-ם
  – noun, singular, masculine, absolute: ב-צלם
  – noun, singular, masculine, construct: ב-צלם
  – noun, definite, singular, masculine: ב-צלם
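To make the source of this ambiguity concrete, the sketch below enumerates prefix + stem + suffix splits of the transliterated form `bzlm` against a tiny lexicon. The lexicon entries, the category labels, and the `analyses` function are assumptions for illustration; this is not the authors' morphological analyzer, and it ignores inflectional features such as definiteness and construct state.

```python
# Toy transliterated lexicon (hypothetical; a real analyzer uses a full lexicon).
PREFIXES = {"", "b"}             # "b" = the preposition "in"
SUFFIXES = {"", "m"}             # "m" = the pronoun suffix "their"
STEMS = {
    "bzlm": ["proper-noun"],     # name of an association
    "bzl": ["noun"],             # onion
    "zl": ["noun"],              # shade
    "zlm": ["noun", "verb"],     # photographer or idol / taking a picture
}

def analyses(form):
    """Enumerate (prefix, stem, suffix, category) splits licensed by the toy lexicon."""
    results = []
    for i in range(len(form) + 1):
        for j in range(i, len(form) + 1):
            prefix, stem, suffix = form[:i], form[i:j], form[j:]
            if prefix in PREFIXES and suffix in SUFFIXES:
                for category in STEMS.get(stem, []):
                    results.append((prefix, stem, suffix, category))
    return results

for analysis in analyses("bzlm"):
    print(analysis)
```

Even this four-letter form yields several competing analyses, which is why choosing among them requires context.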

Morphological Disambiguation

A difficult task in Hebrew: given a written form, select in context the correct morphological analysis out of all possible analyses.

We have developed a successful* system to perform morphological disambiguation in Hebrew [Adler et al, ACL06, ACL07, ACL08].

* 93% for POS tagging and 90% for full morphological analysis; this system was used in this test.

Word Prediction in Hebrew

• We looked at word prediction as a sample task to show off the quality of our morphological disambiguator
• But first… we checked a simple baseline

Baseline: n-gram methods

• Check n-gram methods (unigram, bigram, trigram)
• Four sizes of selection menus: 1, 5, 7 and 9
• Various training sets of 1M, 10M and 27M words to learn the probabilities of n-grams
• Various genres
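A baseline of this kind can be evaluated by simulating a typist. The sketch below is one possible setup, not the authors' evaluation code: it assumes one keystroke per letter and one keystroke to pick a menu entry, ignores spaces, and ranks candidates by unigram frequency only. The training text and the message are toy data.

```python
from collections import Counter

def keystrokes_needed(message, unigrams, menu_size):
    """Fraction of keystrokes needed when picking words from a prediction menu.

    Assumptions (not from the slides): one keystroke per letter, one keystroke
    to pick a menu entry, spaces ignored, unigram ranking only.
    """
    ranked = [w for w, _ in unigrams.most_common()]
    actual = expected = 0
    for word in message:
        expected += len(word)
        for typed in range(len(word) + 1):
            # Menu offered after `typed` letters of the word have been entered.
            menu = [w for w in ranked if w.startswith(word[:typed])][:menu_size]
            if word in menu and typed < len(word):
                actual += typed + 1      # letters typed so far + one selection
                break
        else:
            actual += len(word)          # never offered in time: typed in full
    return actual / expected

# Toy training text and test message (hypothetical data).
unigrams = Counter("the bird saw the bear and the bear saw the zoo".split())
message = "the bear saw the bird".split()
for menu_size in (1, 5, 7, 9):
    ratio = keystrokes_needed(message, unigrams, menu_size)
    print(menu_size, f"{ratio:.0%}")
```

A larger menu can only reduce the ratio under these assumptions, at the price of a longer list for the user to scan.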

Prediction results using n-grams only

[Chart not preserved: keystrokes needed to enter a message, in % (smaller is better)]

Trigram model trained on the 27M-word corpus: very good results!

Adding Syntactic Information

P(w_n | w_1,…,w_{n-1}) = λ1·P(w_{n-i},…,w_n | LM) + λ2·P(w_1,…,w_n | μ)

  – μ is the morpho-syntactic HMM (morphological disambiguator)
  – Combine P(w_1,…,w_n | μ) with the probabilistic language model LM in order to rank each word candidate given the previously typed words.
  – If the user typed "I saw" and the next-word candidates are {him, hammer}, we use the HMM to calculate p(I saw him | μ) and p(I saw hammer | μ) in order to tune the probability given by the n-gram.

* Trained on a 1M-word corpus.
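The interpolation above can be sketched as a weighted sum used to re-rank candidates. The weights and probabilities below are invented for the example; the slides do not give the λ values actually used.

```python
def combined_score(p_lm, p_hmm, lambda_lm=0.7, lambda_hmm=0.3):
    """Linear interpolation: lambda_lm * P(. | LM) + lambda_hmm * P(. | mu)."""
    return lambda_lm * p_lm + lambda_hmm * p_hmm

def rank(candidates, lm_probs, hmm_probs, **weights):
    """Sort candidates by the interpolated score, best first."""
    return sorted(
        candidates,
        key=lambda w: combined_score(lm_probs[w], hmm_probs[w], **weights),
        reverse=True,
    )

# Made-up probabilities for the slide's example "I saw {him, hammer}".
lm_probs = {"him": 0.020, "hammer": 0.001}     # n-gram model score
hmm_probs = {"him": 0.004, "hammer": 0.0005}   # p(I saw <w> | mu)
ranking = rank(["hammer", "him"], lm_probs, hmm_probs)
print(ranking)
```

Setting `lambda_hmm=0` recovers the pure n-gram ranking, which makes it easy to compare the two configurations.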

Results with morpho-syntactic knowledge

Model sequences of parts of speech with morphological features

[Charts not preserved: results with morpho-syntactic knowledge vs. results w/o syntactic knowledge]

Some Notes on Results

• n-grams perform very well (high level of keystroke saving)
• High rate for all genres
• And the expected:
  – Better prediction when trained on more data
  – Better prediction with trigrams
  – Better prediction with larger window
• Morpho-syntactic information did not improve results (in fact, it hurt!)

Conclusion

• Statistical data on a language with rich morphology yields good results (keystrokes needed, in %; smaller is better)
  – down to 29% with nine word proposals
  – 34% for seven proposals
  – 54% for a single proposal
• Syntactic information did not improve the prediction.
• Explanation: morphology did not improve results because p(w1,…,wn|μ) is computed on an unfinished sentence


תודה

Thank you

Technical Information

• CMU – N-grams
• Storage – Berkeley DB to store knowledge for WP: mapping n-grams
• More questions on technology – [email protected]