TEACHING COMPUTERS TO READ WORD MORPHOLOGY

TEACHING COMPUTERS TO READ WORD MORPHOLOGY. RESEARCH PAPERS Book cites: “Viewing Morphology as an Inference Process” by Krovetz, SIGIR 1993 This is cited


TEACHING COMPUTERS TO READ

WORD MORPHOLOGY


RESEARCH PAPERS

• Book cites: “Viewing Morphology as an Inference Process” by Krovetz, SIGIR 1993
• This is cited by: “Guessing Morphology from Terms and Corpora” by Jacquemin, SIGIR 1997

• When are different words the same word?


PORTER STEMMING

Multi-step process to remove word suffixes
• Stem
• Stems
• Stemmed
• Stemming
• -ology
• -ize
• -ship
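The idea of multi-step suffix removal can be sketched in a few lines. This is a simplified illustration, not the actual Porter rule set: the `STEPS` table, the `MIN_STEM` threshold, and the function name `strip_suffixes` are assumptions made for this example, using only the suffixes listed on the slide.

```python
# Toy multi-step suffix stripper. NOT the real Porter algorithm:
# the rule table and thresholds below are illustrative assumptions.
STEPS = [
    ["ing", "ed", "s"],        # step 1: inflectional endings
    ["ology", "ize", "ship"],  # step 2: derivational endings from the slide
]
MIN_STEM = 3  # never leave a stem shorter than this


def strip_suffixes(word):
    stem = word.lower()
    for suffixes in STEPS:
        for suffix in suffixes:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                break  # at most one rule fires per step
    # Collapse a doubled final consonant ("stemm" -> "stem"),
    # except for l and s, which are often doubled in base forms.
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
        stem = stem[:-1]
    return stem


for w in ["stem", "stems", "stemmed", "stemming", "morphology", "friendship"]:
    print(w, "->", strip_suffixes(w))
```

With these toy rules, all four forms of "stem" reduce to the same string, which is the behaviour a stemmer is after.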


STEMMING PROBLEMS

Derivation – Meaning
• Doe
• Donut
• Paste
• Pastafarian

Inflection – Syntax
• Do
• Doing
• Done
• Past (n)
• Past (v)


INFLECTIONAL STEMMING

Inflectional suffixes are safe to remove… usually
• Plural: s, es, ies 57%
• Tense: ed 22%
• Aspect: ing 21%
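A minimal sketch of inflection-only stripping, limited to the three suffix groups on the slide. The function name `strip_inflection`, the length thresholds, and the list of sibilant endings are assumptions for illustration, not rules taken from either paper.

```python
def strip_inflection(word):
    """Remove only plural, tense, and aspect endings (toy rules)."""
    w = word.lower()
    # Plural -ies: "queries" -> "query"
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    # Plural -es after a sibilant: "boxes" -> "box"
    if w.endswith("es") and w[:-2].endswith(("s", "x", "z", "ch", "sh")):
        return w[:-2]
    # Plural -s, skipping endings that are rarely plurals: "terms" -> "term"
    if w.endswith("s") and not w.endswith(("ss", "us", "is")) and len(w) > 3:
        return w[:-1]
    # Tense -ed and aspect -ing, keeping at least three letters of stem
    for suffix in ("ed", "ing"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


for w in ["queries", "boxes", "terms", "indexed", "indexing", "corpus", "string"]:
    print(w, "->", strip_inflection(w))
```

The last word shows why the slide says "usually": under these toy rules "string" loses its final three letters even though "-ing" is not a suffix there.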


DERIVATIONAL STEMMING

Words that change meaning if they are stemmed.
• Appreciate vs. Appreciation
• Two-thirds of derivational variants appear in the dictionary

• Krovetz’s solution is to leave dictionary words alone
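The "leave dictionary words alone" idea might look like the sketch below. It is loosely inspired by the slide, not a reproduction of Krovetz's stemmer: the tiny `DICTIONARY`, the `SUFFIXES` tuple, and the function name `dictionary_stem` are all hypothetical.

```python
# Toy lexicon standing in for a real machine-readable dictionary (assumption).
DICTIONARY = {"appreciate", "appreciation", "stem", "govern", "government"}
SUFFIXES = ("ation", "ment", "ing", "ed", "s")  # illustrative, not exhaustive


def dictionary_stem(word, dictionary=DICTIONARY):
    w = word.lower()
    if w in dictionary:
        return w  # a dictionary word is left alone
    for suffix in SUFFIXES:
        if w.endswith(suffix):
            candidate = w[: -len(suffix)]
            # Accept a stripped form only if the dictionary confirms it,
            # also trying to restore a dropped final "e".
            for form in (candidate, candidate + "e"):
                if form in dictionary:
                    return form
    return w  # unknown word with no confirmed stem: unchanged


for w in ["appreciation", "appreciated", "appreciates", "stems", "zorbing"]:
    print(w, "->", dictionary_stem(w))
```

Here "appreciation" survives intact because it is itself an entry, while "appreciated" and "appreciates" fall back to the entry "appreciate".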


INFERRING MORPHOLOGY

• Jacquemin asserts morphology can be derived from the corpus

1. Word truncation
2. Multi-word term conflation
3. Classification & filtering
4. Clustering


STATISTICAL WEIGHTING

• Different segments of terms are given different statistical weights


WORD CLASSIFICATION

• Jacquemin’s algorithm allows error in conflation
• Errors are filtered statistically
• Rare and domain-specific terms are conflated
• Gene rearrangement / Genetic rearrangement
• Artificial ventilation / Artificially ventilated
• North Africa / Northern Africa
• Cirrhosis / Cirrhosia
• Pulsating flow / pulsatile flow

• The algorithm acts like a snap-to-grid for text
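The term pairs above can be grouped with a rough prefix-matching sketch. This only illustrates truncation-based conflation of multi-word terms; it omits the statistical filtering step entirely and is not Jacquemin's algorithm. The names `same_term` and `conflate` and the `min_prefix=4` threshold are assumptions chosen for this example.

```python
def common_prefix_len(a, b):
    """Length of the shared leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def same_term(t1, t2, min_prefix=4):
    """Two terms match if they have the same number of words and each
    aligned word pair is identical or shares at least min_prefix letters."""
    w1, w2 = t1.lower().split(), t2.lower().split()
    if len(w1) != len(w2):
        return False
    return all(
        a == b or common_prefix_len(a, b) >= min_prefix
        for a, b in zip(w1, w2)
    )


def conflate(terms, min_prefix=4):
    """Greedy grouping: each term joins the first cluster whose
    representative (first member) it matches, else starts a new one."""
    clusters = []
    for term in terms:
        for cluster in clusters:
            if same_term(term, cluster[0], min_prefix):
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


terms = [
    "Gene rearrangement", "Genetic rearrangement",
    "Artificial ventilation", "Artificially ventilated",
    "North Africa", "Northern Africa",
    "Pulsating flow", "pulsatile flow",
]
for cluster in conflate(terms):
    print(" / ".join(cluster))
```

A threshold this permissive will also produce false matches on a real corpus, which is where the statistical filtering mentioned on the slide would come in.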