austin-perkins
TEACHING COMPUTERS TO READ
WORD MORPHOLOGY
RESEARCH PAPERS
• The book cites “Viewing Morphology as an Inference Process” by Krovetz, SIGIR 1993
• That paper is cited by “Guessing Morphology from Terms and Corpora” by Jacquemin, SIGIR 1997
• When are different words the same word?
PORTER STEMMING
A multi-step process that removes word suffixes
• Stem
• Stems
• Stemmed
• Stemming
• -ology
• -ize
• -ship
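To make the "multi-step" idea concrete, here is a minimal sketch of only the first step of Porter's algorithm (Steps 1a and 1b, which handle plural -s and the -ed/-ing endings). It is a partial illustration, not a complete Porter stemmer. Later steps, which are not shown, strip longer derivational suffixes such as -ize. The helper names (`porter_step1`, `tidy_stem`, and so on) are hypothetical.

```python
def is_consonant(word, i):
    """Porter's definition: not a, e, i, o, u; 'y' counts as a consonant
    only at the start of a word or after a vowel."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """m in the form [C](VC)^m[V]: count vowel-to-consonant transitions."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem):
    return len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem, len(stem) - 1)


def ends_cvc(stem):
    """Consonant-vowel-consonant ending whose last letter is not w, x or y."""
    n = len(stem)
    return (
        n >= 3
        and is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def step1a(word):
    """Plurals: sses -> ss, ies -> i, ss -> ss, s -> ''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def tidy_stem(stem):
    """Repairs applied after -ed or -ing has been removed."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"        # conflat -> conflate
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]         # stemm -> stem
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"        # fil -> file
    return stem


def step1b(word):
    """(m>0) eed -> ee; (*v*) ed -> ''; (*v*) ing -> ''."""
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return tidy_stem(stem) if contains_vowel(stem) else word
    return word


def porter_step1(word):
    """Apply Steps 1a and 1b only; very short words are left alone."""
    word = word.lower()
    if len(word) <= 2:
        return word
    return step1b(step1a(word))
```

With this sketch, "stems", "stemmed" and "stemming" all reduce to "stem". The measure and consonant-vowel-consonant checks are what stop it from mangling words like "feed" or "falling".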
STEMMING PROBLEMS
Derivation – Meaning
• Doe / Donut
• Paste / Pastafarian
Inflection – Syntax
• Do / Doing / Done
• Past (n) / Past (v)
INFLECTIONAL STEMMING
Inflectional suffixes are safe to remove… usually
• Plural: s, es, ies (57%)
• Tense: ed (22%)
• Aspect: ing (21%)
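The "usually" caveat can be handled by accepting a stripped form only when it is a known word. The sketch below assumes a hypothetical toy `LEXICON` in place of a real dictionary. It tries the plural, tense and aspect suffixes from the slide, longest plural ending first, and reports which category matched.

```python
# Hypothetical mini-lexicon standing in for a real dictionary.
LEXICON = {"pony", "box", "horse", "stem", "walk", "sing"}

# (suffix, replacement, category), tried in order.
INFLECTIONAL_RULES = [
    ("ies", "y", "plural"),
    ("es", "", "plural"),
    ("s", "", "plural"),
    ("ed", "", "tense"),
    ("ing", "", "aspect"),
]


def strip_inflection(word, lexicon=LEXICON):
    """Return (stem, category), or (word, None) if no rule yields a known word."""
    word = word.lower()
    for suffix, replacement, category in INFLECTIONAL_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in lexicon:   # only accept real words
                return candidate, category
    return word, None
```

Checking candidates against the lexicon means "horses" falls through the bad "-es" split ("hors") to "horse". It also means "sing" is never cut down to "s". Consonant doubling ("stemmed" to "stem") would need an extra rule that this sketch does not include.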
DERIVATIONAL STEMMING
Words whose meaning changes when they are stemmed
• Appreciate vs. Appreciation
• Two-thirds of derivational variants appear in the dictionary
• Krovetz’s solution is to leave dictionary words alone
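Here is a minimal sketch of the rule as stated on the slide: a word found in the dictionary is returned unchanged, and a derivational suffix is stripped only if the result is itself a dictionary word. This is not Krovetz's actual stemmer. `DICTIONARY` and the three `DERIVATIONAL_RULES` are hypothetical toy data.

```python
# Hypothetical toy dictionary; a real system would load a full lexicon.
DICTIONARY = {"appreciate", "appreciation", "govern", "nation", "kind"}

# A few derivational suffixes and the ending to restore (illustrative only).
DERIVATIONAL_RULES = [
    ("ation", "ate"),
    ("ment", ""),
    ("ness", ""),
]


def derivational_stem(word, dictionary=DICTIONARY):
    word = word.lower()
    # Dictionary words are left alone, so their distinct meaning is kept.
    if word in dictionary:
        return word
    for suffix, replacement in DERIVATIONAL_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in dictionary:   # only conflate to a real word
                return candidate
    return word
```

"appreciation" and "nation" survive intact because they are listed, whereas blind "-ation" to "-ate" rewriting would turn "nation" into "nate". Because the toy dictionary is so small, "government" and "kindness" are absent from it and therefore fall through to suffix stripping.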
INFERRING MORPHOLOGY
• Jacquemin asserts morphology can be derived from the corpus
1. Word truncation
2. Multi-word term conflation
3. Classification & filtering
4. Clustering
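Steps 1 and 2 can be sketched roughly as follows, under stated assumptions and not as Jacquemin's actual procedure. Each word is truncated to a fixed prefix (the hypothetical `PREFIX_LEN = 4`). Two multi-word terms are then grouped when they have the same number of words and those words agree position by position after truncation.

```python
from collections import defaultdict

PREFIX_LEN = 4  # hypothetical truncation length


def truncate(word, k=PREFIX_LEN):
    return word.lower()[:k]


def term_key(term, k=PREFIX_LEN):
    """Key for a multi-word term: the tuple of its words' truncated prefixes."""
    return tuple(truncate(w, k) for w in term.split())


def conflate(terms, k=PREFIX_LEN):
    """Group terms whose words match position by position after truncation."""
    groups = defaultdict(list)
    for term in terms:
        groups[term_key(term, k)].append(term)
    # Only groups with at least two members are candidate variants.
    return [group for group in groups.values() if len(group) > 1]
```

A fixed prefix length also produces false matches. For example, "genetic code" and "general code" would share the key ("gene", "code"), which is the kind of error the classification and filtering step has to remove.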
STATISTICAL WEIGHTING
• Different segments of terms are given different statistical weights
WORD CLASSIFICATION
• Jacquemin’s algorithm tolerates errors in conflation
• Errors are filtered statistically
• Rare and domain-specific terms are conflated:
  • Gene rearrangement / Genetic rearrangement
  • Artificial ventilation / Artificially ventilated
  • North Africa / Northern Africa
  • Cirrhosis / Cirrhosia
  • Pulsating flow / Pulsatile flow
• The algorithm acts like a snap-to-grid for text
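One simple way to "filter errors statistically" is shown below. This is a swapped-in illustration, not Jacquemin's published filter. For each candidate word pair, it takes the endings that remain after the longest common prefix. A pair is kept only if that ending pattern recurs at least `min_count` times across all candidates. Both `min_count` and the sample pairs are hypothetical.

```python
from collections import Counter


def suffix_pair(a, b):
    """Endings of a and b left over after their longest common prefix."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[i:], b[i:]


def filter_pairs(pairs, min_count=2):
    """Keep candidate pairs whose ending pattern is seen at least min_count times."""
    counts = Counter(suffix_pair(a, b) for a, b in pairs)
    return [(a, b) for a, b in pairs if counts[suffix_pair(a, b)] >= min_count]
```

In a sample containing artificial/artificially, natural/naturally, north/northern, east/eastern and gene/general, the recurring patterns ("", "ly") and ("", "ern") survive. The one-off ("", "ral") from gene/general is discarded. A threshold this crude will also drop genuine variants whose pattern happens to be rare in a small sample.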