Fine-Grained Soft Semantic Constraints
Yuval Marton
University of Maryland
http://umiacs.umd.edu/~ymarton/pub/umanch/Hybrid Knowledge-CorpusBasedSem-Manchester_090614.ppt
Yuval Marton, U Manchester talk 2
Why Care?
Tell ’em apart:
These, too:
FOX
• FOX =
• FOX = FOrkhead/winged-heliX replicator gene
Road map
• Brief overview of doctoral work
• Hybrid knowledge / corpus-based semantic similarity methods
– Pure and hybrid methods
– Hard and soft constraints
– Fine-grained
– Named-entities
Dissertation Theme
• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Syntactic and Semantic Constraints
– Soft Constraints
– Fine-Grained
– Syntactic (parsing)
– Semantic (“concepts”, paraphrases)
• Evaluated in
– Word-pair similarity ranking and
– Statistical Machine Translation (SMT)
Soft Constraints
• Hard constraints
– {0, 1}; in/out
– Decrease search space
– “Structural zeroes”
– Theory-driven
– Faster, slimmer
• Soft constraints
– [0, 1]; fuzzy
– Only bias the model
– Data-driven: let patterns emerge
[Figure: two panels labeled “Univ. Hard” and “Univ. Soft”]
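The contrast between the two kinds of constraint can be sketched in a few lines. This is only an illustration under assumed names: `hard_filter`, `soft_bias`, the candidate labels and all scores are hypothetical and do not come from the talk.

```python
def hard_filter(candidates, allowed):
    """Hard constraint: candidates outside `allowed` leave the search space."""
    return {c: s for c, s in candidates.items() if c in allowed}


def soft_bias(candidates, preference, weight=1.0):
    """Soft constraint: every candidate survives; preferred ones get a bonus."""
    return {c: s + weight * preference.get(c, 0.0) for c, s in candidates.items()}


# Hypothetical model scores for three candidates
scores = {"A": 0.9, "B": 0.6, "C": 0.5}

hard = hard_filter(scores, allowed={"B", "C"})
soft = soft_bias(scores, preference={"B": 1.0, "C": 1.0}, weight=0.2)

print(hard)  # "A" is gone, whatever its score was
print(soft)  # "A" is still present and can still win
```

With the hard constraint, "A" is a structural zero. With the soft constraint, "B" and "C" are only nudged upward, so a strong enough pattern in the data can still override the bias.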
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Neg results → pos results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Pos results → better results
Soft Syntactic Constraints
• X → X1 speech ||| X1 espiche
– What should be the span of X1?
• Chiang’s 2005 constituency feature
– Reward rule’s score if rule’s source-side matches a constituent span
– Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward)
– Good idea, but negative result
• But what if…
Rule granularity
• Chiang: Single weight for all constituents (parse tags)
• … But what if we can assign a separate feature and weight for each constituent?
• E.g., NP-only: (NP= )
• Or VP-only: (VP= )
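The difference between one shared constituency feature and one feature per constituent label can be sketched as follows. The feature names follow the slide's "NP=" and "VP=" notation; the parse, the spans and the weights are hypothetical, and this is not the decoder code used in the cited work.

```python
def coarse_feature(span, constituents):
    """One feature for all labels: fires when the span matches any constituent."""
    return {"constituent": 1.0} if span in constituents else {}


def fine_features(span, constituents):
    """One feature per label, e.g. 'NP=' or 'VP=', each with its own weight."""
    label = constituents.get(span)
    return {f"{label}=": 1.0} if label else {}


def score(features, weights):
    """Weighted sum of the features that fired."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical source-side parse: (start, end) span -> constituent label
constituents = {(0, 2): "NP", (2, 5): "VP"}
# Hypothetical tuned weights
weights = {"constituent": 0.1, "NP=": 0.4, "VP=": -0.2}

np_coarse = score(coarse_feature((0, 2), constituents), weights)
np_fine = score(fine_features((0, 2), constituents), weights)
vp_fine = score(fine_features((2, 5), constituents), weights)
no_match = score(fine_features((1, 3), constituents), weights)
```

With separate weights, tuning is free to reward NP matches and penalize VP matches, which a single shared weight cannot express.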
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Neg results → pos results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Pos results → better results
Word-pair similarity ranking
• Give each word pair a similarity score
– Rooster – voyage
– Coast – shore
• Noun-noun (Rubenstein & Goodenough, 1965)
• Verb-verb (Resnik & Diab, 2000)
• Result: list of pairs ordered by similarity
• Spearman rank correlation
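The evaluation step, comparing a model's ranking of word pairs against human judgments, can be sketched in plain Python. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank. The two score lists are hypothetical and are not taken from either dataset.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for four word pairs (same pair order in both lists)
human = [0.1, 3.6, 2.0, 1.2]
model = [0.05, 0.80, 0.30, 0.40]
rho = spearman(human, model)
```

Only the ordering matters: the model's scores may be on any scale, as long as they rank the pairs the way the human judges do.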
Similarity measures
• Distributional profiles (DP)
– Which words did I occur next to?
• Context vectors
• Similar vectors → similar meaning
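A distributional profile in its simplest form is a table of co-occurrence counts. The sketch below assumes a symmetric window of two tokens and a two-sentence toy corpus; the window size and the sentences are illustrative choices, not settings from the talk.

```python
from collections import Counter, defaultdict


def distributional_profiles(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    profiles = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[word][tokens[j]] += 1
    return profiles


# Toy corpus (hypothetical)
corpus = [
    "the bank approved the loan".split(),
    "the river bank was muddy".split(),
]
dp = distributional_profiles(corpus)
print(dp["bank"])
```

Note that the profile of "bank" mixes collocates from both of its senses, which is the issue the later slides address.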
Bank (pure word-based)
Bank
Bank (pure concept-based)
[Figure: concept “Financial Institution” (Bank, Teller, Money, …) and concept “Water” (River, Bank, Water, …)]
– Compare closest senses
– Bank_river = water ??
Bank (Hybrid Model)
[Figure: Bank_River, Bank_Fin.Inst]
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Neg results → pos results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Pos results → better results
Unified Model
• Soft constraints in a log-linear model
– Syntactic
– Semantic
– …
• $\sum_i \lambda_i h_i(x)$
• Add more terms to the sum
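The "add more terms to the sum" idea can be sketched as a dictionary of feature functions h_i with weights λ_i. The candidates, feature names, feature values and weights below are hypothetical, chosen only to show how one extra soft constraint can change the winner.

```python
import math


def loglinear_score(x, features, weights):
    """Compute sum_i lambda_i * h_i(x) over the named feature functions."""
    return sum(weights[name] * h(x) for name, h in features.items())


def normalize(scores):
    """Turn unnormalized log-linear scores into probabilities."""
    z = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / z for k, s in scores.items()}


# Hypothetical candidates with precomputed feature values
candidates = {
    "a": {"lm": -1.0, "matches_np": True, "sem_sim": 0.1},
    "b": {"lm": -0.8, "matches_np": False, "sem_sim": 0.9},
}

# Base model plus one soft syntactic constraint
features = {
    "lm": lambda x: x["lm"],
    "NP=": lambda x: 1.0 if x["matches_np"] else 0.0,
}
weights = {"lm": 1.0, "NP=": 0.5}
before = {k: loglinear_score(x, features, weights) for k, x in candidates.items()}

# A soft semantic constraint is just one more term in the sum
features["sem"] = lambda x: x["sem_sim"]
weights["sem"] = 2.0
after = {k: loglinear_score(x, features, weights) for k, x in candidates.items()}
probs = normalize(after)
```

Because every constraint enters as a weighted term, syntactic and semantic constraints share one model, and their weights can be tuned together.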
Road map
• Brief overview of doctoral work
• Hybrid knowledge / corpus-based semantic similarity methods
– Pure and hybrid methods
– Hard and soft constraints
– Fine-grained
– Named-entities
Distributional profiles (DPs)
• DPW: word-based distributional profile
– First order
– Distributional Hypothesis (Harris 1940; Firth 1957)
– Second order (vector representation)
– Strength of association
• Counts, PMI, TF/IDF-based, log-likelihood ratios …
– Vector similarity (cosine, L1, L2, …)

word × word    Bush   Obama
President      .93    .96
Democrat       .13    .89
Republican     .88    .15
Whitehouse     .76    .91
…              .45    .74
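Two of the ingredients named above, PMI as a strength-of-association measure and cosine as a vector similarity, can be sketched as follows. The `bush` and `obama` vectors reuse the four named rows of the table (the elided "…" row is left out); the counts passed to `pmi` are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw co-occurrence counts."""
    return math.log2((count_xy * total) / (count_x * count_y))


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Columns of the word × word table above (named rows only)
bush = {"president": 0.93, "democrat": 0.13, "republican": 0.88, "whitehouse": 0.76}
obama = {"president": 0.96, "democrat": 0.89, "republican": 0.15, "whitehouse": 0.91}

sim = cosine(bush, obama)
independent = pmi(count_xy=10, count_x=100, count_y=100, total=1000)
```

The two profiles agree on "president" and "whitehouse" and differ on the party terms, so their cosine is high but clearly below 1.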
Taxonomies and Groupings
• WordNet
– Synsets
– Relations (“is-a”)
– Arc distance
• UMLS
• Thesaurus
– Flat
– Coarse
– Bank_river = water ??
[Figure: “is-a” hierarchy. Postdoc is-a Academic job is-a job; CEO is-a Industry job is-a job]
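Arc distance in an "is-a" hierarchy is the number of edges on the path between two nodes through their lowest common ancestor. The sketch below encodes the job hierarchy from the figure as a child-to-parent map, which assumes that "Postdoc" sits under "Academic job" and "CEO" under "Industry job". It is not a WordNet API.

```python
# Child -> parent ("is-a") links, as read from the figure
PARENT = {
    "academic job": "job",
    "industry job": "job",
    "postdoc": "academic job",
    "ceo": "industry job",
}


def path_to_root(node, parent):
    """List the node and all of its ancestors, nearest first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def arc_distance(a, b, parent):
    """Number of is-a arcs between a and b via their lowest common ancestor."""
    depth_a = {n: d for d, n in enumerate(path_to_root(a, parent))}
    for depth_b, n in enumerate(path_to_root(b, parent)):
        if n in depth_a:
            return depth_a[n] + depth_b
    return None  # no common ancestor


print(arc_distance("postdoc", "ceo", PARENT))
print(arc_distance("postdoc", "academic job", PARENT))
```

A flat, coarse thesaurus has no such path structure. All it can say is whether two words are listed under the same concept.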
Hybrid measures
• WordNet
– Resnik’s method (info content)
– Lin and others
• Thesaurus concept-based
– Mohammad and Hirst (coarse-grained)
– Word may be listed under several concepts
– Distance between most similar senses
– Pro: resource-poor languages and domains
– Con: small thesaurus → low applicability
– WCCM: Financial instit. ~ academic instit.
– Bank_river = water ??
WCCM: Concept-Word matrix
• WCCM: word-concept collocation matrix
• DPC: concept-based distributional profile
• Potentially iterative process
• Clean-up
concept × word   Fin.Inst   Water
bank             .97        .85
teller           .88        .07
money            .94        .15
water            .32        .91
…                .45        .74
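One way to picture a first pass over the corpus: each time a word occurs near a thesaurus-listed word, it is credited to every concept that listed word belongs to. The sketch below uses raw counts rather than association scores, and the thesaurus, the window size and the sentences are hypothetical. It should be read as an illustration, not as the exact procedure behind the matrix above.

```python
from collections import Counter, defaultdict

# Hypothetical thesaurus: word -> concepts it is listed under
THESAURUS = {
    "bank": ["Fin.Inst", "Water"],
    "teller": ["Fin.Inst"],
    "money": ["Fin.Inst"],
    "river": ["Water"],
}


def build_wccm(sentences, thesaurus, window=2):
    """Count, per concept, the words co-occurring with any word of that concept."""
    wccm = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for concept in thesaurus.get(tokens[j], []):
                    wccm[concept][word] += 1
    return wccm


corpus = [
    "the teller counted money".split(),
    "the river water rose".split(),
    "the bank teller smiled".split(),
]
wccm = build_wccm(corpus, THESAURUS)
print(wccm["Fin.Inst"])
print(wccm["Water"])
```

In the third sentence the ambiguous "bank" credits "teller" to the Water concept as well as to Fin.Inst. Noise of this kind is what a clean-up step or a further iteration would aim to reduce.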
Use concept-based DPCs to bias word-based DPWs
[Figure: the word-based profile of Bank plus (+) the concept-based profiles Financial Institution (Bank, Teller, Money, …) and Water (River, Bank, Water, …) gives (=) the sense profiles Bank_River and Bank_Fin.Inst]
– Compare closest senses
– Bank_river = water ??
Fine-grained soft constraints
• DPWS: distributional profile of word senses
• Use concept-based DPCs to bias word-based DPWs
– Hybrid-filtered
– Hybrid-proportional
Hybrid-filtered
collocate   Fin.Inst DPC   Water DPC   bank DPW   bank_river DPWS
bank        .97            .85         .76        .76
teller      .88            .07         .54        .54
money       .94            .15         .68        .68
water       .00            .91         .62        .00
…           .45            .74         .25        .25

Filter out collocates in the DPW if they do not appear in the DPC
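The filtering rule can be sketched directly on the named rows of the table. Treating a DPC value of zero as "not appearing in the DPC" is an assumption, and `hybrid_filtered` is a hypothetical name. With the numbers as given, filtering the DPW against the Fin.Inst column yields the values shown in the table's last column.

```python
def hybrid_filtered(dpw, dpc):
    """Keep a collocate's DPW value only if the collocate appears in the DPC."""
    return {w: (v if dpc.get(w, 0.0) > 0.0 else 0.0) for w, v in dpw.items()}


# Named rows of the table above (the elided "…" row is left out)
bank_dpw = {"bank": 0.76, "teller": 0.54, "money": 0.68, "water": 0.62}
fin_inst_dpc = {"bank": 0.97, "teller": 0.88, "money": 0.94, "water": 0.00}
water_dpc = {"bank": 0.85, "teller": 0.07, "money": 0.15, "water": 0.91}

by_fin_inst = hybrid_filtered(bank_dpw, fin_inst_dpc)
by_water = hybrid_filtered(bank_dpw, water_dpc)
print(by_fin_inst)
print(by_water)
```

The filter is all-or-nothing per collocate: "teller" keeps its full weight under the Water concept even though its Water DPC value is only .07.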
Hybrid-proportional
collocate   Fin.Inst DPC   Water DPC   bank DPW   bank_river DPWS
bank        .97            .85         .76        .33
teller      .88            .07         .54        .05
money       .94            .15         .68        .08
water       .00            .91         .62        .00
…           .45            .74         .25        .15

Only discount a collocate’s value in the DPW, in proportion to the ratio of its count in the current DPC relative to all DPCs of the target word
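The proportional rule can be sketched as a per-collocate ratio. The sketch uses the table's DPC values as stand-ins for counts, which is an assumption, and `hybrid_proportional` is a hypothetical name. The figures it computes are therefore not expected to match the slide's illustrative last column digit for digit.

```python
def hybrid_proportional(dpw, dpcs, sense):
    """Scale each DPW value by the share of the collocate's mass in one DPC."""
    result = {}
    for w, v in dpw.items():
        total = sum(dpc.get(w, 0.0) for dpc in dpcs.values())
        ratio = dpcs[sense].get(w, 0.0) / total if total else 0.0
        result[w] = v * ratio
    return result


# Named rows of the table above (the elided "…" row is left out)
bank_dpw = {"bank": 0.76, "teller": 0.54, "money": 0.68, "water": 0.62}
dpcs = {
    "Fin.Inst": {"bank": 0.97, "teller": 0.88, "money": 0.94, "water": 0.00},
    "Water": {"bank": 0.85, "teller": 0.07, "money": 0.15, "water": 0.91},
}

bank_fin_inst = hybrid_proportional(bank_dpw, dpcs, "Fin.Inst")
bank_river = hybrid_proportional(bank_dpw, dpcs, "Water")
print(bank_fin_inst)
print(bank_river)
```

Unlike the filter, the discount is graded: "teller" keeps only a small fraction of its weight in the river sense, and the sense profiles of a word add up to its original DPW.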
WSD with DPWS
• Each sense of each word has a unique profile
– Bank_fin.inst ≠ Bank_river ≠ water!
• Pro:
– Not aggregated (DPC profiles are)
– No or less smearing (DPW profiles smear all senses into a single profile)
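With one profile per word sense, two words can be compared through their closest pair of senses. The sketch below takes the maximum cosine over all sense pairs; the sense labels and profile values are hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def closest_senses(senses_a, senses_b):
    """Return (similarity, sense_a, sense_b) for the most similar sense pair."""
    return max(
        (cosine(pa, pb), sa, sb)
        for (sa, pa), (sb, pb) in product(senses_a.items(), senses_b.items())
    )


# Hypothetical sense profiles (DPWS) for two words
bank = {
    "fin.inst": {"money": 0.68, "teller": 0.54},
    "river": {"water": 0.62, "boat": 0.30},
}
shore = {"coast": {"water": 0.70, "boat": 0.20, "sand": 0.50}}

sim, bank_sense, shore_sense = closest_senses(bank, shore)
print(bank_sense, shore_sense, round(sim, 2))
```

Picking the winning pair also disambiguates: next to "shore", "bank" resolves to its river sense, because the money collocates play no part in the comparison.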
Results
Evaluation
• Word-pair similarity ranking
– Spearman rank correlation
• Paraphrasing in SMT
– BLEU, TER, METEOR, …
Comparison
• WordNet results
• LSA results
Challenges
• Antonyms (black – white)
• “Hyperonyms” (vehicle – car)
• Co-hypernyms / co-taxonyms
Named Entities
• Challenges:
– Bush – Obama
• Potentially helpful:
– H2O – Water
– FOX – “forkhead/winged-helix replicator”
– FOXP2 – SPCH1
• “SPCH1” turned out to be a member of the FOX (forkhead/winged-helix replicator genes) family, of which several other genes are known all across the animal world. It was then labeled FOXP2, that being its current, and more conventional, name.
Biomedical/Chemical WSD
• Explore hybrid methods to create DPWS
– FOX_gene, FOX_animal
• Requires a lexical resource
– UMLS or other resources
• Useful for smaller training sets!
Conclusion
• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Constraints
– Soft Constraints
– Fine-Grained
– Semantic (“concepts”)
– Resource-poor settings, special domains
[Figure: “Univ. Soft” panel; Bank_River, Bank_Fin.Inst]
Thank you!
Questions?
Advisors: Philip Resnik & Amy Weinberg
Department of Linguistics and CLIP Lab
Fine-grained semantic
• Word-based:
– Bank: river, money, water, teller, …
• “Concept”-based
– River: water, bank, boat, …
– Financial institution: bank, money, teller, …
– Humans compare closest senses
– Bank_river = water ??
• Hybrid:
– Bank_river: more strongly associated with water
– Bank_fin.inst: more strongly associated with money
SMT
• Statistical Machine Translation
– What translational units to use?
– Syntactic constituents, re-ordering
– “es gibt”
• Paraphrases
– Pivoting vs. bitext-free paraphrasing
– Typically monolingual
– Translation = bilingual / cross-domain paraphrasing
– Can be evaluated in SMT