Fine-Grained LinguisticSoft Constraints on Statistical
Natural Language Processing Models
Yuval MartonPh.D. Dissertation DefenseDepartment of Linguistics
University of Maryland
(Diagram: a unified corpus-based model with soft linguistic constraints, linking syntactic constraints (parsing) in statistical machine translation and semantic constraints (phrases) in statistical machine translation.)
Yuval Marton, Dissertation Defense 2
Dissertation Theme
• Hybrid knowledge/corpus-based statistical NLP models using fine-grained linguistic soft constraints
Syntactic(Parsing)
in stat. machine translation
Semantic(Words)
in word-pair similarity tasks
Semantic(Phrases)
in stat. machine translation
Pure vs. Hybrid Models
• Pure models
  – Corpus-based, data-driven, distributional, statistical
    • Statistical Machine Translation
    • Distributional Profiles (Context Vectors)
  – Manually-crafted linguistic knowledge (rules, word grouping by concept), theory-driven
    • Rule-based / syntax-driven machine translation
    • WordNet/thesaurus-based semantic similarity measures
• Hybrid models
  – Here: bias data-driven models with linguistic constraints
Hard and Soft Constraints
• Hard constraints
  – {0,1}: in/out
  – Decrease the search space
  – Theory-driven
  – Faster, slimmer
• Soft constraints
  – [0..1]: fuzzy
  – Only bias the model
  – Data-driven: let patterns emerge
(Diagrams: a hard constraint carves the universe into an in/out region; a soft constraint only grades regions of the universe.)
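The in/out vs. fuzzy distinction above can be made concrete in a toy sketch (the hypotheses, constraint, and weight below are invented purely for illustration, not taken from the dissertation's systems):

```python
# A hard constraint filters hypotheses out of the search space entirely,
# while a soft constraint only adds a weighted feature to each hypothesis's
# score, so the data can still overrule it.

def hard_filter(hypotheses, constraint):
    """Hard constraint: in/out -- violating hypotheses are discarded."""
    return [h for h in hypotheses if constraint(h)]

def soft_score(hypothesis, base_score, constraint, weight):
    """Soft constraint: violating hypotheses survive; matches get a reward."""
    return base_score + (weight if constraint(hypothesis) else 0.0)

# Toy constraint: the hypothesis starts with a capital letter.
capitalized = lambda h: h[:1].isupper()

hyps = ["The cat sat", "the cat sat"]
print(hard_filter(hyps, capitalized))                     # ['The cat sat']
print(soft_score("the cat sat", -2.0, capitalized, 0.5))  # -2.0: no reward, but still in the race
```

Note how the soft version keeps the non-matching hypothesis competitive; whether it wins is decided by the rest of the model's score.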
Fine-GrainedSoft Linguistic Constraints
• Fine granularity is a big deal
  – Soft syntactic constraints in SMT
    • Chiang 2005 vs. Marton and Resnik 2008
    • Negative results → positive results
  – Soft semantic constraints in word-pair similarity ranking
    • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
    • Positive results → better results
  – Soft semantic constraints in paraphrase generation for SMT
    • Callison-Burch et al. 2006 vs. Marton, Callison-Burch & Resnik 2009
Road Map
✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
• Soft syntactic constraints
  – In statistical machine translation
• Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model
Statistical Machine Translation: Hiero
• Chiang 2005, 2007
• Weighted synchronous CFG
  – Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year>
• Translation model features, e.g., ϕ3 = log p(e|f)
• Log-linear model, plus a rule-penalty feature and "glue" rules
• These trees are not necessarily "syntactic"!
  – Not syntactic in the linguistic sense
(Example alignments: 的竞选 ↔ election; 在初选投票 ↔ voted in the primaries)
Previous (Coarse) Soft Syntactic Constraints
• X → X1 speech ||| X1 discurso
  – What should be the span of X1?
• Chiang's (2005) constituency feature
  – Reward the rule's score if the rule's source side matches a constituent span
  – Constituency-incompatible emergent patterns can still 'win' (in spite of no reward)
  – Good idea, but a negative result
New (Fine-Grained) Soft Syntactic Constraints
• A separate weighted feature for each constituent type, e.g.:
  – NP-only (NP=)
  – VP-only (VP=)
New Constraint Conditions
• VP-only, revisited:
  – We saw VP-match (VP=): reward an exact match of a VP sub-tree span
  – We can also incur a penalty for crossing constituent boundaries, e.g., VP-cross (VP+)
Constraint (Feature) Space
• {NP, VP, IP, CP, …} × {match (=), cross-boundary (+)}
• Basic translation models:
  – For each feature, add (only it) to the default feature set, assigning it a separate weight.
• Feature "combo" translation models:
  – NP2 (double feature): add both NP= and NP+, with a separate weight for each
  – NP_ (conflated feature): ties the weights of NP= and NP+
  – XP=, XP+, XP2, XP_: conflate all labels that correspond to "standard" X-bar Theory XP constituents in each condition
  – All-labels= (Chiang's), All-labels+, All-labels_, All-labels2
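As a rough illustration of how the match and cross-boundary features could fire for one rule application (the function and the span representation below are assumptions for the sketch, not the actual decoder's code):

```python
# Spans are half-open (start, end) word indices over the source sentence.
# label= fires on an exact constituent match; label+ fires when the rule's
# source span partially overlaps a constituent, i.e., crosses its boundary.

def span_features(span, constituents):
    feats = {}
    s, e = span
    for label, (cs, ce) in constituents:
        if (s, e) == (cs, ce):                    # exact match: reward feature
            feats[label + "="] = feats.get(label + "=", 0) + 1
        elif s < cs < e < ce or cs < s < ce < e:  # partial overlap: crossing feature
            feats[label + "+"] = feats.get(label + "+", 0) + 1
    return feats

# Parse of a 5-word sentence: [NP w0 w1] [VP w2 w3 w4]
parse = [("NP", (0, 2)), ("VP", (2, 5))]
print(span_features((2, 5), parse))  # {'VP=': 1}: rule span matches the VP exactly
print(span_features((1, 3), parse))  # {'NP+': 1, 'VP+': 1}: crosses both boundaries
```

Each fired feature then contributes its own weighted term to the log-linear score, which is what makes the constraint set fine-grained.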
Chinese-English Results
• Replicated the Chiang 2005 constituency feature (negative result)
• NP=, QP+, VP+: up to 0.74 BLEU points better
• XP+, IP2, all-labels_, VP2, NP_: up to 1.65 BLEU points better
• Validated on the NIST MT08 test set
(BLEU score: higher = better; *, **: significantly better than baseline; +, ++: better than the replicated Chiang-05)
Arabic-English Results
• New result for Chiang's constituency feature (MT06, MT08)
• PP+, AdvP=: up to 1.40 BLEU better than Chiang's and baseline
• AP2, AdvP2: up to 1.94 better
• Validated on the NIST MT08 test set
(*, **: significantly better than baseline; +, ++: better than Chiang-05)
PP+ Example: Arabic MT06
Source ... (PP (IN ب) (NP (NP (NN تعيين) (NP (NN مندوب) (NP (NNP سوريا) (NNP لدى)))) (DT ال) (NP (NN امم) (NP (NN ال) (JJ متحدة))))))) …
Gloss …(PP (IN in) (NP (NP (NN appointment) (NP (NN representative) (NP (NNP syria) (NNP to)))) (DT the) (NP (NN nations) (NP (NN the) (JJ united))))))) …
Reference [the third decree ordered] the appointment of the syrian representative to the united nations …
Baseline … to appoint syria to the united nations representative …
PP+ … to appoint a representative of syria to the united nations …
Arabic-English Results – MIRA
• Chiang, Marton and Resnik (2008)
• The earlier feature-selection problem is solved here: MIRA tunes all the fine-grained features jointly
Road Map
✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
✓ Soft syntactic constraints
  – In statistical machine translation
• Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model
Semantic Models
• Forget Frege, alternative worlds, <e,t>, …
• To model the meaning of words, we can use:
  – "Pure" models
    • Knowledge-based: manually crafted linguistic resources (dictionary, thesaurus, taxonomies, WordNet)
    • Usage-based: machine-generated distributional profiles (containing word co-occurrence-based information)
  – Hybrid models
    • Bias distributional profiles with soft semantic constraints
      – As we just saw with soft syntactic constraints
      – E.g., use thesaurus "concepts" as word senses with which to alter co-occurrence counts in distributional profiles
Word-Based Distributional Profiles (DPs)
• Distributional Hypothesis (Harris 1940; Firth 1957)
  – DP (context vector) of "bank": which words "bank" occurs next to
• Strength of association
  – Counts, PMI, TF/IDF-based, log-likelihood ratios, …
• Vector similarity (cosine, L1, L2, …)
(Figure: context vectors of "bank" and "tenure" over dimensions linguist, money, river, teller, water, …; α is the angle between them)
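A minimal sketch of word-based DPs and cosine similarity over a toy corpus (the corpus, window size, and raw-count association strength are illustrative choices; as noted above, PMI or log-likelihood ratios could be used instead):

```python
import math
from collections import Counter

def distributional_profile(target, corpus, window=2):
    """Count words co-occurring with `target` within +-window tokens."""
    dp = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                dp.update(sent[lo:i] + sent[i + 1:hi])
    return dp

def cosine(dp1, dp2):
    """Cosine of the angle (the alpha in the figure) between two context vectors."""
    dot = sum(c * dp2[w] for w, c in dp1.items())
    n1 = math.sqrt(sum(c * c for c in dp1.values()))
    n2 = math.sqrt(sum(c * c for c in dp2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

corpus = [["bank", "teller", "money"],
          ["bank", "money", "deposit"],
          ["river", "water", "bank"],
          ["linguist", "tenure", "paper"]]
bank = distributional_profile("bank", corpus)
teller = distributional_profile("teller", corpus)
linguist = distributional_profile("linguist", corpus)
print(cosine(bank, teller) > cosine(bank, linguist))  # True: "bank" is closer to "teller"
```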
Taxonomies and Groupings
• WordNet
  – Synsets
  – Classical relations ("is-a")
  – Arc distance
  – "The tennis problem"
• Thesaurus
  – Flat lists of related words
  – Potentially coarse
  – Implicit relations, potentially non-classical
(Figure: an is-a taxonomy: professor is-a academic job is-a job; CEO is-a industry job is-a job)
Concept-Based Distributional Profiles
Mohammad & Hirst (2006) – Macquarie Thesaurus
• Word-based DP
• Concept-based DP
  – Approximate senses
  – Aggregated
  – Coarse
• "bank" is listed under several concepts
• DP for each sense
(Figure: the word-based DP of "bank" alongside concept-based DPs for RIVER (bank, boat, wave, …) and FIN.INST (bank, dollar, deposit, …), all over dimensions linguist, money, river, teller, water, …)
Concept-Based Distributional ProfilesMohammad & Hirst (2006) – Macquarie Thesaurus
• How similar are "bank" and "wave"?
• Compare all pairs of senses:
  – FIN.INST, PHYSICS
  – FIN.INST, RIVER
  – RIVER, PHYSICS
  – RIVER, RIVER
• Return the closest sense pair
• Problem: bank = wave ??
(Figure: "bank" falls under RIVER (bank, boat, wave, …) and FIN.INST (bank, dollar, deposit, …); "wave" falls under PHYSICS (amp., wave, freq., …))
New: Word/Concept Hybrid Model (Word Sense DP)
• Given the word's word-based DP and its concept-based DPs:
• Bias the DP of "bank" towards the DP of RIVER, creating bank_RIVER
• Create bank_FIN.INST similarly, etc.
(Figure: the word-based DP of "bank" combined with the concept-based DP of RIVER yields the sense DP bank_RIVER)
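One way the biasing could be sketched (hedged: the two toy functions below only illustrate the idea of filtering or reweighting a word DP by a concept DP; the dissertation's actual hybrid models differ in their exact formulation):

```python
def hybrid_filtered(word_dp, concept_dp):
    """Keep word-DP counts only for context words the concept's DP supports."""
    return {w: c for w, c in word_dp.items() if w in concept_dp}

def hybrid_proportional(word_dp, concept_dp):
    """Scale each word-DP count by the concept DP's relative strength there."""
    total = sum(concept_dp.values())
    return {w: c * concept_dp.get(w, 0) / total for w, c in word_dp.items()}

bank = {"money": 4, "river": 3, "teller": 2}   # toy word-based DP of "bank"
RIVER = {"river": 5, "water": 5}               # toy concept-based DP of RIVER
print(hybrid_filtered(bank, RIVER))      # {'river': 3} -> a bank_RIVER sketch
print(hybrid_proportional(bank, RIVER))  # {'money': 0.0, 'river': 1.5, 'teller': 0.0}
```

Either way, the result is a sense-specific DP (bank_RIVER) that stays at the word level of granularity while being biased by concept-level evidence.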
Fine-Grained Soft Semantic Constraints
• Hybrid models combine the best of both: fine-grained, sense-aware, widely applicable
  – bank_FIN.INST ≠ bank_RIVER ≠ wave_RIVER !
• Two hybrid flavors:
  – Hybrid-filtered
  – Hybrid-proportional
• Pros and cons:
  – Word senses: word-based DPs smear senses; concept-based DPs are sense-aware
  – Relations: co-occurrence vs. semantic relatedness
  – Target granularity: word level (fine) vs. aggregated (coarse)
  – Applicability (vocabulary): wide vs. limited
Evaluation: Word-Pair Similarity Task
• Give each word pair a similarity score
  – rooster – voyage: 0.12
  – coast – shore: 0.93
• Same part-of-speech pairs
  – Noun-noun (Rubenstein & Goodenough, 1965; Finkelstein et al., 2002)
  – Verb-verb (Resnik & Diab, 2000)
• Result: a list of pairs ordered by similarity
• Evaluation metric: Spearman rank correlation
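The evaluation metric can be sketched in pure Python (Spearman's correlation is the Pearson correlation of the two rank vectors, with tied values receiving average ranks, per the standard definition):

```python
def spearman(xs, ys):
    """Spearman rank correlation between two parallel score lists."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
                j += 1                         # extend over a run of tied values
            for k in range(i, j + 1):
                r[order[k]] = (i + j) / 2 + 1  # average rank for the tie group
            i = j + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Gold vs. model similarity scores for three word pairs:
print(spearman([0.12, 0.93, 0.55], [0.2, 0.9, 0.6]))  # 1.0 -- same ranking
```

Only the ordering matters, which is why the task is framed as ranking the pairs rather than matching the absolute scores.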
Word-Pair Similarity Results
Road Map
✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
✓ Soft syntactic constraints
  – In statistical machine translation
• Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model
Words → Phrases
• Extend the word-based semantic similarity measures to "phrases"
  – she declined to provide any other information …
  – police refused to provide any other details …
• So far: see if y is similar to x. Now: find y's similar to x
• Can solve other problems now!
  – Use these extended phrasal DPs to find good paraphrases of unknown "phrases" in machine translation models
(Figure: context vector of the phrase "to provide any other" over dimensions such as information, money, declined, teller, details, …)
Coverage Problem in Statistical Machine Translation
• Trained on parallel text
• Every new test document contains some "phrases" unknown to the model
(Figure: a Spanish-English parallel training corpus next to a Spanish test set whose unknown phrases have no translation.)
Previous Solution: Pivoting
• Use other parallel texts to increase coverage
• Drawback: parallel text is a limited resource!
(Figure: French-Spanish and German-Spanish parallel texts are pivoted through Spanish to cover test-set phrases missing from the Spanish-English table.)
New Solution: Monolingually-Derived Paraphrases
• Use monolingual text to increase coverage
• Resources available in abundance!
(Figure: abundant monolingual Spanish text supplies paraphrases Spanish′, Spanish′′, … for test-set phrases missing from the Spanish-English table.)
Find Paraphrases
• Gather all contexts L _ R for the phrase "to provide any other":
• What else appears between L and R?

Left context (L) __ Right context (R):
  declined [to provide any other] details
  refused [to provide any other] information
  unable [to provide any other] details
  failed [to provide any other] explanation
Find Paraphrases
• Gather all contexts L _ R for the phrase "to provide any other":
• What else appears between L and R?
• Measure distributional similarity to each candidate, e.g., "to provide any other" vs. "to give further"

Left context (L) __ Right context (R):
  declined [to give further] details
  refused [to provide any] information
  unable [to reveal any] details
  failed [to provide further] explanation
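The gathering steps above can be sketched as follows (toy corpus; restricting contexts to a single word on each side is a simplifying assumption of this sketch):

```python
from collections import Counter

def contexts_of(phrase, corpus):
    """All (L, R) single-word context pairs surrounding `phrase` (a token list)."""
    n = len(phrase)
    ctxs = set()
    for sent in corpus:
        for i in range(1, len(sent) - n):
            if sent[i:i + n] == phrase:
                ctxs.add((sent[i - 1], sent[i + n]))
    return ctxs

def fillers(ctxs, corpus, max_len=5):
    """Candidate paraphrases: token sequences seen between any gathered (L, R) pair."""
    cands = Counter()
    for sent in corpus:
        for i, left in enumerate(sent):
            for j in range(i + 2, min(len(sent), i + 2 + max_len)):
                if (left, sent[j]) in ctxs:
                    cands[" ".join(sent[i + 1:j])] += 1
    return cands

corpus = [
    "declined to provide any other details".split(),
    "declined to give further details".split(),
]
phrase = "to provide any other".split()
print(fillers(contexts_of(phrase, corpus), corpus))  # includes "to give further"
```

Each candidate filler would then be scored by distributional similarity to the original phrase before being admitted as a paraphrase.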
Paraphrase Examples (Phrases)
Paraphrase Examples (Unigrams)
Paraphrase Feature Model
• Evidence reinforcement: if there is more than one paraphrase f_i of f, aggregate the score with a "quasi-online" update:
  asim_i = asim_{i-1} + (1 - asim_{i-1}) * sim(f_i, f), where asim_0 = 0
• Analogous to Callison-Burch et al. (2006)
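The quasi-online update above can be written as a tiny function: each additional paraphrase's similarity closes part of the remaining gap to 1, so more evidence can only raise the aggregate score, and it never exceeds 1.

```python
def aggregate_similarity(sims):
    """asim_i = asim_{i-1} + (1 - asim_{i-1}) * sim_i, with asim_0 = 0."""
    asim = 0.0
    for s in sims:
        asim += (1.0 - asim) * s
    return asim

print(aggregate_similarity([0.5]))            # 0.5
print(aggregate_similarity([0.5, 0.5]))       # 0.75 -- a second witness reinforces
print(aggregate_similarity([0.5, 0.5, 0.5]))  # 0.875
```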
English to Chinese Results
• A 29k-line subset was created to emulate a low-density-language setting
(*: better than baseline; +: better than non-hybrid counterpart)
English-Chinese Translation Examples
Spanish to English
Comparison with Corpus Size & Pivoting
Road Map
✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
✓ Soft syntactic constraints
  – In statistical machine translation
✓ Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model
Unified Model
• Soft linguistic constraints in a log-linear model
  – Syntactic
  – Semantic
  – …
• Model score: Σ_i λ_i h_i(x)
• Constraints = add more λ_j h_j(x) terms to the sum:
  Σ_i λ_i h_i(x) + Σ_j λ_j h_j(x)
• h_i: features / constraints
• λ_i: weight / importance of feature i
Unified Model (Soft Syntactic Constraints)
• Straightforward: if Σ_i λ_i ϕ_i(f,e) is a translation model, bias it syntactically, e.g., as follows:
  Σ_i λ_i ϕ_i(f,e) + λ_j ϕ_j(f,e)
  where ϕ_j(f,e) = 1 if the source-language word sequence f is a VP, and 0 otherwise.
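As a toy computation (the feature names and weights below are invented for illustration), the syntactic indicator is just one more weighted term in the log-linear sum:

```python
def loglinear_score(features, weights):
    """sum_i lambda_i * h_i(x) over named features."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical hypothesis: base translation-model features plus a soft
# syntactic indicator phi_VP (1 if the source span is a VP, else 0).
feats = {"log_p_e_given_f": -1.2, "rule_penalty": -1.0, "phi_VP": 1.0}
weights = {"log_p_e_given_f": 1.0, "rule_penalty": 0.5, "phi_VP": 0.3}
print(loglinear_score(feats, weights))  # -1.4: the VP match adds +0.3 to the score
```

The same mechanism carries the semantic constraints below: each one is another h_j with its own λ_j, which is what makes the model unified.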
Unified Model (Soft Semantic Constraints)
• Semantic distance of word e in sense s from word e′ in sense s′:

cos(e_s, e′_s′) = [ Σ_i fSense(e,s,w_i)·fSense(e′,s′,w_i)
                  + Σ_i fSense(e,s,w_i)·fWord(e′,w_i)      (cross-term)
                  + Σ_i fWord(e,w_i)·fSense(e′,s′,w_i)     (cross-term)
                  + Σ_i fWord(e,w_i)·fWord(e′,w_i) ] / Z_C

where:
  Σ_i fSense(e,s,w_i)·fSense(e′,s′,w_i) / Z_C = cosSense(e_s, e′_s′)
  Σ_i fWord(e,w_i)·fWord(e′,w_i) / Z_C = K·cosWord(e, e′)
Main Contributions
• A unified corpus-based model with fine-grained linguistic soft constraints:
  – Syntactic (parsing), in statistical machine translation
  – Semantic (words), in word-pair similarity tasks
  – Semantic (phrases), in statistical machine translation
• Evaluated in state-of-the-art end-to-end phrase-based SMT systems
• Distributional paraphrase generation with an evidence reinforcement component
Thanks to…
• Defense Committee:
  – Philip Resnik, Chair/Advisor
  – Amy Weinberg, Advisor
  – William Idsardi, Member
  – Chris Callison-Burch, Special Member (JHU)
  – Bonnie Dorr, Dean's Representative
• Ling Chair:
  – Norbert Hornstein
• Ling Cohort:
  – Ellen Lau
  – Phil Monahan
  – Eri Takahashi
  – Rebecca McKeown
  – Chizuru Nakao
• CLIP Lab:
  – David Chiang, Smara Muresan, Hendra Setiawan, Adam Lopez, Chris Dyer, Asad Sayeed, Vlad Eidelman, Zhongqiang Huang, Denis Filimonov, and many others!