Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models. Yuval Marton, Ph.D. Dissertation Defense, Department of Linguistics, University of Maryland.


Page 1: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Yuval Marton
Ph.D. Dissertation Defense
Department of Linguistics, University of Maryland

Page 2: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Dissertation Theme

• Hybrid knowledge/corpus-based statistical NLP models using fine-grained linguistic soft constraints

[Diagram: a unified corpus-based model with soft linguistic constraints, linking three areas: Syntactic (Parsing) in statistical machine translation; Semantic (Words) in word-pair similarity tasks; Semantic (Phrases) in statistical machine translation]

Page 3: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Pure vs. Hybrid Models

• Pure models
  – Corpus-based, data-driven, distributional, statistical
    • Statistical Machine Translation
    • Distributional Profiles (Context Vectors)
  – Manually-crafted linguistic knowledge (rules, word grouping by concept), theory-driven
    • Rule-based / syntax-driven machine translation
    • WordNet/thesaurus-based semantic similarity measures
• Hybrid models
  – Here: bias data-driven models with linguistic constraints

Page 4: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Hard and Soft Constraints

• Hard constraints
  – {0,1}: in/out
  – Decrease the search space
  – Theory-driven
  – Faster, slimmer
• Soft constraints (contrasted in the sketch below)
  – [0..1]: fuzzy
  – Only bias the model
  – Data-driven: let patterns emerge

[Diagrams: the search universe under hard constraints (part of it excluded) vs. under soft constraints (all of it kept, but re-weighted)]
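To make the contrast concrete, here is a minimal Python sketch (hypothetical candidates and weights, not from the dissertation): a hard constraint removes candidates from the search space outright, while a soft constraint keeps every candidate and merely adds a weighted feature to its score.

```python
# Hard vs. soft constraints: a hedged, illustrative sketch (hypothetical data).

candidates = [
    {"translation": "to appoint syria representative", "base_score": -2.1, "matches_vp": False},
    {"translation": "to appoint a representative of syria", "base_score": -2.3, "matches_vp": True},
]

# Hard constraint: candidates violating the constraint are dropped from the search space.
hard_filtered = [c for c in candidates if c["matches_vp"]]

# Soft constraint: every candidate survives; the constraint only biases the score
# through a weighted feature (the weight would be tuned, e.g., by MERT or MIRA).
vp_weight = 0.5
for c in candidates:
    c["score"] = c["base_score"] + vp_weight * (1.0 if c["matches_vp"] else 0.0)

best = max(candidates, key=lambda c: c["score"])
print(len(hard_filtered), best["translation"])
```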

Page 5: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Fine-Grained Soft Linguistic Constraints

• Fine granularity is a big deal
  – Soft syntactic constraints in SMT
    • Chiang 2005 vs. Marton and Resnik 2008
    • Negative results → positive results
  – Soft semantic constraints in word-pair similarity ranking
    • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
    • Positive results → better results
  – Soft semantic constraints in paraphrase generation for SMT
    • Callison-Burch et al. 2006 vs. Marton, Callison-Burch & Resnik 2009

Page 6: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Road Map

✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
• Soft syntactic constraints
  – In statistical machine translation
• Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model

Page 7: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Statistical Machine Translation: Hiero

• Chiang 2005, 2007
• Weighted synchronous CFG
  – Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year>
• Translation model features, e.g., ϕ3 = log p(e|f)
• Log-linear model (see the sketch below): + rule penalty feature, "glue" rules
• These trees are not necessarily "syntactic"!
  – Not syntactic in the linguistic sense

[Example derivation omitted: 的竞选 "election"; 投票 在初选 "voted in the primaries"]
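As a rough illustration of the log-linear model just mentioned, the sketch below (hypothetical feature names and weights; not the actual Hiero implementation) scores a derivation as a weighted sum of feature values, including a translation-probability feature like ϕ3 = log p(e|f) and a rule penalty.

```python
import math

# Hedged sketch of log-linear scoring in a Hiero-style system (illustrative numbers only).
# Each derivation d is scored as: score(d) = sum_i lambda_i * h_i(d).

weights = {"log_p_e_given_f": 1.0, "log_p_f_given_e": 0.6, "rule_penalty": -0.3, "glue_rule": -0.2}

def derivation_score(features):
    """Weighted sum of feature values for one candidate derivation."""
    return sum(weights[name] * value for name, value in features.items())

features = {
    "log_p_e_given_f": math.log(0.4),  # e.g., phi_3 = log p(e|f) summed over the rules used
    "log_p_f_given_e": math.log(0.3),
    "rule_penalty": 5.0,               # number of rules applied
    "glue_rule": 1.0,                  # number of glue-rule applications
}
print(derivation_score(features))
```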

Page 8: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Previous (Coarse) Soft Syntactic Constraints

• X → X1 speech ||| X1 discurso
  – What should be the span of X1?
• Chiang's (2005) constituency feature
  – Reward the rule's score if the rule's source side matches a constituent span
  – Constituency-incompatible emergent patterns can still 'win' (in spite of no reward)
  – Good idea, but a negative result

Page 9: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

New (Fine-Grained) Soft Syntactic Constraints

• Separate weighted feature for each constituent, e.g.:
  – NP-only: (NP=)
  – VP-only: (VP=)

Page 10: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

New Constraint Conditions

• VP-only, revisited:
  – We saw VP-match (VP=): reward an exact match of a VP sub-tree span
  – We can also incur a penalty for crossing constituent boundaries, e.g., VP-cross (VP+); a sketch of both feature types follows below
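The following Python sketch (my own hedged illustration, not code from the dissertation) shows one way such per-label match and cross-boundary features could be computed: given the constituent spans from a source-side parse, a rule's source span either exactly matches a labeled constituent (firing the label= feature) or crosses its boundary (firing the label+ feature).

```python
# Hedged sketch: fine-grained soft syntactic constraint features for one rule span.
# constituents: list of (label, start, end) spans from a source-side parse (end exclusive).

def span_features(rule_start, rule_end, constituents):
    """Return counts of label-match (label=) and label-cross (label+) features."""
    features = {}
    for label, c_start, c_end in constituents:
        if (rule_start, rule_end) == (c_start, c_end):
            # Exact match of the constituent span: reward feature, e.g. "VP="
            features[label + "="] = features.get(label + "=", 0) + 1
        elif rule_start < c_start < rule_end < c_end or c_start < rule_start < c_end < rule_end:
            # The rule span crosses the constituent boundary: penalty feature, e.g. "VP+"
            features[label + "+"] = features.get(label + "+", 0) + 1
    return features

parse = [("IP", 0, 6), ("NP", 0, 2), ("VP", 2, 6), ("PP", 3, 6)]
print(span_features(2, 6, parse))   # {'VP=': 1}
print(span_features(1, 4, parse))   # {'NP+': 1, 'VP+': 1, 'PP+': 1}
```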

Page 11: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Constraint (Feature) Space

• {NP, VP, IP, CP, …} × {match (=), cross-boundary (+)}
• Basic translation models:
  – For each feature, add (only it) to the default feature set, assigning it a separate weight.
• Feature "combo" translation models (see the sketch below):
  – NP2 (double feature): add both NP= and NP+, with a separate weight for each
  – NP_ (conflated feature): ties the weights of NP= and NP+
  – XP=, XP+, XP2, XP_: conflate all labels that correspond to "standard" X-bar Theory XP constituents in each condition
  – All-labels= (Chiang's), All-labels+, All-labels_, All-labels2
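As a hedged sketch of how these configurations could be wired up (my own illustration; in particular, the assumption that a conflated feature counts matches positively and crossings negatively is mine, not stated on the slide), the example below builds the feature-to-weight mapping for a basic NP= model, the NP2 combo, and the conflated NP_ variant.

```python
# Hedged sketch of basic, double ("combo"), and conflated soft-constraint feature sets.

def score(features, weights):
    """Log-linear contribution of the soft syntactic constraint features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Per-rule feature counts for one hypothesis (from a span_features-style computation).
counts = {"NP=": 1, "NP+": 2}

# Basic model: a single feature (e.g., NP=) with its own tuned weight.
basic = {"NP=": 0.4}

# NP2 (double feature): both NP= and NP+ present, each with a separate weight.
np2 = {"NP=": 0.4, "NP+": -0.3}

# NP_ (conflated): one tied weight; here we assume matches count +1 and crossings -1.
tied_weight = 0.35
conflated_value = counts.get("NP=", 0) - counts.get("NP+", 0)
print(score(counts, basic), score(counts, np2), tied_weight * conflated_value)
```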

Page 12: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Chinese-English Results

• Replicated Chiang 2005 constituency feature (negative result)
• NP=, QP+, VP+: up to .74 BLEU points better
• XP+, IP2, all-labels_, VP2, NP_: up to 1.65 BLEU points better
• Validated on the NIST MT08 test set

[Results table omitted. BLEU score: higher = better. *, **: significantly better than baseline; +, ++: better than Chiang-05 (replicated)]

Page 13: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Arabic-English Results

• New result for Chiang's constituency feature (MT06, MT08)
• PP+, AdvP=: up to 1.40 BLEU better than Chiang's and the baseline
• AP2, AdvP2: up to 1.94 better
• Validated on the NIST MT08 test set

[Results table omitted. *, **: significantly better than baseline; +, ++: better than Chiang-05 (new!)]

Page 14: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models


PP+ Example: Arabic MT06

Source ... (PP (IN ب) (NP (NP (NN تعىىن) (NP (NN مندوب) (NP (NNP سورىا) (NNP لدى)))) (DT ال) (NP (NN امم) (NP (NN ال) (JJ متحدة))))))) …

Gloss …(PP (IN in) (NP (NP (NN appointment) (NP (NN representative) (NP (NNP syria) (NNP to)))) (DT the) (NP (NN nations) (NP (NN the) (JJ united))))))) …

Reference [the third decree ordered] the appointment of the syrian representative to the united nations …

Baseline … to appoint syria to the united nations representative …

PP+ … to appoint a representative of syria to the united nations …


Page 15: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Arabic-English Results – MIRA

• Chiang, Marton and Resnik (2008)
• The previous problem of feature selection is solved here, by tuning the features with MIRA.

[Results table omitted]

Page 16: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Road Map

✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
✓ Soft syntactic constraints
  – In statistical machine translation
• Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model

Page 17: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Semantic Models

• Forget Frege, alternative worlds, <e,t>, …
• To model the meaning of words, we can use
  – "Pure" models
    • Knowledge-based: manually crafted linguistic resources (dictionary, thesaurus, taxonomies, WordNet)
    • Usage-based: machine-generated distributional profiles (containing word co-occurrence-based information)
  – Hybrid models
    • Bias distributional profiles with soft semantic constraints
      – As we just saw with soft syntactic constraints
      – E.g., use thesaurus "concepts" as word senses, with which to alter co-occurrence counts in distributional profiles

Page 18: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Word-Based Distributional Profiles (DPs)

• Distributional Hypothesis (Harris 1940; Firth 1957)
  – DP (Context Vector) of "bank": which words "bank" occurs next to
• Strength of association
  – Counts, PMI, TF/IDF-based, log-likelihood ratios, …
• Vector similarity (cosine, L1, L2, …); a sketch follows below

[Diagram: context vectors over dimensions such as linguist, money, river, teller, water, tenure; the angle α between two words' vectors reflects their similarity]
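As a hedged illustration of the DP machinery described above (illustrative counts, not the dissertation's actual data or code), the sketch below builds simple co-occurrence vectors, weights them by PMI, and compares them with cosine similarity.

```python
import math

# Toy co-occurrence counts: target word -> {context word: count} (illustrative only).
cooc = {
    "bank":   {"money": 8, "river": 3, "teller": 5, "water": 2},
    "teller": {"money": 6, "bank": 5, "water": 1},
}
context_totals = {"money": 50, "river": 30, "teller": 20, "water": 40, "bank": 25}
N = 1000  # total co-occurrence events in the (toy) corpus

def pmi_vector(word):
    """PMI-weighted distributional profile (context vector) of a word."""
    total_w = sum(cooc[word].values())
    vec = {}
    for ctx, count in cooc[word].items():
        p_joint = count / N
        p_w, p_c = total_w / N, context_totals[ctx] / N
        vec[ctx] = max(0.0, math.log(p_joint / (p_w * p_c)))  # positive PMI
    return vec

def cosine(u, v):
    dims = set(u) | set(v)
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in dims)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(cosine(pmi_vector("bank"), pmi_vector("teller")))
```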

Page 19: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Taxonomies and Groupings

• WordNet
  – Synsets
  – Classical relations ("is-a")
  – Arc distance
  – "The tennis problem"
• Thesaurus
  – Flat lists of related words
  – Potentially coarse
  – Implicit relations, potentially non-classical

[Diagram: a small is-a taxonomy: "professor" is-a "academic job", "CEO" is-a "industry job", and both "academic job" and "industry job" are-a "job"]

Page 20: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Concept-Based Distributional Profiles
Mohammad & Hirst (2006) – Macquarie Thesaurus

• Word-based DP
• Concept-based DP
  – Approximate senses
  – Aggregated
  – Coarse
• "bank" is listed under several concepts
• DP for each sense

[Diagram: the word-based DP of "bank" (over context words such as linguist, money, river, teller, water) alongside concept-based DPs for RIVER (bank, boat, wave, …) and FIN.INST (bank, dollar, deposit, …)]

Page 21: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Concept-Based Distributional Profiles
Mohammad & Hirst (2006) – Macquarie Thesaurus

• How similar are "bank" and "wave"?
• Compare all pairs of senses
  – FIN.INST, PHYSICS
  – FIN.INST, RIVER
  – RIVER, PHYSICS
  – RIVER, RIVER
• Return the closest sense pair (sketched below)
• Problem: bank = wave ??

[Diagram: "bank" listed under RIVER (bank, boat, wave, …) and FIN.INST (bank, dollar, deposit, …); "wave" listed under PHYSICS (amp., wave, freq., …) and RIVER]
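A hedged sketch of that closest-sense-pair comparison (toy concept DPs, my own values): each word is represented by the DPs of the thesaurus concepts it is listed under, and the word-pair similarity is the maximum cosine over all cross-word concept pairs.

```python
import math

# Toy concept-based DPs (context word -> weight); values are illustrative only.
concept_dps = {
    "RIVER":    {"river": 6.0, "water": 5.0, "boat": 3.0},
    "FIN.INST": {"money": 7.0, "teller": 4.0, "deposit": 3.0},
    "PHYSICS":  {"amplitude": 5.0, "frequency": 4.0, "water": 1.0},
}
senses = {"bank": ["FIN.INST", "RIVER"], "wave": ["PHYSICS", "RIVER"]}

def cosine(u, v):
    dims = set(u) | set(v)
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in dims)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def closest_sense_pair(w1, w2):
    """Similarity of two words = max similarity over all pairs of their concepts."""
    pairs = ((c1, c2) for c1 in senses[w1] for c2 in senses[w2])
    return max(((cosine(concept_dps[c1], concept_dps[c2]), c1, c2) for c1, c2 in pairs))

print(closest_sense_pair("bank", "wave"))  # (1.0, 'RIVER', 'RIVER'): the coarse-sense problem
```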

Page 22: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

New: Word/Concept Hybrid Model (Word Sense DP)

• Given the word's word-based DP and concept-based DPs:
• Bias the DP of "bank" towards the DP of RIVER, creating bank_RIVER
• Create bank_FIN.INST similarly, etc.

[Diagram: the word-based DP of "bank" combined with the concept-based DP of RIVER (bank, boat, wave, …) to yield the sense-specific DP bank_RIVER, over context words such as linguist, money, river, teller, water]

Page 23: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Fine-Grained Soft Semantic Constraints

• Hybrid models get the best of both: fine-grained, sense-aware, widely applicable
  – bank_FIN.INST ≠ bank_RIVER ≠ wave_RIVER !
• Two hybrid flavors (sketched below):
  – Hybrid-filtered
  – Hybrid-proportional

Pros and cons:
                          Word-based DP        Concept-based DP
  Word senses:            smears senses        sense-aware
  Relations:              co-occurrence        semantic relatedness
  Target granularity:     word level (fine)    aggregated (coarse)
  Applicability (vocab):  wide                 limited
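The sketch below is a hedged, simplified illustration (my own naming and logic; the dissertation's hybrid-filtered and hybrid-proportional variants are more involved) of how a word's DP could be biased toward a concept's DP to produce a sense-specific DP: the "filtered" variant keeps only word counts in dimensions the concept DP supports, while the "proportional" variant rescales them by the concept DP's relative weight.

```python
# Hedged sketch of biasing a word-based DP toward a concept-based DP (illustrative only).

bank_dp = {"money": 8.0, "river": 3.0, "teller": 5.0, "water": 2.0}  # word-based DP of "bank"
river_dp = {"river": 9.0, "water": 7.0, "boat": 6.0}                 # concept-based DP of RIVER

def hybrid_filtered(word_dp, concept_dp):
    """Keep the word's counts only in dimensions the concept DP also supports."""
    return {ctx: val for ctx, val in word_dp.items() if ctx in concept_dp}

def hybrid_proportional(word_dp, concept_dp):
    """Rescale the word's counts by the concept DP's relative weight in each dimension."""
    total = sum(concept_dp.values())
    return {ctx: val * (concept_dp.get(ctx, 0.0) / total) for ctx, val in word_dp.items()}

bank_river_filtered = hybrid_filtered(bank_dp, river_dp)        # {'river': 3.0, 'water': 2.0}
bank_river_proportional = hybrid_proportional(bank_dp, river_dp)
print(bank_river_filtered)
print(bank_river_proportional)
```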

Page 24: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Evaluation: Word-Pair Similarity Task

• Give each word pair a similarity score
  – rooster – voyage: 0.12
  – coast – shore: 0.93
• Same part-of-speech pairs
  – Noun-noun (Rubenstein & Goodenough, 1965; Finkelstein et al., 2002)
  – Verb-verb (Resnik & Diab, 2000)
• Result: a list of pairs ordered by similarity
• Evaluation metric: Spearman rank correlation (see the sketch below)
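For concreteness, a minimal sketch of the evaluation metric (toy scores, not the actual benchmark data): rank the word pairs by the model's scores and by the human gold scores, then compute the Spearman rank correlation, e.g., with scipy.

```python
from scipy.stats import spearmanr

# Toy example: human gold similarity scores and a model's scores for the same word pairs.
pairs = ["rooster-voyage", "coast-shore", "car-automobile", "forest-graveyard"]
gold = [0.12, 0.93, 0.98, 0.25]
model = [0.05, 0.80, 0.70, 0.40]

rho, p_value = spearmanr(gold, model)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```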

Page 25: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models


Word-Pair Similarity Results

Page 26: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Road Map

✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
✓ Soft syntactic constraints
  – In statistical machine translation
• Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model

Page 27: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Words → Phrases

• Extend the word-based semantic similarity measures to "phrases"
  – she declined to provide any other information …
  – police refused to provide any other details …
• So far: see if y is similar to x. Now: find y's similar to x
• Can solve other problems now!
  – Use these extended phrasal DPs to find good paraphrases of unknown "phrases" in machine translation models

[Diagram: the phrasal DP of "to provide any other" over context words such as information, money, declined, teller, details, bank]

Page 28: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Coverage Problem in Statistical Machine Translation

• Trained on parallel text
• Every new test document contains some "phrases" unknown to the model

[Diagram: a Spanish-English parallel training corpus next to a Spanish-only test set containing phrases with no known translation]

Page 29: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Previous Solution: Pivoting

• Use other parallel texts to increase coverage
• Drawback: parallel text is a limited resource!

[Diagram: additional French-Spanish and German-Spanish parallel corpora used as pivots to cover test-set phrases missing from the Spanish-English training data]

Page 30: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

New Solution: Monolingually-Derived Paraphrases

• Use monolingual text to increase coverage

• Resources available in abundance!

[Diagram: abundant Spanish monolingual text used to paraphrase unknown test-set phrases into phrases that are covered by the Spanish-English parallel training data]

Page 31: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Find Paraphrases

• Gather all contexts L _ R for the phrase "to provide any other":
• What else appears between L _ R ?

  Left context (L)   __                     Right context (R)
  declined           to provide any other   details
  refused            to provide any other   information
  unable             to provide any other   details
  failed             to provide any other   explanation
  …                  to provide any other   …

Page 32: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Find Paraphrases

• Gather all contexts L _ R for the phrase "to provide any other":
• What else appears between L _ R ?
• Measure distributional similarity to each candidate, e.g., "to provide any other" vs. "to give further" (see the sketch below)

  Left context (L)   __                     Right context (R)
  declined           to give further        details
  refused            to provide any         information
  unable             to reveal any          details
  failed             to provide further     explanation
  …                  to provide any other   …
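A hedged sketch of the idea (toy sentences and scoring; not the dissertation's actual pipeline): collect the left/right contexts in which the target phrase appears, find other phrases occurring in the same L _ R slots, and rank them by the distributional similarity of their context profiles.

```python
from collections import Counter

# Toy monolingual corpus (illustrative sentences only).
corpus = [
    "she declined to provide any other information",
    "police refused to provide any other details",
    "he declined to give further information",
    "officials refused to give further details",
    "they failed to provide any other explanation",
]

def context_profile(phrase):
    """Profile of a phrase: counts of (left word, right word) contexts L _ R it fills."""
    profile = Counter()
    p = phrase.split()
    for sent in corpus:
        words = sent.split()
        for i in range(1, len(words) - len(p)):
            if words[i:i + len(p)] == p:
                profile[(words[i - 1], words[i + len(p)])] += 1
    return profile

def dist_sim(p1, p2):
    """Simple overlap-based distributional similarity between two context profiles."""
    shared = sum((p1 & p2).values())
    total = sum(p1.values()) + sum(p2.values())
    return 2.0 * shared / total if total else 0.0

target = context_profile("to provide any other")   # contexts of the unknown phrase
candidate = context_profile("to give further")     # a paraphrase candidate
print(dist_sim(target, candidate))                 # higher = more similar contexts
```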

Page 33: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Paraphrase Examples (Phrases)


Page 34: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Paraphrase Examples (Unigrams)


Page 35: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Paraphrase Feature Model

• Evidence reinforcement: if more than one paraphrase f_i of f exists, aggregate the score with a "quasi-online" update (illustrated below):
  asim_i = asim_{i-1} + (1 – asim_{i-1}) · sim(f_i, f), where asim_0 = 0
• Analogous to Callison-Burch et al. (2006)
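A minimal sketch of this aggregation (my own illustrative similarity values): each additional paraphrase pushes the aggregated score closer to 1, so multiple moderately similar paraphrases reinforce each other without the score ever exceeding 1.

```python
def aggregate_similarity(sims):
    """Quasi-online aggregation: asim_i = asim_{i-1} + (1 - asim_{i-1}) * sim_i, asim_0 = 0."""
    asim = 0.0
    for sim in sims:
        asim = asim + (1.0 - asim) * sim
    return asim

# Three paraphrases of the same phrase with moderate similarities (illustrative values).
print(aggregate_similarity([0.4, 0.3, 0.2]))  # 0.664: reinforced, but still below 1.0
```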

Page 36: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

English to Chinese Results

• 29k-line subset created to emulate a low-density language setting

[Results table omitted. *: better than baseline; +: better than the non-hybrid counterpart]

Page 37: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

English-Chinese Translation Examples


Page 38: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Spanish to English


Page 39: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Comparison with Corpus Size & Pivoting


Page 40: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Road Map

✓ Hybrid models with soft constraints
  – Pure and hybrid models
  – Hard and soft constraints
  – Fine-grained
✓ Soft syntactic constraints
  – In statistical machine translation
✓ Soft semantic constraints
  – In word-pair similarity tasks
  – In paraphrasing for statistical machine translation
• Unified model

Page 41: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Unified Model

• Soft linguistic constraints in a log-linear model
  – Syntactic
  – Semantic
  – …
• Σ_i λ_i h_i(x)
• Constraints = add more λ_i h_i(x) terms to the sum:

  Σ_i λ_i h_i(x) + Σ_j λ_j h_j(x)

  h_i: features / constraints
  λ_i: weight / importance of feature i

Page 42: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Unified Model (Soft Syntactic Constraints)

• Straightforward: if Σ_i λ_i ϕ_i(f,e) is a translation model, bias it syntactically, e.g., as follows:

  Σ_i λ_i ϕ_i(f,e) + λ_j ϕ_j(f,e)

  where ϕ_j(f,e) = 1 if the source-language word sequence f is a VP, and 0 otherwise.

Page 43: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Unified Model (Soft Semantic Constraints)

• Semantic distance of word e in sense s from word e' in sense s':

  cos(e_s, e'_s') = Σ_i [fWord(e,w_i) + fSense(e,s,w_i)] · [fWord(e',w_i) + fSense(e',s',w_i)] / Z_C

  where:
    Σ_i fWord(e,w_i) · fWord(e',w_i) / Z_C = K · cosWord(e,e')
    Σ_i fSense(e,s,w_i) · fSense(e',s',w_i) / Z_C = cosSense(e_s, e'_s')
    and the remaining two products (fWord with fSense) are cross-terms.

Page 44: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Main Contributions

• Fine-grained linguistic soft constraints in a unified corpus-based model
  – Syntactic (parsing) constraints in statistical machine translation, in state-of-the-art end-to-end phrase-based SMT systems
  – Semantic (word-level) constraints in word-pair similarity tasks
  – Semantic (phrase-level) constraints in statistical machine translation: distributional paraphrase generation with an evidence reinforcement component, in state-of-the-art end-to-end phrase-based SMT systems

Page 45: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Thanks to…

• Defense Committee:
  – Philip Resnik, Chair/Advisor
  – Amy Weinberg, Advisor
  – William Idsardi, Member
  – Chris Callison-Burch, Special Member (JHU)
  – Bonnie Dorr, Dean's Representative
• Ling Chair:
  – Norbert Hornstein
• Ling Cohort:
  – Ellen … Lau
  – Phil Monahan
  – Eri Takahashi
  – Rebecca McKeown
  – Chizuru Nakao
• CLIP Lab:
  – David Chiang, Smara Muresan, Hendra Setiawan, Adam Lopez, Chris Dyer, Asad Sayeed, Vlad Eidelman, Zhongqiang Huang, Denis Filimonov, and many others!

Page 46: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Thank you!

• Questions
