Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt

Fine-Grained Soft Semantic Constraints

Yuval Marton, University of Maryland

http://umiacs.umd.edu/~ymarton/pub/ibm/Hybrid%20Knowledge-CorpusBasedSem-IBM_090728.ppt

Yuval Marton, IBM talk

Why Care?

Tell’em apart:

In spite of similar contexts

These, too:

In spite of same form


Road map

• Brief overview of doctoral work

• Hybrid knowledge / corpus-based semantic similarity methods

– Pure and hybrid methods

– Hard and soft constraints

– Fine-grained

– Named-entities


Dissertation Theme

• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Syntactic and Semantic Constraints

– Soft Constraints

– Fine-Grained

– Syntactic (parsing)

– Semantic (“concepts”, paraphrases)

• Evaluated in

– Word-pair semantic similarity ranking, and

– Statistical Machine Translation (SMT)


Soft Constraints

• Hard constraints

– {0,1}; in/out

– Decrease search space

– “Structural zeroes”

– Theory-driven

– Faster, slimmer

• Soft constraints

– [0,1]; fuzzy

– Only bias the model

– Data-driven: let patterns emerge

(Figure: the universe of hypotheses under hard vs. soft constraints)
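To make the hard/soft contrast concrete, here is a minimal Python sketch over a toy list of hypotheses. The `Hypothesis` fields, the scores, and the bonus-style soft constraint are illustrative assumptions, not the talk's implementation: the hard constraint deletes violating hypotheses (a structural zero), while the soft constraint only adds a weighted bonus, so strong evidence from the rest of the model can still win.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    base_score: float  # score from the rest of the model (hypothetical)
    satisfies: bool    # does it satisfy the (hypothetical) constraint?


def hard_filter(hyps):
    """Hard constraint: violating hypotheses are removed from the search space."""
    return [h for h in hyps if h.satisfies]


def soft_rescore(hyps, weight):
    """Soft constraint: nothing is removed; satisfying hypotheses get a bonus."""
    return [(h, h.base_score + (weight if h.satisfies else 0.0)) for h in hyps]


def best_hard(hyps):
    kept = hard_filter(hyps)
    return max(kept, key=lambda h: h.base_score).text if kept else None


def best_soft(hyps, weight):
    return max(soft_rescore(hyps, weight), key=lambda pair: pair[1])[0].text


hyps = [
    Hypothesis("A", base_score=1.0, satisfies=True),
    Hypothesis("B", base_score=2.5, satisfies=False),
]
print(best_hard(hyps))       # A: B is never considered
print(best_soft(hyps, 1.0))  # B: its evidence outweighs the bonus
print(best_soft(hyps, 2.0))  # A: a larger weight tips the balance
```

In the soft case the constraint's influence is a tunable weight rather than a yes/no decision, which is what lets data-driven patterns emerge.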


Fine-grained

• Granularity is a big deal

– Soft syntactic constraints in SMT

• Chiang 2005 vs. Marton and Resnik 2008

• Neg results → pos results

– Soft semantic constraints in word-pair similarity ranking

• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009

• Pos results → better results


Soft Syntactic Constraints

• X → X1 speech ||| X1 espiche

– What should be the span of X1?

• Chiang’s 2005 constituency feature

– Reward the rule’s score if the rule’s source side matches a constituent span

– Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward)

– Good idea, but a negative result

• But what if…
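A minimal sketch of a Chiang-style constituency feature follows. It assumes the source parse is given as nested tuples and that spans are half-open token offsets; the tree format and the helper names are hypothetical. The feature fires (value 1.0) only when the rule's source span coincides with some constituent, and a single weight for it would be left to the log-linear model.

```python
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # leaf = word string; node = (label, child, child, ...)


def constituent_spans(tree: Tree, start: int = 0) -> Tuple[List[Tuple[str, int, int]], int]:
    """Return all (label, start, end) constituent spans (end exclusive) and the end offset."""
    if isinstance(tree, str):  # a leaf covers exactly one token
        return [], start + 1
    label, *children = tree
    spans, pos = [], start
    for child in children:
        child_spans, pos = constituent_spans(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos


def chiang_feature(span: Tuple[int, int], spans) -> float:
    """Single, coarse feature: 1.0 if the source span matches any constituent, else 0.0."""
    return 1.0 if any((s, e) == span for _, s, e in spans) else 0.0


tree = ("S", ("NP", "the", "speech"), ("VP", "was", ("ADJP", "long")))
spans, _ = constituent_spans(tree)
print(chiang_feature((0, 2), spans))  # 1.0: "the speech" is an NP
print(chiang_feature((1, 3), spans))  # 0.0: "speech was" crosses a bracket
```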


Rule granularity

• Chiang: Single weight for all constituents (parse tags)

• … But what if we can assign a separate feature and weight for each constituent?

• E.g., NP-only: (NP= )

• Or VP-only: (VP= )
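The fine-grained alternative can be sketched by emitting one feature per constituent label, so that `NP=` and `VP=` receive separate weights. The span list, the label set, and the weight values below are hypothetical, chosen only to show the mechanics.

```python
from collections import Counter


def fine_grained_features(span, spans, labels):
    """One feature per label: e.g. 'NP=' fires only if the span matches an NP."""
    feats = Counter()
    for label, s, e in spans:
        if label in labels and (s, e) == span:
            feats[f"{label}="] += 1.0
    return feats


def score(feats, weights):
    """Weighted sum of the fired features; unknown features contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


# (label, start, end) spans of a toy parse, end exclusive
spans = [("NP", 0, 2), ("ADJP", 3, 4), ("VP", 2, 4), ("S", 0, 4)]
weights = {"NP=": 0.8, "VP=": -0.1}  # hypothetical tuned weights

print(fine_grained_features((0, 2), spans, {"NP", "VP"}))
print(score(fine_grained_features((2, 4), spans, {"NP", "VP"}), weights))
```

With separate weights, tuning can reward matching NPs while leaving other labels neutral or even penalized, which a single shared weight cannot express.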



Word-pair similarity ranking

• Give each word pair a similarity score

– Rooster – voyage

– Coast – shore

• Noun-noun pairs (Rubinstein & Goodenough, 1965)

• Verb-verb pairs (Resnik & Diab, 2000)

• Result: list of pairs ordered by similarity

• Evaluation: Spearman rank correlation with the human ranking
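Spearman's rank correlation compares only the order of the pairs, not the raw scores. A minimal sketch in Python (ties receive their average rank); the four score lists are hypothetical, not data from the talk.

```python
def ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


human = [0.1, 3.9, 2.2, 1.0]       # hypothetical human ratings for 4 word pairs
system = [0.05, 0.80, 0.61, 0.70]  # hypothetical system similarity scores
print(spearman(human, system))
```

Because only ranks matter, any monotone rescaling of the system scores leaves rho unchanged.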


Similarity measures

• Distributional profiles (DP)

– Which words did I occur next to?

• Context vectors

• Similar vectors → similar meaning
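A word-based context vector can be sketched as window co-occurrence counts. This is a minimal illustration with raw counts; the window size of 2 is an assumption for the toy sentence (the backup slides use ±6 tokens), and no strength-of-association weighting is applied yet.

```python
from collections import Counter, defaultdict


def context_vectors(tokens, window=2):
    """Map each word to a Counter of words seen within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


tokens = "the river bank was muddy and the bank teller counted money".split()
vecs = context_vectors(tokens, window=2)
print(vecs["bank"])
```

Note that the single vector for “bank” collects both “river” and “teller”: a purely word-based profile mixes all senses of the word.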


Bank (pure word-based)

(Figure: a single profile for “bank”)


Bank (pure concept-based)

(Figure: concept “Financial Institution” groups bank, teller, money, …; concept “Water” groups river, bank, water, …)

– Compare closest senses

– Bank_river = water ??
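“Compare closest senses” can be sketched as taking the maximum similarity over all pairs of concepts (senses) of the two words. The concept profiles below are hypothetical, and cosine is used as the vector similarity. The example shows the problem flagged above: because the river sense of “bank” and “water” share one concept profile, their similarity is maximal.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def closest_sense_similarity(senses1, senses2, concept_dp):
    """Similarity of two words = similarity of their closest pair of concepts."""
    return max(cosine(concept_dp[c1], concept_dp[c2]) for c1 in senses1 for c2 in senses2)


concept_dp = {  # hypothetical concept-based DPs
    "FIN_INST": {"loan": 0.9, "teller": 0.8},
    "WATER": {"river": 0.9, "flow": 0.7, "loan": 0.05},
}
senses = {"bank": ["FIN_INST", "WATER"], "water": ["WATER"]}
print(closest_sense_similarity(senses["bank"], senses["water"], concept_dp))
```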


Bank (Hybrid Model)

(Figure: separate profiles for bank_river and bank_fin.inst)



Unified Model

• Soft constraints in a log-linear model

– Syntactic

– Semantic

– …

• Score: $\sum_i \lambda_i h_i(x)$

• Constraints = Add more terms to the sum
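In a log-linear model each candidate x is scored by the weighted sum of feature functions h_i(x) with weights λ_i, and probabilities come from normalizing exp(score). A minimal sketch, assuming a finite candidate set; the feature names `lm`, `tm`, `NP=` and all values are hypothetical. Adding a constraint is literally adding one more weighted term.

```python
import math


def loglinear_score(features, weights):
    """sum_i lambda_i * h_i(x); features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def posterior(candidates, weights):
    """Normalize exp(score) over a finite candidate set (softmax)."""
    scores = {c: loglinear_score(f, weights) for c, f in candidates.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}


candidates = {  # hypothetical feature values h_i(x)
    "x1": {"lm": -2.0, "tm": -1.0},
    "x2": {"lm": -1.5, "tm": -1.2, "NP=": 1.0},
}
base = {"lm": 1.0, "tm": 1.0}
with_constraint = {**base, "NP=": 0.5}  # the constraint = one more weighted term

print(posterior(candidates, base))
print(posterior(candidates, with_constraint))
```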



Distributional profiles (DPs)

• Distributional Hypothesis (Harris 1940; Firth 1957)

• First order vs. second order (vector representation)

• Strength of association

– Counts, PMI, TF/IDF-based, log-likelihood ratios, …

• Vector similarity (cosine, L1, L2, …)

word × word

              Bush   Obama
President     .93    .96
Democrat      .13    .89
Republican    .88    .15
White-house   .76    .91
…             .45    .74
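One of the strength-of-association options listed above, pointwise mutual information, can be sketched as follows: PMI(w, c) = log(P(w, c) / (P(w) P(c))), with all probabilities estimated from the same co-occurrence counts. The counts are hypothetical, loosely echoing the Bush/Obama example.

```python
import math
from collections import defaultdict


def pmi_profiles(cooc):
    """Turn raw counts cooc[word][context] into PMI-weighted distributional profiles."""
    total = sum(sum(ctx.values()) for ctx in cooc.values())
    word_tot = {w: sum(ctx.values()) for w, ctx in cooc.items()}
    ctx_tot = defaultdict(float)
    for ctx in cooc.values():
        for c, n in ctx.items():
            ctx_tot[c] += n
    return {
        w: {c: math.log(n * total / (word_tot[w] * ctx_tot[c]))
            for c, n in ctx.items() if n > 0}
        for w, ctx in cooc.items()
    }


cooc = {  # hypothetical co-occurrence counts
    "Bush": {"Republican": 8, "President": 10, "Democrat": 2},
    "Obama": {"Republican": 2, "President": 10, "Democrat": 8},
}
dp = pmi_profiles(cooc)
print(dp["Bush"])
```

“President” co-occurs equally with both names and gets PMI 0, while the party words become discriminative (positive for one name, negative for the other).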


Taxonomies and Groupings

• WordNet

– Synsets

– Relations (“is-a”)

– Arc distance

– The tennis problem

• UMLS

• Thesaurus

– Flat

– Coarse

– Implicit relations, potentially non-classical

(Figure: is-a hierarchy under “job”: postdoc is-a academic job is-a job; CEO is-a industry job is-a job)
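Arc distance is the number of is-a links on the shortest path between two nodes. A minimal breadth-first-search sketch over the small job hierarchy as we read it from the figure; the edge list is our transcription, not a WordNet excerpt.

```python
from collections import deque

IS_A = [  # (child, parent) edges, transcribed from the example figure
    ("academic job", "job"), ("postdoc", "academic job"),
    ("industry job", "job"), ("CEO", "industry job"),
]


def arc_distance(a, b, edges=IS_A):
    """Shortest number of is-a arcs between a and b (edges treated as undirected)."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected


print(arc_distance("postdoc", "CEO"))           # 4
print(arc_distance("postdoc", "academic job"))  # 1
```

Arc distance sees only taxonomic links, so words that are related but not connected by is-a (the tennis problem) come out far apart or unconnected.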


Hybrid measures

• WordNet

– Resnik’s method (information content)

– Lin and others

• Thesaurus concept-based

– Mohammad and Hirst (coarse-grained)

– Distance between most similar senses

– Pro: semantic relatedness (non-classical relations); resource-poor languages and domains

– Con: small thesaurus, low applicability

Bank_river = water ??


Concept-Word DPs

• Concept-word collocation matrix

• Aggregate collocation info of words under concept

• Potentially iterative process

• Clean-up

Concept × word

          Fin.Inst   Water
bank      .97        .85
teller    .88        .07
money     .94        .15
water     .32        .91
…         .45        .74
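One simple reading of “aggregate collocation info of words under concept” is to sum the collocation counts of every word listed under a thesaurus concept. The sketch below does exactly that; the word lists and counts are hypothetical, and the iteration and clean-up steps are not modeled.

```python
from collections import Counter


def concept_profile(concept_words, word_dps):
    """Aggregate the collocation counts of all words listed under one concept."""
    profile = Counter()
    for w in concept_words:
        profile.update(word_dps.get(w, {}))  # Counter.update adds counts
    return profile


word_dps = {  # hypothetical raw co-occurrence counts
    "bank": {"money": 5, "water": 3},
    "teller": {"money": 4},
    "river": {"water": 6},
}
fin_inst = concept_profile(["bank", "teller"], word_dps)
water = concept_profile(["river", "bank"], word_dps)
print(fin_inst, water)
```

Because the ambiguous word “bank” is listed under both concepts, its counts leak into both profiles; this is one motivation for a clean-up step.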


Use concept-based DPs to bias word-based DPs

(Figure: word-based DP for “bank” + concept-based DPs for Financial Institution (bank, teller, money, …) and Water (river, bank, water, …) = hybrid DPs)


Fine-grained soft constraints

• DPWS: distributional profile of word senses

• Use concept-based DPs to bias word-based DPs

– Hybrid-filtered

– Hybrid-proportional


Hybrid-filtered

           Fin.Inst     Water        bank    bank_river
           concept DP   concept DP   DP      DPWS
bank       .97          .85          .76     .76
teller     .88          .07          .54     .54
money      .94          .15          .68     .68
water      .00          .91          .62     .00
…          .45          .74          .25     .25

Filter out collocates in the word DP if they do not appear in the concept DP.
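The filtering rule can be sketched in a few lines: a collocate keeps its word-DP value only if it is non-zero in the chosen concept DP, otherwise it is set to zero. The example reuses the Fin.Inst and bank DP values from the table; the function name is hypothetical.

```python
def hybrid_filtered(word_dp, concept_dp):
    """Sense-specific DP: keep a collocate's word-DP value only if the collocate
    also appears (non-zero) in the concept DP; otherwise zero it out."""
    return {c: (v if concept_dp.get(c, 0.0) > 0.0 else 0.0) for c, v in word_dp.items()}


bank_dp = {"bank": 0.76, "teller": 0.54, "money": 0.68, "water": 0.62}
fin_inst_dp = {"bank": 0.97, "teller": 0.88, "money": 0.94, "water": 0.00}
print(hybrid_filtered(bank_dp, fin_inst_dp))
# {'bank': 0.76, 'teller': 0.54, 'money': 0.68, 'water': 0.0}
```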


Hybrid-proportional

           Fin.Inst     Water        bank    bank_river
           concept DP   concept DP   DP      DPWS
bank       .97          .85          .76     .33
teller     .88          .07          .54     .05
money      .94          .15          .68     .08
water      .00          .91          .62     .00
…          .45          .74          .25     .15

Instead of filtering, discount each collocate’s value in the word DP in proportion to the ratio of its count in the current concept DP to its counts in all concept DPs of the target word.
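A minimal sketch of the proportional rule, assuming `concept_dps` holds the DPs of all concepts that list the target word: each collocate's word-DP value is multiplied by that collocate's share in the current concept relative to all of the target's concepts. The counts are hypothetical and are not meant to reproduce the table.

```python
def hybrid_proportional(word_dp, concept_dps, sense):
    """Sense-specific DP: scale each collocate's word-DP value by its count in the
    `sense` concept DP divided by its total count over all the target's concept DPs."""
    out = {}
    for c, v in word_dp.items():
        total = sum(dp.get(c, 0.0) for dp in concept_dps.values())
        share = concept_dps[sense].get(c, 0.0) / total if total > 0 else 0.0
        out[c] = v * share
    return out


word_dp = {"money": 0.8, "water": 0.6, "bank": 0.5}  # hypothetical word DP
concept_dps = {  # hypothetical concept counts for the senses of "bank"
    "FIN_INST": {"money": 30, "bank": 20},
    "WATER": {"money": 10, "water": 40, "bank": 20},
}
river = hybrid_proportional(word_dp, concept_dps, "WATER")
fin = hybrid_proportional(word_dp, concept_dps, "FIN_INST")
print(river, fin)
```

Unlike filtering, this splits each collocate's value across the senses: whenever the collocate appears in some concept DP, the sense-specific values sum back to the original word-DP value.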


WSD with DPWS

• Each sense of each word has a unique profile

– Bank_fin.inst ≠ Bank_river ≠ water!

• Pro:

– Not aggregated: unlike concept DPs

– Non/less smearing: unlike word DPs that smear all senses in a single profile
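Since each sense now has its own profile, one way to use the profiles for disambiguation is to pick the sense whose profile is most similar to the bag of words around an occurrence. This is a hypothetical illustration of that idea, not necessarily the procedure used in the work; the sense profiles and the cosine choice are assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def disambiguate(context_tokens, dpws):
    """Pick the sense whose profile (DPWS) is closest to the context's bag of words."""
    ctx = Counter(context_tokens)
    return max(dpws, key=lambda sense: cosine(ctx, dpws[sense]))


dpws = {  # hypothetical sense-specific profiles of "bank"
    "bank_fin.inst": {"teller": 0.54, "money": 0.68, "loan": 0.5},
    "bank_river": {"water": 0.62, "muddy": 0.4, "fish": 0.3},
}
print(disambiguate("the teller counted the money".split(), dpws))  # bank_fin.inst
```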


Results


Evaluation

• Word-pair similarity ranking

– Spearman rank correlation

• Paraphrasing in SMT

– BLEU, TER, METEOR, …


Comparison

• WordNet results

• LSA results


Challenges

• Antonyms (black – white)

• “Hyperonyms” (vehicle – car)

• Co-hypernyms / co-taxonyms


Conclusion

• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Constraints

– Soft Constraints

– Fine-Grained

– Semantic (“concepts”)

– Semantic relatedness, resource-poor settings, special domains




Thank you!

Questions?

[email protected]

Advisors: Philip Resnik & Amy Weinberg

Department of Linguistics and CLIP Lab


Paraphrase generation

For some target OOV phrase Phr:

• Build distributional profile DP_Phr

• Gather contexts of Phr

• Gather paraphrase candidates

• Score / Rank candidates

• Output K-best candidates


Distributional Profiles

• Example of collocational distributional profile (DP) for word “cord”:

• Sliding window (±6 tokens)

• SoA: conditional probability (CP), pointwise mutual information (PMI), log-likelihood ratios (LLR), …

• Using LLR

Collocate   Co-occurrence count   Strength of association (SoA)
Hanging     8                     12.20
Ventral     6                     18.44
Trousers    14                    62.19
…           …                     …
… … …







DP Similarity

• Each DP is represented as a vector

• Use any vector similarity

• Using cosine: cos(DP_cord, DP_rope)

• Example: estimating similarity between “cord” and “rope”:

Collocate   SoA with “cord”   SoA with “rope”
Hanging     12.20             10.43
Ventral     18.44             4.97
Trousers    62.19             31.82
…           …                 …
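The cosine computation can be sketched on sparse profiles stored as dicts. The sketch assumes the three visible rows of the table correspond to the collocates hanging, ventral, and trousers (matching the “cord” values in the previous table). Since only three collocates are used, the result is not comparable to the full-profile score for “rope” reported later.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse DPs given as {collocate: SoA} dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


dp_cord = {"hanging": 12.20, "ventral": 18.44, "trousers": 62.19}
dp_rope = {"hanging": 10.43, "ventral": 4.97, "trousers": 31.82}
print(round(cosine(dp_cord, dp_rope), 2))  # 0.98 on these three collocates only
```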







Gather contexts

• Gather all contexts L _ R for “cord”:

• Length of context: start small, increase if too frequent

Left context (L) | _ | Right context (R)
A full | cord | is a large amount of wood.
History of the | Cord | 810 and 812
a soft tufted | cord | used in embroidery
a knotted | cord | that runs out from a reel
the | cord | of his electric razor.
living well after spinal | cord | injury or disease
… | cord | …


• What else appears between L _ R?

Left context (L) | _ | Right context (R)
A full | wave analysis is required since it | is a large amount of electromagnetic
History of the | world since his death in | 810
a soft tufted | cord of silk, cotton, or worsted | used in embroidery
a knotted | rope | that runs out
the | cable | of his electric razor.
spinal | accessory nerve | injury
… | … | …
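Candidate gathering can be sketched as: record the left/right contexts of every occurrence of the target phrase, then collect every token sequence of 1 to `max_span` tokens that appears between the same contexts elsewhere. The toy corpus and the fixed context length are assumptions; the slides instead start with short contexts and lengthen them when they are too frequent.

```python
def gather_candidates(target, sentences, ctx_len=1, max_span=10):
    """Collect fillers that appear between the target's left/right contexts."""
    n = len(target)
    contexts = set()
    for s in sentences:  # 1) contexts L _ R of every target occurrence
        for i in range(len(s) - n + 1):
            if s[i:i + n] == target and i >= ctx_len and i + n + ctx_len <= len(s):
                contexts.add((tuple(s[i - ctx_len:i]), tuple(s[i + n:i + n + ctx_len])))
    candidates = set()
    for left, right in contexts:  # 2) what else appears between L and R?
        for s in sentences:
            for i in range(len(s) - ctx_len + 1):
                if tuple(s[i:i + ctx_len]) != left:
                    continue
                start = i + ctx_len
                for span in range(1, max_span + 1):
                    end = start + span
                    if end + ctx_len > len(s):
                        break
                    if tuple(s[end:end + ctx_len]) == right and s[start:end] != target:
                        candidates.add(" ".join(s[start:end]))
    return candidates


sentences = [s.split() for s in [
    "the cord of his electric razor",
    "the cable of his electric razor",
    "a knotted cord that runs out",
    "a knotted rope that runs out",
]]
print(sorted(gather_candidates(["cord"], sentences)))  # ['cable', 'rope']
```

Each candidate would then be scored by the distributional similarity of its DP with the target's DP, and the best-scoring ones kept.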


• Measure distributional similarity of the target (“cord”) with each candidate:

Candidate                  Score
rope                       cos(DP_cord, DP_rope) = .83
cable                      cos(DP_cord, DP_cable) = .79
accessory nerve            cos(DP_cord, DP_accessory nerve) = .46
world since his death in   cos(DP_cord, DP_world since his death in) = .03
…                          …

Output k-best candidates

• K = 20

• Limit span between L _ R to 10 tokens

• Use best candidates to augment phrase table

Some real examples (unigrams)


Some real examples (ngrams)


English to Chinese

• 29k-line subset created to emulate a low-density language setting

Spanish to English



Comparison with Pivoting

• Pivoting is subject to translational “shift”

– Due to double translation step

• Pivoting suffers from having function words as top candidates

– Perhaps a by-product of their alignment “promiscuity”

• Monolingual paraphrases suffer from having antonyms as top candidates

Monolingually-Derived Paraphrases: Advantages

• Significant gains in SMT results for small training sets

• Good for resource-poor languages

• Not relying on bitexts (a limited resource)

• Larger monolingual paraphrase training set yields better paraphrases

• General: Can plug in any similarity measure

Challenges

• Quality: distributional paraphrases suffer from high-ranking antonyms and co-hypernyms

• Smaller gains than the pivoting technique of Callison-Burch et al. (2006), but can scale up

• How to benefit from POS and syntactic info, e.g., Callison-Burch (2008)

• How to benefit from semantic info / WSD, e.g., Marton, Mohammad & Resnik 2009; Erk & Pado 2008

• Scaling: need to explore whether gains hold on bigger SMT sets before exhausting the capacity to handle huge monolingual sets

Thank you!

• Questions

http://umiacs.umd.edu/~ymarton
