
Reified Context Models

Jacob Steinhardt Percy Liang

Stanford University

{jsteinhardt,pliang}@cs.stanford.edu

July 8, 2015

J. Steinhardt & P. Liang (Stanford) Reified Context Models July 8, 2015 1 / 11

Structured Prediction Task

input x :

output y : v o l c a n i c


Contexts Are Key

output:       v  o   l    c    a

DP:           v  *o  **l  ***c
beam search:  v  vo  vol  volc

Key idea: contexts!

*o := {ao, bo, co, . . .}
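The wildcard notation above can be made concrete with a small sketch (not from the talk): a length-k context keeps only the last k characters of a prefix and masks the rest, so "*o" stands for every prefix ending in "o". DP corresponds to k = 1; beam search keeps the full prefix.

```python
# Hypothetical illustration of the wildcard contexts above: keep the last k
# characters of a prefix and mask everything else with '*'.
def collapse(prefix: str, k: int) -> str:
    """Collapse `prefix` to a length-k context, e.g. collapse("vo", 1) -> "*o"."""
    keep = min(k, len(prefix))
    if keep == 0:
        return "*" * len(prefix)
    return "*" * (len(prefix) - keep) + prefix[-keep:]

print(collapse("vo", 1))    # DP-style:  "*o"
print(collapse("vol", 1))   # DP-style:  "**l"
print(collapse("volc", 4))  # beam-style: "volc"
```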


Desiderata

Short contexts:
  r  *o  **l  ***c
  v  *a  **i  ***r
coverage; better uncertainty estimates (precision); stabler partially supervised learning updates

Long contexts:
  r  ro  rol  rolc
  v  ra  ral  ralc
expressivity; capture complex dependencies

Mixed context sets:  ← best of both worlds
  r  ro  rol  *olc
  v  ra  ral  ***c
  y  *o  *ol  ***r
  *  **  ***  ****



Reifying Contexts

input x :
output y :  v  o   l    c    a  n  i  c
context c:  v  *o  *ol  *olc  · · ·

"Context sets" C1, C2, C3, C4:
  C1   C2   C3    C4
  r    ro   rol   *olc
  v    ra   ral   ***c
  y    *o   *ol   ***r
  *    **   ***   ****

Challenge: how to trade off contexts of different lengths?
⇒ Reify contexts as part of the model!

Reified Context Models

Given:

context sets C1, . . . , CL
features φi(ci−1, yi)

Define the model

pθ(y1:L, c1:L−1) ∝ exp( ∑_{i=1}^{L} θ⊤ φi(ci−1, yi) ) · κ(y, c)

where the factor κ(y, c) enforces consistency between the labels and the contexts.

Graphical model structure: a chain over Y1, . . . , Y5 with context variables C1, . . . , C4; each φi scores (ci−1, yi), and each κ factor ties Ci to (Ci−1, Yi). Inference via forward-backward!
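A minimal sketch of the forward pass over this chain, under assumed names: `score(i, c_prev, y)` stands in for exp(θ⊤φi(ci−1, yi)), and `refine(c_prev, y, C_i)` plays the role of the κ factor, returning the context in Ci consistent with extending ci−1 by yi. This is an illustration of the recurrence, not the paper's implementation.

```python
from collections import defaultdict

def forward(contexts, labels, score, refine):
    """Forward pass over (Y_i, C_i); returns the unnormalized partition sum."""
    alpha = {None: 1.0}                            # no context before position 1
    for i, C_i in enumerate(contexts, start=1):
        new_alpha = defaultdict(float)
        for c_prev, a in alpha.items():
            for y in labels:
                c_next = refine(c_prev, y, C_i)    # kappa: consistency with C_i
                new_alpha[c_next] += a * score(i, c_prev, y)
        alpha = dict(new_alpha)
    return sum(alpha.values())
```

With unit scores and contexts that remember only the last label, the sum counts all |labels|^L label sequences, as the recurrence should.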


Adaptive Context Selection

Select the context sets Ci during the forward pass of inference.

Greedily select the contexts with the largest mass:

from {a, b, c, d, e, . . .}, keep c and e and collapse the rest into ?  ⇒  C1 = {c, e, ?}
from {ca, cb, . . . , ea, eb, . . . , ?a, . . .}, keep ca and ?a  ⇒  C2 = {ca, ?a, ??}
etc.

This biases towards short contexts unless there is high confidence.
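The greedy step can be sketched as follows (a simplification with assumed names, not the paper's exact scheme): keep the K highest-mass contexts exactly and merge everything else into a single collapsed "?" context carrying the leftover mass.

```python
def select_contexts(mass, K):
    """mass: {context: probability}. Returns the reified set C_i as a dict."""
    ranked = sorted(mass.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:K])
    leftover = sum(p for _, p in ranked[K:])
    if ranked[K:]:
        width = len(ranked[0][0])
        kept["?" * width] = leftover   # collapsed context absorbs the rest
    return kept

print(select_contexts({"ca": 0.5, "ea": 0.3, "cb": 0.15, "eb": 0.05}, 2))
```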


Precision

input x :
output y : v o l c a n i c

The model assigns a probability to each prediction, so it can predict on the most confident subset.

Measure precision (# of correct words) vs. recall (# of words predicted).
Comparison: beam search.

[Plot: Word Recognition — precision (0.86–1.00) vs. recall (0.0–1.0) for beam search and RCM.]
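The curve above can be traced by sorting predictions by model confidence and sweeping a threshold; a sketch with assumed names:

```python
def precision_recall_curve(preds):
    """preds: list of (confidence, is_correct) pairs, one per predicted word."""
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    curve, correct = [], 0
    for k, (_, ok) in enumerate(ranked, start=1):
        correct += ok
        curve.append((k / len(ranked), correct / k))   # (recall, precision)
    return curve
```

Each point answers: if we only predict on the k most confident words, what fraction are correct?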


Partially Supervised Learning

Decipherment task:

cipher:    am ↦ 5,  I ↦ 13,  what ↦ 54,  . . .
latent z:  I   am  what  I   am
output y:  13  5   54    13  5

Goal: determine the cipher.

Fit a 2nd-order HMM with EM, using RCMs for the approximate E-step.
Use the learned emissions to determine the cipher.
Again compare to beam search (Nuhn et al., 2013).

Fraction of correctly mapped words:

[Plot: Decipherment — mapping accuracy (0.0–0.8) vs. training passes (0–20) for RCM and beam.]
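Reading the cipher off the learned emissions can be sketched as follows (hypothetical helper names): map each plaintext word to its highest-probability cipher token, then score against the true mapping.

```python
def read_cipher(emissions):
    """emissions: {plaintext word: {cipher token: probability}} learned by EM."""
    return {w: max(dist, key=dist.get) for w, dist in emissions.items()}

def mapping_accuracy(learned, truth):
    """Fraction of words whose learned cipher token matches the true cipher."""
    return sum(learned[w] == t for w, t in truth.items()) / len(truth)
```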


Contexts During Training

Context lengths increase smoothly during training:

[Plot: Decipherment — average context length (1.5–4.5) vs. number of passes (0–20), with contexts refining ****** → ***ing → idding.]

Start of training: little information, short contexts.
End of training: lots of information, long contexts.


Discussion

RCMs provide both expressivity and coverage, which enable:

More accurate uncertainty estimates (precision)
Better partially supervised learning updates

Related work:

Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)
Certificates of optimality (Sontag, 2010)
Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li & Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on CodaLab: codalab.org/worksheets
