
Learning structured outputs

P. Gallinari
patrick.gallinari@lip6.fr
www-connex.lip6.fr
University Pierre et Marie Curie, Paris, France

NATO ASI: Mining Massive Data Sets for Security
2007-09-14


Outline

Motivation and examples
Approaches for structured learning:
  Generative models
  Discriminant models
  Search models


Machine learning and structured data

Different types of problems:
  Model, classify, cluster structured data
  Predict structured outputs
  Learn to associate structured representations
Structured data and applications in many domains: chemistry, biology, natural language, web, social networks, databases, etc.


Sequence labeling: POS

  This Workshop brings together scientists and engineers
  DT   NN       VBZ    RB       NNS        CC  NNS

  interested in recent developments in exploiting Massive data sets
  VBN        IN JJ     NNS          IN VBG        JJ      NP   NP

(DT: determiner, NN: noun, VBZ: verb 3rd person singular, RB: adverb, NNS: noun plural, CC: coordinating conjunction, VBN: verb past participle, IN: preposition, JJ: adjective, VBG: verb gerund)


PENN tag set

1. CC    Coordinating conjunction
2. CD    Cardinal number
3. DT    Determiner
4. EX    Existential "there"
5. FW    Foreign word
6. IN    Preposition / subordinating conjunction
7. JJ    Adjective
8. JJR   Adjective, comparative
9. JJS   Adjective, superlative
10. LS   List item marker
11. MD   Modal
12. NN   Noun, singular or mass
13. NNS  Noun, plural
14. NNP  Proper noun, singular
15. NNPS Proper noun, plural
16. PDT  Predeterminer
17. POS  Possessive ending
18. PRP  Personal pronoun
19. PP$  Possessive pronoun
20. RB   Adverb
21. RBR  Adverb, comparative
22. RBS  Adverb, superlative
23. RP   Particle
24. SYM  Symbol
25. TO   "to"
26. UH   Interjection
27. VB   Verb, base form
28. VBD  Verb, past tense
29. VBG  Verb, gerund / present participle
30. VBN  Verb, past participle
31. VBP  Verb, non-3rd person singular present
32. VBZ  Verb, 3rd person singular present
33. WDT  wh-determiner
34. WP   wh-pronoun
35. WP$  Possessive wh-pronoun
36. WRB  wh-adverb
37. #    Pound sign
38. $    Dollar sign
39. .    Sentence-final punctuation
40. ,    Comma
41. :    Colon, semicolon
42. (    Left bracket character
43. )    Right bracket character
44. "    Straight double quote
45. `    Left open single quote
46. ``   Left open double quote
47. '    Right close single quote
48. ''   Right close double quote


Segmentation + labeling: syntactic chunking (Washington Univ. tagger)

  [This Workshop] [brings] [together] [scientists and engineers]
   NP              VP       ADVP       NP

  [interested] [in] [recent developments] [in exploiting] [Massive data sets]
   VP           IN   NP                    PNP             NP

(NP: noun phrase, VP: verb phrase, ADVP: adverbial phrase, PNP: prepositional noun phrase)


Segmentation + labeling: named entity recognition

Entities: locations, persons, organizations
Time expressions: dates, times
Numeric expressions: dollar amounts, percentages

NEW YORK (Reuters) - Goldman Sachs Group Inc. agreed on Thursday to pay $9.3 million to settle charges related to a former economist …. Goldman's GS.N settlement with securities regulators stemmed from charges that it failed to properly oversee John Youngdahl, a one-time economist …. James Comey, U.S. Attorney for the Southern District of New York, announced on Thursday a seven-count indictment of Youngdahl for insider trading, making false statements, perjury, and other charges. Goldman agreed to pay a $5 million fine and disgorge $4.3 million from illegal trading profits.


Information extraction

Example web page:

Home | Organizing Committee | Lecturers | Program | Submission | Participants | NATO ASI Information | Travel Information | Hosted by | Event | Sponsored by

NATO Advanced Study Institute on Mining Massive Data Sets for Security
September 10-21, 2007, Villa Cagnola, Gazzada, Italy

NATO ASI Announcement
This Workshop brings together scientists and engineers interested in recent developments in exploiting Massive Data Sets. Emphasis is placed on available techniques and their application to security-critical applications...

Lecturers
C. Best, L. Bottou, R. Feldman, F. Fogelman-Soulié, P. Gallinari, E. Glover, L. Giles, A. Gionis, I. Guyon, D. Hand, G. Hébrail, F. Provost, N. Tishby, V. Vapnik, D. Wilkinson

Objective
Today our world is awash in data and we live in an Information Society where every action leaves a trace, generating massive amounts of data. Recent scientific developments provide technologies to exploit these huge amounts of data and extract from it critical information...

Directors
Clive Best, JRC - IT; Françoise Fogelman-Soulié, Kxen - FR; Patrick Gallinari, Université Paris 6 - FR; Naftali Tishby, Hebrew University - IL

Important Dates
Deadline for submission of application form: June 24, 2007 (extended)
Notification of acceptance: June 30, 2007 (new)
Deadline for accommodation form: July 1, 2007
NATO ASI MMDSS: September 10-21, 2007

Legal Notice | Webmaster | Top


Syntactic parsing (Stanford Parser)
[Figure: parse tree produced by the Stanford Parser.]


Document mapping problem

Problem: query heterogeneous XML databases or collections
The correspondence between the structured representations must be known; it is usually made by hand
Learn the correspondence between the different sources
Labeled tree mapping problem:

<Restaurant><Name>La cantine</Name><Address>65 rue des pyrénées, Paris, 19ème, FRANCE</Address><Specialities>Canard à l'orange, Lapin au miel</Specialities></Restaurant>

<Restaurant><Name>La cantine</Name><Address><City>Paris</City><Street>pyrénées</Street><Num>65</Num></Address><Dishes>Canard à l'orange</Dishes><Dishes>Lapin au miel</Dishes></Restaurant>


Others

Taxonomies
Social networks
Adversarial computing: web spam, blog spam, ...
Translation
Biology
...


Is structure really useful? Can we make use of structure?

Yes:
  Evidence from many domains and applications
  Mandatory for many problems, e.g. a 10K-class classification problem
Yes, but:
  Complex or long-term dependencies often correspond to rare events
  Practical evidence on large-size problems: simple models sometimes offer competitive results (information retrieval, speech recognition, etc.)


Structured learning

X, Y: input and output spaces
Structured output: y ∈ Y decomposes into parts of variable size, y = (y1, y2, ..., yT)
Dependencies: relations between the parts of y; local, long-term, or global
Cost functions:
  0/1 loss: $\Delta(y^*, \hat{y}) = 1_{y^* \neq \hat{y}}$
  Hamming loss: $\Delta(y^*, \hat{y}) = \sum_{i=1}^{T} 1_{y_i^* \neq \hat{y}_i}$
  F-score, BLEU, etc.


General approach

Predictive approach:
$y^* = f(x) = \arg\max_{y \in Y} F(x, y, \theta)$
where F: X × Y → R is a score function used to rank potential outputs; F is trained to optimize some loss function.
Inference problem:
|Y| is sometimes exponential, so the argmax is often intractable. Usual hypotheses:
  decomposability of the score function over the parts of y:
  $y^* = f(x) = \arg\max_{y \in Y} \sum_i F(x, y_i, \theta)$
  restricted set of outputs
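A toy Python sketch (all names illustrative) of why decomposability matters: when F(x, y, θ) is a sum of independent local scores, the argmax over the exponential output space reduces to independent per-part maximizations.

```python
from itertools import product

LABELS = ["B", "I", "O"]

def part_score(x, i, label):
    # Stand-in for a learned local score F(x, y_i, theta).
    return -(hash((x[i], label)) % 7)

def argmax_exhaustive(x):
    # O(|L|^T): enumerate every complete output y in Y.
    return max(product(LABELS, repeat=len(x)),
               key=lambda y: sum(part_score(x, i, l) for i, l in enumerate(y)))

def argmax_decomposed(x):
    # O(T |L|): maximize each part independently.
    return tuple(max(LABELS, key=lambda l: part_score(x, i, l))
                 for i in range(len(x)))

x = ["this", "workshop", "brings"]
assert argmax_exhaustive(x) == argmax_decomposed(x)
```

With dependencies between parts (as in HMMs below), the per-part trick no longer applies directly and dynamic programming takes its place.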


Structured algorithms differ by:
  Feature encoding
  Hypotheses on the output structure
  Hypotheses on the cost function


Generative models

Hidden Markov Models
Probabilistic Context-Free Grammars
Tree labeling model


Usual hypotheses

Features: a "natural" encoding of the input
Hypothesis on the output structure: local output dependencies, Markov property
The score decomposes, e.g. as a sum of local costs on each subpart
Inference: usually dynamic programming


HMMs

Sequence labeling and segmentation
Dependencies: Markov independence on the outputs:
$p(q_{t+1} \mid q_1, \ldots, q_t) = p(q_{t+1} \mid q_t)$
$p(x_t \mid x_1, \ldots, x_{t-1}, q_1, \ldots, q_t) = p(x_t \mid q_t)$
Decoding and learning: dynamic programming (Viterbi for the argmax, Forward-Backward for learning)
Decoding complexity: O(n|Q|²) for a sequence of length n and |Q| states
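A minimal Viterbi decoder sketch (the parameter layout is an illustrative choice), implementing the O(n|Q|²) argmax above in log space:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """pi[q]: initial probabilities, A[q, q']: transitions,
    B[q, o]: emissions, obs: observation indices."""
    n, Q = len(obs), len(pi)
    delta = np.zeros((n, Q))            # best log-score of a path ending in q at t
    psi = np.zeros((n, Q), dtype=int)   # backpointers
    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + logA          # scores[from, to]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):                      # follow the backpointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy usage: 2 states, 2 symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi, A, B, [0, 1, 0]))
```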


Consider a simple HMM with a Start state.
[Figure: trellis of the state space unrolled over an input sequence of size 3.]


Probabilistic Context-Free Grammars (after Manning & Schütze)

Set of terminals {w1, ..., wV}
Set of non-terminals {N1, ..., Nn}, with N1 the start symbol
Set of rules {N^i → ζ^i}, with ζ^i a sequence of terminals and non-terminals
To each rule is associated a probability P(N^i → ζ^i)
Special case, Chomsky Normal Form grammars:
  ζ^i = w^j  or  ζ^i = N^k N^m


Example grammar (a probability is attached to each rule):

S → NP VP    1.0     NP → NP PP        0.4
VP → V NP    0.7     NP → astronomers  0.1
VP → VP PP   0.3     NP → ears         0.18
PP → P NP    1.0     NP → saw          0.04
P → with     1.0     NP → stars        0.18
V → saw      1.0     NP → telescopes   0.1

One parse of "astronomers saw stars with ears":

[S [NP astronomers] [VP [V saw] [NP [NP stars] [PP [P with] [NP ears]]]]]
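The probability of this parse is the product of the probabilities of the rules it uses:

$P(t) = P(S \to NP\,VP) \cdot P(NP \to astronomers) \cdot P(VP \to V\,NP) \cdot P(V \to saw) \cdot P(NP \to NP\,PP) \cdot P(NP \to stars) \cdot P(PP \to P\,NP) \cdot P(P \to with) \cdot P(NP \to ears)$
$= 1.0 \times 0.1 \times 0.7 \times 1.0 \times 0.4 \times 0.18 \times 1.0 \times 1.0 \times 0.18 = 0.0009072$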


Notations

Sentence: $W_{p,q} = w_p w_{p+1} \ldots w_q$
$N^i$ dominates the sequence $W_{p,q}$ if $N^i$ may rewrite into $w_p w_{p+1} \ldots w_q$
Assumptions:
  Context-freeness: the probability of a subtree does not depend on words outside the subtree
  Independence from ancestors: the probability does not depend on nodes in the derivation outside the subtree


Inside and outside probabilities

As with the forward-backward variables in HMMs, two probabilities may be defined.
Inside: probability of generating $w_k \ldots w_l$ starting from $N^j$:
$\beta_j(k,l) \stackrel{def}{=} P(w_{k,l} \mid N^j_{k,l})$
Outside: probability of generating $N^j$ and all the words outside $w_k \ldots w_l$:
$\alpha_j(k,l) \stackrel{def}{=} P(w_{1,k-1}, N^j_{k,l}, w_{l+1,n})$


Probability of a sentence: CKY algorithm

Probability of sentence $w_{1,n}$: $P(w_{1,n}) = \beta_1(1, n)$
Left-to-right induction on the sequence:
Base case: $\beta_j(k,k) = P(N^j \to w_k)$, for all j, k
For k = 1..n, for l = k+1..n, compute
$\beta_j(k,l) = \sum_{p,q} \sum_{m=k}^{l-1} P(N^j \to N^p N^q)\, \beta_p(k,m)\, \beta_q(m+1,l)$
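A sketch of the inside recursion in Python (the integer encoding of non-terminals is an implementation choice). On the grammar above it returns the total probability of "astronomers saw stars with ears", i.e. the sum over its two parses:

```python
import numpy as np

def inside(words, lexical, binary, n_nt, start=0):
    """lexical[(j, w)] = P(N^j -> w); binary[(j, p, q)] = P(N^j -> N^p N^q)."""
    n = len(words)
    beta = np.zeros((n_nt, n, n))
    for k, w in enumerate(words):                 # base case: beta_j(k, k)
        for j in range(n_nt):
            beta[j, k, k] = lexical.get((j, w), 0.0)
    for span in range(2, n + 1):                  # longer spans, bottom-up
        for k in range(n - span + 1):
            l = k + span - 1
            for (j, p, q), prob in binary.items():
                for m in range(k, l):             # split point
                    beta[j, k, l] += prob * beta[p, k, m] * beta[q, m + 1, l]
    return beta[start, 0, n - 1]                  # P(w_{1,n}) = beta_1(1, n)

# Non-terminal indices: S=0, NP=1, VP=2, PP=3, P=4, V=5.
binary = {(0, 1, 2): 1.0, (1, 1, 3): 0.4, (2, 5, 1): 0.7,
          (2, 2, 3): 0.3, (3, 4, 1): 1.0}
lexical = {(1, "astronomers"): 0.1, (1, "ears"): 0.18, (1, "saw"): 0.04,
           (1, "stars"): 0.18, (1, "telescopes"): 0.1,
           (4, "with"): 1.0, (5, "saw"): 1.0}
print(inside("astronomers saw stars with ears".split(), lexical, binary, 6))
# -> 0.0015876 = 0.0009072 + 0.0006804 (the two parses)
```

Replacing the sum over split points by a max (plus backpointers) yields the most probable parse, as used on the next slide.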


Inference and learning

Inference: the same recursion as the probability of a sentence, with max instead of Σ
Complexity: O(m³n³), with n the length of the sentence and m the number of non-terminals in the grammar
Learning: Inside-Outside; each step is O(m³n³)


Tree generative models

Classification / clustering of structured documents (Denoyer et al. 2004)
Document annotation / conversion (Wisniewski et al. 2006)


Context: XML semi-structured documents

[Figure: an XML tree with tags such as <article>, <hdr>, <bdy>, <sec>, <st>, <p>, <fig>, <fgc>, and text at the leaves.]


Document model

A document is a pair $d = (s_d, t_d)$, with structure $s_d$ and content $t_d$:
$P(D = d) = P(S = s_d, T = t_d) = P(S = s_d)\, P(T = t_d \mid S = s_d)$
$P(S = s_d)$: structural probability
$P(T = t_d \mid S = s_d)$: content probability
! Scalability !


Document model: structure

Belief networks over the label tree, with increasingly rich dependencies:
  Independent labels: $P(s_d) = \prod_{i=1}^{|d|} P(s_d^i)$
  Parent dependency: $P(s_d) = \prod_{i=1}^{|d|} P(s_d^i \mid label(parent(n_d^i)))$
  Parent and preceding sibling: $P(s_d) = \prod_{i=1}^{|d|} P(s_d^i \mid label(parent(n_d^i)), label(preceding(n_d^i)))$
[Figure: an example tree (Document → Intro, Section, Section, with Paragraph children) shown with the three corresponding network structures, for a sample document with a title, a two-paragraph first section, and a second section without paragraphs.]


Document model: content

Content decomposes over the nodes: $t_d = (t_d^1, \ldots, t_d^{|d|})$
First-order dependency:
$P(t_d \mid s_d, \theta) = \prod_{i=1}^{|d|} P(t_d^i \mid s_d, \theta)$, with $P(t_d^i \mid s_d, \theta) = P(t_d^i \mid s_d^i, \theta)$
A local generative model is used for each label.


Final network

Document
├─ Intro: T1 = "This document is an example of a tree-structured document"
├─ Section: T2 = "This is the first section of the document"
│  ├─ Paragraph: T3 = "The first paragraph"
│  └─ Paragraph: T4 = "The second paragraph"
└─ Section: T5 = "The second section"
   └─ Paragraph: T6 = "The third paragraph"

$P(d) = P(Intro \mid Document)\, P(Section \mid Document)^2\, P(Paragraph \mid Section)^3$
$\quad \times P(T1 \mid Intro)\, P(T2 \mid Section)\, P(T3 \mid Paragraph)\, P(T4 \mid Paragraph)\, P(T5 \mid Section)\, P(T6 \mid Paragraph)$


Different learning techniques

Likelihood maximization:
$L = \sum_{d \in D_{TRAIN}} \log P(d \mid \theta) = \sum_{d \in D_{TRAIN}} \log P(s_d \mid \theta) + \sum_{d \in D_{TRAIN}} \sum_{i=1}^{|d|} \log P(t_d^i \mid s_d^i, \theta) = L_{structure} + L_{content}$
Discriminant learning: a logistic function over the generative scores, trained by error minimization
Fisher kernel


Document mapping problem

Problem: learn from examples how to map heterogeneous sources onto a predefined target schema, while preserving the document semantics
Sources: semi-structured, HTML, PDF, flat text, etc.
Labeled tree mapping problem; different instances:
  flat text to XML
  HTML to XML
  XML to XML
  ...

<Restaurant><Nom>La cantine</Nom><Adresse>65 rue des pyrénées, Paris, 19ème, FRANCE</Adresse><Spécialités>Canard à l'orange, Lapin au miel</Spécialités></Restaurant>

<Restaurant><Nom>La cantine</Nom><Adresse><Ville>Paris</Ville><Arrd>19</Arrd><Rue>pyrénées</Rue><Num>65</Num></Adresse><Plat>Canard à l'orange</Plat><Plat>Lapin au miel</Plat></Restaurant>


Document mapping problem

Central issue: complexity
  Large collections
  Large feature space: 10³ to 10⁶
  Large (exponential) search space
Approach:
  Learn generative models of the XML target documents from a training set
  Decode unknown sources according to the learned model


Problem formulation

Given:
  $S_T$, a target format
  $d_{in}(d)$, an input document
Find the most probable target document:
$\hat d_{S_T} = \arg\max_{d' \in S_T} P(d' \mid d_{in}(d))$
Decoding is performed with a learned transformation model.


General restructuration model

With $d = (s_d, t_d)$ the input and $d' = (s_{d'}, t_{d'})$ a candidate target:
$\hat d = \arg\max_{d'} P(s_{d'} \mid s_d)\, P(t_{d'} \mid t_d, s_{d'}, s_d)$


Example: HTML to XML (tree annotation)

Hypotheses
Input document: HTML tags are mostly for visualization; remove the tags and keep only the segmentation (the leaves)
Annotation: the leaves are the same in the HTML and XML documents
Target document model: a node label depends only on its local context (content, left sibling, father)


Model and training

Probability of the target tree:
$P(d_T \mid d_{in}(d_S)) = \prod_i P(n_i \mid c_i, sib(n_i), father(n_i))$
with $n_i$ a node label, $c_i$ its content, $sib(n_i)$ its left sibling and $father(n_i)$ its father.
Solve:
$\hat d_{S_T} = \arg\max_{d' \in S_T} P(d' \mid d_{in}(d))$
Exact dynamic programming decoding: O(|leaf nodes|³ · |tags|)
Approximate solution with LaSO (Daumé, ICML 2005): O(|leaf nodes| · |tags| · |tree nodes|)
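A greedy annotation sketch (not the exact DP or LaSO decoder; the local model is a placeholder) showing how each leaf label is scored from its local context, following the factorization above:

```python
from math import log

TAGS = ["TITLE", "AUTHOR", "SECTION", "TEXT"]

def p_label(tag, content, left_sibling, father):
    # Stand-in for the learned local model P(n_i | c_i, sib(n_i), father(n_i)).
    return 1.0 / len(TAGS)

def greedy_annotate(leaves, father="DOCUMENT"):
    """leaves: list of text segments; returns one tag per leaf and a log-score."""
    labels, prev, score = [], None, 0.0
    for content in leaves:
        best = max(TAGS, key=lambda t: p_label(t, content, prev, father))
        score += log(p_label(best, content, prev, father))
        labels.append(best)
        prev = best
    return labels, score

print(greedy_annotate(["Example", "Francis MAES", "Welcome to INEX"]))
```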


Experiments: HTML to XML

IEEE collection / INEX corpus: 12K documents; average 500 leaf nodes, 200 internal nodes, 139 tags
Movie DB: 10K movie descriptions (IMDB); average 100 leaf nodes, 35 internal nodes, 28 tags
Shakespeare: 39 plays; few documents, but on average 4100 leaf nodes, 850 internal nodes, 21 tags
Mini-Shakespeare: 60 randomly chosen scenes from the plays; 85 leaf nodes, 20 internal nodes, 7 tags
For all collections: ½ train, ½ test


Performance

[Performance figures omitted.]

Summary

30 years of generative models: hierarchical HMMs, factorial HMMs, etc.
Local dependency hypotheses, on the outputs and on the inputs
Inference and learning often use dynamic programming; prohibitive for some/many problems. Other methods: loopy propagation, search (e.g. A*), ...
Cost function: the joint likelihood, which decomposes


Discriminant models

Structured Perceptron (Collins 2002)
Large margin methods (Tsochantaridis et al. 2004, Taskar 2004)


Usual hypotheses

Joint representation of input and output: Φ(x, y)
Encodes potential dependencies among and between input and output, e.g. a histogram of the state transitions observed in the training set, frequencies of (x_i, y_j) pairs, POS tags, etc.
Large feature sets (10² to 10⁴)
Linear score function: $F(x, y, \theta) = \langle \theta, \Phi(x, y) \rangle$
Decomposability of the feature set (over the outputs) and of the loss function


Structured Perceptron (Collins 2002)

A discriminant model based on a Perceptron variant for sequence labeling
Initially proposed for POS tagging and chunking; possible extension to other structured output tasks
Inference: Viterbi
Encodes (local) input and output dependencies
Simplicity


Algorithm

Training algorithm:
Initialize: θ = 0
Repeat n times over all training examples (x, y):
  $\hat y = \arg\max_{y' \in Y} \langle \theta, \Phi(x, y') \rangle$
  If $\hat y \neq y$, update the parameters: $\theta \leftarrow \theta + \Phi(x, y) - \Phi(x, \hat y)$
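A compact structured-perceptron sketch for sequence labeling. The joint features Φ(x, y) are emission and transition counts (an illustrative choice), and inference is greedy rather than Viterbi to keep the sketch short:

```python
import numpy as np

class StructuredPerceptron:
    def __init__(self, n_words, n_labels):
        self.emit = np.zeros((n_words, n_labels))    # theta, emission part
        self.trans = np.zeros((n_labels, n_labels))  # theta, transition part

    def predict(self, x):
        y, prev = [], None
        for w in x:
            scores = self.emit[w] + (self.trans[prev] if prev is not None else 0)
            prev = int(scores.argmax())
            y.append(prev)
        return y

    def fit(self, data, epochs=5):
        for _ in range(epochs):
            for x, y in data:
                y_hat = self.predict(x)
                if y_hat != y:               # theta += Phi(x, y) - Phi(x, y_hat)
                    for i, w in enumerate(x):
                        self.emit[w, y[i]] += 1
                        self.emit[w, y_hat[i]] -= 1
                        if i > 0:
                            self.trans[y[i - 1], y[i]] += 1
                            self.trans[y_hat[i - 1], y_hat[i]] -= 1

# Toy usage: 3 word ids, 2 labels.
model = StructuredPerceptron(3, 2)
model.fit([([0, 1, 2], [0, 1, 1]), ([2, 1, 0], [1, 1, 0])])
print(model.predict([0, 1, 2]))
```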


Inference: DP. Restricted to a 0/1 cost.
Also: convergence and generalization bounds (Freund & Schapire, 1999); the number of mistakes depends only on the margin, not on the size of the output space (the number of potential candidates).


Extension of large margin methods

Two problems:
Generalize the max-margin principle to loss functions other than the 0/1 loss
The number of constraints is proportional to |Y|, i.e. potentially exponential


SVM-ISO (Tsochantaridis et al. 2004)

Extension of multi-class SVMs. Principle (example: 0/1 loss, linearly separable classification problem):
We get zero error iff
$\forall i = 1..N: \max_{y \in Y \setminus y_i} \langle w, \Phi(x_i, y) \rangle < \langle w, \Phi(x_i, y_i) \rangle$
i.e. the problem amounts to solving N non-linear inequalities.
Equivalent problem: solve N·(card(Y) − 1) linear inequalities:
$\forall i = 1..N,\ \forall y \in Y \setminus y_i: \langle w, \Phi(x_i, y_i) \rangle - \langle w, \Phi(x_i, y) \rangle > 0$


SVM formulation, non-linearly separable case, 0/1 cost (one slack variable per non-linear constraint, as in Crammer & Singer 2001):

QP: $\min_{w, \xi} \frac{1}{2} \|w\|^2 + \frac{C}{N} \sum_{i=1}^{N} \xi_i$, with $\xi_i \geq 0$
with constraints
$\forall i,\ \forall y \in Y \setminus y_i: \langle w, \Phi(x_i, y_i) \rangle - \langle w, \Phi(x_i, y) \rangle \geq 1 - \xi_i$


Pb 1: extension to a Δ loss

For each constraint, penalize the example according to its loss: rescale the slack variables by the loss incurred in each linear constraint.
New constraints:
$\forall i,\ \forall y \in Y \setminus y_i: \langle w, \Phi(x_i, y_i) \rangle - \langle w, \Phi(x_i, y) \rangle \geq 1 - \frac{\xi_i}{\Delta(y_i, y)}$


Learning

Pb 2: limit the number of constraints
Done implicitly via the training algorithm:
The algorithm finds a polynomial number of "active" constraints such that the solution of the QP problem with these constraints alone fulfills all constraints with a given precision ε
The algorithm requires solving an argmax problem at each iteration: Viterbi for sequences, CKY for parsing


M3N (Taskar et al. 2003)

Combines a probabilistic model (Markov network) with a max-margin formulation
Solution to problem 1: margin rescaling
Solution to problem 2: the structure of the Markov network limits the number of constraints (e.g. a chain network for sequences)


Summary: discriminant approaches

Hypotheses: local dependencies for the output; decomposability of the loss function; long-term dependencies allowed in the input
Nice convergence properties, plus bounds
Complexity: learning often does not scale


Incremental learning: learning to search solution spaces

Incremental parsing (Collins 2004)
SEARN (Daumé et al. 2006)
Reinforcement learning (Maes et al. 2007)


General ideas

Incremental construction of the output ŷ through actions
Decisions: choose a subset of actions
Learn how to explore the state space of the problem


Incremental parsing (Collins, Roark, 2004)

Build the parse tree of a sentence incrementally
Inference: a greedy algorithm
  Read the sentence from left to right; step i corresponds to the i-th word
  At step i, candidate partial parse trees are generated for the first i words of the sentence, then scored and ranked
  A subset of the candidates is selected; the selected candidates are used to generate the next set of candidates
  The final output is the best-scored parse tree at the end of the sentence


More

Ranking function: F = ⟨θ, Φ(x, y)⟩
F is learned from a training set with an adaptive algorithm (Perceptron)
Input at step i: a joint vectorial representation of the input sentence and the partial parse tree
No need for dynamic programming


How to build sequences of partial trees, from Y(i) to Y(i+1)

At step i:
  Consider word x_i
  Action: try to attach each chain ending with x_i to any attachment site
The grammar G includes some constraints:
  Allowable chains: only derivation chains appearing in the data set are allowed
  Attachment sites: places in the partial tree where a chain can be attached (also inferred from the data)


[Figure: incremental parse states for "astronomers saw stars": starting from ε, an NP is built for "astronomers", then a V/VP for "saw", then the parse is extended with "stars".]


SEARN (Daumé et al. 2006)

Introduced the idea of learning how to explore a search space
Hypothesis: the structured output can be built incrementally, ŷ = (ŷ1, ŷ2, ..., ŷT), via machine learning
Loss: the expected cost $C = E_X[\Delta(y, \hat y)]$
Goal: construct ŷ incrementally so as to minimize the loss; learn how to search the solution space


Example: sequence labeling

Two labels, R and B
Search space: (input sequence, {sequences of labels})
For a size-3 sequence x = x1 x2 x3, a node represents a state in the search space.
[Figure: binary tree of the partial labelings of x.]


Example: expected loss

[Figure: the search tree for a size-3 sequence, with a local cost C ∈ {0, 1} on each decision and the total cost CT ∈ {0, 1, 2, 3} of each complete labeling with respect to the target.]
The loss does not always separate!


Example: state-space exploration guided by local costs

[Figure: the same search tree for a size-3 sequence; exploration follows the decisions with the lowest local costs.]
Goal: generalize to unseen situations


Inference

Suppose we have a policy function F which decides at each step which action to take.
Inference is performed by computing
ŷ1 = F(x), ŷ2 = F(x, ŷ1), ..., ŷT = F(x, ŷ1, ..., ŷT−1); ŷ = (ŷ1, ..., ŷT)
No dynamic programming needed
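A minimal sketch of this policy-based inference: the output is built by repeatedly applying a policy F (any classifier) to the growing partial output.

```python
def greedy_inference(x, policy, length):
    """policy(x, partial_output) -> next label; returns the complete output."""
    y_hat = []
    for _ in range(length):
        y_hat.append(policy(x, tuple(y_hat)))
    return y_hat

# Toy policy (illustrative): alternate labels regardless of x.
print(greedy_inference("abc", lambda x, partial: len(partial) % 2, 3))  # [0, 1, 0]
```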


Training

F is implemented with a classifier, F: {states} → {actions}
Training: learn a classifier F such that at each step F takes the "optimal" decision
Incremental algorithm:
  The first classifier F1 learns from the optimal path, supposed to be known at training time; by itself this is a bad solution for generalization
  At step i, classifier Fi learns from the decisions of Fi−1


Let Fi be the current classifier. For input x, at each state st there are two possible actions a1 and a2:
  Compute the expected costs CF(st, a1) and CF(st, a2) associated to the actions
  The best action is labeled 0, the other 1: these are the targets for the classifier
When the final state is reached, we get a set of training examples for Fi+1.
[Figure: a chain of states s0 → s1 → s2 → s3, with the costs CF(st, a1) and CF(st, a2) of both actions at each state.]


Training algorithm

F is initialized with a good initial policy
For each example x:
  Compute its prediction ŷ = (ŷ1, ŷ2, ..., ŷT(x)) using the current F; let (s1, s2, ..., sT(x)) be the corresponding sequence of states
  For each state st, compute the state representation Φ(st, x), and for each possible action a compute its expected loss cF(st, a)
  This yields a set of training examples for the policy classifier: for each x and each st, {(a1, cF(st, a1)), ..., (a|A|, cF(st, a|A|))}
Train a classifier F' to predict the best action; update the current classifier F with F'; iterate
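A schematic, runnable SEARN-style loop for sequence labeling under Hamming loss. It is deliberately simplified: the cost of action a at step t is approximated by 1[a ≠ y_t], the "classifier" is a majority table over (word, previous label) features, and rollouts mix the learned and optimal policies in the spirit of SEARN's interpolation.

```python
import random

LABELS = [0, 1]

def searn_train(examples, iterations=5, beta=0.5):
    policy_table = {}                        # feature -> chosen label

    def policy(word, prev):                  # current learned policy, with fallback
        return policy_table.get((word, prev), 0)

    for _ in range(iterations):
        counts = {}
        for x, y in examples:
            prev = None
            for t, word in enumerate(x):     # visit states along a rollout
                for a in LABELS:             # cost-sensitive targets: Hamming cost
                    cost = int(a != y[t])
                    counts.setdefault((word, prev), [0] * len(LABELS))[a] += 1 - cost
                # next state: follow the learned policy w.p. beta, else the optimal one
                prev = policy(word, prev) if random.random() < beta else y[t]
        # train the next classifier: pick the lowest-cost action per feature
        policy_table = {f: max(LABELS, key=lambda a: c[a]) for f, c in counts.items()}
    return policy

pol = searn_train([([0, 1, 2], [0, 1, 1]), ([2, 1, 0], [1, 1, 0])])
print([pol(w, None) for w in (0, 1, 2)])
```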


SEARN hypotheses

Decomposability of y: y can be built from successive parts
Decomposability of the training loss
An optimal policy is available at training time
Remarks:
  No Markov assumption, and no need for DP: ŷ is built by successive applications of the policy; fast
  Can accommodate a large number of cost functions


Reinforcement-learning search (Maes 2007)

Formalizes the search-based ideas as a Markov Decision Process and a reinforcement learning problem
Provides a general framework for this approach; many RL algorithms can be used for training


Reinforcement learning

An agent A is in an environment. At time t, A is in state st and takes action at; it receives a reward rt from the environment and moves to state st+1.
The environment is often stochastic; it is modeled as a finite-state Markov Decision Process (MDP).
Goal of A: maximize some long-term reward. There is no notion of a correct input-output pair.
A is "myopic" and must explore the environment in order to estimate its reward.
Typical situations: robotics, two-player games, planning, etc.


Markov Decision Process

An MDP is a tuple (S, A, P, R):
  S is the state space
  A is the action space
  P is a transition function describing the dynamics of the environment: P(s, a, s') = P(st+1 = s' | st = s, at = a)
  R is a reward function: R(s, a, s') = E[rt | st+1 = s', st = s, at = a]


Policy

Distribution: π(s, a) = P(at = a | st = s)
Immediate reward: when action a is chosen in state st, the agent receives an immediate reward rt
Cumulative reward from t: $R_t = \sum_k \gamma^k r_{t+k}$
Goal: find the policy that maximizes R0
If P and R are known, the problem is usually solved using DP; when only S and A are known: reinforcement learning


Direct approach: for each possible policy, sample the reward r and choose the policy with the highest reward
Value function approach: Bellman equation,
$E[R \mid s_t] = r_t + \gamma E[R \mid s_{t+1}]$
Use estimates of E[R | st] (the value function), and learn a policy that maximizes them


Reinforcement learning: value functions

They measure "how good it is" to be in a state s, or to choose action a when in state s:
$V^\pi(s) = E_\pi[R_t \mid s_t = s]$
$Q^\pi(s, a) = E_\pi[R_t \mid s_t = s, a_t = a]$
A policy π is better than π' if $V^\pi(s) \geq V^{\pi'}(s)$ for all s
In order to improve π, learn to improve V or Q


Most RL algorithms use the following scheme. Iterate:
  Evaluate the utility function V or Q for the current policy
  Improve the policy by increasing V or Q
Remark: V and Q are often stored in tables, which is unfeasible for large problems. Use approximate values instead (regression):
$\hat Q(s, a) = \langle \theta, \Phi(s, a) \rangle$, with Φ(s, a) a vectorial description of the state-action pair


Prototype RL algorithm

Initialize θ
Repeat:
  Choose an initial state s and an action a (stochastic)
  While the final state is not reached:
    Take action a, observe the reward r and the next state s'
    Learn θ to improve Q from this feedback
    s ← s', a ← a'
Until convergence
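A sketch of SARSA with a linear Q-function Q(s, a) = ⟨θ, Φ(s, a)⟩, matching the prototype above; the environment interface (reset, step, actions, is_final) and the toy chain world are assumptions for illustration, not from the slides.

```python
import random
import numpy as np

def sarsa(env, phi, n_features, episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
    theta = np.zeros(n_features)
    q = lambda s, a: float(theta @ phi(s, a))

    def choose(s):                                   # epsilon-greedy action choice
        acts = env.actions(s)
        return random.choice(acts) if random.random() < eps \
            else max(acts, key=lambda a: q(s, a))

    for _ in range(episodes):
        s = env.reset()
        a = choose(s)
        while not env.is_final(s):
            s2, r = env.step(s, a)
            a2 = choose(s2) if not env.is_final(s2) else None
            target = r + (gamma * q(s2, a2) if a2 is not None else 0.0)
            theta = theta + alpha * (target - q(s, a)) * phi(s, a)  # TD update
            s, a = s2, a2
    return theta

class ChainEnv:
    """Toy 5-state chain: move right to reach the rewarded final state."""
    def reset(self): return 0
    def actions(self, s): return [0, 1]              # 0: left, 1: right
    def is_final(self, s): return s >= 4
    def step(self, s, a):
        s2 = max(0, s + (1 if a == 1 else -1))
        return s2, (1.0 if s2 >= 4 else 0.0)

phi = lambda s, a: np.eye(10)[s * 2 + a]             # one-hot (state, action)
print(sarsa(ChainEnv(), phi, 10).round(2))
```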


Structured outputs and MDPs

State st: the input x plus the partial output ŷt; initial state: (x, ∅)
Actions: task dependent
  POS: a new tag for the current word
  XML: insert a new path in the partial tree
Reward
  Final: $R(y, \hat y)$
  Heuristic: $r(y, \hat y_t)$


Inference: apply the learned policy to x
Learning: SARSA here (other RL algorithms would do)


Example: sequence labeling

Left-to-right model; actions: label
Order-free model; actions: label + position
Loss: Hamming cost or F-score
Tasks:
  Named entity recognition (CoNLL 2002 shared task: 8,000 train, 1,500 test)
  Chunking, noun phrases (CoNLL 2002)
  Handwritten word recognition (5,000 train, 1,000 test)
Complexity of inference: O(sequence length × number of labels)


Dependency parsing


Action: (target word, label)
Cost function: "labeled attachment" score (number of correct target words + labels)
CoNLL 2007: 10 languages


XML structuration

Action: attach the path ending with the current leaf to a position in the current partial tree
Φ(·, ·) encodes a series of potential (state, action) pairs
Loss: F-score for trees


Example: HTML input and XML target

INPUT DOCUMENT (HTML):
HTML
├─ HEAD
│  └─ TITLE: "Example"
└─ BODY
   ├─ IT: "Francis MAES"
   ├─ H1: "Title of the section"
   ├─ P: "Welcome to INEX"
   └─ FONT: "This is a footnote"

TARGET DOCUMENT (XML):
DOCUMENT
├─ TITLE: "Example"
├─ AUTHOR: "Francis MAES"
└─ SECTION
   ├─ TITLE: "Title of the section"
   └─ TEXT: "Welcome to INEX"

[Animation over several slides: starting from an empty target, the policy attaches one path per leaf of the input: TITLE "Example", then AUTHOR "Francis MAES", then SECTION/TITLE "Title of the section", then SECTION/TEXT "Welcome to INEX", and finally the footnote, yielding the target document above.]


Results

[Results figures omitted.]


Summary on search methods

Learn to explore the state space of the problem
An alternative to DP or classical search algorithms
Can be used with any decomposable cost function


Conclusion

Other approaches:
  Y. LeCun (2006): energy-based models
  J. Weston (2007): regression
  Cohen (2006): stacking
  ...