
Learning structured outputs: A reinforcement learning approach

P. Gallinari

F. Maes, L. [email protected]

www-connex.lip6.fr

University Pierre et Marie Curie – Paris – France

Computer Science Lab.


Outline

- Motivation and examples
- Approaches for structured learning
  - Generative models
  - Discriminant models
- Reinforcement learning for structured prediction
- Experiments


Machine learning and structured data

Different types of problems:
- Model, classify, and cluster structured data
- Predict structured outputs
- Learn to associate structured representations
Structured data and applications appear in many domains: chemistry, biology, natural language, the web, social networks, databases, etc.


Sequence labeling: POS

This/DT Workshop/NN brings/VBZ together/RB scientists/NNS and/CC engineers/NNS
interested/VBN in/IN recent/JJ developments/NNS in/IN exploiting/VBG Massive/JJ data/NP sets/NP

Tag key: DT = determiner, NN = noun, VBZ = verb (3rd person), RB = adverb, NNS = noun (plural), CC = coordinating conjunction, VBN = verb (past participle), IN = preposition, JJ = adjective, VBG = verb (gerund), NP = proper noun.


Segmentation + labeling: syntactic chunking (Washington Univ. tagger)

[This Workshop]NP [brings]VP [together]ADVP [scientists and engineers]NP [interested]VP [in]IN [recent developments]NP [in [exploiting Massive data sets]NP]PNP

Key: NP = noun phrase, VP = verb phrase, ADVP = adverbial phrase, PNP = prepositional phrase.


Syntactic parsing (Stanford Parser)


Tree mapping: Problem

- Querying heterogeneous XML collections; web information extraction.
- One needs to know the correspondence between the structured representations, which is usually built by hand.
- Learn the correspondence between the different sources (the labeled tree mapping problem).

<Restaurant>
  <Name>La cantine</Name>
  <Address>65 rue des pyrénées, Paris, 19ème, FRANCE</Address>
  <Specialities>Canard à l'orange, Lapin au miel</Specialities>
</Restaurant>

<Restaurant>
  <Name>La cantine</Name>
  <Address>
    <City>Paris</City>
    <Street>pyrénées</Street>
    <Num>65</Num>
  </Address>
  <Dishes>Canard à l'orange</Dishes>
  <Dishes>Lapin au miel</Dishes>
</Restaurant>


Others: taxonomies, social networks, adversarial computing (web spam, blog spam, …), translation, biology, …


Is structure really useful? Can we make use of structure?
Yes:
- Evidence from many domains and applications
- Mandatory for many problems, e.g. a 10K-class classification problem
Yes, but:
- Complex or long-term dependencies often correspond to rare events
- Practical evidence for large-size problems: simple models sometimes offer competitive results (information retrieval, speech recognition, etc.)


Why is structured prediction difficult?
Size of the output space:
- # of potential labelings for a sequence
- # of potential joint labelings and segmentations
- # of potential labeled trees for an input sequence
- …
The output space is exponential:
- Inference often amounts to a combinatorial search problem
- Scaling issues in most models


Structured learning
X, Y: input and output spaces.
A structured output y ∈ Y decomposes into parts of variable size: y = (y_1, y_2, …, y_T).
Dependencies: relations between the parts of y; local, long-term, or global.
Cost functions:
- 0/1 loss: $\Delta(y^*, \hat{y}) = 1_{y^* \neq \hat{y}}$
- Hamming loss: $\Delta(y^*, \hat{y}) = \sum_{i=1}^{T} 1_{y_i^* \neq \hat{y}_i}$
- F-score, BLEU, etc.
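As a concrete reading of these cost functions, here is a minimal Python sketch of the 0/1 and Hamming losses (the function names and the example tags are illustrative, not from the talk):

```python
def zero_one_loss(y_true, y_pred):
    """0/1 loss: 1 if the structured output is wrong anywhere, else 0."""
    return 0 if y_true == y_pred else 1

def hamming_loss(y_true, y_pred):
    """Hamming loss: number of parts y_i that differ between the outputs."""
    assert len(y_true) == len(y_pred)
    return sum(1 for a, b in zip(y_true, y_pred) if a != b)

# Two wrong tags out of three: the 0/1 loss saturates, Hamming counts parts.
print(zero_one_loss(["DT", "NN", "VBZ"], ["DT", "VB", "VB"]))  # 1
print(hamming_loss(["DT", "NN", "VBZ"], ["DT", "VB", "VB"]))   # 2
```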


General approach
Predictive approach:
$y^* = f(x) = \arg\max_{y \in Y} F(x, y, \theta)$
where F : X × Y → R is a score function used to rank potential outputs, trained to optimize some loss function.
Inference problem:
- |Y| is sometimes exponential, so the argmax is often intractable.
- Hypotheses to make it tractable: decomposability of the score function over the parts of y,
$y^* = f(x) = \arg\max_{y \in Y} \sum_i F_i(x, y_i, \theta),$
or a restricted set of outputs.
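To see why the argmax is the bottleneck, here is a brute-force sketch (toy score function, illustrative names) that enumerates the whole output space for sequence labeling; the enumeration has |labels|^T elements, which is exactly the exponential blow-up mentioned above:

```python
from itertools import product

def brute_force_inference(x, labels, score):
    """argmax over Y by enumerating all len(labels)**len(x) labelings.

    Only feasible for toy sizes; structured predictors avoid this by
    assuming the score decomposes over the parts of y."""
    return max(product(labels, repeat=len(x)), key=lambda y: score(x, y))

# Toy score: +1 whenever a label matches its word's first letter.
score = lambda x, y: sum(w[0] == l for w, l in zip(x, y))
print(brute_force_inference(["big", "dog"], ["b", "d"], score))  # ('b', 'd')
```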


Structured algorithms differ by:
- feature encoding
- hypotheses on the output structure
- hypotheses on the cost function
- the type of structured prediction problem


Three main families of SP models

- Generative models
- Discriminant models
- Search models


Generative models
30 years of generative models:
- Hidden Markov Models, with many extensions (hierarchical HMMs, factorial HMMs, etc.)
- Probabilistic context-free grammars
- Tree models, graph models


Usual hypothesis
- Features: a "natural" encoding of the input.
- Local dependency hypothesis, on the outputs (Markov) and on the inputs.
- Cost function: usually the joint likelihood; it decomposes, e.g. as a sum of local costs on each subpart.
Inference and learning usually use dynamic programming:
- HMMs: decoding complexity O(n|Q|²) for a sequence of length n and |Q| states.
- PCFGs: decoding complexity O(m³n³), where n is the length of the sentence and m the number of non-terminals in the grammar; prohibitive for many problems.


Summary: generative SP models
- Well-known approaches, with a large number of existing models
- Local hypotheses
- Often imply using dynamic programming for inference and learning


Discriminant models
- Structured Perceptron (Collins 2002): a breakthrough
- Large margin methods (Tsochantaridis et al. 2004, Taskar 2004)
- Many others now
- Can be considered as an extension of multi-class classification


Usual hypothesis
- Joint representation of input and output, Φ(x, y), encoding potential dependencies among and between input and output: e.g. the histogram of state transitions observed in the training set, frequencies of (x_i, y_j), POS tags, etc.
- Large feature sets (10² to 10⁴).
- Linear score function: $F(x, y, \theta) = \langle \theta, \Phi(x, y) \rangle$.
- Decomposability of the feature set (over the outputs) and of the loss function.
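A minimal sketch of such a linear score, assuming a toy joint feature map Φ(x, y) that counts (word, label) emissions and label transitions (the feature templates are illustrative, not the talk's exact ones):

```python
from collections import Counter

def phi(x, y):
    """Toy joint feature map Φ(x, y): sparse counts of emission and transition events."""
    feats = Counter()
    for word, label in zip(x, y):
        feats[("emit", word, label)] += 1      # (x_i, y_i) co-occurrence
    for prev, nxt in zip(y, y[1:]):
        feats[("trans", prev, nxt)] += 1       # label-transition histogram
    return feats

def score(theta, x, y):
    """Linear score F(x, y, θ) = <θ, Φ(x, y)> over sparse features."""
    return sum(theta.get(f, 0.0) * v for f, v in phi(x, y).items())

theta = {("emit", "dog", "NN"): 1.0, ("trans", "DT", "NN"): 0.5}
print(score(theta, ["the", "dog"], ["DT", "NN"]))  # 1.5
```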


Extension of large margin methods
Two problems:
- The classical 0/1 loss is meaningless with structured outputs: structured output prediction is not a simple extension of multiclass classification, so the max margin principle must be generalized to loss functions other than the 0/1 loss.
- The number of constraints of the QP problem is proportional to |Y|, i.e. potentially exponential.
Different solutions:
- Consider only a polynomial number of "active" constraints, with bounds on the optimality of the solution.
- Learning then requires solving an argmax problem at each iteration, i.e. dynamic programming.


Summary: discriminant approaches
Hypotheses:
- Local dependencies for the output
- Decomposability of the loss function
- Long-term dependencies in the input
Nice convergence properties, plus bounds.
Complexity: learning often does not scale.


Reinforcement learning for structured outputs


Precursors

- Incremental parsing (Collins 2004)
- SEARN (Daumé et al. 2006)
- Reinforcement learning (Maes et al. 2007)


General idea
The structured output is built incrementally: ŷ = (ŷ_1, ŷ_2, …, ŷ_T).
This is different from solving the prediction problem
$y^* = f(x) = \arg\max_{y \in Y} F(x, y, \theta)$
directly: the output is built via machine learning.
Loss: $C = E_X[\Delta(y, \hat{y})]$
Goal:
- Construct ŷ incrementally so as to minimize the loss.
- Training: learn how to search the solution space.
- Inference: build ŷ as a sequence of successive decisions.


Example: sequence labeling
- Two labels, R and B.
- Search space: (input sequence, {sequences of labels}).
- For a size-3 sequence x = x1 x2 x3, a node represents a state in the search space.


Example: actions and decisions
[Figure: the size-3 search tree for a given target sequence. Each edge out of a state s is an action a, selected by the decision function Π(s, a).]


Example: expected loss
[Figure: the same search tree annotated with a local cost C ∈ {0, 1} on each action and a total cost CT ∈ {0, …, 3} on each complete labeling of the size-3 target sequence.]
The loss does not always separate!


Inference
Suppose we have a policy function F which decides at each step which action to take. Inference is then performed by computing
$\hat{y}_1 = F(x, \cdot),\ \hat{y}_2 = F(x, \hat{y}_1),\ \ldots,\ \hat{y}_T = F(x, \hat{y}_1, \ldots, \hat{y}_{T-1})$
and returning $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_T)$.
No dynamic programming is needed.
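A minimal sketch of this greedy inference loop (the policy score, action set, and toy example are assumptions for illustration):

```python
def greedy_inference(x, actions, policy_score):
    """Build ŷ one decision at a time: at each step, take the single
    best-scoring action given the input and the partial output."""
    y_partial = []
    for t in range(len(x)):
        best = max(actions, key=lambda a: policy_score(x, y_partial, a))
        y_partial.append(best)  # no backtracking, no dynamic programming
    return y_partial

# Toy policy: prefer the label equal to the previous one, else "B".
policy_score = lambda x, y, a: (y and a == y[-1]) or a == "B"
print(greedy_inference(["x1", "x2", "x3"], ["R", "B"], policy_score))  # ['B', 'B', 'B']
```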


Example: state space exploration guided by local costs
[Figure: the cost-annotated search tree again; exploration follows the low local costs C toward the leaf with total cost CT = 0.]
Goal: generalize to unseen situations.


Training
Learn a policy F, from a set of (input, output) pairs, which:
- takes the optimal action at each state,
- is able to generalize to unknown situations.


Reinforcement learning for SP
Formalizes the search-based ideas, using Markov Decision Processes as the model and reinforcement learning for training. This provides a general framework for the approach, and many RL algorithms can be used for training.
Differences with the classical use of RL:
- specific hypotheses: a deterministic environment,
- problem size: states with up to 10⁶ variables,
- generalization properties.


Markov Decision Process
An MDP is a tuple (S, A, P, R):
- S is the state space,
- A is the action space,
- P is a transition function describing the dynamics of the environment: $P(s, a, s') = P(s_{t+1} = s' \mid s_t = s, a_t = a)$,
- R is a reward function: $R(s, a, s') = E[r_t \mid s_{t+1} = s', s_t = s, a_t = a]$.
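For the deterministic, episodic case used here, the tuple can be written down directly; a minimal sketch (the names and the sequence-labeling instantiation are illustrative) where the transition function returns the single successor state:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[str, ...]   # here: the partial output built so far
Action = str              # here: the label to append next

@dataclass
class DeterministicMDP:
    """(S, A, P, R) with P collapsed to a single successor per (s, a)."""
    actions: List[Action]
    step: Callable[[State, Action], State]            # s' = P(s, a)
    reward: Callable[[State, Action, State], float]   # R(s, a, s')

mdp = DeterministicMDP(
    actions=["R", "B"],
    step=lambda s, a: s + (a,),               # append the chosen label
    reward=lambda s, a, s2: 0.0,              # to be derived from training pairs
)
print(mdp.step((), "R"))  # ('R',)
```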


Example: Prediction problem


Example: state space
Deterministic MDP:
- States: the input plus the partial output
- Actions: elementary modifications of the partial output
- Rewards: quality of the prediction


Example: partially observable reward

The reward function is only partially observable: we only have a limited set of training (input, output) pairs.


Learning the MDP
The reward function cannot be computed on the whole MDP, hence approximated RL:
- learn the policy on a subspace,
- generalize to the whole space.
The policy is learned as a linear function
$\pi(s) = \arg\max_{a \in A} \langle \theta, \Phi(s, a) \rangle$
where Φ(s, a) is a joint description of (s, a) and θ is a parameter vector.
Learning algorithms: Sarsa, Q-learning, etc.
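A self-contained sketch of approximate Sarsa with this linear policy on the toy two-label task; the feature templates, the per-step reward derived from a single target sequence, and the hyperparameters are all assumptions, not the talk's exact setup:

```python
import random
from collections import defaultdict

ACTIONS = ["R", "B"]
TARGET = ("R", "B", "R")  # toy supervision: +1 per correctly placed label

def step(s, a):
    """Deterministic transition: append label a to the partial output s."""
    return s + (a,)

def reward(s, a, s2):
    """Heuristic per-step reward read off the training pair (assumption)."""
    return 1.0 if s2[-1] == TARGET[len(s2) - 1] else 0.0

def features(s, a):
    """Toy Φ(s, a): indicators on (position, action) and (previous label, action)."""
    prev = s[-1] if s else "<start>"
    return {("pos", len(s), a): 1.0, ("prev", prev, a): 1.0}

def q(theta, s, a):
    """Linear action value <θ, Φ(s, a)>."""
    return sum(theta[f] * v for f, v in features(s, a).items())

def pick(theta, s, epsilon):
    """ε-greedy action selection."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(theta, s, a))

def sarsa_episode(theta, alpha=0.1, epsilon=0.1):
    """One Sarsa episode: roll out ε-greedily, one-step TD update on θ."""
    s, a = (), pick(theta, (), epsilon)
    while True:
        s2 = step(s, a)
        done = len(s2) == len(TARGET)
        a2 = None if done else pick(theta, s2, epsilon)
        td_target = reward(s, a, s2) + (0.0 if done else q(theta, s2, a2))
        td_error = td_target - q(theta, s, a)
        for f, v in features(s, a).items():
            theta[f] += alpha * td_error * v
        if done:
            return
        s, a = s2, a2

theta = defaultdict(float)
for _ in range(500):
    sarsa_episode(theta)

s = ()  # greedy rollout with the learned policy
while len(s) < len(TARGET):
    s = step(s, pick(theta, s, epsilon=0.0))
print(s)  # ('R', 'B', 'R') after training
```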


Inference
The state at step t consists of the initial input plus the partial output. Once the policy is learned, the state at step t is used to compute the state at step t+1, until a solution is reached.
Greedy approach: only the best action is chosen at each step.


Structured outputs and the MDP
- State s_t: the input x plus the partial output ŷ_t; initial state: (x, ∅).
- Actions are task dependent. POS: a new tag for the current word; tree mapping: insert a new path in the partial tree.
- Inference: greedy approach; only the best action is chosen at each step.
- Reward. Final: $R(y, \hat{y})$; heuristic: $r(y, \hat{y}_t)$.


Example: sequence labeling
- Left-right model; actions: choose a label for the current position.
- Order-free model; actions: choose a label and a position (see the sketch below).
- Loss: Hamming cost or F-score.
Tasks:
- Named entity recognition (shared task at CoNLL 2002; 8,000 train, 1,500 test)
- Chunking, noun phrases (CoNLL 2002)
- Handwritten word recognition (5,000 train, 1,000 test)
Complexity of inference: O(sequence length × number of labels).


Tree mapping: the document mapping problem
The labeled tree mapping problem has different instances:
- flat text to XML
- HTML to XML
- XML to XML
- …

<Restaurant>
  <Nom>La cantine</Nom>
  <Adresse>65 rue des pyrénées, Paris, 19ème, FRANCE</Adresse>
  <Spécialités>Canard à l'orange, Lapin au miel</Spécialités>
</Restaurant>

<Restaurant>
  <Nom>La cantine</Nom>
  <Adresse>
    <Ville>Paris</Ville>
    <Arrd>19</Arrd>
    <Rue>pyrénées</Rue>
    <Num>65</Num>
  </Adresse>
  <Plat>Canard à l'orange</Plat>
  <Plat>Lapin au miel</Plat>
</Restaurant>


Document mapping problem
Central issue: complexity.
- Large collections
- Large feature spaces: 10³ to 10⁶
- Large (exponential) search space
Other SP methods fail here.


XML structuration
- Action: attach the path ending with the current leaf to a position in the current partial tree (see the sketch below).
- Φ(s, a) encodes a series of potential (state, action) pairs.
- Loss: F-score for trees.
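A sketch of the attach-path action on a nested-dict tree encoding (the encoding and the helper name are assumptions; the slides do not specify a representation):

```python
def attach_path(tree, position, path, text):
    """Action: graft the labeled path `path`, ending in a text leaf,
    under the node reached by following child indices `position`."""
    node = tree
    for i in position:                 # walk down to the attachment point
        node = node["children"][i]
    for label in path:                 # create one node per label on the path
        child = {"label": label, "children": []}
        node["children"].append(child)
        node = child
    node["text"] = text                # the current input leaf's content
    return tree

# Replaying the first steps of the animation below:
doc = {"label": "DOCUMENT", "children": []}
attach_path(doc, [], ["TITLE"], "Example")
attach_path(doc, [], ["SECTION", "TITLE"], "Title of the section")
attach_path(doc, [1], ["TEXT"], "Welcome to INEX")  # attach under SECTION
print(doc)
```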

[Figure, slides 44-58: step-by-step animation of the mapping. INPUT DOCUMENT: an HTML tree, HTML > (HEAD > TITLE "Example"; BODY > IT "Francis MAES", H1 "Title of the section", P "Welcome to INEX", FONT "This is a footnote"). TARGET DOCUMENT: an XML tree built one attached path at a time, DOCUMENT > (TITLE "Example", AUTHOR "Francis MAES", SECTION > (TITLE "Title of the section", TEXT "Welcome to INEX")), plus the footnote text.]


Results


Conclusion
- Learn to explore the state space of the problem: an alternative to dynamic programming and classical search algorithms.
- Can be used with any decomposable cost function.
- Fewer assumptions than other methods (e.g. no Markov hypothesis).
- Leads to solutions that may scale to complex and large SP problems.


More on this

- Paper at the European Workshop on Reinforcement Learning 2008
- Preliminary paper at ECML 2007
- Library available at http://nieme.lip6.fr