Conditional Random Fields Dietrich Klakow

Conditional Random Fields - DFKI · Chunking Task: find phrase boundaries: Chunking Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. B-NP I-NP


Conditional Random Fields

Dietrich Klakow


Overview

• Sequence Labeling

• Bayesian Networks

• Markov Random Fields

• Conditional Random Fields

• Software example


Sequence Labeling Tasks

Sequence: a sentence

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

POS Labels

Token         POS
Pierre        NNP
Vinken        NNP
,             ,
61            CD
years         NNS
old           JJ
,             ,
will          MD
join          VB
the           DT
board         NN
as            IN
a             DT
nonexecutive  JJ
director      NN
Nov.          NNP
29            CD
.             .

Chunking

Task: find phrase boundaries:

Token         Chunk
Pierre        B-NP
Vinken        I-NP
,             O
61            B-NP
years         I-NP
old           B-ADJP
,             O
will          B-VP
join          I-VP
the           B-NP
board         I-NP
as            B-PP
a             B-NP
nonexecutive  I-NP
director      I-NP
Nov.          B-NP
29            I-NP
.             O
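The B-/I-/O tags above encode phrase boundaries: B-X opens a chunk of type X, I-X continues it, and O marks a token outside any chunk. A minimal sketch of reading the phrases back from the tags follows; the helper name `bio_to_chunks` is hypothetical and not part of the slides.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, text) pairs based on B-/I-/O tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # I-X continues the open chunk only if the type matches.
        if tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
            continue
        # Any other tag closes the chunk that is currently open.
        if current_type is not None:
            chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        # B-X (or a stray I-X) opens a new chunk; O opens nothing.
        if tag != "O":
            current_type, current_tokens = tag[2:], [token]
    if current_type is not None:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks


tokens = ("Pierre Vinken , 61 years old , will join the board "
          "as a nonexecutive director Nov. 29 .").split()
tags = ("B-NP I-NP O B-NP I-NP B-ADJP O B-VP I-VP B-NP I-NP "
        "B-PP B-NP I-NP I-NP B-NP I-NP O").split()
chunks = bio_to_chunks(tokens, tags)
for chunk_type, text in chunks:
    print(chunk_type, text)
```

Treating a stray I-X as the start of a new chunk is one possible convention; stricter readers may prefer to reject such input.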

Named Entity Tagging

Token         Named entity
Pierre        B-PERSON
Vinken        I-PERSON
,             O
61            B-DATE:AGE
years         I-DATE:AGE
old           I-DATE:AGE
,             O
will          O
join          O
the           O
board         B-ORG_DESC:OTHER
as            O
a             O
nonexecutive  O
director      B-PER_DESC
Nov.          B-DATE:DATE
29            I-DATE:DATE
.             O

Supertagging

Token         Supertag
Pierre        N/N
Vinken        N
,             ,
61            N/N
years         N
old           (S[adj]\NP)\NP
,             ,
will          (S[dcl]\NP)/(S[b]\NP)
join          ((S[b]\NP)/PP)/NP
the           NP[nb]/N
board         N
as            PP/NP
a             NP[nb]/N
nonexecutive  N/N
director      N
Nov.          ((S\NP)\(S\NP))/N[num]
29            N[num]
.             .


Hidden Markov Model

HMM: just an Application of a Bayes Classifier

$$(\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_N) = \arg\max_{\pi_1, \pi_2, \ldots, \pi_N} \left[ P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) \right]$$

Decomposition of Probabilities

$$P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) = \prod_{i=1}^{N} P(x_i \mid \pi_i)\, P(\pi_i \mid \pi_{i-1})$$

$P(\pi_i \mid \pi_{i-1})$: transition probability

$P(x_i \mid \pi_i)$: emission probability
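As an illustration of this decomposition, the sketch below multiplies transition and emission probabilities along a labeled sequence. All probability values are made up, and the `START` symbol standing in for the label before the first position is an assumption, since the slide does not say how the first transition is handled.

```python
def hmm_joint(words, labels, transition, emission, start="START"):
    """Joint probability: product over i of P(x_i | pi_i) * P(pi_i | pi_{i-1})."""
    prob = 1.0
    prev = start  # assumed placeholder for the label before position 1
    for word, label in zip(words, labels):
        # Unseen transitions or emissions are treated as probability 0.
        prob *= transition.get((prev, label), 0.0)
        prob *= emission.get((label, word), 0.0)
        prev = label
    return prob


# Hypothetical toy parameters, keyed as (previous label, label) and (label, word).
transition = {("START", "DT"): 0.5, ("DT", "NN"): 0.8}
emission = {("DT", "the"): 0.6, ("NN", "board"): 0.1}

p = hmm_joint(["the", "board"], ["DT", "NN"], transition, emission)
print(p)
```

The arg max on the previous slide would be taken over all label sequences; in practice that search is done with dynamic programming rather than by enumerating sequences.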

Graphical view HMM

Observation sequence: X1, X2, X3, …, XN

Label sequence: π1, π2, π3, …, πN

Criticism

• HMMs model only limited dependencies

→ come up with more flexible models

→ come up with a graphical description


Bayesian Networks

Example for Bayesian Network

Corresponding joint distribution:

$$P(C, S, R, W) = P(W \mid S, R)\, P(S \mid C)\, P(R \mid C)\, P(C)$$

From Russell and Norvig (1995), AI: A Modern Approach
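To make the factorization concrete, the following sketch evaluates P(C, S, R, W) for binary variables from one table per factor. The numeric values are illustrative assumptions and are not taken from the slide; the check at the end confirms that the product of factors sums to 1 over all assignments.

```python
from itertools import product

# Hypothetical conditional probability tables (probability that the variable is True).
p_c = 0.5
p_s_given_c = {True: 0.1, False: 0.5}
p_r_given_c = {True: 0.8, False: 0.2}
p_w_given_sr = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.9,
    (False, False): 0.0,
}


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """P(C, S, R, W) = P(W | S, R) * P(S | C) * P(R | C) * P(C)."""
    return (bern(p_w_given_sr[(s, r)], w)
            * bern(p_s_given_c[c], s)
            * bern(p_r_given_c[c], r)
            * bern(p_c, c))


total = sum(joint(*values) for values in product([True, False], repeat=4))
print(joint(True, False, True, True), total)
```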

Naïve Bayes

$$\prod_{i=1}^{D} P(x_i \mid z)$$

Observations $x_1, \ldots, x_D$ are assumed to be independent given $z$
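A small sketch of this product, read here as the likelihood of the observations under a class z. The classes, words, and probabilities are hypothetical, and a uniform prior over z is assumed so that comparing likelihoods is enough to pick a class.

```python
import math

# Hypothetical class-conditional probabilities P(x_i | z) for two classes z.
cond_prob = {
    "spam": {"offer": 0.2, "meeting": 0.05},
    "ham": {"offer": 0.01, "meeting": 0.1},
}


def likelihood(observations, z):
    """Product over i of P(x_i | z), using the independence assumption."""
    return math.prod(cond_prob[z][x] for x in observations)


obs = ["offer", "offer", "meeting"]
# With a uniform prior (an assumption), the most likely class maximizes the likelihood.
best_class = max(cond_prob, key=lambda z: likelihood(obs, z))
print(best_class, likelihood(obs, best_class))
```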


Markov Random Fields

• Undirected graphical model

• New term:

• clique in an undirected graph:

    • Set of nodes such that every node is connected to every other node

• maximal clique: no node can be added without destroying the clique property


Example

cliques: green and blue

maximal clique: blue

Factorization

Joint distribution described by graph:

$$p(x) = \frac{1}{Z} \prod_{C \in C_M} \Psi_C(x_C)$$

Normalization:

$$Z = \sum_{x} \prod_{C \in C_M} \Psi_C(x_C)$$

$x = x_1 \ldots x_N$: all nodes

$x_C$: nodes in clique $C$

$C_M$: set of all maximal cliques

$\Psi_C(x_C)$: potential function ($\Psi_C(x_C) \ge 0$)

Z is sometimes called the partition function
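The factorization and its normalization can be checked by brute force on a tiny graph. The sketch below assumes a chain of three binary nodes with maximal cliques {x1, x2} and {x2, x3} and a made-up potential that favors equal neighbors; none of this is from the slides.

```python
from itertools import product

# Assumed toy graph: chain x1 - x2 - x3, nodes indexed 0..2, one tuple per maximal clique.
cliques = [(0, 1), (1, 2)]


def psi(values):
    """Hypothetical non-negative potential: neighbors that agree get weight 2."""
    return 2.0 if values[0] == values[1] else 1.0


def unnormalized(x):
    """Product of the potentials over all maximal cliques."""
    score = 1.0
    for clique in cliques:
        score *= psi(tuple(x[i] for i in clique))
    return score


states = list(product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)  # partition function


def p(x):
    """Joint probability p(x) = (1/Z) * product of clique potentials."""
    return unnormalized(x) / Z


print(Z, p((0, 0, 0)))
```

Enumerating all states is exponential in the number of nodes, so this only works as a sanity check on very small graphs.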

Example

[Figure: undirected graph with nodes x1, x2, x3, x4, x5]

What are the maximal cliques?

Write down the joint probability described by this graph

→ whiteboard

Energy Function

Define

$$\Psi_C(x_C) = e^{-E(x_C)}$$

Insert into joint distribution

$$p(x) = \frac{1}{Z}\, e^{-\sum_{C \in C_M} E(x_C)}$$
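The insertion step can be spelled out in one line, using only the factorization over maximal cliques and the definition of the energy: a product of exponentials becomes the exponential of a sum.

```latex
\begin{align*}
p(x) &= \frac{1}{Z} \prod_{C \in C_M} \Psi_C(x_C)
      = \frac{1}{Z} \prod_{C \in C_M} e^{-E(x_C)} \\
     &= \frac{1}{Z} \exp\Big( -\sum_{C \in C_M} E(x_C) \Big)
\end{align*}
```

Configurations with low total energy therefore receive high probability.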


Conditional Random Fields

Definition

Markov random field where each random variable $y_i$ is conditioned on the complete input sequence $x_1, \ldots, x_n$

[Figure: label nodes y1, y2, y3, …, yn-1, yn and a single node x]

x = (x1 … xn)

y = (y1 … yn)

Distribution

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$$

$\lambda_j$: parameters to be trained

$f_j(y_{i-1}, y_i, x, i)$: feature function (see maximum entropy models)

Example feature functions

Modeling transitions:

$$f_1(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_{i-1} = \text{IN and } y_i = \text{NNP} \\ 0 & \text{else} \end{cases}$$

Modeling emissions:

$$f_2(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_i = \text{NNP and } x_i = \text{September} \\ 0 & \text{else} \end{cases}$$
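The two example feature functions translate almost directly into code. In the sketch below the weights are made-up values for the trainable parameters, the `START` label before the first position is an assumption, and `sequence_score` is a hypothetical helper that computes the weighted feature sum appearing in the exponent of p(y|x).

```python
def f1(y_prev, y, x, i):
    """Transition feature: previous label IN followed by label NNP."""
    return 1 if y_prev == "IN" and y == "NNP" else 0


def f2(y_prev, y, x, i):
    """Emission feature: label NNP on the word 'September' (i is 1-based, as on the slide)."""
    return 1 if y == "NNP" and x[i - 1] == "September" else 0


FEATURES = [f1, f2]
WEIGHTS = [1.5, 0.5]  # hypothetical lambda_1, lambda_2


def sequence_score(y, x, start="START"):
    """Sum over positions i and features j of lambda_j * f_j(y_{i-1}, y_i, x, i)."""
    labels = [start] + list(y)  # labels[0] plays the role of y_0
    return sum(
        w * f(labels[i - 1], labels[i], x, i)
        for i in range(1, len(x) + 1)
        for w, f in zip(WEIGHTS, FEATURES)
    )


x = ["in", "September"]
score = sequence_score(["IN", "NNP"], x)
print(score)
```

Exponentiating this score and dividing by Z(x) would give the conditional probability of the label sequence.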

Training

• Like in maximum entropy models

    Generalized iterative scaling

• Convergence:

    log p(y|x) is a concave function of the parameters λ

    → no local maxima besides the global maximum

    Convergence is slow

    Improved algorithms exist
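For reference, gradient-based trainers (one family of improved algorithms) work with the derivative of the log-likelihood with respect to each weight. For a single training pair (x, y), and assuming the exponential form of p(y|x) with weights λ_j and feature functions f_j, the standard result is:

```latex
\frac{\partial}{\partial \lambda_j} \log p(y \mid x)
  = \sum_{i=1}^{n} f_j(y_{i-1}, y_i, x, i)
  - \sum_{y'} p(y' \mid x) \sum_{i=1}^{n} f_j(y'_{i-1}, y'_i, x, i)
```

This is the observed count of feature j minus its expected count under the model; at the maximum the two agree.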

Decoding: Auxiliary Matrix

Define additional start symbol $y_0 = \text{START}$ and stop symbol $y_{n+1} = \text{STOP}$

Define matrix $M^i(x)$ such that

$$\left[ M^i(x) \right]_{y_{i-1} y_i} = M^i_{y_{i-1} y_i}(x) = \exp\left( \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$$

Reformulate Probability

With that definition we have

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{n+1} M^i_{y_{i-1} y_i}(x)$$

with

$$Z(x) = \sum_{y_1} \sum_{y_2} \sum_{y_3} \cdots \sum_{y_n} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x) \cdots M^{n+1}_{y_n y_{n+1}}(x)$$

Use Matrix Properties

$$Z(x) = \left[ M^1(x)\, M^2(x) \cdots M^{n+1}(x) \right]_{y_0 = \text{START},\, y_{n+1} = \text{STOP}}$$

Use matrix product with

$$\left[ M^1(x)\, M^2(x) \right]_{y_0 y_2} = \sum_{y_1} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x)$$
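The matrix formulation can be checked numerically against the explicit sum over all label sequences. The sketch below is an illustration under stated assumptions: a two-label set, made-up weights, the two example feature functions (IN followed by NNP, and NNP on "September"), and plain Python dictionaries in place of a matrix library. Because y_0 is fixed to START and y_{n+1} to STOP, the first factor is handled as a row vector and the last as a single column.

```python
import math
from itertools import product

LABELS = ["IN", "NNP"]   # assumed toy label set
WEIGHTS = [1.5, 0.5]     # hypothetical lambda_1, lambda_2


def f1(y_prev, y, x, i):
    return 1 if y_prev == "IN" and y == "NNP" else 0


def f2(y_prev, y, x, i):
    # i is 1-based; the guard skips the extra STOP position n+1.
    return 1 if y == "NNP" and i <= len(x) and x[i - 1] == "September" else 0


FEATURES = [f1, f2]


def m_entry(y_prev, y, x, i):
    """[M^i(x)]_{y_prev, y} = exp(sum_j lambda_j * f_j(y_prev, y, x, i))."""
    return math.exp(sum(w * f(y_prev, y, x, i) for w, f in zip(WEIGHTS, FEATURES)))


def partition_by_matrices(x):
    """Z(x) as the (START, STOP) entry of M^1(x) M^2(x) ... M^{n+1}(x)."""
    n = len(x)
    # Row y_0 = START of M^1.
    row = {y: m_entry("START", y, x, 1) for y in LABELS}
    for i in range(2, n + 1):
        # Vector-matrix product: sum over the previous label.
        row = {y: sum(row[yp] * m_entry(yp, y, x, i) for yp in LABELS)
               for y in LABELS}
    # Column y_{n+1} = STOP of M^{n+1}.
    return sum(row[yp] * m_entry(yp, "STOP", x, n + 1) for yp in LABELS)


def partition_brute_force(x):
    """Z(x) as the explicit sum over all label sequences y_1 ... y_n."""
    n = len(x)
    total = 0.0
    for ys in product(LABELS, repeat=n):
        path = ["START", *ys, "STOP"]
        weight = 1.0
        for i in range(1, n + 2):
            weight *= m_entry(path[i - 1], path[i], x, i)
        total += weight
    return total


x = ["in", "September"]
z_matrix = partition_by_matrices(x)
z_brute = partition_brute_force(x)
print(z_matrix, z_brute)
```

The matrix version needs a number of operations that grows linearly with the sentence length, while the brute-force sum grows exponentially.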


Software


CRF++

• See http://crfpp.sourceforge.net/
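As a hedged sketch of preparing the chunking example for CRF++: its training files are plain text with one token per line, whitespace-separated columns, the label to predict in the last column, and an empty line after each sentence. The helper name `to_crfpp_columns` is hypothetical; only the file layout follows the CRF++ documentation.

```python
def to_crfpp_columns(tokens, pos_tags, chunk_tags):
    """Format one sentence as 'token POS chunk' lines followed by an empty line."""
    lines = [f"{t} {p} {c}" for t, p, c in zip(tokens, pos_tags, chunk_tags)]
    return "\n".join(lines) + "\n\n"


tokens = ("Pierre Vinken , 61 years old , will join the board "
          "as a nonexecutive director Nov. 29 .").split()
pos_tags = "NNP NNP , CD NNS JJ , MD VB DT NN IN DT JJ NN NNP CD .".split()
chunk_tags = ("B-NP I-NP O B-NP I-NP B-ADJP O B-VP I-VP B-NP I-NP "
              "B-PP B-NP I-NP I-NP B-NP I-NP O").split()

training_text = to_crfpp_columns(tokens, pos_tags, chunk_tags)
print(training_text, end="")
```

Training and tagging are then typically run from the command line as `crf_learn template_file train_file model_file` and `crf_test -m model_file test_file`; the template file that defines the features is not shown here.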


Summary

• Sequence labeling problems

• CRFs are

    • flexible

    • expensive to train

    • fast to decode
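Decoding is fast because the best label sequence can be found with dynamic programming (the Viterbi algorithm, named here as the usual choice; the slides do not spell it out). The sketch below assumes a two-label set, made-up weights, and two example feature functions, and works with log scores so that products become sums.

```python
LABELS = ["IN", "NNP"]   # assumed toy label set
WEIGHTS = [1.5, 0.5]     # hypothetical lambda_1, lambda_2


def f1(y_prev, y, x, i):
    return 1 if y_prev == "IN" and y == "NNP" else 0


def f2(y_prev, y, x, i):
    # i is 1-based; the guard skips the extra STOP position n+1.
    return 1 if y == "NNP" and i <= len(x) and x[i - 1] == "September" else 0


FEATURES = [f1, f2]


def local_score(y_prev, y, x, i):
    """Log of the matrix entry: sum_j lambda_j * f_j(y_prev, y, x, i)."""
    return sum(w * f(y_prev, y, x, i) for w, f in zip(WEIGHTS, FEATURES))


def viterbi(x):
    """Return the highest-scoring label sequence and its log score."""
    n = len(x)
    # best[y] = (score, path) of the best label prefix ending in label y.
    best = {y: (local_score("START", y, x, 1), [y]) for y in LABELS}
    for i in range(2, n + 1):
        step = {}
        for y in LABELS:
            prev = max(LABELS,
                       key=lambda yp: best[yp][0] + local_score(yp, y, x, i))
            step[y] = (best[prev][0] + local_score(prev, y, x, i),
                       best[prev][1] + [y])
        best = step
    # Close the sequence with the transition into STOP.
    last = max(LABELS,
               key=lambda y: best[y][0] + local_score(y, "STOP", x, n + 1))
    return best[last][1], best[last][0] + local_score(last, "STOP", x, n + 1)


path, score = viterbi(["in", "September"])
print(path, score)
```

The running time is proportional to the sentence length times the square of the number of labels, and Z(x) is not needed for decoding because it does not depend on y.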