
Chapter 6. Hidden Markov and Maximum Entropy Model


Page 1: Chapter 6. Hidden Markov and Maximum Entropy Model

Presented by Jian-Shiun Tzeng 4/9/2009

Chapter 6. Hidden Markov and Maximum Entropy Model

Daniel Jurafsky and James H. Martin, 2008

Page 2: Chapter 6. Hidden Markov and Maximum Entropy Model


Introduction

• Maximum Entropy (MaxEnt)
  – More widely known as multinomial logistic regression

• Begin from a non-sequential classifier
  – A probabilistic classifier
  – An exponential or log-linear classifier
  – Text classification
  – Sentiment analysis
    • Positive or negative opinion
  – Sentence boundary detection

Page 3: Chapter 6. Hidden Markov and Maximum Entropy Model


Linear Regression

Page 4: Chapter 6. Hidden Markov and Maximum Entropy Model


Linear Regression

• x(j): a particular instance
• y(j)obs: the observed label of x(j) in the training set
• y(j)pred: the value predicted by the linear regression model
• Fit the weights by minimizing the sum-squared error (a small code sketch follows):

  cost(W) = Σj ( y(j)pred − y(j)obs )²
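A minimal code sketch of this cost (my illustration, not from the slides): the weight vector includes the intercept as w[0], paired with a constant feature value of 1, and the data points are made up.

def predict(w, f):
    # y_pred = w0*f0 + w1*f1 + ... ; f0 = 1 plays the role of the intercept
    return sum(wi * fi for wi, fi in zip(w, f))

def sum_squared_error(w, examples):
    # examples: list of (feature_vector, observed_y) pairs
    return sum((predict(w, f) - y_obs) ** 2 for f, y_obs in examples)

examples = [([1.0, 2.0], 5.1), ([1.0, 3.0], 7.0), ([1.0, 4.0], 8.9)]
print(sum_squared_error([1.0, 2.0], examples))   # 0.02: these weights fit well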

Page 5: Chapter 6. Hidden Markov and Maximum Entropy Model


Logistic Regression – simplest case of binary classification

• Consider whether x is in class (1, true) or not (0, false)

• We cannot use w · f directly as a probability: w · f ranges over (−∞, ∞), but a probability must lie in [0, 1]
• Work with the odds and the log odds (logit) instead:
  – P(y = 1 | x) ∈ [0, 1]
  – odds P / (1 − P) ∈ [0, ∞)
  – logit ln( P / (1 − P) ) ∈ (−∞, ∞), which can be equated with w · f ∈ (−∞, ∞) (see the sketch below)
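Solving the logit equation for P gives the logistic (sigmoid) function. A small sketch with hypothetical weights and feature values (names are mine, not the slides'):

import math

def p_true(w, f):
    # P(y = 1 | x) = 1 / (1 + exp(-w·f)), obtained by inverting the logit
    z = sum(wi * fi for wi, fi in zip(w, f))
    return 1.0 / (1.0 + math.exp(-z))

w = [0.5, -1.2, 2.0]      # made-up weights
f = [1.0, 0.0, 1.0]       # made-up feature values for some x
print(p_true(w, f))        # a value in [0, 1], here about 0.924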

Page 6: Chapter 6. Hidden Markov and Maximum Entropy Model


Logistic Regression – simplest case of binary classification

Page 7: Chapter 6. Hidden Markov and Maximum Entropy Model


Logistic Regression – Classification
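A hedged sketch of the usual decision rule (my formulation): assign class 1 exactly when P(y = 1 | x) > 0.5, which is equivalent to w · f > 0.

def classify(w, f):
    # choose class 1 iff w·f > 0, i.e. iff P(y = 1 | x) > 0.5
    z = sum(wi * fi for wi, fi in zip(w, f))
    return 1 if z > 0 else 0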

Page 8: Chapter 6. Hidden Markov and Maximum Entropy Model


Advanced: Learning in logistic regression
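One standard way to learn the weights is gradient ascent on the conditional log-likelihood. A rough sketch (hypothetical learning rate and epoch count; reuses p_true from the earlier sketch):

def train_logistic(examples, n_features, lr=0.1, epochs=100):
    # examples: list of (feature_vector, label) pairs, labels in {0, 1}
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for f, y in examples:
            p = p_true(w, f)               # current model estimate of P(y = 1 | x)
            for i, fi in enumerate(f):
                grad[i] += (y - p) * fi    # gradient of the conditional log-likelihood
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w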

Page 9: Chapter 6. Hidden Markov and Maximum Entropy Model


Maximum Entropy Modeling

• Input: x (a word to tag or a document to classify)
  – Features of x, e.g.:
    • Ends in -ing
    • Previous word is “the”
  – Each feature fi has a weight wi
  – A particular class c
  – Z is a normalizing factor, used to make the probabilities sum to 1 (see the sketch below)
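In the standard formulation this gives p(c | x) = (1 / Z) · exp( Σi wi fi(c, x) ), with Z summing the same exponential over all classes. A sketch (function and argument names are mine):

import math

def maxent_prob(c, x, classes, features, weights):
    # features: list of feature functions f_i(c, x), typically 0/1 indicators
    # weights:  one weight w_i per feature function
    def score(cls):
        return math.exp(sum(w * f(cls, x) for w, f in zip(weights, features)))
    z = sum(score(c_prime) for c_prime in classes)   # normalizing factor Z
    return score(c) / z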

Page 10: Chapter 6. Hidden Markov and Maximum Entropy Model


Maximum Entropy Modeling

C = {c1, c2, …, cC}

Normalization

fi: a feature that takes on only the values 0 and 1 is also called an indicator function

In MaxEnt, instead of the notation fi, we will often use fi(c, x), meaning feature i for a particular class c and a given observation x

Page 11: Chapter 6. Hidden Markov and Maximum Entropy Model


Maximum Entropy Modeling – Assume C = {NN, VB}
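A hypothetical worked run for C = {NN, VB}, reusing maxent_prob from the earlier sketch; the features mirror the two examples on Page 9 and the weights are made up:

# f1 fires when the word ends in -ing and the class is VB
# f2 fires when the previous word is "the" and the class is NN
features = [
    lambda c, x: 1 if x["word"].endswith("ing") and c == "VB" else 0,
    lambda c, x: 1 if x["prev"] == "the" and c == "NN" else 0,
]
weights = [0.8, 1.2]                         # made-up weights
x = {"word": "running", "prev": "the"}
for c in ["NN", "VB"]:
    print(c, maxent_prob(c, x, ["NN", "VB"], features, weights))
# With these weights: NN gets roughly 0.60, VB roughly 0.40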

Page 12: Chapter 6. Hidden Markov and Maximum Entropy Model


Learning Maximum Entropy Model
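The usual training criterion is conditional maximum likelihood: for each weight, the gradient is the observed count of its feature in the training data minus the count expected under the current model, and optimizers such as GIS, IIS, or gradient methods drive that difference toward zero. A rough sketch of the gradient (reusing maxent_prob; this is the general recipe, not necessarily the specific algorithm the slides present):

def maxent_gradient(examples, classes, features, weights):
    # examples: list of (x, gold_class) pairs
    grad = [0.0] * len(features)
    for x, gold in examples:
        for i, f in enumerate(features):
            grad[i] += f(gold, x)                 # observed feature count
            for c in classes:                     # expected count under the model
                grad[i] -= maxent_prob(c, x, classes, features, weights) * f(c, x)
    return grad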

Page 13: Chapter 6. Hidden Markov and Maximum Entropy Model


HMM vs. MEMM

An MEMM can condition on any useful feature of the input observation; in an HMM this isn’t possible

(Figure: side-by-side graphical structures of the HMM and the MEMM, with word and class nodes)
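To make the contrast concrete, the decoding objectives take the standard forms (notation follows the chapter’s conventions; this restatement is mine):

HMM:  Q̂ = argmax over Q of  Πi P(oi | qi) · P(qi | qi−1)
MEMM: Q̂ = argmax over Q of  Πi P(qi | qi−1, oi)

The MEMM estimates each P(qi | qi−1, oi) directly with a MaxEnt classifier, which is why it can condition on arbitrary features of the observation oi.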

Page 14: Chapter 6. Hidden Markov and Maximum Entropy Model


Conditional Random Fields (CRFs)

• CRFs (Lafferty, McCallum, et al., 2001) constitute another conditional model based on maximum entropy

• Like MEMMs, CRFs are able to accommodate many possibly correlated features of the observation

• However, CRFs are better able to trade off decisions at different sequence positions

• MEMMs were found to suffer from the label bias problem

Page 15: Chapter 6. Hidden Markov and Maximum Entropy Model


Label Bias

• The problem appears when the MEMM contains states with different numbers of outgoing transitions (out-degrees)

• Because the probabilities of the transitions leaving any given state must sum to 1, transitions from low-degree states receive higher probabilities than transitions from high-degree states

• In the extreme case, a transition from a state with out-degree 1 always gets probability 1, effectively ignoring the observation

• CRFs do not have this problem because they define a single maximum-entropy-based distribution over the whole label sequence
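A hypothetical illustration of the problem (numbers are mine, not from the slides): suppose state s1 has a single outgoing transition, to s2, while state s3 has five outgoing transitions. In an MEMM, P(s2 | s1, o) must equal 1 for every observation o, so no observation can penalize the path through s1; s3 has to split its probability mass five ways, so even a transition strongly supported by the observation receives a comparatively small local score. Because a CRF normalizes once over the entire label sequence, the per-state mass is not forced to sum to 1 and this bias disappears.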