
Chapter 6. Hidden Markov and Maximum Entropy Model


Page 1: Chapter 6. Hidden Markov and Maximum Entropy Model

Presented by Jian-Shiun Tzeng 4/9/2009

Chapter 6. Hidden Markov and Maximum Entropy Model

Daniel Jurafsky and James H. Martin, 2008

Page 2: Chapter 6. Hidden Markov and Maximum Entropy Model


Introduction

• Maximum Entropy (MaxEnt)
  – More widely known as multinomial logistic regression

• Begin from a non-sequential classifier
  – A probabilistic classifier
  – An exponential or log-linear classifier
  – Text classification
  – Sentiment analysis
    • Positive or negative opinion
  – Sentence boundary detection

Page 3: Chapter 6. Hidden Markov and Maximum Entropy Model


Linear Regression

Page 4: Chapter 6. Hidden Markov and Maximum Entropy Model


Linear Regression

• x(j): a particular instance
• y(j)obs: the observed label of x(j) in the training set
• y(j)pred: the value predicted by the linear regression model
• Fit the weights by minimizing the sum-squared error (a small code sketch follows):

  cost(W) = Σj ( y(j)pred − y(j)obs )²
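A minimal code sketch of this cost (my illustration, not from the slides): the weight vector includes the intercept as w[0], paired with a constant feature value of 1, and the data points are made up.

def predict(w, f):
    # y_pred = w0*f0 + w1*f1 + ... ; f0 = 1 plays the role of the intercept
    return sum(wi * fi for wi, fi in zip(w, f))

def sum_squared_error(w, examples):
    # examples: list of (feature_vector, observed_y) pairs
    return sum((predict(w, f) - y_obs) ** 2 for f, y_obs in examples)

examples = [([1.0, 2.0], 5.1), ([1.0, 3.0], 7.0), ([1.0, 4.0], 8.9)]
print(sum_squared_error([1.0, 2.0], examples))   # 0.02: these weights fit well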

Page 5: Chapter 6. Hidden Markov and Maximum Entropy Model


Logistic Regression – simplest case of binary classification

• Consider whether x is in class (1, true) or not (0, false)

• We cannot use w · f directly as a probability: w · f ranges over (−∞, ∞), but a probability must lie in [0, 1]
• Work with the odds and the log odds (logit) instead:
  – P(y = 1 | x) ∈ [0, 1]
  – odds P / (1 − P) ∈ [0, ∞)
  – logit ln( P / (1 − P) ) ∈ (−∞, ∞), which can be equated with w · f ∈ (−∞, ∞) (see the sketch below)
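Solving the logit equation for P gives the logistic (sigmoid) function. A small sketch with hypothetical weights and feature values (names are mine, not the slides'):

import math

def p_true(w, f):
    # P(y = 1 | x) = 1 / (1 + exp(-w·f)), obtained by inverting the logit
    z = sum(wi * fi for wi, fi in zip(w, f))
    return 1.0 / (1.0 + math.exp(-z))

w = [0.5, -1.2, 2.0]      # made-up weights
f = [1.0, 0.0, 1.0]       # made-up feature values for some x
print(p_true(w, f))        # a value in [0, 1], here about 0.924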

Page 6: Chapter 6. Hidden Markov and Maximum Entropy Model


Logistic Regression – simplest case of binary classification

Page 7: Chapter 6. Hidden Markov and Maximum Entropy Model


Logistic Regression – Classification
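A hedged sketch of the usual decision rule (my formulation): assign class 1 exactly when P(y = 1 | x) > 0.5, which is equivalent to w · f > 0.

def classify(w, f):
    # choose class 1 iff w·f > 0, i.e. iff P(y = 1 | x) > 0.5
    z = sum(wi * fi for wi, fi in zip(w, f))
    return 1 if z > 0 else 0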

Page 8: Chapter 6. Hidden Markov and Maximum Entropy Model


Advanced: Learning in logistic regression
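One standard way to learn the weights is gradient ascent on the conditional log-likelihood. A rough sketch (hypothetical learning rate and epoch count; reuses p_true from the earlier sketch):

def train_logistic(examples, n_features, lr=0.1, epochs=100):
    # examples: list of (feature_vector, label) pairs, labels in {0, 1}
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for f, y in examples:
            p = p_true(w, f)               # current model estimate of P(y = 1 | x)
            for i, fi in enumerate(f):
                grad[i] += (y - p) * fi    # gradient of the conditional log-likelihood
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w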

Page 9: Chapter 6. Hidden Markov and Maximum Entropy Model


Maximum Entropy Modeling

• Input: x (a word to tag or a document to classify)
  – Features of x, e.g.:
    • Ends in -ing
    • Previous word is “the”
  – Each feature fi has a weight wi
  – A particular class c
  – Z is a normalizing factor, used to make the probabilities sum to 1 (see the sketch below)
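In the standard formulation this gives p(c | x) = (1 / Z) · exp( Σi wi fi(c, x) ), with Z summing the same exponential over all classes. A sketch (function and argument names are mine):

import math

def maxent_prob(c, x, classes, features, weights):
    # features: list of feature functions f_i(c, x), typically 0/1 indicators
    # weights:  one weight w_i per feature function
    def score(cls):
        return math.exp(sum(w * f(cls, x) for w, f in zip(weights, features)))
    z = sum(score(c_prime) for c_prime in classes)   # normalizing factor Z
    return score(c) / z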

Page 10: Chapter 6. Hidden Markov and Maximum Entropy Model


Maximum Entropy Modeling

C = {c1, c2, …, cC}

Normalization

fi: a feature that takes on only the values 0 and 1 is also called an indicator function

In MaxEnt, instead of the notation fi, we will often use fi(c, x), meaning feature i for a particular class c and a given observation x

Page 11: Chapter 6. Hidden Markov and Maximum Entropy Model


Maximum Entropy Modeling – Assume C = {NN, VB}
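A hypothetical worked run for C = {NN, VB}, reusing maxent_prob from the earlier sketch; the features mirror the two examples on Page 9 and the weights are made up:

# f1 fires when the word ends in -ing and the class is VB
# f2 fires when the previous word is "the" and the class is NN
features = [
    lambda c, x: 1 if x["word"].endswith("ing") and c == "VB" else 0,
    lambda c, x: 1 if x["prev"] == "the" and c == "NN" else 0,
]
weights = [0.8, 1.2]                         # made-up weights
x = {"word": "running", "prev": "the"}
for c in ["NN", "VB"]:
    print(c, maxent_prob(c, x, ["NN", "VB"], features, weights))
# With these weights: NN gets roughly 0.60, VB roughly 0.40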

Page 12: Chapter 6. Hidden Markov and Maximum Entropy Model


Learning Maximum Entropy Model
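The usual training criterion is conditional maximum likelihood: for each weight, the gradient is the observed count of its feature in the training data minus the count expected under the current model, and optimizers such as GIS, IIS, or gradient methods drive that difference toward zero. A rough sketch of the gradient (reusing maxent_prob; this is the general recipe, not necessarily the specific algorithm the slides present):

def maxent_gradient(examples, classes, features, weights):
    # examples: list of (x, gold_class) pairs
    grad = [0.0] * len(features)
    for x, gold in examples:
        for i, f in enumerate(features):
            grad[i] += f(gold, x)                 # observed feature count
            for c in classes:                     # expected count under the model
                grad[i] -= maxent_prob(c, x, classes, features, weights) * f(c, x)
    return grad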

Page 13: Chapter 6. Hidden Markov and Maximum Entropy Model


HMM vs. MEMM

An MEMM can condition on any useful feature of the input observation; in an HMM this isn’t possible

(Figure: side-by-side graphical structures of the HMM and the MEMM, with word and class nodes)
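To make the contrast concrete, the decoding objectives take the standard forms (notation follows the chapter’s conventions; this restatement is mine):

HMM:  Q̂ = argmax over Q of  Πi P(oi | qi) · P(qi | qi−1)
MEMM: Q̂ = argmax over Q of  Πi P(qi | qi−1, oi)

The MEMM estimates each P(qi | qi−1, oi) directly with a MaxEnt classifier, which is why it can condition on arbitrary features of the observation oi.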

Page 14: Chapter 6. Hidden Markov and Maximum Entropy Model


Conditional Random Fields (CRFs)

• CRFs (Lafferty, McCallum, et al., 2001) constitute another conditional model based on maximum entropy

• Like MEMMs, CRFs are able to accommodate many possibly correlated features of the observation

• However, CRFs are better able to trade off decisions at different sequence positions

• MEMMs were found to suffer from the label bias problem

Page 15: Chapter 6. Hidden Markov and Maximum Entropy Model


Label Bias

• The problem appears when the MEMM contains states with different numbers of outgoing transitions (out-degrees)

• Because the probabilities of the transitions leaving any given state must sum to 1, transitions from low-degree states receive higher probabilities than transitions from high-degree states

• In the extreme case, a transition from a state with out-degree 1 always gets probability 1, effectively ignoring the observation

• CRFs do not have this problem because they define a single maximum-entropy-based distribution over the whole label sequence
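A hypothetical illustration of the problem (numbers are mine, not from the slides): suppose state s1 has a single outgoing transition, to s2, while state s3 has five outgoing transitions. In an MEMM, P(s2 | s1, o) must equal 1 for every observation o, so no observation can penalize the path through s1; s3 has to split its probability mass five ways, so even a transition strongly supported by the observation receives a comparatively small local score. Because a CRF normalizes once over the entire label sequence, the per-state mass is not forced to sum to 1 and this bias disappears.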