Graphical Models for Segmenting and Labeling Sequence Data
Manoj Kumar Chinnakotla, NLP-AI Seminar




  • Graphical Models for Segmenting and Labeling Sequence Data

    Manoj Kumar Chinnakotla

    NLP-AI Seminar

  • Outline

    Introduction
    Directed Graphical Models: Hidden Markov Models (HMMs), Maximum Entropy Markov Models (MEMMs)
    Label Bias Problem
    Undirected Graphical Models: Conditional Random Fields (CRFs)
    Summary

  • The Task

    Labeling: given sequence data, mark the appropriate tag for each data item.
    Segmentation: given sequence data, segment it into non-overlapping groups such that related entities are in the same group.

  • Applications

    Computational Linguistics: POS tagging, information extraction, syntactic disambiguation.
    Computational Biology: DNA and protein sequence alignment, sequence homologue searching, protein secondary structure prediction.

  • Example: POS Tagging
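
    The slide's illustration is missing from this transcript; a small example of my own (using Penn Treebank tags), with each word labeled with its part of speech:

        The/DT  cat/NN  sat/VBD  on/IN  the/DT  mat/NN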

  • Directed Graphical Models

    Hidden Markov Models (HMMs) assign a joint probability to paired observation and label sequences.
    The parameters are trained to maximize the joint likelihood of the training examples.

  • Hidden Markov Models (HMMs)

    Generative model: models the joint distribution of words and tags.

    Generation process: a probabilistic finite state machine.
    States: correspond to tags.
    Alphabet: the set of words.
    Parameters: transition probabilities between states and state (emission) probabilities. A sketch of this generation process follows.
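
    A minimal sketch of sampling from an HMM viewed as a probabilistic finite state machine. The table layout and names (trans, emit, the <s>/</s> boundary tags) are my own assumptions, not from the slides:

        import random

        def generate(trans, emit, start="<s>", end="</s>"):
            """Sample a (words, tags) pair from an HMM.

            trans[tag] maps each state to {next_tag: prob};
            emit[tag] maps each state to {word: prob}.
            """
            words, tags = [], []
            tag = start
            while True:
                # follow a random transition out of the current state
                tag = random.choices(list(trans[tag]),
                                     weights=list(trans[tag].values()))[0]
                if tag == end:
                    return words, tags
                # emit a word from the new state
                words.append(random.choices(list(emit[tag]),
                                            weights=list(emit[tag].values()))[0])
                tags.append(tag)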

  • HMMs (Contd.)

    For a given word/tag sequence pair, the joint probability factorizes as shown below.

    Why "Hidden"? The sequence of tags which generated the word sequence is not visible.
    Why "Markov"? Based on the Markov assumption: the current tag depends only on the previous n tags. This tames the sparsity problem.
    Training: learning the transition and emission probabilities from data.
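
    The formula on the original slide is not in this transcript; for a first-order HMM with a designated start tag t_0, the standard factorization it refers to is

        P(w_1 \ldots w_n, t_1 \ldots t_n) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

    i.e. a transition probability times an emission probability at each position.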

  • HMMs: Tagging Process

    Given a string of words w, choose the tag sequence t* such that

        t^* = \arg\max_t P(t \mid w) = \arg\max_t P(w, t)

    Brute force is computationally expensive: it would evaluate all possible tag sequences! For n possible tags and m positions there are n^m candidates.
    Viterbi algorithm: used to find the optimal tag sequence t*; an efficient dynamic-programming algorithm (a sketch follows).
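
    A minimal sketch of Viterbi decoding for such a bigram HMM, working in log space to avoid underflow. The function and table names (log_trans, log_emit) are my own, not from the slides:

        import math

        def viterbi(words, tags, log_trans, log_emit, start="<s>"):
            # best[i][t]: score of the best tag sequence for words[:i+1] ending in t
            best = [{t: log_trans.get((start, t), -math.inf)
                        + log_emit.get((t, words[0]), -math.inf) for t in tags}]
            back = [{}]
            for i in range(1, len(words)):
                best.append({})
                back.append({})
                for t in tags:
                    # best predecessor for tag t at position i
                    prev, score = max(
                        ((p, best[i - 1][p] + log_trans.get((p, t), -math.inf))
                         for p in tags),
                        key=lambda ps: ps[1])
                    best[i][t] = score + log_emit.get((t, words[i]), -math.inf)
                    back[i][t] = prev
            # pick the best final tag, then follow the back-pointers
            last = max(tags, key=lambda t: best[-1][t])
            path = [last]
            for i in range(len(words) - 1, 0, -1):
                path.append(back[i][path[-1]])
            return list(reversed(path))

    This runs in O(m n^2) time for m words and n tags, instead of scoring all n^m candidate sequences.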

  • Disadvantages of HMMs

    Need to enumerate all possible observation sequences.
    Not possible to represent multiple interacting features.
    Difficult to model long-range dependencies of the observations.
    Very strict independence assumptions on the observations.

  • Maximum Entropy Markov Models (MEMMs)

    Conditional exponential models.
    Assume the observation sequence is given (so it need not be modeled).
    Train the model to maximize the conditional likelihood P(Y|X).

  • MEMMs (Contd.)

    For a new data sequence x, the label sequence y which maximizes P(y|x, Θ) is assigned (Θ is the parameter set).
    Arbitrary non-independent features on the observation sequence are possible.
    Conditional models are known to perform better than generative ones.
    Performs per-state normalization: the total mass which arrives at a state must be distributed among all possible successor states (see the sketch below).
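
    A sketch of the per-state exponential model in its standard MEMM form (the notation is mine, following Lafferty et al., 2001):

        P(y_i \mid y_{i-1}, x) = \frac{1}{Z(x, y_{i-1})} \exp\Big( \sum_k \lambda_k f_k(y_i, x) \Big)

    where Z(x, y_{i-1}) sums the exponential over all possible next labels y_i, so each state's outgoing distribution normalizes to 1 on its own.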

  • Label Bias Problem

    Bias towards states with fewer outgoing transitions, due to per-state normalization: a state with a single outgoing transition must assign it probability 1 whatever the observation, so the observation is effectively ignored, and paths through low-branching states are systematically favored.

    An example MEMM (the slide's figure is not reproduced in this transcript).

  • Undirected Graphical Models

    Random fields.

  • Conditional Random Fields (CRFs)

    A conditional exponential model, like the MEMM.
    Has all the advantages of MEMMs without the label bias problem.
    An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state.
    A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence.
    This allows some transitions to vote more strongly than others, depending on the corresponding observations.

  • Definition of CRFs
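
    The definition shown on the slide is not in this transcript; the standard definition from Lafferty, McCallum and Pereira (2001) reads, roughly: let G = (V, E) be a graph such that Y = (Y_v)_{v \in V}, so Y is indexed by the vertices of G. Then (X, Y) is a conditional random field if, conditioned on X, the variables Y_v obey the Markov property with respect to the graph:

        p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)

    where w \sim v means w and v are neighbours in G.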

  • CRF Distribution Function

    In the distribution below:

    V = set of label random variables
    f_k and g_k = features: g_k is a state (vertex) feature, f_k is an edge feature
    \lambda_k and \mu_k are the parameters to be estimated
    y|_e = the components of y defined by edge e
    y|_v = the components of y defined by vertex v
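
    The formula itself is missing from the transcript; given the definitions above, it is the form from Lafferty, McCallum and Pereira (2001):

        p_\theta(y \mid x) \propto \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big)

    with one weighted sum over edge features and one over vertex (state) features.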

  • CRF Training

  • CRF Training (Contd.)

    Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data.
    A closed-form solution for the parameters is not possible; iterative algorithms are employed, improving the log-likelihood in successive iterations.
    Examples: Generalized Iterative Scaling (GIS), Improved Iterative Scaling (IIS). A sketch of the condition and the GIS update follows.
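
    A sketch of these standard maximum-entropy results (not copied from the slides): setting the gradient of the log-likelihood to zero gives, for each feature f_k,

        \tilde{E}[f_k] = E_{p_\lambda}[f_k]

    (empirical count = model-expected count), and the GIS update toward this fixed point is

        \lambda_k \leftarrow \lambda_k + \frac{1}{C} \log \frac{\tilde{E}[f_k]}{E_{p_\lambda}[f_k]}

    where C is an upper bound on the total feature count of any training example.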

  • Graphical Comparison: HMMs, MEMMs, CRFs

  • POS Tagging Results

  • Summary

    HMMs: directed, generative graphical models; cannot model overlapping features on observations.
    MEMMs: directed, conditional models; can model overlapping features on observations; suffer from the label bias problem due to per-state normalization.
    CRFs: undirected, conditional models; avoid the label bias problem; efficient training is possible.

  • Thanks!

    Acknowledgements: Some slides in this presentation are from Rongkun Shen's (Oregon State Univ) presentation on CRFs.