Graphical Models for Segmenting and Labeling Sequence Data
Manoj Kumar Chinnakotla
NLP-AI Seminar
Outline
- Introduction
- Directed Graphical Models
  - Hidden Markov Models (HMMs)
  - Maximum Entropy Markov Models (MEMMs)
  - Label Bias Problem
- Undirected Graphical Models
  - Conditional Random Fields (CRFs)
- Summary
The Task
- Labeling: given sequence data, mark each data item with an appropriate tag
- Segmentation: given sequence data, segment it into non-overlapping groups such that related entities fall in the same group
Applications
- Computational Linguistics
  - POS Tagging
  - Information Extraction
  - Syntactic Disambiguation
- Computational Biology
  - DNA and Protein Sequence Alignment
  - Sequence homologue searching
  - Protein Secondary Structure Prediction
Example: POS Tagging
Directed Graphical Models
- Hidden Markov Models (HMMs)
  - Assign a joint probability to paired observation and label sequences
  - The parameters are trained to maximize the joint likelihood of the training examples
Hidden Markov Models (HMMs)
- Generative model: models the joint distribution of the observation sequence and the label sequence
Generation Process
- Probabilistic finite state machine
- Set of states: correspond to tags
- Alphabet: set of words
- Transition probability: P(t_i | t_{i-1})
- Emission (state) probability: P(w_i | t_i)
HMMs (Contd..)
For a given word/tag sequence pair:
P(W, T) = Π_i P(t_i | t_{i-1}) · P(w_i | t_i)
Why Hidden?
- The sequence of tags which generated the word sequence is not visible
Why Markov?
- Based on the Markov assumption: the current tag depends only on the previous n tags
- Solves the sparsity problem
Training
- Learning the transition and emission probabilities from data
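Training by maximum likelihood reduces to counting and normalizing. A minimal sketch of this idea (the function name and the dictionary-based table representation are illustrative, not from the slides):

```python
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Estimate transition P(tag | prev_tag) and emission P(word | tag)
    probabilities as normalized counts over the training data."""
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag (an illustrative convention)
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
    # Normalize the counts into conditional probability tables.
    transition = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
                  for p, cs in trans_counts.items()}
    emission = {t: {w: c / sum(cs.values()) for w, c in cs.items()}
                for t, cs in emit_counts.items()}
    return transition, emission
```

In practice the counts are usually smoothed to handle unseen words and tag bigrams; the raw relative frequencies above are the unsmoothed maximum-likelihood estimates.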
HMMs Tagging Process
Given a string of words w, choose the tag sequence t* such that:
t* = argmax_t P(t | w) = argmax_t P(w | t) · P(t)
- Computationally expensive: naively we need to evaluate all possible tag sequences (n^m sequences for n possible tags and m positions)!
Viterbi Algorithm
- Used to find the optimal tag sequence t*
- Efficient dynamic programming based algorithm
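The dynamic-programming idea can be sketched as follows (the dictionary-based transition and emission tables are an assumed representation for illustration):

```python
def viterbi(words, tags, transition, emission, start="<s>"):
    """Find the most probable tag sequence in O(m * n^2) time for m
    positions and n tags, instead of scoring all n^m sequences."""
    # best[i][t]: probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: transition.get(start, {}).get(t, 0.0)
                * emission.get(t, {}).get(words[0], 0.0) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Best predecessor for tag t at position i.
            prob, prev = max(
                ((best[i - 1][p] * transition.get(p, {}).get(t, 0.0), p)
                 for p in tags),
                key=lambda pair: pair[0])
            best[i][t] = prob * emission.get(t, {}).get(words[i], 0.0)
            back[i][t] = prev
    # Recover t* by following the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))
```

Real implementations work in log-space to avoid numerical underflow on long sequences; plain products are used here for readability.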
Disadvantages of HMMs
- Need to enumerate all possible observation sequences
- Not possible to represent multiple interacting features
- Difficult to model long-range dependencies among the observations
- Very strict independence assumptions on the observations
Maximum Entropy Markov Models (MEMMs)
- Conditional exponential models
- Assume the observation sequence is given (it need not be modeled)
- Train the model to maximize the conditional likelihood P(Y|X)
MEMMs (Contd..)
- For a new data sequence x, the label sequence y which maximizes P(y|x, θ) is assigned (θ: the parameter set)
- Arbitrary, non-independent features on the observation sequence are possible
- Conditional models are known to perform better than generative ones
- Performs per-state normalization: the total mass which arrives at a state must be distributed among all possible successor states
Label Bias Problem
- Bias towards states with fewer outgoing transitions
- Due to per-state normalization
- An example MEMM (figure)
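A minimal numeric sketch of why per-state normalization causes label bias (the state names are hypothetical):

```python
import math

def per_state_normalize(scores):
    """MEMM-style local normalization: the mass arriving at a state is
    distributed over its successor states via a softmax."""
    z = sum(math.exp(s) for s in scores.values())
    return {state: math.exp(s) / z for state, s in scores.items()}

# A state with a single outgoing transition must pass on all of its
# mass, no matter how poorly the observation scores that transition:
low = per_state_normalize({"only_successor": -10.0})   # probability 1.0
high = per_state_normalize({"only_successor": +10.0})  # probability 1.0
```

Because the observation cannot change the probability leaving such a state, paths through low-branching states are systematically favored regardless of the evidence.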
Undirected Graphical Models
- Random Fields
Conditional Random Fields (CRFs)
- Conditional exponential model, like the MEMM
- Has all the advantages of MEMMs, without the label bias problem
- An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state
- A CRF has a single exponential model for the joint probability of the entire label sequence given the observation sequence
- Allows some transitions to vote more strongly than others, depending on the corresponding observations
Definition of CRFs
Let G = (V, E) be a graph such that the label variables Y are indexed by the vertices of G. Then (X, Y) is a conditional random field when the random variables Y_v, conditioned on the observation X, obey the Markov property with respect to the graph: P(Y_v | X, Y_w, w ≠ v) = P(Y_v | X, Y_w, w ~ v), where w ~ v means w and v are neighbors in G.
CRF Distribution Function
p_θ(y | x) ∝ exp( Σ_{e∈E, k} λ_k f_k(e, y|e, x) + Σ_{v∈V, k} μ_k g_k(v, y|v, x) )
Where:
- V = set of label random variables (vertices), E = set of edges
- f_k and g_k = features: g_k is a state (vertex) feature, f_k is an edge feature
- λ_k and μ_k are the parameters to be estimated
- y|e = set of components of y defined by edge e
- y|v = set of components of y defined by vertex v
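The single global normalization can be sketched by brute force over all label sequences of a linear chain (the feature signatures below are illustrative; real implementations compute the normalizer efficiently with forward-backward):

```python
import math
from itertools import product

def crf_score(y, x, edge_feats, vertex_feats, lam, mu):
    """Unnormalized score exp( sum_{e,k} lam_k f_k + sum_{v,k} mu_k g_k )."""
    total = sum(m * g(v, y[v], x)
                for v in range(len(y)) for m, g in zip(mu, vertex_feats))
    total += sum(l * f(e, y[e - 1], y[e], x)
                 for e in range(1, len(y)) for l, f in zip(lam, edge_feats))
    return math.exp(total)

def crf_prob(y, x, labels, edge_feats, vertex_feats, lam, mu):
    """p(y|x): one normalization over whole label sequences, not per state."""
    z = sum(crf_score(list(yp), x, edge_feats, vertex_feats, lam, mu)
            for yp in product(labels, repeat=len(x)))
    return crf_score(y, x, edge_feats, vertex_feats, lam, mu) / z
```

Because the normalizer z sums over entire sequences, an observation can down-weight a transition globally, which is exactly what per-state normalization in an MEMM cannot do.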
CRF Training
Maximize the conditional log-likelihood of the training data: L(θ) = Σ_j log p_θ(y^(j) | x^(j))
CRF Training (Contd..)
- Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data
- A closed-form solution for the parameters is not possible
- Iterative algorithms are employed, improving the log-likelihood in successive iterations
- Examples:
  - Generalized Iterative Scaling (GIS)
  - Improved Iterative Scaling (IIS)
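The optimality condition can be checked on a toy conditional exponential model. Plain gradient ascent is used below in place of GIS/IIS for brevity; the condition at the optimum is the same (expected counts equal empirical counts). All names are illustrative:

```python
import math

def empirical_counts(data, features):
    """Feature counts observed in the labeled training data."""
    return [sum(f(x, y) for x, y in data) for f in features]

def expected_counts(weights, data, features, labels):
    """Feature counts expected under the current model p(y|x)."""
    counts = [0.0] * len(features)
    for x, _ in data:
        scores = {y: math.exp(sum(w * f(x, y)
                                  for w, f in zip(weights, features)))
                  for y in labels}
        z = sum(scores.values())
        for y, s in scores.items():
            for k, f in enumerate(features):
                counts[k] += (s / z) * f(x, y)
    return counts

def train(data, features, labels, lr=0.5, iters=300):
    """The gradient of the conditional log-likelihood is empirical minus
    expected feature counts; ascend until the two match."""
    weights = [0.0] * len(features)
    emp = empirical_counts(data, features)
    for _ in range(iters):
        exp_c = expected_counts(weights, data, features, labels)
        weights = [w + lr * (e - m) for w, e, m in zip(weights, emp, exp_c)]
    return weights
```

When the gradient reaches zero, the model's expected counts equal the empirical counts, which is exactly the maximum-likelihood condition stated above.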
Graphical Comparison: HMMs, MEMMs, CRFs
POS Tagging Results
Summary
- HMMs
  - Directed, generative graphical models
  - Cannot be used to model overlapping features on the observations
- MEMMs
  - Directed, conditional models
  - Can model overlapping features on the observations
  - Suffer from the label bias problem due to per-state normalization
- CRFs
  - Undirected, conditional models
  - Avoid the label bias problem
  - Efficient training is possible
Thanks!
Acknowledgements
Some slides in this presentation are from Rongkun Shen's (Oregon State Univ.) presentation on CRFs.