Graphical Models for Segmenting and Labeling Sequence Data
Manoj Kumar Chinnakotla, NLP-AI Seminar




  • Graphical Models for Segmenting and Labeling Sequence Data

    Manoj Kumar Chinnakotla

    NLP-AI Seminar

  • Outline

    Introduction
    Directed Graphical Models: Hidden Markov Models (HMMs), Maximum Entropy Markov Models (MEMMs)
    Label Bias Problem
    Undirected Graphical Models: Conditional Random Fields (CRFs)
    Summary

  • The Task

    Labeling: given sequence data, mark the appropriate tag for each data item.
    Segmentation: given sequence data, segment it into non-overlapping groups such that related entities are in the same group.

  • Applications

    Computational Linguistics: POS tagging, information extraction, syntactic disambiguation.
    Computational Biology: DNA and protein sequence alignment, sequence homologue searching, protein secondary structure prediction.

  • Example: POS Tagging
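
    The slide's illustration is missing from this transcript; a small example of my own (using Penn Treebank tags), with each word labeled with its part of speech:

        The/DT  cat/NN  sat/VBD  on/IN  the/DT  mat/NN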

  • Directed Graphical Models

    Hidden Markov Models (HMMs) assign a joint probability to paired observation and label sequences.
    The parameters are trained to maximize the joint likelihood of the training examples.

  • Hidden Markov Models (HMMs)

    Generative model: models the joint distribution of words and tags.

    Generation process: a probabilistic finite state machine.
    States: correspond to tags.
    Alphabet: the set of words.
    Parameters: transition probabilities between states and state (emission) probabilities. A sketch of this generation process follows.
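
    A minimal sketch of sampling from an HMM viewed as a probabilistic finite state machine. The table layout and names (trans, emit, the <s>/</s> boundary tags) are my own assumptions, not from the slides:

        import random

        def generate(trans, emit, start="<s>", end="</s>"):
            """Sample a (words, tags) pair from an HMM.

            trans[tag] maps each state to {next_tag: prob};
            emit[tag] maps each state to {word: prob}.
            """
            words, tags = [], []
            tag = start
            while True:
                # follow a random transition out of the current state
                tag = random.choices(list(trans[tag]),
                                     weights=list(trans[tag].values()))[0]
                if tag == end:
                    return words, tags
                # emit a word from the new state
                words.append(random.choices(list(emit[tag]),
                                            weights=list(emit[tag].values()))[0])
                tags.append(tag)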

  • HMMs (Contd.)

    For a given word/tag sequence pair, the joint probability factorizes as shown below.

    Why "Hidden"? The sequence of tags which generated the word sequence is not visible.
    Why "Markov"? Based on the Markov assumption: the current tag depends only on the previous n tags. This tames the sparsity problem.
    Training: learning the transition and emission probabilities from data.
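
    The formula on the original slide is not in this transcript; for a first-order HMM with a designated start tag t_0, the standard factorization it refers to is

        P(w_1 \ldots w_n, t_1 \ldots t_n) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

    i.e. a transition probability times an emission probability at each position.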

  • HMMs: Tagging Process

    Given a string of words w, choose the tag sequence t* such that

        t^* = \arg\max_t P(t \mid w) = \arg\max_t P(w, t)

    Brute force is computationally expensive: it would evaluate all possible tag sequences! For n possible tags and m positions there are n^m candidates.
    Viterbi algorithm: used to find the optimal tag sequence t*; an efficient dynamic-programming algorithm (a sketch follows).
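
    A minimal sketch of Viterbi decoding for such a bigram HMM, working in log space to avoid underflow. The function and table names (log_trans, log_emit) are my own, not from the slides:

        import math

        def viterbi(words, tags, log_trans, log_emit, start="<s>"):
            # best[i][t]: score of the best tag sequence for words[:i+1] ending in t
            best = [{t: log_trans.get((start, t), -math.inf)
                        + log_emit.get((t, words[0]), -math.inf) for t in tags}]
            back = [{}]
            for i in range(1, len(words)):
                best.append({})
                back.append({})
                for t in tags:
                    # best predecessor for tag t at position i
                    prev, score = max(
                        ((p, best[i - 1][p] + log_trans.get((p, t), -math.inf))
                         for p in tags),
                        key=lambda ps: ps[1])
                    best[i][t] = score + log_emit.get((t, words[i]), -math.inf)
                    back[i][t] = prev
            # pick the best final tag, then follow the back-pointers
            last = max(tags, key=lambda t: best[-1][t])
            path = [last]
            for i in range(len(words) - 1, 0, -1):
                path.append(back[i][path[-1]])
            return list(reversed(path))

    This runs in O(m n^2) time for m words and n tags, instead of scoring all n^m candidate sequences.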

  • Disadvantages of HMMs

    Need to enumerate all possible observation sequences.
    Not possible to represent multiple interacting features.
    Difficult to model long-range dependencies of the observations.
    Very strict independence assumptions on the observations.

  • Maximum Entropy Markov Models (MEMMs)

    Conditional exponential models.
    Assume the observation sequence is given (so it need not be modeled).
    Train the model to maximize the conditional likelihood P(Y|X).

  • MEMMs (Contd.)

    For a new data sequence x, the label sequence y which maximizes P(y|x, Θ) is assigned (Θ is the parameter set).
    Arbitrary non-independent features on the observation sequence are possible.
    Conditional models are known to perform better than generative ones.
    Performs per-state normalization: the total mass which arrives at a state must be distributed among all possible successor states (see the sketch below).
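
    A sketch of the per-state exponential model in its standard MEMM form (the notation is mine, following Lafferty et al., 2001):

        P(y_i \mid y_{i-1}, x) = \frac{1}{Z(x, y_{i-1})} \exp\Big( \sum_k \lambda_k f_k(y_i, x) \Big)

    where Z(x, y_{i-1}) sums the exponential over all possible next labels y_i, so each state's outgoing distribution normalizes to 1 on its own.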

  • Label Bias Problem

    Bias towards states with fewer outgoing transitions, due to per-state normalization: a state with a single outgoing transition must assign it probability 1 whatever the observation, so the observation is effectively ignored, and paths through low-branching states are systematically favored.

    An example MEMM (the slide's figure is not reproduced in this transcript).

  • Undirected Graphical Models

    Random fields.

  • Conditional Random Fields (CRFs)

    A conditional exponential model, like the MEMM.
    Has all the advantages of MEMMs without the label bias problem.
    An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state.
    A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence.
    This allows some transitions to vote more strongly than others, depending on the corresponding observations.

  • Definition of CRFs
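
    The definition shown on the slide is not in this transcript; the standard definition from Lafferty, McCallum and Pereira (2001) reads, roughly: let G = (V, E) be a graph such that Y = (Y_v)_{v \in V}, so Y is indexed by the vertices of G. Then (X, Y) is a conditional random field if, conditioned on X, the variables Y_v obey the Markov property with respect to the graph:

        p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)

    where w \sim v means w and v are neighbours in G.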

  • CRF Distribution Function

    In the distribution below:

    V = set of label random variables
    f_k and g_k = features: g_k is a state (vertex) feature, f_k is an edge feature
    \lambda_k and \mu_k are the parameters to be estimated
    y|_e = the components of y defined by edge e
    y|_v = the components of y defined by vertex v
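
    The formula itself is missing from the transcript; given the definitions above, it is the form from Lafferty, McCallum and Pereira (2001):

        p_\theta(y \mid x) \propto \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big)

    with one weighted sum over edge features and one over vertex (state) features.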

  • CRF Training

  • CRF Training (Contd.)

    Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data.
    A closed-form solution for the parameters is not possible; iterative algorithms are employed, improving the log-likelihood in successive iterations.
    Examples: Generalized Iterative Scaling (GIS), Improved Iterative Scaling (IIS). A sketch of the condition and the GIS update follows.
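
    A sketch of these standard maximum-entropy results (not copied from the slides): setting the gradient of the log-likelihood to zero gives, for each feature f_k,

        \tilde{E}[f_k] = E_{p_\lambda}[f_k]

    (empirical count = model-expected count), and the GIS update toward this fixed point is

        \lambda_k \leftarrow \lambda_k + \frac{1}{C} \log \frac{\tilde{E}[f_k]}{E_{p_\lambda}[f_k]}

    where C is an upper bound on the total feature count of any training example.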

  • Graphical Comparison: HMMs, MEMMs, CRFs

  • POS Tagging Results

  • Summary

    HMMs: directed, generative graphical models; cannot model overlapping features on observations.
    MEMMs: directed, conditional models; can model overlapping features on observations; suffer from the label bias problem due to per-state normalization.
    CRFs: undirected, conditional models; avoid the label bias problem; efficient training is possible.

  • Thanks!

    Acknowledgements: Some slides in this presentation are from Rongkun Shen's (Oregon State Univ) presentation on CRFs.