Some Probability Theory and Computational Models
A short overview
Basic Probability Theory
• We will only use discrete probability spaces over boolean events
• A probability distribution maps a set of events to [0,1]
– P(A) is the probability that A is true
– The fraction of “worlds” in which A holds (the “possible worlds” interpretation)
Axioms
If A and B are disjoint then P(A ∨ B) = P(A) + P(B)
Conditional Probability and Independence
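• P(A | B) is the fraction of worlds in which B is true that also have A true
• P(A | B) = P(A ∧ B) / P(B)
• Chain rule: P(A ∧ B) = P(A | B) P(B)
• If P(A | B) = P(A) then A and B are independent
– Implies that also P(B | A) = P(B)
– And that P(A ∧ B) = P(A) P(B)
• Conditional independence: P(A | B, C) = P(A | C), i.e. A is independent of B given C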
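The “possible worlds” reading of conditional probability and independence can be checked by brute-force enumeration. The sketch below is only an illustration: it assumes a toy distribution of four equally likely worlds over two boolean events, and the helper names (`prob`, `event_a`, `event_b`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Each "world" assigns True/False to the boolean events A and B.
# Assumption: all four worlds are equally likely (a uniform toy distribution).
worlds = list(product([True, False], repeat=2))

def event_a(world):
    return world[0]

def event_b(world):
    return world[1]

def event_a_and_b(world):
    return world[0] and world[1]

def always(world):
    return True

def prob(event, given=always):
    """Fraction of the worlds satisfying `given` in which `event` also holds."""
    restricted = [w for w in worlds if given(w)]
    return Fraction(sum(1 for w in restricted if event(w)), len(restricted))

p_a = prob(event_a)
p_b = prob(event_b)
p_a_given_b = prob(event_a, given=event_b)
p_a_and_b = prob(event_a_and_b)

# Chain rule: P(A and B) = P(A | B) * P(B)
chain_rule_holds = p_a_and_b == p_a_given_b * p_b
# Independence: P(A | B) = P(A)
independent = p_a_given_b == p_a
```

In this uniform toy distribution A and B come out independent. Removing or re-weighting worlds would generally break that.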
Bayes Rule
P(A | B) = P(B | A) P(A) / P(B)
Example
• Consider two “language models” of French and English
• Assume that the probability of observing a word w is
– 0.01 in English text
– 0.05 in French text
• Assume the numbers of English and French texts are roughly equal
• Given that w was observed, what is the probability that the text is in French?
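One way to work the example is with Bayes' rule, P(French | w) = P(w | French) P(French) / P(w), where P(w) sums over both languages. The sketch below assumes a prior of exactly 0.5 for each language (the slide only says “roughly equal”). The function name `posterior_french` is hypothetical.

```python
def posterior_french(p_w_given_english, p_w_given_french, prior_french=0.5):
    """P(French | w) by Bayes' rule, with only two candidate languages."""
    prior_english = 1.0 - prior_french
    # Total probability of observing w in either language.
    evidence = (p_w_given_french * prior_french
                + p_w_given_english * prior_english)
    return p_w_given_french * prior_french / evidence

p = posterior_french(p_w_given_english=0.01, p_w_given_french=0.05)
print(round(p, 4))  # 0.8333, i.e. 5/6
```

With equal priors the answer reduces to 0.05 / (0.05 + 0.01) = 5/6.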
Some Computational Models
• Finite State Machines
• Context Free Grammars
• Probabilistic Variants
Finite State Machines
• States and transitions
• Symbols on transitions
• Acceptors vs. generators
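A minimal sketch of a Finite State Machine used as an acceptor. The machine itself is a hypothetical example, not one from the slides: two states, symbols "0" and "1" on the transitions, accepting strings with an even number of 1s.

```python
# Hypothetical two-state machine over the symbols "0" and "1".
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}

def accepts(string):
    """Follow one transition per symbol; accept if we end in an accepting state."""
    state = START
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition for this symbol: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

The same transition table could drive a generator instead: start at `START` and emit the symbol on each transition taken.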
Markov Chains
• Finite State Machines with transitions governed by probabilistic events
– In conjunction with / instead of external input
• Markovian property: Every transition is independent of the past, given the present state
– The probability of following a path is the product of the probabilities of its individual transitions
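The path-probability statement can be sketched directly. The two-state chain and its numbers below are made up for illustration, and `path_probability` and `generate` are hypothetical helper names.

```python
import random

# Hypothetical two-state chain; each row of probabilities sums to 1.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def path_probability(path):
    """Product of the transition probabilities along `path` (start state given)."""
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        prob *= P[current].get(nxt, 0.0)
    return prob

def generate(start, steps, rng=random):
    """Use the chain as a generator: sample `steps` transitions from `start`."""
    path = [start]
    for _ in range(steps):
        options = P[path[-1]]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(nxt)
    return path

# sunny -> sunny -> rainy -> rainy: 0.8 * 0.2 * 0.6, about 0.096
example = path_probability(["sunny", "sunny", "rainy", "rainy"])
```

Each sampling step looks only at the current state, which is exactly the Markovian property.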
Context Free Grammars
• Context Free Grammars are a more natural model for Natural Language
• Syntax rules are very easy to formulate using CFGs
• Provably more expressive than Finite State Machines
– E.g. can check for balanced parentheses
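As an illustration of the balanced-parentheses claim, the sketch below recognizes the language of the grammar S → ( S ) S | ε with a small recursive-descent parser. The grammar is one common choice assumed here, not one given in the slides, and the function names are hypothetical.

```python
def parse_s(text, pos):
    """Try to derive a prefix of text[pos:] from S.

    Returns the position after the derived prefix, or None on failure.
    Grammar (assumed for this sketch): S -> ( S ) S | epsilon
    """
    if pos < len(text) and text[pos] == "(":
        # S -> ( S ) S
        inner_end = parse_s(text, pos + 1)
        if (inner_end is None or inner_end >= len(text)
                or text[inner_end] != ")"):
            return None  # missing closing parenthesis
        return parse_s(text, inner_end + 1)
    # S -> epsilon
    return pos

def is_balanced(text):
    """Accept only if S derives the whole string."""
    return parse_s(text, 0) == len(text)
```

The recursion depth grows with the input length, so very long inputs would need an iterative version. No finite state machine can do this check for arbitrary nesting depth, since it would have to count unboundedly many open parentheses.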
Context Free Grammars
• Non-terminals
• Terminals
• Production rules
– V → w where V is a non-terminal and w is a sequence of terminals and non-terminals
Context Free Grammars
• Can be used as acceptors
• Can be used as a generative model
• Similarly to the case of Finite State Machines
• How long can a string generated by a CFG be?
Stochastic Context Free Grammar
• Non-terminals
• Terminals
• Production rules, each associated with a probability
– V → w where V is a non-terminal and w is a sequence of terminals and non-terminals
– Markovian property is typically assumed
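A stochastic CFG can be used as a generative model by sampling a rule for each non-terminal. The sketch below assumes a small made-up grammar in which the probabilities of the rules sharing a left-hand side sum to 1. `RULES` and `sample` are hypothetical names.

```python
import random

# Hypothetical toy grammar: non-terminals are the dictionary keys,
# every other symbol is a terminal. Each entry is (right-hand side, probability).
RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.7), (("N",), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "N": [(("dog",), 0.5), (("cat",), 0.5)],
    "V": [(("sees",), 0.5), (("sleeps",), 0.5)],
}

def sample(symbol, rng):
    """Expand `symbol`; return (list of terminals, probability of the derivation)."""
    if symbol not in RULES:  # terminal: nothing to expand
        return [symbol], 1.0
    expansions = RULES[symbol]
    # The choice depends only on the non-terminal being expanded.
    rhs, rule_prob = rng.choices(expansions, weights=[p for _, p in expansions])[0]
    words, prob = [], rule_prob
    for child in rhs:
        child_words, child_prob = sample(child, rng)
        words += child_words
        prob *= child_prob
    return words, prob

words, derivation_prob = sample("S", random.Random(0))
```

Because each rule is chosen independently of the rest of the derivation, the probability of a derivation is the product of the probabilities of the rules it uses. This toy grammar has no recursion, so sampling always terminates.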
Chomsky Normal Form
• Every rule is of the form
– V → V1 V2 where V, V1, V2 are non-terminals
– V → t where V is a non-terminal and t is a terminal
• Every (S)CFG can be written in this form
• Makes designing many algorithms easier
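The two rule shapes are easy to test mechanically. The sketch below assumes a grammar is given as (left-hand side, right-hand side tuple) pairs plus a set of non-terminals. This representation and the name `is_cnf` are choices made for the example.

```python
def is_cnf(rules, nonterminals):
    """Check that every rule is V -> V1 V2 or V -> t.

    rules: iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    nonterminals: set of non-terminal symbols; anything else is a terminal.
    """
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        # V -> V1 V2: exactly two non-terminals on the right
        binary = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        # V -> t: exactly one terminal on the right
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            return False
    return True

nonterminals = {"S", "A", "B"}
cnf_rules = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
long_rule = [("S", ("A", "B", "A")), ("A", ("a",)), ("B", ("b",))]
```

Here `is_cnf(cnf_rules, nonterminals)` holds, while `long_rule` fails because its first rule has three symbols on the right-hand side.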