CS839: Probabilistic Graphical Models
Lecture 5: Message Passing / Belief Propagation
Theo Rekatsinas
Junction Tree

• A clique tree for a triangulated graph is referred to as a junction tree.
• In junction trees, local consistency implies global consistency. Thus the local message-passing algorithm is (provably) correct.
• Only triangulated graphs have the property that their clique trees are junction trees. Thus if we want local algorithms, we must triangulate.
How to triangulate?

• Intermediate terms correspond to the cliques that result from elimination.
• VE vs. MP over a junction tree?
Sketch of the Junction Tree Algorithm

• Results in the marginal probabilities of all cliques --- solves all queries in a single run.
• A generic exact inference algorithm for any GM.
• Complexity: exponential in the size of the maximal clique --- a good elimination order often leads to a small maximal clique, and hence a good (i.e., thin) JT.
Inference in HMMs

• Summing with elimination
• Message passing corresponds to a forward and a backward pass
Inference in HMMs

• A junction tree for the HMM
• Forward pass
Inference in HMMs

• A junction tree for the HMM
• Backward pass
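The forward and backward passes above can be sketched in plain Python for a small discrete HMM. All numbers below (the two-state transition and emission tables, the initial distribution, and the observation sequence) are made-up values for illustration, not from the lecture.

```python
# Made-up 2-state HMM for illustrating the forward-backward passes.
A = [[0.7, 0.3], [0.4, 0.6]]   # transition: A[i][j] = p(z_t = j | z_{t-1} = i)
B = [[0.9, 0.1], [0.2, 0.8]]   # emission:   B[i][k] = p(x_t = k | z_t = i)
pi = [0.5, 0.5]                # initial state distribution
obs = [0, 0, 1]                # observed symbol sequence

# Forward pass: alpha[t][i] = p(x_1..x_t, z_t = i)
alpha = [[pi[i] * B[i][obs[0]] for i in range(2)]]
for t in range(1, len(obs)):
    alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(2))
                  for j in range(2)])

# Backward pass: beta[t][i] = p(x_{t+1}..x_T | z_t = i)
beta = [[1.0, 1.0]]
for t in range(len(obs) - 2, -1, -1):
    beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(2))
                    for i in range(2)])

likelihood = sum(alpha[-1])    # p(x_1..x_T)

# Posterior marginals gamma[t][i] = p(z_t = i | x_1..x_T)
gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(2)]
         for t in range(len(obs))]
```

Combining any alpha[t] with the matching beta[t] yields the same likelihood, which is exactly the local-consistency property the junction tree guarantees.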
CS839: Probabilistic Graphical Models
Lecture 6: Generalized Linear Models (MLE)
Theo Rekatsinas
Parameters in Graphical Models

• Bayesian network

How do we find these parameters?
Linear Regression as a Bayes Net

• Linear Reg.: $D = ((x_1, y_1), (x_2, y_2), \dots, (x_n, y_n))$, with
  $y_i = \theta^T x_i + \epsilon_i$, where $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$.
• Assume that $\epsilon$ (error term of unmodeled effects or random noise) is a Gaussian random variable $N(0, \sigma^2)$:
  $p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)$
• Use the Least-Mean-Square algorithm to estimate the parameters.
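A minimal sketch of the Least-Mean-Square algorithm for this model, with a one-dimensional made-up data set (true slope 2.0, small Gaussian noise) and an assumed learning rate:

```python
import random

# Made-up data from y = 2x + Gaussian noise; LMS should recover theta ~ 2.
random.seed(0)
true_theta = 2.0
data = [(x, true_theta * x + random.gauss(0, 0.1))
        for x in [i / 10 for i in range(-20, 21)]]

theta = 0.0
rho = 0.05                     # assumed learning rate
for _ in range(200):           # repeated passes over the data
    for x, y in data:
        # LMS update: theta <- theta + rho * (y_i - theta x_i) x_i
        theta += rho * (y - theta * x) * x
```

Each update nudges theta along the gradient of one term of the Gaussian log-likelihood, which is why LMS is the stochastic-gradient counterpart of the closed-form least-squares solution.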
Logistic Regression (sigmoid classifier) as a GM

• Conditional distribution is a Bernoulli:
  $p(y \mid x) = \mu(x)^y (1 - \mu(x))^{1-y}$, where $\mu(x) = \frac{1}{1 + \exp(-\theta^T x)}$
• We can use a tailored gradient method again, as in linear regression.
• But note that $p(y \mid x)$ belongs to the exponential family and this is a generalized linear model.
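For the Bernoulli model above, the gradient of the log-likelihood with respect to a scalar parameter is $(y - \mu(x))\,x$; the sketch below checks this against a finite difference. The single data point and parameter value are arbitrary illustrations.

```python
import math

# Sigmoid mean function mu(x) for scalar theta and x.
def mu(theta, x):
    return 1.0 / (1.0 + math.exp(-theta * x))

# Bernoulli log-likelihood of one observation (x, y).
def loglik(theta, x, y):
    m = mu(theta, x)
    return y * math.log(m) + (1 - y) * math.log(1 - m)

theta, x, y = 0.3, 1.5, 1                      # arbitrary illustration values
analytic = (y - mu(theta, x)) * x              # gradient (y - mu) * x
eps = 1e-6
numeric = (loglik(theta + eps, x, y) - loglik(theta - eps, x, y)) / (2 * eps)
```

The agreement of `analytic` and `numeric` is what lets a plain gradient method work here without deriving anything model-specific beyond the mean function.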
Markov Random Fields
Restricted Boltzmann Machines
Conditional Random Fields

• Discriminative
• Doesn't assume that the features are independent
• When labeling, the feature observations are taken into account

$P_\theta(Y \mid X) = \frac{1}{Z(\theta, X)} \exp\left(\sum_c \theta_c f_c(X, Y_c)\right)$
Exponential family: a basic building block

• For a numeric random variable X,
  $p(x \mid \eta) = h(x) \exp\left(\eta^T T(x) - A(\eta)\right) = \frac{1}{Z(\eta)}\, h(x) \exp\left(\eta^T T(x)\right)$
  is an exponential family distribution with natural (canonical) parameter $\eta$.
• The function $T(x)$ is a sufficient statistic.
• The function $A(\eta) = \log Z(\eta)$ is the log normalizer.
• Examples: Bernoulli, multinomial, Gaussian, Poisson, Gamma, Categorical
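As a concrete instance of this form, the Bernoulli distribution has $\eta = \log\frac{\mu}{1-\mu}$, $T(x) = x$, $A(\eta) = \log(1 + e^\eta)$, and $h(x) = 1$; the check below compares the two parametrizations numerically for an arbitrary $\mu = 0.3$:

```python
import math

# Bernoulli(mu) in exponential family form:
#   eta = log(mu / (1 - mu)), T(x) = x, A(eta) = log(1 + e^eta), h(x) = 1.
mu_p = 0.3
eta = math.log(mu_p / (1 - mu_p))
A = math.log(1 + math.exp(eta))

for x in (0, 1):
    standard = mu_p ** x * (1 - mu_p) ** (1 - x)   # mu^x (1 - mu)^(1 - x)
    exp_family = math.exp(eta * x - A)             # h(x) exp(eta T(x) - A(eta))
    assert abs(standard - exp_family) < 1e-12
```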
Example: Multivariate Gaussian Distribution

• For a continuous vector random variable $X \in \mathbb{R}^k$
• Exponential family representation
Example: Multinomial Distribution

• For a binary vector random variable $x \sim \text{multi}(x \mid \pi)$
Why the exponential family?

• Moment generating property:
  we can easily compute the moments of any exponential family distribution by taking derivatives of the log normalizer $A(\eta)$.
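For the Bernoulli case, $A(\eta) = \log(1 + e^\eta)$, so $A'(\eta) = \mu$ (the mean) and $A''(\eta) = \mu(1-\mu)$ (the variance). The sketch below verifies both via finite differences at an arbitrary $\eta$:

```python
import math

# Bernoulli log normalizer.
def A(eta):
    return math.log(1 + math.exp(eta))

eta = 0.7                                     # arbitrary illustration value
mu_p = 1.0 / (1.0 + math.exp(-eta))           # mean of the Bernoulli

eps = 1e-5
first = (A(eta + eps) - A(eta - eps)) / (2 * eps)          # ~ dA/deta = mu
second = (A(eta + eps) - 2 * A(eta) + A(eta - eps)) / eps**2  # ~ d2A/deta2 = mu(1-mu)
```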
Moments vs. canonical parameters

• The moment parameters (e.g., $\mu$) can be derived from the natural parameters:
  • First derivative of $A(\eta)$ = mean
  • Second derivative = variance
  • Etc.
• $A(\eta)$ is convex.
• Hence, we can invert the relationship and infer the canonical parameters from the moment parameters (the mapping is 1-to-1).
• A distribution in the exponential family can be parametrized not only by $\eta$ but also by $\mu$.
MLE for the Exponential Family

• For i.i.d. data the log-likelihood is
  $\ell(\eta; D) = \sum_n \log h(x_n) + \eta^T \left(\sum_n T(x_n)\right) - N A(\eta)$
• We take the derivative and set it to zero:
  $\frac{\partial \ell}{\partial \eta} = \sum_n T(x_n) - N \frac{\partial A(\eta)}{\partial \eta} = 0$
• We perform moment matching:
  $\hat{\mu}_{\text{MLE}} = \frac{1}{N} \sum_n T(x_n)$
• We can infer the canonical parameters using the inverse mapping $\hat{\eta}_{\text{MLE}} = \psi^{-1}(\hat{\mu}_{\text{MLE}})$
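Moment matching for a Bernoulli sample, as a minimal sketch: the MLE of the mean parameter is the sample average of $T(x) = x$, and the canonical parameter follows by inverting $\mu = \sigma(\eta)$. The data below are made-up.

```python
import math

# Made-up i.i.d. Bernoulli sample.
data = [1, 0, 1, 1, 0, 1, 0, 1]

mu_hat = sum(data) / len(data)                 # moment matching: average of T(x) = x
eta_hat = math.log(mu_hat / (1 - mu_hat))      # invert mu = 1 / (1 + e^{-eta})

# Sanity check: mapping eta_hat back through the sigmoid recovers mu_hat.
assert abs(1.0 / (1.0 + math.exp(-eta_hat)) - mu_hat) < 1e-12
```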
Sufficiency

• For $p(x \mid \theta)$, $T(x)$ is sufficient for $\theta$ if there is no information in $X$ regarding $\theta$ beyond that in $T(x)$.
• We can throw away $X$ for the purpose of inference w.r.t. $\theta$.
• Bayesian view
• Frequentist view
• Neyman factorization theorem: $T(x)$ is sufficient for $\theta$ if
  $p(x \mid \theta) = g(T(x), \theta)\, h(x)$
Examples

• Gaussian: $T(x) = (x, x^2)$ is sufficient for $(\mu, \sigma^2)$.
• Multinomial: the vector of counts per category is sufficient for $\pi$.
Generalized Linear Models

• The graphical model:
  • Linear regression
  • Discriminative linear classification
• Generalized Linear Model:
  • The observed input $x$ is assumed to enter the model via a linear combination of its elements, $\xi = \theta^T x$.
  • The conditional mean $\mu$ is represented as a function $f(\xi)$ of $\xi$, where $f$ is known as the response function.
  • The observed output $y$ is assumed to be characterized by an exponential family distribution with conditional mean $\mu$:
    $E_p(T) = \mu = f(\theta^T x)$
MLE for GLIMs with natural response

• Log-likelihood:
  $\ell(\theta; D) = \sum_n \log h(y_n) + \sum_n \left(\theta^T x_n\, y_n - A(\eta_n)\right)$
• Derivative of the log-likelihood:
  $\frac{d\ell}{d\theta} = \sum_n (y_n - \mu_n)\, x_n = X^T (y - \mu)$
• Learning for canonical GLIMs:
  • Stochastic gradient ascent = least mean squares (LMS)
Second-order methods

• The Hessian matrix:
  $H = \frac{d^2 \ell}{d\theta\, d\theta^T} = -X^T W X$
• $X$ is the design matrix and $W = \text{diag}\left(\frac{d\mu_1}{d\eta_1}, \dots, \frac{d\mu_N}{d\eta_N}\right)$ is computed by taking the second derivative of $A(\eta_n)$.
Back to Least Squares

• Objective function in matrix form:
  $J(\theta) = \frac{1}{2} \sum_n (\theta^T x_n - y_n)^2 = \frac{1}{2} (X\theta - y)^T (X\theta - y)$
• To minimize this objective we take the derivative and set it to zero:
  $\nabla_\theta J = X^T X \theta - X^T y = 0 \;\Rightarrow\; \theta^* = (X^T X)^{-1} X^T y$
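The normal equations $X^T X \theta = X^T y$ can be solved directly; a minimal sketch for a line fit with an intercept, on a tiny made-up data set that lies exactly on $y = 1 + 2x$:

```python
# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
X = [[1.0, x] for x in xs]                     # design matrix: intercept column + x

# Build X^T X (2x2) and X^T y (2-vector).
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
xty = [sum(X[n][i] * ys[n] for n in range(len(ys))) for i in range(2)]

# Solve the 2x2 system by hand: theta = (X^T X)^{-1} X^T y.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
theta = [(xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det,
         (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det]
```

With noiseless data the recovered theta is exactly the generating intercept and slope, which makes the example easy to verify by hand.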
Iteratively Reweighted Least Squares

• Newton-Raphson method with objective $J$
• We have $\nabla_\theta \ell = X^T (y - \mu)$ and $H = -X^T W X$
• Update:
  $\theta^{t+1} = \theta^t + (X^T W^t X)^{-1} X^T (y - \mu^t) = (X^T W^t X)^{-1} X^T W^t z^t$, where $z^t = X\theta^t + (W^t)^{-1} (y - \mu^t)$
• Generic update for any exponential family distribution
Example 1: Logistic Regression

• Conditional distribution is a Bernoulli:
  $p(y \mid x) = \mu(x)^y (1 - \mu(x))^{1-y}$, where $\mu(x) = \frac{1}{1 + \exp(-\eta(x))}$ and $\eta = \xi = \theta^T x$
• IRLS:
  $\frac{\partial \mu}{\partial \eta} = \mu(1 - \mu)$, so
  $W = \begin{bmatrix} \mu_1(1 - \mu_1) & & \\ & \ddots & \\ & & \mu_N(1 - \mu_N) \end{bmatrix}$
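The IRLS recipe for logistic regression (Newton steps with $W = \text{diag}(\mu_n(1-\mu_n))$) can be sketched in plain Python for a one-feature model with an intercept. The small non-separable data set below is made-up for illustration.

```python
import math

# Made-up, non-separable data (labels overlap near x = 0) so the MLE is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
X = [[1.0, x] for x in xs]                     # intercept column + feature
theta = [0.0, 0.0]

for _ in range(25):                            # Newton-Raphson (IRLS) iterations
    mu = [1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * x))) for x in xs]
    w = [m * (1 - m) for m in mu]              # W = diag(mu_n (1 - mu_n))
    # H = X^T W X (2x2) and gradient g = X^T (y - mu).
    H = [[sum(w[n] * X[n][i] * X[n][j] for n in range(len(xs)))
          for j in range(2)] for i in range(2)]
    g = [sum((ys[n] - mu[n]) * X[n][i] for n in range(len(xs))) for i in range(2)]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Newton step: theta <- theta + (X^T W X)^{-1} X^T (y - mu).
    theta[0] += (H[1][1] * g[0] - H[0][1] * g[1]) / det
    theta[1] += (H[0][0] * g[1] - H[1][0] * g[0]) / det
```

At convergence the gradient $X^T(y - \mu)$ vanishes, which is the stationarity condition the slides derive.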
Example 2: Linear Regression

• Conditional distribution is a Gaussian:
  $p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \theta^T x)^2}{2\sigma^2}\right)$
• IRLS: with the identity response, $W$ is constant and a single Newton step recovers the ordinary least-squares solution.
Simple GLIMs are the building blocks of complex BNs

• CPDs correspond to GLIMs
MLE for general BNs

• If we assume the parameters for each CPD are globally independent, and all nodes are fully observed, then the log-likelihood function decomposes into a sum of local terms, one per node:
  $\ell(\theta; D) = \sum_i \sum_n \log p(x_{n,i} \mid x_{n,\pi_i}, \theta_i)$
• MLE-based parameter estimation of a GM reduces to local estimation of each GLIM.
Summary

• For exponential family distributions, MLE amounts to moment matching.
• GLIM:
  • Natural response
  • Iteratively Reweighted Least Squares as a general algorithm
• GLIMs are the building blocks of most practical GMs.