CS839: Probabilistic Graphical Models
Lecture 1: Introduction to Graphical Models
Theo Rekatsinas
Acknowledgement: adapted slides by Eric Xing
1. Introduction, admin & setup
Section 1
Who am I…
Instructor (me): Theo Rekatsinas
• Faculty in Computer Sciences and part of the UW Database Group
• Research: data integration and cleaning, statistical analytics, and machine learning
• Email: [email protected]
• Office hours: by appointment @ CS 4361
Course Webpage:
https://thodrek.github.io/CS839_fall18/
Logistics
• Textbooks:
  • Probabilistic Graphical Models, by Daphne Koller and Nir Friedman
  • Introduction to Statistical Relational Learning, by Lise Getoor and Ben Taskar
• Office hours: by appointment. Just send me an email.
• Homework submission: we will use Canvas.
Assignments and Grading Logistics
• 3 homework assignments: 20% of grade
  • Theory exercises, implementation exercises
• Midterm: 30% of grade
  • In-class exam, ~week #9
• Final project: 50% of grade
  • Project proposal: 10% of grade (~week #9)
  • Proposal presentation: 10% of grade
  • Final report: 30% of grade (due on Dec 20th)
  • In groups of up to 3; ideally exactly three. Groups should be formed in the first two weeks.
Project examples
• Applying PGMs to the development of a real, substantial ML system
  • Build a web-scale fake news detector.
  • Build a storyline tracking system for news media.
  • Design and implement state-of-the-art knowledge base embeddings.
• Theory and/or algorithmic projects
  • A more efficient approximate inference algorithm.
  • When is inference in the presence of noisy observations hard?
  • When can we approximate PGMs with feed-forward networks?
• Systems
  • Implement Markov logic on top of Pyro.
2. Class overview
Section 2
What are graphical models?
[Figure: a graphical model M fit to data D ≡ {X_1^(i), X_2^(i), …, X_m^(i)}, i = 1, …, N]
PGMs allow us to reason about uncertainty
Example applications: information extraction, data cleaning, weak supervision
Fundamental Questions
• Representation
  • How to capture/model uncertainties in possible worlds?
  • How to encode our domain knowledge/assumptions/constraints?
  • Example: Is your Grade independent of the Difficulty of the class?
[Figure: network over Difficulty, Intelligence, and Grade]
Fundamental Questions
• Inference
  • How do we answer questions/queries according to the model at hand and the available data, i.e., compute P(X | Data)?
  • Example: What will your Grade be if Difficulty is "high"?
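As a toy sketch of such a query, P(Grade | Difficulty) can be computed by summing out Intelligence. All CPD values below are made up for illustration, not taken from the lecture:

```python
# Hypothetical CPDs for a Difficulty -> Grade <- Intelligence network;
# every numeric value here is illustrative, not from the lecture.
P_I = {"low": 0.7, "high": 0.3}                    # P(Intelligence)
P_G = {                                            # P(Grade | Difficulty, Intelligence)
    ("low", "low"):   {"low": 0.3, "high": 0.7},
    ("low", "high"):  {"low": 0.1, "high": 0.9},
    ("high", "low"):  {"low": 0.8, "high": 0.2},
    ("high", "high"): {"low": 0.4, "high": 0.6},
}

def grade_given_difficulty(difficulty):
    """P(Grade | Difficulty = difficulty), summing out Intelligence."""
    unnorm = {g: sum(P_I[i] * P_G[(difficulty, i)][g] for i in P_I)
              for g in ("low", "high")}
    z = sum(unnorm.values())                       # normalize the result
    return {g: p / z for g, p in unnorm.items()}

print(grade_given_difficulty("high"))              # roughly {'low': 0.68, 'high': 0.32}
```

With these made-up numbers, a "high" Difficulty shifts most probability mass to a "low" Grade.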
Fundamental Questions
• Learning
  • What model is "right" for the data?
  • Example: What if we have (Difficulty = "low", Intelligence = "high", Grade = "high") for person 1, (Difficulty = "high", Intelligence = "high", Grade = "low") for person 2, etc.?
M* = argmax_{M ∈ 𝓜} F(D; M)
Basic Probability Concepts
• Representation: What is the joint probability distribution over multiple variables?
  • How many state configurations are there in total?
  • Do they all need to be represented explicitly?
  • What insights do we get from this model?
P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8)
Basic Probability Concepts
• Learning: Where do we get all these probabilities?
  • Maximum-likelihood estimation? How much data do we need?
  • Are there other estimation principles?
  • Where do we put domain knowledge, in terms of plausible relationships between variables and plausible values of the probabilities?
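The maximum-likelihood answer for a discrete CPD is just normalized counting. A minimal sketch on made-up records shaped like the learning example above:

```python
from collections import Counter

# Maximum-likelihood estimation of P(Grade | Difficulty) from records of the
# form (Difficulty, Intelligence, Grade); the records are made up.
data = [
    ("low", "high", "high"),    # person 1
    ("high", "high", "low"),    # person 2
    ("high", "low", "low"),
    ("low", "low", "high"),
    ("high", "high", "high"),
]

pair_counts = Counter((d, g) for d, _, g in data)   # counts of (Difficulty, Grade)
diff_counts = Counter(d for d, _, _ in data)        # counts of Difficulty alone

# The MLE is the count for each (parent, child) pair, normalized per parent value.
mle = {(d, g): c / diff_counts[d] for (d, g), c in pair_counts.items()}
print(mle[("high", "low")])     # 2 of the 3 "high"-difficulty records
```

The "how much data" question shows up immediately: parent configurations with few records give noisy estimates, which is one motivation for priors later on.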
Basic Probability Concepts
• Inference: If not all variables are observable, how do we compute the conditional distribution of latent variables given evidence?
  • Assume X_1 is given. Computing P(X_2 | X_1) requires summing over all 2^6 configurations of the unobserved variables X_3, …, X_8.
What is a graphical model?
A Multivariate Distribution in High-D Space
Example: A possible world for cellular signal transduction
Signal transduction is the process by which a chemical or physical signal is transmitted through a cell as a series of molecular events.
Structure Simplifies Representation
Arrows indicate dependencies amongst variables
Probabilistic Graphical Models
• If the X_i's are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,

P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8)
  = P(X_1) P(X_2) P(X_3|X_1) P(X_4|X_2) P(X_5|X_2) P(X_6|X_3,X_4) P(X_7|X_6) P(X_8|X_5,X_6)

• So, why a PGM? We can incorporate domain knowledge.
  • 1 + 1 + 2 + 2 + 2 + 4 + 2 + 4 = 18, a 16-fold reduction from 2^8 in representation cost!
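The parameter count above can be verified mechanically: each CPD over binary variables needs one free parameter per configuration of its parents. A short sketch using the parent sets from the factorization:

```python
# Parameter counting for the factorization on this slide, assuming all
# eight variables are binary. Each CPD P(X_i | parents) needs one free
# parameter per configuration of its parents, i.e., 2^(#parents).
parents = {1: [], 2: [], 3: [1], 4: [2], 5: [2],
           6: [3, 4], 7: [6], 8: [5, 6]}

factored = sum(2 ** len(ps) for ps in parents.values())
full = 2 ** 8    # entries in the unfactored joint table
print(factored, full)    # 18 256
```

The sum reproduces the 1 + 1 + 2 + 2 + 2 + 4 + 2 + 4 = 18 on the slide, versus 2^8 = 256 entries for the full table.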
Other desired properties of PGMs
• Modularity: allows us to integrate heterogeneous data
Other desired properties of PGMs
• Prior knowledge: Bayesian learning
• Captures uncertainty in a more principled way: introduce priors
What is a graphical model?
Multivariate statistics + structure
What is a graphical model?
• Informal: It is a smart way to specify/compose/design exponentially large probability distributions without paying an exponential cost, and at the same time endow the distributions with structured semantics.
What is a graphical model?
• More formal: It refers to a family of distributions on a set of random variables that are compatible with all the probabilistic independence propositions encoded by a graph that connects these variables.
Types of PGMs
• Directed: Bayesian networks
  • Directed edges give causality relationships

P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8)
  = P(X_1) P(X_2) P(X_3|X_1) P(X_4|X_2) P(X_5|X_2) P(X_6|X_3,X_4) P(X_7|X_6) P(X_8|X_5,X_6)
Types of PGMs
• Undirected: Markov random fields
  • Undirected edges simply give correlations between variables

P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8)
  = (1/Z) exp(E(X_1) + E(X_2) + E(X_1,X_3) + E(X_2,X_4) + E(X_2,X_5) + E(X_6,X_3,X_4) + E(X_7,X_6) + E(X_8,X_5,X_6))
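The normalizer Z (the partition function) is what turns the product of exponentiated energies into a distribution. A brute-force sketch on a tiny 3-variable chain with made-up pairwise energies (the 8-variable model above works the same way, just with more terms):

```python
import itertools
import math

# Brute-force partition function for a toy undirected model over three
# binary variables; the energy functions below are illustrative.
def total_energy(x1, x2, x3):
    e12 = 0.5 if x1 == x2 else 0.0    # made-up potential E(x1, x2)
    e23 = 1.0 if x2 == x3 else 0.0    # made-up potential E(x2, x3)
    return e12 + e23

states = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(total_energy(*s)) for s in states)        # partition function
P = {s: math.exp(total_energy(*s)) / Z for s in states}    # normalized joint

# Dividing by Z is exactly what makes the exp(...) terms sum to 1.
assert abs(sum(P.values()) - 1.0) < 1e-12
```

Enumerating all 2^n states is only feasible for tiny n; for real models, computing Z is itself a hard inference problem, which foreshadows the approximate-inference topics later in the course.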
Bayesian Networks
• Structure: DAGs
• Meaning: a node is conditionally independent of every other node in the network outside its Markov blanket.
The Markov blanket of a node includes its parents, its children, and the other parents of all of its children.
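The blanket definition translates directly into code. A minimal sketch over the 8-variable network from the factorization slides (each node mapped to its list of parents):

```python
# Markov blanket of a node in a DAG: its parents, its children, and the
# other parents of its children (co-parents). The DAG below is the
# 8-variable network from the earlier factorization slides.
parents = {"X1": [], "X2": [], "X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
           "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"]}

def markov_blanket(node):
    blanket = set(parents[node])                                # parents
    children = [c for c, ps in parents.items() if node in ps]   # children
    blanket.update(children)
    for c in children:                                          # co-parents
        blanket.update(p for p in parents[c] if p != node)
    return blanket

print(sorted(markov_blanket("X6")))    # ['X3', 'X4', 'X5', 'X7', 'X8']
```

For X6, the blanket is its parents X3 and X4, its children X7 and X8, and X8's other parent X5; given these five, X6 is independent of X1 and X2.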
Bayesian Networks
• Structure: DAGs
• Meaning: a node is conditionally independent of every other node in the network outside its Markov blanket.
• Local conditional probability distributions (CPDs) and the DAG completely determine the joint distribution.
• Edges represent causality relationships and facilitate a generative process.
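Because the edges define a generative process, sampling a joint configuration is straightforward: draw each node after its parents (ancestral sampling). A sketch on the Difficulty/Intelligence/Grade example, with made-up CPD values:

```python
import random

def sample_from(dist):
    """Draw a value from a {value: probability} table."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value    # guard against floating-point round-off

# Ancestral sampling: parents first, then each child given its parents.
# All CPD values below are made up for illustration.
def sample_world():
    d = sample_from({"low": 0.6, "high": 0.4})    # Difficulty (no parents)
    i = sample_from({"low": 0.7, "high": 0.3})    # Intelligence (no parents)
    if (d, i) == ("high", "low"):                 # Grade given both parents
        g = sample_from({"low": 0.8, "high": 0.2})
    else:
        g = sample_from({"low": 0.2, "high": 0.8})
    return d, i, g

print(sample_world())
```

This is exactly the property the next slide contrasts with MRFs, where no such node-by-node generative ordering exists.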
Markov Random Fields
• Structure: undirected graph
• Meaning: a node is conditionally independent of every other node in the network given its direct neighbors.
• Local contingency functions (potentials) and the cliques in the graph completely determine the joint distribution.
• Edges represent correlations between variables, but give no explicit way to generate samples.
Well-known models as PGMs
• Density estimation: parametric and non-parametric methods
• Regression: linear, conditional mixture, non-parametric
• Classification: generative and discriminative approaches
• Clustering
More complex models
• Partially observed Markov decision processes
More complex models
• Information extraction [OpenTag, Zheng et al., KDD 2018]
More complex models
• Solid state physics
Applications of Graphical Models
• Machine learning
• Computational statistics
• Computer vision and graphics
• NLP
• Information extraction
• Robotic control
• Decision making under uncertainty
• Computational biology
• Medical diagnosis/prognosis
• Finance and economics
• Etc.
Why PGMs?
• Language for communication
• Language for computation
• Language for development
• Does it remind you of something?
Why PGMs?
• Probability theory: formal framework to combine heterogeneous parts and ensure consistency.
• Graph structure: appealing interface for modeling highly interacting sets of variables. Interpretability and domain knowledge.
• Generalization: many classical probabilistic systems are special cases of PGMs.
PGMs in the Deep Learning era
• Probabilistic models: the goal is to capture the joint distribution of input variables, output variables, latent variables, parameters, and hyper-parameters. Everything is a random variable.
• Deep (learning) models: hierarchical model structure where the output of one model becomes the input of the next higher-level model. Targeted towards feature learning.
PGMs in the Deep Learning era
Deep Learning vs. PGMs:
• Empirical goal: e.g., classification, feature learning (DL) vs. e.g., transfer learning, latent-variable inference (PGMs)
• Structure: graphical (both)
• Objective: aggregated from local functions (both)
• Vocabulary: neurons, activation/gate functions (DL) vs. variables, potential functions (PGMs)
• Algorithm: a single inference algorithm, BP (DL) vs. many algorithms; approximate inference is a major focus of open research (PGMs)
• Evaluation: on end performance (DL) vs. on almost every intermediate quantity, e.g., calibrated probabilities (PGMs)
• Implementation: many tricks (DL) vs. quite standardized (PGMs)
PGMs in the Deep Learning era
• Why probabilistic models? Predictions from a probabilistic model capture a principled notion of uncertainty. Decision making.
• Why deep (learning) models? Feature learning. No assumptions needed for complex domains such as images and speech.
Combining PGMs and Deep Learning
• Deep Boltzmann Machines
Using PGMs to generate training data for DL
• Weak supervision / data programming
Class Overview
• Fundamentals of PGMs:
  • Bayesian networks and Markov random fields
  • Discrete, continuous, and hybrid models; the exponential family
  • Basic representation, inference, and learning
  • Focus on specific networks: multivariate Gaussian models, hidden Markov models
• Advanced topics:
  • Approximate inference
  • Bounded treewidth
  • Spectral methods for graphical models
  • Structure learning
  • Relational representation learning and connections to deep learning
• Applications