BNAIC 2015 November 5-6, 2015
AMIDST Toolbox: A Java library for Analysis of MassIve Data Streams using Probabilistic Graphical Models
FP7 European research project, http://amidst.eu
Anders L. Madsen, Andres R. Masegosa, Ana M. Martinez, Hanen Borchani, Thomas D. Nielsen, Helge Langseth, Antonio Salmeron, Dario Ramos-Lopez.
Outline
1. Overview of AMIDST Toolbox
  o Why are data streams important?
  o Why PGMs for analyzing data streams?
  o Scalable inference (and learning)
  o Roadmap for coming releases
2. Live Demo: Modeling concept drift in financial data.
  o Handling data streams.
  o Defining Bayesian networks with hidden variables.
  o Inference and learning in Bayesian networks.
Part I: Scope
Data Streams everywhere
• Unbounded flows of data are generated daily:
  • Social networks
  • Network monitoring
  • Financial/banking industry
  • ….
Data Stream Processing
• Processing data streams is challenging:
  – They do not fit in main memory
  – Continuous model updating
  – Continuous model inference
  – Concept drift
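The first two challenges can be illustrated with a minimal sketch, independent of the AMIDST API: a summary of the stream is updated one observation at a time, and the observation is then discarded. The class names `RunningStats` and `RunningStatsDemo` are hypothetical, and Welford's algorithm is used here only as a familiar example of a constant-memory update.

```java
import java.util.stream.DoubleStream;

/** Running mean and variance computed in one pass with constant memory (Welford's algorithm). */
final class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    /** Folds one new observation into the summary; the observation can then be discarded. */
    void update(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() { return count; }

    double mean() { return mean; }

    /** Unbiased sample variance; NaN until at least two observations have been seen. */
    double variance() { return count > 1 ? m2 / (count - 1) : Double.NaN; }
}

final class RunningStatsDemo {
    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        // Stand-in for an unbounded stream: values are generated lazily and never stored.
        DoubleStream.iterate(0.0, x -> x + 1.0).limit(1_000_000).forEach(stats::update);
        System.out.println(stats.count() + " values, mean = " + stats.mean());
    }
}
```

The memory footprint is three numbers regardless of how long the stream runs, which is the property any stream model update needs.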
Processing Massive Data Streams
• Everything has to scale:
  • Scalable computing infrastructure
  • Scalable models/inference/learning
AMIDST Toolbox
• Scalable framework for data stream processing.
• Based on Probabilistic Graphical Models.
• Unique project for data stream mining using PGMs.
• Open source project (Apache Software License 2.0).
AMIDST EU Project
§ This toolbox aims to deal with real, complex and massive data streams.
§ Applied to real use-cases of AMIDST’s industrial partners.
Toolbox Web Page
http://amidst.github.io/toolbox/
Part II: Why PGMs for data stream processing?
Why Graphical Models?
§ Let’s look at the following simple example:
  § Stream of sensor measurements about temperature and smoke presence in a given geographical area.
  § Monitor the stream to detect the presence of a fire (event detection problem).
Why Graphical Models?
§ Cast the problem as an anomaly detection problem (outliers).
§ Streaming K-Means (widely used in industry).
[Figure: a point in the stream flagged as "Anomaly"]
Why Graphical Models for analyzing Data Streams?
§ Many data stream models are black-box models:
  § Pros:
    § No need to understand the problem.
  § Cons:
    § Many hyper-parameters to tune.
    § Black-box models can rarely explain what they learned.
[Figure: a stream is fed into a black-box model, which outputs predictions]
Why Graphical Models?
§ Bayesian Networks:
  § Open-box models
  § Encode prior knowledge.
  § Continuous and discrete variables (CLG networks).
  § Example:
[Figure: Bayesian network with nodes Fire; Temp, Smoke; and sensor readings T1, T2, T3, S1]
p(Fire=true|t1,t2,t3,s1)
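The query above can be written out under an assumed structure for the example network: Fire is the parent of Temp and Smoke, Temp is the parent of the sensor readings T1, T2, T3, and Smoke is the parent of S1. The arrows are an assumption, since only the node layout survives here; Temp is taken to be continuous and Smoke discrete, as in a CLG network.

```latex
% Joint distribution implied by the assumed structure
p(\mathit{Fire}, \mathit{Temp}, \mathit{Smoke}, T_1, T_2, T_3, S_1)
  = p(\mathit{Fire})\, p(\mathit{Temp} \mid \mathit{Fire})\, p(\mathit{Smoke} \mid \mathit{Fire})
    \left[ \prod_{i=1}^{3} p(T_i \mid \mathit{Temp}) \right] p(S_1 \mid \mathit{Smoke})

% Posterior query: integrate out Temp, sum out Smoke
p(\mathit{Fire}=\text{true} \mid t_1, t_2, t_3, s_1)
  \propto p(\mathit{Fire}=\text{true})
    \left[ \int p(\mathit{temp} \mid \mathit{Fire}=\text{true}) \prod_{i=1}^{3} p(t_i \mid \mathit{temp}) \, d\mathit{temp} \right]
    \left[ \sum_{\mathit{smoke}} p(\mathit{smoke} \mid \mathit{Fire}=\text{true})\, p(s_1 \mid \mathit{smoke}) \right]
```

Every factor is a local distribution that can be inspected or set from prior knowledge, which is what makes the model an open box.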
Why Graphical Models?
[Figure: a stream is fed into open-box models, which output predictions; the inference engine underneath is a black box (multi-core parallelization)]
Part III: Inference Engine
Inference Engine
§ Querying the model
  § p(Fire=true|t1,t2,t3,s1,season)
  § E(Temperature|smoke=true)
§ Learning from data (using a Bayesian approach):
  § The Bayesian framework naturally deals with data streams.
  § The prior is updated in the light of new data.
$$p(\theta \mid d_1, \ldots, d_n, d_{n+1}) \propto p(d_{n+1} \mid \theta)\, p(\theta \mid d_1, \ldots, d_n)$$
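The recursion can be made concrete with a minimal sketch that uses the simplest conjugate pair, a Beta prior over the probability of a binary event. This is an illustration of the update rule only, not AMIDST code; `BetaPosterior`, `BetaPosteriorDemo` and the observed values are hypothetical.

```java
/** Beta(alpha, beta) belief over the probability theta of a binary event. */
final class BetaPosterior {
    final double alpha;
    final double beta;

    BetaPosterior(double alpha, double beta) {
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Bayes' rule for one Bernoulli observation; the result is the prior for the next one. */
    BetaPosterior update(boolean eventObserved) {
        return eventObserved
                ? new BetaPosterior(alpha + 1, beta)
                : new BetaPosterior(alpha, beta + 1);
    }

    double mean() { return alpha / (alpha + beta); }
}

final class BetaPosteriorDemo {
    public static void main(String[] args) {
        boolean[] stream = {true, false, false, true, false}; // hypothetical observations
        BetaPosterior belief = new BetaPosterior(1, 1);       // uniform prior over theta
        for (boolean d : stream) {
            // Posterior given d_1..d_{n+1}, computed from the posterior given d_1..d_n.
            belief = belief.update(d);
            System.out.println("posterior mean of theta = " + belief.mean());
        }
    }
}
```

Only the two Beta parameters are kept between observations, so past data never needs to be revisited.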
Querying the model
§ Parallel Monte Carlo inference [Salmeron et al. CAEPIA 2015]
  § Exploits multi-core (powered by Java 8)
§ Variational Message Passing [Winn et al. JMLR 2005]
  § Deterministic approximation
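As a rough illustration of how Monte Carlo inference maps onto Java 8 parallel streams, the sketch below estimates E(Temperature|smoke=true) by likelihood weighting, a simple form of importance sampling. It is not the algorithm of the cited paper and does not use the AMIDST API: the toy model (Fire as parent of Temperature and Smoke), every parameter value, and the class name are assumptions made for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

final class ParallelLikelihoodWeighting {
    // Hypothetical toy parameters, chosen only for illustration.
    static final double P_FIRE = 0.1;
    static final double P_SMOKE_GIVEN_FIRE = 0.9;
    static final double P_SMOKE_GIVEN_NO_FIRE = 0.05;
    static final double MEAN_TEMP_FIRE = 60.0;
    static final double MEAN_TEMP_NO_FIRE = 20.0;
    static final double SD_TEMP = 5.0;

    /** Draws Fire and Temperature from the prior and returns {weight * temperature, weight}. */
    static double[] weightedSample() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current(); // one generator per worker thread
        boolean fire = rnd.nextDouble() < P_FIRE;
        double temp = (fire ? MEAN_TEMP_FIRE : MEAN_TEMP_NO_FIRE) + SD_TEMP * rnd.nextGaussian();
        // The weight is the likelihood of the evidence smoke=true given the sampled Fire value.
        double weight = fire ? P_SMOKE_GIVEN_FIRE : P_SMOKE_GIVEN_NO_FIRE;
        return new double[] {weight * temp, weight};
    }

    /** Self-normalized estimate of E(Temperature | smoke=true), with samples drawn in parallel. */
    static double expectedTempGivenSmoke(int numSamples) {
        double[] sums = IntStream.range(0, numSamples)
                .parallel()
                .mapToObj(i -> weightedSample())
                .reduce(new double[] {0.0, 0.0},
                        (a, b) -> new double[] {a[0] + b[0], a[1] + b[1]});
        return sums[0] / sums[1];
    }

    public static void main(String[] args) {
        System.out.println("E(Temperature | smoke=true) is approximately "
                + expectedTempGivenSmoke(1_000_000));
    }
}
```

For these parameters the exact value is 6.3 / 0.135, about 46.67, so the estimate can be checked by hand. Samples are independent, which is why the work splits across cores without any coordination beyond the final sum.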
Learning from data streams
§ Bayesian approach:
  § Learning as an inference problem.
  § Powered by VMP.
  § Plate notation!!
[Figure: plate model with nodes α, θ, Z and x, and a plate over i = 1 … N]
Learning from data streams
§ Parallel Streaming Variational Bayes [Broderick et al. NIPS 2013]
  § Powered by Variational Message Passing.
  § Multi-core processing (using Java 8).
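The idea of combining posterior updates computed in parallel on separate mini-batches can be sketched for a case where it is exact: a Dirichlet prior over categorical data, where each batch contributes its category counts and the posterior parameters are the prior parameters plus the sum of those counts. This is a simplified stand-in rather than the toolbox's implementation; in the general setting each batch would contribute an approximate update (for example, one computed by VMP). All names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ParallelConjugateUpdate {
    /** Category counts of one mini-batch: that batch's contribution to the posterior. */
    static double[] batchCounts(int[] batch, int numCategories) {
        double[] counts = new double[numCategories];
        for (int value : batch) counts[value]++;
        return counts;
    }

    /** Element-wise sum that allocates a new array, so it is safe inside a parallel reduce. */
    static double[] add(double[] a, double[] b) {
        double[] sum = new double[a.length];
        for (int k = 0; k < a.length; k++) sum[k] = a[k] + b[k];
        return sum;
    }

    /** Dirichlet posterior parameters = prior parameters + sum of per-batch counts. */
    static double[] posterior(double[] prior, List<int[]> batches) {
        double[] total = batches.parallelStream()
                .map(batch -> batchCounts(batch, prior.length))
                .reduce(new double[prior.length], ParallelConjugateUpdate::add);
        return add(prior, total);
    }

    public static void main(String[] args) {
        double[] prior = {1.0, 1.0, 1.0}; // symmetric Dirichlet prior over three categories
        List<int[]> batches = Arrays.asList(new int[] {0, 1, 1}, new int[] {2, 2, 1}, new int[] {0});
        System.out.println(Arrays.toString(posterior(prior, batches)));
    }
}
```

Because the per-batch contributions are simply added, batches can be processed in any order and on any core, and a new batch arriving later is folded in the same way.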
Links to other open software
§ MoaLink
  § MOA is a state-of-the-art tool for data stream mining.
  § Using AMIDST models within MOA GUI!
  § Great for evaluation & comparison.
Links to other open software
§ HuginLink
  § Hugin is commercial software for PGMs and influence diagrams.
  § Model conversion.
  § The Hugin inference engine can be used within AMIDST.
Part IV: Roadmap
Dynamic Bayesian Networks (release 1.1)
§ Encode temporal knowledge
§ Naturally fit with data streams
[Figure: dynamic Bayesian network with nodes Fire(t), Temp(t), Smoke(t), T1(t), T2(t), T3(t), S1(t) and previous-time-step nodes Fire(t-1), Temp(t-1)]
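Why a dynamic model fits a stream can be seen in a minimal filtering sketch: the belief about Fire(t) is obtained from the belief about Fire(t-1) and the newest observation alone. The sketch deliberately simplifies the network to a binary Fire(t) chain with a single binary smoke reading per step, and every probability in it is a hypothetical value; it does not use the AMIDST API.

```java
final class FireFilter {
    // Hypothetical parameters, for illustration only.
    static final double P_FIRE_GIVEN_FIRE = 0.95;     // p(Fire(t)=true | Fire(t-1)=true)
    static final double P_FIRE_GIVEN_NO_FIRE = 0.01;  // p(Fire(t)=true | Fire(t-1)=false)
    static final double P_SMOKE_GIVEN_FIRE = 0.9;     // p(smoke reading=true | Fire(t)=true)
    static final double P_SMOKE_GIVEN_NO_FIRE = 0.05; // p(smoke reading=true | Fire(t)=false)

    /**
     * One filtering step: maps p(Fire(t-1)=true | readings up to t-1)
     * to p(Fire(t)=true | readings up to t).
     */
    static double step(double prevBelief, boolean smokeObserved) {
        // Predict: push the previous belief through the transition model.
        double predicted = prevBelief * P_FIRE_GIVEN_FIRE
                + (1 - prevBelief) * P_FIRE_GIVEN_NO_FIRE;
        // Correct: weight each hypothesis by the likelihood of the new reading and normalize.
        double likeFire = smokeObserved ? P_SMOKE_GIVEN_FIRE : 1 - P_SMOKE_GIVEN_FIRE;
        double likeNoFire = smokeObserved ? P_SMOKE_GIVEN_NO_FIRE : 1 - P_SMOKE_GIVEN_NO_FIRE;
        double numerator = likeFire * predicted;
        return numerator / (numerator + likeNoFire * (1 - predicted));
    }

    public static void main(String[] args) {
        boolean[] readings = {false, false, true, true, true}; // hypothetical smoke readings
        double belief = 0.0;                                   // assume no fire initially
        for (boolean reading : readings) {
            belief = step(belief, reading);
            System.out.println("p(Fire(t)=true | readings so far) = " + belief);
        }
    }
}
```

The state carried between time steps is a single number, so the cost per observation is constant however long the stream runs.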
Distributed Stream Processing (release 1.1)
§ RLink
  § Invoke the AMIDST inference engine within R.
  § Preliminary functionality recently presented.
Distributed Stream Processing (release 2.0)
§ FlinkLink
  § Apache Flink: open source platform for distributed stream processing.
  § Handling massive data streams.
Open Source project
§ We’re open to your contributions!! ;)
Hosted on GitHub
§ Download:
  :> git clone https://github.com/amidst/toolbox.git
§ Compile:
  :> ./compile.sh
§ Run:
  :> ./run.sh <class-name>
Please “star” our project! (if you like it)
Any questions before the live demo?
Live Demo: Tracking concept drift in financial data with AMIDST
Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.
Demo Code Available on GitHub
eu.amidst.bnaic2015.examples.BCC
Financial Data
§ Provided by BCC (Spanish regional bank).
§ Consists of monthly aggregated information:
  § Active clients between 18 and 65 years old.
  § Data between April 2007 and March 2014.
  § 11 variables: income, total credit, expenses, etc.
§ Each client is classified as:
  § defaulter/non-defaulter in the following 12 months.
Financial Data
§ Hypothesis:
  § Does the Spanish financial crisis have an impact on bank customers?
  § Look at the evolution of the regional unemployment rate.
Data Preprocessing/Visualization
§ Visualize the evolution of the monthly aggregated data:
  § Data does not fit in main memory!
Model Building
§ We use a simple Naïve Bayes model:
  § With a global hidden variable to track concept drift.
[Figure: Naïve Bayes network with class D, attributes A1, A2, …, A11 and a global hidden variable H]
Model Building
§ We also use plate notation
§ “H” is designed to capture concept drift
[Figure: plate model with nodes D, A1, A2, …, A11, Ht-1, Ht and θ, and a plate over i = 1 … M]
Tracking concept drift
References
§ Masegosa et al. AMIDST: Analysis of Massive Data Streams using Probabilistic Graphical Models. Submitted to JMLR. 2015.
§ Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.
§ Masegosa et al. Probabilistic graphical models on multi-core CPUs using Java 8. Submitted to IEEE Computational Intelligence Magazine, Special Issue on Computational Intelligence Software. 2015.
§ Salmeron et al. Parallel importance sampling in conditional linear Gaussian networks. In Proceedings of the Conferencia de la Asociacion Española para la Inteligencia Artificial, volume in press, 2015.
§ Winn et al. Variational message passing. Journal of Machine Learning Research, 6:661–694, 2005.
§ Broderick et al. Streaming variational Bayes. In Advances in Neural Information Processing Systems, pages 1727–1735, 2013.
Any questions?
http://amidst.github.io/toolbox/