P.Smyth:NetworksMURIMee5ng,June3,2011
1
Scalable Methods for the Analysis of Network‐Based Data
MURI Mee=ng, June 3rd 2011, UC Irvine
Principal Inves=gator: Professor Padhraic Smyth Department of Computer Science University of California, Irvine
Addi5onalprojectinforma5ononlineatwww.datalab.uci.edu/muri
P.Smyth:NetworksMURIMee5ng,June3,2011
2
Today’sMee5ng
• Goals– Reviewourresearchprogress– Discussion,ques5ons,interac5on– Feedbackfromvisitors
• Format– Introduc5on– Researchtalks
• Regular:20minutes+5minsatendforques5ons/discussion• Short:10minutes(sessionaVerlunch)
– Twoopendiscussionsessions,ledbyfaculty– Ques5on/discussionencouragedduringtalks– Severalbreaksfordiscussion
Butts
P.Smyth:NetworksMURIMee5ng,June3,2011
3
MURIProjectTimeline
• Ini5al3‐yearperiod– May12008toApril30th2011
– Fundingactuallyarrivedtouniversi5esinOct2008
• 2‐yearextension:– May12011toApril30th2013
• Mee5ngs(allatUCIrvine)– KickoffMee5ng,November2008– WorkingMee5ngs,April2009,August2009– AnnualReview,December2009– WorkingMee5ng,May2010
– AnnualReview,November2010– Today,June2011
P.Smyth:NetworksMURIMee5ng,June3,2011
4
Mo5va5on
2007:interdisciplinaryinterestinanalysisoflargenetworkdatasets
Manyoftheavailabletechniquesweredescrip5ve,couldnothandle
‐ Predic5on
‐ Missingdata
‐ Covariates,etc
P.Smyth:NetworksMURIMee5ng,June3,2011
5
Mo5va5on
2007:interdisciplinaryinterestinanalysisoflargenetworkdatasets
Manyoftheavailabletechniquesweredescrip5ve,couldnothandle
‐ Predic5on
‐ Missingdata
‐ Covariates,etc
2007:significantsta5s5calbodyoftheoryavailableonnetworkmodeling
Manyoftheavailabletechniquesdidnotscaleuptolargedatasets,notwidelyknown/understood/used,etc
P.Smyth:NetworksMURIMee5ng,June3,2011
6
Mo5va5on
2007:interdisciplinaryinterestinanalysisoflargenetworkdatasets
Manyoftheavailabletechniquesweredescrip5ve,couldnothandle
‐ Predic5on
‐ Missingdata
‐ Covariates,etc
2007:significantsta5s5calbodyoftheoryavailableonnetworkmodeling
Manyoftheavailabletechniquesdidnotscaleuptolargedatasets,notwidelyknown/understood/used,etc
Goal of this MURI project
Develop new sta=s=cal network models and algorithms to broaden their scope of applica=on to large, complex, dynamic
real‐world network data sets
P.Smyth:NetworksMURIMee5ng,June3,2011
7
MURITeam
Inves=gator University Department Exper=se Number Of PhD Students
Number of Postdocs
PadhraicSmyth(PI) UCIrvine ComputerScience Machinelearning 4
CarterBugs UCIrvine Sociology Sta5s5calsocialnetworkanalysis
6
MarkHandcock UCLA Sta5s5cs Sta5s5calsocialnetworkanalysis
1 1
DaveHunter PennState Sta5s5cs Computa5onalsta5s5cs
2 1
DavidEppstein UCIrvine ComputerScience Graphalgorithms 2
MichaelGoodrich UCIrvine ComputerScience Algorithmsanddatastructures
1 1
DaveMount UMaryland ComputerScience Algorithmsanddatastructures
2
TOTALS 18 3
P.Smyth:NetworksMURIMee5ng,June3,2011
8Collabora5onNetwork
Padhraic Smyth Dave
Hunter
Mark Handcock
Dave Mount
Mike Goodrich
David Eppstein Carter
Butts
P.Smyth:NetworksMURIMee5ng,June3,2011
9
Emma Spiro
Lorien Jasny
Zack Almquist
Chris Marcum
Sean Fitzhugh
Nicole Pierski
Collabora5onNetwork
Padhraic Smyth Dave
Hunter
Mark Handcock
Dave Mount
Mike Goodrich
David Eppstein Carter
Butts
Chris DuBois
Minkyoung Cho
Eunhui Park
Miruna Petrescu-Prahova
Arthur Asuncion Jimmy
Foulds
Duy Vu Ruth Hummel
Michael Schweinberger
Ranran Wang
Nick Navaroli
Darren Strash
Lowell Trott
Maarten Loffler
Joe Simon
P.Smyth:NetworksMURIMee5ng,June3,2011
10
Emma Spiro
Lorien Jasny
Zack Almquist
Chris Marcum
Sean Fitzhugh
Nicole Pierski
Collabora5onNetwork
Padhraic Smyth Dave
Hunter
Mark Handcock
Dave Mount
Mike Goodrich
David Eppstein Carter
Butts
Chris DuBois
Minkyoung Cho
Eunhui Park
Miruna Petrescu-Prahova
Arthur Asuncion Jimmy
Foulds
Duy Vu Ruth Hummel
Michael Schweinberger
Ranran Wang
Nick Navaroli
Darren Strash
Lowell Trott
Maarten Loffler
Joe Simon
P.Smyth:NetworksMURIMee5ng,June3,2011
11
Emma Spiro
Lorien Jasny
Zack Almquist
Chris Marcum
Sean Fitzhugh
Nicole Pierski
Collabora5onNetwork
Padhraic Smyth Dave
Hunter
Mark Handcock
Dave Mount
Mike Goodrich
David Eppstein Carter
Butts
Chris DuBois
Romain Thibaux
Minkyoung Cho
Eunhui Park
Miruna Petrescu-Prahova
Arthur Asuncion Jimmy
Foulds
Duy Vu Ruth Hummel
Michael Schweinberger
Ranran Wang
Nick Navaroli
Ryan Acton
Krista Gile
Computational Social Science Initiative U Mass Amherst
INTEL
Darren Strash
Lowell Trott
Maarten Loffler
Joe Simon
RAND
P.Smyth:NetworksMURIMee5ng,June3,2011
12
Example:NetworkDynamicsinClassroomsNicole Pierski, Chris DuBois, Carter Butts
P.Smyth:NetworksMURIMee5ng,June3,2011
13
Data: Count matrix of 200,000 email messages among 3000 individuals over 3 months
Problem: Understand communication patterns and predict future communication activity
Challenges: sparse data, missing data, non-stationarity, unseen covariates
P.Smyth:NetworksMURIMee5ng,June3,2011
14
KeyScien5fic/TechnicalChallenges
• Parametrizemodelsinasensibleandcomputableway– Respecttheoriesofsocialbehavioraswellasexplainobserveddata
• Accountforrealdata– E.g.,understandsamplingmethods:accountformissing,error‐pronedata
• Makeinferencebothprincipledandprac5cal– computa5onally‐scalability:wantaccurateconclusions,butcan’twaitforeverfor
results
• Dealwithrichanddynamicdata– Real‐worldproblemsinvolvesystemswithcomplexcovariates(text,geography,
etc)thatchangeover5me
Insum:sta5s5callyprincipledmethodsthatrespectthereali5esofdataandcomputa5onalconstraints
P.Smyth:NetworksMURIMee5ng,June3,2011
15
DomainTheory DataCollec5on
Sta5s5calModels
MappingtheProjectTerrain
P.Smyth:NetworksMURIMee5ng,June3,2011
16
DomainTheory DataCollec5on
Sta5s5calModels Sta5s5calTheory
MappingtheProjectTerrain
P.Smyth:NetworksMURIMee5ng,June3,2011
17
DataStructuresandAlgorithms
DomainTheory DataCollec5on
Sta5s5calModels Sta5s5calTheory
Es5ma5onAlgorithms
MappingtheProjectTerrain
P.Smyth:NetworksMURIMee5ng,June3,2011
18
DataStructuresandAlgorithms
DomainTheory DataCollec5on
Sta5s5calModels Sta5s5calTheory
Es5ma5onAlgorithms
Inference
HypothesisTes5ng
Predic5on/Forecas5ng
DecisionSupport
Simula5on
MappingtheProjectTerrain
P.Smyth:NetworksMURIMee5ng,June3,2011
19
Topic State of the Art Prior to Project
State of the Art Now
Poten=al Applica=ons And Impact
Generaltheoryforhandling
missingdatainsocialnetworks
Problemonlypar5allyunderstood.
NosoVwareavailableforsta5s5calmodeling
Generalsta5s5caltheoryfortrea5ngmissingdatainasocialnetworkcontext.
Publicly‐availablecodeinR.(GileandHandcock,2010)
Allowsapplica5onofsocialnetworkmodelingtodatasetswithsignificantmissing
data
Rela5onaleventmodels
Basicdyadiceventmodels.
Noexogenousevents.NopublicsoVware.
Muchrichermodelwithexogenousevents,egocentricsupport,mul5pleobserver
accounts,hierarchiesSoVwarepubliclyavailable
(Bugsetal,2010)
Providesageneralframeworkfordynamic
networkmodelingtolargerealis5capplica5ons
Cliquefindingalgorithms
Tooslowforuseinsta5s5calnetwork
modeling
Newlinear‐5mealgorithmforlis5ngallmaximalcliquesin
sparsegraphs(Eppstein,Loffler,Strash,
2010)
Extendsapplicabilityofsta5s5calnetworkmodelingtolargernetworksandmore
complexmodels
Accomplishments
P.Smyth:NetworksMURIMee5ng,June3,2011
20
Topic State of the Art Prior to Project
State of the Art Now
Poten=al Applica=ons And Impact
Generaltheoryforhandling
missingdatainsocialnetworks
Problemonlypar5allyunderstood.
NosoVwareavailableforsta5s5calmodeling
Generalsta5s5caltheoryfortrea5ngmissingdatainasocialnetworkcontext.
Publicly‐availablecodeinR.(GileandHandcock,2010)
Allowsapplica5onofsocialnetworkmodelingtodatasetswithsignificantmissing
data
Rela5onaleventmodels
Basicdyadiceventmodels.
Noexogenousevents.NopublicsoVware.
Muchrichermodelwithexogenousevents,egocentricsupport,mul5pleobserver
accounts,hierarchiesSoVwarepubliclyavailable
(Bugsetal,2010)
Providesageneralframeworkfordynamic
networkmodelingtolargerealis5capplica5ons
Cliquefindingalgorithms
Tooslowforuseinsta5s5calnetwork
modeling
Newlinear‐5mealgorithmforlis5ngallmaximalcliquesin
sparsegraphs(Eppstein,Loffler,Strash,
2010)
Extendsapplicabilityofsta5s5calnetworkmodelingtolargernetworksandmore
complexmodels
Accomplishments
P.Smyth:NetworksMURIMee5ng,June3,2011
21
Topic State of the Art Prior to Project
State of the Art Now
Poten=al Applica=ons And Impact
Generaltheoryforhandling
missingdatainsocialnetworks
Problemonlypar5allyunderstood.
NosoVwareavailableforsta5s5calmodeling
Generalsta5s5caltheoryfortrea5ngmissingdatainasocialnetworkcontext.
Publicly‐availablecodeinR.(GileandHandcock,2010)
Allowsapplica5onofsocialnetworkmodelingtodatasetswithsignificantmissing
data
Rela5onaleventmodels
Basicdyadiceventmodels.
Noexogenousevents.NopublicsoVware.
Muchrichermodelwithexogenousevents,egocentricsupport,mul5pleobserver
accounts,hierarchiesSoVwarepubliclyavailable
(Bugsetal,2010)
Providesageneralframeworkfordynamic
networkmodelingtolargerealis5capplica5ons
Cliquefindingalgorithms
Tooslowforuseinsta5s5calnetwork
modeling
Newlinear‐5mealgorithmforlis5ngallmaximalcliquesin
sparsegraphs(Eppstein,Loffler,Strash,
2010)
Extendsapplicabilityofsta5s5calnetworkmodelingtolargernetworksandmore
complexmodels
Accomplishments
P.Smyth:NetworksMURIMee5ng,June3,2011
22
Topic State of the Art Prior to Project
State of the Art Now
Poten=al Applica=ons And Impact
Generaltheoryforhandling
missingdatainsocialnetworks
Problemonlypar5allyunderstood.
NosoVwareavailableforsta5s5calmodeling
Generalsta5s5caltheoryfortrea5ngmissingdatainasocialnetworkcontext.
Publicly‐availablecodeinR.(GileandHandcock,2010)
Allowsapplica5onofsocialnetworkmodelingtodatasetswithsignificantmissing
data
Rela5onaleventmodels
Basicdyadiceventmodels.
Noexogenousevents.NopublicsoVware.
Muchrichermodelwithexogenousevents,egocentricsupport,mul5pleobserver
accounts,hierarchiesSoVwarepubliclyavailable
(Bugsetal,2010)
Providesageneralframeworkfordynamic
networkmodelingtolargerealis5capplica5ons
Cliquefindingalgorithms
Tooslowforuseinsta5s5calnetwork
modeling
Newlinear‐5mealgorithmforlis5ngallmaximalcliquesin
sparsegraphs(Eppstein,Loffler,Strash,
2010)
Extendsapplicabilityofsta5s5calnetworkmodelingtolargernetworksandmore
complexmodels
Accomplishments
P.Smyth:NetworksMURIMee5ng,June3,2011
23Application to Email Data: 200,000 email messages among 3000 individuals over 3 months
(DuBois and Smyth, ACM SIGKDD 2010)
P.Smyth:NetworksMURIMee5ng,June3,2011
24
Impact:SoVware
• RLanguageandEnvironment– Open‐source,high‐levelenvironmentforsta5s5calcompu5ng
– Defaultstandardamongresearchsta5s5cians‐increasinglybeingadoptedbyothers
– Es5mated250kto1millionusers
• Statnet– Rlibrariesforanalysisofnetworkdata– Newcontribu5onsfromthisMURIproject:
• Missingdata(GileandHandcock,2010)
• Rela5onaleventmodels(Bugs,2010)
• Latent‐classmodels(DuBois,2010)
• Fastclique‐finding(Strash,2010)• +more……
P.Smyth:NetworksMURIMee5ng,June3,2011
25
Impact:Publica5ons
• Over40peer‐reviewedpublica5ons– acrosscomputerscience,sta5s5cs,andsocialscience
– Highvisibility• Science,Bugs,2009• Journal of the American Sta3s3cal Associa3on,Schweinberger,inpress• Annals of Applied Sta3s3cs,GileandHandcock,2010• Journal of the ACM,daFonsecaandMount,2010• Journal of Machine Learning Research,Asuncion,Smyth,etc,2009
– Highlyselec5veconferences• ACMSIGKDD2010(16%acceptrate)• NeuralInforma5onProcessing(NIPS)Conference2009(25%accepts)• IEEEInfocom2010(17.5%accepts)
• Cross‐pollina5on– Exposingcomputerscien5ststosta5s5calandsocialnetworkingideas
– Exposingsocialscien5stsandsta5s5cianstocomputa5onalmodelingideas
P.Smyth:NetworksMURIMee5ng,June3,2011
26
Impact:WorkshopsandInvitedTalks
• 2010Poli5calNetworksConference– WorkshoponNetworkAnalysis
– PresentedandrunbyBugsandstudentsSpiro,Fitzhugh,Almquist
• InvitedTalks:ConferencesandWorkshops– R!2010ConferenceatNIST(Handcock,2010)– 2010SummerSchoolonSocialNetworks(Bugs)– MiningandLearningwithGraphsWorkshop(Smyth,2010)– NSF/SFIWorkshoponSta5s5calMethodsfortheAnalysisofNetworkData(Handcock,2009)– Interna5onalWorkshoponGraph‐Theore5cMethodsinComputerScience(Eppstein,2009)
– Quan5ta5veMethodsinSocialScience(QMSS)Seminar,Dublin(Almquist.2010)– +manymore…..
• InvitedTalks:Universi5es– Stanford,UCLA,GeorgiaTech,UMass,Brown,etc
P.Smyth:NetworksMURIMee5ng,June3,2011
27
Impact:theNextGenera5on
• Facultyposi5onsatUMass– RyanActon,KristaGile‐>AsstProfs,partofnewini5a5veinComputa5onalSocial
Science
• Studentsspeakingatmajorsummerconferences– SunbeltInterna5onalSocialNetworks(Jasny,Spiro,Fitzhugh,Almquist,DuBois
– ACMSIGKDDConference(DuBois)
– Interna5onalConferenceonMachineLearning(Vu)
– AmericanSociologicalAssocia5onMee5ng(Marcum,Jasny,Spiro,Fitzhugh,Almquist)
• Bestpaperawardsornomina5ons(Spiro,Hummel)
• Na5onalfellowships:DuBois(NDSEG),Asuncion(NSF),Navaroli(NDSEG)
P.Smyth:NetworksMURIMee5ng,June3,2011
28
…..andtheOldGenera5on
• CarterBugs– AmericanSociologicalAssocia5on,LeoA.Goodmanaward,2010
– highestawardtoyoungmethodologicalresearchersinsocialscience
• MichaelGoodrich– ACMFellow,IEEEFellow,2009
• PadhraicSmyth– ACMSIGKDDInnova5onAward2009
– AAAIFellow2010
• MarkHandcock– FellowoftheAmericanSta5s5calAssocia5on,2009
P.Smyth:NetworksMURIMee5ng,June3,2011
29
WhatNext?
• “Push”algorithmicadvancesintosta5s5calmodeling– Willallowustoscaleexis5ngalgorithmstomuchlargerdatasets
• Developnetworkmodelswithricherrepresenta5onalpower– Geographicdata,temporalevents,textdata,actorcovariates,heterogeneity,etc
• Systema5callyevaluateandtestdifferentapproaches– evaluateabilityofmodelstopredictover5me,toimputemissingvalues,etc
• Applytheseapproachestohighvisibilityproblemsanddatasets– E.g.,onlinesocialinterac5onsuchasemail,Facebook,Twiger,blogs
• MakesoVwarepubliclyavailable
P.Smyth:NetworksMURIMee5ng,June3,2011
30
Organiza5onalCollabora5onduringtheKatrinaDisaster
Combined Network plotted by HQ Location
Almquist and Butts
P.Smyth:NetworksMURIMee5ng,June3,2011
31SESSION 1:
9:20 Dynamic Egocentric Models for Citation Networks Dave Hunter, Professor, Statistics, Penn State University 9:45 Membership Dimension Maarten Loffler, Postdoctoral Fellow, Computer Science, UC Irvine 10:05 Multilevel Network Models for Classroom Dynamics Chris DuBois, PhD student, Statistics, UC Irvine 10:30 COFFEE BREAK
SESSION 2: 10:45 DISCUSSION: FAST CHANGE-SCORE COMPUTATION IN DYNAMIC GRAPHS Led by David Eppstein and Michael Goodrich, Computer Science, UC Irvine 11:35 Bayesian Meta-Analysis of Network Data via Reference Quantiles Carter Butts, Professor, Social Sciences, UC Irvine
12:00 Break for lunch (lunch for PIs + visitors at the University Club)
P.Smyth:NetworksMURIMee5ng,June3,2011
32
1:30 to 2:40 Short Highlight Talks
Computational Issues with Exponential Random Graph Models Mark Handcock, Professor, Statistics, UCLA
Experimental Results on Fast Clique Finding David Eppstein, Professor, Computer Science, UC Irvine
Modeling Rates between Affiliates on Facebook from Sampled Data Emma Spiro, PhD student, Social Sciences, UC Irvine
Modeling Degree Sequences of Undirected Networks with Application to 9/11 Disaster Networks Miruna Petrescu-Prahova, Postdoctoral Fellow, Statistics, University of Washington
Statistical Models for Text and Networks Jimmy Foulds, PhD student, Computer Science, UC Irvine
Analysis of Life History Data Sean Fitzhugh, PhD student, Social Sciences, UC Irvine
Approximate Sampling for Binary Discrete Exponential Families with Fixed Execution Time and Quality Guarantees Carter Butts, Professor, Social Sciences, UC Irvine
P.Smyth:NetworksMURIMee5ng,June3,2011
33
2:40 – 3:30: SESSION 3
2:40 Instability, Sensitivity, and Degeneracy of Discrete Exponential Families Michael Schweinberger, Postdoctoral Fellow, Penn State University
3:05 Empirical Analysis of Latent Space Embedding David Mount, Professor, University of Maryland
3:30 COFFEE BREAK
4:00 DISCUSSION: LATENT VARIABLE MODELING OF NETWORK DATA Led by Carter Butts and Padhraic Smyth
4:45 WRAP-UP, CLOSING COMMENTS
5:00 ADJOURN 5:00 ADJOURN
P.Smyth:NetworksMURIMee5ng,June3,2011
34
Addi5onalResourcesProjectWebsite:hgp://www.datalab.uci.edu/muri/
SlidesandPostersfromAHM:hgp://www.datalab.uci.edu/muri/june2011/
Publica5ons:hgp://www.datalab.uci.edu/muri/publica5ons.php
SoVware:hgp://csde.washington.edu/statnet/
DataSets:hgp://networkdata.ics.uci.edu/resources.php
P.Smyth:NetworksMURIMee5ng,June3,2011
35
QUESTIONS?