CINET: A CyberInfrastructure for Network Science (Overview). NSF Software Development for CyberInfrastructure Grant OCI-1032677. Additional support by grants from DTRA V&V, DTRA CNIMS, NSF NetSE, NSF DIBBS. Team: Virginia Tech, Indiana U., SUNY Albany, Jackson State, Argonne National Lab, U. Chicago, NCAT, U. Houston Downtown.

CINET: A Cyber-Infrastructure for Network Science Overview


Page 1: CINET: A Cyber-Infrastructure for Network Science Overview

CINET: A Cyber-Infrastructure for Network Science

(Overview)

NSF Software Development for CyberInfrastructure Grant OCI-1032677
Additional support by grants from DTRA V&V, DTRA CNIMS, NSF NetSE, NSF DIBBS

Team
Virginia Tech, Indiana U., SUNY Albany, Jackson State, Argonne National Lab, U. Chicago, NCAT, U. Houston Downtown

 

Page 2: CINET: A Cyber-Infrastructure for Network Science Overview

Goal: A Glimpse of CINET Workings & Purpose

• Workings
  – Workshop: hands-on use & demonstrations.
  – Worthwhile: high level
    • Glimpse of CINET "insides."
    • Appreciation for what goes on behind the UIs.
• CINET
  – A community resource.

Page 3: CINET: A Cyber-Infrastructure for Network Science Overview

Network Science

"Network science is the study of network representations of physical, biological, and social phenomena"

• Research in network science has been increasing very rapidly in the last decade, in many different scientific fields.
• Several conferences and journals; e.g., ASONAM, WWW, Web Sci, Network Science.
• Networks can be very large: ~10^8 nodes, ~10^10 edges, requiring HPC for analysis
• There is a need for middleware, i.e., an interface layer
  – Domain experts do not need to become experts in graph theory, data mining, and high-performance computing

[Figure: Number of papers with "Complex Networks" in the title, by year, 2000 to 2010]
[Figure: chart annotated "MAU = monthly active users"; source: The Motley Fool]
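To make the scale figure concrete, here is a back-of-the-envelope sketch of the memory needed just to hold an edge list of that size. The 8-byte vertex identifier is an assumption chosen for illustration; it is not a number from the slides.

```python
# Rough memory estimate for storing a large graph as a plain edge list.
# Assumption (not from the slides): each vertex id is an 8-byte integer.
BYTES_PER_ID = 8


def edge_list_bytes(num_edges: int, bytes_per_id: int = BYTES_PER_ID) -> int:
    """Bytes needed to store every edge as a (source, target) pair of ids."""
    return num_edges * 2 * bytes_per_id


num_edges = 10**10  # ~10^10 edges, the order of magnitude quoted on the slide
total = edge_list_bytes(num_edges)
print(f"{total / 10**9:.0f} GB")  # prints "160 GB"
```

Under this assumption the raw edge list alone is larger than the memory of a typical single workstation, which is consistent with the slide's point that analysis at this scale calls for HPC resources.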

Page 4: CINET: A Cyber-Infrastructure for Network Science Overview

Network Science

Illustrative questions (asked of an example network shown on the slide):

• How many connections does the person in orange have?
• Who are the most highly connected people?
• How many connected groups are in a population?
• How many "friends-of-friends" arrangements are there?
• Who are the people (computers, etc.) that are on the most pathways between other pairs of agents?
• If I "seed" (infect) the orange person, how does the infection spread?
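The first four questions map onto standard graph measures: degree, maximum degree, connected components, and (under one reading of "friends-of-friends") triangle counts. The following is a minimal sketch on a hypothetical six-person network; the names, the edges, and the triangle interpretation are assumptions for illustration, and this is not CINET's implementation.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: undirected friendships between people.
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("eve", "fay")]


def build_adjacency(edges):
    """Convert an undirected edge list to a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connected_components(adj):
    """Return the connected groups as a list of sets, via breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    group.add(nbr)
                    queue.append(nbr)
        components.append(group)
    return components


def count_triangles(adj):
    """Count triples of nodes that are all mutually connected."""
    total = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                total += 1
    return total // 3  # each triangle is found once from each of its 3 corners


adj = build_adjacency(EDGES)
degree = {node: len(nbrs) for node, nbrs in adj.items()}
most_connected = max(degree, key=degree.get)
```

On this toy input, `degree["cat"]` is 3, `most_connected` is `"cat"`, there are two connected groups, and there is one triangle (ann, bob, cat).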

Page 5: CINET: A Cyber-Infrastructure for Network Science Overview

CINET To A User

[Diagram: two users interacting with CINET; component shown: Networks]

Page 6: CINET: A Cyber-Infrastructure for Network Science Overview

CINET To A User

[Diagram: two users interacting with CINET; components shown:]

• Networks
• Network generators and measures

[Figure: log-log plot of Number of Nodes vs. Degree; label: 4B node graph generator]
[Figure: "Cluster Coefficient Distribution-Miami": Fraction of Nodes vs. Cluster Coefficient for No Shuffle, 10% Shuffle, 50% Shuffle, and 100% Shuffle]
[Figure label: Miami]

Page 7: CINET: A Cyber-Infrastructure for Network Science Overview

CINET To A User

[Diagram: two users interacting with CINET; components shown:]

• Networks
• Network generators and measures
• Network dynamics

[Figure: log-log plot of Number of Nodes vs. Degree; label: 4B node graph generator]
[Figure: "Cluster Coefficient Distribution-Miami": Fraction of Nodes vs. Cluster Coefficient for No Shuffle, 10% Shuffle, 50% Shuffle, and 100% Shuffle]
[Figure: Fraction of Population vs. Age Range for Vaccination (Base, 0-10, 11-20, ..., 81-90)]
[Figure labels: Liberia, Mexico City, Miami]

Page 8: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Underneath

[Diagram: two users connected to CINET; component shown: Client/server]

Page 9: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Underneath

[Diagram: two users connected to CINET]

Parallel Distributed Algorithms
  1. counting triangles.
  2. edge swapping.
  3. converting graph formats.
  4. simulation.
  5. … others …

Client/server
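As an illustration of item 2 in the algorithm list, the sketch below shows a sequential, degree-preserving edge swap on a small undirected graph. It only illustrates the basic operation; the parallel distributed algorithms the slide refers to are not described here, and the function names and the 8-node ring are hypothetical.

```python
import random


def edge_swap(edges, num_swaps, seed=0):
    """Perform degree-preserving edge swaps on an undirected simple graph.

    Each swap picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b). Swaps that would create a self-loop or a duplicate
    edge are skipped. Gives up after a bounded number of attempts.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * num_swaps
    while done < num_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: rewiring could create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # rewiring would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


ring = [(i, (i + 1) % 8) for i in range(8)]  # hypothetical 8-node ring
shuffled = edge_swap(ring, num_swaps=4)
```

Every node keeps its original degree after the swaps, so the degree distribution is unchanged while other structure (such as clustering) can change.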

Page 10: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Underneath

[Diagram: two users connected to CINET]

Parallel Distributed Algorithms
  1. counting triangles.
  2. edge swapping.
  3. converting graph formats.
  4. simulation.
  5. … others …

Input Checking:
  1. immediate value.
  2. values within a screen.
  3. values across screens.

Client/server
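The slide names three levels of input checking without saying how they are implemented. The sketch below shows one way the three levels could be layered; the field names (`num_nodes`, `num_edges`, `num_seeds`), the allowed ranges, and the two screens are all hypothetical.

```python
def check_value(name, value, low, high):
    """Level 1 (immediate value): a single field must lie in its allowed range."""
    if not (low <= value <= high):
        return [f"{name}={value} is outside [{low}, {high}]"]
    return []


def check_screen(generator):
    """Level 2 (within a screen): fields on one screen must be mutually consistent."""
    errors = check_value("num_nodes", generator["num_nodes"], 1, 10**9)
    errors += check_value("num_edges", generator["num_edges"], 0, 10**12)
    n = generator["num_nodes"]
    max_edges = n * (n - 1) // 2  # upper bound for a simple undirected graph
    if generator["num_edges"] > max_edges:
        errors.append(f"num_edges exceeds the {max_edges} possible on {n} nodes")
    return errors


def check_across_screens(generator, simulation):
    """Level 3 (across screens): values entered on different screens must agree."""
    errors = check_screen(generator)
    if simulation["num_seeds"] > generator["num_nodes"]:
        errors.append("num_seeds exceeds num_nodes from the generator screen")
    return errors


good = check_across_screens({"num_nodes": 100, "num_edges": 300}, {"num_seeds": 5})
bad = check_across_screens({"num_nodes": 4, "num_edges": 10}, {"num_seeds": 9})
```

Here `good` is an empty list, while `bad` reports two problems: one found within the generator screen and one found only by comparing the two screens.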

Page 11: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Underneath

[Diagram: two users connected to CINET]
[Figure: Numbers of Modules and Networks by Year, 2010 to 2014]

Parallel Distributed Algorithms
  1. counting triangles.
  2. edge swapping.
  3. converting graph formats.
  4. simulation.
  5. … others …

Input Checking:
  1. immediate value.
  2. values within a screen.
  3. values across screens.

Client/server

Page 12: CINET: A Cyber-Infrastructure for Network Science Overview

CINET: What Is It?

• Cyber-infrastructure for network science.
• Suite of applications
  – Granite: network structure; measures, graphs.
  – EDISON: network dynamics; models.
  – GDSC: network dynamics (full); models.
  – Organic expansion.
• Supporting services
• Infrastructure
• Environment for collaborative science.
• Community resource.

Page 13: CINET: A Cyber-Infrastructure for Network Science Overview

Community Resource

Community member contributions to CINET:

• networks
• algorithms
• simulations
• resources
• annotations
• course materials
• analyses

Page 14: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Layered Architecture

Research Uses
  • Case studies
  • Network science courses (Albany, NCAT, JSU, VT)

Tools in CINET
  • VizApp: App for network visualization
  • Granite: Graph structural analysis
  • GDSC: Phase space analysis of graph dynamics
  • EDISON: Network dynamics; spread of contagions over networks

Middleware/Workflow
  • Simfrastructure
  • Netscript

DL/Common Services
  • Networks (directed attributed)
  • Services for network manipulation
  • Metadata
  • Curation
  • Memoization
  • Incentivization

Hardware
  • Computing resources and data storage

Operations shown in the diagram
  • Add network
  • Add structural method
  • Add data and statistical analysis method
  • Store results

Page 15: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Layered Architecture

Research Uses
  • Case studies
  • Network science courses (Albany, JSU, NCAT, VT)

Tools in CINET (each shown with its own UI)
  • VizApp: App for network visualization
  • Granite: Graph structural analysis
  • GDSC: Phase space analysis of graph dynamics
  • EDISON: Network dynamics; spread of contagions over networks

Under the hood

Middleware/Workflow
  • Simfrastructure
  • Netscript

DL/Common Services
  • Networks (directed attributed)
  • Services for network manipulation
  • Metadata
  • Curation
  • Memoization
  • Incentivization

Hardware
  • Computing resources and data storage

Operations shown in the diagram
  • Add network
  • Add structural method
  • Add data and statistical analysis method
  • Store results

Page 16: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Apps Overview

• Structural Analysis Tool (Granite)
  – 110+ networks (graphs)
  – 18+ network generators
  – 70+ network algorithms (measures); GaLib, SNAP (Stanford), NetworkX
  – Visualization of networks; Gephi
  – Service for adding new networks (graphs)
  – Service for adding new structural analysis tools (graph algorithms)
• Graph Dynamical System Calculator (GDSC)
  – Analyzing the phase structure of GDS; small graphs
  – 13 graph templates; 15 vertex function (behavior) families.
• Simulation of Dynamics (EDISON)
  – Compute (contagion) dynamics on larger networks: simulation.
  – Services to manipulate attributed networks and to run simulations.
  – Several contagion models; with and without interventions.
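To illustrate what "analyzing the phase structure" of a graph dynamical system (GDS) means on a small graph, the sketch below enumerates every state of a three-vertex system and maps it to its successor. The path graph, the threshold vertex function, the threshold value, and the synchronous update are all assumptions for illustration; this is not GDSC's code, nor is it claimed to be one of its 15 vertex function families.

```python
from itertools import product

# Hypothetical small graph: a path on 3 vertices, 0 - 1 - 2.
ADJ = {0: [1], 1: [0, 2], 2: [1]}
THRESHOLD = 1  # assumed threshold, identical for every vertex


def step(state):
    """Synchronous update: a vertex becomes 1 when at least THRESHOLD
    vertices in its closed neighborhood (itself plus neighbors) are 1."""
    return tuple(
        1 if state[v] + sum(state[u] for u in ADJ[v]) >= THRESHOLD else 0
        for v in sorted(ADJ)
    )


# Phase space: each of the 2^n states mapped to its successor state.
phase_space = {s: step(s) for s in product((0, 1), repeat=len(ADJ))}
fixed_points = sorted(s for s, nxt in phase_space.items() if s == nxt)
```

For this system the phase space has 8 states and two fixed points, `(0, 0, 0)` and `(1, 1, 1)`. Because the state count doubles with every added vertex, exhaustive enumeration of this kind is only practical for small graphs, which matches the slide's note.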

Page 17: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Apps Overview

Statics/Structure

• Structural Analysis Tool (Granite)
  – 110+ networks (graphs)
  – 18+ network generators
  – 70+ network algorithms (measures); GaLib, SNAP (Stanford), NetworkX
  – Visualization of networks; Gephi
  – Service for adding new networks (graphs)
  – Service for adding new structural analysis tools (graph algorithms)

Dynamics

• Graph Dynamical System Calculator (GDSC)
  – Analyzing the phase structure of GDS; small graphs
  – 13 graph templates; 15 vertex function (behavior) families.
• Simulation of Dynamics (EDISON)
  – Compute (contagion) dynamics on larger networks: simulation.
  – Services to manipulate attributed networks and to run simulations.
  – Several contagion models; with and without interventions.
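For larger networks the dynamics are computed by simulation rather than by enumerating all states. The sketch below is a minimal SIR-style contagion run with an optional vaccination intervention. The model, the parameter names, and the five-node network are assumptions for illustration and are not taken from EDISON.

```python
import random


def simulate_contagion(adj, seeds, transmit_prob, vaccinated=frozenset(), seed=0):
    """Discrete-time SIR-style spread.

    Each newly infected node gets one chance to infect each susceptible,
    unvaccinated neighbor, then recovers. Returns the set of nodes ever
    infected and the number of new infections per time step.
    """
    rng = random.Random(seed)
    infected_ever = set(seeds)
    frontier = set(seeds)
    curve = [len(frontier)]
    while frontier:
        new_cases = set()
        for node in sorted(frontier):
            for nbr in sorted(adj[node]):
                if nbr in infected_ever or nbr in new_cases or nbr in vaccinated:
                    continue
                if rng.random() < transmit_prob:
                    new_cases.add(nbr)
        infected_ever |= new_cases
        frontier = new_cases
        curve.append(len(new_cases))
    return infected_ever, curve


# Hypothetical network: a path 0 - 1 - 2 - 3 plus an isolated node 4.
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
everyone, curve = simulate_contagion(PATH, seeds={0}, transmit_prob=1.0)
protected, _ = simulate_contagion(PATH, seeds={0}, transmit_prob=1.0, vaccinated={2})
```

With certain transmission the infection reaches nodes 0 through 3 and never reaches the isolated node 4. Vaccinating node 2 cuts the path, so only nodes 0 and 1 are infected, a small example of comparing runs with and without an intervention.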

Page 18: CINET: A Cyber-Infrastructure for Network Science Overview

CINET Infrastructure Overview

• Middleware
  – Sending messages (requests for services, status); sending data.
  – Brokers for services provide communication with services.
• Resource Manager
  – Allows multiple computational resources to be used and selected.
  – Uses remote grids, clouds.
• Netscript
  – Workflows.
• Digital Library (DL)
  – Metadata/data storage, organization.
  – Operations: curation, memoization, incentivization, etc.
• (Common) Services
  – Support and/or interact with DL, web apps.
  – Example: Query services, data assignment service.
• Website
  – Additional resources (course notes, videos, tutorials, research papers, etc.).
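Of the Digital Library operations listed above, memoization is the easiest to show in a few lines: a result is stored under a key derived from the request, so an identical request can be answered without recomputation. The sketch below is a generic in-memory version; the class name, the key scheme, and the example request are hypothetical and do not describe CINET's Digital Library.

```python
import hashlib
import json


class ResultCache:
    """Memoize analysis results keyed by network, measure, and parameters."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(network_id, measure, params):
        # sort_keys makes the key independent of parameter ordering.
        payload = json.dumps([network_id, measure, params], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, network_id, measure, params, compute):
        """Return the stored result for this request, computing it only once."""
        key = self._key(network_id, measure, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = compute()
        self._store[key] = result
        return result


calls = []


def expensive_degree_count():
    """Stand-in for a costly analysis job; records each real execution."""
    calls.append(1)
    return {"max_degree": 3}


cache = ResultCache()
first = cache.get_or_compute("toy-net", "degree", {"directed": False}, expensive_degree_count)
second = cache.get_or_compute("toy-net", "degree", {"directed": False}, expensive_degree_count)
```

The second request returns the stored result, so the stand-in job runs only once. On large networks, where a single measure can take a long HPC run, avoiding repeated computation is the main benefit of this kind of caching.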

Page 19: CINET: A Cyber-Infrastructure for Network Science Overview

CINET User Benefits

[Diagram labels:]

• correctness
• reproducibility
• reuse
• security
• Open access system
• customization
• privacy
• models
• applications
• algorithms

Page 20: CINET: A Cyber-Infrastructure for Network Science Overview

Selected Challenges

• Challenge 1: Simple computational interface for domain experts with little training.
  – (Computational experts, too)
• Challenge 2: Large networks.
• Challenge 3: Data management and movement.

Page 21: CINET: A Cyber-Infrastructure for Network Science Overview

Types of Publications

• System (architecture)
• Algorithms
• Dynamical systems characterizations
• Uses (applications)

Page 22: CINET: A Cyber-Infrastructure for Network Science Overview

Publications: Architecture/Use

• CINET Team, "CINET 2.0: A CyberInfrastructure for Network Science," eScience 2014.
• CINET Team, "CINET: A CyberInfrastructure for Network Science," eScience 2012.
• Abdelhamid et al., "GDSCalc: A Web-Based Application for Evaluating Discrete Graph Dynamical Systems," PLoS ONE 2015.

Page 23: CINET: A Cyber-Infrastructure for Network Science Overview

Publications: Algorithms

• Kuhlman et al., "A General-Purpose Graph Dynamical System Modeling Framework," WSC 2011.
• Maksudul Alam and Maleq Khan, "Parallel Algorithms for Generating Random Networks with Given Degree Sequences," 12th IFIP International Conference on Network and Parallel Computing (NPC), New York City, Sep. 2015.
• Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe, "A Space-efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks," 17th IEEE International Conference on High Performance Computing and Communications (HPCC), New York City, Aug. 2015.
• Shaikh Arifuzzaman and Maleq Khan, "Fast Parallel Conversion of Edge List to Adjacency List for Large-Scale Graphs," 23rd High Performance Computing Symposium (HPC), Alexandria, VA, USA, April 2015.
• Hasanuzzaman Bhuiyan, Jiangzhuo Chen, Maleq Khan, and Madhav V. Marathe, "Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs," International Conference on Parallel Processing (ICPP), Minneapolis, Sep. 2014.
• Maksudul Alam, Maleq Khan, and Madhav V. Marathe, "Distributed-Memory Parallel Algorithms for Generating Massive Scale-free Networks Using Preferential Attachment Model," Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), Denver, Nov. 2013.
• Shaikh Arifuzzaman, Maleq Khan, and Madhav V. Marathe, "PATRIC: A Parallel Algorithm for Counting Triangles in Massive Networks," ACM Conference on Information and Knowledge Management (CIKM), San Francisco, Oct. 2013.
• Zhao Zhao, Guanying Wang, Ali Butt, Maleq Khan, V.S. Anil Kumar, and Madhav Marathe, "SAHAD: Subgraph Analysis in Massive Networks Using Hadoop," 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
• Zhao Zhao, Maleq Khan, V.S. Anil Kumar, and Madhav V. Marathe, "Subgraph Enumeration in Large Social Contact Networks using Parallel Color Coding and Streaming," 39th International Conference on Parallel Processing (ICPP), San Diego, California, Sep. 2010.

Page 24: CINET: A Cyber-Infrastructure for Network Science Overview

Publications: Dynamical Systems

• Kuhlman, Chris J., and Henning S. Mortveit, "Limit Sets of Generalized, Multi-Threshold Networks," Journal of Cellular Automata, Vol. 10, pp. 161-193, 2015.
• Kuhlman, Chris J., and Henning S. Mortveit, "Attractor Stability in Nonuniform Boolean Networks," Theoretical Computer Science, Vol. 559, pp. 20-33, 2014.
• Kuhlman, Chris J., Henning S. Mortveit, David Murrugarra, and V. S. Anil Kumar, "Bifurcations in Boolean Networks," Automata, pp. 29-46, 2011.

The group has many publications on dynamical systems; these use GDSC.

Page 25: CINET: A Cyber-Infrastructure for Network Science Overview

Publications: Applications

• Dumas, C., D. LaManna, T. M. Harrison, S. S. Ravi, L. Hagen, C. Kotfila, and F. Chen, "Examining Political Mobilization of Online Communities through E-petitioning Behavior in We the People (Extended Abstract)," presented at the Social Media and Society Conference, Toronto, Canada, Oct. 2014.
• Dumas, C., D. LaManna, T. M. Harrison, S. S. Ravi, L. Hagen, C. Kotfila, and F. Chen, "Examining Political Mobilization of Online Communities through E-petitioning Behavior in We the People," accepted for publication in the Journal of Big Data and Society, 2015.
• Dumas, C., D. LaManna, T. M. Harrison, S. S. Ravi, L. Hagen, C. Kotfila, and F. Chen, "E-petitioning as Collective Political Action in We the People," Proc. iConference 2015, Newport Beach, CA, March 2015 (20 pages).

Page 26: CINET: A Cyber-Infrastructure for Network Science Overview

CINET in Context

• User interface: all user interaction.
  – No need to program.
  – No need for HPC resources.
• Types of analysis
  – Network structural characterizations.
  – Dynamics on networks.
• Large networks
  – Generation.
  – Analyses.
• Multiple tools provided under a CINET umbrella.
• Crowd-sourced platform
  – Self-sustaining.
  – Self-managing.
• Collaborative science.
• Community resource.

There are many good tools, but none, to our knowledge, is so widely encompassing.

Page 27: CINET: A Cyber-Infrastructure for Network Science Overview


END