CINET: A Cyber-Infrastructure for Network Science
(Overview)
NSF Software Development for CyberInfrastructure Grant OCI-1032677. Additional support by grants from DTRA V&V, DTRA CNIMS, NSF NetSE, NSF DIBBS
Team: Virginia Tech, Indiana U., SUNY Albany, Jackson State, Argonne National Lab, U. Chicago, NCAT, U. Houston Downtown
Goal: A Glimpse of CINET Workings & Purpose
• Workings
  – Workshop: hands-on use & demonstrations.
  – Worthwhile: high level
• Glimpse of CINET “insides.”
• Appreciation for what goes on behind the UIs.
• CINET: A community resource.
Network Science
• Research in network science has been increasing very rapidly in the last decade, in many different scientific fields.
• Several conferences and journals; e.g., ASONAM, WWW, Web Sci, Network Science.
• Networks can be very large: ~10^8 nodes, ~10^10 edges, requiring HPC for analysis
• There is a need for middleware, i.e., an interface layer
  – Domain experts do not need to become experts in graph theory, data mining, and high-performance computing
[Figure: Number of papers with “Complex Networks” in the title, by year, 2000 to 2010]
“Network science is the study of network representations of physical, biological, and social phenomena”
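A rough back-of-the-envelope estimate shows why networks at this scale call for HPC. The sketch below assumes a plain edge list with two 4-byte node IDs per edge; the storage format and ID width are assumptions for illustration, not something the slides specify.

```python
# Rough memory estimate for a network at the scale quoted on the slide.
# Assumption (not from the slides): each edge is stored as two 4-byte node IDs.
NODES = 10**8
EDGES = 10**10
BYTES_PER_ID = 4  # 2**32 > 10**8, so 4 bytes are enough to label every node


def edge_list_bytes(num_edges: int, bytes_per_id: int = BYTES_PER_ID) -> int:
    """Bytes needed to hold an edge list with two node IDs per edge."""
    return num_edges * 2 * bytes_per_id


total = edge_list_bytes(EDGES)
print(f"{total / 10**9:.0f} GB for the edge list alone")
```

Under these assumptions the edge list alone needs about 80 GB before any index, attribute, or working memory is counted, which is more than a typical desktop holds.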
Network Science
MAU = monthly active users (source: The Motley Fool)
Illustrative questions about a network:
• How many connections does the person in orange have?
• Who are the most highly connected people?
• How many connected groups are in a population?
• How many “friends-of-friends” arrangements are there?
• Who are the people (computers, etc.) that are on the most pathways between other pairs of agents?
• If I “seed” (infect) the orange person, how does the infection spread?
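Several of these questions map onto short graph computations. The sketch below is a minimal pure-Python illustration on a hypothetical seven-node contact network; the node names, the reading of “friends-of-friends” as nodes exactly two steps away, and the assumption that every contact transmits are all illustrative choices, not CINET's methods.

```python
from collections import deque

# Hypothetical toy contact network as an undirected adjacency map.
GRAPH = {
    "orange": {"a", "b", "c"},
    "a": {"orange", "b"},
    "b": {"orange", "a", "d"},
    "c": {"orange"},
    "d": {"b"},
    "e": {"f"},
    "f": {"e"},
}


def degree(graph, node):
    """Number of direct connections of a node."""
    return len(graph[node])


def most_connected(graph, k=1):
    """The k nodes with the highest degree (ties broken by name)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def connected_components(graph):
    """List of connected groups, each as a set of nodes."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node] - group:
                group.add(nbr)
                queue.append(nbr)
        seen |= group
        groups.append(group)
    return groups


def friends_of_friends(graph, node):
    """Nodes exactly two steps away: friends of friends who are not friends."""
    two_step = set()
    for nbr in graph[node]:
        two_step |= graph[nbr]
    return two_step - graph[node] - {node}


def infection_rounds(graph, seed):
    """Round in which each node is reached if every contact transmits."""
    rounds, queue = {seed: 0}, deque([seed])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in rounds:
                rounds[nbr] = rounds[node] + 1
                queue.append(nbr)
    return rounds


print(degree(GRAPH, "orange"), most_connected(GRAPH, 2))
print(len(connected_components(GRAPH)), friends_of_friends(GRAPH, "orange"))
print(infection_rounds(GRAPH, "orange"))
```

Betweenness (the “most pathways” question) is left out to keep the sketch short; nodes "e" and "f" never appear in the infection result because they sit in a separate connected group.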
CINET To A User
• Networks
• Network generators and measures
• Network dynamics
[Figure: degree distribution from the 4B node graph generator; axes: Degree (10^0 to 10^3) vs. Number of Nodes (10^0 to 10^5)]
[Figure: Cluster Coefficient Distribution-Miami; axes: Cluster Coefficient (0 to 1) vs. Fraction of Nodes (0 to 0.4); series: No Shuffle, 10% Shuffle, 50% Shuffle, 100% Shuffle]
[Figure: Fraction of Population (0 to 0.003) vs. Age Range for Vaccination (Base, 0-10, 11-20, ..., 81-90); Liberia, Mexico City]
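The clustering-coefficient figure compares a network before and after edge shuffling. The sketch below shows one common way to do that: a degree-preserving edge swap followed by a local clustering coefficient. It assumes that an “X% shuffle” means attempting a number of swaps equal to that fraction of the edges; that reading, the function names, and the ring-lattice example are assumptions, not taken from the slides.

```python
import random
from itertools import combinations


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))


def shuffle_edges(adj, fraction, seed=0):
    """Degree-preserving shuffle: attempt swaps (a-b, c-d) -> (a-d, c-b)."""
    rng = random.Random(seed)
    new = {n: set(nb) for n, nb in adj.items()}
    edges = [(u, v) for u in new for v in new[u] if u < v]
    for _ in range(int(fraction * len(edges))):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or d in new[a] or b in new[c]:
            continue
        new[a].remove(b)
        new[b].remove(a)
        new[c].remove(d)
        new[d].remove(c)
        new[a].add(d)
        new[d].add(a)
        new[c].add(b)
        new[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
    return new


# Hypothetical example: ring of 12 nodes, each linked to its 2 nearest
# neighbors on either side.
N = 12
ring = {i: {(i + k) % N for k in (-2, -1, 1, 2)} for i in range(N)}
shuffled = shuffle_edges(ring, fraction=1.0, seed=1)

before = sum(clustering(ring, n) for n in ring) / N
after = sum(clustering(shuffled, n) for n in shuffled) / N
print(f"mean clustering before: {before:.2f}, after: {after:.2f}")
```

Each accepted swap leaves every node's degree unchanged, so any change in the clustering distribution reflects how the edges are arranged rather than how many each node has.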
CINET Underneath
• Client/server
• Parallel Distributed Algorithms
  1. counting triangles.
  2. edge swapping.
  3. converting graph formats.
  4. simulation.
  5. … others …
• Input Checking:
  1. immediate value.
  2. values within a screen.
  3. values across screens.
[Figure: Numbers of Modules and Networks by Year, 2010 to 2014; vertical axis 0 to 150]
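Two of the listed algorithms, converting graph formats and counting triangles, can be sketched sequentially in a few lines. The code below is a single-machine illustration only: it is not the parallel, distributed implementation that CINET runs, and the function names and example edge list are hypothetical.

```python
from collections import defaultdict


def edge_list_to_adjacency(edges):
    """Convert an undirected edge list into an adjacency map of sets."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def count_triangles(adj):
    """Count each triangle once by only looking at higher-ranked neighbors."""
    # Rank nodes by degree, then by ID, so every pair has a strict order.
    rank = {n: (len(adj[n]), n) for n in adj}
    forward = {n: {m for m in adj[n] if rank[m] > rank[n]} for n in adj}
    return sum(len(forward[u] & forward[v]) for u in adj for v in forward[u])


# Hypothetical example: two triangles (0-1-2 and 2-3-4) plus a pendant edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
adj = edge_list_to_adjacency(edges)
print(count_triangles(adj))
```

Ordering nodes by degree keeps the forward neighbor sets of high-degree nodes small, which is the usual reason for this ordering; a distributed version additionally has to partition the nodes and their neighbor sets across processors.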
CINET: What Is It?
• Cyber-infrastructure for network science.
• Suite of applications
  – Granite: network structure; measures, graphs.
  – EDISON: network dynamics; models.
  – GDSC: network dynamics (full); models.
  – Organic expansion.
• Supporting services
• Infrastructure
• Environment for collaborative science.
• Community resource.
Community Resource
[Diagram: community member contributions to CINET]
• networks
• algorithms
• simulations
• resources
• annotations
• course materials
• analyses
CINET Layered Architecture
[Diagram: layers listed from top to bottom]
• Research Uses
  – Network science courses (Albany, JSU, NCAT, VT)
  – Case studies
• Tools in CINET
  – VizApp: App for network visualization
  – Granite: Graph structural analysis
  – GDSC: Phase space analysis of graph dynamics
  – EDISON: Network dynamics; spread of contagions over networks
  – UIs for the tools
• Under the hood
  – Simfrastructure
  – Middleware/Workflow
  – Netscript
• DL/Common Services
  – Networks (directed, attributed)
  – Services for network manipulation
  – Add network; add structural method; add data and statistical analysis method; store results
  – Metadata; curation; memoization; incentivization
• Hardware
  – Computing resources and data storage
CINET Apps Overview
Statics/Structure
• Structural Analysis Tool (Granite)
  – 110+ networks (graphs)
  – 18+ network generators
  – 70+ network algorithms (measures); GaLib, SNAP (Stanford), NetworkX
  – Visualization of networks; Gephi
  – Service for adding new networks (graphs)
  – Service for adding new structural analysis tools (graph algorithms)
Dynamics
• Graph Dynamical System Calculator (GDSC)
  – Analyzing the phase structure of GDS; small graphs
  – 13 graph templates; 15 vertex function (behavior) families.
• Simulation of Dynamics (EDISON)
  – Compute (contagion) dynamics on larger networks: simulation.
  – Services to manipulate attributed networks and to run simulations.
  – Several contagion models; with and without interventions.
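To make “phase structure of a GDS on a small graph” concrete, the sketch below enumerates every state of a three-vertex path under a synchronous threshold rule and lists the fixed points. The threshold vertex function, the closed-neighborhood convention, and the synchronous update are illustrative assumptions; this is not GDSC's interface or its full set of function families.

```python
from itertools import product


def threshold_step(adj, state, threshold=2):
    """Synchronous update: a vertex becomes 1 when at least `threshold`
    vertices in its closed neighborhood (itself plus neighbors) are 1."""
    return tuple(
        1 if state[v] + sum(state[u] for u in adj[v]) >= threshold else 0
        for v in range(len(state))
    )


def phase_space(adj, threshold=2):
    """Map every state in {0,1}^n to its successor state."""
    n = len(adj)
    return {s: threshold_step(adj, s, threshold) for s in product((0, 1), repeat=n)}


def fixed_points(space):
    """States that map to themselves."""
    return sorted(s for s, nxt in space.items() if s == nxt)


# Hypothetical small graph: the path 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
space = phase_space(path)
print(len(space), fixed_points(space))
```

The phase space has 2^n states, which is why exhaustive analysis of this kind is limited to small graphs and larger networks are studied by simulation instead.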
CINET Infrastructure Overview
• Middleware
  – Sending messages (requests for services, status); sending data.
  – Brokers for services provide communication with services.
• Resource Manager
  – Allows multiple computational resources to be used and selected.
  – Uses remote grids, clouds.
• Netscript
  – Workflows.
• Digital Library (DL)
  – Metadata/data storage, organization.
  – Operations: curation, memoization, incentivization, etc.
• (Common) Services
  – Support and/or interact with DL, web apps.
  – Example: Query services, data assignment service.
• Website
  – Additional resources (course notes, videos, tutorials, research papers, etc.).
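Memoization, one of the digital library operations listed above, amounts to storing a result under a key derived from the request so that a repeated request is not recomputed. The sketch below is a hypothetical in-memory stand-in: the class name, the key scheme, and the identifiers "miami" and "triangle_count" are assumptions for illustration and do not describe CINET's actual digital library.

```python
import hashlib
import json


class ResultCache:
    """Hypothetical in-memory stand-in for memoized analysis results."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def key(network_id, measure, params):
        """Stable key built from the request; parameter order does not matter."""
        payload = json.dumps([network_id, measure, params], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, network_id, measure, params, compute):
        """Return the stored result, running `compute` only on a cache miss."""
        k = self.key(network_id, measure, params)
        if k in self._store:
            self.hits += 1
        else:
            self._store[k] = compute()
        return self._store[k]


cache = ResultCache()
calls = []


def expensive():
    """Placeholder for a long-running analysis; records each invocation."""
    calls.append(1)
    return "placeholder result"


first = cache.get_or_compute("miami", "triangle_count", {"directed": False, "k": 3}, expensive)
second = cache.get_or_compute("miami", "triangle_count", {"k": 3, "directed": False}, expensive)
print(len(calls), cache.hits)
```

The second request differs only in parameter order, so it resolves to the same key and is served from the cache without rerunning the analysis.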
CINET User Benefits
Open access system
• correctness
• reproducibility
• reuse
• security
• customization
• privacy
• models
• applications
• algorithms
Selected Challenges
• Challenge 1: Simple computational interface for domain experts with little training.
  – (Computational experts, too)
• Challenge 2: Large networks.
• Challenge 3: Data management and movement.
Types of Publications
• System (architecture)
• Algorithms
• Dynamical systems characterizations
• Uses (applications)
Publications: Architecture/Use
• CINET Team, “CINET 2.0: A CyberInfrastructure for Network Science,” eScience 2014.
• CINET Team, “CINET: A CyberInfrastructure for Network Science,” eScience 2012.
• Abdelhamid et al., “GDSCalc: A Web-Based Application for Evaluating Discrete Graph Dynamical Systems,” PLoS ONE 2015.
Publications: Algorithms
• Kuhlman et al., “A General-Purpose Graph Dynamical System Modeling Framework,” WSC 2011.
• Maksudul Alam and Maleq Khan, “Parallel Algorithms for Generating Random Networks with Given Degree Sequences,” 12th IFIP International Conference on Network and Parallel Computing (NPC), New York City, Sep. 2015.
• Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe, “A Space-efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC), New York City, Aug. 2015.
• Shaikh Arifuzzaman and Maleq Khan, “Fast Parallel Conversion of Edge List to Adjacency List for Large-Scale Graphs,” 23rd High Performance Computing Symposium (HPC), Alexandria, VA, USA, April 2015.
• Hasanuzzaman Bhuiyan, Jiangzhuo Chen, Maleq Khan, and Madhav V. Marathe, “Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs,” International Conference on Parallel Processing (ICPP), Minneapolis, Sep. 2014.
• Maksudul Alam, Maleq Khan, and Madhav V. Marathe, “Distributed-Memory Parallel Algorithms for Generating Massive Scale-free Networks Using Preferential Attachment Model,” Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), Denver, Nov. 2013.
• Shaikh Arifuzzaman, Maleq Khan, and Madhav V. Marathe, “PATRIC: A Parallel Algorithm for Counting Triangles in Massive Networks,” ACM Conference on Information and Knowledge Management (CIKM), San Francisco, Oct. 2013.
• Zhao Zhao, Guanying Wang, Ali Butt, Maleq Khan, V.S. Anil Kumar, and Madhav Marathe, “SAHAD: Subgraph Analysis in Massive Networks Using Hadoop,” 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
• Zhao Zhao, Maleq Khan, V.S. Anil Kumar, and Madhav V. Marathe, “Subgraph Enumeration in Large Social Contact Networks using Parallel Color Coding and Streaming,” 39th International Conference on Parallel Processing (ICPP), San Diego, California, Sep. 2010.
Publications: Dynamical Systems
• Kuhlman, Chris J., and Henning S. Mortveit, “Limit Sets of Generalized, Multi-Threshold Networks,” Journal of Cellular Automata, Vol. 10, pp. 161-193, 2015.
• Kuhlman, Chris J., and Henning S. Mortveit, “Attractor Stability in Nonuniform Boolean Networks,” Theoretical Computer Science, Vol. 559, pp. 20-33, 2014.
• Kuhlman, Chris J., Henning S. Mortveit, David Murrugarra, and V. S. Anil Kumar, “Bifurcations in Boolean Networks,” Automata, pp. 29-46, 2011.
The group has many publications on dynamical systems; these use GDSC.
Publications: Applications
• Dumas, C., D. LaManna, T. M. Harrison, S. S. Ravi, L. Hagen, C. Kotfila, and F. Chen, “Examining Political Mobilization of Online Communities through E-petitioning Behavior in We the People (Extended Abstract),” presented at the Social Media and Society Conference, Toronto, Canada, Oct. 2014.
• Dumas, C., D. LaManna, T. M. Harrison, S. S. Ravi, L. Hagen, C. Kotfila, and F. Chen, “Examining Political Mobilization of Online Communities through E-petitioning Behavior in We the People,” accepted for publication in the Journal of Big Data and Society, 2015.
• Dumas, C., D. LaManna, T. M. Harrison, S. S. Ravi, L. Hagen, C. Kotfila, and F. Chen, “E-petitioning as Collective Political Action in We the People,” Proc. iConference 2015, Newport Beach, CA, March 2015 (20 pages).
CINET in Context
• User interface: all user interaction.
  – No need to program.
  – No need for HPC resources.
• Types of analysis
  – Network structural characterizations.
  – Dynamics on networks.
• Large networks
  – Generation.
  – Analyses.
• Multiple tools provided under a CINET umbrella.
• Crowd-sourced platform
  – Self-sustaining.
  – Self-managing.
• Collaborative science.
• Community resource.
There are many good tools, but none to our knowledge so widely encompassing.
END