1
KIT – University of the State of Baden- Württemberg and National Research Center of the Helmholtz Association NetworKit An Interactive Tool Suite for High-Performance Network Analysis Parallel Computing Group - Institute of Theoretical Informatics - Karlsruhe Institute of Technology (KIT) Christian L. Staudt , Aleksejs Sazonovs, Henning Meyerhenke Open Source Analytics Community Detection Centrality NetworKit is an open-source software package for high-performance analysis of large complex networks uses shared-memory parallelism and scales from notebooks to compute servers combines kernels written in C++ with a convenient interactive interface written in Python. Core Decomposition Design Goals performance interface integration Parallel Community Detection Heuristics [Staudt, Meyerhenke, ICPP 2013] PLM: modularity-driven multi-level technique, based on sequential Louvain method [Blondel et al. 2008] high modularity fast, scales to billions of edges PLP: parallel label-propagation technique, based on [Raghavan et al. 2007] fastest community detection heuristic scales well with the number of processors EPP: ensemble technique, combining several weak classifiers into a strong one PageRank eigenvector centrality betweenness centrality [Brandes 2001] betweenness approximation fast heuristic [Geisberger, Sanders, Schultes 2008] approximation with maximum error guarantee [Riondato, Kornaropoulos 2014] Degree Distribution Degree Assortativity By publishing NetworKit under the permissive open-source MIT license, we encourage usage and contributions by a community of algorithm engineers and data analysts. We thank all previous contributors. Get NetworKit: http://www.network-analysis.info Future improved support for dynamic and attributed graphs dynamic network analysis algorithms sparsification, filtering and compression new generative models Clustering Coefficients Diameter Additional Algorithms scalable algorithms, employing parallelism approximation performance-aware implementation Python ecosystem for scientific computing & data analysis additional network analysis software (e.g. Gephi, NetworkX) C D u Connected Components Interactive network analysis using IPython Notebook NetworKit architecture modular design interactive usage via Python k -cores result from iteratively peeling away nodes of degree k O(m) algorithm [Batagelj, Zaversnik 2003] www.network-analysis.info powerlaw module [Alstott et al. 2014] tests statistically for powerlaw distribution degree assortativity coefficient: correlation of node degrees among neighbors O(m) time algorithm local and global exact parallel computation in O(nd 2 max ) time very fast approximation with error guarantee [Schank, Wagner 2005] breadth-first and depth-first search Dijkstra’s algorithm approximate maximum weight matching algebraic distance maximum flows ... exact calculation (BFS/Dijkstra) fast approximation with bounded error [Magien et al. 2009] computed using parallel label propagation scheme Graph Generators Erdős-Renyi model classic random graph model fast generator Barabasi-Albert model produces networks with powerlaw degree distribution static and dynamic generator Chung-Lu model replicates any given degree distribution R-MAT generator popular high-performance graph generator power-law degree distribution, small-world property and self-similarity PubWeb generator geometric disk graph model simulates P2P network static and dynamic generator Performance Performance measurements on a shared-memory server with 256 GB RAM and 2x8 Intel(R) Xeon(R) E5-2680 cores (max. 32 threads) PLP 1 PLP 2 PLP 3 ζ 2 ζ 1 ζ 3 H ˜ ζ PLM ζ

NetworKitparco.iti.kit.edu/staudt/attachments/talks/Poster... · 2014. 7. 7. · KIT–UniversityoftheStateofBaden-Württembergand NationalResearchCenteroftheHelmholtzAssociation

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: NetworKitparco.iti.kit.edu/staudt/attachments/talks/Poster... · 2014. 7. 7. · KIT–UniversityoftheStateofBaden-Württembergand NationalResearchCenteroftheHelmholtzAssociation

KIT – University of the State of Baden- Württemberg andNational Research Center of the Helmholtz Association

NetworKitAn Interactive Tool Suite for High-Performance Network Analysis

Parallel Computing Group - Institute of Theoretical Informatics - Karlsruhe Institute of Technology (KIT)Christian L. Staudt, Aleksejs Sazonovs, Henning Meyerhenke

Open Source

Analytics

Community Detection

Centrality

NetworKit� is an open-source software package for high-performance analysis of large complex networks� uses shared-memory parallelism and scales from notebooks to compute servers� combines kernels written in C++ with a convenient interactive interface written in Python.

Core Decomposition

Design Goals� performance� interface� integration

Parallel Community Detection Heuristics[Staudt, Meyerhenke, ICPP 2013]

� PLM: modularity-driven multi-level technique, basedon sequential Louvain method [Blondel et al. 2008]

– high modularity– fast, scales to billions of edges

� PLP: parallel label-propagation technique, based on[Raghavan et al. 2007]

– fastest community detection heuristic– scales well with the number of processors

� EPP: ensemble technique,combining several weak classifiersinto a strong one

� PageRank� eigenvector centrality� betweenness centrality [Brandes 2001]

� betweenness approximation– fast heuristic [Geisberger, Sanders, Schultes 2008]

– approximation with maximum error guarantee[Riondato, Kornaropoulos 2014]

Degree Distribution

Degree Assortativity

By publishing NetworKit under the permissive open-source MIT license, we encourage usage and contributionsby a community of algorithm engineers and data analysts. We thank all previous contributors.Get NetworKit: http://www.network-analysis.info

Future� improved support for dynamic and

attributed graphs� dynamic network analysis algorithms� sparsification, filtering and compression� new generative models

Clustering Coefficients

Diameter

Additional Algorithms

� scalable algorithms, employing– parallelism– approximation

� performance-aware implementation

� Python ecosystem for scientific computing & data analysis� additional network analysis software (e.g. Gephi, NetworkX)

C

Du

ConnectedComponents

Interactive network analysis using IPython Notebook

NetworKit architecture

� modular design� interactive usage via Python

� k-cores result from iterativelypeeling away nodes of degree k

� O(m) algorithm[Batagelj, Zaversnik 2003]

www.network-analysis.info

� powerlaw module [Alstott et al.2014] tests statistically forpowerlaw distribution

� degree assortativitycoefficient: correlation ofnode degrees among neighbors

� O(m) time algorithm

� local and global� exact parallel computation in

O(nd2max) time� very fast approximation with

error guarantee[Schank, Wagner 2005]

� breadth-first and depth-firstsearch

� Dijkstra’s algorithm� approximate maximum weightmatching

� algebraic distance� maximum flows� . . .

� exact calculation (BFS/Dijkstra)� fast approximation with

bounded error [Magien et al. 2009]� computed using parallel label

propagation scheme

Graph GeneratorsErdős-Renyi model� classic random

graph model� fast generator

Barabasi-Albertmodel� produces networks

with powerlawdegree distribution

� static and dynamicgenerator

Chung-Lu model� replicates any given degree

distribution

R-MAT generator� popular high-performance

graph generator� power-law degree

distribution, small-worldproperty and self-similarity

PubWeb generator� geometric disk graph

model� simulates P2P network� static and dynamic

generator

Performance

Performance measurements on a shared-memory server with 256 GB RAM and 2x8 Intel(R) Xeon(R) E5-2680 cores (max. 32 threads)

PLP1 PLP2 PLP3

ζ2ζ1 ζ3

H

ζ̃

PLM

ζ