7
Graph Waves James Abello Daniel Nakhimovich [email protected] [email protected] Rutgers University New Brunswick, New Jersey Abstract We present efficient algorithmic mechanisms to partition graphs with up to 1.8 billion edges into subgraphs which are fixed points of degree peeling [3]. Similarly to [1], we are able to process graphs with tens of millions of edges in seconds and hundreds of millions of edges in minutes. For fixed points that turn out to be larger than a desired interactivity parameter we further decompose them with a novel (to our knowledge) linear algorithm into what we call “graph waves and fragments”. This decomposition is used to create span- ning views of fixed points that we call “DAG Covers”. We illustrate these decompositions by presenting intuitive and interactive visu- alizations of the meta-structures of a variety of publicly available data sets including social, web, and citation networks [18]. Keywords graph waves, kcore, graph decomposition, graph visualization 1 Introduction How far can we simplify computation and visualization in order to maintain “large scale” macro structural graph features without losing the ability to interactively extract more detailed connectivity? Some progress in this direction appears in [3]. In general, macro structural graph views can be obtained by using some form of vertex or edge aggregation that can be represented as hierarchy trees [4, 8, 12, 23, 29]. The choice of suitable representations that facilitate smooth visual interaction is a subject of active research [13, 19, 20, 22]. In [11] an edge stratification technique is proposed that is applied to a given layout of a graph with a few hundred vertices and edges. All of these techniques require algorithms with running time greater than linear in the number of graph elements. Our focus is on graph decompositions that are applicable to graphs of several orders of magnitude larger for which a direct layout is unavailable due to memory constraints or impractical due to screen size contraints. Some hierarchical decompositions have faster techniques. Major approaches include SPQR-trees, k-core and nuclei based decomposi- tions [1, 5, 10, 15, 17, 24] or even simple stochastic sampling [21, 27]. k-connectivity decomposition techniques have been explored in Both authors contributed equally to this research. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). EDBT 2020, March 30-April 2, 2020, Copenhagen, Denmark © 2020 Copyright held by the owner/author(s). https://doi.org/10.1145/nnnnnnn.nnnnnnn [4, 6, 16] and terrain metaphors have been explored in [2] and [28]. For large graphs, such techniques are limited because there is no known canonical hierarchical decomposition for k-connectivity. The notion of cores was first introduced in [25] and their utiliy was further identified in [5]. One of the main advantages of core based vertex decompositions is that they can be computed efficiently [10] and they provide an arguably “natural” hierarchical view of any network. The vertex core decomposition has been used to obtain edge partitions where each set of edges in the partition is a max- imal edge subgraph that is a fixed point of degree peeling. The maximality of these decompositions is desirable because it provides bounds on the connectivity between the decomposed subgraphs [3]. The pragmatic challenge of computing this iterative edge “layer” decomposition for graphs with several hundred million edges in a few minutes has been undertaken in [1]. However, if a particular layer in the decomposition is large, the navigation and visualiza- tion of such big fixed points is still a major task. For example, in the Friendster social network there are four fixed points with over 100 million edges. We address this issue by introducing a novel and efficient algorithm that we call the “wave decomposition”. The “waves” produced by the decomposition can be naturally separated into connected sub-waves which can in turn be refined into a collec- tion of “fragments”. For large fragments we propose using maximal matching based contractions combined with graph rendering and clustering techniques similar to [9, 30]. Our notion of fragments is quite different to that used in [7]. The ultimate goal of this work is to obtain a humanly-interpretable, hierarchical description of any graph. First we introduce fixed points and waves in sections 2 and 3. In sections 4 and 5 we show sample visualizations and runtime results. Lastly we discuss our contributions and future work in section 6. 2 Fixed Points and Degree Peeling We deal with undirected graphs G = ( V , E) with vertex set V ( G) and edge set E( G). We denote by n the number of vertices in V , m the number of edges in E, and the degree of a vertex u V by deg(u). Definition 1. (Peel Value) The peel value of a vertex u V ( G) denoted peel G ( u ) is the largest i ∈[1, deд( u )] such that u belongs to a subgraph of G of minimum degree i . Definition 2. (Graph Core) The core of G, core ( G), sometimes called the k-core of G, is the subgraph induced by the maximal subset of vertices of G whose peeling value is maximum. Cores can be intuitively thought of as dense subgraphs. Although, they are not necessarily maximally dense in any known way, [26] shows that k-cores are 2-approximate locally dense subgraphs.

Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

Graph WavesJames Abello∗

Daniel Nakhimovich∗[email protected]@rutgers.eduRutgers University

New Brunswick, New Jersey

AbstractWe present efficient algorithmic mechanisms to partition graphswith up to 1.8 billion edges into subgraphs which are fixed pointsof degree peeling [3]. Similarly to [1], we are able to process graphswith tens of millions of edges in seconds and hundreds of millionsof edges in minutes. For fixed points that turn out to be larger than adesired interactivity parameter we further decompose them with anovel (to our knowledge) linear algorithm into what we call “graphwaves and fragments”. This decomposition is used to create span-ning views of fixed points that we call “DAG Covers”. We illustratethese decompositions by presenting intuitive and interactive visu-alizations of the meta-structures of a variety of publicly availabledata sets including social, web, and citation networks [18].

Keywordsgraph waves, kcore, graph decomposition, graph visualization

1 IntroductionHow far can we simplify computation and visualization in orderto maintain “large scale” macro structural graph features withoutlosing the ability to interactively extract more detailed connectivity?Some progress in this direction appears in [3]. In general, macrostructural graph views can be obtained by using some form ofvertex or edge aggregation that can be represented as hierarchytrees [4, 8, 12, 23, 29]. The choice of suitable representations thatfacilitate smooth visual interaction is a subject of active research[13, 19, 20, 22]. In [11] an edge stratification technique is proposedthat is applied to a given layout of a graph with a few hundredvertices and edges. All of these techniques require algorithms withrunning time greater than linear in the number of graph elements.Our focus is on graph decompositions that are applicable to graphsof several orders of magnitude larger for which a direct layout isunavailable due to memory constraints or impractical due to screensize contraints.

Some hierarchical decompositions have faster techniques. Majorapproaches include SPQR-trees, k-core and nuclei based decomposi-tions [1, 5, 10, 15, 17, 24] or even simple stochastic sampling [21, 27].k-connectivity decomposition techniques have been explored in

∗Both authors contributed equally to this research.

Permission to make digital or hard copies of part or all of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for third-party components of this work must be honored.For all other uses, contact the owner/author(s).EDBT 2020, March 30-April 2, 2020, Copenhagen, Denmark© 2020 Copyright held by the owner/author(s).https://doi.org/10.1145/nnnnnnn.nnnnnnn

[4, 6, 16] and terrain metaphors have been explored in [2] and [28].For large graphs, such techniques are limited because there is noknown canonical hierarchical decomposition for k-connectivity.The notion of cores was first introduced in [25] and their utiliy wasfurther identified in [5]. One of the main advantages of core basedvertex decompositions is that they can be computed efficiently [10]and they provide an arguably “natural” hierarchical view of anynetwork. The vertex core decomposition has been used to obtainedge partitions where each set of edges in the partition is a max-imal edge subgraph that is a fixed point of degree peeling. Themaximality of these decompositions is desirable because it providesbounds on the connectivity between the decomposed subgraphs [3].The pragmatic challenge of computing this iterative edge “layer”decomposition for graphs with several hundred million edges in afew minutes has been undertaken in [1]. However, if a particularlayer in the decomposition is large, the navigation and visualiza-tion of such big fixed points is still a major task. For example, inthe Friendster social network there are four fixed points with over100 million edges. We address this issue by introducing a noveland efficient algorithm that we call the “wave decomposition”. The“waves” produced by the decomposition can be naturally separatedinto connected sub-waves which can in turn be refined into a collec-tion of “fragments”. For large fragments we propose using maximalmatching based contractions combined with graph rendering andclustering techniques similar to [9, 30]. Our notion of fragments isquite different to that used in [7]. The ultimate goal of this work isto obtain a humanly-interpretable, hierarchical description of anygraph.

First we introduce fixed points and waves in sections 2 and 3. Insections 4 and 5 we show sample visualizations and runtime results.Lastly we discuss our contributions and future work in section 6.

2 Fixed Points and Degree PeelingWe deal with undirected graphs G = (V , E) with vertex set V (G)and edge set E(G). We denote by n the number of vertices in V ,m the number of edges in E, and the degree of a vertex u ∈ V bydeg(u).Definition 1. (Peel Value) The peel value of a vertex u ∈ V (G)denoted peelG (u) is the largest i ∈ [1,deд(u)] such that u belongs toa subgraph of G of minimum degree i .

Definition 2. (Graph Core) The core of G, core(G), sometimescalled the k-core ofG , is the subgraph induced by the maximal subsetof vertices of G whose peeling value is maximum.

Cores can be intuitively thought of as dense subgraphs. Although,they are not necessarily maximally dense in any known way, [26]shows that k-cores are 2-approximate locally dense subgraphs.

Page 2: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

In a previous work, [3] utilizes k-cores to produce an iterativedecomposition of a graph’s edges into layers of fixed points (whichwe redefine below) with the same peel value. Since the decomposi-tion forms a partition of edges, we call vertices that appear in morethan one fixed point, clone vertices. The notion of clone vertices canbe used to define a diversity measure and are useful for detectingcommunity overlap [3].Definition 3. (Fixed Point) A graph Fk is a fixed point of degreepeeling k if core(Fk ) = V (Fk ) and the peel value of each vertex in Fkis k .

From proposition 1 in [3] it is easy to verify that a fixed pointof peel value k must have minimum degree k and average degreestrictly less than 2k. In Figure 1 we show the distribution of theaverage degrees of fixed points in the Patents citation and Friendstersocial networks, along with two black lines indicating the minimumand supremum average degree possible.

cit-Patents

com-Friendster

Figure 1: Plots of average degree of fixed points (y-axis) by peelvalue (x-axis) with frequency of fixed points of a given peel valueand average degree along the z-axis being emphasized by color (blueto red) and point size. Note that the x-axis and z-axis are shown ona log scale.

What we call fixed points of degree peeling k , fixed points of peelvalue k, or simply fixed points, are in fact extensions of forests calledk-dense forests in [14]. In that paper the authors first formulate thenotion of k-dense cycles as connected graphs where each vertexis of degree > k. From there it is natural to define a k-dense forestas a graph without k-dense cycles. Using any algorithm to findthe k-core of a graph, one can find what [14] calls a k-eliminationorder of a fixed point of degree peeling k. Thus, by Lemma 2.9in [14] the definition of a fixed point of degree peeling k and thedefinition of a k-dense forest are equivalent. The importance ofnoticing that fixed points are generalizations of trees is that itmotivates a decomposition of fixed points that works by iterativelypeeling “leaf-like” fragments (subsection 3.1).

3 Waves and FragmentsWaves are introduced in order to decompose Fixed points. Previousworks do not address decompositions of fixed points if they arelarge. To further decompose fixed points we introduce a “wave de-composition” which forms an edge partition by iteratively removing“leaf-like” fragments.

3.1 Wave DecompositionDefinition 4. We define the boundary of a vertex set S ⊂ V as∂S = {v ∈ V : ∃ (u,v) ∈ E, u ∈ S, v < S}. We define the k-boundary of S as ∂kS to be the vertices in ∂S with degree less thank restricted to the graph induced by V \ S . We extend these boundarydefinitions to a collection of disjoint sets by taking their union.

Definition 5. (Graph Fragment) Given a seed set of vertices S ⊂V the fragment generated by S, denoted frag(S), is the set of edges,(u,v), such that u ∈ S .

In Figure 2 (right) we show a decomposition of a fixed point intofragments. Next, we use graph fragments to introduce the notionof graph waves of fixed points.Definition 6. (Graph Waves of Fixed Points) A wave of a fixedpoint of peel valuek is the union of fragments in the sequence {frag(Sj )}mj=0with S0 being the set of vertices of minimum degree k and each sub-sequent Sj+1 = ∂k (∪

ji=0Si ).

Notice that a fixed point could consist of only one wave and awave could consist of only one fragment (i.e. k-regular graphs).

Due to space limitations pseudo code of the wave decomposition,Alg.1, is in the appendix. If we let k = 1 then F1 is a forest and thefirst fragment is the neighborhood of the leaves of that forest. Fork > 1, the wave decomposition mimics the iterative removal ofleaves in a forest by removing collections of k-leaves (a.k.a frag-ments) from k-dense forests (a.k.a fixed points of degree peeling)[14]. The example in Figure 2 shows a coloring corresponding tothe edge partition formed by applying the wave decomposition.The wave decomposition was heavily inspired by the method ofcomputing k-cores in [10] and much like the implementation ofthat algorithm the wave decomposition can be computed in lineartime and space with respect to the number of edges in a graph.

Figure 2: (Left): A fixed point of peel value 2whose edges are coloredbased on the wave decomposition (wave 1: blue, wave 2: green, wave3: red). (Right): The same graph sub-divided into fragments: wave 1with 1 blue fragment, wave 2 with 3 fragments (green, yellow, or-ange), and wave 3 with 1 red fragment.

3.2 DAG Cover of Waves and Fixed PointsThe concept of fragments is powerful in its own right. Considerthe ordered collection of vertex sets S = {S0, S1, ..., Sj } as defined

Page 3: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

in Def. 6. These sets form a vertex partition of the wave vertices.We construct a directed acyclic graph (DAG) representation of thewave decomposition by directing thoses edges (x,y) with x ∈ Siand y ∈ Si+1 for 0 ≤ i < j − 1. We call this a DAG cover of awave because it contains (or covers) all the vertices of the waveplus the edge cuts between consecutive sets in S . We create a DAGcover of a fixed point by using its ordered wave decomposition.Similarly we could create a DAG cover for an entire graph by usingits fixed point decomposition. Moreover, we use this DAG cover inconjunction with other techniques to produce graph drawings. Inparticular, we next illustrate how we use the DAG cover to visualizestratification of fixed points and their waves.3.2.1 Visualization of Fixed Points with DAG Cover

We use three different techniques – color, force manipulation,and edge sparsification – driven by the DAG cover to provide betterinterpretability of the structure of fixed points.

The most visually striking technique is to use the DAG coverto color the vertices and edges of a graph. First we assign a colorto each vertex in the DAG cover based on the sets in S , then wecolor the edges of the original graph as a function of the color oftheir end points (e.g. hue average). Using a spectrum color schemefrom blue to red, we show a few examples of fixed points (Figure 3)for which the DAG coloring reveals more clearly a stratificationstructure than an uncolored drawing with the same layout.

Figure 3: (Left): A fixed point of peel value 22 from the Patents cita-tion network with 742 vertices and 12985 edges shown using a basicforce-directed layout in grey. (Right): The same fixed point shownwith the DAG coloring.

Another technique we use is to manipulate the forces in a force-directed layout using the information of the DAG set ordering. In astandard force-directed layout, there is a repulsive force betweeneach pair of vertices and an attraction between pairs of verticesconnected by an edge. In addition to those forces, we apply uniqueradial forces to each of the sets of vertices in S centered at increas-ingly larger radii based on their order in S . Furthermore, we removeforces along the edges which are not part of the DAG cover. Thistends to separate out the vertex sets in the layout and helps untan-gle hairball looking fixed points when the standard force-directedlayout does not. Figure 4 shows a fixed point with four sets dis-played using a standard force-directed layout and a DAG inducedforce-directed layout. Unlike the standard layout, the DAG inducedlayout helps to easily identify independent (or “near-independent”)sets of vertices to highlight a “bipartite-like” structure even withoutusing color.

For a typical, interactive force-directed layout drawing applica-tion there is a maximum number of edges that can be renderedbefore making the system unresponsive. For such cases we couldpush the system to draw larger fixed points by only rendering the

Figure 4: A fixed point of peel value 30 from the Patents citationnetworkwith 80 vertices and 1547 edges drawnwith a force-directedlayout (Left) and with forces applied to the DAG sets (Right).

edges in the DAG cover. Figure 5 shows a graph rendered beforeand after filtering by the DAG cover. Note that the full graph tookon the order of a minute to render on a machine without a dedi-cated GPU whereas the DAG cover visualization rendered nearlyinstantly and remained interactive. The fraction of edges in theDAG cover is clearly less than or equal to the number of edges inthe original graph and in practice the reduction often exceeds afactor of 2. Despite not having all the edges of the original graph,the graph drawing of just the DAG cover edges in combinationwith the DAG coloring and DAG set forces often still shows thegeneral graph stucture.

Figure 5: A fixed point of peel value 32 from the Friendster socialnetwork with 752 vertices and 21,055 edges drawn with a force-directed layout (Left) and with only the 2,686 DAG cover edges(Right).

4 Visual ExplorationUsing the iterative edge core and wave decompositions we havedesigned an interactive graph visualization tool capable of exploringlarge graph data up to 1.8 billion edges. The system offers multiplehigh level views including a spiral view and a layer view that, in ourexperience, aid in meaningful graph exploration. These structuresallow the user to make intuitive decisions as to which subgraphs(created by our decompositions) to zoom in to for deeper analysis.We define interactivity parameters Ie and Iv to be the maximumnumber of edges and vertices respectively that a user allows tobe drawn on their screen. These parameters are predominantlydependent on screen size, GPU power, and user preference forinteractivity versus displaying more data points/links at a time.

4.1 Fixed Points Spiral ViewTo offer users an overall view of the entire collection of fixed pointsappearing in a very large graph we tabulate the fixed points of

Page 4: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

degree peeling found through the iterative edge core decomposition.We then create buckets containing connected fixed points. Theseconnected fixed points are grouped together into buckets accordingto the number of edges. Each ith bucket has fixed points of sizes such that loдi−1(m) < s ≤ loдi (m). This bucketing scheme isvisually represented by a spiral of boxes that we call the spiral view.It provides direct access to any group of similarly sized fixed pointsin the graph data set.4.1.1 Spiral View

The spiral view consists of boxes of two types corresponding tothe type of fixed points the bucket contains. If a bucket has manyfixed points we show a polar bar graph of frequencies of fixedpoints per unique size. If a bucket has only one fixed point largerthan the interactivity parameter, Ie , we show a 2D wave map asdescribed in subsubsection 4.2.1. Notice how in (Figure 6) the sizeof the spiral view of the larger Friendster network (1.8 billion edges)is about the same as that of the smaller Patents citation network(only 0.16 billion edges). Since the size of the spiral view scales atmost logarithmically with graph size it works well for navigatinggraphs at almost any scale.

Figure 6: The spiral view of the Patents citation network with3,774,768 vertices and 16,518,947 edges (Left) and the Friendstersocial network with 65,608,366 vertices and 1,806,067,135 edges(Right). The colors indicate the relative size of the buckets (num-ber of edges increasing from blue to red) and with stronger opacityindicating higher density.

4.1.2 Layer ViewThe layer view (Figure 7 top) shows a rectangular representa-

tion of buckets. The size of each sub-rectangle encodes the overallnumber of fixed points in the bucket. When a bucket is clicked,either the fixed points in the bucket are displayed sorted by peelvalue and size, or a 2D wave map (see subsubsection 4.2.1) is shownif there is only one very large fixed point (i.e. size greater thanIe ) in the bucket. Actually, the system only shows one graph foreach fixed point of the same degree distribution in a bucket andexplicitly indicates the multiplicity of such graphs. For example, ifa fixed point is annotated as x16 it means that there are 16 fixedpoints with the same degree distribution as the one being shown.Furthermore, the user can filter the displayed fixed points by peelvalue using a layer ribbon (Figure 7 bottom left).

4.2 Visualizing Waves and FragmentsFor fixed points larger than the interactivity parameter Ie we usethe wave decomposition of the fixed point to partition its edgesinto smaller chunks which can be viewed individually. To have an

Figure 7: (Top): The layer view of a bucket smaller than Ie . (BottomLeft) The first 13 layers of the layer ribbon. (Bottom Right): The 2Dwave map of a bucket larger than Ie which consists of a fixed pointwith 10,585 vertices and 103,101 edges. All of these are producedfrom the Patents citation network

overview of the whole fixed point we use 2D and 3D wave maps,that give a sense of the distribution of wave sizes in a fixed point.4.2.1 2D Wave map

A 2D wave map (Figure 7 bottom right) is a collection of con-centric rings. Each ring represents a wave and is made of multiplearcs representing connected components (or sub-waves) of a wave.The thickness of any arc represents the number of fragments inthe sub-wave and the arc length is proportional to the number ofedges in that sub-wave. The arcs are color coded by increasingwave number, from blue to red.4.2.2 3D Wave map

A 3D wave map (Figure 8) is a “vase” made of multiple conicfrustums, each representing a wave. The volume of each frustum isproportional to the number of edges in that wave and the area ofthe intersection of two frustums represents the number of sharedvertices between those two waves. Recall, waves form an edgepartition of the fixed point so any two waves may share commonvertices, however only the shared vertices of adjacent waves arerepresented in the 3D wave map. Users can either select a waveor individual fragments of wave larger than Ie to view the corre-sponding subgraph. The color coding of the frustums and the colorbar is the same as for the 2D wave map (see subsubsection 4.2.1).

5 Implementation and Results5.1 ProcedureThe process we use to perform the iterative edge-core decomposi-tion is taken from [1]. The implementation of the wave decomposi-tion was partly inspired by the implementation for computing thek-core of a graph in [10]. The connected components implemen-tation used was from the Boost Graph Library. The visualization

Page 5: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

Figure 8: (Left): The 3D wave map of a fixed point of peel value 11with 18,208 vertices and 151,196 edges from the Patents citation net-work. (Right): The 3D wave map of a fixed point of peel value 11with 126,835 vertices and 8,339,470 edges from the Friendster socialnetwork.

was developed using D3, three.js, chart.js, plotly.js. To produce ourresults we performed the following steps:(1) Input: a graph G = (V , E) represented using an edge list.(2) Pre-processing: each edge e ∈ E is reversed (i.e. e = (u,v)

becomes e = (v,u)) and appended to the original edge list. Themodified edge list is then sorted in increasing order removingduplicates. Additionally we remove self loops, i.e edges of thetype e = (u,u).

(3) Iterative edge-core decomposition: we use the parallelizedimplementation of the algorithm in [1] to compute the peellayers of the graph. This assigns a layer value to each edge inthe graph. The metadata, like that shown in Table 1 is writtento a separate output log.

(4) Connected Components: the connected components of theentire graph, as well as of each layer are computed using theBoost Graph Library. The metadata of this calculation (time,number of components) is logged as well as the metadata foreach component (size of component).

(5) Waves/Fragments:Using the metadata from the previous stepswe compute the wave decomposition of fixed points with morethan Ie = 216 edges.We also compute the connected componentsfor each wave. The metadata of this calculation (time, numberof waves) is logged as well as the metadata for each wave (size,number of components, fragment distribution). Results for thePatents citation network waves are shown in Table 2 and for theFriendster social network in Table 3We chose a wide range of of graphs, varying in both size and

domain (e.g. social, geographic, hyperlink, and co-occurrence net-works) [18]. We performed most of our experiments on a singlecomputer equipped with an Intel® Core™ i7-8750H CPU clocked at2.20GHz with 32GB of RAM. The com-friendster data set howeverwas processed on a server with an Intel® Xeon® CPU E5-2620 v2clocked at 2.10GHz with 126GB of RAM. Results of the iterativeedge-core decomposition are reported in Table 1, which includesthe graph that is decomposed, its vertex and edge count, its highestdegree, the number of layers and connected component in eachgraph, the highest peel value and number of waves from the decom-position, and the algorithm compute time and I/O time averaged

over 5 runs. Notice that the computation times of the iterative edgecore decomposition for graphs with at least 1 million edges grows as|E |

√|V |. Due to space limitations we include the results of the wave

decomposition on the Patents citation and Friendster networks inTable 2 and Table 3 of the appendix.

For our experiments (see Table 2 and Table 3) we computed thewave decomposition on all the fixed points in a layer simultaneouslyas this was actually quicker than filtering by connected components.

Although not included in Table 2 and 3, we note that layer 1 ofthe Patents citation network has 534,601 trees totaling 2,370,043vertices and 1,835,442 edges out of a total 3,774,768 vertices and16,518,947. What is surprising is that about 63% of vertices in thisnetwork are part of a forest. In the Friendster network, layer 1 has11,222,669 trees totaling 43,053,668 vertices and 31,830,999 edges.So similarly about 65% of vertices in Friendster are part of trees.

5.2 Handling Large FragmentsThe sheer size of very large networks is usually dealt with someform of iterative vertex or edge decomposition. The level of granu-larity of the decomposition may be driven by storage or computa-tional resources coupled with graph structure. The iterative edgecore decomposition first introduced in [3] has been used in [1] as auseful graph abstraction to make sense of very large graphs. In thiswork, we go two steps beyond and decompose “large” fixed pointsinto waves and if they are still “large” we decompose them furtherinto fragments (see subsection 4.2). It is worth mentioning thatthe overall approach is basically the same, i.e., iterative removal ofvertices that satisfy certain degree conditions. Fragments can beviewed as the most atomic graph types obtained by plain vanillaiterative degree removal methods. However, the main limitationis that these fragments can be “large” too and require specializedmethods beyond the degree peeling based approaches.

In these situations we propose a variation of the maximal match-ing edge contraction approach first suggested in [4]. Namely, weiteratively select a random maximal matching {e1, e2, ..., ej } andcontract its edges until the set of vertices remaining have cardinalityequal to the interactivity parameter, Iv . This contracted graph canthen be visualized as a metagraph, where each vertex representsthe subgraph of contracted edges. The combination of contractionsand matchings is an area of study that deserves further research.

6 Summary and Future WorkOur approach presents a high level computational overview of“large graphs” as sequences of “waves”. This is fundamentally dif-ferent to previous approaches in the sense that it is amenable to avisual representation (i.e. wave maps ) that is closely tied to “intu-itive” hierarchical navigation and exploration. Our next step willbe to perform user experiments to evaluate the usability of theproposed system. To our knowledge this is the first type of systemthat offers a high level representation of graphs at the billion edgescale that can be visually explored at different levels of connectivitywithin a global context.

We believe that graph partitioning processes will become usefultools to addressmassive graph computations in a divide and conquermanner. This line of thinking opens up research to investigatewhichtypes of massive graph problems can be solved by composing theirlocal fixed point and wave solutions.

Page 6: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

Table 1: Results of performing the iterative edge-core decomposition across a number of different graphs varying in size and domain sortedby number of edges. (CC: number of connected components MP: max peel value, L: number of layers, MW: max number of waves)

Graph Name |V | |E | Max Deg CC MP L MW Time (s) I/O Time (s)Gnutella P2P network (p2p-gnutella31) 62586 147892 95 12 6 5 9 0.05 0.05Astro Physics (ca-astroph) 18771 198050 504 289 56 47 10 0.14 0.10Amazon co-purchasing (amazon0601) 403394 2443408 2752 7 10 10 22 1.00 0.98California road network (roadNet-CA) 1965206 2766607 12 2638 3 3 63 1.34 1.33Berkeley-Stanford web (web-BerkStan) 685230 6649470 84230 676 201 88 314 5.37 3.50Patents citation network (cit-Patents) 3774768 16518947 793 3627 64 41 47 18.37 13.33Pokec (soc-pokec) 1632803 22301964 14854 1 47 29 45 15.62 14.86LiveJournal network (LiveJournal) 3997962 34681189 14815 1 360 119 49 54.04 35.73Orkut (com-orkut) 3072441 117185083 33313 1 253 91 88 81.62 56.90Friendster (com-friendster) 65608366 1806067135 5214 1 304 72 213 3085.42 2351.32

We close by mentioning that being able to efficiently computethe proposed wave decomposition in semi-external memory set-tings or in a streaming fashion will enhance in a major way theapplicability of our approach not only to “large graph visualization”but to massive graph computation in general.

A video demonstrating our current prototype can be accessed at:https://dl.dropboxusercontent.com/s/a4kpwv5609op73w/graphwaves_bigvis.avi. Other formats are available at: https://www.dropbox.com/sh/bowfhdrr82ti18u/AAAAFwUZwkq5pQD63Kqyh-Ela?dl=0.

AcknowledgmentsWe thank Qi Dong for helping to develop graph drawing software,Yi-Hsiang Lo for interface design, and Haodong Zheng for interfacedesign and system integration. This work was partially supportedby NSF grants IIS-1563816 and IIS-1563971.

References[1] James Abello, Fred Hohman, Varun Bezzam, and Duen Horng Chau. 2019. Atlas:

Local Graph Exploration in a Global Context. In Proceedings of the InternationalConference on Intelligent User Interfaces. ACM.

[2] James Abello and Shankar Krishnan. 2000. Navigating graph surfaces. In Ap-proximation and Complexity in Numerical Optimization. Springer, 1–16.

[3] James Abello and François Queyroi. 2013. Fixed points of graph peeling. InProceedings of the 2013 IEEE/ACM International Conference on Advances in SocialNetworks Analysis and Mining. ACM, 256–263.

[4] James Abello, Frank Van Ham, and Neeraj Krishnan. 2006. Ask-graphview: alarge scale graph visualization system. IEEE Transactions on Visualization andComputer Graphics 12, 5 (2006), 669–676.

[5] J Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespig-nani. 2006. Large scale networks fingerprinting and visualization using the k-coredecomposition. In Advances in Neural Information Processing Systems. 41–50.

[6] Daniel Archambault, Tamara Munzner, and David Auber. 2007. Topolayout:multilevel graph layout by topological features. IEEE Transactions on Visualizationand Computer Graphics 13, 2 (2007).

[7] Alessio Arleo, Oh-Hyun Kwon, and Kwan-Liu Ma. 2017. GraphRay: Distributedpathfinder network scaling. In 2017 IEEE 7th Symposium on Large Data Analysisand Visualization (LDAV). IEEE, 74–83.

[8] Benjamin Bach, Nathalie Henry Riche, Christophe Hurter, Kim Marriott, andTim Dwyer. 2017. Towards unambiguous edge bundling: Investigating conflu-ent drawings for network visualization. IEEE Transactions on Visualization &Computer Graphics (2017), 1–1.

[9] Vladimir Batagelj, Franz J Brandenburg, Walter Didimo, Giuseppe Liotta, PietroPalladino, and Maurizio Patrignani. 2011. Visual analysis of large graphs using(x, y)-clustering and hybrid visualizations. IEEE transactions on visualization andcomputer graphics 17, 11 (2011), 1587–1598.

[10] Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for coresdecomposition of networks. arXiv preprint cs/0310049 (2003).

[11] Emilio Di Giacomo, Walter Didimo, Giuseppe Liotta, Fabrizio Montecchiani,and Ioannis G Tollis. 2014. Techniques for edge stratification of complex graphdrawings. Journal of Visual Languages & Computing 25, 4 (2014), 533–543.

[12] Tim Dwyer, Nathalie Henry Riche, Kim Marriott, and Christopher Mears. 2013.Edge compression techniques for visualization of dense directed graphs. IEEE

transactions on visualization and computer graphics 19, 12 (2013), 2596–2605.[13] Dezhi Fang, Matthew Keezer, Jacob Williams, Kshitij Kulkarni, Robert Pienta, and

Duen Horng Chau. 2017. Carina: interactive million-node graph visualizationusing web browser technologies. In International Conference on World Wide WebCompanion. International World Wide Web Conferences Steering Committee,775–776.

[14] Gianni Franceschini, Fabrizio Luccio, and Linda Pagli. 2006. Dense trees: a newlook at degenerate graphs. Journal of Discrete Algorithms 4, 3 (2006), 455–474.

[15] Christos Giatsidis, Fragkiskos D Malliaros, Nikolaos Tziortziotis, CharanpalDhanjal, Emmanouil Kiagias, Dimitrios M Thilikos, and Michalis Vazirgiannis.2016. A k-core Decomposition Framework for Graph Clustering. arXiv preprintarXiv:1607.02096 (2016).

[16] Michelle Girvan and Mark EJ Newman. 2002. Community structure in social andbiological networks. Proceedings of the national academy of sciences 99, 12 (2002),7821–7826.

[17] Humayun Kabir and Kamesh Madduri. 2017. Parallel k-core decomposition onmulticore platforms. In IEEE International Parallel and Distributed ProcessingSymposium Workshops. IEEE, 1482–1491.

[18] Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large NetworkDataset Collection. http://snap.stanford.edu/data.

[19] Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, and U Kang. 2013. Interactivemulti-resolution exploration of million node graphs. In IEEE Conference on VisualAnalytics Science and Technology, Poster.

[20] Peng Mi, Maoyuan Sun, Moeti Masiane, Yong Cao, and Chris North. 2016. Inter-active graph layout of a million nodes. In Informatics, Vol. 3. MultidisciplinaryDigital Publishing Institute, 23.

[21] Quan Hoang Nguyen, Seok-Hee Hong, Peter Eades, and Amyra Meidiana. 2017.Proxy graph: visual quality metrics of big graph sampling. IEEE transactions onvisualization and computer graphics 23, 6 (2017), 1600–1611.

[22] Robert Pienta, Fred Hohman, Alex Endert, Acar Tamersoy, Kevin Roundy, ChrisGates, Shamkant Navathe, and Duen Horng Chau. 2018. VIGOR: interactivevisual exploration of graph query results. IEEE Transactions on Visualization andComputer Graphics 24, 1 (2018), 215–225.

[23] Loïc Royer, Matthias Reimann, Bill Andreopoulos, and Michael Schroeder. 2008.Unraveling protein networks with power graph analysis. PLoS computationalbiology 4, 7 (2008), e1000108.

[24] Ahmet Erdem Sariyuce, C Seshadhri, Ali Pinar, and Umit V Catalyurek. 2015.Finding the hierarchy of dense subgraphs using nucleus decompositions. InProceedings of the 24th International Conference on World Wide Web. InternationalWorld Wide Web Conferences Steering Committee, 927–937.

[25] Stephen B Seidman. 1983. Network structure and minimum degree. Socialnetworks 5, 3 (1983), 269–287.

[26] Nikolaj Tatti. 2019. Density-friendly graph decomposition. ACM Transactions onKnowledge Discovery from Data (TKDD) 13, 5 (2019), 54.

[27] Wouter van Heeswijk, George HL Fletcher, and Mykola Pechenizkiy. 2016. Onstructure preserving sampling and approximate partitioning of graphs. In Proceed-ings of the 31st Annual ACM Symposium on Applied Computing. ACM, 875–882.

[28] Yang Zhang, Yusu Wang, and Srinivasan Parthasarathy. 2017. Visualizing at-tributed graphs via terrain metaphor. In Proceedings of the 23rd ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining. ACM, 1325–1334.

[29] Hong Zhou, Panpan Xu, Xiaoru Yuan, and Huamin Qu. 2013. Edge bundling ininformation visualization. Tsinghua Science and Technology 18, 2 (2013), 145–156.

[30] Michael Zinsmaier, Ulrik Brandes, Oliver Deussen, and Hendrik Strobelt. 2012.Interactive level-of-detail rendering of large graphs. IEEE Transactions on Visual-ization and Computer Graphics 18, 12 (2012), 2486–2495.

Page 7: Graph Waves - bigvis.imsi.athenarc.gr · in Def. 6. These sets form a vertex partition of the wave vertices. We construct a directed acyclic graph (DAG) representation of the wave

Appendix

Alg. 1: Wave Decomposition (O(m))Input: Fk = (V , E), a fixed point of peel value k.Output:M = {W1,W2,W3, ...,Wm } where eachWi ⊂ M are

waves.Output: S = {S0, S1, S2, ..., Sn } where each Sj ⊂ V .

1 function waves(Fk ):2 M ← ∅

3 S ← ∅

4 i ← 15 j ← 06 while E , ∅ do7 Wi ← ∅

8 Sj ← {v ∈ V : deg(v) = k}9 while Sj , ∅ do

10 S ← S ∪ {Sj }

11 E ← E \ frag(Sj )

12 Wi ←Wi ∪ frag(Sj )

13 Sj+1 ← {v ∈ ∂Sj | deg(v) < k}

14 j ← j + 115 end16 M ← M ∪ {Wi }

17 i ← i + 118 end19 end

Table 2: Results of performing the wave decomposition on layers2 through 13 of the Patents citation network. These are only thelayers that contain a fixed point of size at least 216 edges. (L: layer,Fk : number of fixed points, W: number of waves, F: number of frag-ments). The highlighted row corresponds to the layer containingthe largest fragment of 1,356,327 edges.

L |V | |E | Fk W F Time (s)2 773841 953738 7682 7 55 2.7793 1663386 4097177 37 33 411 18.5874 198984 579406 787 15 267 1.6525 617644 2457675 42 17 610 8.1876 438875 2210942 100 47 996 8.9357 246576 1406616 66 19 619 4.028 158620 1051389 85 44 765 3.2439 80554 590337 61 25 511 1.4610 56961 469522 73 45 579 1.21111 20602 168159 80 14 188 0.27212 13507 126909 73 12 236 0.19213 15043 154245 38 13 284 0.249

Table 3: Results of performing thewave decomposition on select lay-ers of the Friendster social network. These are only the layers thatcontain a fixed point of size at least 222 edges. (L: layer, Fk : num-ber of fixed points, W: number of waves, F: number of fragments).The highlighted row corresponds to the layer containing the largestfragment of 21,128,481 edges.

L |V | |E | Fk W F Time (s)2 20031131 26084775 39188 8 99 168.93 20912756 46640174 564 27 564 887.14 1509262 4760942 4816 11 317 27.05 14030072 54607126 447 10 536 314.26 6352083 32606979 793 25 1177 233.08 2882071 19767228 193 14 903 116.111 13635761 131376950 5 75 2532 1422.015 5011964 68883449 9 46 1874 496.017 738488 11279923 16 18 970 50.821 432014 7931887 5 106 978 40.925 10460847 229286219 1 17 1127 1511.529 244598 6430697 2 15 722 23.438 3144226 107272464 1 213 3279 1828.440 216081 8035443 2 50 889 31.453 6878498 322297063 1 10 1260 2543.165 155562 9282393 1 7 415 31.671 126835 8339470 1 56 768 30.989 4570064 367562291 1 49 737 5647.4120 908835 97669888 1 2 367 418.0140 764401 94970270 1 7 443 410.1169 228332 32277749 1 29 357 115.6175 197091 32971011 1 40 931 121.1234 157908 32985831 1 128 1188 123.8304 24528 6301889 1 7 203 19.2