2SCENT: An Efﬁcient Algorithm for Enumerating … An Efﬁcient Algorithm for Enumerating All Simple Temporal Cycles Rohit Kumar Universite Libre de Bruxelles´ Brussels, Belgium

2SCENT: An Efficient Algorithm for Enumerating AllSimple Temporal Cycles

Rohit KumarUniversite Libre de Bruxelles

Brussels, BelgiumUniversitat Politecnica de Catalunya

(BarcelonaTech)Barcelona, Spain

[email protected]

Toon CaldersUniversiteit Antwerpen

Antwerp, BelgiumUniversite Libre de Bruxelles

Brussels, Belgium

[email protected]

ABSTRACTIn interaction networks nodes may interact continuously andrepeatedly. Not only which nodes interact is important,but also the order in which interactions take place and thepatterns they form. These patterns cannot be captured bysolely inspecting the static network of who interacted withwhom and how frequently, but also the temporal nature ofthe network needs to be taken into account. In this paper wefocus on one such fundamental interaction pattern, namelya temporal cycle. Temporal cycles have many applicationsand appear naturally in communication networks. In finan-cial networks, on the other hand, the presence of a temporalcycle could be indicative for certain types of fraud, and inbiological networks, feedback loops are a prime example ofthis pattern type. We present 2SCENT, an efficient algo-rithms to find all temporal cycles in a directed interactionnetwork. 2SCENT consist of a non-trivial temporal exten-sion of a seminal algorithm for finding cycles in static graphs,preceded by an efficient candidate root filtering techniquewhich can be based on Bloom filters to reduce the memoryfootprint. We tested 2SCENT on six real-world data sets,showing that it is up to 300 times faster than the only ex-isting competitor and scales up to networks with millions ofnodes and hundreds of millions of interactions. Results ofa qualitative experiment indicate that different interactionnetworks may have vastly different distributions of temporalcycles, and hence temporal cycles are able to characterize animportant aspect of the dynamic behavior in the networks.

PVLDB Reference Format:Rohit Kumar and Toon Calders. 2SCENT: An Efficient Al-gorithm to Enumerate All Simple Temporal Cycles. PVLDB,11(11): 1441-1453, 2018.DOI: https://doi.org/10.14778/3236187.3236197

The extended version of this paper is availableas [13]. Implementation of all algorithms can befound at https://github.com/rohit13k/CycleDetection.

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee. Articles from this volume were invited to presenttheir results at The 44th International Conference on Very Large Data Bases,August 2018, Rio de Janeiro, Brazil.Proceedings of the VLDB Endowment, Vol. 11, No. 11Copyright 2018 VLDB Endowment 2150-8097/18/07.DOI: https://doi.org/10.14778/3236187.3236197

1. INTRODUCTIONAnalyzing the temporal dynamics of a network is becom-

ing very popular. In 2011, Pan et al. [17] studied temporalpaths in empirical networks of human communication andair transport, and came to the conclusion that the tempo-ral dynamics of networks are poorly captured by their staticstructures: “Nodes that appear close from the static networkview may be connected via slow paths or not at all.” Thisobservation motivated research into temporal patterns indynamic graphs as an addition to the abundance of worksthat characterize networks based on their static structuresand motifs only. Recently, Paranjape et al. [18] introducedan algorithm for counting the number of occurrences of agiven temporal motif in a temporal network. In their paperthe authors show that datasets from different domains havesignificantly different motif counts, thus observing that tem-poral motifs are able to capture differences in the dynamicbehavior of temporal networks. Inspired by this line of work,our paper extends it to temporal cycles of any length.

Figure 1b illustrates our notion of a temporal cycle in thetemporal graph given in Figure 1a: a sequence of interac-tions, increasing in time, that starts and ends in the samenode. Cycles appear naturally in many problem settings:(1) In stock trading, cyclic patterns could indicate attemptsto artificially create high trading volumes; (2) In financialtransaction data, specific types of fraud lead to cycles in theinteractions [6], and recently, Giscard et al. [5] used simplecycles to evaluate balance in social networks. (3) In biolog-ical and neural networks [4], temporal cycles could indicatefeedback loops. Notice that in these applications it is es-sential that the temporal order is respected in the cycles.Consider for instance the last example; feedback loops in aneural network. In order to identify all possible feedbackloops in a neural network, a logical first step would be toidentify all cycles of interactions between neurons. To havea proper feedback loop it is important that the order of theinteractions is consistent with the order in the cycle.

We consider temporal cycles as a essential basic patterntype for temporal networks, and in this paper we study theproblem of identifying them all in huge databases of inter-actions. To avoid spurious cycles stretched out over time webound the window in which a cycle has to occur to ω. Fig-ure 1c contains some examples of cycles in the static graphwhich are not considered as they either (i) extend over a toolong time window (we used ω = 10), (ii) the interactions donot respect temporal order, or (iii) the cycle is not simple

1441

b

5,8

��

c

11��

a

1,7

??

d13

VV

7,10oo

9��

8

��f

12

^^

e

10

OO

(a) Temporal network with time stamped edges

(i) b5

��a

1??

d7

oo

(ii) b5

��a

1>>

d10

oo

(iii) b8

��a

1==

d10

oo

(iv) b8

��a

7==

d10

oo

(v) b8

��a

7>>

d

9��f

12

``

(vi) c11

}}d

8 ""e

10

OO

(b) Instances of Simple Temporal cycles for ω = 10

(i) b5

��a

1??

d

9��f

12

``

(ii) b8

��a

7>>

d7

oo

(iii) b5

��

c

11��d

13

SS

8

��e

10

OO

(c) Instances of patterns which are not Simple Temporal cyclesfor ω = 10

Figure 1: Example temporal network

in the sense that there are repeated vertices. As we willdetail in the related work section, however, the vast litera-ture on finding cycles in static graphs does not easily extendto temporal networks. Therefore we propose a new efficienttwo-phase algorithm, (2SCENT), for enumerating all simpletemporal cycles of bounded timespan.

In the first phase, called the Source Detection Phase, wegather candidate root nodes for cycles. The root node of atemporal cycle is the unique node in which the cycle startsand ends. For instance, for the simple cycle shown in Fig-ure 1b(iv), the root node is a. Surprisingly, root nodes ofcycles can be found very efficiently in one pass over the data.As a side-result we also get for each cycle its start and endtime and a superset of the nodes that appear in the cycle.

In the second phase, for every quadruple of root node,start time, end time, and set of candidate nodes, we run aconstrained Depth First Search (cDFS) algorithm. This al-gorithm is inspired by the seminal algorithm of Johnson [9].cDFS performs a depth-first search with backtracking, start-ing from the root node. In order to avoid unnecessarymultiple explorations of the same parts of the interactiongraph, for every visited node a so-called closing time is main-tained that allows to prune previously unsuccessful depth-first traversal paths. In this way we can output all simplecycles rooted at the given node in time O(c(n + m)) wherec is the number of cycles and n and m are respectively thenumber of nodes in the candidate set of the root node andthe number of interactions among these nodes in the given

time interval. Also this phase sometimes suffers from thepeculiarities of interaction networks. To handle the specialcase of networks with multiple, highly repetitive activitiesresulting in many similar cycles only differing in a few timestamps, we introduce so-called path bundles. A path bundlemaintains multiple temporal paths between the same nodes.The cycle finding algorithm is adapted to deal with thesepath bundles directly, instead of with each of the paths inthe bundle individually. In this way we can reduce the num-ber of depth-first traversal paths exponentially.

To validate the algorithm, we ran it on 6 real world datasets. The Experiments consistently show the performanceimprovements of the extensions and an improvement of twoorders of magnitude over our only competitor, the algorithmof Kumar and Calders [11]. As a qualitative experiment, westudy if temporal cycles can be used to quantify the dy-namic behavior of interaction networks. More specifically,we monitor the distribution of frequency and size of simpletemporal cycles in different kinds of interaction networks.We find that cycles of higher length are more frequent indata sets such as twitter as compared to the SMS or Face-book data sets. This observation hints that different kinds ofinformation exchange patterns occur in open social networkswhere people can interact with anyone without a friendshiplink as compared to closed social network of friends.

2. RELATED WORKSimple Cycles in a Static Graph. The classical prob-

lem of enumerating all simple cycles in a graph has beenstudied since the early 70s [20, 16, 28, 19, 31, 25, 9, 26].One algorithm that stands out both in elegance and effi-ciency is that of Johnson [9]. Johnson’s algorithm exploresa directed graph depth-first but at the same time uses acombination of blocking and unblocking of vertices to avoidfruitless traversal of paths which will not form a cycle forthe currently traversed path. For instance, if during a depth-first exploration to find cycles rooted at a, it is found thatthere is no path from b to a, b can be blocked such that inother depth-first explorations the paths originating from bare not explored in vain. When backtracking, however, somenodes can become unblocked again. Johnson’s algorithm [9]is based upon postponing the unblocking of a node as muchas possible. Using an ingenious system of cascading unblock-ing operations, Johnson’s algorithm is able to guarantee aworst case complexity of O((n + m)(c + 1)) for enumerat-ing all cycles in a directed graph, where n, m, and c denoterespectively the number of nodes, the number of edges, andthe number of simple cycles in the graph. Up to the currentdate, Johnson’s algorithm is one of the most efficient algo-rithms for directed graphs. For undirected graphs, recentlyFerreira et.al [2] presented a more optimal algorithm.

These algorithms work very well for static graphs but can-not be used directly on interaction networks. First of all, cy-cles in interaction graphs need to respect the temporal orderof the interactions, which leads to more complexity.In thispaper we provide an extension of Johnson’s algorithm for aninteraction network. Furthermore, in static networks edgesare never repeated while in interaction networks repetitionsof interactions are very common. Not taking this aspect ofinteraction networks into account leads to highly inefficientsolutions, a problem we handle by using path bundles.

Patterns in temporal graphs. Temporal graphs, alsoknow as interaction [14, 23] or temporal networks [7], are

1442

being studied using multiple approaches. One approach isto extend global properties from static graph theory suchas page rank [8, 22], shortest path [17, 24, 29], or centralitymeasures [1, 21] to temporal networks and to introduce ef-ficient algorithms to compute them. Other works focus onbetter understanding the nature and evolution of such tem-poral graphs. Recent studies use temporal motifs [10, 18]and their frequency distributions to analyze and character-ize temporal graphs. The algorithms in these two papers,however, cannot be used directly for our cycle detection al-gorithm. For the first paper by Kovanen et al. [10], motifsare considered at a higher level of abstraction. Whereas inour setting all sequences of interactions that form temporalcycles are enumerated, Kovanen et al. [10] would consider ageneric temporal cycle of length k as a pattern and count thenumber of embeddings of this generic pattern. The secondpaper by Paranjape et al. [18] on the other hand, assumesthe same setting as we do. Their work, however, concen-trates on efficiently counting the frequency of a specific givenpattern. In order to apply their algorithm for finding cycles,we would have to run it once for each cycle length. Whereasthis is certainly possible in theory, it has a number of dis-advantages, such as not knowing for which lengths we needto run the algorithm on the one hand, and the fact thatthe algorithm of Paranjape et al. [18] requires to first findall embeddings of the pattern in the static graph, withoutany temporal order or window being considered. A head-to-head comparison with our algorithm, however, would notbe fair; the authors are well-aware of this deficiency and forseveral special cases, such as triangles Paranjape et al. pro-pose efficient adaptations avoiding this costly first step. Forcycles, however, no such optimization is described and thereis no straightforward solution. The closest to our work is thework by Kumar and Calders [11], who study the same prob-lem, and propose the idea of using simple temporal cyclesand their frequency distribution to characterize the infor-mation flow in temporal networks. Kumar and Calders [11]introduce a naive algorithm which enumerates all possibletemporal paths in a window to find cycles. For interac-tion networks with large number of temporal paths this al-gorithm does not scale well. In the empirical evaluation2SCENT outperforms the algorithm of [11] by a factor of300 in terms of time. This gain in performance is becauseit is much more efficient to find roots of cycles than to findthe cycles themselves, and once the roots are known, manytemporal paths no longer have to be considered.

3. PRELIMINARIESLet V be a given set of nodes. An interaction is defined

as a triplet (u, v, t), where u, v ∈ V , and t is a strictly pos-itive natural number representing the time the interactiontook place. Interactions are directed and could denote, forinstance, the sending of a message in a communication net-work. Please note that multiple interactions can appear atthe same time. A temporal network G(V, E) is a set of nodesV , together with a set E of interactions over V . n = |V |denotes the number of nodes in the temporal graph, andm = |E| the number of interactions.

Definition 1. A temporal path between two nodes u, v ∈V is a sequence of interactions p = 〈(u, n1, t1), (n1, n2, t2),.., (nk−1, v, tk)〉 such that t1 < t2 < .. < tk and all interac-tions in p appear in E . Often we will use the more compact

Algorithm 1 GenerateSeeds

Require: Threshold ω, interactions EEnsure: All nodes s, time stamps ts and te, and a set C

such that there exists a loop from s to s using only nodesin C starting at ts and ending at te.

1: function GenerateSeeds(ω, E)2: for (a, b, t) ∈ E , ordered ascending w.r.t. t do3: if S(b) does not exist then4: S(b)← {}5: S(b)← S(b) ∪ {(a, t)}6: if S(a) exists then7: S(a)← S(a) \ {(x, tx) ∈ S(a) | tx ≤ t− ω}8: S(b)← S(b) ∪ S(a)9: for (b, tb) ∈ S(b) do

10: C ← {c | (c, tc) ∈ S(a), tc > tb} ∪ {b}11: Output (b, [tb, t], C)12: S(b)← S(b) \ {(b, tb)}13: if time to prune then14: for all summaries S(x) do15: S(x)← S(x)\{(y, ty) ∈ S(x) | ty ≤ t−ω}

notation ut1→ n1

t2→ n2 . . .tk→ v. dur(p) := tk − t1 denotes

the duration of the path, len(p) := k its length.A temporal path p is called a simple temporal path if no

node appears more than once in p. p is valid for a giventime window ω if dur(p) ≤ ω.

For example, in the temporal graph of Figure 1a, b5→

d8→ e

10→ c11→ d is a temporal path, but it is not a simple

temporal path as node d appears more than once in the path.

The duration of the path is 11− 5 = 6. b5→ d

8→ e10→ c is a

simple temporal path with duration 5.

Definition 2. A temporal cycle with root node u is a tem-poral path from u to itself. The cycle is called simple if eachinternal node in the cycle occurs exactly once. More specif-ically, a simple temporal cycle c with root node u consist

of a simple temporal path ut1→ n1 . . .

tk−1→ v followed by aninteraction (v, u, tk) with tk > tk−1. We consider a simpletemporal cycle to be valid for time window ω if the durationof the cycle is less than or equal to ω.

For example, the cycle in Figure 1c(i) is a simple temporalcycle but is not valid for ω = 10. Please note there could bemultiple cycles from the same root node of different lengthand duration. For example, Figure 1b (i)-(iv) represents4 different temporal cycles with the same root node a ofthe same length but with different durations. The cycles inFigure 1b (ii) and (iii) have the same duration and lengthbut still represent different cycles.

Definition 3. Simple Cycle Enumeration (SCE)Given a temporal network G(V, E) and a time window ω,enumerate all simple temporal cycles C with dur(C) ≤ ω.

In Figure 1a, the solution of SCE with ω = 10 are the

cycles of Figure 1b plus b5→ d

13→ b and b8→ d

13→ b.

4. SOURCE DETECTION PHASEIn this and the next two sections, we will address the

problem of efficiently finding all simple temporal cycles in

1443

a given temporal network. As temporal networks are gen-erally very large graphs, performing a DFS (Depth FirstSearch) or BFS (Breadth First Search) scan for every nodein the network would be very time consuming. Hence, wepresent a two-phase approach to efficiently find all simplecycles. In the first phase, we pass once over the interactionsof the given temporal network to identify the root nodes andthe start and end times of all cycles. We also get a set ofcandidate nodes which form a superset of the nodes presentin the cycle. We call this phase the Source Detection phase.The details of this phase are given in this section. We alsopresent a memory efficient variation of the source detectionphase using Bloom Filters, which requires two passes overthe data but is more memory and time efficient for partic-ular cases in which there are many temporal paths. In thesecond phase, which we will discuss in Section 5, we use theidentified root nodes from the first phase to find temporalcycles using a constrained DFS. The details of this phase aregiven in Section 5. Finally, in Section 6 we present an op-timization of our two-phase algorithm for special cases withmany repeated interactions.

4.1 Reverse Reachability SummaryWe find the source node and candidate sets by maintaining

a so-called reverse-reachability summary S(u) for all u in V .The reverse reachability summary of u at time t, denotedSt(u), is defined as the set of pairs (x, tx) such that there isa temporal path p from x to u starting at time tx and withtx ≥ t − ω within the set of interactions up to time stampt. Maintaining the summary is straightforward; whenever

an interaction at→ b is processed we add (a, t) to S(b) as it

captures the path of length 1 due to this new interaction.Also, every path to a is now extended to b, hence we add allpairs in S(a) to S(b). We remove paths which are older thanω; that is, pairs (x, tx) such that tx < t − ω. We call thisold path pruning. Whenever there is a path from b to b after

processing the new interaction at→ b; that is, there is a pair

(b, tb) ∈ S(a), we know there is a cycle with b as source node,that starts at tb and ends at t. Furthermore, every node x

in this cycle which was completed by at→ b is connected to

a and hence there must be a pair (x, tx) ∈ S(a). In this waywe can also construct a candidate set {x | ∃(x, tx) ∈ S(a) :tb < tx < t}.

Example 1. Consider the interaction in the example Fig-ure 1a. Before processing the interaction (d, a, 8), the sum-maries of nodes a and d are S(a) = {} and S(d) = {(a, 1),(b, 5)} respectively. While processing (d, a, 8) the summaryof a is updated to S(a) = {(b, 5), (d, 8)} and as there is(a, 1) in the summary of d it generates a seed candidate as(a, [1, 8], {b, d}). This seed candidate actually correspondsto the simple cycle in Figure 1b(i).

The details of the algorithm are given in Algorithm 1.One detail that still needs clarification is the inactive nodepruning (steps 13-15). In this step, at regular time instancesall pairs (x, tx) such that tx ≤ t − ω is removed from thememory. In this way we ensure that memory does not getfilled with summaries of nodes which are no longer active.In all our experiments we noticed that the overhead of thisstep was negligible because when executed regularly, onlynodes which were active within the past window of size ωwill have a summary, but the memory saving were huge.

Theorem 1. Let m = |E|, n = |V |, W be the numberof interactions in a window of size ω, and c the numberof valid temporal cycles. Algorithm 1 generates one tuple(a, ts, te, C) for each cycle c that starts and ends in a withrespectively an interaction at time ts and one at time te.All nodes of the cycle are in C. Furthermore, for each tuple(a, ts, te, C) output by the algorithm, a corresponding cycleexists. The time complexity for handling one interaction isbounded by O((m + c)W ), and the memory complexity isO(min(n,W )W ).

4.2 Improvements using Bloom FiltersDespite the regular pruning, the summaries may still grow

very large for large window lengths or large networks, caus-ing out-of-memory problems. This problem occurs for in-stance when there are many long temporal paths within thewindow of length ω. Therefore, for such extreme cases, wefurther refine the source detection phase by using a Bloomfilter [3] as summary. A Bloom filter is a compact datastructure for representing sets which allows for membershipqueries. It consists of an array B of q bits and uses k inde-pendent hash functions h1, . . . , hk that hash the elementsto be stored in the set uniformly over the set of valid indices1 . . . q for B. Initially all bits in the bitmap index are 0.Whenever a new element a arrives, all bits h1(a), . . . , hk(a)are set to 1. Whenever we need to know if an element xis in the set represented by B, we test if all entries h1(x),. . . , hk(x) are 1. If x was added to the Bloom filter at somepoint, for sure these bits must all be 1. Notice that theremay be false positives if the combined bits set to 1 by theother elements in the set cover all the bits for x. False neg-atives, however, are impossible. For the exact details on theBloom filter and how to select optimal values for q and k infunction of the number of elements to store in the set andthe false positive probability, we refer to [3]. If we have twoBloom filters representing sets S1 and S2, we can constructthe Bloom filter for their union by taking the bitwise OR ofthe two Bloom filters. Taking the intersection of two Bloomfilters can be done by taking the bitwise AND. In contrastto the union, however, the Bloom filter for the intersectioncannot be constructed exactly with this construction. Wewill denote the bitwise AND (respectively OR) of two Bloomfilters B1 and B2 with B1 ∩B2 (respectively B1 ∪B2).S(a) will hence be replaced by a Bloom filter B(a), that

represents the set of all nodes that can reach a. Whenever

an interaction at→ b is processed, we test if b is a hit for

the Bloom filter of a. If so, b will be listed as a potentialcycle source node. Then we union the Bloom filter of B(a)with that of B(b) to get the new Bloom filter for b. Usingthe Bloom filter approach we guarantee that all summarieshave equal (restricted) length and cannot grow unbound-edly. Notice, however, that this schema has a number ofdisadvantages as well. We list them in increasing order ofseverity: (1) There may be false positives when we test forb ∈ S(a). This will incorrectly lead to the conclusion thatthere is a cycle rooted at b. These spurious root nodes,however, will be eliminated in the second phase of the al-gorithm that will be discussed later. False positives do notaffect the correctness of the complete 2SCENT algorithmalthough they will affect the efficiency. (2) we can no longerapply the old path pruning because the Bloom filter doesnot contain the information when elements were added toit. We handle this problem by inactive nodes pruning. In

1444

Algorithm 2 GenerateSeedsBloom

Require: Threshold ω, interactions EHash functions h1, . . . , hk, Bloom filter size q.

Ensure: Candidate root nodes s with start and end time ofthe cycle and a bloom filter representing the candidateset. It is guaranteed that for each temporal simple cyclethere will be such a four-tuple.

1: function GenerateSeedsBloom(ω, E)2: fwSeeds ← ∅3: for (a, b, t) ∈ E , ordered ascending w.r.t. t do4: fwSeeds ← fwSeeds∪ processEdge(a,b,t,ω)

5: Remove all bloom filters6: bwSeeds ← ∅7: for (a, b, t) ∈ E , ordered descending w.r.t. t do8: bwSeeds ← bwSeeds∪ processEdge(b,a,t,ω)

9: Output all (a, [ts, te], (Bf ∩ Bb)) s.t. there exists(a, te, Bf ) ∈ fwSeeds and (a, ts, Bb) ∈ bwSeeds with 0 <te − ts ≤ ω

10: function processEdge(a,b,t,ω)11: seeds ← {}12: if B(b) does not exist or |Last(b)− t| > ω then13: B(b)← [0, . . . , 0] . Empty bloom filter

14: Set bits h1(a), . . . hk(a) to 1 in B(b)15: Last(b)← t . Update last modified time stamp16: if B(a) exists and |Last(a)− t| > ω then17: if h1(b), . . . , hk(b) all 1 in B(a) then18: seeds ← {(b, t, B(a))}19: B(b)← B(b) ∪B(a) . Bitwise or

20: if time to prune then21: for all summaries B(x) do22: if |Last(x)− t| > ω then remove B(x)

23: return seeds

inactive nodes pruning, we keep for every node a the lasttime, denoted Last(a), that B(a) was updated. In this waywe can prune all nodes that have not been active within thecurrent window. This pruning mechanism is less effective,but at least bounds the number of summaries that simul-taneously need to be held in memory. (3) The last, mostsevere disadvantage is that because of the use of a Bloomfilter we are no longer able to capture the starting time ofcycles. Indeed, where S(a) contains pairs (b, tb), B(a) canonly be used to test if there is a pair (b, ?) in S(a). Thisproblem can be resolved with an additional pass throughthe data. This additional pass is based on the observationthat every cycle rooted at node v that starts at ts and endsat te becomes the root node of a cycle starting at te andending at ts if we reverse time and the direction of all inter-

actions. E.g., the temporal cycle a1→ b

2→ c3→ a becomes

the inverse temporal cycle a3→ c

2→ b1→ a. In the end

we generate candidates by combining the inverse temporalcycle roots with the normal cycle roots.

Combining these elements we get Algorithm 2. The func-tion processEdge is similar to the function GenerateSeedsin Algorithm 1 with a difference that instead of the exactset summary S(a), a bloom filter B(a) is maintained andupdated. Also, instead of pruning individual nodes in thesummary set of S(a) based on the time of addition in the setwe reset the bloom filter B(a) if it has not been updated in a

window of size ω. As processEdge is used for both a forwardscan and a backward scan while checking for last update wetake an absolute difference of current time and update timein steps 12, 16, and 22. In the end, to find all root nodes withstart time, end time, and the bloom filter consisting of thecandidate nodes, the interactions are scanned both forwardand backwards. In steps 2-4 the forward scan is performedby processing every interaction (a, b, t) to find the end time,root nodes, and candidate sets of all cycles, which are storedin fwSeeds. Then in steps 6-8 a backward scan is performedby processing edges in reverse to find the start time, rootnode, and candidate set for each cycle, which are stored inbwSeeds. Finally, in step 9 we merge fwSeeds and bwSeedsto generate the final seed candidates.

Example 2. Consider again the example of Figure 1a. Af-ter the initial forward scan, we will have candidate rootswith end time and a Bloom filter for the candidates. For thissimple example, fwSeeds will contain at least the followingcandidates: {(a, 8, B4), (a, 10, B5), (a, 12, B6), (d, 11, B7)}. Af-ter the subsequent backward scan the set of backward seedswill be {(a, 1, B1), (a, 7, B2), (d, 8, B3)}. The next table liststhe compatible pairs and the resulting candidate set:

nr fwSeeds bwSeeds Candidate1 (a, 8, B4) (a, 1, B1) (a, [1, 8], B1 ∩B4)2 (a, 8, B4) (a, 7, B2) (a, [7, 8], B2 ∩B4)3 (a, 10, B5) (a, 1, B1) (a, [1, 10], B1 ∩B5)4 (a, 10, B5) (a, 7, B2) (a, [7, 10], B2 ∩B5)5 (a, 12, B6) (a, 7, B2) (a, [7, 12], B2 ∩B6)6 (d, 11, B7) (d, 8, B3) (d, [8, 11], B3 ∩B7)

In the second step of our algorithm the candidates will gener-ate the following cycles of Figure 1b: Candidate 1 generates(1), candidate 2 is a false positive due to the merging oper-ation and will not generate any cycle (issue (3) mentionedabove). Candidate 3 generates (ii) and (iii), candidate 4,(iv), candidate 5, (v), and finally candidate 6, (vi).

Theorem 2. Let q be the size of the bloom filters, Wbe the maximal number of interactions in a window of sizeω. The complexity of processing one interaction with pro-cessEdge is O(q). The time complexity of Generate-SeedsBloom is O(q(m + c′)) where c′ denotes the numberof cycle candidates that are generated by the merge of for-ward and backward candidates. The memory complexity isO(q min(W,n)).

4.3 Combining Root Node Candidate TuplesAn essential last step before we can proceed to the exact

cycle finding, is combining seeds for efficiency, and avoid-ing overlapping seeds. Suppose for instance that there ex-ist 3 cycles rooted at a, with start and end times respec-tively [100, 110], [106, 110], and [105, 120]. GenerateSeedswill produce three seeds (s, [100, 110], C1), (s, [106, 110], C2),and (s, [105, 120], C3). The second cycle, however, is in-cluded in all three seeds and will be generated three timesby the cDFS algorithm we will introduce in the next sec-tion. Furthermore, we can merge some of the highly over-lapping candidates. Consider again the example of Figure 1.For all the cycles rooted at a Figure 1b(i)-(v), the cor-responding seeds are (a, [1, 7], {b, d}), (a, [1, 10], {b, d, e, f}),(a, [7, 10], {b, d, e, f}), and (a, [7, 12], {b, d, e, f}). The firstthree seeds could be combined into a single seed given as

1445

(a, [1, 10], {b, d, e, f}). A cDFS run on this single seed willgenerate all the cycles rooted at a; i.e., cycles 1b(i)-(iv),by considering interactions only in interval [1, 10] betweenthe candidate nodes {b, d, e, f}. Furthermore, additionallywe will also record the starting time of the next seed withthe same root and add this information in the seed nodesto obtain the extended candidates: (a, [1, 10], 7, {b, d, e, f})and (a, [7, 12], 12, {b, d, e, f}) (The value 12 in the secondseed is a dummy value as there is no next seed). cDFS willuse these extended candidates (s, [ts, te], tn, C) to generateexactly those cycles rooted at s, consisting only of verticesin C, starting in the interval [ts, tn[, and ending the latestat time te. By adding the restriction on tn we avoid du-plicate cycle generation. The algorithm to combine seedsrooted at a single node s is given in Algorithm 3. It startswith sorting all candidates on start time ascending and endtime descending. Subsequently it gets the first non-mergedcandidate and merges it with all following compatible candi-dates. This procedure is repeated until all candidates havebeen processed. In this way we are often able to compressthe list of candidates considerably.

Algorithm 3 Combining Root Node Candidate

Require: List of cycle seeds C for a root node s. Each seedis of the form (s, [ts, te], C), window length ω

Ensure: Combined candidates1: function CombineSeeds(C,ω)2: Sort C on ts ascending, then te descending.3: while C not empty do4: Let (s, [ts, te], C) be first in C5: Let Compatible be the maximal prefix of C such

that for all (s, [t′s, t′e], C′) ∈ Compatible it holds that

t′e < ts + ω6: C ← C \ Compatible7: if C is empty then tn ← ts + ω8: else9: Let (s, [t′s, t

′e], C′) be first in C

10: tn ← t′s11: tmax ← max{t′e | (s, [t′s, t

′e], C′) ∈ Compatible}

12: Call ←⋃{C′ | (s, [t′s, t

′e], C′) ∈ Compatible}

13: Output (s, [ts, tmax], tn, Call)

Theorem 3. Algorithm 3 ensures that for every temporalcycle rooted at s, which start at ts and end at te, there isexactly one extended seed (s, [t′s, t

′e], tn, C) that contains the

cycle; that is: all nodes of the cycle are in C, ts ∈ [t′s, tn[,and te ∈ [t′s, t

′e].

5. CONSTRAINED DFSAfter finding candidates, we find the exact cycles. For

each extended candidate (s, [ts, te], tn, C) we run our con-strained Depth-First Search to find all cycles represented bythis candidate. Algorithm 7 gives the complete procedure.We will now step by step describe how this procedure works.

We apply a depth-first procedure to find all temporalpaths in a dynamic graph. If the path reaches a node whichis the same as the start node, we output it as a cycle. Westart with a given node s and a start time ts. All edgesthat branch out of s at this time stamp are now recursivelyexplored. A pure depth-first exploration, however, has thedisadvantage that some unsuccessful paths will be explored

Algorithm 4 Unblock

Require: Node v that gets a new closing time tv.Global: interactions E , closing times ct(v) and unblocklist U(v) for all nodes v ∈ V .

Ensure: Recursive unblocking of the nodes.1: function Unblock(Node v, time stamp tv)2: if tv > ct(v) then3: ct(v)← tv4: for (w, tw) ∈ U(v) do5: if tw < tv then6: U(v)← U(v) \ {(w, tw)}7: T [w, v] = {t | (w, v, t) ∈ E}8: T ← {t ∈ T [w, v] | tv ≤ t}9: if T 6= ∅ then

10: U(v)← U(v) ∪ {(w,min(T ))}11: tmax ← max{t ∈ T [w, v] | t < tv}12: Unblock(w,tmax)

Algorithm 5 Add to unblock list

Require: Unblock list U(v) of node v, pair (w, t) to beadded

Ensure: New unblock list U(v) with (w, t) added.1: function Extend(U(v),(w, t))2: if there is an entry (w, t′) ∈ U(v) then3: if t′ > t then U(v)← U(v) \ {(w, t′)} ∪ {(w, t)}4: else U(v)← U(v) ∪ {(w, t)}

over and over again. Consider for instance the example inFigure 2 without the dotted lines. As there exist 2 paths froma to c, an exhaustive depth-first exploration of all paths willvisit node c two times, and each time the subgraph formedby h, j, and k will be explored again. In order to avoidsuch fruitless repeated explorations, we will keep track ofthe success status of different nodes in earlier depth-firstexplorations of the dynamic network. This information isstored in the form of a so-called “closing time” of a node.Intuitively, node v having closing time ct(v) indicates thatthere do not exist paths back to a from node v that start attime ct(v) or later. Hence, if during the depth-first explo-ration, we arrive at a node on or after its closing time, thenwe can abort our search. So, while exploring node h, arriv-ing there at 11, we will notice that there are no paths fromh back to a and hence its closing time will become 11 and hwill never be expanded again. Similarly, after the first timewe visit node c, we will notice that the last path from c backto a starts at timestamp 7, so its closing time will become7. Due to this update in closing time of c any depth-firstexploration of c will be aborted from timestamp 7 onwards.

Let’s illustrate the principle with our example graph. Forthe subsequent steps we will show how the closing times ofthe nodes evolve and how this saves us costly repetitionsof useless explorations. For now the reader does not needto worry about how the closing times are affected by back-tracking to find additional solutions as this will be treatedin detail right after the example.

• a1→ b: ct(b) becomes 1; this node cannot be used to ex-

tend the path without violating the simplicity condition;

• b5→ c: ct(c) becomes 5;

1446

Algorithm 6 Algorithm AllPaths

Require: Prefix path st1→ v1

t2→ . . .tk→ vk that starts in

target node s.Ensure: All simple temporal paths in G(V, E) from v1 to

s, starting with the given prefix are output. The returnvalue is false if no such path exists, otherwise it is true.

1: function AllPaths(pr = st1→ v1 . . .

tk→ vk)2: vcur ← vk, tcur ← tk3: ct(vcur)← tcur, lastp← 04: Out← {(vcur, x, t) ∈ E | tcur < t}5: N ← {x ∈ V | (vcur, x, t) ∈ Out}6: if s ∈ N then7: for (vcur, s, t) ∈ Out do8: if t > lastp then lastp← t

9: Output pr · 〈(vcur, s, t)〉10: for x ∈ N \ {s} do11: Tx ← {t | (vcur, x, t) ∈ Out}12: while Tx 6= ∅ do13: tm ← min(Tx)14: pass← False15: if ct(x) ≤ tm then pass← False16: else pass← AllPaths(pr · 〈(vcur, x, tm)〉)17: if not pass then18: Tx ← ∅19: Extend(U(x),(vcur, tm))20: else21: Tx ← Tx \ {tm}22: if tm > lastp then lastp← tm

23: if lastp > 0 then Unblock(vcur,lastp)

24: return (lastp 6= 0)

• We explore recursively all paths that start with c11→ h.

No paths are found, hence during this recursion ct(h),ct(j), and ct(k) become respectively 11, 13, and 14;• Via recursive calls we find a path from c that start with

c7→ e and c

6→ d. We hence derive that the latest pathleaving c starts at time 7. Hence, when backtracking,ct(c) becomes 7, and during the recursive calls also theclosing times of the other nodes are updated.

In order to find additional paths, we backtrack and find thenext solution. Suppose now that we already explored the

subspace of all cycles that start with a1→ b. At this point

in time the closing times are as follows:

a b c d e f h j k− 5 7 8 10 12 11 13 14

• a5→ c can be explored next, because 5 < ct(c) = 7.

• From c we cannot go to node h because 11 6< ct(h).• From there on we continue to find our last 2 paths.

So far so good, but until now we have been ignoring a majorproblem with the closing times when backtracking to findthe next solution: while backtracking, the path becomesshorter again, and nodes become available again which onits turn may affect the correctness of the closing times. Weillustrate this problem by slightly extending the example inFigure 2 by adding the dotted lines. When exploring all

paths starting with the edge a1→ b, the node b temporarily

gets ct(b) = 1 to force that our cycles are simple. As a

b

5,8,10

��

17

��

h13 //

14��

j

16

||

a

1

@@

5 // c7//

6��

11

88

e

10

��

k

15

OO

d

8

__

f

12

WW

Figure 2: Example temporal network with simple cycles

Algorithm 7 Dynamic Depth-First Simple Cycle Search

Require: Source node s ∈ VGlobal: Interaction network G(V, E); closing time ct(v)and unblock list U(v) for all nodes v ∈ V ; Timestampts,te and tn; Set of candidates C ⊆ V

Ensure: All simple temporal cycles in E rooted at s start-ing in interval [ts, tn[ and ending before te, using onlyvertices of C.

1: function Cycle(s)2: E ← {(u, v, t) ∈ E | u, v ∈ C, t ∈ [ts, te]} . Reduce G3: V ← C4: for x ∈ C do5: ct(x)←∞, U(x)← ∅6: for (s, x, t) ∈ E|t < tn do

7: AllPaths(st→ x)

result, when recursively exploring all paths with prefix a1→

b5→ c, we will conclude there is no path from h, k, and

j back to a and set their closing times to 11, 13, and 14respectively. As a result, later on, when exploring all paths

with prefix a1→ b

8→ c and a1→ b

10→ c, we will correctlyabort exploration of the branch below h. However, when thesearch continues, at a certain point we will have explored

all paths starting with a1→ b, and we are back at node

a. The closing time of b is set to 17 because of the cycle

a1→ b

17→ a. We continue exploring all paths that start

with a5→ c. It is at this very moment that things start

becoming ugly. Indeed, at this point in time, we do haveto explore the branch below h, because now there is a cycle

that involves h, namely a1→ c

11→ h13→ j

16→ b17→ a! So, what

went wrong? The first time we visited node h, node b wasblocked as it appeared on the path from a to h. Therefore,we correctly concluded that h should be blocked, too. Thissituation remained until the point that b became unblockedbecause of backtracking. At that point, in fact, the closingtime of h should have been reconsidered. The mechanism torealize the correct update of the closing times is as follows:whenever we limit the closing time of a node, at the sametime we also evaluate under which conditions the closingtime of the node can increase again. In the case of node j,we see that there is an outgoing edge with time stamp 16

1447

s x y

t

pblock

pr.x

Figure 3: Illustration of the concepts introduced in theproof of completeness of AllPaths.

to node b with closing time 1. Hence, from the moment onthat the closing time of b increases to above 16, the closingtime of j should increase to 16. For this purpose, we addfor every node an “unblock list” U(v) that contains a listof nodes and thresholds (w, t). From the moment on thatthe closing time of v exceeds again the threshold t, for eachpair (w, t) in U(v), the closing time of node w will have tobe adapted as well. In our example this amounts to adding(j, 16) to U(b). Whenever we increase the closing time ofany node v in the graph, we will go over its unblock list andunblock the other nodes as needed. Notice that unblocking anode may result in a cascade of unblock operations; indeed,in our example, unblocking b causes j to become unblocked,which on its turn causes h and k to become unblocked. Thepseudo code of the algorithm is given in Algorithms 4, 6,and 7.

Theorem 4. Correctness. Cycle(s) returns all simplecycles rooted at s starting in interval [ts, tn[ and ending be-fore te.

Proof Sketch First of all, it is important to realize that thecDFS algorithm is a truncated depth-first search: all pathsare explored from a node s that a normal depth-first searchwould also explore, except for paths that (a) do not respectthe temporal order, (b) contain duplicate nodes other thans, or (c) paths that are blocked because of a closing time thatis too low. The fact that it is a depth-first search togetherwith (a) and (b) guarantees the correctness of each simpletemporal cycle that is output. For completeness, from (a)and (b) it is easy to see that these cases do not restrictthe completeness of the algorithm. Hence, what is left toshow is that whenever there exists a simple temporal cy-

cle a1t1→ a2 . . .

tn→ antn→ a1, none of the edges ai−1

ti→ ai

is blocked because of the closing time of ai at the moment

that a1t1→ a2 . . .

ti−1→ ai−1 is explored in the depth-firstsearch. Suppose, for the sake of contradiction, that never-

theless at some point in the algorithm, the interaction xt→ y

is blocked while there exists an extension of the current pre-fix to s that uses this edge. Let py→s be this path such that

pr · x t→ y · py→s is the simple temporal cycle that is notfound. Furthermore, we can assume without loss of gener-

ality that this is the first time this happens. xt→ y must

have gotten blocked in another path pblock that reached y(only the closing times of the last nodes on a path can de-crease). This situation is depicted in Figure 3. It is clearthat pblock · py→s is a temporal cycle. This temporal cycleis either simple, in which case y won’t get blocked, or the

path is not simple, which means that pblock contains at leastone of the nodes of py→s. In the full proof [13] it is shown,however, that in such case there is a chain of unblock opera-tions which will be invoked when the depth-first explorationback tracks from the path pblock, and it can be shown thatwhen the last common node between pblock and py→s gets

unblocked, xt→ y must be free again (that is: ct(y) > t).

This contradicts our earlier assumption that the interaction

xt→ y is blocked and hence proves the theorem. An impor-

tant invariant that facilitates the full proof is the followingconsistency between closing times and unblock lists that isguaranteed at crucial times: whenever there is a path fromy to s that starts at time t and does not intersect the cur-rent prefix, ct(y) > t. In all other cases, ct(y) ≤ t, and(x, t′) ∈ U(y) with t′ ≤ t. 2

Theorem 5. Complexity. Let m = |E| and n = |V |.We can implement Cycle(s) in such a way that in betweentwo cycles being output, Cycle(s) takes at most O(m + n)steps. Hence, if there are c cycles in the network, the totaltime complexity to find all of them is O((c + 1)(m + n)).

Proof Sketch The proof of this theorem is based on the ob-

servation that the only way to unblock an edge xt→ y; that

is, lower the closing time of y to lower than t, is by a callto Unblock, which only happens when a cycle is output.

Whenever a cycle a1t1→ a2 . . .

tn→ antn→ a1 is found, Un-

block will be executed for an (for the prefix a1t1→ a2 . . .

tn→an), then for an−1 (for the prefix a1

t1→ a2 . . .tn−1→ an−1), etc,

until it is called for a1 (for prefix a1). Each of the calls toUnblock, however, unblock different interactions. Indeed,it can be shown that, if we are in a call to AllPaths, and

the prefix is a1t1→ a2 . . .

ti−1→ ai, a call to Unblock, only

unblocks interactions xt→ y such that there is a temporal

path to the source node a1 that uses this interaction and ai,and at the same time there is no path to the source node a1

that does not intersect te prefix anywhere. This seeminglycumbersome condition is actually quite intuitive: Unblockonly unblocks interactions that at the moment cannot beused in a path to the source node a1, but once ai becomesavailable again, they can be used again. This condition im-mediately leads to the conclusion that every interaction canbecome unblocked at most once in between two cycles arebeing output. Therefore, since the depth-first search blocksthe interactions it visits, every interaction can be traversedat most twice in between two cycles are output. Hence, ei-ther a cycle is output after O(m) steps, or all interactionswill be blocked and the algorithm stops. The term O(n) inthe theorem is for the initialization of the closing times ofall nodes at the start of the algorithm. We furthermore referto the full proof in the extended version of the paper [13]for an in-depth discussion on the data structures that canbe used in order to guarantee that we can run the algorithmwithout the need to inspect blocked edges. 2

6. PATH BUNDLESThe algorithm presented in the last section still has one

big disadvantage: especially in the presence of repeatededges the same paths and cycles can be explored over andover again. Consider for instance the example in Figure 4.In this example there are 36 = 729 cycles and each of them

1448

will be generated separately. There will be one call startingwith a, 3 for a→ b, 9 for a→ b→ c, etc. A lot of this workcould be avoided though by combining the computations formultiple edges and paths. It is exactly for this purpose thatwe introduce the following notion of a path bundle.

Definition 4. A path bundle B in an interaction networkG(V, E) between nodes v1 and vk+1 consists of a sequence ofvertices v1, . . . , vk+1, and sets of timestamps T1, . . . , Tk suchthat for all i = 1 . . . k, t ∈ Ti it holds that (vi, vi+1, t) ∈ E .

We will denote the path bundle B by v1T1→ v2

T2→ . . .Tk→ vk+1.

The set of temporal paths represented by B, denoted P(B)is defined as:

P(B) := {v1t1→ v2 . . .

tk→ vk+1 | ∀i : ti ∈ Ti & t1 < . . . < tk}

A path bundle is called minimal if for all i = 1 . . . k, t ∈ Ti

it holds that

P(v1T1→ . . . vi

Ti\{t}−→ . . .Tk→ vk+1) (

P(v1T1→ . . . vi

Ti→ . . .Tk→ vk+1)

Lemma 1. Let B be a path bundle. There exists a uniqueminimal path bundle B′ such that P(B) = P(B′)

6.1 Expanding a BundleIn order to extend our algorithm to work with path bun-

dles instead of individual paths, we need to extend all oper-ations performed on paths in the algorithm to bundles. Thefirst operation we consider is extending the path with an ex-tra edge. This operation is easy enough, as we can just addthe edge with all its timestamps to the bundle. We do want,however, to keep the bundles minimal for efficiency reasons.Algorithm 8 does exactly that; it extends a bundle with anedge while maintaining the minimality of the bundle.

Let’s illustrate with an example. Suppose we have a path

bundle a1,5,7→ b

3,8→ c which we want to extend with theedges c

2,4,7→ d. Since there is no edge from b to c earlierthan timestamp 3, we can prune away 2 from the pathsbetween c and d. Furthermore, the last edge between cand d has timestamp 7, so all edges between b and c laterthan 7 should be removed. Only the edge with timestamp3 remains between c and d which causes the timestamps 5and 7 between a and b to be removed. Hence, the result of

the extension is: a1→ b

3→ c4,7→ d.

b

4

��5 //

6

@@ c 7

��

8

��9

11a

1 --

2

??

3

RR

d

10mm11

��

12

��f16

LL

17

^^18

qq

e

13

__ 14oo

15��

Figure 4: Example temporal network with simple cycleshaving multiple repeated edges

Lemma 2. Given a minimal bundle B between u and vand a bundle v

T→ w, Algorithm 8 returns a minimal bundleB′ such that P(B′) consists of all temporal paths from u tow that can be constructed by extending a path from P(B)

with an edge from vT→ w.

6.2 Extending the Algorithm to BundlesBy directly manipulating path bundles instead of individ-

ual paths we can significantly reduce the number of recur-sions needed as well as output the cycles much more com-pactly. In algorithm 9 we provide extensions of the algo-rithm presented in 6 to consider the path bundle notion.There is not much change in algorithm 7 except at step 7where instead of looking for path from x to the root nodes using algorithm 6, a path bundle is searched using algo-rithm 9. The output of the algorithm 9 is not all the simpletemporal cycles as we required, but a more compact repre-sentation of cycles using the path bundles.

6.3 Counting the Number of Paths in a BundleFor some applications we need the exact number of paths

represented by a bundle. This number, however, is not en-tirely straightforward to obtain efficiently. Indeed, we mayeasily come up with a recursive procedure that generates allvalid combinations of the timestamps, but that would some-what defy the purpose of the bundles, which is exactly toavoid such costly individual treatment of the paths.

Luckily it is not hard to develop a dynamic algorithmto count the number of paths in a path bundle. We can

iteratively compute the number of paths in a bundle v1T1→

. . .Tk→ vk+1 by considering all the prefixes of the bundle in

increasing length. For each prefix Pi = v1T1→ . . .

Ti→ vi+1, thenumber of paths are stored on a heap Hi. For each end timet of a path in Pi, the number of paths n ending at that timeor earlier is stored as a pair (t, n) on the heap. The heapHi+1 can easily be computed based on Ti and Hi. Due tospace constrained full details of this algorithm are providedin the technical report [13].

7. EXPERIMENTSWe evaluated the performance of our algorithms on 6 dif-

ferent real world temporal networks. The performance re-sults presented in this section are for a C++ implementa-tion of our algorithm. All experiments were run on a simpledesktop machine with an Intel Core i5-4590 CPU @3.33GHz

Algorithm 8 Extending a path bundle with an edge bundle

Require: Minimal path bundle B = v1T1→ . . .

Tk→ vk+1, edge

bundle E = vk+1

Tk+1→ vk+2

Ensure: Minimal path bundle with all valid paths com-posed of B and an edge of E.

1: function Expand(v1T1→ . . .

Tk→ vk+1,vk+1

Tk+1→ vk+2)2: T ′k+1 ← {t ∈ Tk+1 | t > min(Tk)}3: if T ′k+1 = ∅ then

4: return (v1∅→ . . . vi

∅→ . . .∅→ vk+2)

5: for i = k down to 1 do6: T ′i = {t ∈ Ti | t < max(T ′i+1)}

7: return (v1T ′1→ . . . vi

T ′i→ . . .T ′k+1→ vk+2)

1449

Algorithm 9 Algorithm AllBundles

Require: Prefix bundle B starting in node sGlobal: Interaction network G(V, E), closing times ct(v),unblock list U(v) for all nodes v ∈ V , latest timestampte in E .

Ensure: All simple temporal paths in G(V, E) from x to ve,prefixed with path.

1: function AllBundles(B = sT1→ v1

T2→ . . .Tk→ vk)

2: tcur ← minTk, vcur ← vk3: ct(vcur)← tcur, lastp← 04: Out← {(vcur, x, t) ∈ E | tcur < t ≤ ct(x)}5: N ← {x ∈ V | (vcur, x, t) ∈ Out}6: if s ∈ N then7: T ← {t | (vcur, s, t) ∈ Out}8: t← max(T )9: if t > lastp then lastp← t

10: Output Expand(B, vcurT→ s)

11: for x ∈ N \ {s} do12: Tx ← {t | (vcur, x, t) ∈ Out}13: T ′x ← {t ∈ Tx | t < ct(x)}14: if T ′x 6= ∅ then

15: lastx ← AllBundles(Expand(B, vcurT ′x→ x))

16: if lastx > lastp then lastp← lastx

17: tm ← min {t ∈ Tx | t > lastx}18: Extend(U(x),(vcur, tm))

19: if lastp > 0 then20: Unblock(vcur,lastp)

21: return lastp

CPU and 16 GB of RAM, running the Linux operating sys-tem. The code and instructions to run the experiments areavailable online 1.

7.1 DatasetAll datasets except SMS [30], Facebook [27] and USElection

[12] were obtained from the SNAP repository [15]. The char-acteristics of the datasets are given in Table 1. While run-ning the experiments we choose smaller windows for the highfrequency dataset SMS, Facebook, USElection, and Higgs

whereas for the low frequency datasets Stackoverflow andWiki-talk a longer window size were considered.

The exact meaning of a temporal cycle a1t1→ a2 . . . an

tn→a1 is that there is an interactions from a1 to a2 at time t1,followed by and itereaction at t2 from a2 to a3, and so on,followed by an interaction at time tn from an back to a1.For SMS, for instance, this means that a1 sms’ed a2 followedby a2 sms’ed a3, etc., until an sms’ed back to a1. Suchpattern could be of interest for application-oriented follow-up research such as sociological studies of communicationpatterns; although speculative at this stage, we think thatthese patterns could be indicators of feedback mechanismsbetween peers in which opinions are formed, reinforced andstrengthened over time.

7.2 Performance EvaluationEffect of bloom filter: The efficiency and effectiveness

of the bloom filter depends on the Bloom filter size andthe number of hash functions used. For our experiments,

1Code: https://github.com/rohit13k/CycleDetection

Table 1: Characteristics of interaction network along withthe time span of the interactions as number of days.

Dataset n[.103] m[.103] DaysFacebook 46.9 877.0 1592SMS 44.1 545 338Higgs 304.7 526.2 7Stackoverflow 2464.6 16266.4 2774Wiki-talk 1140 7833.1 2320USElection 233.8 1000 10 hours

Table 2: Time and Memory Comparison between Exactset based and bloom filter approach to find root candidates.

Dataset ωTime(seconds) Memory(MB)

Exact Bloom Exact Bloom

Facebook1 hour 4 12 20 22510 hours 6 17 24 375

SMS1 hour 12 40 27 73010 hours 50 59 112 972

Higgs1 hour 4 8 114 17010 hours 45 10 3048 325

Stackoverflow1 day 78 399 26 15781 week 138 454 346 2309

Wiki-talk10 hours 66 223 98 35411 day 147 344 269 5675

USElection1 hour 20 21 157 31510 hours - 27 - 700

Table 3: Effect of pruning (P) versus no pruning (NP) onTime and Memory usage.

DataSet ωTime(sec) Memory(MB)

P NP P NP

Facebook1 hour 3.9 4.1 9 2510 hours 4.9 5.1 11 28

SMS1 hour 11.6 12.1 16 5110 hours 45.6 46.1 41 90

Higgs1 hour 4.1 3.8 103 17710 hours 44.3 41.6 3037 3295

Stackoverflow1 day 79.7 97.4 26 14411 week 112.3 130.8 343 2184

Wiki-talk10 hours 58.5 62.5 98 12311 day 129 133.5 269 3174

we used a projected element count of 500 and false posi-tive probability of 0.0001, which results in a filter of size9592 using 13 hash functions. Using the bloom-filter-basedapproach for the SD phase is not always efficient. This ismostly because of two reasons: (1) in the Bloom Filter ap-proach we have to scan the data twice; and (2) creatingbloom filters for data sets where the candidate set is verysmall is an overkill. Hence, as long as the candidate set sizeis not getting so large that it stresses memory usage and setoperations like union and cardinality test, the set-based ap-

1450

proach is faster than the bloom-filter-based approach. Thesummary set size becomes very large for interaction net-works in which the ratio of the number of interactions overthe number of nodes is high. This is the case for Higgs andUSElection with ω set to 10 hours. In this case, the Bloom-filter-based approach is the best approach because of thetime and memory savings it provides. In our experiments,for USElection, the Exact-set-based approach ran out ofmemory after 18 minutes, whereas the Bloom-filter-basedapproach finished within 27 seconds taking only 700 MB ofspace. More results for time and memory consumption inthe SD phase are shown in table Table 2.

For deciding when to use Bloom filters, we remark that itis clear that they become useful only when there are manytemporal paths to be maintained. This situation is morelikely to occur when the ratio of edges to nodes is high in thewindows. This property could be used to estimate whetheror not to use the optimization. Another, easier way to es-timate, however, is by running the algorithm on a few win-dows of the dataset. Alternatively, it is straightforward todevelop a hybrid version of the algorithm that switches auto-matically to the use of Bloom filters whenever the memoryusage reaches a certain limit; e.g., 90% of available mem-ory. It is, however, unlikely that such hybrid version wouldbe very useful given the ease with which we could run theestimation on a part of the dataset.

Effect of Pruning: We also tested the effect of inactivenode pruning in the SD Phase. We ran pruning after pro-cessing every batch of 100,000 interactions. As expected,pruning has a huge impact on the memory requirements ofthe SD Phase. For instance, the memory requirements re-duced by a factor of 55 in case of Stackoverflow for a 1day window. This is because there are too many sourcenodes and most of them become inactive very quickly. Assuch, removing their summaries from the memory resultedin a huge gain in memory usage and runtime. In the caseof Higgs, however, the number of source nodes is very lowand they remain active throughout the whole duration of thedataset resulting in much less memory savings and a modestincrease in runtime. In all other cases, however, there aresignificant memory and time savings due to regular pruning.The results are shown in Table 3.

Effect of Bundling: As expected, using the path bun-dle approach is never slower than using the simple path ap-proach. On the other hand, in cases where there are multiplerepeated edges such as Higgs for a window of 10 hours, weget a speedup of up to 12 times thanks to the path Bundles.The results are shown in Table 4.

Runtime for Complete Cycle Enumeration. Finally,we also compare the total runtime of finding all cycles using2SCENT with exact set and path bundles to the algorithmpresented by Kumar and Calders [11] (Naive algorithm).As 2SCENT is a two-phase algorithm we compare the com-bined time taken by both phases with the runtime of theNaive algorithm. We observe that for small networks withless frequent interactions, such as Facebook, or for medium-sized networks with a small window length ω, such as SMS

with a window of 1 hour, or for large networks with veryinfrequent interactions, such as Mathoverflow with a 1 daywindow, the Naive algorithm outperforms 2SCENT and itsvariants. This is because in these cases there are only fewtemporal paths to be enumerated which easily fit in mem-ory. Hence a brute force approach as proposed in [11] is

Table 4: Time comparison (in seconds) to find cycles usingBundle path and without Bundle path.

Dataset ω WithoutBundle

WithBundle

Facebook1 hour 4.7 3.910 hours 9.4 7.3

SMS1 hour 24.5 10.310 hours 104.6 21.34

Higgs1 hour 2.65 2.2610 hours 1526.5 136.6

Stackoverflow1 day 62.7 63.31 week 147.7 118.4

Wiki-talk10 hours 693.9 320.21 day 2356 828

Table 5: Time Comparison between Naive and 2SCENTto find all cycles.

DataSet ω Naive 2SCENT

Facebook1 hour 6.5 sec 12.2 sec10 hours 9.3 sec 18.2 sec

SMS1 hour 21.1 sec 34.8 sec10 hours 15.7 hours 2.1 min

Higgs1 hour 10.6 min 10.7sec10 hours Crashed 3.6 min

Stackoverflow1 day 3.2 min 3.7 min1 week Crashed 6.6 min

Wiki-talk10 hours Crashed 7.5 min1 day Crashed 19 min

feasible. But when we run on larger interaction networksor with larger window lengths, 2SCENT outperforms theNaive algorithm with respect to runtime by a factor of upto 300. The massive gain in performance is due to the factthat the Naive algorithm maintains and updates all tempo-ral paths whereas 2SCENT needs to enumerate only pathsfrom nodes that are the starting point of cycles, and doesso in a memory-efficient way; one by one, and using a con-strained depth-first search. As a result 2SCENT is ableto deal with datasets such as Higgs, Stackoverflow, andWiki-talk, even for high window lengths, whereas the Naivealgorithm crashes due to the high number of temporal pathsit is maintaining in memory. The results are presented atTable 5.

Effect of Window Length(ω): Finding the best ω fora dataset is not straightforward, as it depends on the kindof interaction. For example, in the case of a social networkwhere people respond quickly, an ω of 1 to 10 hours mightdo, but for slow networks such as stack overflow, where peo-ple generally respond after few days only, longer cycles oflength 1 day or 1 week could still be interesting. In biologi-cal neural networks, ω could be in the range of milliseconds,as feedback loops take place almost instantaneously.

We also study the effect of increasing the window lengthon processing time and cycle count. We present the resultsfor the SMS dataset in Figure 5. We make two observations;

1451

0

5

10

15

20

25

30

35

40

1 10 20 30 40 50 60 70 80 90 100

0

5

10

15

20

25

30

Tim

e(se

cond

s)

Cou

nt(1

04 )

window

cDFS Time#Cycles

Figure 5: Effect of window length on processing time andcycle count for SMS data set

first, as expected, the processing time and count of simplecycles increases with an increase in window length, but af-ter a certain window length both become constant. Thisis because when the window is large enough, the temporalcharacteristic of the network do not change any more. Incase of the SMS data set, this happens at a window length of70 hours. Second, we see that the processing time increasesat first and then decreases slightly again before becomingconstant. This decrease in processing time is the result ofthe higher compression of candidate nodes for larger win-dows, resulting in fewer root candidates, but each with ahigher number of cycles, found in one cDFS scan.

Therefore, in cases where the application does not providea natural way to set ω, one could take the approach of start-ing with a small window at first and iteratively increase ituntil a sufficient number of interesting cycles is found. Inour experiments we take ω as 1 hour and 10 hours for fastinteraction network such as Twitter and Facebook and largeω for slower interaction networks.

7.3 Qualitative EvaluationCycle Frequency Distribution: In figure 6, we present

the frequency distribution of the number simple cycles bycycle length for the Facebook, SMS and Higgs data sets fora window of 10 hours. The maximum cycle length is 5 and11 respectively for the Facebook and SMS data set, and thenumber of triangles is very high as compared to the num-ber of longer cycles. In the Higgs data set, however, themaximal cycle length is 20 and the cycle count distributionis very different. We think this could be because the SMS

and Facebook data sets capture interactions between friendswhereas Higgs is an open interaction platform where follow-ers are interested in similar topic of discussion.

8. CONCLUSIONWe addressed the problem of enumerating simple tempo-

ral cycles that do not exceed a given time window length ωin an interaction network. One of the applications we pro-posed and explored in the paper is using the number andlength distribution of temporal cycles to characterize (partof) the dynamic behaviour of the temporal network. Thisis similar in spirit to using metrics such as clustering coef-ficient or diameter to characterize static networks. In orderto visualize this distribution, it is necessary to enumerate,or at least count the number of cycles of all lengths. We

0 10 20 30 40 50 60 70 80 90

100

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

#Cyc

les/

Tot

al C

ycle

s

length

FBSMS

Higgs

Figure 6: Distribution of simple cycle count and length forω = 10 hours.

presented an efficient algorithm, 2SCENT, which consists oftwo phases. In the first phase all sources of cycles are de-tected, which are then further expanded into the full cyclesin the second phase. The base version of 2SCENT was ex-tended in two important ways: first, we introduced the useof Bloom filters to reduce the memory consumption of thesource detection phase by replacing the reverse reachabilityset by a reverse reachability filter. The second extension,using path bundles, handles the common case of repeatedinteractions leading to an explosion in the number of cycles.In experiments, we found that 2SCENT with its extensionsruns up to 300 times faster than the only existing competi-tor. The experiments show that the algorithm could scaleto millions of nodes and interactions using only commodityhardware. While the focus of this paper was more on algo-rithms and general aspects of temporal cycle enumeration,we also presented a qualitative analysis of cycles in tempo-ral networks and analyzed the temporal nature of differentreal-world networks using the cycle count frequency distri-bution. For closed versus open friendship networks we couldobserve different cycle distributions, indicating different dy-namic behaviours in these networks.

We consider two important avenues for future work. First,more research is required to definitely answer the questionwhether or not the temporal cycle distribution is a good wayto represent dynamic behaviour in networks. Moreover, forthe datasets used in this paper, we did not have access tothe actual content of the interactions such as the tweetson the Twitter network. A qualitative study of the cyclesfound and their meaning and significance from an applica-tion perspective are of great interest. Secondly, it is alsoimportant to take into account the frequency of interactionbetween nodes when assessing the significance of the cyclesfound. Indeed, for nodes that are closely collaborating andinteracting frequently, it is likely that accidental cycles mayemerge. Therefore, methods need to be developed to mea-sure the probability of temporal cycles emerging by chance.

AcknowledgmentThis work was supported by the Fonds de la RechercheScientifique-FNRS under Grant(s) no T.0183.14 PDR. Thestudent is also part of IT4BI DC program.

1452

9. REFERENCES

[1] E. Bergamini, H. Meyerhenke, and C. L. Staudt.Approximating betweenness centrality in largeevolving networks. In 2015 Proceedings of theSeventeenth Workshop on Algorithm Engineering andExperiments (ALENEX), pages 133–146. SIAM, 2014.

[2] E. Birmele, R. Ferreira, R. Grossi, A. Marino,N. Pisanti, R. Rizzi, and G. Sacomoto. Optimal listingof cycles and st-paths in undirected graphs. InProceedings of the twenty-fourth annual ACM-SIAMsymposium on Discrete algorithms, pages 1884–1896.Society for Industrial and Applied Mathematics, 2013.

[3] B. H. Bloom. Space/time trade-offs in hash codingwith allowable errors. Communications of the ACM,13(7):422–426, 1970.

[4] C.-Y. Dong, D. Shin, S. Joo, Y. Nam, and K.-H. Cho.Identification of feedback loops in neural networksbased on multi-step granger causality. Bioinformatics,28(16):2146–2153, 2012.

[5] P.-L. Giscard, P. Rochet, and R. C. Wilson.Evaluating balance on social networks from theirsimple cycles. Journal of Complex Networks, pagecnx005, 2017.

[6] F. Hoffmann and D. Krasle. Fraud detection usingnetwork analysis, 2015. EP Patent App.EP20,140,003,010.

[7] P. Holme and J. Saramaki. Temporal networks.Physics reports, 519(3):97–125, 2012.

[8] W. Hu, H. Zou, and Z. Gong. Temporal pagerank onsocial networks. In International Conference on WebInformation Systems Engineering, pages 262–276.Springer, 2015.

[9] D. B. Johnson. Finding all the elementary circuits of adirected graph. SIAM Journal on Computing,4(1):77–84, 1975.

[10] L. Kovanen, M. Karsai, K. Kaski, J. Kertesz, andJ. Saramaki. Temporal motifs in time-dependentnetworks. Journal of Statistical Mechanics: Theoryand Experiment, 2011(11):P11005, 2011.

[11] R. Kumar and T. Calders. Finding simple temporalcycles in an interaction network. In Proceedings of theWorkshop on Large-Scale Time Dependent Graphs(TD-LSG 2017) co-located with the EuropeanConference on Machine Learning and Principles andPractice of Knowledge Discovery in Databases (ECMLPKDD 2017), Skopje, Macedonia, September 18,2017., pages 3–6, 2017.

[12] R. Kumar and T. Calders. Information propagation ininteraction networks. In EDBT, pages 270–281, 2017.

[13] R. Kumar and T. Calders. 2SCENT: An EfficientAlgorithm for Enumerating All Simple TemporalCycles(Full version).http://rohit13k.github.io/doc/2SCENT.pdf, 2018.[Online; accessed 11-June-2018].

[14] R. Kumar, T. Calders, A. Gionis, and N. Tatti.Maintaining sliding-window neighborhood profiles ininteraction networks. In Joint European Conference onMachine Learning and Knowledge Discovery inDatabases, pages 719–735. Springer, 2015.

[15] J. Leskovec and A. Krevl. SNAP Datasets: Stanfordlarge network dataset collection, June 2014.

[16] P. Mateti and N. Deo. On algorithms for enumeratingall circuits of a graph. SIAM Journal on Computing,5(1):90–99, 1976.

[17] R. K. Pan and J. Saramaki. Path lengths, correlations,and centrality in temporal networks. Physical ReviewE, 84(1):016105, 2011.

[18] A. Paranjape, A. R. Benson, and J. Leskovec. Motifsin temporal networks. In Proceedings of the TenthACM International Conference on Web Search andData Mining, pages 601–610. ACM, 2017.

[19] J. Ponstein. Self-avoiding paths and the adjacencymatrix of a graph. SIAM Journal on AppliedMathematics, 14(3):600–609, 1966.

[20] V. B. Rao and V. Murti. Enumeration of all circuits ofa graph. Proceedings of the IEEE, 57(4):700–701, 1969.

[21] M. Riondato and E. M. Kornaropoulos. Fastapproximation of betweenness centrality throughsampling. Data Mining and Knowledge Discovery,30(2):438–475, 2016.

[22] P. Rozenshtein and A. Gionis. Temporal pagerank. InJoint European Conference on Machine Learning andKnowledge Discovery in Databases, pages 674–689.Springer, 2016.

[23] P. Rozenshtein, N. Tatti, and A. Gionis. Discoveringdynamic communities in interaction networks. In JointEuropean Conference on Machine Learning andKnowledge Discovery in Databases, pages 678–693.Springer, 2014.

[24] J. Tang, M. Musolesi, C. Mascolo, and V. Latora.Temporal distance metrics for social network analysis.In Proceedings of the 2nd ACM workshop on Onlinesocial networks, pages 31–36. ACM, 2009.

[25] R. Tarjan. Enumeration of the elementary circuits of adirected graph. SIAM Journal on Computing,2(3):211–216, 1973.

[26] J. C. Tiernan. An efficient search algorithm to find theelementary circuits of a graph. Communications of theACM, 13(12):722–726, 1970.

[27] B. Viswanath, A. Mislove, M. Cha, and K. P.Gummadi. On the evolution of user interaction infacebook. In Proceedings of the 2nd ACM workshop onOnline social networks, pages 37–42. ACM, 2009.

[28] J. T. Welch Jr. A mechanical analysis of the cyclicstructure of undirected linear graphs. Journal of theACM (JACM), 13(2):205–210, 1966.

[29] H. Wu, J. Cheng, S. Huang, Y. Ke, Y. Lu, and Y. Xu.Path problems in temporal graphs. PVLDB,7(9):721–732, 2014.

[30] Y. Wu, C. Zhou, J. Xiao, J. Kurths, and H. J.Schellnhuber. Evidence for a bimodal distribution inhuman communication. Proceedings of the nationalacademy of sciences, 107(44):18803–18808, 2010.

[31] S. Yau. Generation of all hamiltonian circuits, paths,and centers of a graph, and related problems. IEEETransactions on Circuit Theory, 14(1):79–81, 1967.

1453

Documents

2SCENT: An Efﬁcient Algorithm for Enumerating … An Efﬁcient Algorithm for Enumerating All Simple Temporal Cycles Rohit Kumar Universite Libre de Bruxelles´ Brussels, Belgium