Exploring Scalable Implementations of Triangle Enumeration in Graphs of Diverse Densities:
Apache-Spark vs. GPU
Travis Johnston, Stephen Herbein, and Michela Taufer
Global Computing Laboratory
University of Delaware
Travis Johnston, Stephen Herbein, and Michela Taufer Triangle Enumeration: Spark v. GPU 1
Introduction
Graphs are powerful tools for modeling.
Model social interaction:
Friendship graphs, social networks, collaboration/co-authorship graphs, phone call graphs
Model computer networks:
WWW (pages linking to other pages), WWW (hardware linking to other hardware)
Model data moving through a network:
Moving data from servers to users (WWW hardware network); infectious disease moving through a social network
Introduction
What information can the structure of a graph convey?
Identify the most influential nodes:
Personalities with many Twitter followers, e.g. Katy Perry, Justin Bieber, Taylor Swift, and Barack Obama
Prolific authors/collaborators, e.g. Paul Erdős with ≥ 500 collaborators and ≥ 1525 papers
Important web pages, e.g. get.adobe.com/reader/, cnn.com, and google.com
Identify communities
Friends with similar interests, websites with similar topics, criminal networks
Introduction
Why triangle enumeration?
Used to calculate local clustering coefficient
Used to compute transitivity ratio
Directly applicable in spam detection and web link recommendation
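As a rough illustration of the first two uses, the sketch below computes a local clustering coefficient and a transitivity ratio from triangle information. It uses the standard textbook definitions (closed neighbor pairs over possible neighbor pairs, and three times the triangle count over the number of connected triples). The four-vertex graph and all function names are hypothetical and chosen only for this example.

```python
from itertools import combinations

# Hypothetical 4-vertex graph: one triangle {1, 2, 3} plus a pendant edge {3, 4}.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]


def neighbors(edges):
    """Build an adjacency map {vertex: set of neighbors} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are themselves adjacent."""
    deg = len(adj[v])
    if deg < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
    return 2.0 * closed / (deg * (deg - 1))


def transitivity(adj):
    """3 * (number of triangles) / (number of connected triples)."""
    # Each triangle is seen once from each of its three corners, so `closed`
    # already equals three times the triangle count.
    closed = sum(
        1 for v in adj for a, b in combinations(sorted(adj[v]), 2) if b in adj[a]
    )
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triples if triples else 0.0


adj = neighbors(edges)
```

Both quantities reduce to knowing which neighbor pairs are closed by an edge, which is exactly what a triangle enumeration provides.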
Finding triangles in graphs is a classic theoretical problem with numerous practical applications. The recent explosion of work on social networks has led to a great interest in fast algorithms to find triangles in graphs. The social sciences and physics communities often study triangles in real networks and use them to reason about underlying social processes. ... Triangle enumeration is also a fundamental subroutine for other more complex algorithmic tasks. [1]
[1] http://www.cs.princeton.edu/~csesha/pubs/conf-triangle-enum.pdf
[2] http://people.seas.harvard.edu/~babis/int-math-triangles.pdf
[3] http://arxiv.org/abs/0904.3761
Goal and Contributions
Our goal:
Study the efficiency of highly parallel algorithms for triangle enumeration on two parallel architectures
Our contributions:
Present two algorithmic implementations of triangle enumeration:
Triangle enumeration via matrix multiplication on GPU
Triangle enumeration via MapReduce using Apache-Spark
Critically compare the performance on two graph models:
Erdős-Rényi (ER) random graph model
Preferential attachment model
What is a Graph?
Definition
A graph G = (V ,E ) contains a set of vertices V and a set of edges E .
Each edge e ∈ E is a set of two (distinct) vertices, e = {i , j}.
[Figure: a graph with its vertices and edges labeled; an edge e joins vertices i and j.]
What is a Triangle?
Definition
Three vertices form a triangle if each pair of vertices share an edge.
[Figure: example graph on vertices 1 through 6.]
This graph contains 2 triangles (blue).
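The definition can be checked directly by testing every set of three vertices. The sketch below does this for the six-vertex example. The edge list is an assumption read off the adjacency matrix given for the same graph in the GPU section, and the brute-force search is only a reference check, not either of the algorithms compared in this talk.

```python
from itertools import combinations

# Edge list assumed from the adjacency matrix of the six-vertex example graph.
edges = {
    frozenset(e)
    for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
}
vertices = sorted({v for e in edges for v in e})


def triangles(vertices, edges):
    """Return every vertex triple whose three pairs are all edges."""
    return [
        (a, b, c)
        for a, b, c in combinations(vertices, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edges
    ]


found = triangles(vertices, edges)
```

This examines all n-choose-3 triples, so it is only practical for very small graphs.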
Triangle Enumeration via Matrix Multiplication (GPU)
Definition
The adjacency matrix of a graph is a matrix $A_{n \times n} = [a_{ij}]$ where $a_{ij} = 1$ if vertex $i$ is adjacent to vertex $j$, and $a_{ij} = 0$ otherwise.
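[Figure: example graph on vertices 1 through 6.]
$$A = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}$$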
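The adjacency matrix for the example can be rebuilt from an edge list, one edge at a time. The sketch below assumes vertices are labeled 1 through n and that the graph is undirected. The function name `adjacency_matrix` and the edge list are illustrative choices, with the edges read off the nonzero entries of the matrix.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 adjacency matrix for vertices labeled 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1  # undirected graph: keep the matrix symmetric
    return A


# Edges of the six-vertex example, read off the matrix entries.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = adjacency_matrix(6, edges)
```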
Triangle Enumeration via Matrix Multiplication (GPU)
Theorem
If $A$ is the adjacency matrix of a simple graph $G$, then the $(i,j)$ entry of $A^k$ is the number of walks on $k$ edges beginning at vertex $i$ and ending at vertex $j$.
Corollary
If $A$ is the adjacency matrix of a simple graph $G$ and $A^3 = [a_{ij}]$, then the number of triangles in $G$ is
$$\frac{1}{6} \sum_{i} a_{ii} = \frac{\operatorname{tr}(A^3)}{6}.$$
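The factor of 6 arises because each triangle contributes six closed walks of length 3: three choices of starting vertex times two directions of travel. A minimal CPU sketch of the corollary follows, using a naive pure-Python matrix product in place of a GPU gemm call. The helper names `matmul` and `count_triangles` are hypothetical.

```python
def matmul(X, Y):
    """Naive dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def count_triangles(A):
    """Count triangles as tr(A^3) / 6 for a simple undirected graph."""
    A2 = matmul(A, A)   # first multiplication
    A3 = matmul(A2, A)  # second multiplication
    trace = sum(A3[i][i] for i in range(len(A)))
    return trace // 6


# Adjacency matrix of the six-vertex example graph.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
n_triangles = count_triangles(A)
```

Note that the trace yields only the number of triangles; listing which vertices form them requires inspecting the entries further.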
Triangle Enumeration via Matrix Multiplication (GPU)
cuBLAS is a CUDA implementation of the BLAS library for GPUs.
Algorithm:
Construct the adjacency matrix $A$
Copy $A$ to the device (data movement)
Compute $A^3$ using matrix multiplication (gemm)
Sum the diagonal entries of $A^3$ (divide by 6)
Advantages:
Easy to use (library function call, twice)
A single GPU can execute many parallel threads (≥ 1000s)
Disadvantages:
Data movement from host to device can be expensive
Shared memory per thread is relatively small on GPU
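To get a feel for how device memory limits this approach, the sketch below estimates the footprint of the dense matrices involved. Everything numeric here is an assumption for illustration: single-precision (4-byte) entries, three resident n x n matrices ($A$, $A^2$, $A^3$), and 6 GB of device memory counted as 6 x 1024^3 bytes. An actual implementation may use fewer buffers or a different element type.

```python
def dense_bytes(n, bytes_per_entry=4, matrices=3):
    """Bytes needed to hold `matrices` dense n x n matrices on the device."""
    return matrices * n * n * bytes_per_entry


# Assumed device memory: 6 GB, counted here as 6 * 1024**3 bytes.
GPU_BYTES = 6 * 1024**3


def largest_n(gpu_bytes=GPU_BYTES, bytes_per_entry=4, matrices=3):
    """Largest n for which all matrices fit in device memory at once."""
    n = int((gpu_bytes / (matrices * bytes_per_entry)) ** 0.5)
    # Correct for floating-point rounding in the square root.
    while dense_bytes(n + 1, bytes_per_entry, matrices) <= gpu_bytes:
        n += 1
    while dense_bytes(n, bytes_per_entry, matrices) > gpu_bytes:
        n -= 1
    return n


footprint_16000 = dense_bytes(16000)
```

Because the footprint grows as n squared regardless of how many edges the graph has, a sparse graph costs as much memory as a dense one of the same order.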
Triangle Enumeration via MapReduce (Spark)
Spark is a framework designed for in-memory, fault-tolerant computing.
Algorithm:
(map) Each vertex v emits edges containing v
(map) Each vertex v emits angles (potential triangles)
(reduce) Combine edges and angles to form triangles
Advantages:
Can take advantage of all memory available to the node
Agnostic to the number of nodes/cores
Disadvantages:
Data movement between the map and reduce steps can be expensive
Processing done on CPU limits parallelism
Triangle Enumeration via MapReduce (Spark)
[Figure: example graph on vertices 1 through 5 with edges {1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, and {4, 5}.]
Phase I: Emit Edges (map)
If vertices i and j share an edge, then vertex i emits a KV pair ((i, j), #) following RULE 1.
RULE 1: If (i, j) is emitted as an edge, then either
deg(i) < deg(j), or
deg(i) = deg(j) and i < j.
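A plain-Python sketch of Phase I follows, run on the five-vertex example graph. It is not Spark code: the map step is written as an ordinary loop, and the names `build_adjacency`, `precedes`, and `emit_edges` are hypothetical. RULE 1 is expressed by comparing (degree, label) tuples, which encodes "lower degree first, ties broken by smaller label".

```python
from collections import defaultdict

# Edges of the five-vertex example graph (undirected).
EDGES = [(1, 2), (1, 5), (2, 5), (3, 2), (3, 4), (4, 5)]


def build_adjacency(edges):
    """Map each vertex to the set of its neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def precedes(adj, i, j):
    """RULE 1 ordering: lower degree first, ties broken by smaller label."""
    return (len(adj[i]), i) < (len(adj[j]), j)


def emit_edges(adj):
    """Phase I: vertex i emits ((i, j), '#') for each neighbor j it precedes."""
    return [
        ((i, j), "#")
        for i in sorted(adj)
        for j in sorted(adj[i])
        if precedes(adj, i, j)
    ]


adj = build_adjacency(EDGES)
phase1 = emit_edges(adj)
```

Because exactly one endpoint of every edge precedes the other, each edge is emitted once.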
Phase II: Emit Angles (map)
Each vertex k emits a KV pair ((i, j), k) if k shares an edge with both i and j, following RULE 2.
RULE 2: If ((i, j), k) is emitted as an angle, then (i, j) follows RULE 1 and either:
deg(k) < deg(i), or
deg(k) = deg(i) and k < i.
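Phase II can be sketched the same way, again in plain Python rather than Spark and with hypothetical helper names. Each vertex sorts its neighbors by (degree, label) so that every neighbor pair (i, j) already satisfies RULE 1, then emits the pair only if the vertex itself precedes i. Under this strict reading the open angle at vertex 3 is keyed (4, 2), whereas the worked example in these slides writes it as (2, 4). Neither key matches an emitted edge, so the outcome is the same.

```python
from collections import defaultdict
from itertools import combinations

# Edges of the five-vertex example graph (undirected).
EDGES = [(1, 2), (1, 5), (2, 5), (3, 2), (3, 4), (4, 5)]


def build_adjacency(edges):
    """Map each vertex to the set of its neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rank(adj, v):
    """Sort key for the RULE 1 / RULE 2 ordering: (degree, label)."""
    return (len(adj[v]), v)


def emit_angles(adj):
    """Phase II: vertex k emits ((i, j), k) for neighbor pairs it precedes."""
    angles = []
    for k in sorted(adj):
        # Sorting by rank makes every pair (i, j) below satisfy RULE 1.
        nbrs = sorted(adj[k], key=lambda v: rank(adj, v))
        for i, j in combinations(nbrs, 2):
            if rank(adj, k) < rank(adj, i):  # RULE 2
                angles.append(((i, j), k))
    return angles


adj = build_adjacency(EDGES)
phase2 = emit_angles(adj)
```

Only the lowest-ranked vertex of a triangle emits its angle, so a triangle cannot be reported more than once.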
Phase III: Combine by key (reduce)
Transform the two sets of KV pairs into a set of key/multi-value pairs. The results are pairs of the form ((i, j), L = [a, b, ...]).
If # ∈ L, then the edge (i, j) completes the triangles {a, i, j}, {b, i, j}, ...
If # ∉ L, then (i, j) is not an edge of the graph, and so {a, i, j}, ... do not form triangles.
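The combine step can be sketched with a dictionary standing in for the shuffle. The key/value pairs below are copied from the five-vertex worked example in these slides. In a Spark job the grouping would more likely be a by-key operation on a pair RDD, so treat this as an illustration of the logic only. The function names are hypothetical.

```python
from collections import defaultdict

# Phase I and Phase II outputs for the five-vertex example.
phase1 = [
    ((1, 2), "#"), ((1, 5), "#"), ((2, 5), "#"),
    ((3, 2), "#"), ((3, 4), "#"), ((4, 5), "#"),
]
phase2 = [((2, 5), 1), ((2, 4), 3)]


def combine_by_key(pairs):
    """Group values by key, producing {key: [values...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def close_triangles(grouped):
    """Keep only angles whose key also carries the edge marker '#'."""
    triangles = []
    for (i, j), values in grouped.items():
        if "#" in values:
            triangles.extend(
                tuple(sorted((a, i, j))) for a in values if a != "#"
            )
    return sorted(triangles)


grouped = combine_by_key(phase1 + phase2)
triangles = close_triangles(grouped)
```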
Triangle Enumeration via MapReduce (Spark)
Phase I:
1→ ((1, 2),#), ((1, 5),#)
2→ ((2, 5),#)
3→ ((3, 2),#), ((3, 4),#)
4→ ((4, 5),#)
5→
Phase II:
1→ ((2, 5), 1)
2→
3→ ((2, 4), 3)
4→
5→
Phase III:
((1, 2), [#])
((1, 5), [#])
((2, 5), [#, 1])
((2, 4), [3])
((3, 2), [#])
((3, 4), [#])
((4, 5), [#])
(2, 5) completes the {1, 2, 5} triangle.
(2, 4) would complete the {2, 3, 4} triangle, but (2, 4) is not an edge in the graph.
Graph Model 1: Erdős-Rényi (ER) Random Graphs
Two parameters: n (number of vertices) and p ∈ [0, 1] (a probability)
[Figure: a new vertex being joined to the existing n − 1 vertices.]
Each edge appears independently with probability p
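A minimal generator for this model might look as follows. It assumes vertices labeled 0 through n − 1 and uses Python's standard `random` module. The function name and the `seed` parameter are illustrative and are not taken from the authors' code.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) graph on vertices 0..n-1.

    Every one of the n-choose-2 possible edges is included
    independently with probability p.
    """
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


sample = erdos_renyi(100, 0.05, seed=0)
```

The loop visits every vertex pair, so generation itself costs on the order of n squared steps.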
Graph Model 2: Preferential Attachment
Two parameters: n (number of vertices) and d (number of connections)
[Figure: a new vertex being joined to the existing n − 1 vertices.]
Choose d edges randomly, preferentially attaching to vertices with higher degree
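One common way to realize this model is sketched below. Several details are assumptions, since the slides do not specify them: the seed graph (vertex d joined to vertices 0 through d − 1), sampling d distinct targets per new vertex, and selection probability exactly proportional to current degree. The function name is hypothetical.

```python
import random


def preferential_attachment(n, d, seed=None):
    """Return the edge list of a preferential attachment graph on 0..n-1."""
    if not 1 <= d < n:
        raise ValueError("need 1 <= d < n")
    rng = random.Random(seed)
    # Seed graph (an assumption): vertex d is joined to vertices 0..d-1.
    edges = [(t, d) for t in range(d)]
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks a vertex with probability proportional to
    # its current degree.
    endpoints = [v for e in edges for v in e]
    for new in range(d + 1, n):
        targets = set()
        while len(targets) < d:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges


sample = preferential_attachment(50, 3, seed=1)
```

With this seed graph the edge count is d(n − d), which matches the nd growth stated for the model.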
Graph Model Comparison
Erdős-Rényi Random Graph:
Expected vertex degree: $p(n-1)$
Edges: $p\binom{n}{2} \sim n^2$ (dense)
Triangles: $p^3\binom{n}{3} \sim n^3$
Preferential Attachment:
Average vertex degree: $2d$
Edges: $nd \sim n$ (sparse)
Triangles: $O(n^{1.5})$
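The closed-form counts can be evaluated for any parameter choice. The sketch below uses the standard expectations for the ER model: each of the n-choose-2 possible edges is present with probability p, and each of the n-choose-3 possible triangles with probability p cubed. For preferential attachment it uses the edge and degree estimates listed here. Function names are illustrative.

```python
from math import comb


def er_expectations(n, p):
    """Expected degree, edge count, and triangle count in G(n, p)."""
    return {
        "degree": p * (n - 1),
        "edges": p * comb(n, 2),
        "triangles": p**3 * comb(n, 3),
    }


def pa_estimates(n, d):
    """Average degree and approximate edge count for preferential attachment."""
    return {"average_degree": 2 * d, "edges": n * d}


small_er = er_expectations(5, 0.5)
small_pa = pa_estimates(1000, 4)
```

For fixed p the ER edge count grows quadratically in n, while the preferential attachment edge count grows only linearly for fixed d.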
Experimental Setup
Single Node:
2x Intel Xeon E5520 processors (4 physical cores, 8 logical cores each; 2.26 GHz)
24 GB RAM
1x K20c GPU (2688 CUDA cores, 6 GB RAM)
Random Graphs:
5 random graphs with every pair of parameters:
n ∈ {1000, 1250, 1500, 1750, 2000, ..., 16000}
p ∈ {.01, .02, .04, .08, .16} (Erdős-Rényi)
d ∈ {2, 4, 6, 8} (Preferential attachment)
Results: (Dense) Erdős-Rényi Random Graph Model
[Figure: cuBLAS+GPU results.]
[Figure: Spark results.]
[Figure: panels for cuBLAS+GPU and Spark.]
Results: (Sparse) Preferential Attachment Model
[Figure: cuBLAS+GPU results.]
[Figure: Spark results.]
[Figure: comparison of cuBLAS+GPU v. Spark.]
Conclusions
We explored the performance of two triangle enumeration algorithms.
Our algorithm using the cuBLAS library + GPU:
Performance depended only on graph size (vertices)
Graph size bound∗ by memory on GPU
Larger graphs require the same data to be moved between host and device more than once
A MapReduce algorithm tailored to Apache-Spark:
Performance depends primarily on the vertex degrees
Automatically shares memory from all nodes (and disk)
Acknowledgements
Global Computing Laboratory, circa 2015