Mr. Scan: Efficient Clustering with MRNet and GPUs
Evan Samanas and Ben Welton
Paradyn Project, Paradyn / Dyninst Week, Madison, Wisconsin, April 29-May 3, 2013




Page 2: Density-based clustering

• Discovers the number of clusters
• Finds oddly-shaped clusters

Page 3: Clustering Example (DBSCAN [1])

Goal: Find regions that meet minimum density and spatial distance characteristics.

The two parameters that determine whether a point is in a cluster are Epsilon (Eps) and MinPts. If the number of points within distance Eps of a point is at least MinPts, the point is a core point. The same calculation is performed for every newly discovered point until the cluster is fully expanded.

[Figure: example clustering illustrating Eps and MinPts, with MinPts = 3]

[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
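The core-point rule and cluster expansion described above can be sketched as a single-node Python function. This is a minimal sketch, not Mr. Scan's code: it assumes 2-D points, a brute-force O(n²) neighbor search, and the "at least MinPts, the point itself included" core-point rule from [1]. The names `region_query` and `dbscan` are illustrative.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` yields two clusters and one noise point: `[0, 0, 0, 0, 1, 1, 1, -1]`. Every point triggers a full neighbor scan, which is why the later slides care so much about dense regions.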

Page 4: Scaling DBSCAN

• PDBSCAN (1999) [2]
  - Quality equivalent to single-node DBSCAN
  - Linear speedup up to 8 nodes
• DBDC (2004) [3]
  - Sacrifices quality
  - ~30x speedup on 15 nodes
• PDSDBSCAN (2012) [4]
  - Quality equivalent to single-node DBSCAN
  - 5675x speedup on 8192 nodes (72 million points)
• Two Map/Reduce attempts (2011, 2012)
  - Quality equivalent to single-node DBSCAN
  - 6x speedup on 12 nodes

[2] X. Xu et al., A Fast Parallel Clustering Algorithm for Large Spatial Databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] M. Patwary et al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)

Page 5: Challenges of scaling DBSCAN

• Data distribution
  - How do we take an input file and create partitions that DBSCAN can cluster effectively?
  - Approach: a distributed 2-D partitioner reading from a distributed file system
• Load balancing
  - How do we keep the variance in clustering times across nodes to a minimum?
  - Approach: Dense Box
• Merge
  - How do we reduce the amount of data needed for the merge while keeping accuracy high?
  - Approach: representative points

Page 6: MRNet (Multicast/Reduction Network)

• General-purpose TBON API
  - Network: user-defined topology
  - Stream: logical data channel
    - to a set of back-ends
    - multicast, gather, and custom reduction
  - Packet: collection of data
  - Filter: stream data operator
    - synchronization
    - transformation
• Widely adopted by HPC tools
  - CEPBA toolkit
  - Cray ATP & CCDB
  - Open|SpeedShop & CBTF
  - STAT
  - TAU

[Figure: MRNet tree with the front-end (FE) at the root, communication processes (CP) in the middle, and back-ends (BE) at the leaves, each serving application processes (app); filters compute F(x1, ..., xn) over the data flowing through the tree.]
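To make the stream/filter idea concrete, here is a toy Python simulation of a reduction flowing up a TBON. It is not MRNet's API (MRNet itself is a C++ library); `Node`, `reduce_up`, and the `top3` filter are hypothetical names chosen for illustration. Each internal node applies the filter to the packets from its children, mirroring F(x1, ..., xn) in the figure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One process in a toy tree-based overlay network (TBON).

    Leaves play the role of back-ends (BEs) holding local data; internal
    nodes play communication processes (CPs) or, at the root, the front-end (FE).
    """
    name: str
    children: List["Node"] = field(default_factory=list)
    data: list = field(default_factory=list)  # only meaningful at leaves


def reduce_up(node: Node, filt: Callable[[List[list]], list]) -> list:
    """Send packets from the leaves to the root, applying `filt` at every internal node."""
    if not node.children:
        return list(node.data)  # a BE emits its local packet
    packets = [reduce_up(child, filt) for child in node.children]
    return filt(packets)  # F(x1, ..., xn) applied to the children's packets


def top3(packets: List[list]) -> list:
    """Example filter: keep the 3 largest values, so output size stays constant."""
    return sorted((x for p in packets for x in p), reverse=True)[:3]


fe = Node("FE", children=[
    Node("CP1", children=[Node("BE1", data=[5, 1, 9]), Node("BE2", data=[7, 3])]),
    Node("CP2", children=[Node("BE3", data=[2, 8]), Node("BE4", data=[6, 4, 10])]),
])
print(reduce_up(fe, top3))  # [10, 9, 8]
```

The `top3` filter was picked because its output never grows as it moves up the tree, which is the property the next slide lists as ideal for TBON filters.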

Page 7: TBON Computation

Ideal characteristics:
• Filter output size constant or decreasing
• Computation rate similar across levels
• Adjustable for load balance

[Figure: TBON with an FE at the root, CPs in the middle, and BEs attached to application processes at the leaves. Example with 10 MB of data per BE and packets of at most 10 MB: when every level takes ~10 sec, the total time is ~30 sec; when one level's work grows 4x to ~40 sec, the total time is ~60 sec.]
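The timing example in the figure can be reproduced with a simple model. This sketch assumes, hypothetically, that each level's time is proportional to the data it processes (1 MB/s, chosen so that 10 MB takes ~10 sec as in the figure) and that levels run one after another, so end-to-end time is the sum of per-level times.

```python
def level_time_sec(data_mb, rate_mb_per_sec):
    """Time for one tree level to process its input (hypothetical linear model)."""
    return data_mb / rate_mb_per_sec


def tbon_total_time(data_mb_per_level, rate_mb_per_sec=1.0):
    """End-to-end time when levels run one after another (a simplifying assumption)."""
    return sum(level_time_sec(d, rate_mb_per_sec) for d in data_mb_per_level)


# 10 MB per BE; each filter keeps its output packet at <= 10 MB
print(tbon_total_time([10, 10, 10]))   # 30.0
# one level processes 4x as much data (40 MB)
print(tbon_total_time([10, 40, 10]))   # 60.0
```

Under this model a single level whose input grows 4x doubles the total time, which is why a constant or shrinking filter output matters.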

Page 8: Intro to Mr. Scan

Mr. Scan phases:
• Partition: distributed
• DBSCAN: GPU (at each BE)
• Merge: CPU (repeated at each tree level)
• Sweep: CPU (repeated at each tree level)

[Figure: MRNet tree in which DBSCAN runs at the BEs and Merge runs on the way up through the CPs to the FE; Sweep then travels back down to the BEs, which write to the file system (FS).]

Page 9: Mr. Scan Architecture

Clustering 6.5 billion points: 18.2 minutes end to end.

[Figure: timeline from 0 to 18.2 min covering the Partitioner, DBSCAN, and Merge & Sweep phases]
• FS read: 224 sec
• FS write: 489 sec
• MRNet startup: 130 sec
• FS read: 24 sec
• DBSCAN: 168 sec
• Merge: 6 sec
• Sweep: 4 sec
• Write output: 19 sec

Page 10: Partition Phase

• Goal: partitions computationally equivalent to DBSCAN
• Algorithm:
  - Form initial partitions
  - Add shadow regions
  - Rebalance
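A minimal sketch of the first two partitioning steps follows. It assumes a uniform square grid (the slides do not say how initial partitions are formed) and shadow regions of width Eps around each partition, so every partition also sees the points near its border that could be Eps-neighbors of its own points. `partition_with_shadows` is a hypothetical name, and the rebalancing step is not shown.

```python
import math
from collections import defaultdict


def partition_with_shadows(points, cell_size, eps):
    """Assign 2-D points to square grid cells, plus eps-wide shadow regions.

    Returns {cell: {"home": [...], "shadow": [...]}}. A point is copied into
    the shadow region of every adjacent cell that lies within eps of it.
    Assumes cell_size >= eps, so only the 8 adjacent cells can be affected.
    """
    if cell_size < eps:
        raise ValueError("cell_size must be >= eps")
    parts = defaultdict(lambda: {"home": [], "shadow": []})
    for x, y in points:
        cx, cy = math.floor(x / cell_size), math.floor(y / cell_size)
        parts[(cx, cy)]["home"].append((x, y))
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                nx, ny = cx + dx, cy + dy
                # distance from the point to the neighboring cell's rectangle
                gap_x = max(nx * cell_size - x, 0.0, x - (nx + 1) * cell_size)
                gap_y = max(ny * cell_size - y, 0.0, y - (ny + 1) * cell_size)
                if math.hypot(gap_x, gap_y) <= eps:
                    parts[(nx, ny)]["shadow"].append((x, y))
    return dict(parts)
```

For instance, with `cell_size=1.0` and `eps=0.1`, the point `(0.95, 0.5)` lives in cell `(0, 0)` and is also copied into the shadow region of cell `(1, 0)`, because it sits 0.05 from that cell's left edge.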

Page 11: Distributed Partitioner

Page 12: GPU DBSCAN Filter

DBSCAN is performed in two distinct steps:
• Step 1: Detect core points
• Step 2: Expand core points and color

[Figure: each step runs on a GPU grid of 900 thread blocks (Block 1 through Block 900), each with 512 threads (T1 through T512).]
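The slides do not show the kernel itself. The following Python simulation (not CUDA, and not Mr. Scan's kernel) illustrates one plausible mapping of Step 1 onto the 900-block by 512-thread grid in the figure: one thread per point, each counting its own Eps-neighbors. The one-thread-per-point mapping and the name `detect_core_points` are assumptions.

```python
import math


def detect_core_points(points, eps, min_pts, num_blocks=900, threads_per_block=512):
    """Simulate Step 1 (core-point detection) with a GPU-style launch grid.

    Each (block, thread) pair handles at most one point via the usual global
    index block * threads_per_block + thread. On a GPU these iterations would
    run in parallel; here they run in a plain Python loop.
    """
    n = len(points)
    if n > num_blocks * threads_per_block:
        raise ValueError("grid too small: one thread per point is assumed")
    is_core = [False] * n
    for block in range(num_blocks):
        for thread in range(threads_per_block):
            i = block * threads_per_block + thread
            if i >= n:
                continue  # surplus threads do nothing
            xi, yi = points[i]
            count = sum(1 for xj, yj in points
                        if math.hypot(xi - xj, yi - yj) <= eps)
            is_core[i] = count >= min_pts
    return is_core
```

Because each thread only reads the shared point array and writes its own flag, Step 1 needs no synchronization between threads; Step 2 (expansion and coloring) is where threads must coordinate.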

Page 13: Dense Box

• One significant scalability issue is handling dense regions of data.
• Density increases the computation cost of DBSCAN.
  [Figure: regions R1 and R2; the denser region R2 requires more comparison operations.]
• We reduce the computation cost of high-density regions by pre-clustering them:
  - Build a KD-tree over the points.
  - Examine each leaf's bounding box, looking for boxes with point count > MinPts and size < 0.35 * Eps.
  - DBSCAN no longer needs to expand these regions.
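The dense-box test on a single KD-tree leaf can be sketched as follows. This assumes "size" means the longer side of the leaf's bounding box, which is a hypothetical reading of the slide, and `is_dense_box` is an illustrative name. The reasoning: if both sides are below 0.35 * Eps, the box diagonal is below 0.35 * √2 * Eps (about 0.49 * Eps), so every pair of points in the box is within Eps; with more than MinPts points, every point is then a core point and the whole box belongs to one cluster.

```python
def is_dense_box(leaf_points, eps, min_pts, side_factor=0.35):
    """Return True if a KD-tree leaf can be pre-clustered as a dense box.

    leaf_points: list of (x, y) tuples in one leaf.
    "Size" is taken to be the longer side of the bounding box (an assumption).
    """
    if len(leaf_points) <= min_pts:
        return False  # slide requires point count > MinPts
    xs = [x for x, _ in leaf_points]
    ys = [y for _, y in leaf_points]
    size = max(max(xs) - min(xs), max(ys) - min(ys))
    return size < side_factor * eps
```

Since 0.35 is well under the 1/√2 ≈ 0.707 bound that would already guarantee mutual Eps-reachability, the threshold leaves a safety margin.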

Page 14: Merge Algorithm

• Merge overlapping clusters found on different nodes.
• Two steps in the merge operation:
  1. Select representative points (BE)
  2. Merge operation

Page 15: Representative Points

• These are points that represent the core points in the dataset.
• They create a boundary that must contain at least one core point shared between overlapping clusters.

[Figure] Representative points are the points closest to the corners and to the midpoints of the sides of the Eps box. These points create a boundary (shaded region) within which a point must fall for overlapping clusters to be merged.
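Selecting representative points for one cluster can be sketched as a nearest-point search against eight targets. This is a minimal sketch assuming the "Eps box" is a square cell of side Eps with lower-left corner `box_min`, and that the candidates are the cluster's core points in that cell; `representative_points` is a hypothetical name.

```python
import math


def representative_points(core_points, box_min, eps):
    """Pick up to 8 representative points for one cluster inside an Eps box.

    Targets are the 4 corners and the 4 side midpoints of the square
    [x0, x0 + eps] x [y0, y0 + eps]; for each target, the closest core point
    is kept. Duplicates are dropped, so fewer than 8 points may be returned.
    """
    x0, y0 = box_min
    h = eps / 2
    targets = [
        (x0, y0), (x0 + eps, y0), (x0, y0 + eps), (x0 + eps, y0 + eps),      # corners
        (x0 + h, y0), (x0 + h, y0 + eps), (x0, y0 + h), (x0 + eps, y0 + h),  # side midpoints
    ]
    reps = []
    for tx, ty in targets:
        best = min(core_points, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        if best not in reps:
            reps.append(best)
    return reps
```

For core points `[(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9), (0.5, 0.05), (0.5, 0.5)]` in the unit box, the four near-corner points and `(0.5, 0.05)` are chosen; the interior point `(0.5, 0.5)` is never closest to any target, so it is not sent to the merge.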

Page 16: Merge Algorithm

• The merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes.
• It must handle the merge with low overhead and without the full dataset.

[Figure legend: Node 1 and Node 2; core points and non-core points]

1. Core/core overlap: the two clusters have a core point in common. Detecting it takes 64 operations.
2. Non-core/core overlap: a core point is seen as non-core by one node. Detecting it takes MinPts * 2 operations.
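The core/core case can be sketched by comparing representative points pairwise and merging matches with a union-find structure. Union-find is a swapped-in technique for this illustration (the slides do not say how merged labels are tracked), and `merge_clusters` is a hypothetical name. Comparing two clusters with 8 representatives each costs at most 8 * 8 = 64 comparisons, which is consistent with the 64 operations quoted above if each cluster contributes 8 representatives. The non-core/core case is not shown, and shared points are assumed to have identical coordinates on both nodes (for example, via shadow-region copies).

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_clusters(reps):
    """Merge clusters from different nodes that share a representative core point.

    reps maps a cluster label such as ("n1", 0), meaning node n1, local
    cluster 0, to a list of representative core points (x, y).
    Returns a dict mapping each label to the label of its merged cluster.
    """
    labels = list(reps)
    parent = {c: c for c in labels}
    for a_idx, a in enumerate(labels):
        for b in labels[a_idx + 1:]:
            if a[0] == b[0]:
                continue  # clusters from the same node are already distinct
            # at most len(reps[a]) * len(reps[b]) comparisons (64 for 8 x 8)
            if any(p == q for p in reps[a] for q in reps[b]):
                parent[find(parent, a)] = find(parent, b)
    return {c: find(parent, c) for c in labels}
```

Union-find makes the merge transitive: if a cluster on node 1 overlaps one on node 2, which overlaps one on node 3, all three end up with the same root label.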

Page 17: Sweep Step

• Send cluster identifiers and file offsets down to the BEs so they can write the final clusters.
  - The FE gives each cluster a unique ID and a file offset.
  - This data is passed back down to the BE that holds the cluster's data.
  - The BE writes the data out to disk.
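Assigning IDs and offsets at the FE amounts to numbering the clusters and taking an exclusive prefix sum of their output sizes. This sketch assumes, hypothetically, that every point is written as a fixed-size record of `record_bytes` bytes and that all clusters go into one shared output file; `assign_ids_and_offsets` is an illustrative name.

```python
from itertools import accumulate


def assign_ids_and_offsets(cluster_sizes, record_bytes):
    """Give each cluster a unique ID and a byte offset in a shared output file.

    cluster_sizes: list of (cluster_label, point_count) pairs gathered at the FE.
    Offsets are an exclusive prefix sum of each cluster's size in bytes, so
    BEs can write their clusters concurrently without overlapping.
    """
    sizes = [count * record_bytes for _, count in cluster_sizes]
    offsets = [0] + list(accumulate(sizes))[:-1]
    return {label: (cluster_id, offset)
            for cluster_id, ((label, _), offset) in enumerate(zip(cluster_sizes, offsets))}
```

With clusters of 3, 5, and 2 points and 16-byte records, the offsets are 0, 48, and 128. Because each BE knows exactly where its bytes go, the final write needs no further coordination.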

Page 18: Experiment Setup

• Dataset: generated data with a distribution taken from real Twitter data
• Measuring:
  - Weak scaling up to 8192 GPUs
  - Strong scaling
  - Quality compared to single-threaded DBSCAN

Page 19: Results

Weak scaling: a 4096x increase in data and compute resulted in an 18.48x to 31.68x increase in time.

Page 20: Results Breakdown: Partition Phase

• At 6.5 billion points, the partition phase takes 65.9% of Mr. Scan's time.
• 94.6% of the partition phase is I/O time.
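As a rough worked estimate, assuming the 65.9% share is measured against the 18.2-minute end-to-end run on 6.5 billion points and the 94.6% share against the partition phase alone:

```latex
T_{\text{partition}} \approx 0.659 \times 18.2\ \text{min} \approx 12.0\ \text{min}
\qquad
T_{\text{partition I/O}} \approx 0.659 \times 0.946 \times 18.2\ \text{min} \approx 0.623 \times 18.2\ \text{min} \approx 11.3\ \text{min}
```

Under these assumptions, partitioner I/O alone accounts for roughly 62% of total runtime.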

Page 21: Results Breakdown: GPU Cluster Time

Page 22: Strong Scaling

Page 23: Quality

Page 24: Future Work

• Remove the partitioner's I/O bottleneck
• Multiple dimensions

Page 25: Conclusion

• Clustered 6.5 billion points with DBSCAN in 18.2 minutes
• Controlled the computational variance of DBSCAN
• Partitioner I/O is the scaling enemy

Page 26: Questions?

Page 27: Summary of previous Mr. Scan implementation

Algorithm steps:
• SpatialDecomp: CPU (at the FE)
• DBSCAN: CPU or GPU (at each BE)
• DrawBoundBox: CPU or GPU
• MergeCluster: CPU (repeated at each tree level)

[Figure: MRNet tree (FE, CPs, BEs) showing where DBSCAN and MergeCluster run]