Scalable Data Clustering with GPUs Andrew D. Pangborn Thesis Defense Rochester Institute of Technology Computer Engineering Department Friday, May 14 th 2010

Scalable Data Clustering with GPUs

Andrew D. Pangborn

Thesis Defense

Rochester Institute of Technology

Computer Engineering Department

Friday, May 14th 2010

Intro

• Overview of the application domain
• Trends in computing architecture
• GPU Architecture, CUDA
• Parallel Implementation

Data Clustering

• A form of unsupervised learning that groups similar objects into relatively homogeneous sets called clusters

• How do we define similarity between objects?
– Depends on the application domain, implementation

• Not to be confused with data classification, which assigns objects to predefined classes

Data Clustering Algorithms

Clustering Taxonomy from “Data Clustering: A Review”, by Jain et al. [1]

Example: Iris Flower Data

Flow Cytometry

• Technology used by biologists and immunologists to study the physical and chemical characteristics of cells

• Example: Measure T lymphocyte counts to monitor HIV infection [2]

Flow Cytometry

• Cells in a fluid pass through a laser

• Measure physical characteristics with scatter data

• Add fluorescently labeled antibodies to measure other aspects of the cells

Flow Cytometer

Flow Cytometry Data Sets

• Multiple measurements (dimensions) for each event
– Upwards of 6 scatter dimensions and 18 colors per experiment
• On the order of 10^5 – 10^6 events
• ~24 million values that must be clustered
• Lots of potential clusters
• Clustering can take many hours on a CPU
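
As a rough check of the figures above, the sketch below multiplies the dimension count by the event count. The 4-byte single-precision storage size is an assumption for illustration; the slide does not state how values are stored.

```python
# Back-of-the-envelope size of one flow cytometry data set.
scatter_dims = 6        # scatter dimensions per event (from the slide)
color_dims = 18         # fluorescence colors per event (from the slide)
events = 10**6          # upper end of the 10^5 to 10^6 range

dims = scatter_dims + color_dims
values = dims * events
bytes_per_value = 4     # assumption: single-precision floats

print(f"{dims} dimensions x {events:,} events = {values:,} values")
print(f"about {values * bytes_per_value / 1e6:.0f} MB as 32-bit floats")
```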

Parallel Computing

• Fortunately many data clustering algorithms lend themselves naturally to parallel processing

• Typically with clusters of commodity CPUs
• Common APIs:
– MPI: Message Passing Interface
– OpenMP: Open Multi-processing

Multi-core

• Current trends:
– Adding more cores
– Application specific extensions
  • SSE3/AVX, VT-x, AES-NI
– Point-to-Point interconnects, higher memory bandwidths

GPU Architecture Trends

[Figure: throughput performance vs. programmability for CPU and GPU. Labels: Fixed Function, Partially Programmable, Fully Programmable; Multi-threaded, Multi-core, Many-core; Intel Larrabee, NVIDIA CUDA. Based on the Intel Larrabee presentation at SuperComputing 2009.]

Tesla GPU Architecture

Tesla Cores

GPGPU

• General Purpose computing on Graphics Processing Units

• Past:
– Programmable shader languages: Cg, GLSL, HLSL
– Use textures to store data
• Present:
– Multiple frameworks using traditional general purpose systems and high-level languages

CUDA: Software Stack

CUDA: Streaming Multiprocessors

CUDA: Thread Model

• Kernel
– A device function invoked by the host computer
– Launches a grid with multiple blocks, and multiple threads per block
• Blocks
– Independent tasks composed of multiple threads
– No synchronization between blocks
• SIMT: Single-Instruction Multiple-Thread
– Multiple threads executing the same instruction on different data (SIMD), can diverge if necessary
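
To make the grid / block / thread hierarchy concrete, here is a minimal sketch that emulates a one-dimensional kernel launch serially in Python. The `launch` helper and `scale_kernel` are hypothetical names invented for this illustration; on a real GPU the threads run in parallel and the index is computed from `blockIdx.x`, `blockDim.x`, and `threadIdx.x`.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Serially emulate a 1-D kernel launch: every thread of every block runs once."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def scale_kernel(block_idx, block_dim, thread_idx, data, out, factor):
    # Same global-index computation a CUDA thread would do:
    # blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    if i < len(data):          # guard: the last block may be partly idle
        out[i] = data[i] * factor

data = [float(x) for x in range(10)]
out = [0.0] * len(data)
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim   # round up so every element is covered
launch(scale_kernel, grid_dim, block_dim, data, out, 2.0)
print(out)
```

Every thread executes the same function on a different element, and no block depends on another, which is why the blocks can be scheduled in any order.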

CUDA: Memory Model

CUDA: Program Flow

• Application start
• Search for CUDA devices
• Load data on host
• Allocate device memory
• Copy data to device
• Launch device kernels to process data
• Copy results from device to host memory

[Diagram: CPU with main memory and GPU cores with device memory, linked by PCI-Express]

When is CUDA worthwhile?

• High computational density
– Worthwhile to transfer data to separate device
• Both coarse-grained and fine-grained SIMD parallelism
– Lots of independent tasks (blocks) that don’t require frequent synchronization map to different multiprocessors on the GPU
– Within each block, lots of individual SIMD threads
• Contiguous memory access patterns
• Frequently/Repeatedly used data small enough to fit in shared memory

C-means

• Minimizes square error between data points and cluster centers using Euclidean distance

• Alternates between computing membership values and updating cluster centers
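
The alternation described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the fuzzy C-means variant with fuzzifier m = 2 (the slide mentions membership values but does not give the exact update rules); the function names, the sample points, and the fixed iteration count are all hypothetical.

```python
import math

def memberships(points, centers, m=2.0):
    """Membership of each point in each cluster (rows: points, columns: clusters)."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # Point coincides with a center: split full membership among those centers.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [u / total for u in row]
        else:
            row = [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]
        result.append(row)
    return result

def update_centers(points, u, m=2.0):
    """New centers: mean of the points weighted by membership**m."""
    n_clusters, n_dims = len(u[0]), len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centers.append([sum(w * x[d] for w, x in zip(weights, points)) / total
                        for d in range(n_dims)])
    return centers

points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]]
centers = [[1.0, 1.0], [8.0, 8.0]]     # arbitrary starting guess
for _ in range(20):                     # fixed iteration count for brevity
    u = memberships(points, centers)
    centers = update_centers(points, u)
print(centers)
```

Both steps loop independently over points (memberships) or over clusters and dimensions (centers), which is the structure a parallel implementation can exploit.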

C-means Parallel Implementation

C-means Parallel Implementation

EM with a Gaussian mixture model

• Data described by a mixture of M Gaussian distributions
• Each Gaussian has 3 parameters

E-step

• Compute likelihoods based on current model parameters

• Convert likelihoods into membership values
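
A minimal sketch of these two bullets, assuming one-dimensional data and the usual parameterization of each Gaussian by a mixing weight, a mean, and a variance. The function names and sample values are hypothetical; real flow cytometry data would use mean vectors and covariance matrices.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Return one row of membership values per data point, plus the log-likelihood."""
    memberships, log_likelihood = [], 0.0
    for x in data:
        # Weighted likelihood of x under each Gaussian with the current parameters.
        likes = [w * gaussian_pdf(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(likes)
        # Normalizing converts likelihoods into memberships that sum to 1.
        memberships.append([l / total for l in likes])
        log_likelihood += math.log(total)
    return memberships, log_likelihood

data = [-2.1, -1.9, -2.0, 3.0, 3.2, 2.8]
memberships, ll = e_step(data, weights=[0.5, 0.5], means=[-2.0, 3.0],
                         variances=[1.0, 1.0])
print([round(row[0], 3) for row in memberships])
```

Each data point is processed independently, so the loop over `data` is the natural place to parallelize.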

M-step

• Update model parameters
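
The parameter update can be sketched as follows, again assuming one-dimensional data and a weight, mean, and variance per Gaussian. The `m_step` name and the hand-picked memberships are hypothetical and chosen only so the result is easy to verify.

```python
def m_step(data, memberships):
    """Re-estimate weight, mean, and variance of each Gaussian from memberships."""
    n_clusters = len(memberships[0])
    weights, means, variances = [], [], []
    for k in range(n_clusters):
        resp = [row[k] for row in memberships]
        n_k = sum(resp)                      # effective number of points in cluster k
        mean = sum(r * x for r, x in zip(resp, data)) / n_k
        var = sum(r * (x - mean) ** 2 for r, x in zip(resp, data)) / n_k
        weights.append(n_k / len(data))
        means.append(mean)
        variances.append(var)
    return weights, means, variances

data = [-2.1, -1.9, -2.0, 3.0, 3.2, 2.8]
# Hard memberships chosen by hand so the result is easy to check.
memberships = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
weights, means, variances = m_step(data, memberships)
print(weights, means, variances)
```

Each cluster's update is a set of weighted sums over all data points, so it maps onto parallel reductions.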

EM Parallel Implementation

EM Parallel Implementation

Performance Tuning

• Global Memory Coalescing
– Compute capability 1.0/1.1 vs 1.2/1.3 devices
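
Why the access pattern matters can be shown with a deliberately simplified model. It assumes a half-warp of 16 threads reading 4-byte words, with memory served in aligned 128-byte segments; it counts how many segments one half-warp touches and does not model the stricter ordering rules of the older devices. The function name and parameters are hypothetical.

```python
def segments_touched(stride, threads=16, word_bytes=4, segment_bytes=128):
    """Count the distinct aligned segments a half-warp touches.

    Thread t reads the 4-byte word at index t * stride, starting from an
    aligned base address. Fewer segments means fewer memory transactions
    under this simplified model.
    """
    addresses = [t * stride * word_bytes for t in range(threads)]
    return len({addr // segment_bytes for addr in addresses})

for stride in (1, 2, 32):
    print(f"stride {stride:2d}: {segments_touched(stride)} segment(s)")
```

With contiguous access (stride 1) the whole half-warp falls in one segment, while a stride of 32 words puts every thread in its own segment.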

Performance Tuning

• Partition Camping

Performance Tuning

• CUBLAS

Multi-GPU Strategy

• 3 Tier Parallel hierarchy

Multi-GPU Strategy

Multi-GPU Implementation

• Very little impact on GPU kernel implementations, just their inputs / grid dimensions

• Discuss host-code changes
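
One way the host code might change only the kernel inputs and grid dimensions is sketched below. It assumes the events are split into contiguous chunks, one per GPU, with one thread per event and 256 threads per block; those choices and the `partition_events` name are hypothetical and not taken from the thesis.

```python
def partition_events(n_events, n_gpus, threads_per_block=256):
    """Split events into contiguous chunks, one per GPU, and size each grid."""
    base, extra = divmod(n_events, n_gpus)
    plan, start = [], 0
    for gpu in range(n_gpus):
        count = base + (1 if gpu < extra else 0)   # spread the remainder
        blocks = (count + threads_per_block - 1) // threads_per_block
        plan.append({"gpu": gpu, "start": start, "count": count, "blocks": blocks})
        start += count
    return plan

plan = partition_events(n_events=1_000_000, n_gpus=3)
for chunk in plan:
    print(chunk)
```

Each GPU then runs the same kernel on its own `start` offset and `count`, so the kernel code itself does not need to know how many GPUs take part.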

Data Distribution

• Asynchronous MPI sends from host instead of each node reading input file from data store

Results - Kernels

• Speedup figures

Results - Kernels

• Speedup figures

Results – Overhead

• Time-breakdown for I/O, GPU memcpy, etc

Multi-GPU Results

• Amdahl’s Law vs. Gustafson’s Law
– i.e. Strong vs. Weak Scaling
– i.e. Fixed Problem Size vs. Fixed-Time
– i.e. True Speedup vs. Scaled Speedup
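
The two laws can be compared numerically with a short sketch. The 95% parallel fraction is an assumed value for illustration, not a measurement from this work; note that in Gustafson's formulation the fraction refers to time spent on the parallel system.

```python
def amdahl_speedup(parallel_fraction, n):
    """Fixed problem size (strong scaling): speedup on n processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

def gustafson_speedup(parallel_fraction, n):
    """Fixed time (weak scaling): scaled speedup on n processors."""
    return (1.0 - parallel_fraction) + parallel_fraction * n

p = 0.95   # assumption: 95% of the run time parallelizes
for n in (1, 2, 4, 8, 16):
    print(f"n={n:2d}  Amdahl {amdahl_speedup(p, n):5.2f}  "
          f"Gustafson {gustafson_speedup(p, n):5.2f}")
```

Amdahl's speedup flattens toward 1 / (1 - p) as processors are added, while Gustafson's scaled speedup keeps growing because the problem size grows with the machine.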

Fixed Problem Size Analysis

Time-Constrained Analysis

Conclusions

Future Work

Questions?

References

1. A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM Comput. Surv., vol. 31, no. 3, pp. 264–323, 1999.

2. H. Shapiro, Practical Flow Cytometry. New York: Wiley-Liss, 2003.