28
IIIT Hyderabad Scalable Clustering using Multiple GPUs K Wasif Mohiuddin P J Narayanan Center for Visual Information Technology International Institute of Information Technology (IIIT) Hyderabad

Scalable Clustering using Multiple GPUs

  • Upload
    rozene

  • View
    42

  • Download
    1

Embed Size (px)

DESCRIPTION

Scalable Clustering using Multiple GPUs. K Wasif Mohiuddin P J Narayanan Center for Visual Information Technology International Institute of Information Technology (IIIT) Hyderabad. Introduction. Classification of data desired for meaningful representation. - PowerPoint PPT Presentation

Citation preview

Page 1: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Scalable Clustering using Multiple GPUs

K Wasif MohiuddinP J Narayanan

Center for Visual Information TechnologyInternational Institute of Information Technology (IIIT)

Hyderabad

Page 2: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Introduction• Classification of data desired for

meaningful representation.• Data in subset ideally shares common traits.• Unsupervised learning for finding hidden

structure.• Application in data mining, computer vision

with– Image Classification– Document Retrieval

• Simple K-Means algorithm December 20, 2011 HiPC - 2011 2

Page 3: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Need for High Performance Clustering

• Time Complexity of O(ndk+1 log n) where n- input vectors, d- dimension, k-centers

• A fast, efficient clustering implementation is needed to deal with large data, high dimensionality and large centers.

• In computer vision, 128-dim SIFT and 512-dim (GIST) are common. Features can run into several millions

• Bag of Words for Vocabulary generation using SIFT [Lowe IJCV04] vectors

December 20, 2011 HiPC - 2011 3

Page 4: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Challenges and Contributions

• Data: Storage of data format for quick and repeated access.

• Computational: O(ndk+1 log n) • Contributions: A complete GPU based

implementation with– Exploitation of intra-vector parallelism– Efficient Mean evaluation– Data Organization – Multi GPU framework

December 20, 2011 4HiPC - 2011

Page 5: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Related Work• General Improvements

– KD-trees [Moor et al, SIGKKD-1999– Triangle Inequality [Elkan, ICML-2003]

• Pre CUDA GPU Efforts Improvements– Fragment Shader [Hart et al, SIGGRAPH-

2004]

December 20, 2011 HiPC - 2011 5

Page 6: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Related Work (cont)• Recent GPU efforts

– Mean on CPU [Che et al, JPDC-2008] – Mean on CPU + GPU [Hong et al, WCCSIE-

2009] – GPU Miner [Ren et al, HP Labs-2009] – HPK-Means [Wu et al, UCHPC-2009] – Divide & Rule [Li et al, ICCIT-2010]

• Parallelism not exploited within data object.• Lacking efficiency in Mean evaluation on

GPU.• Proposed techniques are parameter dependant. December 20, 2011 HiPC - 2011 6

Page 7: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

K-Means

• Objective Function ∑i∑j‖xi(j) -cj‖2

1≤ i ≤ n, 1≤ j ≤ k• Euclidean distance : L2 norm• Steps:

– Membership Evaluation– New Mean Evaluation– Convergence

December 20, 2011 HiPC - 2011 7

Page 8: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Algorithm

• K random centers are initially chosen from input.

• Partitions data into k clusters• Observation belongs to the cluster with the

nearest mean.• Re-evaluate the new centers & continue the

process till convergence is attained.

December 20, 2011 HiPC - 2011 8

Page 9: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

K-Means on GPU

Membership Evaluation• Involves Distance and Minima evaluation.• Single thread per component of vector• Parallel computation done on ‘d’

components of input and center vectors stored in row major format.

• Log summation for distance evaluation.• For each input vector we traverse across all

centers stored in L2 cache.December 20, 2011 9HiPC - 2011

Page 10: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

K-Means on GPU (Cont)

December 20, 2011 HiPC - 2011 10

Membership Evaluation• Data objects stored in

row major format• Provides coalesced

access• Distance evaluation

using shared memory.• Root finding avoided

Page 11: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

K-Means on GPU (Cont)

• Mean Evaluation Issues

– Data rearrangement on CPU as per membership is time consuming.

– Concurrent writes– Random reads and writes– Non uniform distribution of labels for data

objects.

December 20, 2011 HiPC - 2011 11

Page 12: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Mean Evaluation on GPU

• Store labels and index in 64 bit records• Group data objects with same membership

using Splitsort operation. • We split using labels as key• Gather primitive used to rearrange input in

order of labels.• Sorted global index of input vectors is

generated.

December 20, 2011 12HiPC - 2011

Splitsort : Suryakant & Narayanan IIITH, TR 2009

Page 13: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Splitsort & Transpose Operation

December 20, 2011 HiPC - 2011 13

Page 14: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Mean Evaluation on GPU (cont)

• Row major storage of vectors enabled coalesced access.

• CUDPP segmented scan followed by compact operation for histogram count.

• Transpose operation before rearranging input vectors.

• Using segmented scan again we evaluated mean of rearranged vectors as per labels.

December 20, 2011 HiPC - 2011 14

Page 15: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Implementation Details

• Tesla – 2 vectors per block , 2 centers at a time– Centers accessed via shared memory

• Fermi – 2 vectors per block, 4 centers at a time– Centers accessed via global memory using L2

cache– More shared memory for distance evaluation

• Occupancy of 83% using 5136 KB of shared memory in case of Fermi.

December 20, 2011 HiPC - 2011 15

Page 16: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

ISSUES

• Too many distance evaluations• Convergence highly dependent on cluster

centers chosen.• Prior seeding using K-Means++ can reduce

the number of iterations.• Parameters like dimension, cluster centers

affect the performance apart from the input size of the vectors.

December 20, 2011 HiPC - 2011 16

Page 17: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Limitations of GPU device

• Highly computational & memory consuming algorithms.

• Limited Global and Shared memory on a GPU device.

• Division of computational load if more than one device is available.

• Utilization of every resource available.• Scalability of the algorithm

December 20, 2011 HiPC - 2011 17

Page 18: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Multi GPU Approach• Partition input data into

chunks proportional to number of cores.

• Broadcast ‘k’ centers to all the nodes.

• Perform Membership & partial mean on each of the GPUs sent to their respective nodes.

• Nodes direct partial sums to Master node.

• New means evaluated by Master node for next iteration.

December 20, 2011 HiPC - 2011 18

Page 19: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Results

• Generated Gaussian SIFT vectors• Variation in parameters n, d, k• Performance on CPU(32 bit, 2.7 Ghz), Tesla

T10, GTX 480 tested up to nmax :4 Million, kmax : 8000 , dmax : 256

• MultiGPU (4xT10 + GTX 480) nmax : 32 Million, kmax : 8000, dmax : 256

• Comparison with previous GPU implementations.

December 20, 2011 19HiPC - 2011

Page 20: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Overall Results

N, K CPU GPU Tesla T10

GTX 480 4xT10

10K, 80 1.3 0.119 0.18 0.097

50K, 800 71.3 2.73 1.73 0.891

125K, 2K 463.6 14.18 7.71 2.47

250K, 4K 1320 38.5 27.7 7.45

1M, 8K 28936 268.6 170.6 68.5

Times of K-Means on CPU, GPUs in seconds for d=128.

December 20, 2011 20HiPC - 2011

Page 21: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Overall Performance

• Mean evaluation reduced to 6% of the total time for large input of high dimensional data.

• Multi GPU provided linear speedup• Speedup of up to 170 on GTX 480• 6 Million vectors of 128 dimension

clustered in just 136 sec per iteration.• Achieved up to twice increase in speedup

against the best GPU implementationDecember 20, 2011 HiPC - 2011 21

Page 22: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Performance vs ‘n’

December 20, 2011 HiPC - 2011 22

Linear performance for variation in n, with d=128 and k=4,000.

Page 23: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Performance vs ‘d’

December 20, 2011 HiPC - 2011 23

Performance for variation in d, with n=1M and k=8,000.

Page 24: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Performance vs ‘k’

December 20, 2011 HiPC - 2011 24

Linear performance for variation in k, with n=50k and d=128.

Page 25: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Comparison

N K D Li et al Wu et al Our K-Means

2 Million 400 8 1.23 4.53 1.27

4 Million 100 8 0.689 4.95 0.734

4 Million 400 8 2.26 9.03 2.4

51,200 32 64 1320 - 0.191

51,200 32 128 28936 - 0.282

Running time of K-Means in seconds on GTX 280.

December 20, 2011 25HiPC - 2011

Page 26: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Performance on GPUs

December 20, 2011 HiPC - 2011 26

Performance of 8600, Tesla, GTX 480 for d=128 and k=1,000.

Page 27: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Conclusions• Achieved a speed of over 170 on single NVIDIA Fermi

GPU.• Complete GPU based implementation.• High Performance for large ‘d’ due to processing of vector

in parallel.• Scalable in problem size n, d, k and number of cores.• Use of operations like Splitsort, Transpose for coalesced

memory access.• Overcame memory limitations using Multi GPU frame

work.• Code will be available at http://cvit.iiit.ac.in soonDecember 20, 2011 27HiPC - 2011

Page 28: Scalable Clustering using Multiple GPUs

IIIT

Hyd

erab

ad

Thank You

Questions?