YOU LI
SUPERVISOR: DR. CHU XIAOWEN
CO-SUPERVISOR: PROF. LIU JIMING
THURSDAY, MARCH 11, 2010
Speeding up k-Means by GPUs
Outline
Introduction: efficiency of data mining -> GPGPU -> k-Means on GPU
Related work
Method
Research Plan
Efficiency of Data Mining
Data mining faces an efficiency challenge as data volumes keep increasing; parallel data mining is one response (Fig. 1, Fig. 2).
GPGPU
General-purpose, high-performance parallel hardware;
It supplies another platform for parallelizing data mining algorithms.
[Chart: peak GFLOPS from Jan 2003 to Jul 2007 (0-600 GFLOPS scale) for NVIDIA GPUs (NV30, NV35, NV40, G70, G70-512, G71, Quadro FX 5600, GeForce 8800 GTX, Tesla C870) versus Intel CPUs (3.0 GHz Pentium 4, 3.0 GHz Core 2 Duo, 3.0 GHz Core 2 Quad).]
[Fig. 3: CPU vs. GPU architecture. The CPU devotes chip area to control logic and cache backed by DRAM; the GPU devotes it to many ALUs backed by DRAM.]
k-Means on GPU
Programming on the GPU: CUDA, an integrated CPU+GPU C programming model.
k-Means: widely used in statistical data analysis, pattern recognition, etc.; easy to implement on the CPU and suitable for the GPU.
Outline
Introduction
Related work: UV_k-Means, GPUMiner and HP_k-Means
Method
Research Plan
Related work

Running time of k-Means on low-dimension data (seconds), measured on an NVIDIA GTX 280 GPU and an Intel(R) Core(TM) i5 CPU:

n          k    d   MineBench on CPU   HP k-Means   UV k-Means   GPUMiner
2 million  100  2   19.36              1.45         2.84         61.39
2 million  400  2   70.93              2.16         5.96         63.46
2 million  100  8   39.81              2.48         6.07         192.05
2 million  400  8   152.25             4.53         16.32        226.79
4 million  100  2   38.74              2.88         5.64         130.36
4 million  400  2   141.84             4.38         11.94        126.38
4 million  100  8   79.60              4.95         12.85        383.41
4 million  400  8   304.46             9.03         34.54        474.83
Outline
Introduction
Related work
Method and Results: k-Means (three steps) -> step 1 -> step 2 -> step 3; experiments
Research Plan
k-Means algorithm
Input: n data points and k centroids.
Step 1: compute the distance between each data point n_i and each centroid k_j, O(nkd);
Step 2: find the closest centroid for each data point, O(nk);
Step 3: compute the new centroids, O(nd);
If any centroid changed, repeat from Step 1; otherwise, end.
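The three steps above can be sketched as a minimal CPU reference in Python with NumPy. The function and variable names are illustrative, not from the thesis; the GPU version discussed in these slides parallelizes each step rather than running this sequential loop:

```python
import numpy as np

def kmeans(points, centroids, max_iters=100):
    """Minimal k-means reference: points is (n, d), centroids is (k, d)."""
    for _ in range(max_iters):
        # Step 1, O(nkd): squared distance from every point to every centroid.
        diff = points[:, None, :] - centroids[None, :, :]    # (n, k, d)
        dist = np.einsum('nkd,nkd->nk', diff, diff)          # (n, k)
        # Step 2, O(nk): index of the closest centroid for each point.
        labels = dist.argmin(axis=1)
        # Step 3, O(nd): mean of the points assigned to each centroid;
        # an empty cluster keeps its old centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):  # centroids no longer change
            break
        centroids = new_centroids
    return centroids, labels
```

Steps 1 and 3 dominate the cost, which is why the slides focus on their memory-access patterns.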
Memory Mechanism of GPU
Global memory: large size, long latency.
Registers: small size, short latency, not under user control.
Shared memory: medium size, short latency, under user control.
k-Means on GPU
Key idea: increase the number of computing operations per global memory access, adopting methods from matrix multiplication and parallel reduction.
Dimension is a key parameter: for low dimension, use registers; for high dimension, use shared memory.
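One way to read the "matrix multiplication" idea: the squared Euclidean distance expands as ||x||^2 - 2 x.c + ||c||^2, so Step 1 reduces to one large matrix product (the part a GPU can tile through shared memory, reusing each loaded value many times) plus two vectors of norms. A minimal NumPy sketch under that assumption, with illustrative names not taken from the thesis:

```python
import numpy as np

def pairwise_sq_dist(points, centroids):
    """Step 1 via a matrix product: ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2.
    points: (n, d), centroids: (k, d) -> (n, k) squared distances."""
    p_norm = (points ** 2).sum(axis=1, keepdims=True)   # (n, 1) point norms
    c_norm = (centroids ** 2).sum(axis=1)               # (k,)  centroid norms
    cross = points @ centroids.T                        # (n, k) matrix multiply
    return p_norm - 2.0 * cross + c_norm                # broadcasts to (n, k)
```

The matrix product performs O(nkd) arithmetic on O(nd + kd) data, which is exactly the high compute-per-memory-access ratio the key idea calls for.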
k-Means on GPU
For low dimension: each data point is read from global memory only once.
k-Means on GPU
For high dimension: each data point is read from global memory only once.
Experiments
The experiments were conducted on a PC with an NVIDIA GTX280 GPU and an Intel(R) Core(TM) i5 CPU.
The GTX 280 has 30 SIMD multiprocessors; each contains eight processors and runs at 1.29 GHz. The GPU memory is 1 GB, with a peak bandwidth of 141.7 GB/sec.
The CPU has four cores running at 2.67 GHz. The main memory is 8 GB, with a peak bandwidth of 5.6 GB/sec. We use Visual Studio 2008 to write and compile all the source code; the CUDA version is 2.3.
We measure the running time after the file I/O has finished, in order to show the speedup effect more clearly.
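The measurement convention above (start the clock only after file I/O) can be sketched as follows; the function names are illustrative, not from the thesis:

```python
import time

def timed_run(load_data, cluster):
    """Time only the clustering phase, excluding file I/O, as the slides describe."""
    data = load_data()                       # file I/O, excluded from the timing
    start = time.perf_counter()              # clock starts only after loading
    result = cluster(data)
    elapsed = time.perf_counter() - start    # compute-only wall time, in seconds
    return result, elapsed
```

Excluding I/O keeps the comparison focused on the kernel speedup, since disk time is identical for the CPU and GPU versions.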
Experiments
On low-dimension data: compared with HP k-Means, UV k-Means and GPUMiner; the data is generated randomly.
Four to ten times faster than HP k-Means.
Experiments
On high-dimension data: compared with UV k-Means and GPUMiner; the data is from KDD 1999.
Four to eight times faster than UV k-Means.
Experiments
Compared with the CPU version: forty to two hundred times faster.
These results show that our algorithm compares very favorably with existing algorithms.
Outline
Introduction
Related work
Method
Research Plan
Research Plan
Detailed analysis of k-Means on GPU: achieved GFLOPS; handling even larger data sets.
Other data mining algorithms on GPU: k-NN; SDP (widely used in protein identification).
Q & A
Thanks very much