Undergraduate Project
Advisor : Prof. Subodh Kumar
Chirag Jain (2010CS10215)
IIT Delhi
Parallelizing Smith Waterman Algorithm on GPUs
OUTLINE
• Smith Waterman Problem
• Parallelization [A. Khajeh-Saeed et al.]
• Reproduced Results
SW Problem
Local Alignment:
A T C A G A G T C
G T C A G T C A
Aligned (gaps shown as "-"):
A T C A G A G T C
G T C A G - - T C A
SW Problem
Recursion :
$H_{i,j}$ = Score at cell $(i,j)$
$S_{i,j}$ = Match / Mismatch Score
$G_s$ = Gap opening penalty
$G_e$ = Gap extension penalty
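For reference, a general-gap form of the recursion that uses the symbols above can be written as follows. This is a reconstruction from the standard Smith-Waterman formulation with gap opening and extension penalties, not a transcription of the slide; the index bounds on $k$ are an assumption.

$$
H_{i,j} = \max\Big\{\, H_{i-1,j-1} + S_{i,j},\ \max_{0<k<i}\big(H_{i-k,j} - (G_s + k\,G_e)\big),\ \max_{0<k<j}\big(H_{i,j-k} - (G_s + k\,G_e)\big),\ 0 \,\Big\}
$$

In this form every cell scans back along its own row and column, so the work per cell grows with the sequence lengths.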
SW Problem
Time Complexity : $O(L_1 L_2 (L_1 + L_2))$
$L_1$ = Length of Database Sequence
$L_2$ = Length of Query Sequence
SW Problem
Time Complexity : $O(L_1 L_2)$
$L_1$ = Length of Database Sequence
$L_2$ = Length of Query Sequence
Store three values at each cell :
$E_{i,j} = \max\{E_{i,j-1},\ H_{i,j-1} - G_s\} - G_e$
$F_{i,j} = \max\{F_{i-1,j},\ H_{i-1,j} - G_s\} - G_e$
$H_{i,j}$ = Score at cell $(i,j)$ $= \max\{H_{i-1,j-1} + S_{i,j},\ E_{i,j},\ F_{i,j},\ 0\}$
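The three-value recurrence can be sketched in plain Python as below. This is a minimal sequential illustration, not the project's GPU code: the function name `sw_affine`, the scoring values (`match`, `mismatch`), and the boundary choices (first row and column of H set to 0, E and F set to minus infinity) are assumptions made for the example, and the max for H includes the floor at 0 that local alignment needs.

```python
def sw_affine(a, b, match=2, mismatch=-1, gs=2, ge=1):
    """Best local alignment score with affine gaps, O(len(a) * len(b)) time.

    gs is the gap opening penalty, ge the gap extension penalty.
    Returns (best_score, H) where H is the full score table.
    """
    NEG = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]    # boundary row/column are 0
    E = [[NEG] * cols for _ in range(rows)]  # best score ending in a horizontal gap
    F = [[NEG] * cols for _ in range(rows)]  # best score ending in a vertical gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            E[i][j] = max(E[i][j - 1], H[i][j - 1] - gs) - ge
            F[i][j] = max(F[i - 1][j], H[i - 1][j] - gs) - ge
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j], 0)
            best = max(best, H[i][j])
    return best, H
```

Each cell now reads only three neighbours instead of scanning its row and column, which is what brings the cost down to one constant-time update per cell.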
SW Problem
Result:
G C C - U C G C
G C C A U U G C
SW Problem
Optimization
Parallel Scan Approach
[A. Khajeh-Saeed et al. 2010]
Parallel Scan Approach
Step 1
$F_{i,j} = \max\{F_{i-1,j},\ H_{i-1,j} - G_s\} - G_e$
$\tilde{H}_{i,j}$ = Score at cell $(i,j)$ ignoring horizontal gaps $= \max\{H_{i-1,j-1} + S_{i,j},\ F_{i,j},\ 0\}$
Parallel Scan Approach
Step 2
$\tilde{E}_{i,j} = \max_{1 \le k < j} (\tilde{H}_{i,j-k} - k\,G_e)$
Parallel Scan Approach
Step 3
$H_{i,j} = \max(\tilde{H}_{i,j},\ \tilde{E}_{i,j} - G_s)$
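The three steps can be checked against the cell-by-cell recurrence with a short sequential sketch. Everything here is illustrative: the names `sw_reference` and `sw_three_steps`, the scoring values, and the boundary handling are assumptions, and Step 3 subtracts the gap opening penalty so that the result agrees with the E recurrence. Within one row, Steps 1 and 3 touch each column independently; only Step 2 couples the columns.

```python
NEG = float("-inf")

def score(x, y, match=2, mismatch=-1):
    # Hypothetical match/mismatch scoring, chosen only for the example.
    return match if x == y else mismatch

def sw_reference(a, b, gs=2, ge=1):
    """Cell-by-cell E/F/H recurrences, used only as a check."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1], H[i][j - 1] - gs) - ge
            F[i][j] = max(F[i - 1][j], H[i - 1][j] - gs) - ge
            H[i][j] = max(H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          E[i][j], F[i][j], 0)
    return H

def sw_three_steps(a, b, gs=2, ge=1):
    """Row-by-row version: Step 1, Step 2, Step 3 for each row."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1)]
    F_prev = [NEG] * (m + 1)
    for i in range(1, n + 1):
        H_prev = H[-1]
        # Step 1: F and H~ depend only on the previous row.
        F = [NEG] * (m + 1)
        Ht = [0] * (m + 1)
        for j in range(1, m + 1):
            F[j] = max(F_prev[j], H_prev[j] - gs) - ge
            Ht[j] = max(H_prev[j - 1] + score(a[i - 1], b[j - 1]), F[j], 0)
        # Step 2: E~[j] = max over 1 <= k < j of (H~[j-k] - k*ge), brute force here.
        Et = [max((Ht[j - k] - k * ge for k in range(1, j)), default=NEG)
              for j in range(m + 1)]
        # Step 3: combine, paying the gap opening penalty once.
        H.append([0] + [max(Ht[j], Et[j] - gs) for j in range(1, m + 1)])
        F_prev = F
    return H
```

Step 2 is written as a brute-force maximum to stay close to the formula; it is the part a parallel scan is meant to replace.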
Parallel Scan Approach
Step 2 (Revisited)
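One way to turn Step 2 into a scan, sketched here under stated assumptions rather than taken from the slides: adding $j\,G_e$ to each $\tilde{H}_{i,j}$ turns the maximum over $k$ into a running (prefix) maximum, which a Hillis-Steele style scan computes in about $\log_2$ of the row length sweeps. The function names and the 0-based column layout are hypothetical, and the list comprehension inside the loop stands in for one data-parallel sweep on the GPU.

```python
NEG = float("-inf")

def inclusive_max_scan(x):
    """Hillis-Steele inclusive scan with max as the operator.

    Each sweep builds a new list in which every entry depends only on the
    previous list, so all entries of a sweep could be computed in parallel.
    """
    y = list(x)
    d = 1
    while d < len(y):
        y = [y[j] if j < d else max(y[j], y[j - d]) for j in range(len(y))]
        d *= 2
    return y

def e_tilde_by_scan(h_tilde, ge):
    """E~ for one row; h_tilde[q] holds column q + 1 of H~.

    out[q] = max over r < q of (h_tilde[r] - (q - r) * ge), NEG when q == 0.
    """
    shifted = [h + r * ge for r, h in enumerate(h_tilde)]  # H~ + (column offset) * ge
    running = inclusive_max_scan(shifted)
    return [NEG if q == 0 else running[q - 1] - q * ge
            for q in range(len(h_tilde))]

def e_tilde_bruteforce(h_tilde, ge):
    """Direct evaluation of the Step 2 formula, used only as a check."""
    return [max((h_tilde[r] - (q - r) * ge for r in range(q)), default=NEG)
            for q in range(len(h_tilde))]
```

For example, `e_tilde_by_scan([3, 0, 5, 1], 1)` gives `[-inf, 2, 1, 4]`, the same as the brute-force formula.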
Results
• SSCA Benchmark
• Kernel 1 : Pairwise Local Alignment of Sequences
• Kernel 3 : Locating similar sequences
Results
• Using:
• Single Core of Intel Xeon CPU
• GPUs : Tesla M2070
[Figure: Time taken by Kernel 1. Time (ms), log scale from 1.E+00 to 1.E+07, vs. Input Size from 1.E+07 to 2.E+10. Series: CPU 2.67GHz, GPU M2070]
Results
[Figure: Speed Up (over CPU) : Kernel 1. Factor, from 0 to 40, vs. Input Size from 1.E+07 to 2.E+10. Series: GPU M2070]
Results
[Figure: Time taken by Kernel 3. Time (ms), log scale from 1.E+00 to 1.E+07, vs. Input Size from 1.E+07 to 2.E+10. Series: CPU 2.67GHz, GPU M2070]
Results
[Figure: Speed Up (over CPU) : Kernel 3. Factor, from 0 to 25, vs. Input Size from 1.E+07 to 2.E+10. Series: GPU M2070]
Results
Over multiple GPUs
[Figure: Kernel 1: Speed Up over Single GPU. Factor, from 0 to 8, vs. Input Size from 1.E+07 to 2.E+10. Series: Two M2070, Four M2070, Eight M2070, Sixteen M2070]
Results
[Figure: Kernel 3: Speed Up over Single GPU. Factor, from 0 to 8, vs. Input Size from 1.E+07 to 2.E+10. Series: Two M2070, Four M2070, Eight M2070, Sixteen M2070]
References
• A. Khajeh-Saeed et al., "Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors," J. Computational Physics 229 (2010) 4247–4258.
• "Designing Scalable Synthetic Compact Applications for Benchmarking High Productivity Computing Systems," CTWatch Quarterly, Volume 2, Number 4B, November 2006 B.
Thank You