Upload
blake-tucker
View
214
Download
0
Embed Size (px)
Citation preview
HIGH PERFORMANCE
BIOINFORMATICS
Group May 09-06
Bryan McCoy
Kinit Patel
Tyson Williams
Advisor/Client: Zhao Zhang
What is Bioinformatics? Genetic sequencing Massive amounts of data Many simple operations Perfect for distributed
computing
Problem Current solutions are not realistically feasible
Too expensive○ Super computers○ High powered servers
Too slow○ Some inputs can takes several days
Need for high speed, low cost solutions
Our Solution Cell Processor
Based on Phase 1 Cluster of PlayStation 3s MPI
Message Passing Interface
IBM Cell Broadband Engine 1 Power Processing Element (PPE) 8 Synergistic Processing Elements (SPEs)
Only 6 SPEs are accessible on a PlayStation 3 4 high speed rings for processor communication
DNAPenny Compares DNA strands
from different species Score indicates
evolution similarities between two species
Branch and bound search algorithm
Functional requirements
FR1. Ported applications shall run on the Cell B.E.
FR2. The results returned shall be the same as the original program.
FR3. The applications shall return their runtime.
FR4. The applications shall execute in parallel on multiple Cell B.E.s.
Non-Functional Requirements NF1. The Cells shall all run on the Linux
OS. NF2. The resulting runtimes of the
ported applications shall be faster than on the original applications.
NF3. The ported application shall be coded in the C language.
Market Survey Results of the survey point to a huge speed
up of computationally intensive programs. Dr. Gaurav Khanna at the University of
Massachusetts Dartmouth used cluster of 8 PS3s to replace a supercomputer.
Universitat Pompeu Fabra, in Barcelona, deployed in 2007 a BOINC system called PS3GRID for collaborative biological computing.
Risk Assessment
Slow network speed Software support Limited RAM Hardware Failure
Lower quality entertainment hardware Limited prior experience Software development schedule
Resource Requirements
3 PlayStation 3s High performance network switch Cell programming books Front node (desktop computer) Time
Software Environment Use Fedora 9 OS as it
is currently supported by the Cell SDK 3.1.
Uses the command line for user interface.
Use the IBM XLC compiler and/or the current GCC compiler.
Hardware Environment 3 PlayStation 3s High speed Crossbar switch Private network Front Node (desktop computer)
Proxy serverNetwork File Store (NFS)
I/O Input
Inputs are DNA sequences stored in a text file.
Text is a CustalW alignment organized in Phylip format, a standard format for biological applications.
OutputOutputs are
○ The parsimony score○ The best trees○ The execution time
The score and best trees are output to the screen and to text files.
The execution time is output to a CSV (Comma Separated Value) file.
Work Breakdown Structure
Port Apps to Cluster PS3s
Problem Definition
Research Cell/B.E
Research Bioperf Suite
Research Distributed Parallel
Algorithms
Research Previously Done
Work
End Product Design
Design Requirements
Design Process
Design Documents
Considerations and Selections
Decide Which Linux to Install
Decide which applications to port
End Product Implementation
Hardware Implementation
Prototyping Implementation
Software Implementation
End Product Testing
Ensure Correctness of Output Results
Benchmarking
Final Documentation and
Demonstration
Create Final Report
Create Project Poster
Prepare for Presentation
Work Schedule Gant chart
Deliverables
Source Code Compiled Executable Runtime Comparisons Final Report Poster Final Presentation
Costs Time
Approximately 555 man hours total.
Freely donated.
Total cost $0.
Equipment3 PlayStation 3s
○ Provided by clientCrossbar router
○ Provided by clientStandard desktop computer
○ Provided by department
Total cost $0.
Development: Initial Overview Use MPI to distribute the program to the
multiple PlayStations. Each PlayStation would search one
branch of the tree. 1 function (supplement) took 90% of the
runtimePhase 1 ported this function to the SPEs
Development: Difficulties Found a bug in supplement. The bug did not affect results but did affect
runtime. We contacted the original developer, Dr.
Felsenstein at the University of Washington, who fixed the bug.
The fix significantly improved runtime. However, the fix negated all work done by
Phase 1 as supplement no longer took a significant amount of runtime.
Development: Reworking After the bug fix, no
single function took a significant amount of runtime.
We decided to distribute branches of the tree search to different processors.
Development: Results
Completed our goals Divided work among 3 PlayStation 3s.Produced faster code that comparable
sequential environment. Due to time constraints, we were not
able to port the code to the SPEs.
Testing
Used script to test multiple inputs. Averaged the runtimes. Used several different code revisions
and machines to provide comparisons. Projected the speedup that could be
attained if code was ported to SPEs.
Results: Actual Our current code is
20.76 times faster than the it was at the beginning of the semester.
Surpassed our original projections, which assumed the use of the SPEs.
Code revision Runtime (sec)
X Speedup X Speedup
(compared to desktop)
# of available cores
Original (Core2)
1861.66 0.116 1
With Bug Fixes (Core 2)
218.8 1 1
Original (1 PPE)
4953.51 1 0.044 1
With Bug Fixes (1 PPE)
662.70 7.47 0.330 1
MPI with Bug Fixes (3 PPEs)
238.57 20.76 0.917 3
MPI with Bug Fixes (3 PPEs, 18 SPEs)
(Projected)
34.08 145.35 6.420 21
Original Projections
334.82 14.79 0.653 21
Results: MPI The speedup for
MPI was 2.78. Excellent speedup
for 3 nodes.
Code revision Runtime (sec)
X Speedup X Speedup
(compared to desktop)
# of available cores
Original (Core2)
1861.66 0.116 1
With Bug Fixes (Core 2)
218.8 1 1
Original (1 PPE)
4953.51 1 0.044 1
With Bug Fixes (1 PPE)
662.70 7.47 0.330 1
MPI with Bug Fixes (3 PPEs)
238.57 20.76 0.917 3
MPI with Bug Fixes (3 PPEs, 18 SPEs)
(Projected)
34.08 145.35 6.420 21
Original Projections
334.82 14.79 0.653 21
Results: Comparison Our final code came
close to a high powered desktop.Core 2 Quad at
2.66 GHz Our projected
results indicate a speedup of 6.4.
Code revision Runtime (sec)
X Speedup X Speedup
(compared to desktop)
# of available cores
Original (Core2)
1861.66 0.116 1
With Bug Fixes (Core 2)
218.8 1 1
Original (1 PPE)
4953.51 1 0.044 1
With Bug Fixes (1 PPE)
662.70 7.47 0.330 1
MPI with Bug Fixes (3 PPEs)
238.57 20.76 0.917 3
MPI with Bug Fixes (3 PPEs, 18 SPEs)
(Projected)
34.08 145.35 6.420 21
Original Projections
334.82 14.79 0.653 21
Results: Projected
Using all SPEs, the speedup should be 145.35 Assuming SPEs run as fast
as the PPEs○ Before SPE vectorization
Code revision Runtime (sec)
X Speedup X Speedup
(compared to desktop)
# of available cores
Original (Core2)
1861.66 0.116 1
With Bug Fixes (Core 2)
218.8 1 1
Original (1 PPE)
4953.51 1 0.044 1
With Bug Fixes (1 PPE)
662.70 7.47 0.330 1
MPI with Bug Fixes (3 PPEs)
238.57 20.76 0.917 3
MPI with Bug Fixes (3 PPEs, 18 SPEs)
(Projected)
34.08 145.35 6.420 21
Original Projections
334.82 14.79 0.653 21
Conclusions
Achieved our goal of using MPI to get runtime improvement.
Contributed a major fix to a widely used application.
Surpassed our initial runtime goal. Projected results show an even larger
runtime improvement still possible.
Acknowledgements May08-24 group (phase I)
Kyle Byerly Shannon McCormick Matt Rohlf Bryan Venteicher
DNAPenny Author Dr. Felsenstein
Advisor Zhao Zhang
Environment Help Steve Nystrom
Questions?