
Cluster Computing Applications Project Parallelizing BLAST

Page 1: Cluster Computing Applications Project  Parallelizing BLAST

OAK RIDGE NATIONAL LABORATORY
U.S. DEPARTMENT OF ENERGY

Cluster Computing Applications Project
Parallelizing BLAST

Research Alliance of Minorities (RAM), Computer Science and Mathematics Division

William Burke
York College, City University of New York

John Mugler and Stephen Scott
Oak Ridge National Laboratory

Page 2: Cluster Computing Applications Project  Parallelizing BLAST

Parallelizing the BLAST Algorithm: Feasible or Not?

Bioinformatics research needs faster text string matching algorithms.

The purpose of this project is to analyze the BLAST algorithm:

• Define the structure of BLAST.
• State why it is a valuable Bioinformatics tool.
• Explore parallelizations of BLAST.

BLAST matches query string fragments against a target database:

• Eliminates the need to run a full text string comparison.
• Speeds up database search time.

Several methods of parallelizing BLAST have been explored.

Page 3: Cluster Computing Applications Project  Parallelizing BLAST

Introduction

Cluster infrastructure

• Open Source Cluster Application Resources (OSCAR)
• Cluster Command and Control (C3)
• eXtreme TORC (XTORC)

Cluster applications

• Bioinformatics toolsets
• Basic Local Alignment Search Tool (BLAST)

Page 4: Cluster Computing Applications Project  Parallelizing BLAST

Infrastructure Overview

Red Hat Linux 7.2

OSCAR 1.3

• C3 - http://www.csm.ornl.gov/torc/C3/
• LAM/MPI - http://www.lam-mpi.org/
• Maui Scheduler - http://supercluster.org/maui/
• MPICH - http://www-unix.mcs.anl.gov/mpi/mpich/
• OpenSSH - http://www.openssh.com/
• OpenSSL - http://www.openssl.org/
• PBS - http://www.openpbs.org/
• PVM - http://www.csm.ornl.gov/pvm/
• SIS - http://www.sisuite.org/

Page 5: Cluster Computing Applications Project  Parallelizing BLAST

Red Hat Linux 7.2

• Installation
• Configuration
• Administration
  – Network configuration
  – Performance monitoring
  – Creating scripts

Page 6: Cluster Computing Applications Project  Parallelizing BLAST

OSCAR 1.3 and C3 Tools

OSCAR configures the head node.

OSCAR builds and configures compute nodes.

C3 reduces time and effort to operate and manage a cluster.

Page 7: Cluster Computing Applications Project  Parallelizing BLAST

eXtreme TORC

eXtreme TORC powered by OSCAR

• 65 Pentium IV machines
• Peak performance: 129.7 GFLOPS
• Memory (RAM): 50.152 GB
• Disk capacity: 2.68 TB
• Dual interconnects
  – Gigabit & Fast Ethernet

Page 8: Cluster Computing Applications Project  Parallelizing BLAST

The field of Bioinformatics needs faster string matching algorithms.

Page 9: Cluster Computing Applications Project  Parallelizing BLAST

Applications Overview

• BLAST: a Bioinformatics tool.
  http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
• Parallelize BLAST’s algorithm.

Page 10: Cluster Computing Applications Project  Parallelizing BLAST

BLAST: a Bioinformatics Tool

What is BLAST?

• A heuristic algorithm for matching query strings against a database.

How does the BLAST algorithm work?

• String fragmentation.
• Statistical means for comparison.

How can you parallelize BLAST on a computational cluster?
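The "string fragmentation" step can be illustrated with a small sketch. It is not BLAST itself: it only shows the idea of cutting sequences into fixed-length words, indexing the database words once, and looking up each query word instead of comparing full strings. The function names, the word length, and the second database entry are assumptions made for illustration; the first entry is the beginning of the database string shown in the deck.

```python
from collections import defaultdict


def words(seq, w):
    """Yield (offset, word) for every length-w fragment of seq."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]


def build_index(database, w):
    """Map each word to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for offset, word in words(seq, w):
            index[word].append((seq_id, offset))
    return index


def seed_hits(query, index, w):
    """Return (query offset, sequence id, database offset) for exact word matches."""
    hits = []
    for q_off, word in words(query, w):
        for seq_id, d_off in index.get(word, []):
            hits.append((q_off, seq_id, d_off))
    return hits


# Toy data: "db2" is a made-up sequence used only to show a non-matching entry.
database = {"db1": "GSWNLAALDKDPMGDKNRIEERLN", "db2": "MKTAYIAKQR"}
index = build_index(database, w=3)
hits = seed_hits("SLAALLNKCKTPQG", index, w=3)
print(hits)
```

Each hit is only a seed. A real search would extend seeds into longer alignments and score them statistically, which this sketch leaves out.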

Page 11: Cluster Computing Applications Project  Parallelizing BLAST

The BLAST Search Algorithm

Query word (W = 3): PQG

QUERY: GSVEDTTGSQSLAALLNKCKTPQGQRLVNQWIKWPLMDKNRIEERLNLVEAFVEDA

Neighborhood words, with neighborhood score threshold T = 13:

PQG 18
PEG 15
PRG 14
PKG 14
PMG 13
PSG 13
PQN 12 (below T)
Etc...

QUERY STRING     SLAALLNKCKTPQGQWLVNQWIKWPLMDKNRIEERLN  365
                 ----L--++K-P-G--+-----+-------------N
DATABASE STRING  GSWNLAALDKDPMGDKNRIEERLNLVEAIKWPLMDJN  330
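The neighborhood list in the figure can be reproduced with a short sketch, assuming the scores come from the BLOSUM62 substitution matrix (the slides do not name the matrix, but the listed scores are consistent with it). To keep the example short, only the three BLOSUM62 rows needed for the query word PQG are included, so the functions below work only for words built from P, Q, and G. All names are illustrative.

```python
from itertools import product

# Column order of the score rows below.
ALPHABET = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 rows for the residues of the query word only (assumed matrix).
BLOSUM62_ROWS = {
    "P": [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3,
          -3, -1, -2, -4, 7, -1, -1, -4, -3, -2],
    "Q": [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3,
          -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4,
          -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
}
SCORES = {res: dict(zip(ALPHABET, row)) for res, row in BLOSUM62_ROWS.items()}


def word_score(query_word, candidate):
    """Sum the substitution scores position by position."""
    return sum(SCORES[q][c] for q, c in zip(query_word, candidate))


def neighborhood(query_word, threshold):
    """All same-length words scoring at least `threshold` against query_word."""
    scored = []
    for letters in product(ALPHABET, repeat=len(query_word)):
        candidate = "".join(letters)
        score = word_score(query_word, candidate)
        if score >= threshold:
            scored.append((candidate, score))
    # Highest score first, ties in alphabetical order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


for word, score in neighborhood("PQG", 13):
    print(word, score)
```

Under this assumption the sketch gives PQG 18, PEG 15, PRG 14, PKG 14, PMG 13, and PSG 13, matching the figure, and PQN scores 12, which falls below T = 13. It also lists a few further words at score 13 that the figure covers with "Etc...".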

Page 12: Cluster Computing Applications Project  Parallelizing BLAST

Parallelization of BLAST

NBLAST SLRI Bioinformatics Toolkit

ParAlign

MOBLAST

www.usenix.org/publications/library/proceedings/als2000/michalickova.html

DNA sequence matching processor

PARALIGN™
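One general way to spread a search over a cluster is to split the database into chunks, search each chunk independently, and merge the results. The sketch below illustrates that idea only; it is not the method used by NBLAST, ParAlign, or MOBLAST, which the slides do not describe. Threads stand in for compute nodes here, and every function name and parameter is an assumption. On a real cluster each chunk would be sent to a separate node, for example through MPI or PVM.

```python
from concurrent.futures import ThreadPoolExecutor


def split_database(database, n_chunks):
    """Deal the (id, sequence) pairs into n_chunks roughly equal lists."""
    items = sorted(database.items())
    return [items[i::n_chunks] for i in range(n_chunks)]


def search_chunk(query, chunk, w):
    """Exact word matching of the query against one database chunk."""
    query_words = {query[i:i + w] for i in range(len(query) - w + 1)}
    hits = []
    for seq_id, seq in chunk:
        for off in range(len(seq) - w + 1):
            word = seq[off:off + w]
            if word in query_words:
                hits.append((seq_id, off, word))
    return hits


def parallel_search(query, database, w=3, n_workers=4):
    """Search every chunk concurrently and merge the partial hit lists."""
    chunks = split_database(database, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda chunk: search_chunk(query, chunk, w), chunks)
        merged = [hit for part in partial for hit in part]
    return sorted(merged)


# Toy database: the sequences are made up for illustration.
toy_db = {"s1": "GSWNLAALDK", "s2": "DPMGDKNRIE", "s3": "TPQGQRLVNQ"}
print(parallel_search("SLAALLNKCKTPQG", toy_db, w=3, n_workers=3))
```

Because each chunk is searched independently, the merged result matches a serial search over the whole database. The work per node shrinks as nodes are added, at the cost of distributing the database and collecting the results.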

Page 13: Cluster Computing Applications Project  Parallelizing BLAST

Conclusion

• The BLAST algorithm has a diverse family of programs.
• Several implementations exist for parallelizing the BLAST algorithm.
• Future work will include further exploration of the various parallelized BLAST algorithms on clusters.

Page 14: Cluster Computing Applications Project  Parallelizing BLAST

Acknowledgements

I would like to extend my thanks to Stephen L. Scott, John Mugler, Thomas Naughton, and Brian Luethke for their invaluable mentoring, Michaelangelo Salcedo for his guidance, and Debbie McCoy and Cheryl Hamby for their support in the RAM program.

Page 15: Cluster Computing Applications Project  Parallelizing BLAST

Disclaimer

This research was performed under the Research Alliance for Minorities Program administered through the Computer Science and Mathematics Division, Oak Ridge National Laboratory. This Program is sponsored by the Mathematical, Information, and Computational Sciences Division; Office of Advanced Scientific Computing Research; U.S. Department of Energy. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science, U.S. Department of Energy. This work has been authored by a contractor of the U.S. Government under contract DE-AC05-00OR22725. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.