North Carolina Bioinformatics Grid Thom H. Dunning, Jr. HPCC Division, MCNC Chemistry, University of North Carolina


North Carolina Bioinformatics Grid

Thom H. Dunning, Jr.

HPCC Division, MCNC
Chemistry, University of North Carolina


Genomics: A Compute- & Data-Intensive Science

* from TimeLogic


Data Explosion: Rapid Growth of GenBank

Growth of GenBank
• Number of base pairs increasing dramatically (exponentially)
• Growth in 2002 due to additions in just 21 days!

[Figure: growth of GenBank, 1982 to 2002; x-axis: year; y-axis: No. Gbases (0 to 20)]


Data Explosion: Number and Diversity of Databases

Nucleic Acids Research, 2002, Vol. 30, No. 1

Table 1. Molecular Biology Database Collection

Major Public Sequence Repositories

DNA Data Bank of Japan (DDBJ) http://www.ddbj.nig.ac.jp All known nucleotide and protein sequences…

Varied Biomedical Content

VirOligo http://viroligo.okstate.edu Virus-specific oligonucleotides for PCR and…

333 Databases


Computing Explosion: Assembly and Analysis of Genomic Data

Celera Genomics: Assembling the Genome
• Compaq Alpha Clusters
• Number of processors: ~ 750
• Peak performance: 1 teraops

NuTech Sciences: Mining the Genome
• IBM p640 System
• Number of processors: ~ 5,000
• Peak performance: 7½ teraops
• Total memory: 2½ terabytes
• Total disk storage: 50 terabytes
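To put the two systems on a common footing, the quoted figures can be reduced to peak performance per processor. The sketch below is only illustrative arithmetic on the numbers from this slide. The processor counts are approximate in the source, and the dictionary layout and function name are assumptions made for the example.

```python
# Figures as quoted on the slide; processor counts are approximate ("~").
systems = {
    "Celera Genomics (Compaq Alpha clusters)": {"processors": 750, "peak_teraops": 1.0},
    "NuTech Sciences (IBM p640 system)": {"processors": 5000, "peak_teraops": 7.5},
}


def peak_gigaops_per_processor(processors, peak_teraops):
    """Peak performance per processor in gigaops (1 teraops = 1000 gigaops)."""
    return peak_teraops * 1000.0 / processors


# Hypothetical helper structure: system name -> peak gigaops per processor.
per_processor = {
    name: peak_gigaops_per_processor(**spec) for name, spec in systems.items()
}

for name, value in per_processor.items():
    print(f"{name}: ~{value:.2f} peak gigaops per processor")
```

On these figures both systems come out at roughly 1 to 2 peak gigaops per processor, so the difference in total peak performance comes mainly from processor count.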


Genomics: Meeting the Information Challenge

[Diagram labels: Grid Middleware, Data Storage, Computers, Network]


North Carolina Supercomputing Center


North Carolina Research and Education Network

[Map labels: Greensboro, Charlotte, Pembroke, Winston-Salem, NCSU, NCSU Centennial Campus, NCCU, Duke, UNC-CH, Wilmington, Elizabeth City, Asheville, Cullowhee, Fayetteville, Greenville, RTP, MCNC, Boone, Morehead City, Rocky Mount, Qwest, RTP RPoP]

NCREN3
• Increased bandwidth
• Increased reliability
• Increased resiliency


Grid Technologies

Major New Computing Technology
• Under development since mid-1990s

Distinguishing Characteristics
• “Middleware” to support efficient resource sharing in a distributed, heterogeneous computing and data storage environment
• Focus on use of large-scale computing and data storage

Some Major Grid Efforts
• NASA IPG: Testbed linking selected NASA centers
• DataGrid: International Grid being developed for high-energy physics (CERN)


Grid Technologies (cont’d)

Some Major Grid Efforts (cont’d)

GriPhyN—Research in Grid technologies for physics applications (Argonne, Florida)

e-Science Grid—Major effort in UK to develop a Grid infrastructure for science and engineering research

BIRN—Data Grid focused on neuroimaging data (UCSD, SDSC)


North Carolina Genomics and Bioinformatics Consortium

Goal
• Provide a venue for Consortium members to share information and resources, plan strategic initiatives, and form alliances

Distributed Across North Carolina
• Concentration in Research Triangle, but extends across all of North Carolina

Diverse Goals and Expertise
• Human health, including animal models; agriculture and forestry; evolutionary biology basic research; tool development


Overall NC BioGrid Architecture

[Diagram: layered architecture, listed from top layer to bottom layer]
• BioApp#1, BioApp#2, BioApp#3: Grid-aware, -enabled bioinformatics applications
• Grid Middleware: Globus, Legion, …
• Network: NCREN3
• Computing and Data Resources: NCSC plus Member’s Computing Centers
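The layering in this architecture can be illustrated with a toy model in which applications never address a computing center directly and instead ask the middleware for a resource. This is a minimal sketch only. Every class, method, and resource name below is hypothetical, the CPU counts are invented for illustration, the network layer is left implicit, and nothing here reflects the actual Globus or Legion interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Bottom layer: a compute/data resource at NCSC or a member's center."""
    name: str
    site: str
    free_cpus: int  # illustrative value, not taken from the slides


@dataclass
class Middleware:
    """Middle layer: hypothetical stand-in for the grid middleware."""
    resources: list = field(default_factory=list)

    def register(self, resource):
        self.resources.append(resource)

    def allocate(self, cpus_needed):
        """Reserve CPUs on the first resource with enough capacity, else None."""
        for resource in self.resources:
            if resource.free_cpus >= cpus_needed:
                resource.free_cpus -= cpus_needed
                return resource
        return None


@dataclass
class BioApp:
    """Top layer: a grid-aware application that only talks to the middleware."""
    name: str
    cpus_needed: int

    def run(self, middleware):
        resource = middleware.allocate(self.cpus_needed)
        if resource is None:
            return f"{self.name}: no resource available"
        return f"{self.name}: running on {resource.name} at {resource.site}"


grid = Middleware()
grid.register(Resource("cluster-a", "NCSC", free_cpus=64))
grid.register(Resource("cluster-b", "member-site", free_cpus=256))

apps = [BioApp("BioApp#1", 32), BioApp("BioApp#2", 128), BioApp("BioApp#3", 512)]
results = [app.run(grid) for app in apps]
for line in results:
    print(line)
```

The point of the indirection is that an application states what it needs, and the middleware decides where it runs, so resources can be added at member sites without changing the applications.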


NC BioGrid Project

Two Phases
• Testbed Phase: test existing middleware, resolve issues, prepare detailed plan (12-18 months)
• Production Phase: create and operate NC BioGrid

Funding for Testbed from MCNC

Project Manager
• Phil Emer, MCNC, Chief Architect/NC BioGrid

Project Oversight
• MCNC Board of Directors
• HPCC Advisory Board
• NC BioGrid Technical Advisory Group