Page 1: An Introduction to CAMERA and Underlying Technologies

An Introduction to CAMERA and Underlying Technologies

Philip Papadopoulos

University of California, San Diego

San Diego Supercomputer Center

California Institute of Telecommunications and Information Technology (Calit2)

Page 2

PI Larry Smarr

Announced 17 Jan 2006. Public Release 13 March 2007. $24.5M Over Seven Years

Page 3

DNA Basics for Non-Biologists

• Nucleotide bases of DNA: A, C, G, T (Adenine, Cytosine, Guanine, Thymine)
– A sequence of bases forms one strand of the DNA double helix
– Complementary bases form the other strand
– A matches T (pair)
– C matches G (pair)

• During cell replication, DNA is “unzipped”; each strand then serves as a template from which its complement is rebuilt with very high fidelity

• Human DNA is about 3 billion base pairs on 23 chromosome pairs (46 chromosomes)
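The pairing rule above (A with T, C with G) is simple enough to state as code. This is a minimal sketch, assuming uppercase strings over A, C, G, T. `complement` is a hypothetical helper name. Because the two strands run in opposite directions, bioinformatics tools usually work with the reverse complement. This sketch returns the plain base-by-base complement to mirror the slide.

```python
# Watson-Crick pairing: A<->T, C<->G (uppercase DNA only, by assumption).
_PAIR = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (not reversed)."""
    if set(strand) - set("ACGT"):
        raise ValueError("expected only A, C, G, T")
    return strand.translate(_PAIR)


if __name__ == "__main__":
    original = "GGGAAACCC"
    other_side = complement(original)
    print(other_side)                           # CCCTTTGGG
    # "Unzipping" and rebuilding from the other side recovers the original.
    print(complement(other_side) == original)   # True
```

Applying the rule twice returns the starting strand, which is why either side of the unzipped molecule carries enough information to rebuild the whole.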

Page 4

Bases to Amino Acids

• Triplets of nucleotide bases are called codons; each codon specifies an amino acid (or a stop signal)
– Amino acids are the basic building blocks of proteins
– There are 20 amino acids, but 4^3 = 64 possible codons
– Many amino acids therefore have multiple codons
– Special codons (called start and stop codons) mark where translation of a gene into protein begins and ends

• Reading frames of GGGAAACCC: depending on where reading starts, this raw sequence could be read as
– GGGAAACCC (GGG AAA CCC) (Glycine, Lysine, Proline)
– GGAAACCC (GGA AAC) (Glycine, Asparagine)
– GAAACCC (GAA ACC) (Glutamic Acid, Threonine)
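The three readings of GGGAAACCC can be reproduced with the standard genetic code. The sketch below is illustrative, not a full translation tool. It packs the 64-codon table into a string in TCAG order, a common compact encoding that uses `*` for the stop codons TAA, TAG, and TGA. It translates only the three forward frames (the three frames on the complementary strand are omitted) and ignores leftover bases that do not form a full codon. The helper names are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code packed in TCAG order (first base varies slowest).
# "*" marks the stop codons TAA, TAG, and TGA.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate_frame(seq: str, offset: int) -> str:
    """Translate the complete codons that start at `offset` (0, 1, or 2)."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)
    )


def forward_frames(seq: str) -> List[str]:
    """One-letter protein strings for the three forward reading frames."""
    return [translate_frame(seq, k) for k in range(3)]


if __name__ == "__main__":
    # G = Glycine, K = Lysine, P = Proline, N = Asparagine,
    # E = Glutamic acid, T = Threonine
    print(forward_frames("GGGAAACCC"))  # ['GKP', 'GN', 'ET']
```

Only the starting offset changes between frames. That one-base shift is enough to produce entirely different proteins from the same raw sequence.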

Page 5

Sequencing Tidbits

• The Institute for Genomic Research (TIGR) sequenced the genome of the bacterium Haemophilus influenzae in 1995 using whole-genome shotgun sequencing

– 1.8 million base pairs (human: 3 billion)

• Sequencing does NOT tell you what function a particular gene plays

• Only ~1.5% of the human genome is believed to code for proteins
– The non-coding portions contain our genetic history
– The function of the rest of our DNA is unknown
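To make the shotgun idea concrete: the genome is broken into many short, overlapping reads, which are later stitched back together by their overlaps. The sketch below uses a naive greedy merge of the longest suffix/prefix overlap. It is a toy that assumes error-free reads and no repeats; it is not the assembler TIGR actually used. The `min_len` threshold and the example reads are illustrative.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long.
    """
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs; more than one means some reads could
    not be joined by an overlap of at least `min_len`.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three overlapping reads cut from the hypothetical sequence GATTACAGGCT.
    print(greedy_assemble(["GATTAC", "TACAGG", "AGGCT"]))  # ['GATTACAGGCT']
```

Real assemblers must cope with sequencing errors and repeated regions, which break the greedy assumption that the biggest overlap is always the right one.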

Page 6

Most of Evolutionary Time Was in the Microbial World

[Figure: Tree of Life derived from 16S rRNA sequences, with a “You Are Here” marker]

Source: Carl Woese, et al.

Page 7

Marine Genome Sequencing Project – Measuring the Genetic Diversity of Ocean Microbes

Sorcerer II Data Will Double the Number of Proteins in GenBank!

Need Ocean Data

Page 8

Some CAMERA Goals

• Provide an infrastructure where scientists from around the world can analyze genetic communities
– Global Ocean Sampling (GOS) is the initial large data set
– ~8.5 billion base pairs of raw reads
– Metadata is available for samples: salinity, temperature, geographic location, water depth, time of day, …
– Other metadata will be correlated with samples (e.g. MODIS satellite data)

• Allow others to search and compare input sequences against CAMERA data

• Overall, provide a resource dedicated to metagenomics
– Support new data sets
– Support new analysis tools and web services
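The per-sample metadata listed above lends itself to a simple record type that can be filtered before any sequence analysis. This is a minimal sketch. The field names and units (`depth_m`, `temperature_c`, `salinity_psu`, and so on) are hypothetical; CAMERA's actual schema is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical record for one sampling site; field names are illustrative."""
    sample_id: str
    latitude: float
    longitude: float
    depth_m: float
    temperature_c: float
    salinity_psu: float
    local_time: str  # e.g. "14:30"; kept as text in this sketch


def select_samples(
    samples: Iterable[SampleMetadata],
    min_temp_c: Optional[float] = None,
    max_temp_c: Optional[float] = None,
    max_depth_m: Optional[float] = None,
) -> List[SampleMetadata]:
    """Return the samples that fall inside every bound that is set."""
    selected = []
    for s in samples:
        if min_temp_c is not None and s.temperature_c < min_temp_c:
            continue
        if max_temp_c is not None and s.temperature_c > max_temp_c:
            continue
        if max_depth_m is not None and s.depth_m > max_depth_m:
            continue
        selected.append(s)
    return selected
```

For example, `select_samples(samples, min_temp_c=20.0, max_depth_m=5.0)` keeps only warm, near-surface samples. Correlating samples with satellite data such as MODIS would mean joining on location and time, which this sketch does not attempt.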

Page 9

Global Ocean Sampling (GOS) Sequences Are Largely Bacterial

Source: Shibu Yooseph, et al. (PLoS Biology, in press 2006)

~3 Million Previously Known Sequences

~5.6 Million GOS Sequences

Page 10

Reasons for CAMERA

• The Global Ocean Sampling (GOS) expedition produces a huge influx of sequence data

• The factors that interrelate microbes and microbial communities are not well understood

• Significant analysis requires large resources
– All-to-all comparisons
– Integration of other environmental (meta)data (weather, temperature, salinity, …) is essential

• Raw sequence data sets are mid-sized
– The current set of GOS raw reads is about 100 GB (FASTA files)
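Because the raw reads arrive as FASTA files and the analysis calls for all-to-all comparisons, a small sketch can show both pieces. It also shows why the cost grows quickly. n sequences need n(n−1)/2 pairwise comparisons, so the ~5.6 million GOS sequences alone would need about 1.6 × 10^13 pairs.

Note the substitutions:
- CAMERA's own tool list includes NCBI BLAST for sequence comparison. The k-mer Jaccard similarity below is a deliberately simpler stand-in.
- `k=4` is an arbitrary illustrative choice.
- The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record id is the first whitespace-separated token after '>';
    sequence lines before the first header are ignored.
    """
    records: Dict[str, str] = {}
    current_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def kmer_set(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pair_count(n: int) -> int:
    """Number of unordered pairs in an all-to-all comparison of n sequences."""
    return n * (n - 1) // 2


def all_vs_all(records: Dict[str, str], k: int = 4) -> List[Tuple[str, str, float]]:
    """Jaccard similarity of k-mer sets for every unordered pair of records."""
    profiles = {rid: kmer_set(seq, k) for rid, seq in records.items()}
    results = []
    for a, b in combinations(sorted(profiles), 2):
        union = profiles[a] | profiles[b]
        shared = profiles[a] & profiles[b]
        results.append((a, b, len(shared) / len(union) if union else 0.0))
    return results


if __name__ == "__main__":
    fasta_text = """>read1 hypothetical example
ACGTACGTGG
>read2
ACGTACGTCC
>read3
TTTTGGGG
"""
    recs = read_fasta(fasta_text.splitlines())
    for a, b, score in all_vs_all(recs):
        print(a, b, round(score, 2))  # read1 read2 0.5, then two 0.0 pairs
    print(pair_count(len(recs)))      # 3
```

The quadratic `pair_count` is the point. Even a cheap per-pair score becomes expensive at GOS scale, which is why such comparisons are scheduled on large shared resources.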

Page 11

Calit2 CAMERA Production Compute and Storage Complex is On-Line

512 processors, ~5 teraflops

~200 terabytes of storage

Page 12

User Map – 03 May 2007

• Site in production on 13 March 2007
• More than 500 registered users from around the globe (~10 new users/day)

Page 13

Calit2’s Direct Access Core Architecture: CAMERA’s Metagenomics Server Complex

[Figure: Traditional User (request/response); Web Portal; Web Services; Web (other service); Local Environment and Local Cluster with Direct Access Lambda Connections; Flat File Server Farm; Database Farm; Dedicated Compute Farm (100s of CPUs); 10 GigE Fabric; TeraGrid: Cyberinfrastructure Backplane (10,000s of CPUs) for scheduled activities, e.g. all-by-all comparison]

Community Microbial Metagenomics Data:
• Sargasso Sea Data
• Sorcerer II Expedition (GOS)
• JGI Community Sequencing Project
• Moore Marine Microbial Project
• NASA and NOAA Satellite Data

Source: Phil Papadopoulos, SDSC, Calit2

Page 14

Calit2 CAMERA Production Compute and Storage Complex is On-Line

[Figure: compute nodes; 1 and 10 Gbit/s switching; 200 TB file storage; 10 Gbit/s network; Web, application, and DB servers]

Page 15

Global Elements

• Data location – Storage Resource Broker metadata catalog

• Data-type aggregation, cross-correlation, integration – BIRN Data Mediator

• Identity management
– Use the Grid Security Infrastructure (GSI) public key system
– Integrated Grid Accounts Management Architecture (GAMA) from SDSC for ease of use and single sign-on

• Portal services
– Based on GridSphere
– Small dedicated compute cluster (32 nodes)

Page 16

Logical Layout of Servers

[Figure: Web Server, Portal Server (Tomcat), Single Sign-On Server, Postgres Database, GAMA Server, BLAST Master (JBoss), and Cluster Frontend, with cluster nodes and file servers, a Single Sign-On Layer, and public and private networks]

Page 17

An Incomplete List of Software Components

• Postgres Database
• Apache Tomcat
• JBoss Servlet Container
• Google Web Toolkit
• Sun Grid Engine
• GAMA (Grid Accounting and Management Architecture)/GSI from Globus
• OPAL (Grid/Web Services Wrapper)
• GridSphere Portlet Container
• CAMERA Registration Portal
• Venter Application Portal
• NCBI BLAST, mpiBLAST, ClustalW, MrBayes, CD-HIT, and a host of other bioinformatics software
• Ergatis Workflow Engine
• JForum
• Drupal
• All integrated with Rocks … single-person deployment

Page 18

OptIPortal – Another Rocks Cluster Termination Device for the OptIPuter Global Backplane

• 20 dual-CPU nodes, 20 24” monitors, ~$50,000
• 1/4 teraflop, 5 terabytes of storage, 45 megapixels. Nice PC!
• Scalable Adaptive Graphics Environment (SAGE): Jason Leigh, EVL-UIC

Source: Phil Papadopoulos, SDSC, Calit2

Page 19

Use of OptIPortal to Interactively View a Microbial Genome

Source: Raj Singh, UCSD

Acidobacteria bacterium Ellin345 (NCBI): soil bacterium, 5.6 Mb

15,000 x 15,000 pixels

Page 21

A Look at Networking

Introduction to Quartzite: An Experimental Network

Page 22

Sunlight (10 Gigabit) Campus/WAN

Page 23

Using a Lambda Network for CAMERA

• Many community databases
– Protein Data Bank (PDB)
– GenBank
– SwissProt

• These databases support only web or web-services interfaces
– New analyses/programs need access to the raw databases/files
– Usually, groups make a point-in-time copy of the database
– We call this a data “fork”
– Updates are not processed
– Papers published with point-in-time data are out of date by months or years

• CAMERA “Direct Connect” will provide a high-speed connection to the backend servers
– Try to eliminate data forking
– Copies of CAMERA data are inevitable
– Need mechanisms that allow others to keep their copies in sync with CAMERA
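One simple way for a copy holder to tell whether their copy has drifted from the master (a data “fork”) is to compare content checksums. The sketch below is a hypothetical mechanism, not CAMERA's. It assumes each site can produce a manifest mapping relative file paths to SHA-256 digests. From two manifests it reports which files a mirror would need to fetch, update, or delete.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Hash in 1 MiB blocks so large FASTA files need not fit in memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def plan_sync(master: Dict[str, str], mirror: Dict[str, str]) -> Dict[str, List[str]]:
    """List the actions that would bring `mirror` in line with `master`."""
    return {
        "fetch": sorted(set(master) - set(mirror)),
        "update": sorted(p for p in master.keys() & mirror.keys() if master[p] != mirror[p]),
        "delete": sorted(set(mirror) - set(master)),
    }
```

Only digests are exchanged, so a mirror can check whether it is current without re-downloading the data. Moving the changed files, for example over a high-speed lambda connection, is outside this sketch.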

Page 24

UCSD Quartzite Core at Completion (Year 5 of OptIPuter)

[Figure: Quartzite Core network diagram. Quartzite Communications Core Year 3 (DWDM): Glimmerglass 128-port OOO switch, Juniper T320, Force10 E1200 (32 x 10GigE), Lucent wavelength selective switch, 4 GigE over 4 fiber pairs; GigE switches with dual 10GigE uplinks to cluster nodes; 10GigE links to cluster node interfaces and other switches; GigE and 10GigE links to other nodes; connections to the CalREN-HPR Research Cloud and the Campus Research Cloud]

• Funded 15 Sep 2004

• Physical HW to enable OptIPuter and other campus networking research

• Hybrid network instrument

Reconfigurable Network and Endpoints

Page 25

25 | AT&T Labs, October 2007

4x4 Wavelength Cross-Connect (WXC):

• All integrated optics (except optical amplifiers)
– 4 1x4 WSS modules
– 4 4x1 passive optical combiners

• 4 ports x 40 channels x 40 Gbps = 6.4 Tbps switching capacity
– Currently using the central 8 channels

[Figure: 4x4 WXC rack, showing the four 1x4 WSSs, the combiners, and the optical amplifiers]

Page 26


WXC performance demonstration:

[Figure: test setup with four 1x4 WSSs (WSS1 to WSS4), an ASE source, an external 4x1 switch, and an OSA]

• 8 lasers at the centre of the C-band at 100 GHz spacing
• Use an ASE source to illustrate the wide bandwidth
1. Use the external 4x1 switch to scan the WXC ports
2. Alter the switch states of WSS1 and WSS3 (shown in the movie on the next page)

Step | WSS1 | WSS2 | WSS3 | WSS4
1 | 1 | 2 | 3 | 4
2 | 2 | 3 | 4 | 1
3 | 3/1 | 4 | 1/3 | 2
4 | 4 | 1 | 2 | 3
5 | 1 | 2 | 3 | 4
6 | 2 | 3 | 4 | 1
7 | 3/1 | 4 | 1/3 | 2
8 | 4 | 1 | 2 | 3
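In a 4x4 WXC built from 1x4 WSSs and 4x1 passive combiners, a configuration is only valid if no two inputs send the same wavelength to the same output port. The passive combiner at that port would merge the two signals. The check below is a hypothetical model of that constraint, not AT&T's control software. It assumes a configuration is a mapping from (input WSS, channel) to output port. The `uniform_state` helper is a hypothetical convenience that assumes each WSS sends all of its channels to a single port.

```python
from typing import Dict, List, Tuple

# (input WSS number, channel number) -> output port number
Config = Dict[Tuple[int, int], int]


def uniform_state(ports: List[int], channels: range) -> Config:
    """Hypothetical helper: WSS i+1 routes every channel to ports[i]."""
    return {(wss, ch): port
            for wss, port in enumerate(ports, start=1)
            for ch in channels}


def find_contention(config: Config) -> List[Tuple[int, int, int, int]]:
    """Return (first_input, second_input, channel, output) for each clash.

    A clash means two inputs send the same channel to the same output,
    where the passive combiner cannot keep them apart.
    """
    owner: Dict[Tuple[int, int], int] = {}
    clashes = []
    for (wss, ch), out in sorted(config.items()):
        key = (out, ch)
        if key in owner:
            clashes.append((owner[key], wss, ch, out))
        else:
            owner[key] = wss
    return clashes


if __name__ == "__main__":
    channels = range(1, 9)  # eight lasers, as in the demonstration setup
    print(find_contention(uniform_state([2, 3, 4, 1], channels)))  # []
    bad = uniform_state([1, 1, 3, 4], channels)
    print(len(find_contention(bad)))  # 8: WSS1 and WSS2 clash on every channel
```

Under this model, any permutation of output ports is contention-free. A wavelength-selective split is also contention-free as long as the two WSSs sharing ports send complementary channel sets to each port.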

Page 27


WXC performance demonstration:

Page 28

What Does It Cost to Drive the Network?

• The dominant cost is the DWDM optics

• Multiplexers are simple to build and inexpensive: ~$250 per channel per end

Page 29

Layer 1 – Four-Channel DWDM

Channels 31, 32, 33, 34

• 10 Gbps switch x 4 per side (optional)
• XFP switch module x 4 per side (optional)
• XFP DWDM optics x 4 per side, used in host or switch
• SC-to-LC fiber, 2 m, x 5 per side
• DWDM mux (transmit) x 1 per side
• DWDM demux (receive) x 1 per side
• Corning 1U rack containing the DWDM mux/demux + SC-to-SC couplers, 1 per side
• 1 fiber pair between the two sides

Page 30

1) Optics: SFP/XFP Optics Costs

DWDM optics from AACTelecom

• 10 Gbps Luminent XFP DWDM, per unit (ZR, 80 km), OC-192 and 10GE compatible: $3,500
• 10 Gbps Luminent (assembled in US) XFP DWDM, per unit (ER, 40 km), OC-192 and 10GE compatible: $2,900
• 1 Gbps SFP DWDM, per unit (80 km model), OC-48 compliant and 1GE compatible: $1,220
• 10 Gbps non-DWDM 1310 nm (LR, 10 km model): $700

Page 31

2) Optional – Layer 2 Switch (10 Gbps capable)

• 10 Gbps capable switch: SMC8748L2 (A0707505) + EXP MOD-10G (A0707506) from Dell; 2 x 10 Gbps XFP ports, 48 x 1 Gbps copper: $1,700
• 10 Gbps module (holds XFP): $300

Page 32

3) DWDM Mux DeMux

DWDM Mux DeMux (SC connector type): 4-, 8-, and 16-channel DWDM-100 from oemarket.com

• 4 channel (channels 31, 32, 33, 34): $560
• 8 channel: $880
• 16 channel: ~$1,600

Page 33

4) Corning Rack Mount, Couplers, Fiber

• Corning mux/demux container, 1U rack mount: Corning PCH-01U from Ed Carlin, Graybar; 1U (sufficient for 4, 8, or 16 channels): $200
• 2 sets of SC-to-SC adaptors: ~$100
• Fiber patch cables, single mode, from Ed Carlin, Graybar; 2 m, SC-to-LC connector type: ~$30 each
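Using the prices listed above, a short calculation shows why the DWDM optics dominate the cost of a four-channel link. The per-side quantities come from the four-channel DWDM layout. The pricing interpretations are assumptions and are labeled in the code:
- the ER (40 km) XFP price is used;
- the $560 mux/demux price is taken per unit;
- the rack, adaptors, and patch cables are counted once per side.

The fiber pair itself is not priced on the slides and is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LineItem:
    name: str
    kind: str          # "optics", "mux", "passive", or "switch"
    unit_usd: int
    qty_per_side: int
    optional: bool = False


# Prices and per-side quantities as listed on the slides.
BOM: List[LineItem] = [
    # Assumption: the ER (40 km) XFP; the ZR (80 km) part lists at $3,500.
    LineItem("10 Gbps XFP DWDM optic", "optics", 2900, 4),
    # Assumption: the $560 4-channel price is per unit (mux and demux each).
    LineItem("4-channel DWDM mux (transmit)", "mux", 560, 1),
    LineItem("4-channel DWDM demux (receive)", "mux", 560, 1),
    LineItem("Corning PCH-01U 1U container", "passive", 200, 1),
    # Assumption: the "2 sets" of SC-to-SC adaptors ($100) are bought once per side.
    LineItem("SC-to-SC adaptors (2 sets)", "passive", 100, 1),
    LineItem("2 m SC-to-LC single-mode patch cable", "passive", 30, 5),
    LineItem("10 Gbps capable switch", "switch", 1700, 4, optional=True),
    LineItem("10 Gbps XFP switch module", "switch", 300, 4, optional=True),
]


def side_cost(items: List[LineItem], include_optional: bool = False, kind: str = "") -> int:
    """Dollar cost of one end of the link, optionally restricted to one kind."""
    return sum(
        i.unit_usd * i.qty_per_side
        for i in items
        if (include_optional or not i.optional) and (not kind or i.kind == kind)
    )


def link_cost(items: List[LineItem], include_optional: bool = False) -> int:
    """Two identical ends; the fiber pair itself is not priced on the slides."""
    return 2 * side_cost(items, include_optional)


if __name__ == "__main__":
    per_side = side_cost(BOM)
    optics = side_cost(BOM, kind="optics")
    print(per_side, link_cost(BOM))                            # 13170 26340
    print(f"optics share per side: {optics / per_side:.0%}")  # optics share per side: 88%
```

Under these assumptions, the four XFPs account for roughly 88% of the required per-side cost. Adding the optional switches and modules raises a side from $13,170 to $21,170.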

Page 34

Complete Solution

Page 35

5) Optional – DWDM Media Converter

DWDM to copper media converter from Carl Stelling at Aaxeon.com

• SFP pluggable DWDM-to-copper media converter: $150 each, not including DWDM optics (converter only)

Page 36

Quartzite State Nov 2007

• Core packet switch with 68 10 GigE ports (more than ½ terabit)
• Approximately 30 channels lit
• 64-port all-optical Glimmerglass switch: all fiber into Quartzite is switchable
• 4 port x 8 lambda DWDM switch at Lucent (on site at Calit2 in Dec)
• 4-channel DWDM between Calit2 and SDSC
– One channel is used for 10 Gigabit production to BIRN data racks

• Ordered, but waiting for fulfillment:
– 20 mux/demux (8 C-band DWDM channels + 1 1310 nm (LR) passband)
– 32 DWDM XFPs (channels 40-43; will fill out the rest of the channels in 2008)