Ben Rogers ITS – Research Services 1


Image: http://www.mmi.org

2

Image: http://genomebiology.com/2010/11/5/207/figure/F2

3

4

Org             Year   Total Storage   Biggest Hard Drive   # of Hard Drives
Psychiatry      2004   1TB             300GB                4
ICTS            2008   250TB           1.5TB                167
Psychiatry      2013   120TB           4TB                  30
ICTS            2013   1500TB          4TB                  375
ITS-RS + ICTS   2013   2500TB          4TB                  625

Data Awareness

Data Management

Data Storage

Campus Resources

National Resources

Questions

5

Sequencing changes faster than IT

Understand the data you will produce

Understand the data you will keep

Understand how the data will move

6

Understand the sizes of the data each instrument produces

◦ How often will you collect this data?

◦ What resources are needed to analyze and store each data set?

How will you handle each kind of data?

◦ Raw Data

◦ Intermediate Data

◦ Derived Data
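A rough sketch of how the questions above can be turned into a yearly volume estimate. Every number and name here (`yearly_volume_tb`, the run size, the run count, the intermediate and derived multipliers) is a hypothetical placeholder, not a figure from this talk; substitute values from your own instrument.

```python
# Hypothetical planning helper: all inputs are placeholders, not measurements.

def yearly_volume_tb(run_size_gb, runs_per_year,
                     intermediate_factor, derived_factor):
    """Estimate yearly data volume in TB, split by data category.

    intermediate_factor and derived_factor express intermediate and
    derived data as multiples of the raw data volume (assumed values).
    """
    raw = run_size_gb * runs_per_year / 1000  # decimal units: 1 TB = 1000 GB
    intermediate = raw * intermediate_factor
    derived = raw * derived_factor
    return {
        "raw": raw,
        "intermediate": intermediate,
        "derived": derived,
        "total": raw + intermediate + derived,
    }


if __name__ == "__main__":
    # Assumed example: 200 GB per run, 50 runs per year.
    estimate = yearly_volume_tb(200, 50,
                                intermediate_factor=2.0,
                                derived_factor=0.1)
    for category, tb in estimate.items():
        print(f"{category:>12}: {tb:.1f} TB/year")
```

Running the estimate separately for each instrument gives a first answer to how much raw, intermediate, and derived data there will be to handle.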

7

Must decide what data to keep

◦ How long?

◦ How will it be stored?

Is it cheaper to:

◦ Rerun the experiment

◦ Rerun the analysis
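One way to frame the "is it cheaper" question is a simple comparison of storage cost against rerun cost. This is only a sketch: the function name `cheaper_option` and the rerun cost are hypothetical, and the $60/TB/year rate is the storage-server figure quoted later in this deck.

```python
# Illustrative comparison only; the rerun cost is an assumed placeholder.

def cheaper_option(storage_cost_per_tb_year, size_tb, years, rerun_cost):
    """Return ("keep", cost) or ("rerun", cost), whichever is cheaper.

    keep cost = price per TB per year * size in TB * years retained.
    Ties go to keeping the data.
    """
    keep_cost = storage_cost_per_tb_year * size_tb * years
    if keep_cost <= rerun_cost:
        return ("keep", keep_cost)
    return ("rerun", rerun_cost)


if __name__ == "__main__":
    # 5 TB kept for 3 years at $60/TB/year vs. an assumed $2,000 rerun.
    choice, cost = cheaper_option(60, size_tb=5, years=3, rerun_cost=2000)
    print(f"Cheaper to {choice}: ${cost:,}")
```

The comparison ignores whether a rerun is even possible (for example, when the sample no longer exists), so treat it as a starting point rather than a decision rule.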

8

Data captured by the instrument must be moved

Terabytes of data may be involved

Moving terabytes of data across networks is not yet trivial

◦ The network is not always the bottleneck

Have questions? Ask for help!

9

Common Data Movements

◦ Instrument to local capture storage

◦ Capture storage to shared storage

◦ Shared storage to computational analysis resource

◦ Shared storage to desktop

◦ Shared storage to backup/replication

10

Globus Online – Fastest for big files but requires GridFTP

scp – Fetch (Mac), WS_FTP (Windows)

Network Drive File Copy – Slowest but simplest

External Hard Drive – Reasonably fast but requires physical movement

11

Transfer Mechanism                     Max Transfer Speed
External Hard Drive                    100MB/second read + 100MB/second write + walking time
Gigabit Ethernet                       Up to 120MB/second
Typical Desktop Hard Drive             100MB/second
Typical Desktop SSD                    300MB/second
GridFTP over 1Gb                       120MB/second
CIFS over 1Gb                          60-80MB/second
scp over 1Gb                           60-100MB/second
Fastest network filesystem on campus   600MB/second single copy, 6GB/second aggregate

12

Moving 1TB can easily take 3 hours or more!
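The three-hour figure follows from the transfer speeds listed for each mechanism. A minimal sketch of the arithmetic, assuming decimal units (1TB = 1,000,000MB) and a sustained rate with no protocol overhead; the function name `transfer_hours` is illustrative:

```python
# Back-of-the-envelope transfer time, assuming a constant sustained rate.

def transfer_hours(size_tb, rate_mb_per_s):
    """Hours needed to move size_tb terabytes at rate_mb_per_s MB/second."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_s / 3600


# Rates taken from the transfer speed table in this deck (MB/second).
RATES = {
    "Gigabit Ethernet / GridFTP over 1Gb": 120,
    "Typical desktop hard drive": 100,
    "CIFS over 1Gb (low end)": 60,
}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name}: {transfer_hours(1, rate):.1f} hours per TB")
```

At 100MB/second, 1TB takes about 2.8 hours; at the low end of CIFS (60MB/second) it takes about 4.6 hours. Real transfers are usually slower than these best-case figures.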

There are many solutions

◦ Wiki, spreadsheet, database, etc.

◦ Campus Options

Campus Wiki

SharePoint

REDCap

Galaxy

Make sure you have backups!

13

Cheap storage is easy

◦ 3TB External USB Drive - $20/TB/Year

Big storage is harder

◦ 100TB Storage Server - $60/TB/Year

Big, fast, cheap, safe storage is much harder

◦ 100TB Storage Server Pair - $120/TB/Year

Checksum

High Performance Network

Backups

Redundancy

No economies of scale
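Checksums are one of the items listed above. As an illustration of what a checksum buys you, the sketch below compares a file with its copy using SHA-256 from the Python standard library. The function names `sha256_of` and `copies_match` are hypothetical, and the choice of SHA-256 is an assumption; the talk does not name a specific algorithm.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """True if both files have identical SHA-256 digests."""
    return sha256_of(source_path) == sha256_of(copy_path)
```

Calling `copies_match` on the original and the transferred or backed-up copy detects silent corruption that a file-size check alone would miss.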

14

15

[Chart: total storage cost ($0 to $4,000,000) at 3TB, 50TB, 500TB, and 1000TB for USB 1X, USB 3X, Storage Server, Storage Server X2, and Amazon (3 years)]
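The per-terabyte prices quoted on the storage slide can be multiplied out to rough totals. This sketch assumes cost scales linearly with capacity (consistent with "no economies of scale") and assumes a three-year period; it covers only the three options with a stated price, since no Amazon or "USB 3X" price is given in the text. The names `PRICE_PER_TB_YEAR` and `total_cost` are illustrative.

```python
# Prices in dollars per TB per year, as quoted on the storage slide.
PRICE_PER_TB_YEAR = {
    "3TB external USB drive": 20,
    "100TB storage server": 60,
    "100TB storage server pair": 120,
}

# Capacities shown on the chart's horizontal axis.
CAPACITIES_TB = [3, 50, 500, 1000]


def total_cost(capacity_tb, price_per_tb_year, years=3):
    """Linear cost model: capacity * price per TB per year * years.
    The 3-year default is an assumption, not a figure from the talk."""
    return capacity_tb * price_per_tb_year * years


if __name__ == "__main__":
    for option, price in PRICE_PER_TB_YEAR.items():
        costs = ", ".join(
            f"{tb}TB: ${total_cost(tb, price):,}" for tb in CAPACITIES_TB
        )
        print(f"{option} -> {costs}")
```

Under these assumptions, 1000TB on a storage server pair comes to $360,000 over three years.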

Where you store the data can impact how fast you can analyze your data.

On the Helium cluster, testing showed a difference of more than 100% in BWA analysis time depending on where the data was stored.

If you do analysis on your desktop, fast storage will likely improve analysis time for NGS.

If you run directly on a cluster, ask for recommendations on how to achieve the best performance.
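A quick way to compare storage locations before committing to one is to time a sequential read of the same large file from each. The sketch below is a rough illustration only, with a hypothetical function name `read_throughput_mb_s`; it is not the benchmark used in the Helium testing.

```python
import time


def read_throughput_mb_s(path, chunk_size=4 * 1024 * 1024):
    """Read a file sequentially and return the observed rate in MB/second
    (decimal megabytes)."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / 1_000_000 / elapsed
```

Use a file larger than the machine's memory, or a freshly copied one, because the operating system's file cache can make a second read look much faster than the storage really is.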

16

Galaxy

REDCap

Storage Resources

Computing Resources

17

CCOM R Drive

◦ http://www.medicine.uiowa.edu/it_researchers.aspx?id=331

ITS Research Storage Services

◦ http://its.uiowa.edu/researchstorage

◦ RDSS – Launches September 2013

◦ Large Scale/Computational/Backup Storage

18

Helium Cluster

◦ 4000 processor core HPC cluster

◦ ~300TB scratch storage

◦ Open to any U of I researcher

◦ http://hpc.uiowa.edu

Neon Cluster

◦ Scheduled for production in November 2013

◦ Can buy compute nodes now

◦ 64GB, 256GB, and 512GB node configurations

NSF MRI Grant Under Consideration

19

XSEDE

◦ National Network of Computing Resources

◦ Trinity on Blacklight

Up to 16TB of RAM

http://trinity-use-on-blacklight-psc.wikispaces.com/Trinity+Usage+on+Blacklight

20

Safe Photo - http://americanbestlocksmith.com/wp-content/uploads/2010/09/safe-installation.jpg

BioTeam - http://www.bioteam.net/wp-content/uploads/2010/03/cdag-xgen-storageForNGS_v3.pdf
