Org           | Year | Total Storage | Biggest Hard Drive | # of Hard Drives
Psychiatry    | 2004 | 1TB           | 300GB              | 4
ICTS          | 2008 | 250TB         | 1.5TB              | 167
Psychiatry    | 2013 | 120TB         | 4TB                | 30
ICTS          | 2013 | 1500TB        | 4TB                | 375
ITS-RS + ICTS | 2013 | 2500TB        | 4TB                | 625
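The drive counts in the table appear consistent with dividing total storage by the largest available drive and rounding up. The slides do not state this rule, so the sketch below is only an observation about the figures; the function name `drives_needed` and the 1TB = 1000GB convention are assumptions made for illustration.

```python
import math

def drives_needed(total_tb: float, biggest_drive_tb: float) -> int:
    """Smallest number of drives whose combined raw capacity covers total_tb."""
    return math.ceil(total_tb / biggest_drive_tb)

# (org, year, total storage in TB, biggest drive in TB), copied from the table.
rows = [
    ("Psychiatry", 2004, 1, 0.3),
    ("ICTS", 2008, 250, 1.5),
    ("Psychiatry", 2013, 120, 4),
    ("ICTS", 2013, 1500, 4),
    ("ITS-RS + ICTS", 2013, 2500, 4),
]

for org, year, total_tb, biggest_tb in rows:
    print(f"{org} {year}: {drives_needed(total_tb, biggest_tb)} drives")
```

This counts raw capacity only; it does not account for redundancy or spare drives.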
Sequencing changes faster than IT
Understand the data you will produce
Understand the data you will keep
Understand how the data will move
Understand the sizes of the data each instrument produces
◦ How often will you collect this data?
◦ What resources are needed to analyze and store each data set?
How will you handle each kind of data?
◦ Raw Data
◦ Intermediate Data
◦ Derived Data
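The questions above can be turned into a rough yearly estimate. The sketch below is illustrative only: the run size, collection frequency, and the size of intermediate and derived data relative to raw data are all hypothetical placeholders, not figures from the slides, and should be replaced with measurements from your own instrument.

```python
# Hypothetical planning numbers; replace with measurements from your own instrument.
RUN_SIZE_GB = 500      # raw data per run (assumed)
RUNS_PER_WEEK = 2      # collection frequency (assumed)
# Size of each data class relative to the raw data (assumed ratios).
RELATIVE_SIZE = {"raw": 1.0, "intermediate": 0.5, "derived": 0.1}

def annual_volume_tb(run_size_gb, runs_per_week, relative_size, weeks_per_year=52):
    """Estimate TB produced per year for each data class (1TB = 1000GB)."""
    raw_gb_per_year = run_size_gb * runs_per_week * weeks_per_year
    return {kind: raw_gb_per_year * ratio / 1000
            for kind, ratio in relative_size.items()}

volumes = annual_volume_tb(RUN_SIZE_GB, RUNS_PER_WEEK, RELATIVE_SIZE)
for kind, tb in volumes.items():
    print(f"{kind}: {tb:.1f} TB/year")
print(f"total: {sum(volumes.values()):.1f} TB/year")
```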
Must decide what data to keep
◦ How long?
◦ How will it be stored?
Is it cheaper to:
◦ Rerun the experiment
◦ Rerun the analysis
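One way to frame the keep-or-rerun question is to compare the cost of storing a data set for its retention period against the cost of regenerating it. The sketch below assumes storage is billed per TB per year; the data set size, retention period, storage rate, and rerun cost are hypothetical values chosen only to show the comparison.

```python
def storage_cost(size_tb, years, cost_per_tb_year):
    """Cost of keeping size_tb of data for the given number of years."""
    return size_tb * years * cost_per_tb_year

def cheaper_to_keep(size_tb, years, cost_per_tb_year, rerun_cost):
    """True when storing the data costs less than regenerating it once."""
    return storage_cost(size_tb, years, cost_per_tb_year) < rerun_cost

# Hypothetical example: 2TB kept for 5 years at $120/TB/year,
# compared with an assumed $3,000 to rerun the experiment.
keep_cost = storage_cost(2, 5, 120)
print(f"storage: ${keep_cost:,}")
print("keep the data" if cheaper_to_keep(2, 5, 120, 3000) else "rerun when needed")
```

The comparison ignores factors such as whether the original sample still exists or how long a rerun would take, which may matter more than cost.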
Data captured by the instrument must be moved
Terabytes of data may be involved
Moving terabytes of data across networks is not yet trivial
◦ The network is not always the bottleneck
Have questions? Ask for help!
Common Data Movements
◦ Instrument to local capture storage
◦ Capture storage to shared storage
◦ Shared storage to computational analysis resource
◦ Shared storage to desktop
◦ Shared storage to backup/replication
Globus Online – Fastest for big files but requires GridFTP
scp – for example with Fetch (Mac) or WS_FTP (Windows)
Network Drive File Copy – Slowest but simplest
External Hard Drive – Reasonably fast but requires physical movement
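Whichever mechanism is used, comparing checksums is one way to confirm that a copy arrived intact. The sketch below uses Python's standard `hashlib`; the function names and the choice of SHA-256 are assumptions made for illustration, not something the slides prescribe.

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copies_match(source_path, copy_path):
    """True when both files have identical SHA-256 digests."""
    return sha256_of(source_path) == sha256_of(copy_path)
```

A call such as `copies_match("run42.fastq", "/mnt/shared/run42.fastq")` (hypothetical paths) would be run after the transfer finishes, ideally computing each digest on the machine that holds that copy.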
Transfer Mechanism                   | Max Transfer Speed
External Hard Drive                  | 100MB/second read + 100MB/second write + walking time
Gigabit Ethernet                     | Up to 120MB/second
Typical Desktop Hard Drive           | 100MB/second
Typical Desktop SSD                  | 300MB/second
GridFTP over 1Gb                     | 120MB/second
CIFS over 1Gb                        | 60-80MB/second
scp over 1Gb                         | 60-100MB/second
Fastest network filesystem on campus | 600MB/second single copy, 6GB/second aggregate
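These figures help explain why the network is not always the bottleneck: a transfer runs no faster than its slowest stage. The sketch below applies that idea to speeds from the table. The function name `effective_rate` is hypothetical, the lower bound is used where the table gives a range, and real transfers involve overheads that this simple model ignores.

```python
# Speeds in MB/second, taken from the table above (lower bound used for ranges).
DESKTOP_HDD = 100
DESKTOP_SSD = 300
GIGABIT_ETHERNET = 120
CIFS_OVER_1GB = 60

def effective_rate(*stage_rates_mb_s):
    """Approximate a transfer's speed as the speed of its slowest stage."""
    return min(stage_rates_mb_s)

# SSD source over gigabit Ethernet: the network limits the transfer.
print(effective_rate(DESKTOP_SSD, GIGABIT_ETHERNET))
# Hard drive source over gigabit Ethernet: the disk limits the transfer.
print(effective_rate(DESKTOP_HDD, GIGABIT_ETHERNET))
# Hard drive source copied over CIFS: the protocol limits the transfer.
print(effective_rate(DESKTOP_HDD, CIFS_OVER_1GB))
```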
Moving 1TB can easily take 3 hours or more!
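The three-hour figure follows from simple arithmetic: 1TB at a sustained 100MB/second is 10,000 seconds, or about 2.8 hours. The sketch below works this out for a few sustained rates, assuming decimal units (1TB = 1,000,000MB) and no protocol overhead; the function name is hypothetical.

```python
def transfer_hours(size_tb, rate_mb_s):
    """Hours needed to move size_tb at a sustained rate, using 1TB = 1,000,000MB."""
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_s / 3600

# Sustained rates in MB/second to compare.
for rate in (60, 100, 120):
    print(f"1TB at {rate}MB/second: {transfer_hours(1, rate):.1f} hours")
```

Sustained rates are often lower than the peak figures, especially when a data set consists of many small files, so these numbers are best read as lower bounds.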
There are many solutions
◦ Wiki, spreadsheet, database, etc.
◦ Campus Options
Campus Wiki
SharePoint
REDCap
Galaxy
Make sure you have backups!
Cheap storage is easy
◦ 3TB External USB Drive - $20/TB/Year
Big storage is harder
◦ 100TB Storage Server - $60/TB/Year
Big, fast, cheap, safe storage is much harder
◦ 100TB Storage Server Pair - $120/TB/Year
Checksum
High Performance Network
Backups
Redundancy
No economies of scale
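The per-TB rates above can be turned into a multi-year budget. The sketch below assumes cost grows linearly with capacity and time, which is one reading of "no economies of scale"; the 100TB size and 3-year period are example inputs, not figures from the slides.

```python
# Dollars per TB per year, as quoted in the list above.
RATE_PER_TB_YEAR = {
    "External USB drive": 20,
    "Storage server": 60,
    "Storage server pair": 120,
}

def total_cost(size_tb, years, rate_per_tb_year):
    """Linear cost model: capacity x duration x yearly rate per TB."""
    return size_tb * years * rate_per_tb_year

# Example inputs (assumed): 100TB kept for 3 years.
for option, rate in RATE_PER_TB_YEAR.items():
    print(f"{option}: ${total_cost(100, 3, rate):,}")
```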
[Chart: storage cost, on an axis from $0 to $4,000,000, at 3TB, 50TB, 500TB, and 1000TB for USB 1X, USB 3X, Storage Server, Storage Server X2, and Amazon (3 years)]
Where you store the data can affect how fast you can analyze it.
During testing on the Helium cluster, BWA analysis time differed by more than 100% depending on where the data was stored.
If you are doing analysis on your desktop, fast storage will likely improve analysis time for NGS.
If you are running directly on a cluster, ask for recommendations on how to achieve the best performance.
CCOM R Drive
◦ http://www.medicine.uiowa.edu/it_researchers.aspx?id=331
ITS Research Storage Services
◦ http://its.uiowa.edu/researchstorage
◦ RDSS – Launches September 2013
◦ Large Scale/Computational/Backup Storage
Helium Cluster
◦ 4000 processor core HPC cluster
◦ ~300TB scratch storage
◦ Open to any U of I researcher
◦ http://hpc.uiowa.edu
Neon Cluster
◦ Scheduled for production in November 2013
◦ Can buy compute nodes now
◦ 64GB, 256GB, and 512GB node configurations
NSF MRI Grant Under Consideration
XSEDE
◦ National Network of Computing Resources
◦ Trinity on Blacklight
Up to 16TB of RAM
http://trinity-use-on-blacklight-psc.wikispaces.com/Trinity+Usage+on+Blacklight
Safe Photo - http://americanbestlocksmith.com/wp-content/uploads/2010/09/safe-installation.jpg
BioTeam - http://www.bioteam.net/wp-content/uploads/2010/03/cdag-xgen-storageForNGS_v3.pdf