Computational Research in the Battelle Center for Mathematical Medicine
Statistical Genetics Research
• Develop statistical methodologies
• Localize and characterize human disease genes
  – Autism, CLP, AITD, Schizophrenia
• Our approach involves millions of likelihood calculations at each genomic position
• Each multi-parameter likelihood can be slow to compute
Polynomial approach
• Represent likelihoods as algebraic polynomials
• Built once per family per position
• Evaluated millions of times
• Challenges
  – High memory demand
  – Time consuming
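The build-once, evaluate-many idea can be illustrated with a toy sketch. This is not the Center's software: the node classes, the `evaluate` function, and `build_family_likelihood` are hypothetical names, and the textbook phase-unknown two-point likelihood in the recombination fraction `theta` is an assumed stand-in for a real multi-parameter pedigree likelihood.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


# Hypothetical expression-tree nodes for a symbolic polynomial.
@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Sum:
    terms: Tuple["Poly", ...]


@dataclass(frozen=True)
class Product:
    factors: Tuple["Poly", ...]


@dataclass(frozen=True)
class Power:
    base: "Poly"
    exponent: int


Poly = Union[Const, Var, Sum, Product, Power]


def evaluate(p: Poly, values: Dict[str, float]) -> float:
    """Evaluate a stored polynomial at one set of parameter values."""
    if isinstance(p, Const):
        return p.value
    if isinstance(p, Var):
        return values[p.name]
    if isinstance(p, Sum):
        return sum(evaluate(t, values) for t in p.terms)
    if isinstance(p, Product):
        result = 1.0
        for f in p.factors:
            result *= evaluate(f, values)
        return result
    if isinstance(p, Power):
        return evaluate(p.base, values) ** p.exponent
    raise TypeError(f"unknown node type: {type(p).__name__}")


def build_family_likelihood(recombinants: int, meioses: int) -> Poly:
    """Build (once) the assumed phase-unknown likelihood
    0.5 * [theta^k (1-theta)^(n-k) + theta^(n-k) (1-theta)^k]."""
    k, n = recombinants, meioses
    theta = Var("theta")
    one_minus_theta = Sum((Const(1.0), Product((Const(-1.0), theta))))
    phase1 = Product((Power(theta, k), Power(one_minus_theta, n - k)))
    phase2 = Product((Power(theta, n - k), Power(one_minus_theta, k)))
    return Product((Const(0.5), Sum((phase1, phase2))))


# Built once for this (hypothetical) family and position...
poly = build_family_likelihood(recombinants=1, meioses=4)

# ...then evaluated many times over a parameter grid.
grid = [i / 100 for i in range(0, 51)]
likelihoods = [evaluate(poly, {"theta": t}) for t in grid]
```

The tree is constructed a single time, so the cost of working out the likelihood's structure is paid once, while each grid point only costs a tree traversal. The memory challenge on the slide shows up here too: a real pedigree polynomial has far more nodes than this toy one, and all of them must be held in memory.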
Computing Infrastructure
• 65-node cluster with 4 TB storage
  – Head node (AMD Opteron, 2 x Dual Core 2.4 GHz, 16 GB RAM)
  – 16 compute nodes (AMD Opteron, 2 x Dual Core 2.4 GHz, 8 GB RAM)
  – 16 compute nodes (AMD Opteron, 2 x Dual Core 2.4 GHz, 16 GB RAM)
  – 32 compute nodes (AMD Opteron, 2 x Dual Core 2.8 GHz, 16 GB RAM)
Database Servers
• Master server
  – 2 x 3.0 GHz Intel Xeon Quad Core
  – 16 GB RAM
  – 300 GB RAID 1 for OS
  – 900 GB RAID 5 for data
• Slave server
  – 2 x 2.0 GHz Intel Xeon Dual Core
  – 16 GB RAM
  – 300 GB RAID 1 for OS
  – 900 GB RAID 5 for data
Data Management
MySQL master/slave setup
[Diagram: Client and Webserver connected to the Master and Slave servers; arrows labeled input, output, and replicate]
Data Management
• Manage/store genetic data
  – Million SNPs per individual
• Data upload to OSC
  – ~1 TB a week
  – ~30 TB total
• Current live disk is full
• Move data to tape library
  – View content (just file and folder names)
  – Start retrieve jobs on our own
Wish List Summary
• Access to nodes with sufficient memory
• Access to massive storage
• Great collaborations