PROTECTING, MAINTAINING AND IMPROVING THE HEALTH OF ALL MINNESOTANS
Dave Boxrud
2019 APHL Annual Meeting
Development of Bioinformatics Infrastructure in Minnesota
I am not a bioinformaticist
6/11/2019
In 2016
• Limited bioinformatics capacity at MDH
• No bioinformatics infrastructure
• Little bioinformatics expertise
• Hard to communicate with IT (speaking different languages)
• Linux not supported by MDH
• Knew of the Minnesota Supercomputing Institute, but did not know how to use it or whom to contact; even with access, nobody on staff was trained in bioinformatics
MDH Vision for Bioinformatics
• NGS is the future, the future is here
• Accumulating NGS data for multiple organisms, need to use the data
• Bionumerics for foodborne agents coming, won’t fulfill all our needs
• Flexible
• Sustainable
• Cost effective
• Accessible
Three Approaches to Bioinformatics Infrastructure
NY-Wadsworth Center, CDC, CO-PHL
Possible Solutions
• Build own infrastructure
• Cloud Computing
• Use Existing System
APHL Bioinformatics Fellow to the Rescue
• Sean Wang started the APHL Bioinformatics Fellowship in August 2016
• PhD-U of MN
• Experience with MSI
• Able to be an IT/bioinformatics/PH translator
Minnesota Supercomputing Institute (MSI)
• Mission: To provide advanced research computing infrastructure and expertise to the U. of MN research and scholarly community and the State of Minnesota in order to advance and accelerate research…
• HPC, Interactive HPC, data storage, specialized hardware
• Can utilize open source software (pipelines)
• Consulting and expertise
MSI-Most Appealing Option
• No Linux or Mac OS systems were supported in the state
• Cloud computing was not a viable option in MN at the time
• MSI is partly funded by the State of Minnesota
• MSI provides services at cost
• MSI offers computing infrastructure and HPC consulting
• A comprehensive list of bioinformatics tools is pre-installed
Building Bioinformatics Infrastructure Challenges
Hurdles during bioinformatics infrastructure building:
• Contract
• IT regulation
• Workforce
On-going Issues
• Contract renewal
• No root access -> Stratus “cloud computing”
• Had to use VPN every time -> “white list” MDH IP cluster
Cloud Computing for State Public Health Labs
• Advantages
• No capital cost
• No maintenance cost
• Expand/contract resources as needed
• Portable
• Root access
• Disadvantages
• Variable ongoing cost
• Data security
• Still requires some expertise to set up
• Lack of support
Regional Bioinformatics Support
States shown on map: ND, SD, NE, IA, MO, AR, OK, KS
Regional Bioinformatician
Minnesota Cloud Platform for Bioinformatics
▪ Pre-configured virtual machines with common tools installed
▪ Documentation with sample commands for running tools
▪ Ongoing support provided by MSI personnel
The bioinformatics infrastructure and NGS data flow of MDH
• Sequencers (labeled “Lord of Rings Sequencers”): 5 Illumina MiSeq, MinION
• Local server (38 TB)
• Illumina BaseSpace
• Minnesota Supercomputing Institute (MSI): Linux (CentOS), web-based Galaxy, 15 TB primary storage, HPC (High Performance Computer) access, dedicated node access, consulting
• Stratus cloud computing: 80 vCPU, 10 TB of ephemeral storage, 160 GB RAM, root access, Amazon S3 access
• Pipelines: Strep pneumo pipeline, Influenza pipeline
• Software: CLC Genomics Workbench, Geneious Pro, Bionumerics
• AIMS
• Data Sharing: FDA, CDC, NY-PHL
• Data flow units: 1 TB = 1000 GB, 1 GB = 1000 MB, 1 MB = 1000 kB, 1 kB = 1000 B
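The decimal unit convention on the data-flow slide (each step is a factor of 1000) can be sketched in a few lines of Python. This is only an illustration: the `UNITS` table and the `convert` helper are hypothetical names introduced here, and the 38 TB and 15 TB figures are the local server and MSI primary storage capacities listed on the slide.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
# UNITS and convert() are illustrative names, not part of any MDH or MSI tool.
UNITS = {"B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def convert(value, from_unit, to_unit):
    """Convert a storage size between decimal units."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# Capacities taken from the slide, expressed in GB.
local_server_gb = convert(38, "TB", "GB")  # MDH local server
msi_primary_gb = convert(15, "TB", "GB")   # MSI primary storage
print(local_server_gb, msi_primary_gb)
```

A lookup table keeps the conversion to a single multiplication and division, so further units can be added without changing the function.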
Access approach
MSI white-listed Minnesota Department of Health IP address
Data transfer
• Transfers between the MDH server and MSI rely on MNET (Minnesota’s Network for Enterprise Telecommunications)
• Sequencing files can also be downloaded directly from BaseSpace to MSI HP-storage
Current PHL IT setting cannot fulfill bioinformatics computing needs
• An interdisciplinary study that combines Statistics, Computer Science and Biology
• “Big data” of Genomics
• Translate meaningless millions of short sequences into meaningful and actionable biological results
• Population-level infectious disease surveillance will require massive computing power to process the large amount of biological data
• The “soul” of bioinformatics is the open-source computing platform: Linux system
*Supercomputer “Mesabi” at UMN
Three Approaches to Bioinformatics Infrastructure
NY-Wadsworth Center, CDC
MN-PHL, MA-PHL, VA-PHL, IA-PHL
CO-PHL
Pro:
• Configurable computing resource
• Ready-to-use system
• Pre-installed software and dependencies
• Dedicated consulting
Con:
• No root access
• IT security
• Data transfer
• University-government contract
Pro:
• Configurable computing resource
• Portability
• Development environments
Con:
• IT security
• Data transfer
• Data ownership
Possible Solutions
• Build own system
• Cloud Computing
• Use University System
Challenges
• MNIT did not support Linux
• MNIT did not support use of open source software
• Cost
• Expertise