PROTECTING, MAINTAINING AND IMPROVING THE HEALTH OF ALL MINNESOTANS

Dave Boxrud

2019 APHL Annual Meeting

Development of Bioinformatics Infrastructure in Minnesota




I am not a bioinformaticist

6/11/2019


In 2016

• Limited bioinformatics capacity at MDH

• No bioinformatics infrastructure

• Little bioinformatics expertise

• Hard to communicate with IT (speaking different languages)

• Linux not supported by MDH

• Knew of the Minnesota Supercomputing Institute (MSI) but did not know how to use it or whom to contact; even with access, we had nobody trained in bioinformatics


MDH Vision for Bioinformatics

• NGS is the future, the future is here

• Accumulating NGS data for multiple organisms, need to use the data

• BioNumerics for foodborne agents is coming but won’t fulfill all our needs

• Flexible

• Sustainable

• Cost-effective

• Accessible


Three Approaches to Bioinformatics Infrastructure

NY-Wadsworth Center, CDC, CO-PHL

Possible Solutions

• Build own infrastructure

• Cloud Computing

• Use Existing System


APHL Bioinformatics Fellow to the Rescue

• Sean Wang started an APHL Bioinformatics Fellowship in August 2016

• PhD, U of MN

• Experience with MSI

• Able to act as an IT/bioinformatics/public health translator


Minnesota Supercomputing Institute (MSI)

• Mission: to provide advanced research computing infrastructure and expertise to the U. of MN research and scholarly community and the State of Minnesota in order to advance and accelerate research…

• HPC, Interactive HPC, data storage, specialized hardware

• Can utilize open source software (pipelines)

• Consulting and expertise


MSI: Most Appealing Option

• No Linux or Mac OS systems were supported in the state

• Cloud computing was not a viable option in MN at the time

• MSI is partly funded by the State of Minnesota

• MSI provides services at cost

• MSI offers computing infrastructure and HPC consulting

• A comprehensive list of bioinformatics tools is pre-installed


Building Bioinformatics Infrastructure Challenges

Hurdles during bioinformatics infrastructure building:

• Contract

• IT regulation

• Workforce


On-going Issues

• Contract renewal

• No root access -> Stratus “cloud computing”

• Had to use VPN every time -> MSI “white-listed” the MDH IP address


Cloud Computing for State Public Health Labs

• Advantages

• No capital cost

• No maintenance cost

• Expand/contract resources as needed

• Portable

• Root access

• Disadvantages

• Variable ongoing cost

• Data security

• Still requires some expertise to set up

• Lack of support


Regional Bioinformatics Support

[Map of the region: ND, SD, NE, IA, MO, AR, OK, KS]

Regional Bioinformatician


Minnesota Cloud Platform for Bioinformatics

▪ Pre-configured virtual machines with common tools installed

▪ Documentation with sample commands for running tools

▪ Ongoing support provided by MSI personnel

Presentation Notes

• Part of our role as BRR is to develop infrastructure in regional states

• Providing cloud VMs is one way we do this

• Collaboration with MSI

• Note that it is not available yet

• Training will transfer

• Also: accepting names for the system


WWW.HEALTH.MN.GOV

[email protected]

Thank you.


The bioinformatics infrastructure and NGS data flow of MDH

[Diagram: components and labels as listed on the slide; the arrows showing data flow are not reproduced]

• Sequencers (labeled “Lord of Rings Sequencers”): 5 Illumina MiSeq, MinION

• Local server (38 TB)

• Illumina BaseSpace

• Minnesota Supercomputing Institute (MSI): Linux (CentOS), web-based Galaxy, 15 TB primary storage, HPC (high-performance computer) access, dedicated node access, consulting

• Stratus cloud computing: 80 vCPU, 10 TB of ephemeral storage, 160 GB RAM, root access, Amazon S3 access

• AIMS, CLC Genomics Workbench, Geneious Pro, BioNumerics

• Strep pneumo pipeline, Influenza pipeline

• Data Sharing: FDA, CDC, NY-PHL

• Units: 1 TB = 1000 GB; 1 GB = 1000 MB; 1 MB = 1000 kB; 1 kB = 1000 B
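
The unit legend on this slide uses decimal steps of 1000, and that convention can be applied to the capacities the slide quotes. The following is a minimal Python sketch under that assumption. The function name `convert` and the dictionary labels are illustrative and do not come from the presentation.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
UNIT_BYTES = {
    "B": 1,
    "kB": 1000,
    "MB": 1000**2,
    "GB": 1000**3,
    "TB": 1000**4,
}


def convert(value, from_unit, to_unit):
    """Convert a storage size between decimal units (hypothetical helper)."""
    return value * UNIT_BYTES[from_unit] / UNIT_BYTES[to_unit]


# Capacities quoted on the slide; the labels are paraphrased.
capacities_tb = {
    "MDH local server": 38,
    "MSI primary storage": 15,
    "Stratus ephemeral storage": 10,
}

for name, tb in capacities_tb.items():
    print(f"{name}: {tb} TB = {convert(tb, 'TB', 'GB'):,.0f} GB")
```

Note that some tools report sizes in binary units (1 TiB = 1024 GiB), so figures from a file system may differ slightly from these decimal values.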


Access approach

MSI white-listed the Minnesota Department of Health IP address
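
The idea behind white-listing can be sketched with Python's standard `ipaddress` module. This only illustrates the concept and is not MSI's actual configuration. The network below is a reserved documentation range (RFC 5737), not a real MDH address, and `ALLOWED_NETWORKS` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Hypothetical allow-list. 192.0.2.0/24 is a reserved documentation range,
# used here as a stand-in for the health department's address block.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allow-listed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


print(is_allowed("192.0.2.17"))   # inside the allow-listed range
print(is_allowed("203.0.113.5"))  # outside the allow-listed range
```

With a rule like this on the server side, connections from the allow-listed range can be accepted without a VPN session for each login.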


Data transfer

• Transfers between the MDH server and MSI rely on MNET (Minnesota’s Network for Enterprise Telecommunications)

• Sequencing files can also be downloaded directly from BaseSpace to MSI HP-storage


Current PHL IT setting cannot fulfill bioinformatics computing needs

• An interdisciplinary field that combines statistics, computer science and biology

• “Big data” of genomics

• Translates millions of meaningless short sequences into meaningful and actionable biological results

• Population-level infectious disease surveillance will require massive computing power to process the large amount of biological data

• The “soul” of bioinformatics is the open-source computing platform

• Linux system

*Supercomputer “Mesabi” at UMN


Three Approaches to Bioinformatics Infrastructure

Possible Solutions: Build own system, Cloud Computing, Use University System

Labs shown on the slide: NY-Wadsworth Center, CDC; MN-PHL, MA-PHL, VA-PHL, IA-PHL; CO-PHL

Use University System

Pro:

• Configurable computing resource

• Ready-to-use system

• Pre-installed software and dependencies

• Dedicated consulting

Con:

• No root access

• IT security

• Data transfer

• University-government contract

Cloud Computing

Pro:

• Configurable computing resource

• Portability

• Development environments

Con:

• IT security

• Data transfer

• Data ownership

Challenges

• MNIT did not support Linux

• MNIT did not support use of open source software

• Cost

• Expertise