Panel: Cyberinfrastructure to Support Large Data Transfers in Genomics Research Pankaj Shah, CEO LEARN Zac Blue, Baylor College of Medicine Deniz Gurkan, University of Houston Video and Remote: Jeffrey Early, Kim Andrews, Scott Mace Baylor College of Medicine

Panel: Cyberinfrastructure to Support Large Data Transfers in Genomics Research

Pankaj Shah, CEO LEARN

Zac Blue, Baylor College of Medicine

Deniz Gurkan, University of Houston

Video and Remote:

Jeffrey Early, Kim Andrews, Scott Mace Baylor College of Medicine

CHAT History

• Video introduction by Jeff Early of Baylor College of Medicine’s historical role in the Texas Medical Center

• PI of CC-DNI grant, 2015

• Baylor College of Medicine

• Director of Core Infrastructure Services

• Responsible for network engineering and security, data center operations, enterprise server compute, storage, Citrix and database administration

• Follow-up comments Zac Blue

Jeff Early Video

Before CHAT (mid-1990s)

CHAT

Deniz Gurkan, University of Houston

CC-NIE 2014: infrastructure

• University of Houston, Rice University, SETG

• 100 Gbps connection to Internet2 through SETG's LEARN port

• Grant award purchased optical nodes at UH, Rice, and SETG

• SETG purchased 100G interface and L2/L3 at LEARN PoP

• UH and Rice purchased their own L2/L3 at their sites

• BCM had optical units at various strategic locations

• BCM empowered our cyberinfrastructure engineer, Scott Mace

Investments on Network

Process of Upgrade and Grant Deliverables

• SETG voting members (founders):
  • UH
  • Rice
  • BCM
  • TAMU
  • HCC

• SETG shared governance among network engineers of member institutions

• SETG cyberinfrastructure engineer:
  • consultant at 3 hours/month (otherwise a Google employee)
  • donated time-effort by member institutions' network engineers

Skillset of Engineering at Regional Networks

• Production Operations:
  • Vendor offerings
  • Balancing act of current and future needs

• NEW REQUIREMENT Federal grant award deliverables: frictionless support of science data transfers

• How to be ready for the researcher who asks for a circuit?
  • E.g.: GENI network experimenter
  • E.g.: a science data transfer use case from a domain science research lab in a member institution
  • E.g.: Internet2 OESS/OSCARS interface to on-demand Layer 2 circuits

People are key!

• Problems needed to be solved at all levels/layers of the network
  • financial
  • user
  • existing network layer 1-3

• System is an integration of toolsets
  • understanding the user apps

• Testing, planning, deployment strategy
  • not letting the vendors drive the solution

• Houston's Unique Blessing: Our network engineer knows the community and stays in the community after the project

My lessons – education

• Teach computer networking with observation of protocol behavior so we can talk in terms of
  • what actually happens
  • rather than what "a 13457" (or your favorite network box number of your favorite vendor) does at a node…

My lessons – research infrastructure

• How does a researcher utilize the network without knowing networking for domain science transfers?
  • Custom science DMZ with configurable operational security knobs on data
  • Circuit provisioning at every administrative domain on the path
  • End host optimizations and tuning for effective utilization of available bandwidth
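
One common starting point for the end-host tuning mentioned above is the bandwidth-delay product (BDP), the amount of data that must be in flight to keep a path full. The sketch below is illustrative only: the 50 ms round-trip time is an assumed value, not a measurement from this deployment, and `bdp_bytes` is a hypothetical helper name.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a given link rate and round-trip time."""
    # bandwidth (bit/s) * RTT (ms) / 1000 gives bits in flight; / 8 gives bytes.
    return bandwidth_bps * rtt_ms // 8000


# Assumed example: a 100 Gbps path with a 50 ms round-trip time (hypothetical RTT).
window = bdp_bytes(100_000_000_000, 50)
print(f"TCP window needed to fill the path: {window / 1e6:.0f} MB")
```

If a host's TCP buffers are capped below this value, a single stream cannot fill the link no matter how much capacity the network offers.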

My lessons – service

• How does an IT organization support talent to stay engaged in this environment of research + eyeball traffic + vendors + IT silos?
  • Clear problem definition
  • Clear governance structure
  • Clear financial outlook
  • Community spirit

Zac Blue, Baylor College of Medicine

CHAT in Late 2014

• CHAT infrastructure nearly 10 years old

• BCM’s Genomic Group (Human Genome Sequencing Center, or HGSC) pushing the envelope of 10G through Internet2

…but why does this matter?

Pushing a Lot of Data!

HGSC transferred (outbound) ~200 TBytes of genomic data to AWS in May, 2014.

Pushing a Lot of Data!

HGSC transferred genomic data to AWS at a sustained rate of 5 Gbps in January, 2015.
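
To put the two transfer figures above on a common scale, the arithmetic can be sketched as follows. This assumes decimal units (1 TB = 10^12 bytes) and, for the May 2014 figure, that the transfer was spread evenly over all 31 days; neither assumption is stated on the slides, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def avg_rate_gbps(tbytes: float, days: float) -> float:
    """Average rate in Gbit/s needed to move `tbytes` terabytes in `days` days."""
    gigabits = tbytes * 8 * 1000  # TB -> Gbit, decimal units (assumption)
    return gigabits / (days * SECONDS_PER_DAY)


def tbytes_per_day(gbps: float) -> float:
    """Terabytes moved in one day at a sustained rate of `gbps` Gbit/s."""
    return gbps * SECONDS_PER_DAY / 8 / 1000


print(f"200 TB over 31 days: {avg_rate_gbps(200, 31):.2f} Gbps average")
print(f"5 Gbps sustained: {tbytes_per_day(5):.0f} TB per day")
```

Under these assumptions, 200 TB in a month averages roughly 0.6 Gbps, while a sustained 5 Gbps moves about 54 TB per day.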

The Need

• Develop a formal Science DMZ for high-speed data transfers

• Upgrade CHAT to support 100G

• Upgrade path from CHAT to SETG to 100G

CHAT (Quick Recap)

CHAT in Late 2014

CHAT Upgrade Proposal

Procurement – Had its Pros and Cons

• Process took much longer than expected
  • Multiple quotation iterations
  • Vendors became more and more aggressive

• Unexpected cost savings
  • Stretched NSF funding beyond anything imagined

• Ran into logistical issues with shipping
  • Vendors shipped hardware to the wrong location
  • Hardware was lost for nearly a month

Procurement – Stretching NSF Funding

• Savings realized provided the following benefits
  • Upgraded Science DMZ from 40G to 100G
  • Procured two 100G perfSONAR nodes
  • Procured one 135 TB 40G-attached DTN
  • Added third CHAT router in Dallas
    • Co-located with BCM DR infrastructure
    • Only need cross-connects to connect to largest CDNs (Google, Facebook, Apple, Microsoft, Yahoo, etc.)
    • Reduced cost to consume content for nearly two dozen institutions and >200,000 faculty, staff and students in SE Texas

CHAT Upgrade Proposal

CHAT Today

100G Science DMZ

• Data Transfer Node (DTN)
  • BCM acquired a Globus site license (not part of grant)
  • Early testing yielded ~20 Gbps
  • Optimization yielded ~30 Gbps with ESnet DTN

• perfSONAR Node
  • First node was 10G
  • Two 100G nodes purchased with grant – deployed mid-September
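
As a rough illustration of what the ~20 Gbps and ~30 Gbps DTN test results mean in practice, the sketch below estimates how long a bulk transfer would take at each rate. The 135 TB dataset size is borrowed from the DTN capacity on the procurement slide purely as an example; decimal units are assumed and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(tbytes: float, gbps: float) -> float:
    """Hours needed to move `tbytes` terabytes at a sustained `gbps` Gbit/s."""
    gigabits = tbytes * 8 * 1000  # TB -> Gbit, decimal units (assumption)
    return gigabits / gbps / 3600


# Example dataset size (assumed): 135 TB, the capacity of the grant-funded DTN.
for rate in (20, 30):
    print(f"135 TB at {rate} Gbps: {transfer_hours(135, rate):.0f} hours")
```

Under these assumptions, the optimization shortens a 135 TB transfer from about 15 hours to about 10.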

Early DTN Testing

40G DTN and tested it last night against three ESNet test DTNs across our new 100G infrastructure and Internet2

CHAT Traffic

CHAT total usage one day in early September, 2017

Overall Challenges & Lessons Learned

• ScienceDMZ

• Telemetry

• Cyber Infrastructure engineering paramount to success of implementation

Human Genome Sequencing Center

• Overview by Zac Blue

• Video presentation by Dr. Kim Andrews, HGSC IT manager

HGSC NIH Contract Award

• Total award amount $505M over 5 years

• Awarded Dec, 2016; announced publicly Jan, 2017

• HGSC had to guarantee minimum level of data transfer capability as a part of the contract

Human Genome Sequencing Center

• NSF CC* grant enabled 100G to our campus
  • This provides the means of facilitating much higher data transfer speeds than ever before

• Enterprise IT is now at a point where we can support high-speed data transfers, but HGSC is still developing/acquiring the means to leverage these resources

• HGSC desire for 40G or 100G-attached DTNs and Aspera nodes

• Test/Development DTNs in the science DMZ instrumental in proving that network can support the data transfer needs

• End-to-end problem is ever-present, perfSONAR, test DTNs, etc.

A LOT of Work Remains Ahead

• Organizational policies still need to be developed and formalized

• Processes need to be developed and operationalized around
  • Securing the data being sent and received
  • Lifecycle management of the data on DTNs (purging)
  • Tying DTNs to backend storage infrastructure
  • Indexing data

• BCM (not CHAT) needs to develop user-friendly mechanism for mid-sized data set transfers (new “Bigfile”)

Dr. Kim Andrews Video

Questions?