UQ’s MeDiCI: Data in the right place, at the right time. Jake Carroll, Senior ICT Manager, Research, The Queensland Brain Institute, The University of Queensland, Australia [email protected]

MeDiCI - How to Withstand a Research Data Tsunami


Page 1: MeDiCI - How to Withstand a Research Data Tsunami

UQ’s MeDiCI

Data in the right place, at the right time.

Jake Carroll, Senior ICT Manager, Research

The Queensland Brain Institute, The University of Queensland, Australia

[email protected]

Page 2: MeDiCI - How to Withstand a Research Data Tsunami

This is a story of data locality, performance, namespace and financial complexity.

Page 3: MeDiCI - How to Withstand a Research Data Tsunami

QBI

CAI

IMB

AIBN

Hundreds of TB of data generated per day: an eclectic mixture of life sciences, engineering, physics and nanotech data.

Every man, woman and child seems to build a (little) supercomputer, to deal with their problems…

Compute + Storage are tightly connected, in each building.

Page 4: MeDiCI - How to Withstand a Research Data Tsunami
Page 5: MeDiCI - How to Withstand a Research Data Tsunami

Instrument outputs and scientific endeavours grow; budgets for storage and compute do not.

Page 6: MeDiCI - How to Withstand a Research Data Tsunami

To add another complexity…

Page 7: MeDiCI - How to Withstand a Research Data Tsunami

The MeDiCI Journey

Page 8: MeDiCI - How to Withstand a Research Data Tsunami

Thus, the problem (or question) definition:

“How do we provide parallel access to scientific data through a multitude of protocols, and give the illusion that the data is ‘next to’ the applications, keeping the right data near the right type of computational infrastructure, all within our budgetary constraints?”

Page 9: MeDiCI - How to Withstand a Research Data Tsunami
Page 10: MeDiCI - How to Withstand a Research Data Tsunami

SpectrumScale AFM (cache)

{Parallel IO via NSD protocol}

SpectrumScale AFM (home)
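To make the cache/home relationship concrete, here is a toy sketch in Python. It is not SpectrumScale code and uses no IBM API: the `Home` and `Cache` classes and their methods are hypothetical names, and the sketch assumes only the general idea that a cache site fetches data from home on first read and pushes local writes back asynchronously.

```python
class Home:
    """Hypothetical stand-in for the 'home' site: the authoritative copy."""

    def __init__(self):
        self.files = {}


class Cache:
    """Hypothetical stand-in for a 'cache' site sitting next to the compute."""

    def __init__(self, home):
        self.home = home
        self.local = {}   # data already resident at this site
        self.queue = []   # paths written locally, not yet pushed to home

    def read(self, path):
        # On a miss, fetch from home once; later reads are served locally.
        if path not in self.local:
            self.local[path] = self.home.files[path]
        return self.local[path]

    def write(self, path, data):
        # Writes land locally and are queued for a later push.
        self.local[path] = data
        self.queue.append(path)

    def flush(self):
        # Drain the queue in order, pushing each queued path back to home.
        while self.queue:
            path = self.queue.pop(0)
            self.home.files[path] = self.local[path]
```

The application only ever talks to the cache, which is what gives the illusion that the data is "next to" it.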

Page 11: MeDiCI - How to Withstand a Research Data Tsunami
Page 12: MeDiCI - How to Withstand a Research Data Tsunami

Back at UQ

uqjcarr1

Scale cluster “A” using UQ creds

Scale cluster “B” using other creds

Out at Polaris

someOtherName

mmname2uuid / mmuuid2name
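The identity problem on this slide can be sketched as a two-step translation: local name to a site-independent ID, then that ID to whatever the person is called at the other site. The Python below is an illustration of that idea only, not the code behind the SpectrumScale helpers. The `SiteMapper` class, the `NAMESPACE` value and the `person-001` identity are all hypothetical; only the two usernames come from the slide.

```python
import uuid

# Hypothetical namespace used to derive stable, site-independent IDs.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def global_id(identity: str) -> str:
    """Derive a deterministic UUID for a person, independent of any site."""
    return str(uuid.uuid5(NAMESPACE, identity))


class SiteMapper:
    """Per-site lookup tables: local username <-> global ID (hypothetical)."""

    def __init__(self, local_to_identity):
        self._name_to_uuid = {
            name: global_id(identity)
            for name, identity in local_to_identity.items()
        }
        self._uuid_to_name = {u: n for n, u in self._name_to_uuid.items()}

    def name2uuid(self, name: str) -> str:
        return self._name_to_uuid[name]

    def uuid2name(self, uid: str) -> str:
        return self._uuid_to_name[uid]


def translate(name: str, src: SiteMapper, dst: SiteMapper) -> str:
    """Map a username at the source site to the same person at the destination."""
    return dst.uuid2name(src.name2uuid(name))


# The same (hypothetical) person under two different sets of credentials.
uq = SiteMapper({"uqjcarr1": "person-001"})
polaris = SiteMapper({"someOtherName": "person-001"})

print(translate("uqjcarr1", uq, polaris))
```

Each site only needs to know its own names and the shared IDs, so neither cluster has to adopt the other's credentials.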

Page 13: MeDiCI - How to Withstand a Research Data Tsunami

Turns out, all that code was missing from SpectrumScale.

Page 14: MeDiCI - How to Withstand a Research Data Tsunami

Network stumbles…

• We had, at best, 10GbE between our buildings and around the campus.

• Not made for the parallel IO aggression of SpectrumScale AFM over the NSD protocol.

• Needed to spawn an entire mini-project to upgrade campus networks for big storage IO to 40/100G around the “ring” of nodes.
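A rough back-of-envelope calculation shows why 10GbE was a problem. The sketch below assumes decimal terabytes, a link running at full line rate with no protocol overhead, and 100 TB as an illustrative daily volume; none of these figures are measurements from the deployment.

```python
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `terabytes` over a link of `link_gbps`.

    `efficiency` is the assumed fraction of line rate actually achieved.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for gbps in (10, 40, 100):
    print(f"{gbps:>3} GbE: {transfer_hours(100, gbps):.1f} h to move 100 TB")
# Prints:
#  10 GbE: 22.2 h to move 100 TB
#  40 GbE: 5.6 h to move 100 TB
# 100 GbE: 2.2 h to move 100 TB
```

Under these idealised assumptions a single 10GbE link needs most of a day to move one day's worth of data, which leaves no headroom for anything else on the network.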

Page 15: MeDiCI - How to Withstand a Research Data Tsunami

Recovery storms - AFM is a work in progress

• When you’re trying to recover tens of millions of files, AFM doesn’t always keep up.

• IBM is working on it, for us (and others, globally).

• Scaling to hundreds of millions of files in a single file-set (or across multiple file-sets), if not billions of files in sync/push/recovery, is required.

Page 16: MeDiCI - How to Withstand a Research Data Tsunami

Things we assumed users would do, as per our mental model.

User puts data in cache from instruments to send to a supercomputer at a remote site.

User processes data at the remote site on said supercomputer.

Page 17: MeDiCI - How to Withstand a Research Data Tsunami

Things people actually did, breaking our mental model.

User puts data in cache from instruments. They start processing on a supercomputer locally.

Simultaneously, they start using the storage fabric to process other “bits” of the run’s outputs on the other supercomputer, for an additive workflow [culminating in the fabric becoming a means for both supercomputers to work on the same tasks at the same time].

Page 18: MeDiCI - How to Withstand a Research Data Tsunami

Same data namespace ended up everywhere.

That much was intentional.

As a result, users could leverage *every bit of the compute* everywhere simultaneously, if their workflow is smart enough…

IMB QBI

RCC

Page 19: MeDiCI - How to Withstand a Research Data Tsunami
Page 20: MeDiCI - How to Withstand a Research Data Tsunami

Turns out, we’re onto something

Page 21: MeDiCI - How to Withstand a Research Data Tsunami

Thank you.

• UQ RCC, David Abramson for mentorship and a true sense of adventure.

• The Queensland Cyber Infrastructure Foundation (QCIF)

• My colleagues at UQ QBI, IMB, CAI, AIBN, ITS

• AIIA, ACS

• Justin Glen @ DDN