Big Scientific Data and Data Science

Professor Tony Hey, Chief Data Scientist, Rutherford Appleton Laboratory, STFC ([email protected])

Big Scientific Data and Data Science - Institute for Data ...idies.jhu.edu/wp-content/uploads/2017/10/Tony-Hey... · Big Data and Cognitive Computing: Hartree Centre collaboration




Thousand years ago – Experimental Science

• Description of natural phenomena

Last few hundred years – Theoretical Science

• Newton’s Laws, Maxwell’s Equations…

Last few decades – Computational Science

• Simulation of complex phenomena

Today – Data-Intensive Science

• Scientists overwhelmed with data sets from many different sources

  • Data captured by instruments

  • Data generated by simulations

  • Data generated by sensor networks

e-Science and the Fourth Paradigm

$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$$

eScience is the set of tools and technologies to support data federation and collaboration

• For analysis and data mining

• For data visualization and exploration

• For scholarly communication and dissemination

With thanks to Jim Gray


Examples of Data-Intensive Science


Cosmic Dawn (First Stars and Galaxies)

Galaxy Evolution (Normal Galaxies z~2-3)

Cosmology (Dark Energy, Large Scale Structure)

Cosmic Magnetism (Origin, Evolution)

Cradle of Life (Planets, Molecules, SETI)

Testing General Relativity (Strong Regime, Gravitational Waves)

Exploration of the Unknown

Extremely broad range of science!


Data Flow through the SKA


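[Figure: data flow from SKA1-LOW and SKA1-MID through processing to Users. Labelled data rates: ~2 Pb/s, 8.8 Tb/s, 7.2 Tb/s, ~5 Tb/s. Labelled compute: ~50 PFLOPS, 100 PFLOPS. Users: 130 - 300 PB/yr]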
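As a rough sense of scale, the 130 to 300 PB/yr figure can be converted into an average sustained data rate. This back-of-envelope sketch assumes decimal petabytes (10^15 bytes) and a 365-day year; neither assumption comes from the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
BYTES_PER_PB = 10**15               # assumption: decimal petabytes


def pb_per_year_to_gbit_per_s(pb_per_year: float) -> float:
    """Convert a yearly data volume in PB/yr to an average rate in Gbit/s."""
    bytes_per_second = pb_per_year * BYTES_PER_PB / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e9


for volume in (130, 300):
    print(f"{volume} PB/yr ~ {pb_per_year_to_gbit_per_s(volume):.0f} Gbit/s sustained")
```

The result, roughly 33 to 76 Gbit/s averaged over a year, is far below the Tb/s and Pb/s figures in the diagram, which suggests substantial data reduction along the chain.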


Large data sets: satellite observations


Some Machine Learning Methods

Neural networks

K-means clustering

Principal Component Analysis

Boltzmann machines

Support Vector Machines

Hidden Markov Models

Kalman filters

Decision trees

Bayesian networks

Radial basis functions

Linear regression

Markov random fields

Random forests
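As one concrete instance from this list, K-means clustering can be written in a few lines using Lloyd's algorithm: alternately assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. This is a minimal pure-Python sketch with a naive initialisation (the first k points); library implementations typically use smarter seeding such as k-means++.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm. Returns (centroids, cluster label for each point)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation: first k points
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, labels


pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(pts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```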


The Machine Learning Revolution

• Neural networks are just one example of a Machine Learning (ML) algorithm

• Deep Neural Networks are now exciting the whole of the IT industry since they enable us to:

• Build computing systems that improve with experience

• Solve extremely hard problems

• Extract more value from Big Data

• Approach human intelligence

e.g. natural language processing

• The change in the Word Error Rate (WER) with time for the NIST “Switchboard” data.

• In 2016 Microsoft researchers achieved a word error rate (WER) of 6.3 percent on this data, at the time the lowest reported in the industry.
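WER is the word-level edit distance between a recognised transcript and a reference transcript, divided by the number of reference words: WER = (S + D + I) / N for substitutions, deletions and insertions. A minimal dynamic-programming sketch follows; official scoring tools also normalise the text before aligning, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.3f}")  # one deletion in six words
```

On this scale, a WER of 6.3 percent means roughly 6 to 7 word errors per 100 reference words.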


HPDA: Architectures for High Performance Data Analytics


• Distributed SQL Server cluster/cloud

• 50 servers, 1.1 PB disk, 500 CPUs

• Connected with 20 Gbit/s InfiniBand

• Linked to a 1,500-core compute cluster

• Extremely high-speed sequential I/O (75 GB/s)

• Balanced: Amdahl number > 0.5

• Dedicated to eScience, providing public access through services

• Funded by the Moore Foundation, Microsoft and Pan-STARRS

• Winner of the SC08 Storage Challenge!
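Amdahl's balanced-system rule of thumb asks for one bit of sequential I/O per second for every instruction per second, so the Amdahl number is I/O bandwidth (bits/s) divided by instruction rate (instructions/s). The slide gives the I/O side (75 GB/s) but not the instruction rate, so rather than guess one, this sketch computes the largest aggregate instruction rate that still keeps the ratio at or above 0.5, assuming decimal gigabytes.

```python
def amdahl_number(io_bytes_per_s: float, instructions_per_s: float) -> float:
    """Amdahl's I/O number: bits of sequential I/O per second per instruction per second."""
    return io_bytes_per_s * 8 / instructions_per_s


def max_instruction_rate(io_bytes_per_s: float, target: float) -> float:
    """Largest aggregate instruction rate that keeps the Amdahl number >= target."""
    return io_bytes_per_s * 8 / target


IO_BYTES_PER_S = 75e9  # 75 GB/s sequential I/O, from the slide (decimal GB assumed)

ceiling = max_instruction_rate(IO_BYTES_PER_S, 0.5)
print(f"Amdahl number >= 0.5 up to {ceiling:.1e} instructions/s")
print(amdahl_number(IO_BYTES_PER_S, ceiling))
```

Whether the cluster's CPUs stay under that ceiling depends on their actual instruction rate, which the slide does not state.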


An Example: The JASMIN Environmental Science Super Data Cluster


http://ndg.nerc.ac.uk

British Atmospheric Data Centre

British Oceanographic Data Centre

Simulations

Assimilation

The e-Science NERC DataGrid Project +


Centre for Environmental Data Analytics

JASMIN Super-Data Cluster infrastructure


The UK Met Office UPSCALE campaign

[Figure: UPSCALE data workflow. 5 TB per day produced on HERMIT @ HLRS; data conversion & compression (to 2.5 TB); data transfer to JASMIN. An automation controller clears data from the HPC system once it has been successfully transferred and validated.]
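The automation controller's rule (clear data from the HPC system only once it has been transferred and validated) can be sketched as copy, verify checksum, then delete. This is a minimal local-filesystem illustration using SHA-256; the function name `transfer_validate_and_clear` is hypothetical, `shutil.copy2` stands in for the real wide-area transfer, and the actual UPSCALE tooling is not described on the slide.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_validate_and_clear(source: Path, dest_dir: Path) -> Path:
    """Hypothetical controller step: copy, validate, and only then delete the source."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, dest)  # stand-in for the real wide-area transfer
    if sha256_of(dest) != expected:
        dest.unlink()  # discard the bad copy; the HPC original is kept
        raise OSError(f"checksum mismatch for {source}")
    source.unlink()  # validated, so the HPC copy can be cleared
    return dest


# Usage: simulate an "HPC" directory and a "JASMIN" directory on local disk.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hpc" / "output_001.dat"
    src.parent.mkdir()
    src.write_bytes(b"example model output")
    moved = transfer_validate_and_clear(src, Path(tmp) / "jasmin")
    dest_ok, source_cleared = moved.exists(), not src.exists()

print(dest_ok, source_cleared)  # True True
```

Computing the checksum before the copy, and deleting only after comparing it with the destination, means a failed or corrupted transfer never costs the only copy of the data.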


Example Data Analysis

• Tropical cyclone tracking has become routine; 50 years of N512 data can be processed in 50 jobs in one day

• Eddy vectors: an analysis we would not attempt on a single server or workstation (it needs a total of 3 months of processor time and ~40 GB of memory) was completed in 24 hours as 1,600 batch jobs

• JASMIN HPDA architecture has clearly demonstrated the value of cluster computing to data processing and analysis.

M Roberts et al: Journal of Climate 28 (2), 574-596
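The eddy-vector figures imply a large effective parallelism. Taking "3 months" as 90 days of processor time (an assumption; the slide gives no more precise figure), the arithmetic is:

```python
cpu_hours = 3 * 30 * 24   # "3 months" of processor time, assuming 30-day months
jobs = 1600
wall_clock_hours = 24

hours_per_job = cpu_hours / jobs                 # average length of one batch job
mean_concurrency = cpu_hours / wall_clock_hours  # jobs running at once, on average
print(hours_per_job, mean_concurrency)           # 1.35 90.0
```

So about 90 processors busy on average are enough, but only because the work splits into 1,600 independent jobs of about 1.35 hours each; on a single processor the same 2,160 processor-hours would take the full three months.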


Big Scientific Data from Large Experimental Facilities


UK Science and Technology Facilities Council (STFC)

Daresbury Laboratory, Sci-Tech Daresbury Campus, Warrington, Cheshire


Big Data and Cognitive Computing: Hartree Centre collaboration with IBM Research


Central Laser Facility

ISIS (Spallation Neutron Source)

Diamond Light Source

LHC Tier 1 computing

JASMIN Super-Data-Cluster

Rutherford Appleton Laboratory


Diamond Light Source


Science Examples

Pharmaceutical manufacture & processing

Casting aluminium

Structure of the Histamine H1 receptor

Non-destructive imaging of fossils


Detector data rates increasing faster than Moore’s Law

[Figure: Data Rates at Diamond. Detector Performance (MB/s, log scale from 1 to 10,000) plotted against year, 2007 to 2012]

Thanks to Mark Heron
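For comparison, Moore's Law is often quoted as a doubling roughly every two years (an assumption here; an 18-month period is also commonly cited). Over the 2007 to 2012 window on the chart, that compounds to:

```python
def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years` if capability doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


span = 2012 - 2007  # the years on the chart's horizontal axis
print(f"2-year doubling:   {moores_law_factor(span):.1f}x")       # 5.7x
print(f"18-month doubling: {moores_law_factor(span, 1.5):.1f}x")  # 10.1x
```

A detector whose data rate grew by more than roughly 6x to 10x over those five years therefore outpaced Moore's Law.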


Thanks to Mark Heron

[Figure: Cumulative Amount of Data Generated at Diamond. Data Size in PB (axis from 0 to 6)]


Segmentation of Cryo-Soft X-ray Tomography (Cryo-SXT) data

Data

● B24: Cryo Transmission X-ray Microscopy beamline at DLS

● Data Collection: Tilt series from ±65° with 0.5° step size

● Reconstructed volumes up to 1000 × 1000 × 600 voxels

● Voxel resolution: ~40 nm currently

● Total depth: up to 10 μm

● GOAL: Study structure and morphological changes of whole cells

Challenges:

● Noisy data, missing wedge artifacts, missing boundaries

● Tens to hundreds of organelles per dataset

● Tedious to manually annotate

● Cell types can look different

● Few previous annotations available

● Automated techniques usually fail

[Figure: Cryo-SXT data and 3D volume data of a neuronal-like mammalian cell line (single slice), with its segmentation into Nucleus and Cytoplasm]

Computer Vision Laboratory; B24 beamline; Data Analysis Software Group

[email protected]
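The acquisition parameters above fix the size of each dataset. A quick back-of-envelope sketch, assuming the tilt series includes both end angles and that reconstructed volumes are stored as 32-bit floats (the storage type is an assumption, not from the slide):

```python
tilt_min_deg, tilt_max_deg, step_deg = -65.0, 65.0, 0.5

# Number of projections, counting both end angles of the tilt series.
n_projections = int(round((tilt_max_deg - tilt_min_deg) / step_deg)) + 1
# Angular range that is never sampled: the "missing wedge".
missing_wedge_deg = 180.0 - (tilt_max_deg - tilt_min_deg)

voxels = 1000 * 1000 * 600
bytes_per_voxel = 4  # assumption: 32-bit float volumes
volume_gb = voxels * bytes_per_voxel / 1e9

print(n_projections, missing_wedge_deg, volume_gb)  # 261 50.0 2.4
```

The unsampled 50° of angular range is what produces the missing wedge artifacts listed among the challenges.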


Tomographic Cell Analysis: Feature Extraction

Workflow: Data Preprocessing, Data Representation, Feature Extraction, User’s Manual Segmentations, Classification

Features are extracted from voxels to represent their appearance:

● Intensity-based filters (Gaussian Convolutions)

● Textural filters (eigenvalues of Hessian and Structure Tensor)
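The two filter families can be illustrated on a single 2D slice: Gaussian smoothing gives an intensity feature, and the eigenvalues of the Hessian of the smoothed image give a shape/texture feature (a bright blob has two negative eigenvalues, a bright ridge one strongly negative one). This pure-Python sketch uses a separable Gaussian kernel with replicate-edge borders and central finite differences; it is an illustration only, not the DLS software, and it omits the structure tensor and the full 3D case.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def _clamp(i: int, n: int) -> int:
    """Clamp an index into [0, n - 1] (replicate-edge boundary handling)."""
    return min(max(i, 0), n - 1)


def gaussian_kernel(sigma: float) -> List[float]:
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]  # normalised, so a flat image stays flat


def gaussian_smooth(img: Image, sigma: float) -> Image:
    """Separable Gaussian convolution: blur along rows, then along columns."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[t] * img[y][_clamp(x + t - r, w)] for t in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[t] * rows[_clamp(y + t - r, h)][x] for t in range(len(k)))
             for x in range(w)] for y in range(h)]


def hessian_eigenvalues(img: Image, sigma: float) -> List[List[Tuple[float, float]]]:
    """Per-pixel Hessian eigenvalues (largest first) of the Gaussian-smoothed image."""
    s = gaussian_smooth(img, sigma)
    h, w = len(s), len(s[0])

    def px(y: int, x: int) -> float:
        return s[_clamp(y, h)][_clamp(x, w)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central finite differences for the second derivatives.
            dxx = px(y, x + 1) - 2 * px(y, x) + px(y, x - 1)
            dyy = px(y + 1, x) - 2 * px(y, x) + px(y - 1, x)
            dxy = (px(y + 1, x + 1) - px(y + 1, x - 1)
                   - px(y - 1, x + 1) + px(y - 1, x - 1)) / 4
            # Closed-form eigenvalues of the symmetric matrix [[dxx, dxy], [dxy, dyy]].
            mean = (dxx + dyy) / 2
            spread = math.sqrt(((dxx - dyy) / 2) ** 2 + dxy ** 2)
            row.append((mean + spread, mean - spread))
        out.append(row)
    return out


# Usage: a single bright spot on a dark 9 x 9 slice.
spot = [[0.0] * 9 for _ in range(9)]
spot[4][4] = 1.0
smoothed = gaussian_smooth(spot, sigma=1.0)
eigs = hessian_eigenvalues(spot, sigma=1.0)
print(eigs[4][4])  # both negative: intensity curves downward in every direction
```

In a voxel classifier, values like these, computed at several `sigma` scales, would be stacked into a per-voxel feature vector.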

User Annotation + Machine Learning, followed by Refinement

Using a few user annotations as input:

● A machine learning classifier (Random Forest) is trained to discriminate between Nucleus and Cytoplasm and to predict the class of each SuperVoxel

● A Markov Random Field is then used to refine the predictions
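The refinement step can be illustrated with a Potts-model Markov Random Field solved by iterated conditional modes (ICM): each site takes the label that minimises its own classifier cost plus a penalty `beta` for every neighbour that disagrees. This sketch works on a 2D grid of per-pixel class probabilities standing in for the Random Forest outputs on SuperVoxels; the Potts prior, the weight `beta` and the choice of ICM as the solver are assumptions, not necessarily what the DLS pipeline uses.

```python
import math
from typing import List

Probs = List[List[List[float]]]  # probs[y][x][c]: classifier probability of class c


def icm_refine(probs: Probs, beta: float = 1.0, sweeps: int = 10) -> List[List[int]]:
    """Potts-model MRF refinement by iterated conditional modes (ICM).

    Each site minimises -log p(label) + beta * (number of 4-neighbours with another label).
    """
    h, w, n_classes = len(probs), len(probs[0]), len(probs[0][0])
    # Start from the classifier's own most likely class at every site.
    labels = [[max(range(n_classes), key=lambda c: probs[y][x][c]) for x in range(w)]
              for y in range(h)]
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]

                def energy(c: int) -> float:
                    unary = -math.log(max(probs[y][x][c], 1e-12))          # data term
                    pairwise = beta * sum(1 for n in neighbours if n != c)  # smoothness term
                    return unary + pairwise

                best = min(range(n_classes), key=energy)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break  # a full sweep changed nothing: a local minimum was reached
    return labels


# Usage: a 3 x 3 patch confidently labelled class 1, except one noisy centre site.
probs = [[[0.1, 0.9] for _ in range(3)] for _ in range(3)]
probs[1][1] = [0.6, 0.4]
print(icm_refine(probs))            # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(icm_refine(probs, beta=0.0))  # [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

ICM only finds a local minimum of the energy; graph-cut solvers are a common alternative when an exact or stronger optimum is needed.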

[email protected]


The Cryo-Electron Microscopy Revolution

• CMOS sensors have revolutionised TEM

• FALCON 1 generates 300 Mbps

• FALCON 2 will generate 180 Gbps

• Exciting scientific results on membrane protein structures already published and much more to come

With thanks to Nicola Guerrini

➢ Faster and less noisy sensors for better performance are the way forward

➢ Systems becoming widely available and will generate huge datasets
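To see what these rates mean for dataset sizes, they can be converted into data volumes. This sketch assumes the quoted figures are bits per second (the usual meaning of Mbps and Gbps) and, as an upper bound, continuous readout; real acquisition sessions are not continuous.

```python
FALCON1_BITS_PER_S = 300e6  # 300 Mbps, from the slide
FALCON2_BITS_PER_S = 180e9  # 180 Gbps, from the slide


def terabytes_per_hour(bits_per_second: float) -> float:
    """Data volume per hour of continuous readout, in decimal terabytes."""
    return bits_per_second / 8 * 3600 / 1e12


print(f"rate increase: {FALCON2_BITS_PER_S / FALCON1_BITS_PER_S:.0f}x")   # 600x
print(f"FALCON 1: {terabytes_per_hour(FALCON1_BITS_PER_S):.3f} TB/hour")  # 0.135
print(f"FALCON 2: {terabytes_per_hour(FALCON2_BITS_PER_S):.1f} TB/hour")  # 81.0
```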


ISIS


Peak Assignment in Inelastic Neutron Scattering

• Vibrational motion of atoms is crucial to many properties of a material, e.g. how well it conducts electricity or heat

• Peaks in an INS spectrum correspond to specific atomic vibrations

• Peak assignment: which specific vibrational motions of atoms give rise to which peaks?

INS Spectrum of crystalline benzene

S. Parker and S. Mukhopadhyay (ISIS)
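Before peaks can be assigned they must first be located, which is harder in noisy data. One simple approach is to smooth the spectrum with a moving average and keep local maxima above a threshold. This pure-Python sketch runs on a synthetic spectrum (not ISIS data); the window size and threshold are hypothetical parameters, and dedicated facility software uses more careful peak fitting.

```python
import math
from typing import List, Sequence


def moving_average(values: Sequence[float], half_window: int) -> List[float]:
    """Centred moving average; the window shrinks near the ends of the spectrum."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def find_peaks(intensity: Sequence[float], threshold: float, half_window: int = 2) -> List[int]:
    """Indices where the smoothed spectrum has a local maximum above `threshold`."""
    s = moving_average(intensity, half_window)
    return [i for i in range(1, len(s) - 1)
            if s[i] > threshold and s[i] >= s[i - 1] and s[i] > s[i + 1]]


# Synthetic spectrum: two Gaussian peaks plus alternating +/-0.05 "noise".
spectrum = [math.exp(-((x - 30) ** 2) / 18)
            + 0.6 * math.exp(-((x - 70) ** 2) / 18)
            + 0.05 * (-1) ** x
            for x in range(100)]
print(find_peaks(spectrum, threshold=0.3))                 # [30, 70]
print(find_peaks(spectrum, threshold=0.3, half_window=0))  # [30, 68, 70, 72]
```

Without smoothing (`half_window=0`) the noise on the flanks of the weaker peak produces spurious maxima at 68 and 72; the five-point average suppresses them.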


Modelling & Simulation for INS Peak Assignment

Calculated INS Spectrum of crystalline benzene

• INS spectra can be computed for a given atomic structure

• Calculations show which specific vibrational motions of the atoms occur, and at what frequencies

L. Liborio


Materials Workbench

K. Dymkowski


The Central Laser Facility


• National imaging facility with peer-reviewed, funded access

• Located in Research Complex at Harwell

• Cluster of microscopes and lasers and expert end-to-end multidisciplinary support

• Operations and some development funded by STFC

• Key developments funded through external grants (BBSRC, MRC)

OCTOPUS Facility in the CLF

With thanks to Dan Rolfe


Example: EGFR cell signalling in cancer

• Has driven OCTOPUS single molecule developments

• Users in plant cell imaging are now catching up in the scale of the challenge

• Part of a PhD project:

• 1 experimental technique

• 50 experimental conditions

• 30 datasets for each condition

• 1000 single molecule tracks for each condition

• Multiple properties & events of interest in each track

• Comparison of just one property…

With thanks to Dan Rolfe


Multidimensional single molecule tracking

• Automated registration & tracking in multiple channels

• Computer vision

• Bayesian feature detection adapted from astronomical galaxy detection

• Instrumental metadata from acquisition

• Flexible specification of many instrument configurations

Rolfe et al., Eur Biophys J, 2011

With thanks to Dan Rolfe


Big Scientific Data Benchmarks


How can Academia compete with Industry on Machine Learning and AI?

Companies like Facebook, Google, Amazon, Baidu and Microsoft have three key advantages over academia:

1. These companies all have many, very large, private datasets that they will never make publicly available

2. Each of these companies employs many hundreds of computer scientists with PhDs in Machine Learning and AI

3. Their researchers and developers have essentially unlimited computing power at their disposal

➢ The ImageNet example for the computer vision community


• ImageNet is an image dataset organized according to the WordNet hierarchy. There are more than 100,000 WordNet concepts.

• ImageNet aims to provide on average 1000 images for each concept; the images are quality-controlled and human-annotated.

• When complete, ImageNet aims to offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.

➢ The ImageNet dataset has proved very useful for advancing research in computer vision


A Particle Physics Example: Dataset for Machine Learning


The Higgs Challenge


Machine Learning winners of the Higgs Challenge

• Winner Gábor Melis, a graduate in software engineering and mathematics, developed an algorithm that is an ensemble of deep neural networks trained on random subsets of the data provided, with very little feature engineering and no physics knowledge

• Runner-up Tim Salimans, who has a PhD in Econometrics and works as a data science consultant, developed a solution he describes as a combination of a large number of boosted decision tree ensembles

• A special High Energy Physics meets Machine Learning Award was presented to Tianqi Chen and Tong He of Team Crowwork. Their XGBoost algorithm was an excellent compromise between performance and simplicity, which could improve tools currently used in high-energy physics.

Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row).
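The winning recipe, an ensemble whose members are each trained on a random subset of the data, can be illustrated with a subsampling ensemble (a close relative of bagging, which resamples with replacement instead). To stay short, this sketch replaces the deep neural networks with a deliberately tiny base learner, a one-feature threshold "stump"; the base learner, subset fraction and ensemble size are illustrative assumptions, not details of Melis's solution.

```python
import random
from typing import List, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold, direction)


def predict_stump(stump: Stump, x: float) -> int:
    threshold, direction = stump
    return 1 if direction * (x - threshold) >= 0 else 0


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> Stump:
    """Choose the threshold/direction that classifies the most training points correctly."""
    best_stump, best_correct = (xs[0], 1), -1
    for threshold in sorted(set(xs)):
        for direction in (1, -1):
            stump = (threshold, direction)
            correct = sum(predict_stump(stump, x) == y for x, y in zip(xs, ys))
            if correct > best_correct:
                best_stump, best_correct = stump, correct
    return best_stump


def subsample_ensemble(xs: Sequence[float], ys: Sequence[int], n_models: int = 25,
                       fraction: float = 0.7, seed: int = 0) -> List[Stump]:
    """Train each member on a different random subset of the training data."""
    rng = random.Random(seed)
    k = max(1, int(fraction * len(xs)))
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(len(xs)), k)  # subset drawn without replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_ensemble(models: List[Stump], x: float) -> float:
    """Fraction of members voting for class 1 (a simple averaged score)."""
    return sum(predict_stump(m, x) for m in models) / len(models)


# Usage on toy 1-D data: class 0 for small x, class 1 for large x.
xs = [float(v) for v in range(10)]
ys = [0] * 5 + [1] * 5
models = subsample_ensemble(xs, ys)
print(predict_ensemble(models, 0.0), predict_ensemble(models, 9.0))  # 0.0 1.0
```

Averaging members trained on different subsets mainly reduces variance; the boosted tree ensembles used by the runner-up and by XGBoost instead train members sequentially, each correcting its predecessors' errors.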


The STFC Big Scientific Data Benchmarks?


Use open Scientific Datasets for ‘Experimental’ Data Science

• The idea is to create scientific datasets that are sufficiently large and complex to provide a realistic testing ground for ML algorithms.

• These open datasets can form the basis for training researchers in academia and industry to understand which algorithms and hardware execution platforms are best at finding different features in the data.

➢ Use experimental data from STFC Large Scale Facilities to create a set of scientific ‘benchmark’ datasets

➢ Complement the computational benchmarks from the Hartree Centre


Experimental Data Science Datasets

• Astronomy datasets from LSST, SKA

• Particle Physics LHC datasets from ATLAS, CMS

• Large Scale Facilities datasets – DLS, ISIS, CLF and Hartree

• Environmental datasets from JASMIN

• Fusion datasets from Culham

➢ The creation of such curated datasets will allow experimentation and training in Machine Learning technologies executed on different hardware architectures

➢ Use these datasets as the basis for training courses in Data Science for both academia and industry


A Fusion Big Scientific Data Benchmark?

• Filamentary plasma structures play an important role in turbulent particle transport

• Archive of 400 GB of video data from the MAST tokamak at Culham

• Developing synthetic data training set of simulated filaments with known properties

• Promising exploration of applicability of Machine Learning techniques


Proof of Concept: An Initial Set of Benchmarks

1. Particle Physics – LHC datasets (Particle tracking, Supersymmetric particles)

2. Astronomy – LSST, SKA simulated datasets (STFC CDTs in Data Intensive Science)

3. Diamond – Cryo-SXT dataset (e.g. Mark Basham, DLS)

4. Diamond - Cryo-EM dataset (e.g. Dave Stuart, DLS)

5. CLF – Octopus, single molecule tracking dataset (e.g. Dan Rolfe, CLF)

6. ISIS – Peak detection with noisy datasets (e.g. Anders Markvardsen, ISIS)

7. Environment – Extreme weather events, air quality satellite data (JASMIN/CEDA, RAL Space)

8. Fusion – MAST Filament Video Dataset (Rob Akers, Culham)


The Ada Lovelace Centre


The Data Analysis Gap

• Complex Data

  • Too big to move in some cases

  • High CPU / memory requirements

  • May need to combine data from different sources

• Complex software environments

  • Variation in users’ knowledge of HPC

  • Variation in home computing environments

  • Variation in the availability of analysis and modelling software

• Diverse science communities supported by the Facilities

  • Different analysis software requirements

➢ Users’ lack of access to usable computing for handling experimental data is a barrier to science

With thanks to Brian Matthews, SCD


The ALC - Towards a “Super-facility”?

“A network of connected facilities, software and expertise to enable new modes of discovery”

Katie Antypas, Inder Monga, Lawrence Berkeley National Laboratory

Infrastructure + Software + Expertise

With Common Interfaces and Transparent Access

[Figure: connected super-facility components: Data Acquisition, Data Catalogue, Petabyte Data storage, Parallel File system, HPC (CPU+GPU), Visualisation, Software]