Atos Marc Simon Atos Senior Expert Global HPC Presales manager

HPC, AI & Quantum · storage L ext External storage L 3 Archive storage L 1 Processing storage e.g. Parallel File ... Integrate and manage HPC, Bigdata, AI and Cloud workflows Atos


Page 1

Atos

Marc Simon

Atos Senior Expert

Global HPC Presales manager

Page 2

Exascale vs Exaflops

Page 3

Exascale is not only about Exaflops: lessons learned from the Petaflops age

First Petaflops system in June 2008: Roadrunner
– 1 Pflops, 2.5 MW, 300 racks
– ~13 k IBM PowerXCell 8i accelerators + AMD Opteron
– 100 M$

Ten years later, in June 2018: Summit
– 122 Pflops, 13 MW, 256 racks
– ~27 k Nvidia V100 accelerators + Power9
– 250 M$

The road to Exaflops is longer than expected …
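The two data points above can be turned into a few derived figures. The sketch below uses only the rounded numbers quoted on the slide. The function names and the constant-growth extrapolation are illustrative assumptions, not part of the original talk.

```python
import math

# Rounded figures as quoted on the slide.
ROADRUNNER = {"year": 2008, "pflops": 1.0, "megawatts": 2.5, "cost_musd": 100.0}
SUMMIT = {"year": 2018, "pflops": 122.0, "megawatts": 13.0, "cost_musd": 250.0}


def gflops_per_watt(system):
    # 1 Pflops = 1e6 Gflops and 1 MW = 1e6 W, so the factors cancel.
    return system["pflops"] / system["megawatts"]


def musd_per_pflops(system):
    # Acquisition cost per Pflops, in millions of dollars.
    return system["cost_musd"] / system["pflops"]


def years_to_reach(factor, old, new):
    """Years needed to gain `factor` in performance, assuming the same
    constant yearly growth as observed between `old` and `new`."""
    observed = new["pflops"] / old["pflops"]
    span = new["year"] - old["year"]
    return span * math.log(factor) / math.log(observed)


def megawatts_for_exaflops(system):
    # Power needed for 1000 Pflops at this system's energy efficiency.
    return 1000.0 * system["megawatts"] / system["pflops"]


if __name__ == "__main__":
    for name, s in (("Roadrunner", ROADRUNNER), ("Summit", SUMMIT)):
        print(f"{name}: {gflops_per_watt(s):.2f} Gflops/W, "
              f"{musd_per_pflops(s):.2f} M$/Pflops")
    print("Years for x1000 at the observed rate: "
          f"{years_to_reach(1000.0, ROADRUNNER, SUMMIT):.1f}")
    print("Power for 1 Exaflops at Summit efficiency: "
          f"{megawatts_for_exaflops(SUMMIT):.0f} MW")
```

With these inputs, a x1000 gain at the observed rate takes roughly 14 years rather than 10, and 1 Exaflops at Summit's efficiency would need on the order of 100 MW.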

Page 4

Exascale is not only about Exaflops: lessons learned from the Petaflops age

The end of Moore's law …

Page 5

Exascale is not only about Exaflops: lessons learned from the Petaflops age

Application performance is not increasing at the same speed …

Page 6

Exascale is not only about Exaflops: lessons learned from the Petaflops age

There are Flops … and Flops …

Not all HPC and AI workloads need double-precision (64-bit) Flops
– Reduced precision (16 bits) can be enough
– FP16 (IEEE): 5 bits for the exponent
– New floating-point format B16 (bfloat16, "B" as in Brain): 8 bits for the exponent
– B16 has the same dynamic range as FP32 (IEEE)
– FP16/B16 is usually 4x faster than FP64
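The dynamic-range claim follows directly from the exponent widths. The sketch below derives range and precision from the bit layout alone. The exponent widths for FP16 and B16 come from the slide, while the mantissa widths (10, 7, 23 and 52 bits) and the FP32/FP64 exponent widths are the standard values for these formats, added here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exponent_bits: int
    mantissa_bits: int  # explicitly stored fraction bits

    @property
    def bias(self) -> int:
        return 2 ** (self.exponent_bits - 1) - 1

    @property
    def max_finite(self) -> float:
        # Largest normal value: all-ones mantissa, largest usable exponent.
        return (2.0 - 2.0 ** -self.mantissa_bits) * 2.0 ** self.bias

    @property
    def min_normal(self) -> float:
        # Smallest positive normal value.
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        # Spacing between 1.0 and the next representable value.
        return 2.0 ** -self.mantissa_bits


FORMATS = {
    "FP16": FloatFormat("FP16", 5, 10),
    "B16": FloatFormat("B16", 8, 7),
    "FP32": FloatFormat("FP32", 8, 23),
    "FP64": FloatFormat("FP64", 11, 52),
}

if __name__ == "__main__":
    for fmt in FORMATS.values():
        print(f"{fmt.name}: max={fmt.max_finite:.4e} "
              f"min_normal={fmt.min_normal:.4e} eps={fmt.epsilon:.4e}")
```

B16 reaches almost the same maximum as FP32 and the same smallest normal value, at the price of a coarser epsilon than FP16.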

With special matrix-multiplication / machine-learning instructions … when usable:
– 64 bits: x5 Flops
– 16 bits: x9 Flops, combined with the x4 of reduced precision → x36 vs usual FP64
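The "when usable" caveat matters a great deal in practice. As a rough illustration, the sketch below applies Amdahl's law (an addition here, not something the slide names) to estimate the overall speedup when only a fraction of the runtime can use the accelerated instructions. The fractions in the example are hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the original runtime
    is sped up by `factor` and the rest is unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


if __name__ == "__main__":
    # x36 is the 16-bit matrix-instruction figure quoted on the slide.
    for fraction in (0.5, 0.9, 0.99, 1.0):
        print(f"{fraction:.0%} accelerated -> x{amdahl_speedup(fraction, 36.0):.2f}")
```

Even with 90% of the runtime accelerated, the overall gain stays far below x36, which is consistent with application performance lagging behind peak Flops.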

Page 7

Exascale is not only about Exaflops: lessons learned from the Petaflops age

Using some (artificial) intelligence

AI can:
- tell cats from dogs
- play Go much better than Lee Sedol
- analyze 3D data better than anyone
- …

AI can be used in HPC for:
- pre-processing → data assimilation
- post-processing → data analysis
- accelerating computing
- empirical modeling
- …

Precision Medicine · Autonomous Driving · CFD

Page 8

Exascale is about new usages – Usage

HPC, HTC, Bigdata and AI

o Infrastructure convergence
o HPC infrastructure – Density / Performance / Reliability
o Handle new paradigm for Bigdata / AI workloads
o Hybrid workloads to maximize insight
o AI augmented simulation
o Datalake – Data analysis
o Pre/post processing
o Data deluge
o Digital twins
o High energy physics – Astrophysics
o HPCaaS
o Private and public cloud
o Hybrid orchestrator

Page 9

Exascale is about new usages – Usage: Digital Twins

Coupling:
o Data collection
o Analysis
o Multi-physics simulation

Page 10

Exascale is about new usages – Usage: Precision Medicine

Predictive · Preventive · Personalised · Participatory

50% of the children born in 2018 will live to be 100 in most countries

Page 11

Exascale is about new usages – Usage: HPC in the loop – Edge Computing

o Much more data is produced than could be stored
o In-flight pre-processing is necessary

SKA: 250 ExaByte/year
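To give a sense of scale, the yearly volume quoted for SKA can be converted into a sustained data rate. The sketch below assumes decimal exabytes (10^18 bytes) and a 365-day year. The storage budget used for the reduction factor is a purely hypothetical figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
EXABYTE = 10 ** 18  # decimal exabyte, in bytes


def sustained_rate_bytes_per_s(exabytes_per_year: float) -> float:
    # Average rate if the data is produced evenly over the year.
    return exabytes_per_year * EXABYTE / SECONDS_PER_YEAR


def required_reduction_factor(produced_eb_per_year: float,
                              storable_eb_per_year: float) -> float:
    # How much in-flight pre-processing must shrink the stream to fit storage.
    if storable_eb_per_year <= 0:
        raise ValueError("storable_eb_per_year must be positive")
    return produced_eb_per_year / storable_eb_per_year


if __name__ == "__main__":
    rate = sustained_rate_bytes_per_s(250)
    print(f"Sustained rate: {rate / 1e12:.2f} TB/s")
    # Hypothetical budget: 1 EB of storage growth per year.
    print(f"Reduction needed: x{required_reduction_factor(250, 1):.0f}")
```

250 ExaByte/year corresponds to nearly 8 TB/s sustained, which is why the data has to be reduced in flight rather than stored first.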

Page 12

Exascale – European Initiatives

Page 13

European Exascale Effort: the European HPC landscape is changing

Page 14

HORIZON 2020: EU Collaborative Projects (H2020)

Code evolution and performance portability
Be ready for exascale
Weather and Climate, Biomedicine, CFD, etc.

Provide services to the user community
Be recognized as an essential industrial partner
Consulting, Co-design, Benchmarking

Weather Forecast & Climate: designing next-gen applications
Life Science: biomedical modelling community
Solid Earth, Energy, VVUQ, Extreme Data: start, reinforce, prepare future needs

Page 15

EPI Proposal for HPC

▶ Develop the roadmap for the full length of the EPI initiative
▶ Develop the first generation of technologies through a co-design approach
▶ Tape out the first-generation chip by integrating the IPs developed
▶ Validate this chip in the HPC context and in the automotive context using a demonstration platform

Page 16

Public Procurement of Innovations for High Performance Computing

• In this project, a group of leading European supercomputing centres decided to form a buyers' group to execute a joint Public Procurement of Innovative Solutions.

• The participants will work together on coordinated roadmaps for providing HPC resources optimized to the needs of European scientists and engineers.

• Energy efficiency and power management

• Data management

• Programming environment and productivity

• Data centre integration

• Maintenance and support

• System and application monitoring

• Security

FEATURES

OBJECTIVES

GOALS

Page 17

EuroHPC World-Class Supercomputer

▶ 8 hosting sites selected
▶ 19/28 countries
▶ 840 M€ co-investment (EU/countries)
▶ Targeting 2020
▶ Requires diversity

Page 18

Exascale – Atos Technologies

Page 19

Exascale Challenges: Technology Trends

Processing · Data Management · Networking · Energy · Applications

Page 20

Atos HPC roadmap – New technologies

[Figure: roadmap timeline 2019–2023 towards Exascale (10^18). Items shown: BullSequana X1000, Open platform, BullSequana XH2000, Exascale Interconnect, High-speed Ethernet, Data Management, Energy & Performance Optimizer, Atos Quantum Learning Machine]

Page 21

Exascale Challenges: Technology Trends

Processing

Page 22

Exascale Challenges: Processors

Diversity of compute engines
o Adapted to specific needs

Higher and higher TDP to manage
o Up to 800 W …

High Bandwidth Memory
o Targeting 1:1 Byte/Flops

Interconnection of heterogeneous chips
o Coherent and non-coherent

[Figure: Ideal HPC unit (Core, Core, Core)]
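The Byte/Flops target can be read through a simple roofline-style bound: a kernel cannot run faster than memory bandwidth times its arithmetic intensity. The roofline framing is an addition here, not something the slide names, and the device figures in the example are hypothetical.

```python
def bytes_per_flop(bandwidth_bytes_per_s: float, peak_flops_per_s: float) -> float:
    # Machine balance: bytes the memory can deliver per peak floating-point operation.
    return bandwidth_bytes_per_s / peak_flops_per_s


def attainable_flops(peak_flops_per_s: float,
                     bandwidth_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    # Roofline bound: limited either by compute or by memory traffic.
    return min(peak_flops_per_s, bandwidth_bytes_per_s * flops_per_byte)


if __name__ == "__main__":
    peak = 10e12        # hypothetical device: 10 Tflops peak
    intensity = 0.25    # hypothetical kernel: 0.25 flops per byte moved
    for bandwidth in (1e12, 10e12):  # 0.1 Byte/Flops vs the 1:1 target
        balance = bytes_per_flop(bandwidth, peak)
        reached = attainable_flops(peak, bandwidth, intensity)
        print(f"{balance:.1f} Byte/Flops -> {reached / peak:.1%} of peak")
```

For a bandwidth-bound kernel, moving from 0.1 to 1 Byte/Flops raises the attainable fraction of peak tenfold.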

Page 23

Exascale Challenges: Technology Trends

Data Management

Page 24

Exascale Challenges: Data Management

Different spaces for different needs

L1: Processing storage, e.g. Parallel File System
L2: Scalable storage, e.g. Object Storage
L3: Archive storage, e.g. HSM
Lext: External storage, e.g. Cloud Storage

Storage Manager

Page 25

Exascale Challenges: Technology Trends

Networking

Page 26

Exascale Challenges: Networking

High Speed Interconnect requirements

▶ High bandwidth & low latency
o x20 bandwidth, x3 lower latency in 15 years
▶ Increased Resiliency, Availability and Serviceability features
o Larger fabric to manage
▶ Topology support – Scalability
o Adapted to size and performance requirements
▶ Routing algorithm – Mix and match
o Path relative to type of communication
▶ Adaptive routing & QoS
o Congestion management
▶ Interoperability – Open Standard
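The bandwidth and latency figures above are easier to compare as yearly rates. The sketch below assumes constant compound improvement over the 15 years, which is a simplification, and reads "x3 latency" as a threefold reduction.

```python
import math


def annual_factor(total_factor: float, years: float) -> float:
    # Constant yearly multiplier that compounds to `total_factor` over `years`.
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    # Time needed for a x2 improvement at the same compound rate.
    return years * math.log(2.0) / math.log(total_factor)


if __name__ == "__main__":
    for label, factor in (("bandwidth", 20.0), ("latency", 3.0)):
        rate = annual_factor(factor, 15.0) - 1.0
        print(f"{label}: {rate:.1%} per year, "
              f"x2 every {doubling_time_years(factor, 15.0):.1f} years")
```

Bandwidth improves by roughly 22% per year against under 8% for latency, so latency-bound communication patterns benefit far less from each interconnect generation.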

Page 27

Exascale Challenges: Technology Trends

Energy

Page 28

Exascale Challenges: Energy

Direct Liquid Cooling:
o Compute nodes (CPU, memory, drives, GPU)
o High Speed Interconnect: HDR and BXI switches (L1, L2, L3)
o Management network: intra-rack management switches
o Power Supply Unit: DLC shelves

The only components in BullSequana XH2000 that are not liquid cooled are the pumps of the Hydraulic Chassis (HYC)

95% efficiency:
o Warm water up to 40°C inlet
o Heat rejected in air is almost constant

Targeting > 45°C inlet and > 98% efficiency …

BullSequana XH2000: fanless, innovative cooling solution …
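As a rough illustration of what these percentages mean for a machine room, the sketch below assumes that "efficiency" is the fraction of dissipated heat captured by the liquid loop, with the remainder rejected to air. That reading is an assumption, and the 1 MW load is a hypothetical figure.

```python
def heat_split_kw(it_power_kw: float, liquid_capture_fraction: float):
    """Return (heat to liquid, heat to air) in kW for a given IT load."""
    if not 0.0 <= liquid_capture_fraction <= 1.0:
        raise ValueError("liquid_capture_fraction must be in [0, 1]")
    to_liquid = it_power_kw * liquid_capture_fraction
    return to_liquid, it_power_kw - to_liquid


if __name__ == "__main__":
    load_kw = 1000.0  # hypothetical 1 MW system
    for fraction in (0.95, 0.98):
        liquid, air = heat_split_kw(load_kw, fraction)
        print(f"{fraction:.0%}: {liquid:.0f} kW to water, {air:.0f} kW to air")
```

Under this reading, going from 95% to 98% cuts the heat left for air cooling from about 50 kW to about 20 kW per megawatt.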

Page 29

Exascale Challenges: Technology Trends

Applications

Page 30

Exascale Challenges: Applications

Smart Hybrid Management: unify user experience
– Bull Super Computing Stack
– Integrate and manage HPC, Bigdata, AI and Cloud workflows
– Atos codex AI Suite – Orchestrator

Smart Energy Optimization
– Measure the power consumption of your job through the Bull Energy Optimizer (BEO)
– Optimize the power consumption of your job through the Bull Dynamic Power Optimizer (BDPO)

Smart IO Management
– Manage IO performance through the Bull IO Instrumentation (IOI)
– Accelerate some I/O through the Bull Fast IO Library (FIOL)
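Measuring the power consumption of a job ultimately means integrating sampled power over the job's runtime. The sketch below is a generic illustration using the trapezoidal rule. It does not use or represent the BEO or BDPO interfaces, and the sample values are hypothetical.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples, sorted by time, with the
    trapezoidal rule. Returns the energy in joules."""
    points = list(samples)
    total = 0.0
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def joules_to_kwh(joules: float) -> float:
    # 1 kWh = 3.6e6 J
    return joules / 3.6e6


if __name__ == "__main__":
    # Hypothetical node power trace: ramp-up, plateau, ramp-down.
    trace = [(0, 200.0), (60, 450.0), (3540, 450.0), (3600, 200.0)]
    joules = energy_joules(trace)
    print(f"{joules:.0f} J = {joules_to_kwh(joules):.3f} kWh")
```

Summing such per-node traces over all nodes allocated to a job gives its energy-to-solution, which is the quantity a power optimizer would try to reduce.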

Page 31

Exascale Challenges: Collaborating with Communities

Weather Forecast & Climate: designing next-gen applications

Life Science: biomedical modelling community

Solid Earth, Energy, VVUQ, Extreme Data: start, reinforce, prepare future needs

Page 32

Exascale Challenges: Summary

Processing: build an open common platform for hybrid processors

Data Management: data-centric architecture

Networking: backbone of the modern hybrid / heterogeneous supercomputer

Energy: increase efficiency

Applications: adapt for a hybrid usage model – Collaboration, Center of Excellence

Page 33

The Next Generation starts Today!