ISC’11, June 21st, 2011. Motoi Okuda, FUJITSU Ltd. Fujitsu’s Technologies to the K Computer: a journey to practical Petascale computing platform

ISC’11

June 21st, 2011

Motoi Okuda

FUJITSU Ltd.

Fujitsu’s Technologies to the K Computer

- a journey to practical Petascale computing platform -

Agenda

The Next-Generation Supercomputer project of Japan: “The K computer”

Design concept of the K computer

Our technologies applied to the K computer

Preliminary performance figures of the K computer

Toward the post-10 PFlops era and Exa-scale computing

Conclusion

Copyright 2011 FUJITSU LIMITED 1

History of the K computer project

The project officially started in mid-2006.

System installation started in October 2010.

A partial system started test operation in April 2011.

Full system installation and adjustment will be completed by mid-2012.

Official operation will start by the end of 2012.

Application software projects are running concurrently.

[Timeline figure, 2006 to 2012. System development phases: conceptual design; detailed design; prototype and evaluation; tuning; production, installation, and adjustment. Concurrent application software projects: Next-Generation Integrated Nano-science Simulation; Next-Generation Integrated Simulation of Living Matter.]

Prehistory of the K computer project

Primary R&D projects for the Next Generation Supercomputer started in 2005.

The national grid project (NAREGI) started in 2003.

In 2001, the High-end Computing WG initiated a feasibility study of future high-end computing environments from the application point of view.

[Timeline figure, 2001 to 2012: WG for High-end Computing; National Grid Project NAREGI; primary R&D projects for the Next Generation Supercomputer; then the K computer project phases (conceptual design; detailed design; prototype and evaluation; tuning; production, installation, and adjustment) alongside the Next-Generation Integrated Nano-science Simulation and Next-Generation Integrated Simulation of Living Matter projects.]

Target Applications of the K computer

Courtesy of RIKEN

Design target of the K computer

Toward wider coverage of applications and higher performance on those applications

High performance

10 PFlops on LINPACK

High productivity

High performance is easy to extract from highly parallel programs without placing an inordinate burden on programmers

Sophisticated language and programmer support environment

High operability

Low power consumption

High reliability and ease of operation

Meeting the target date: mid-2012

Fujitsu’s technologies applied to the K computer

SPARC64™ VIIIfx processor: 128 GFlops

HPC-ACE (SPARC V9 Architecture Enhancement for HPC)

SIMD

Register enhancements

Software-controllable cache

Hardware barrier between cores

Mainframe-CPU-level high-reliability design

Low power consumption: ~58 W

Single CPU per node configuration: high memory bandwidth and simple memory hierarchy

CPU/ICC direct water cooling: high reliability, low power consumption, and compact packaging

New interconnect, Tofu: 6-dimensional mesh/torus topology; high-speed, highly scalable, high-operability, and high-availability interconnect for systems of over 100,000 nodes; functional interconnect

The K computer: LINPACK 10 PFlops; over 1 PB of memory; 800 racks; 80,000 CPUs; 640,000 cores
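The headline figures on this slide can be cross-checked with a short calculation. The sketch below assumes that 128 GFlops is the peak of a single SPARC64 VIIIfx CPU and that the CPU and core counts are the rounded values shown on the slide; the variable names are illustrative only.

```python
# Rounded values as presented on the slide; names are illustrative.
PEAK_GFLOPS_PER_CPU = 128   # assumption: peak performance of one CPU
CPUS = 80_000
CORES = 640_000

cores_per_cpu = CORES // CPUS                     # cores in one CPU
peak_pflops = PEAK_GFLOPS_PER_CPU * CPUS / 1e6    # GFlops -> PFlops

print(f"cores per CPU: {cores_per_cpu}")
print(f"theoretical peak: {peak_pflops:.2f} PFlops")
```

Under these assumptions the theoretical peak is 10.24 PFlops, the same scale as the 10 PFlops LINPACK figure.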

Fujitsu’s technologies applied to the K computer (cont.)

Software environment

Applications

HPC Portal / System Management Portal

System operations management: system configuration management; system monitoring; system installation and operation

Job operations management: job manager; job scheduler; resource management

High-performance file system: Lustre-based distributed file system; high scalability; I/O bandwidth guarantee; high reliability and availability

Compiler (Fortran, C, C++): hybrid parallel programming; sector cache support; SIMD/register file extensions

MPI/math libraries: tuned for hardware

Support tools: profiler and tuning tools; interactive debugger

Linux-based OS enhanced for the K computer

The K computer

Horizontal axis: TOP500 rank (first place at the left) vs. efficiency

Another option: use performance as the horizontal axis?

The K computer’s Performance

LINPACK performance and its efficiency

[Figure, June 2011: performance efficiency (Rmax: LINPACK performance / Rpeak: peak performance), annotated “Productivity”, plotted against performance (Rmax, PFlops). Systems shown: K computer (subset), SPARC64™ VIIIfx; other Fujitsu systems; NSCT (China), GPGPU; NSCS (China), GPGPU; Jaguar (US), Opteron; GSIC (Japan), GPGPU.]

K computer (subset): 68,544 CPUs, 548,352 cores, 8.162 PFlops, 93.0% efficiency
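The 93.0% figure follows from the definition of efficiency given on the figure axis (Rmax divided by Rpeak). The sketch below is a minimal check, assuming the 128 GFlops per-CPU peak quoted on the technology slide applies to each CPU of the subset; the variable names are illustrative.

```python
# Figures for the K computer subset on the June 2011 list.
CPUS = 68_544
CORES = 548_352
RMAX_PFLOPS = 8.162
PEAK_GFLOPS_PER_CPU = 128   # assumption: per-CPU peak from the technology slide

rpeak_pflops = CPUS * PEAK_GFLOPS_PER_CPU / 1e6   # GFlops -> PFlops
efficiency = RMAX_PFLOPS / rpeak_pflops           # Rmax / Rpeak

print(f"Rpeak: {rpeak_pflops:.3f} PFlops")
print(f"efficiency: {efficiency:.1%}")
print(f"cores per CPU: {CORES // CPUS}")
```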

The K computer’s Performance (cont.)

LINPACK performance and its power consumption

[Figure, June 2011: power efficiency (Rmax MFlops/W), annotated “Greenness”, plotted against performance (Rmax, PFlops). Systems shown: K computer (subset), SPARC64™ VIIIfx; IBM BlueGene/Q prototype (US), PowerBQC; Nagasaki Univ. (Japan), GPGPU; FZJ (Germany), XCell; CINECA/SCS (Italy), GPGPU; GSIC (Japan), GPGPU; NSCT (China), GPGPU.]

K computer (subset): 825 MFlops/W
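Power efficiency and Rmax together imply the power drawn during the LINPACK run. The sketch below is a rough estimate, assuming the 825 MFlops/W figure was measured against the same 8.162 PFlops Rmax reported for the subset; the variable names are illustrative.

```python
RMAX_FLOPS = 8.162e15        # 8.162 PFlops, reported for the K computer subset
MFLOPS_PER_WATT = 825        # power efficiency from this slide

# Power = performance / (performance per watt).
power_watts = RMAX_FLOPS / (MFLOPS_PER_WATT * 1e6)

print(f"implied power draw: {power_watts / 1e6:.2f} MW")
```

Under that assumption the run draws a little under 10 MW.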

The K computer’s Performance (cont.)

LINPACK performance and its computing time

[Figure, June 2011: computing time (hours) plotted against performance (Rmax, PFlops). Systems shown: K computer (subset), SPARC64™ VIIIfx; JAXA (Japan), FX1, SPARC64™ VII; NSCT (China), GPGPU; Jaguar (US), Opteron.]

K computer (subset): 28 hr.
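The run time gives a feel for the size of the problem being solved. The sketch below is an order-of-magnitude estimate only: it assumes the full 28 hours were spent at the reported 8.162 PFlops rate and uses the leading-order operation count for a dense LU solve, (2/3)n^3, ignoring lower-order terms. The resulting matrix order is an inference, not a figure from the slides.

```python
RMAX_FLOPS = 8.162e15    # sustained rate reported for the subset (flops/s)
RUN_HOURS = 28           # computing time from this slide

total_flops = RMAX_FLOPS * RUN_HOURS * 3600

# Invert flops ~ (2/3) * n**3 to estimate the matrix order n.
n = (1.5 * total_flops) ** (1.0 / 3.0)

# A dense n x n matrix of 8-byte double-precision values.
matrix_bytes = 8 * n * n

print(f"estimated matrix order: {n:.3e}")
print(f"matrix storage: {matrix_bytes / 1e15:.2f} PB")
```

The estimate lands at a matrix order of about ten million, with storage just under 1 PB.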

The K computer’s Performance (cont.)

LINPACK performance efficiency and power consumption

[Figure, June 2011: performance efficiency (Rmax: LINPACK performance / Rpeak: peak performance), annotated “Productivity”, plotted against power efficiency (Rmax MFlops/W), annotated “Greenness”. Systems shown: K computer (subset), SPARC64™ VIIIfx; IBM BlueGene/Q prototype (US), PowerBQC; Nagasaki Univ. (Japan), GPGPU; CINECA/SCS (Italy), GPGPU; GSIC (Japan), GPGPU; FZJ (Germany), etc., XCell; NSCT (China), GPGPU. Systems without registered power data are marked “Power data not registered”.]

The K computer’s Performance (cont.)

Example of fundamental BMT performance on a 1.05 PFlops system*

Highly efficient threading between cores and functional interconnect

[Figure: scalability of the HIMENO-BMT*** (XL size, 1,024 x 512 x 512) plotted against the number of cores. Curves: hybrid execution** without Integrated MPI support; hybrid execution** with Integrated MPI support; flat MPI execution with Integrated MPI support.]

* : 65,536 cores, 8,192 CPUs

** : 8 threads/node + MPI

*** : HIMENO-BMT is a benchmark program that measures the speed of the major loops in solving Poisson's equation with the Jacobi iteration method. Grid size XL was used in this measurement.
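For readers unfamiliar with the method named in the footnote, the sketch below shows Jacobi iteration for Poisson's equation on a small 3D grid. It is a simplified illustration, not the HIMENO-BMT kernel: it assumes a plain 7-point stencil, zero boundary values, a constant right-hand side, and a tiny 9 x 9 x 9 grid, and all function names are hypothetical.

```python
def neighbor_sum(p, i, j, k):
    """Sum of the six axis-aligned neighbors of point (i, j, k)."""
    return (p[i + 1][j][k] + p[i - 1][j][k]
            + p[i][j + 1][k] + p[i][j - 1][k]
            + p[i][j][k + 1] + p[i][j][k - 1])

def jacobi_sweep(p, f, h):
    """One Jacobi update of all interior points for laplacian(p) = f."""
    n = len(p)
    new = [[row[:] for row in plane] for plane in p]  # boundary stays fixed
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                # Every update reads only the previous iterate p.
                new[i][j][k] = (neighbor_sum(p, i, j, k) - h * h * f[i][j][k]) / 6.0
    return new

def residual(p, f, h):
    """Largest |laplacian(p) - f| over the interior points."""
    n = len(p)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                lap = (neighbor_sum(p, i, j, k) - 6.0 * p[i][j][k]) / (h * h)
                worst = max(worst, abs(lap - f[i][j][k]))
    return worst

def solve(f, h, sweeps):
    """Run a fixed number of Jacobi sweeps starting from p = 0."""
    n = len(f)
    p = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        p = jacobi_sweep(p, f, h)
    return p

n = 9                      # illustrative grid size, far smaller than XL
h = 1.0 / (n - 1)          # grid spacing on the unit cube
f = [[[1.0] * n for _ in range(n)] for _ in range(n)]

for sweeps in (10, 200):
    print(sweeps, residual(solve(f, h, sweeps), f, h))
```

Because each sweep reads only the previous iterate, all interior points can be updated independently, which is what makes this kind of loop a natural target for both thread-level and MPI-level parallelism.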

Applications running on the K Computer

Several real applications are now running on the K computer, which is in its test operation phase.

First-priority applications have been optimized, tested, and evaluated:

Program | Discipline | Outline | Scheme
NICAM | Earth science | Nonhydrostatic ICosahedral Atmospheric Model (NICAM) for global cloud-resolving simulations | FDM (atmosphere)
Seism3D | Earth science | Simulation of seismic-wave propagation and strong ground motions | FDM (wave)
FrontFlow/Blue | Engineering | Unsteady flow analysis based on Large Eddy Simulation (LES) | FEM (fluid)
PHASE | Material science | First-principles simulation within the plane-wave pseudopotential formalism | DFT (plane wave)
RSDFT | Material science | Ab-initio calculation in real space | Real-space DFT
LatticeQCD | Physics | Study of elementary particle and nuclear physics based on Lattice QCD simulation | QCD

Others: more than 20 applications are being optimized and tested on the K computer.

For the post-10 PFlops era and Exa-scale computing

Tight collaboration, co-work, and concurrent development with target applications

Expansion and refinement of current technologies for practical 10 PFlops-class computing

A technology jump for practical 100 PFlops-class computing

The Japanese HPC community’s big questions after March 11th

How does Exa-scale computing contribute to society?

Which applications need Exa-scale computing power?

Technologies shown on the slide: FPGA and reconfigurable LSI; many-core architecture; optical computer; accelerator technologies; quantum computer; on-board optical link; DNA computer; CNT technologies; 3D stacked memory; graphene technologies; sub-20 nm semiconductor technology; CPU-integrated interconnect I/F

Conclusion

The K computer targeted practical PFlops-class computing.

Several of Fujitsu’s leading-edge technologies were applied to the K computer and achieved excellent performance, productivity, and operability.

How to use this huge computing power to make a safe, reliable, and sustainable society a reality is Fujitsu’s next and true challenge.

This is a milestone on the way to real Exa-scale computing.

Fujitsu will continue its efforts toward real Exa-scale computing.