
Fujitsu HPC Cluster Suite

Copyright 2013 FUJITSU

29th May 2013 Павел Борох

Webinar


HPC: the full range of offerings from Fujitsu

• HPC Cluster Suite: Cluster Management & Operation
• FEFS
• Gateway
• CentOS
• PRIMERGY Server, Workstation
• ETERNUS Storage
• Consulting and Integration Services: sizing, design; proof of concept; integration into customer environment; certified system and production environment
• Ready-to-Go: complete assembly, pre-installation and quality assurance; Ready to Operate at delivery
• ISV and Research Partnerships: Open Petascale Libraries Network; PreDiCT Initiative

HPC and the required software

Cluster architecture and usage

When an integrated software package is preferable

HPC cluster

Typical HPC cluster architecture

Diagram elements:
• End-user workstations: the user launches the job here
• Pre/post processing
• Head node 1 (workload manager; jobs are queued here) and head node 2 (fail-over), with a shared disk
• Compute nodes: jobs run here (Job A, Job B)
• Ethernet (eth0, eth1): management (job start/stop, NFS of /home)
• InfiniBand (ib0): inter-process communication (MPI) and PFS data traffic
• Parallel File System (PFS)

Characteristics of an HPC environment

Characteristic | Description
Job-oriented | Computations run as jobs ("batch mode") on a set of compute nodes (from tens to thousands). Both serial (single-process) and parallel (multi-process) jobs are possible.
Non-interactive | With rare exceptions, work on an HPC system is not interactive.
Concurrency | Many jobs can run on the cluster at the same time.
Workload variety | Workloads (number of cores per job) vary with the application and its level of parallelism.
Large data volumes | Many applications produce and consume large volumes of data in short time intervals.
Inter-node communication | Parallel applications require a high-speed inter-node interconnect (e.g. InfiniBand), both for data transfer and for communication between processes.

Why an integrated solution?

The HPC software stack must be installed on tens to thousands of nodes
• Same OS installed with the same options on every node
• Resource manager / MPI libraries / scientific libraries

Systems must conform to a standard basic set of operating conditions
• uid, gid and password exactly the same across all nodes
• Shared home file system across all nodes
• Common temporary storage across all nodes
• Password-less access for user sessions across all nodes

Set-up is time-consuming, tedious and error-prone
• Operating-condition items do not scale beyond a few nodes when direct human action is needed

The choice of software for any individual option is daunting
• Compilers: GNU, Intel, PGI, Absoft
• Resource managers: LSF, PBSPro, Torque, Moab, SGE, SLURM, CONDOR
• MPI: OpenMPI, MPICH, MVAPICH, Intel MPI, Platform MPI
• Software choices and drivers need to be validated
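The requirement that uid and gid match exactly on every node is easy to check mechanically. The following is a minimal Python sketch, assuming the account data has already been collected per node (for example from each node's /etc/passwd). The function name and the input layout are hypothetical and are not part of HCS.

```python
def find_inconsistent_accounts(accounts_by_node):
    """Return users whose (uid, gid) differs between nodes or is missing on a node.

    accounts_by_node: {node_name: {user_name: (uid, gid)}}  (hypothetical layout)
    Result: {user_name: {node_name: (uid, gid) or None}}
    """
    # Collect every user name seen on any node.
    users = set()
    for accounts in accounts_by_node.values():
        users.update(accounts)

    problems = {}
    for user in sorted(users):
        # None marks a node on which the account does not exist.
        seen = {node: accounts.get(user) for node, accounts in accounts_by_node.items()}
        if len(set(seen.values())) > 1:
            problems[user] = seen
    return problems


cluster = {
    "node01": {"alice": (1001, 100), "bob": (1002, 100)},
    "node02": {"alice": (1001, 100), "bob": (1003, 100)},
    "node03": {"alice": (1001, 100)},
}
mismatches = find_inconsistent_accounts(cluster)
for user, per_node in mismatches.items():
    print(user, per_node)
```

Here only bob is reported: his uid differs on node02 and the account is missing on node03. Running such a check by hand on every node is exactly the kind of work that stops scaling beyond a few nodes.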

[Diagram: a "complex and difficult" self-assembled stack (operating system, parallel libraries) contrasted with a "simple and validated" integrated stack (operating system, cluster deployment & management, workload manager, web-based end-user interface). Benefits listed: improved TCO, reduced IT cost, shortened delivery, improved quality.]

HPC software stack: a typical landscape

A mature software stack provides:
• Node commissioning and administration of software packages ("images")
• A workload manager for managing jobs and resources
• A parallel processing environment with the necessary libraries
• Tooling for developers
• Data storage options (NFS, PFS)

These components are essentially the same everywhere. The only difference is in the specific products used.

Stack layers shown in the diagram:
• Application programs
• Graphical end-user interface
• Compilers, performance and profiling tools
• Parallel middleware, scientific libraries, parallel file system
• Workload manager: management of cluster resources; manages serial and parallel jobs; fair-share usage between users
• Cluster deployment and management: automated installation and configuration; administrator interface; operation and monitoring; cluster checker; user environment management
• GPGPU and Xeon Phi software support
• Operating system: RedHat Linux, CentOS; OS drivers
• Fujitsu PRIMERGY HPC clusters

Fujitsu Software HPC Cluster Suite (HCS)

Features and editions

HPC Cluster Suite: positioning of the editions

Open edition
• Limited budget
• Open-source components are sufficient
• In-house experience in configuring and operating a cluster (e.g. academia)
• Access to updates is not critical

Basic edition
• Support is required (e.g. industrial users)
• Advanced scheduling features are not required
• Relatively small clusters (SMB, development groups)

Advanced edition
• Advanced job scheduling functionality is required
• Full support for the resource manager
• A more customizable HPC Gateway
• Job workflow processing rules need to be developed

Note: Editions are not field upgradeable

Description: Open / Basic / Advanced

Main features | Open Edition | Basic Edition | Advanced Edition
Easy-to-use and scalable cluster deployment and management | CDM, Intel Cluster Checker | CDM, Intel Cluster Checker | CDM, Intel Cluster Checker
Workload managers | Torque, SGE and SLURM | Torque, SGE and SLURM | Altair PBS Professional
Parallel file system | No | FEFS | FEFS
General HPC open source software components (MPI, parallel libraries, compilers, BMT tools) | Yes | Yes | Yes
Graphical end-user interface: Gateway with various ISV application catalogs | Gateway Demo | Gateway Basic | Gateway Advanced
Command-line administrator interface | Yes | Yes | Yes
Monitoring and alerting | Open source; proprietary (planned) | Open source; proprietary (planned) | Open source; proprietary (planned)
Development environment | GNU, Intel® Cluster Studio XE | GNU, Intel® Cluster Studio XE | GNU, Intel® Cluster Studio XE
Intel® Cluster Ready | Yes | Yes | Yes
Recommended cluster size | Up to 128 nodes | Up to 128 nodes | Up to 1024 nodes
High Availability (HA) | No | No | Yes
Support, maintenance and upgrade | No; perpetual | Yes (9hx5); 1/3/5 year subscription | Yes (9hx5); 1/3/5 year subscription

Fujitsu HCS: OS support

HCS version | 1.0
Hardware platform | PRIMERGY SandyBridge RX / CX
RHEL | RHEL 5.8, RHEL 6.3
SUSE | -
CentOS | -, CentOS 6.3 (compute node only)

HPC Cluster Suite: categories

SKU planning: feature set (Basic / Advanced / Open edition), customer segment (academic / commercial) and cluster size

Per-node licensing

HPC Cluster Suite SKUs | Subscription | Cluster size (licenses for each node within the category)
Open Edition | perpetual | 1-128 nodes
Basic, Academic + Research | 1Y, 3Y, 5Y | up to 16 nodes; up to 64 nodes; 65+ nodes
Basic, Commercial | 1Y, 3Y, 5Y | up to 16 nodes; up to 64 nodes; 65+ nodes
Advanced, Academic + Research | 1Y, 3Y, 5Y | up to 16 nodes; up to 64 nodes; 65+ nodes
Advanced, Commercial | 1Y, 3Y, 5Y | up to 16 nodes; up to 64 nodes; 65+ nodes

Node counts in the size categories include both management and compute nodes.

Fujitsu HCS components

Description of the suite components

The Fujitsu HPC Cluster Suite (HCS)

A full-featured package for managing clusters based on Fujitsu PRIMERGY
• Easy-to-use cluster management
• Popular workload managers
• General HPC open source software
• Highly scalable parallel file system
• Graphical end-user interface for simplified usage

Alliance with leading developers

Fully tested solution for HPC

Software stack components: operating systems + drivers

Importance: Essential
Why needed:
• Core software supporting the hardware platform
• Enables support for hardware with no standard OS drivers (IB, 10GbE, disk controllers)
Availability for HCS:
• RedHat EL 5.x/6.x
• CentOS EL 5.x/6.x
Value add: Drivers are integrated into the HCS repository for simple cluster deployment

Software stack components: co-processor support

Importance: Essential depending on hardware configuration
Why needed: To support clusters with co-processor nodes
Availability for HCS:
• GPGPU: CUDA with OpenCL, drivers and dev. tools
• Xeon Phi: Intel Manycore Platform Software Stack (MPSS)
Value add: Easily installable add-on packages for GPGPU and Xeon Phi

Software stack components: cluster deployment and management

Importance: Essential
Why needed:
• Bare metal deployment of nodes
• Cluster configuration management
• Monitoring of cluster health
Availability for HCS:
• Cluster Deployment Manager (CDM, a Fujitsu-developed product)
• Intel Cluster Checker for validation
• Nagios/Ganglia for monitoring and alerting (now)
• AdminGUI (codename) graphical interface for management and monitoring (future)
Value add:
• Comprehensive deployment tool for small or large clusters
• Single graphical web-based interface for all activities

Software stack components: workload managers

Importance: Essential
Why needed:
• Enables sharing of all cluster resources between various users
• Manages policies to determine the order of resource usage
Availability for HCS:
• Open source choices: TORQUE, SGE, SLURM
• Commercial: PBS Professional for the Advanced edition
Value add:
• Variety gives the ability to meet the needs of many customers
• PBSPro can meet the needs of the most demanding customers and systems
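A policy that determines the order of resource usage can be illustrated with a toy fair-share rule. This is a minimal Python sketch under stated assumptions: it is not the algorithm used by TORQUE, SGE, SLURM or PBS Professional, and the job fields, usage figures and share values are hypothetical. Users who have consumed less than their target share are served first, with ties broken by submission time.

```python
def fair_share_order(queued_jobs, usage_by_user, shares_by_user):
    """Sort queued jobs so that the most under-served users run first."""
    total_usage = sum(usage_by_user.values()) or 1  # avoid division by zero
    total_shares = sum(shares_by_user.values())

    def priority(job):
        user = job["user"]
        used_fraction = usage_by_user.get(user, 0) / total_usage
        target_fraction = shares_by_user[user] / total_shares
        # A ratio below 1 means the user has had less than the target share.
        return (used_fraction / target_fraction, job["submitted"])

    return sorted(queued_jobs, key=priority)


queue = [
    {"id": "a1", "user": "alice", "submitted": 1},
    {"id": "b1", "user": "bob", "submitted": 2},
    {"id": "a2", "user": "alice", "submitted": 3},
]
usage = {"alice": 900, "bob": 100}   # core-hours already consumed (made up)
shares = {"alice": 1, "bob": 1}      # equal target shares (made up)
ordered_ids = [job["id"] for job in fair_share_order(queue, usage, shares)]
print(ordered_ids)
```

With equal shares, bob's job moves ahead of both of alice's jobs because alice has already used most of the recent capacity. Production workload managers combine such a factor with queue priorities, job size and usage decay over time.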


Software stack components: parallel middleware

Importance: Essential for parallel applications running across multiple nodes
Why needed: Provides the software layer needed for inter-node process communication
Availability for HCS:
• Open source: OpenMPI, MPICH, MVAPICH
• Commercial: Intel MPI
Value add:
• Variety makes it possible to bid to many customers
• Some customers need multiple options due to application dependencies

Software stack components: scientific libraries

Importance: Needed for some applications
Why needed:
• Used most often for in-house code development
• Sometimes needed by ISVs
Availability for HCS: LAPACK, ScaLAPACK, BLAS, netcdf, netcdf-devel, hdf5, fftw, fftw-devel, atlas, atlas-devel, GMP, Global Arrays, MKL
Value add:
• Meets the demands of many customers
• Some customers need multiple options due to application dependencies

Software stack components: compilers, performance and profiling tools

Importance: Needed for software development
Why needed: Used to compile applications and to provide tools for optimizing application performance
Availability for HCS:
• Compilers: GNU c, c++, gfort; Open64 (PathScale compiler); Intel Cluster Studio
• Profiling tools: Intel Cluster Studio; Allinea DDT
• Performance tools: Intel VTune; PAPI; TAU
Value add: Can meet the demands of many customers with both open source and commercial offerings

Software stack components: parallel file system

Importance: Needed for demanding I/O requirements
Why needed:
• Usually essential for large clusters (>64 nodes)
• Can be used on smaller clusters if the I/O load is expected to be high
Availability for HCS: Fujitsu Exabyte File System (FEFS), developed and maintained by Fujitsu
Value add:
• Originally developed for the demands of the K-Computer
• Inherits the reliability and performance enhancements of this system
• Updates passed back to the community

Note: NFS can be used for small clusters or clusters with low I/O demand. Either storage from the head node or a specified NAS server is used in these cases.

Software stack components: graphical end-user interface

Importance: Attractive to end-users
Why needed:
• Simplifies the usage of HPC for end-users
• Enables sharing of results and data between team members
• Can be used from remote locations
Availability for HCS: HPC Gateway
Value add:
• Used to provide pre-packaged solutions for running applications
• Enables non-HPC specialists to use an HPC cluster

Fujitsu HPC Cluster Suite: V1.0 release

Editions: Open Edition, Basic Edition, Advanced Edition

Deployment (all editions): CDM + SVIM (SVIM used for the installer node)
Cluster management (all editions): Intel Cluster Checker *1 (includes iozone, streams, HPL); ServerView
Workload manager: Torque (default) *1; PBS Pro
Co-processor support: -
Scientific libraries: Intel MKL *2
Parallel libraries: Open MPI *1, Intel MPI *2
Compilers: GNU *1, Intel Cluster Studio XE *2
Performance and profiling tools: GNU (c, c++, g77, debug and profiler) *1, Intel Cluster Studio XE *2
Parallel/shared file system: NAS; -
Cloud interface: - (all editions)
End-user interface: HPC Gateway Entry (Open), HPC Gateway Basic (Basic), HPC Gateway Advanced (Advanced)
Other: recommended up to 128 nodes (Open); recommended up to 128 nodes (Basic); HA feature, up to 1024 nodes, more than 1024 as a project bid (Advanced)

*1 Only installation support; does not include any technical support or fixes
*2 Must be purchased separately

Cluster Deployment Manager

Managing the cluster and configuration


CDM: easy-to-use cluster management

A powerful tool that improves productivity by reducing TCO. It leverages know-how from high-end HPC (K-Computer).

Installation process

SVIM: installs the OS on the installer or head node of the cluster
• Automatically detects the hardware and applies the proper drivers

CDM: automates compute node installation and cluster configuration
• Deploys the operating system and all HPC software components together with their related configuration (including PRIMERGY-specific drivers)
• Can add, modify or remove additional software components and their configuration on all nodes via a single command

CDM overview of operation

Management from the head (installer) node

Operations can be performed from the head node (no changes on individual nodes):
• Modification of configuration files
• Copying files to nodes
• Installing software components
• Adding new users to the system
• Adding, removing or replacing nodes of the cluster
• A parallel shell can be used to execute commands across the whole cluster

A variety of node types can be deployed

Multiple node groups can be used
• head, compute, login, I/O, ftp, compilation

Different OSs can be used
• A separate repository is used to manage each OS
• Node groups use software from one of the repositories
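The two ideas above, node groups tied to one repository each and a parallel shell driven from the head node, can be sketched in a few lines of Python. This is an illustration only: it does not show CDM's actual commands or configuration format, and the group names, repository names, host names and the `run_on_node` callback are all hypothetical. A real runner would typically invoke ssh. The stand-in below just echoes the command so that the sketch runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: every node group takes its software from exactly one repository.
NODE_GROUPS = {
    "compute": {"repository": "centos-6.3", "nodes": ["node01", "node02", "node03"]},
    "login": {"repository": "rhel-6.3", "nodes": ["login01"]},
    "io": {"repository": "rhel-6.3", "nodes": ["io01", "io02"]},
}


def nodes_in(groups, names=None):
    """Flatten the node lists of the selected groups (all groups by default)."""
    selected = names if names is not None else list(groups)
    return [node for name in selected for node in groups[name]["nodes"]]


def parallel_shell(command, nodes, run_on_node, max_workers=16):
    """Run one command on many nodes concurrently and collect output per node."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda node: run_on_node(node, command), nodes)
        return dict(zip(nodes, results))


def fake_runner(node, command):
    """Stand-in for a remote execution call (for example over ssh)."""
    return f"{node}: ran '{command}'"


output = parallel_shell("uptime", nodes_in(NODE_GROUPS, ["compute"]), fake_runner)
for node, text in output.items():
    print(text)
```

Keeping the runner as a parameter makes it possible to exercise the fan-out logic without touching real nodes.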


CDM-based cluster architecture: SME use case

[Diagram: an installer node group with one installer (management) node running Fujitsu CDM, with the CDM repository and DX80 storage; a compute node group (compute nodes #1 to #n); a public network with external NTP and DNS servers; a provisioning network; a management/data network; an Ethernet (or IB) interconnect.]

CDM-based cluster architecture: medium/large user

[Diagram: an installer node group with head node 1 and head node 2 (fail-over), each running the batch server and Fujitsu CDM, with the CDM repository and DX80 storage; a login node group (four login nodes); Compute1 and Compute2 node groups; an IO node group (IO nodes #Z1 to #ZZ); a public network with external NTP and DNS servers; a provisioning network; a management/data network; Ethernet and InfiniBand interconnects.]

Fujitsu PRIMERGY HPC Gateway

A portal to the HPC work place

Integrated in the HPC Cluster Suite


HPC Gateway: an integrated web environment

• Built on Liferay Portal and the Tomcat application server
• All tools accessible from a desktop browser
• HPC resources used as an extension of the desktop (Process Manager)
• Share, exchange and track activity across the team (Wiki, Documents, Calendar, Forum, KnowledgeBase)
• Application-aware through application catalogue templates

Gateway architecture

[Diagram: end-user workstations (pre/post processing) use the Gateway web interface. On the head node, the HPC Gateway portlet runs in Liferay Portal on the Tomcat application server. The Gateway submits jobs to the workload manager, where jobs are queued. Jobs (Job A, Job B) run on the compute nodes. Ethernet (eth0) carries management traffic (job start/stop, NFS of /home); InfiniBand (ib0) carries inter-process communication (MPI) and Parallel File System (PFS) data traffic.]

Gateway differentiation for HCS versions

Feature | Open | Basic | Advanced
Run, monitor, view results of application jobs | Yes | Yes | Yes
Run legacy job scripts | Yes | Yes | Yes
On-boarding new applications (creating an Application template) | Yes | Yes | Yes
Import templates from Application catalogue (Fujitsu download) | Payable | Payable | Yes
Import workflow (own or 3rd party processes) | Yes | Yes | Yes
Graphical desktop administration interface | No | No | Yes
Workflow editor | No | No | Yes
Collaboration (Wiki, Documents, Calendar, Forum, KnowledgeBase) | Yes | Yes | Yes
Multiple business projects | No | No | Yes
Customizable security model | No | No | Yes
Access to multiple clusters in one site | No | No | Yes
Number of concurrent users | 2 | 100 | 400
Support | No | Yes | Yes

Parallel File System Fujitsu's Exabyte File System – FEFS


Summary of common file system types

[Diagram: one panel per type showing clients, servers, and the network between them; the parallel and distributed panels separate the metadata path (MDS server) from the user-data path (I/O servers).]

NAS (network: Ethernet)
• Normally accessed via NFS
• Simple set-up but limited performance
• More scalable versions require proprietary client modules

Clustered (network: Ethernet or IB)
• Multiple I/O servers each with access to all the file system
• Clients and servers normally on the same network
• Bottleneck for large numbers of clients or heavy I/O

Parallel (network: IB or Ethernet)
• Multiple I/O servers each with a part of the total file system
• Clients and servers normally on the same network
• IB used for high-speed access
• Very scalable (just add more I/O servers)
• Perform well for large block I/O

Distributed (network: Ethernet or IB locally)
• Data can exist over different sites
• Emphasis on data accessibility, duplication, reliability
• Performance can vary due to network bandwidth when data is not local
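
To illustrate why a parallel file system, where each I/O server holds a part of the total file system, does well on large block I/O, here is a minimal sketch of round-robin (RAID-0 style) striping. The slide does not describe the actual FEFS layout, so the scheme, the function names, and the 1 MiB stripe size are assumptions chosen for illustration.

```python
def locate(offset, stripe_size, num_servers):
    """Map a file byte offset to (server index, offset inside that server's object)."""
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers          # stripes rotate over the servers
    local_stripe = stripe_index // num_servers   # how many stripes this server already holds
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_request(offset, length, stripe_size, num_servers):
    """Break one contiguous request into (server, local_offset, size) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe boundary within a single piece.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        server, local_offset = locate(offset, stripe_size, num_servers)
        pieces.append((server, local_offset, chunk))
        offset += chunk
    return pieces


if __name__ == "__main__":
    MIB = 1 << 20
    # A 4 MiB read over 4 servers touches every server once.
    for piece in split_request(0, 4 * MIB, MIB, 4):
        print(piece)
```

Under this scheme a single large request is served by several I/O servers at once, while a request smaller than one stripe lands on a single server.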


HPC file systems (temporary storage)

Main usage
• Temporary job run-time data
• Permanent storage is also needed (not discussed)

File system needs
• Global Name Space
• Different Locking: File/Block/Byte
• Security: global authentication/authorization
• Reliability: No Single point of Failure
• Availability: add nodes/capacity without downtime
• Scalability: Capacity/number of files
• Standards: IEEE POSIX
• High Performance: bandwidth, throughput

Applicable File system types
• NAS file system (NFS)
• Parallel file system (GPFS, Lustre, FEFS)

Aspects affecting file system choice
• Total throughput requirements (if known)
• Size of the cluster (# of file system client nodes)
• Size of the file system to be used
• Whether apps are I/O bound or compute only
• Number of concurrent jobs
• Application is I/O intensive (e.g. Nastran)
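
When the total throughput requirement is not known, a first estimate can be derived from the number of concurrent jobs and their I/O rates. The sketch below is a back-of-the-envelope calculation only: the function name, the 25% headroom factor, and all example figures are hypothetical and are not taken from the webinar.

```python
import math


def required_io_servers(concurrent_jobs, per_job_mb_s, per_server_mb_s, headroom=1.25):
    """Estimate aggregate throughput (MB/s) and the number of I/O servers needed.

    All inputs are assumptions supplied by the person doing the sizing.
    """
    total_mb_s = concurrent_jobs * per_job_mb_s * headroom
    servers = math.ceil(total_mb_s / per_server_mb_s)
    return total_mb_s, servers


if __name__ == "__main__":
    # Hypothetical workload: 20 jobs at 200 MB/s each, servers delivering 1500 MB/s.
    total, servers = required_io_servers(20, 200, 1500)
    print(f"aggregate: {total:.0f} MB/s, I/O servers: {servers}")
```

If the result is a single server, a NAS file system may be enough; several servers point toward a parallel file system.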


FEFS characteristics

Extremely Large capacity
• Extra-large volume (100PB~1EB)
• Massive number of clients (100k~1M) & I/O servers (1k~10k)

High I/O Performance
• Throughput of Single-stream (~GB/s) & Parallel IO (~TB/s)
• Reducing file open latency (~10k ops)

High Reliability and High Availability
• Continuation of file service even if a component failure occurs

I/O Usage Management
• Fair-share QoS
• Best-effort QoS

FEFS is optimized for maximizing hardware performance while minimizing file I/O overhead.

[Diagram: Client Nodes exchange Meta Data with the Meta Data Server (MDS) and File Data with the Object Storage Servers (OSS), which hold it on Object Storage Targets (OST).]
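
The slide names fair-share QoS without describing how FEFS implements it. As one common interpretation, the sketch below applies textbook max-min fairness to divide I/O bandwidth among users: nobody receives more than they ask for, and leftover capacity is split evenly among those who still want more. The function name and the example numbers are assumptions, and this is not presented as the FEFS algorithm.

```python
def fair_share(capacity, demands):
    """Max-min fair split of `capacity` among users with the given demands."""
    allocation = {user: 0.0 for user in demands}
    unsatisfied = dict(demands)
    remaining = float(capacity)

    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Users asking for no more than an equal share are served in full.
        served = [user for user, want in unsatisfied.items() if want <= share]
        if not served:
            # Everyone left wants more than an equal share: split evenly and stop.
            for user in unsatisfied:
                allocation[user] = share
            break
        for user in served:
            allocation[user] = unsatisfied[user]
            remaining -= unsatisfied[user]
            del unsatisfied[user]
    return allocation


if __name__ == "__main__":
    # Hypothetical demands in GB/s against a 10 GB/s file system.
    print(fair_share(10, {"a": 2, "b": 4, "c": 10}))
```

A best-effort policy, by contrast, would let an otherwise idle system hand the whole capacity to a single user.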


Specification of FEFS and Lustre

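Category | Feature | FEFS | Current Lustre
System Limits | Max file system size | 8EB | 64PB
System Limits | Max file size | 8EB | 320TB
System Limits | Max #files | 8E | 4G
System Limits | Max OST size | 1PB | 16TB
System Limits | Max stripe count | 20k | 160
System Limits | Max ACL entries | 8191 | 32
Node Scalability | Max #OSTs | 20k | 8150
Node Scalability | Max #clients | 1M | 128K
Usability | QoS | Yes | No
Usability | Directory Quota | Yes | No
Usability | InfiniBand Multi-rail | Yes | No
Block Size (Backend File System) | | ~512KB | 4KB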


FEFS typical configuration

[Diagram, summarized:]
• OSS fail-over pairs 1 to n (OSS1/OSS2 ... OSS x/OSS y): each pair consists of two PRIMERGY RX300 servers, each with 4 FC and 2 IB ports, sharing eight DX80 storage units (#1 to #8, two CMs each) that hold the OSTs
• MDS (fail-over pair): PRIMERGY RX300 with IB ports and one DX80 storage unit (#1, two CMs)
• IB switch network connecting the FEFS servers to the compute clusters and login nodes

Note: All FEFS servers are configured with fail-over


Support matrix

FEFS Version | Supported OS | Supported PRIMERGY servers | Usable Storage units
V1 | MDS/OSS: RedHat EL 5.8; Client: RedHat EL 5.8/6.3 | All PRIMERGY supported by the HPC Cluster Suite | ETERNUS: DX80S2/90S2, DX410S2/DX440S2; DDN *1: SFA12K

*1: usage of DDN is on a project bid basis only


HPC: the full range of offerings from Fujitsu

[Diagram: Fujitsu HPC portfolio. Labels, in source order:]
• FEFS, Gateway, CentOS, HPC Cluster Suite, Cluster Management & Operation
• Sizing, design; Proof of concept; Integration into customer environment; Certified system and production environment
• Complete assembly, pre-installation and quality assurance; Ready to Operate at delivery
• PRIMERGY Server, Workstation; ETERNUS Storage
• ISV and Research Partnerships: Open Petascale Libraries Network, PreDiCT Initiative
• Consulting and Integration Services
• Ready-to-Go


Overview of competitors

Capability | Fujitsu | BCM | Stack IQ | IBM | HP | DELL
Cluster Management (Deployment, Monitoring) | X; CDM | X; BCM | X; Rocks+ | X; PCM / xcat | X; CMU | X; resell
Workload Manager | X; resell & OSS | X; resell & OSS | X; resell & OSS | X; sell LSF | X; resell & OSS | X; resell
OSS integration | X | X; BCM | X | X | X | X; resell
ISV integration | X | - | - | X | - | -
Graphical Administrator interface | - (planned) | X; BCM | X | X | X | X; resell
Graphical end user interface | X; Gateway | - | - | X; PAC | - | -
Cloud integration | - (planned) | X; BCM | X | X | - | X; resell
HW integration (Validation, HW monitoring, BIOS setting) | X; HW monitoring, BIOS setting | - (rely on HW vendor) | - (rely on HW vendor) | X | X | X
Application template | X; Gateway | - | - | X; PAC | - | X; resell
Process integration | X; Gateway | - | - | - | - | -
HW portfolio | X; PRIMERGY | - | - | X | X | X
Global support | X; (Planned) | - | - | X | X | X
PFS integration | X; FEFS | - | - | X; GPFS | X; (HP SFS/Lustre) | X; Lustre
