Fast & Furious: building HPC solutions in a nutshell


Slides from a presentation at the IT Weekend Ukraine conference


Victor Haydin

Head of R&D, ELEKS

Agenda

1. What is HPC?
2. Why would anyone need it?
3. How to do it?

What?

Wikipedia: “High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems approaching the teraflops-region are counted as HPC-computers.”

Definition

• Advanced computation problems: modeling and simulation, low-latency processing, big data, A.I.
• Supercomputers, computer clusters
• Teraflops performance

HPC systems comparison

[Chart: performance on a logarithmic axis from 1 to 100,000,000 for CPU (Intel Ivy Bridge), 100x CPU, GPU (NVIDIA Kepler), 100x GPU and IBM Sequoia, with the HPC range labelled]

Why?

• Finance
• Healthcare
• Fluid dynamics and aerodynamics
• Genetics
• Computer vision and image processing

How?

Disclaimer

Commodity hardware vs. specialized hardware: the examples here are GPU-based.

Example 1: Financial risk analysis using the Monte Carlo method on GPGPU
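To make the workload concrete, here is a minimal sketch of Monte Carlo Value-at-Risk (VaR) for a single position. The talk does not describe its risk model, so everything here is an assumption: one-day log returns drawn from a normal distribution, and the hypothetical function `monte_carlo_var` reporting the loss at a given confidence level. Every simulated path is independent, which is what makes this kind of job a good fit for thousands of GPU threads.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of Monte Carlo Value-at-Risk for a single position.
// Assumption (not from the talk): the one-day log return is Normal(mu, sigma).
// Returns the loss, as a positive amount, not exceeded with probability `confidence`.
double monte_carlo_var(double value, double mu, double sigma,
                       double confidence, std::size_t num_paths, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> log_return(mu, sigma);
    std::vector<double> losses(num_paths);
    for (std::size_t i = 0; i < num_paths; ++i) {
        // One simulated scenario: loss = old value - value after the random return.
        losses[i] = value - value * std::exp(log_return(rng));
    }
    // VaR is the `confidence` quantile of the simulated loss distribution.
    std::size_t k = static_cast<std::size_t>(confidence * static_cast<double>(num_paths - 1));
    std::nth_element(losses.begin(), losses.begin() + static_cast<std::ptrdiff_t>(k), losses.end());
    return losses[k];
}
```

For example, `monte_carlo_var(1000000.0, 0.0, 0.02, 0.99, 100000, 42)` should come out at roughly 45,000, close to the analytic value of about 1,000,000 × (1 − e^(−2.33 × 0.02)) for this model.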

• Distribute
• Run
• Define
• Store
• Feed
• Present
• Survive

High-level architecture

[Diagram: Middleware and Worker components]
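The middleware/worker split can be sketched on a single machine, with threads standing in for worker nodes. This is an illustration under assumptions, not the system from the talk: `middleware_exceedance_probability` and `worker_run` are hypothetical names, the same normal log-return model is assumed, and each worker estimates how often the loss exceeds a threshold. The middleware distributes equal shares of the paths, gives each worker its own seed, and aggregates the partial counts.

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical worker task: simulate `paths` one-day scenarios and count
// how many lose more than `threshold`.
std::uint64_t worker_run(std::uint64_t paths, double value, double sigma,
                         double threshold, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> log_return(0.0, sigma);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < paths; ++i) {
        double loss = value - value * std::exp(log_return(rng));
        if (loss > threshold) ++hits;
    }
    return hits;
}

// Hypothetical middleware: split the job, run one thread per worker, aggregate.
// Requires num_workers > 0.
double middleware_exceedance_probability(std::uint64_t total_paths, unsigned num_workers,
                                         double value, double sigma, double threshold) {
    std::vector<std::uint64_t> partial(num_workers, 0);  // one slot per worker, no sharing
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        // Distribute: equal shares; the last worker also takes the remainder.
        std::uint64_t share = total_paths / num_workers;
        if (w == num_workers - 1) share += total_paths % num_workers;
        // Run: a distinct seed per worker keeps the random streams different.
        workers.emplace_back([&partial, w, share, value, sigma, threshold] {
            partial[w] = worker_run(share, value, sigma, threshold, 1000 + w);
        });
    }
    for (auto& t : workers) t.join();
    // Store/aggregate: sum the partial counts.
    std::uint64_t hits = 0;
    for (std::uint64_t h : partial) hits += h;
    return static_cast<double>(hits) / static_cast<double>(total_paths);
}
```

In a real cluster the threads would be remote worker processes and the partial results would travel over the network, which is where the transfer costs discussed later come from.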

Example 2: Image search platform using local feature detection on GPGPU
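The slides do not name the feature detector or matcher, so the following is only a sketch of a common approach: brute-force nearest-neighbour matching of fixed-length float descriptors with Lowe's ratio test, using the count of good matches as a similarity score between two images. `Descriptor`, `squared_distance` and `count_good_matches` are hypothetical names. Each query descriptor is matched independently, so on a GPU one thread per query descriptor is a natural mapping.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: descriptors are fixed-length float vectors of equal size.
using Descriptor = std::vector<float>;

float squared_distance(const Descriptor& a, const Descriptor& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Count query descriptors whose nearest neighbour in `db` passes Lowe's ratio test.
int count_good_matches(const std::vector<Descriptor>& query,
                       const std::vector<Descriptor>& db, float ratio = 0.8f) {
    if (db.size() < 2) return 0;  // the ratio test needs two neighbours
    int good = 0;
    for (const Descriptor& q : query) {
        float best = std::numeric_limits<float>::max();
        float second = std::numeric_limits<float>::max();
        for (const Descriptor& d : db) {
            float dist = squared_distance(q, d);
            if (dist < best) { second = best; best = dist; }
            else if (dist < second) { second = dist; }
        }
        // Distances are squared, so compare against ratio^2.
        if (best < ratio * ratio * second) ++good;
    }
    return good;
}
```

A query descriptor equally close to two database descriptors is rejected as ambiguous, which is the point of the ratio test.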

High-level architecture

[Diagram: Middleware with Load Balancing]
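The slides do not say which load-balancing policy the middleware uses. As one plausible option, here is a sketch of greedy least-loaded assignment: each incoming task goes to the worker with the smallest accumulated cost, tracked in a min-heap. `assign_least_loaded` is a hypothetical name, and task costs are assumed to be known or estimated in advance.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Greedy least-loaded assignment (a swapped-in policy, not necessarily the talk's).
// Returns, for each task, the index of the worker it was assigned to.
std::vector<int> assign_least_loaded(const std::vector<double>& task_costs, int num_workers) {
    using Load = std::pair<double, int>;  // (current load, worker id)
    // Min-heap: the top is the worker with the smallest load (ties go to the lower id).
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
    for (int w = 0; w < num_workers; ++w) heap.push(Load(0.0, w));

    std::vector<int> assignment;
    assignment.reserve(task_costs.size());
    for (double cost : task_costs) {
        Load top = heap.top();
        heap.pop();
        assignment.push_back(top.second);
        heap.push(Load(top.first + cost, top.second));
    }
    return assignment;
}
```

With costs {5, 1, 1, 1, 1, 1} and two workers, the expensive task occupies worker 0 while the five cheap ones all land on worker 1, so both finish with a load of 5.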

Unicast

[Chart: unicast transfer with 9 workers vs. 18 workers; y-axis from 0 to 140]

• Computation time – 1 second
• Sending time – 120 seconds!
• More workers – slower overall speed, because the same data is sent to each worker separately
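A simple cost model shows why adding workers slows unicast down. This is an assumed model for illustration, not the talk's measurement: the middleware pushes the same payload to every worker one after another over a single link, so send time grows linearly with the worker count while the per-worker computation stays fixed. The function names and the megabyte/second units are hypothetical.

```cpp
// Assumed model: the middleware sends the full payload to each worker in turn
// over one link, so transfers are serialized.
double unicast_send_seconds(double payload_mb, double link_mb_per_s, int workers) {
    return workers * payload_mb / link_mb_per_s;
}

// Wall-clock time for one job: workers compute in parallel, but only after
// their data has arrived, so send time and compute time add up.
double unicast_job_seconds(double compute_s, double payload_mb,
                           double link_mb_per_s, int workers) {
    return unicast_send_seconds(payload_mb, link_mb_per_s, workers) + compute_s;
}
```

Under this model, going from 9 to 18 workers doubles the send time, so when sending already dominates (120 seconds vs. 1 second of computation), extra workers only make the job slower.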

[Chart: unicast vs. multicast comparison; y-axis from 0 to 140]

• Computation time – 1 second
• Sending time – 25 seconds
• Almost 5 times faster than unicast (25 seconds vs. 120 seconds of sending time)
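The "almost 5 times" figure follows directly from the numbers on the two slides. Assuming computation and sending do not overlap, the speedup on sending alone and the end-to-end speedup per job are:

```latex
\text{send speedup} = \frac{120\ \text{s}}{25\ \text{s}} = 4.8,
\qquad
\text{end-to-end speedup} \approx \frac{1\ \text{s} + 120\ \text{s}}{1\ \text{s} + 25\ \text{s}} = \frac{121}{26} \approx 4.65
```

Because the 1 second of computation is small next to either sending time, both ratios land just under 5.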

Multicast

[Diagram: Middleware and Worker components]

ERROR: CUDA ERROR CODE 30 (“UNKNOWN ERROR”)

Run the same code on CPU and GPU

CUDA_KERNEL foo(…)
{
    CUDA_DEFINE_PARAMS;
    // your code here
}

CUDA_CALL(threads, blocks, foo(…))

Kernel

Generated code

// GPU mode
__global__ void foo(…)
{
    // your code here
}

foo<<<blocks, threads>>>(…);  // launch syntax: <<<grid size (blocks), block size (threads)>>>

// CPU mode
void foo(…)
{
    // same code here
}

// LOOP OVER threads and blocks
{
    foo(…);
}

Pros & Cons

Pros:
• Same code for CPU and GPU
• Debugging
• Range checking
• No CUDA ERROR 30

Cons (neither maps naturally onto a sequential CPU loop):
• Shared memory
• __syncthreads()
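One way to get range checking in CPU mode, sketched under assumptions: the kernel indexes through a checked view instead of a raw pointer, so an out-of-bounds access throws immediately instead of silently corrupting memory. `CheckedPtr` and `buggy_fill` are hypothetical names; a CPU-mode build would substitute the checked type while the GPU build keeps plain pointers.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical debug-only wrapper used in CPU mode in place of a raw pointer.
template <typename T>
class CheckedPtr {
public:
    CheckedPtr(T* data, std::size_t size) : data_(data), size_(size) {}
    T& operator[](std::size_t i) const {
        if (i >= size_) {
            throw std::out_of_range("index " + std::to_string(i) +
                                    " >= size " + std::to_string(size_));
        }
        return data_[i];
    }
    std::size_t size() const { return size_; }
private:
    T* data_;
    std::size_t size_;
};

// Kernel body with an off-by-one bug (`<=` instead of `<`), caught in CPU mode.
void buggy_fill(CheckedPtr<int> out, unsigned n) {
    for (unsigned i = 0; i <= n; ++i) out[i] = static_cast<int>(i);
}
```

The valid writes happen as usual, and the first out-of-range index raises `std::out_of_range` with the offending index in the message, which a debugger can stop on.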

@victor_haydin

linkedin.com/in/victorhaydin

victor.haydin@gmail.com

Got a question? Ask!