7th Workshop on Fusion Data Processing Validation and Analysis Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications J. Nieto 1 , D.Sanz 1 , G. de Arcas 1 , R. Castro 2 , J.M. López 1 , J. Vega 2 1 Universidad Politécnica de Madrid (UPM), Spain 2 Asociación EURATOM/CIEMAT para Fusión. Spain

Diapositiva 1 - ENEA - Fusione · PPT file · Web view2012-03-26 · 7th Workshop on Fusion Data Processing Validation and Analysis. Index. Scope of the project. Project goals. Sample


7th Workshop on Fusion Data Processing Validation and Analysis

Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications

J. Nieto (1), D. Sanz (1), G. de Arcas (1), R. Castro (2), J.M. López (1), J. Vega (2)

(1) Universidad Politécnica de Madrid (UPM), Spain
(2) Asociación EURATOM/CIEMAT para Fusión, Spain


Index

- Scope of the project
- Project goals
- Sample algorithm
- Test system
- Subtask 1: GPU benchmarking
- Subtask 2: EPICS integration (DPD)
- Results
- Conclusions


FPSC Project

Objective: to develop an FPSC (Fast Plant System Controller) prototype focused on data acquisition for ITER IO.

The "functional requirements" of the FPSC prototype:
- To provide high-rate data acquisition, pre-processing, archiving and efficient data distribution among the different FPSC software modules
- To interface with CODAC and to provide archiving
- FPSC software compatible with RHEL and EPICS
- To use COTS solutions


FPSC HW architecture

[Figure: FPSC hardware architecture. Labelled elements: real-time controller (19" 1U chassis, NI 8353 RT); FPSC system controller CPU; PXI Clk10; PXIe-PCIe8372; NI-8370; PXIe-PCIe; NI PXI-7952R; NI PXI-6682; NI PXI-6653; GPUs; data archiving servers; MiniCODAC; development host; two desktop PCs; Ethernet. IP addresses shown: 172.17.152.11, 172.17.152.13, 172.17.152.33, 172.17.152.34, 172.17.152.40.]


GPU subtasks

Goals:
- To provide benchmarking of Fermi GPUs (subtask 1)
  - Analyze the GPU development cycle (methodology)
  - Compare execution times on GPU and CPU for a similar development effort
- To provide a methodology to integrate GPU processing units into EPICS (subtask 2)

Requirements:
- Use an algorithm representative of the type of operations that would be needed in plasma pre-processing


GPU Test System

[Figure: software stack of the test system. Labels: Linux RedHat Enterprise v5.5 64 bits; CODAC CORE SYSTEM 2.0; EPICS IOC; DPD Subsystem; CPU Asyn; GPU Asyn; IPP v7.0; CULA R11; CUBLAS v3.2.]

Host processor software:
  OS              RedHat Enterprise Linux 5.5
  Middleware      EPICS 3.14.12 and asynDriver 4.16
  Compilers       gcc 4.1.2 20080704 and nvcc V0.2.1221
  CPU libraries   MKL 10.3 Update 9 and IPP 7.0
  GPU libraries   NVIDIA SDK 3.2, NVIDIA CUBLAS 3.2, EMPHOTONICS CULA R11

Hardware: NVIDIA GTX580 (GPU), Xeon X5550 quad-core (CPU)


Sample algorithm

Inputs:
- Initial coeffs c (N = 10)
- Input data x (M points)

Steps:
1. Compute first guess (x_fit)
2. Loop until convergence:
   - Compute Jacobian matrix (J, M×N)
   - Compute update coeffs: $c = c + (J' J)^{-1} J' (x' - x'_{\mathrm{fit}})$
   - Update fit (x_fit)
   - Compute error

Outputs:
- Fitted coeffs (position & amplitude)
- Fitted data

[Figure: plot of input data and fitted data.]

Best-fit code for detecting the position and amplitude of a spectrum composed of a set of Gaussians, based on the Levenberg-Marquardt method.
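The update loop on this slide can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: it applies the undamped update written on the slide (Levenberg-Marquardt would add a damping term to J'J, which is omitted here), it assumes every Gaussian has the same known width `SIGMA`, and it uses two Gaussians (N = 4 coefficients) with made-up, noise-free data. All function names are hypothetical.

```python
import math

SIGMA = 5.0  # assumption: fixed, known width shared by all Gaussians


def model(c, xs):
    """Sum of Gaussians; c = [pos_0, amp_0, pos_1, amp_1, ...]."""
    out = []
    for x in xs:
        total = 0.0
        for k in range(0, len(c), 2):
            pos, amp = c[k], c[k + 1]
            total += amp * math.exp(-((x - pos) ** 2) / (2 * SIGMA ** 2))
        out.append(total)
    return out


def jacobian(c, xs):
    """M x N matrix of partial derivatives of the model w.r.t. each coefficient."""
    rows = []
    for x in xs:
        row = []
        for k in range(0, len(c), 2):
            pos, amp = c[k], c[k + 1]
            g = math.exp(-((x - pos) ** 2) / (2 * SIGMA ** 2))
            row.append(amp * g * (x - pos) / SIGMA ** 2)  # derivative w.r.t. position
            row.append(g)                                  # derivative w.r.t. amplitude
        rows.append(row)
    return rows


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def fit(xs, data, c0, tol=1e-20, max_iter=50):
    """Iterate c = c + (J'J)^-1 J' (x - x_fit) until the squared error is below tol."""
    c = list(c0)
    n = len(c)
    for _ in range(max_iter):
        xfit = model(c, xs)                             # update fit
        resid = [d - f for d, f in zip(data, xfit)]
        if sum(r * r for r in resid) < tol:             # compute error
            break
        J = jacobian(c, xs)                             # compute Jacobian
        jtj = [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]
        jtr = [sum(row[i] * r for row, r in zip(J, resid)) for i in range(n)]
        step = solve(jtj, jtr)                          # (J'J)^-1 J' (x - x_fit)
        c = [ci + si for ci, si in zip(c, step)]        # update coefficients
    return c


xs = [float(i) for i in range(100)]
true_c = [30.0, 1.0, 70.0, 0.5]        # made-up positions and amplitudes
data = model(true_c, xs)               # noise-free synthetic spectrum
fitted = fit(xs, data, [28.0, 0.8, 72.0, 0.6])
print([round(v, 6) for v in fitted])
```

Solving the normal equations instead of inverting J'J explicitly is a design choice of this sketch; a GPU version would typically hand that step to a library such as those named in the deck.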


Subtask 1

Goal: benchmarking of a Fermi GPU

Standard GPU programming methodology:
- The GPU is operated from the host as a coprocessor
- Host threads sequence GPU operations and are responsible for moving data (Host ↔ Device)
- Operations are coded by:
  - Programming kernels: CUDA
  - Using library primitives: CULA, CUBLAS…

[Figure: flowchart of the sample algorithm (initial coeffs and input data in; first guess, Jacobian, coefficient update, fit update and error computation looped until convergence; fitted coeffs and fitted data out).]


Results S1 (I)

Block size   Exec. time (ms)      Throughput (MB/s)   Improv.
             GPU       CPU        GPU       CPU       ratio
   256         2.86      2.75     0.7       0.7       1.0
   512         2.85      4.83     1.4       0.8       1.7
  1024         3.6       8.69     2.3       0.9       2.4
  2048         5.21     16.07     3.1       1.0       3.1
  4096        16.42     28.55     2.0       1.1       1.7
  8192        42.85     55.26     1.5       1.2       1.3
 16384        85.4     107.5      1.5       1.2       1.3
 32768       168.65    210.99     1.6       1.2       1.3
 65536       334.77    425.96     1.6       1.2       1.3
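The throughput and improvement-ratio columns can be recomputed from the execution times. The sketch below assumes 8 bytes per sample (double precision) and 1 MB = 10^6 bytes; neither is stated on the slide, but under these assumptions the results agree with the table after rounding to one decimal. The function names are illustrative.

```python
BYTES_PER_SAMPLE = 8  # assumption: one double-precision value per sample

# (block size, GPU time in ms, CPU time in ms), copied from the results table
TIMINGS = [
    (256, 2.86, 2.75), (512, 2.85, 4.83), (1024, 3.6, 8.69),
    (2048, 5.21, 16.07), (4096, 16.42, 28.55), (8192, 42.85, 55.26),
    (16384, 85.4, 107.5), (32768, 168.65, 210.99), (65536, 334.77, 425.96),
]


def throughput_mb_s(block_size, time_ms):
    """Bytes processed per second, expressed in MB/s (1 MB = 10**6 bytes)."""
    return block_size * BYTES_PER_SAMPLE / (time_ms * 1e-3) / 1e6


def improvement_ratio(gpu_ms, cpu_ms):
    """How many times faster the GPU is than the CPU for the same block."""
    return cpu_ms / gpu_ms


for size, gpu_ms, cpu_ms in TIMINGS:
    print(f"{size:6d}  GPU {throughput_mb_s(size, gpu_ms):.1f} MB/s  "
          f"CPU {throughput_mb_s(size, cpu_ms):.1f} MB/s  "
          f"ratio {improvement_ratio(gpu_ms, cpu_ms):.1f}")
```

Read this way, the table shows the GPU advantage peaking at a block size of 2048 and settling at about 1.3 for larger blocks.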


Results S1 (II)


Subtask 2

Goal: to provide EPICS support for GPU processing

[Figure: two block diagrams. "Single process approach": EPICS IOC, Asyn layer, data generation, acquisition & processing, CPU. "DPD approach": EPICS IOC, Asyn layer, DPD, processing units (FPGA, GPU, others: archiving…).]


Proposed methodology

The core of the FPSC software is the DPD (Data Processing and Distribution) subsystem, which allows:
- Moving data with very good performance
- Integrating all the functional elements (EPICS monitoring, data processing, data acquisition, remote archiving, etc.)
- Having code based entirely on the standard asynDriver
- Full compatibility with any type of required data

[Figure: FPSC software architecture. Labels: EPICS IOC; state machine; CODAC configuration; hardware monitoring; DPD (Data Processing and Distribution) subsystem; timing (TCN/1588); FPGA; GPU proc.; CPU proc.; SDN; monitoring; archiving; hardware/cubicle signals; Asyn layer.]


DPD features (I)

- The DPD makes it possible to configure both the functional elements of the FPSC (FPGA acquisition, GPU processing, SDN, EPICS monitoring, data processing, data archiving) and the connections (links) between them.
- Functional elements allow:
  - reading data blocks from inputs
  - processing received data
  - generating new signals
  - routing data blocks to output links
- The DPD supports the integration of new types of functional elements to extend the FPSC functionality. This requires creating the corresponding asynDrivers, which can be done in a simple way.
- The DPD makes it very easy to integrate any existing asynDriver.

[Figure: EPICS IOC with input links and output links.]
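The relationship between functional elements and links can be pictured with a small sketch. The class names `Link` and `FunctionalElement` and all of their methods are hypothetical; the slides do not show the DPD API, and the real system is built on asynDriver rather than on Python objects. The sketch only illustrates the four capabilities listed above: reading blocks from inputs, processing them, generating a new signal, and routing it to output links.

```python
from collections import deque


class Link:
    """Hypothetical FIFO connection between two functional elements."""

    def __init__(self, name, capacity=16):
        self.name = name
        self.capacity = capacity
        self._blocks = deque()

    def put(self, block):
        if len(self._blocks) >= self.capacity:
            raise OverflowError(f"link {self.name} is full")
        self._blocks.append(block)

    def get(self):
        """Return the oldest block, or None when the link is empty."""
        return self._blocks.popleft() if self._blocks else None

    def __len__(self):
        return len(self._blocks)


class FunctionalElement:
    """Hypothetical element: reads from input links, writes to output links."""

    def __init__(self, name, process, inputs, outputs):
        self.name = name
        self.process = process    # callable applied to each data block
        self.inputs = inputs
        self.outputs = outputs

    def step(self):
        """Handle at most one block per input link; return the number handled."""
        handled = 0
        for link in self.inputs:
            block = link.get()
            if block is None:
                continue
            result = self.process(block)      # generate the new signal
            for out in self.outputs:          # route it to every output link
                out.put(result)
            handled += 1
        return handled


# Usage: one processing element fed by a raw-data link, fanning out to two links.
raw = Link("raw")
to_monitor = Link("monitor")
to_archive = Link("archive")
doubler = FunctionalElement(
    "doubler", lambda block: [2 * v for v in block], [raw], [to_monitor, to_archive]
)
raw.put([1, 2, 3])
handled = doubler.step()
print(handled, len(to_monitor), len(to_archive))
```

Because routing is just the list of output links held by each element, changing that list is all it takes to reconnect elements in this sketch.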


DPD features (II)

- The DPD allows the data routing to be configured at configuration time or even at run time (to implement fault-tolerant solutions)
- The DPD provides a common set of EPICS PVs for the functional elements and their respective links
- The DPD provides on-line measurements of both throughput and buffer occupancy in the links
- The DPD implements an optional multi-level buffering (memory, disk) backup solution for any link of the system

[Figure: backup block link with buffering levels 0, 1 and 2.]
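A multi-level buffered link with an occupancy measurement might look like the sketch below. It is an assumption-laden illustration, not the DPD implementation: the class `BufferedLink` is hypothetical, every level is an in-memory queue (a real backup level could live on disk), and the overflow policy (spill to the next level, promote on read to keep FIFO order) is one possible choice that the slides do not specify.

```python
from collections import deque


class BufferedLink:
    """Hypothetical link with several buffering levels (level 0 is read first)."""

    def __init__(self, capacities):
        self.capacities = list(capacities)
        self.levels = [deque() for _ in self.capacities]

    def put(self, block):
        """Store a block in the deepest level in use, spilling deeper when full.

        Returns False when every level is full and the block is rejected.
        """
        start = max((i for i, lvl in enumerate(self.levels) if lvl), default=0)
        for i in range(start, len(self.levels)):
            if len(self.levels[i]) < self.capacities[i]:
                self.levels[i].append(block)
                return True
        return False

    def get(self):
        """Return the oldest block (or None) and promote one block per level."""
        if not self.levels[0]:
            return None
        block = self.levels[0].popleft()
        for i in range(1, len(self.levels)):
            if not self.levels[i]:
                break
            self.levels[i - 1].append(self.levels[i].popleft())
        return block

    def occupancy(self):
        """Fraction of each level currently in use."""
        return [len(lvl) / cap for lvl, cap in zip(self.levels, self.capacities)]


link = BufferedLink([2, 3])      # made-up capacities: 2 blocks in level 0, 3 in level 1
for block_id in (1, 2, 3):
    link.put(block_id)
before = link.occupancy()        # level 0 full, one block spilled to level 1
first = link.get()               # oldest block comes out, block 3 is promoted
after = link.occupancy()
print(before, first, after)
```

Appending only to the deepest level in use keeps shallower levels full, which is what lets `get` preserve arrival order with a single promotion per level.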


Test scenario

[Figure: timing diagram of GPU processing modules inside the DPD (Data Processing and Distribution) subsystem, fed by a data generation element, with timestamps T0 to T4 and the stages Host → Dev, processing, Dev → Host.]

- TP0: internal process time
- TP = T3 - T2: processing time
- TMS = T4 - T1: module service time
- TTS = T4 - T0: total service time


Timing (II)

[Figure: sequence diagram between the TiCameraDataGenerator and the DataFit processing module (data block received, DataFit processing, DataFit result packing and routing), with the intervals TP, TMS and TTS marked.]

- T0: New data block is generated
- T1: Data block is received in the module
- T2: Data block is ready to be processed
- T3: DataFit processing is finished
- T4: New DataFit processed data is packed and sent
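The three intervals follow directly from the five timestamps, using the definitions given in the test scenario (TP = T3 - T2, TMS = T4 - T1, TTS = T4 - T0). A minimal sketch, with a hypothetical class name and made-up timestamp values in milliseconds:

```python
from dataclasses import dataclass


@dataclass
class BlockTimestamps:
    """Timestamps recorded for one data block (hypothetical container)."""
    t0: float  # new data block is generated
    t1: float  # data block is received in the module
    t2: float  # data block is ready to be processed
    t3: float  # processing is finished
    t4: float  # processed data is packed and sent

    @property
    def processing_time(self):       # TP
        return self.t3 - self.t2

    @property
    def module_service_time(self):   # TMS
        return self.t4 - self.t1

    @property
    def total_service_time(self):    # TTS
        return self.t4 - self.t0


# Made-up values, not measurements from the slides.
stamps = BlockTimestamps(t0=0.0, t1=1.0, t2=1.5, t3=15.5, t4=16.0)
print(stamps.processing_time, stamps.module_service_time, stamps.total_service_time)
```

The difference TMS - TP is the time the module spends receiving, unpacking, packing and routing, which is the part attributable to the framework rather than to the fit itself.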


Test scenario 1

[Figure: TiCameraDataGenerator, one GPU processing module (TiCameraFit), and monitoring through an EPICS waveform.]


Test scenario 2

[Figure: TiCameraDataGenerator, two TiCameraFit processing modules both labelled GPU#0, and monitoring through an EPICS waveform.]


Test scenario 3

[Figure: TiCameraDataGenerator, two TiCameraFit processing modules (one on GPU#0, one on GPU#1), and monitoring through an EPICS waveform.]


Results S2

1. To determine the DPD overhead with respect to the "hard coded" approach
2. To test DPD scalability (multi-module, multiple-hardware support)

Block size   SP App   1M/1GPU   2M/1GPU   2M/2GPU
  4096        13.2     14.1      29.7      14.2
  8192        43.1     44.8      89.9      45.4
 16384        85.6     86.6     172.3      87.0

- Using the 3rd solution (2M/2GPU), we were able to process 3 MB/s running 2 modules on 2 different GPUs
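The two questions posed on this slide can be answered numerically from the table. The sketch below rests on assumptions the slide does not state: that the table values are times in milliseconds per block (they closely match the Subtask 1 GPU times), that a sample occupies 8 bytes, and that in the two-module configurations each module processes one block in the listed time. The function names are illustrative.

```python
BYTES_PER_SAMPLE = 8  # assumption: one double-precision value per sample

# block size -> (SP App, 1M/1GPU, 2M/1GPU, 2M/2GPU), copied from the table;
# values assumed to be milliseconds per block
RESULTS = {
    4096: (13.2, 14.1, 29.7, 14.2),
    8192: (43.1, 44.8, 89.9, 45.4),
    16384: (85.6, 86.6, 172.3, 87.0),
}


def overhead_percent(sp_ms, dpd_ms):
    """Extra time of the DPD module relative to the single-process application."""
    return 100.0 * (dpd_ms - sp_ms) / sp_ms


def aggregate_throughput_mb_s(block_size, time_ms, modules):
    """Total MB/s when `modules` modules each finish one block in `time_ms`."""
    return modules * block_size * BYTES_PER_SAMPLE / (time_ms * 1e-3) / 1e6


for size, (sp, one_mod, _two_one, two_two) in RESULTS.items():
    print(f"{size:6d}  DPD overhead {overhead_percent(sp, one_mod):.1f} %  "
          f"2M/2GPU {aggregate_throughput_mb_s(size, two_two, 2):.1f} MB/s")
```

Under these assumptions the DPD overhead shrinks from about 7 % at 4096 samples to about 1 % at 16384, and the 2M/2GPU configuration at 16384 samples comes out at roughly 3 MB/s, in line with the figure quoted on the slide.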


Conclusions

- The development methodology for using GPUs is being standardized, providing increasing levels of abstraction from hardware implementation details
- "Hard coded" implementations seriously compromise scalability and maintainability, without guaranteeing a relevant increase in performance
- Specific frameworks are being developed for different scenarios (Thrust, DPD…):
  - To simplify development
  - To promote reusability
  - To provide scalability and maintainability
  - To include first-level parallelism (internal load balancing based on multithreading)