Experiences Accelerating MATLAB Systems Biology Applications Heart Wall Tracking Lukasz Szafaryn,...

Experiences Accelerating MATLAB Systems Biology

Applications

Heart Wall Tracking

Lukasz Szafaryn, Kevin SkadronUniversity of Virginia

Outline

• MATLAB• Optimizations to MATLAB• GPU Acceleration with CUDA• Applications (Heart Wall Tracking and Myocyte Simulation)

– Problem– Algorithm– Optimization and performance– Lessons

• Conclusions • Future Research

MATLAB

• Convenient but inefficient programming language of choice for scientists- Interpreted language- Most of the existing code and libraries are single-threaded

• MATLAB Parallel Toolbox - understanding of parallel programming

• Jacket and GPUmat - large parallelism to justify overhead

MATLAB contd.

• Interpreted language optimized by JIT compiler – 2x slower than C

• MATLAB Embedded Compiler has limited support – 1.2-1.4x slower than C

• MEX Interface to link C code- translating to C - many functions written from scratch- no support for convenient OpenMP standard, need to use

thread libraries

Acceleration

1. Translation: - convert MATLAB to C

2. Parallelization:– C for multi-core CPU– CUDA for GPU

Experimental Setup

– CPU: 3.2 GHz quad-core Intel Core 2 Extreme– GPU: NVIDIA GeForce GTX 280 (PCIe 2.0)– MS Windows, MS C Compiler

Allocate GPU memory

Transfer inputs

Launch kernel

Return to CPU

Transfer results

Free GPU memory

C Program

CUDA Kernel

CPU GPU

Acceleration with GPU (CUDA)

Entire Heart Wall Application

read first frame from input file, display

[0.533 s] [0.28 %]

crop image, display image

[0.089 s] [0.05 %]

SRAD, display image

[3.224 s] [1.72 %]

detect edges, display image

[0.448 s] [0.24 %]

morphological transformation, display image

[0.275 s] [0.15 %]

dilate image, display image

[0.285 s] [0.15 %]

inner and outer ellipse parameter

[0.001 s] [0.00 %]

create ellipse sample points

[0.726 s] [0.39 %]

track movement of sample points in all

frames

[129.276 s] [68.99 %]

display movement of sample points

through frames

[36.63 s] [19.55 %]

save outputs into file

[0.035 s] [0.02 %]

Hough Search, display images

[15.872 s] [8.47 %]

Heart Wall TrackingDescription

• Speed and shape of contractions provides important information about body’s response to stimulus

• Measured by tracking inner and outer heart walls through multiple frames

Input OutputTracking

Heart Wall TrackingAlgorithm

• Processing 20 inner and 30 outer heart wall points, total 50 points (TLP)

• Processing of each point - sequence of operations on the surrounding area and template (DLP)

Update templates

Read next frame

Track inner point

Track outer point

Save point locations

10 # of frames / 10

task-level parallelism (TLP)

1 2 3 4 50

data-level parallelism (DLP)

Heart Wall TrackingPerformance

• Times reported for processing of 300 frames (10s of ultrasound recording)

10.86x

5.94x 6.29x

33.20x 36.89x 41.23x

Porting Tradeoffs

Convenience Performance

Developing modular code

• replacing each MATLAB function with equivalent parallelized GPU function

• common routines can be possibly reused in other applications

Hiding specific aspects of GPU programming inside each module

• each module sets up its own GPU execution parameters

• each module performs its own I/O with GPU transparently

Restructuring and combining code

•writing code based on algorithm tasks rather than MATLAB statements

•overlap parallel (often unrelated) tasks by executing them in the same kernel call

Exposing specific aspects of GPU programming

•doing more work inside each GPU kernel call to fully exploit parallelism and avoid overhead

•performing I/O with GPU manually to eliminate redundant data transfers

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

multiple

add,sub, mul, div

---------- SYNC ----------

Reduction

---------- SYNC ----------

multiple

add,sub, mul, div

---------- SYNC ----------

convolution

tim eSMs

outer heart points

inner heart points

1 2 3 30…

global SYNC

Frame 1

Frame 2

Heart Wall TrackingLessons

• Typical MATLAB code written by a scientist has room for optimization – 1.3x

• Conversion to C requires significant coding effort• Selective offloading results in multiple CPU-GPU data

transfer overheads• Iterative codes require merging kernels and reusing

variables to avoid overhead• CUDA libraries cannot be used as a part of GPU code• Good performance - significant changes to the structure of

code, difficult for a scientist to understand

Conclusions

• Limited availability of C libraries necessitates time consuming coding

• Many systems biology applications (even those with limited parallelism) benefit from GPU

• GPU overheads are significant (should be eliminated in new CPU-GPU architectures)

• Real-time processing feasible in near future• Ultimately, acceleration of applications should be

automated!

Future Research

• Automatic acceleration with the use of compiler - via use of architecture-specific libraries- via compiling for target architecture

• Merging of workloads- based on resource needs- based on dependency

• Acceleration with alternative architectures- well suited for fine-grained parallelism

- esp. FPGA

Acknowledgements

• Funding provided by:– NSF grant IIS-0612049– SRC grant 1607.001

• Equipment donated by NVIDIA

Software

Source code will be soon available at:http://lava.cs.virginia.edu/wiki/rodinia

Questions

Backup

Experiences Accelerating MATLAB Systems Biology Applications Heart Wall Tracking Lukasz Szafaryn,...

Documents

Lukasz Wojtow lukasz@wojtow.me Genotick.com …genotick.com/static/download/NovelApproachToArtificialIntelligence...Novel Approach to Artificial Intelligence Trading Lukasz Wojtow

Jeremy W. Sheaffer 1 David P. Luebke 2 Kevin Skadron 1

Lukasz Zelezny - Search Marketing Day 2014 - Warszawa Wasaw

Lukasz Kujawski, Oracle @ SalesMixer 18.03.2014

Experiences with Achieving Portability across Heterogeneous Architectures Lukasz G. Szafaryn +, Todd Gamblin ++, Bronis R. de Supinski ++ and Kevin Skadron

Look AHEAD Study Lukasz Materek Endocrinology Rounds May 20, 2012

Task 2 - blog8928.files.wordpress.com · Web viewTask 2. Unit 17. Gorniak, Lukasz. Task 2. Unit 17. Gorniak, Lukasz. Task 2. Unit 17. Gorniak, Lukasz. Contents. Rain forest introduction1

© 2009, Kevin Skadron Making Sense of Recent Research in Temperature-Aware Design Kevin Skadron Univ. of Virginia LAVA Lab / HotSpot Group

2013.04.23 eZ Sessions 6 - Migrating legacy eZ Publish extensions - Lukasz Serwatka

Presentation 2. Urban Mobilty Planning PL, Lukasz Franek.ppt · Microsoft PowerPoint - Presentation 2. Urban Mobilty Planning PL, Lukasz Franek.ppt [Compatibility Mode] Author: Rob.Jeuring

Lukasz Zelezny #SEOBarCamp July 2013

Social Media Strategy: Tips by Lukasz Zelezny

MP3 Stegonography Lukasz Grzegorz Maciak Micheal Alexis Ponniah Renu Sharma

LUKASZ&RAFAL MID_TERM PRESENTATION

Accelerating SQL Database Operations on a GPU …skadron/Papers/bakkum_sqlite_gpgpu10.pdfAccelerating SQL Database Operations on a ... the need for database programmers to use new

Computer Science Graduate School Presented by Kevin Skadron

Bet, Call, Check your SEO activity - Lukasz Zelezny #SiGMA2014 Malta

Conversion 2016 - Lukasz Zelezny - Effective Visitors: converting visitors to clients

Disk and-cover-lukasz

Lukasz Gogolewski | Business growth strategy