10
Parallel Programming of Heterogeneous Embedded Systems MTAPI and EMB² in Action Unrestricted © Siemens AG 2017

Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Parallel Programming of Heterogeneous

Embedded Systems

MTAPI and EMB² in Action

Unrestricted © Siemens AG 2017

Page 2: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 2

Hardware Trends – From Single-Core Processors to

Heterogeneous Systems on a Chip

…Single-Core

1975-2005

Multicore

since 2005

Manycore

since 2008

Driven by

frequency scaling

Constrained by

power consumption

Heterogeneous systems

today

Driven by

high performance per Watt ratio

Constrained by

programming complexity

CPU

DSP

GPU

FPGA

H. Esmaeilzadeh et al., “Dark silicon and the end of multicore scaling”, International Symposium on Computer Architecture (ISCA). ACM, 2011.

M. Zahran, “Heterogeneous Computing Here to Stay”. ACM Queue, Nov/Dev 2016.

Page 3: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 3

Continuing Growth of Heterogeneous Systems

“Heterogeneous systems provide an effective way of

responding to the ever-increasing demand for more

computing power. However, ensuring interoperability

and programmer productivity is a significant challenge.”

Low-power scalable heterogeneous architectures and their

programming belong to the top challenges until 2022.

https://www.hipeac.net/publications/vision/

https://www.computer.org/cms/ComputingNow/2022Report.pdf

Performance

Interoperability

Power

Programmability

DSP FPGA

CPU GPU

Page 4: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 4

Multicore Task Management API (MTAPI)http://www.multicore-association.org/workgroup/mtapi.php

MTAPI in a nut shell

• Standardized API for task-parallel programming

on a wide range of hardware architectures

• Developed and driven by practitioners of

market-leading companies

• Part of the Multicore Association’s ecosystem

(MCAPI, MRAPI, SHIM, …)

Tasks Queues

Heterogeneous Systems

Shared memory

Distributed memory

Different instruction

set architectures

Working group lead

Page 5: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 5

Embedded Multicore Building Blocks (EMB²)https://embb.io/

Key features and requirements:

• Based on MTAPI industry standard

• Easy parallelization of legacy code

• Support for real-time applications (task priorities /

affinities, lock-free data structures)

• Resource awareness and determinism (no

dynamic memory allocation during operation)

• Portability on a variety of hardware architectures

EMB² is an open source C/C++ library that provides generic building blocks for compute-intensive applications.

Hardware / operating system

Dataflow

Application

Containers

Task management (MTAPI)

Algorithms

Base library (abstraction layer)

Page 6: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 6

Sample Application Areas

Simulation and

Augmented Reality

Efficiently combine the physical

world with virtual ones

Sensing, Imaging,

and Signal Processing

Process complex sensor data to

track the environment

In-field Data Analytics

and Internet of Things (IoT)

Analyze large amounts of data on

the devices in real-time

Images by Siemens

Page 7: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 7

MTAPI Programming Model – Jobs, Actions, and Tasks

CPU GPU

Special HW

Task 1

Job A

Action IApplic

ation

Action II

Action III

Task 2

• Job: Piece of work with a unique identifier

• Action: Implementation of a job (hardware or software-defined)

• Task: Execution of a job with some data to be processed

Page 8: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 8

Programming Heterogeneous Systems (Example)

Action I Action II Action I Action II

Pitcher's motion, Cincinnati Reds, 9/15/2004, by Rick Dikeman; Source: English Wikipedia; CC BY-SA 3.0

Apply a digital filter to a sequence of images utilizing CPU and GPU.

1. Create Job (job_filter)

2. Create Action I (CPU) associated with

job_filter

3. Create Action II (GPU) associated with

job_filter

4. Apply job_filter to each element of a

given range:

embb::algorithms::ForEach(

range.begin(), range.end(),

job_filter);

Alternatively, use dataflow networks for

processing continuous streams of data.

Page 9: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 10

Summary and Conclusion

• Software developers will be expected to enable single

applications to exploit large number of cores that are

increasingly diverse.

• MTAPI specifies standardized interfaces for leveraging

the performance of such heterogeneous systems.

• The Embedded Multicore Building Blocks (EMB²)

• provide a fully compliant MTAPI implementation

plus C++ wrappers for convenient task management,

• ship with ready-to-use plugins for OpenCL, CUDA,

and distributed systems communicating over network,

• help to increase developer productivity through high-

level patterns and parallel algorithms,

• are specifically designed for embedded systems

and the typical requirements that accompany them.

http://www.multicore-association.org/

https://embb.io/

Page 10: Parallel Programming of Heterogeneous Embedded Systems · plus C++ wrappers for convenient task management, • ship with ready-to-use plugins for OpenCL, CUDA, and distributed systems

Unrestricted © Siemens AG 2017

Page 11

Contact

Dr. Tobias Schüle

Siemens Corporate Technology

Otto-Hahn-Ring 6

81739 Munich

Germany

[email protected]

Markus Levy, President

The Multicore Association

PO Box 4794

El Dorado Hills, CA 95762

USA

[email protected]