MACHINE VISION GROUP, JANI BOUTELLIER, 06.10.2009

Architectural Support for the Orchestration of Fine-Grained Multiprocessing for Portable Streaming Applications

Jani Boutellier (1), Alessandro Cevrero (2), Philip Brisk (2), Paolo Ienne (2)

(1) University of Oulu (FI), (2) EPFL, Lausanne (CH)

The Context of this Work

• The context of this work is multiprocessing embedded systems

• The systems’ processing elements (PEs) are application-specific and heterogeneous

• We propose a circuit for low-overhead hardware-assisted scheduling and dispatching of PEs

• Solution suitable for data-dominated signal processing applications

Fine-Grained Acceleration

• Accelerators can be made fine-grained

• Improves accelerator utilization

• Allows HW use across applications

Discussed in Silvén et al. (2005)

Fine-Grained Acceleration

• Static accelerator invocation schedules work only when the applications use accelerators in a regular pattern

• Unfortunately, modern signal processing uses adaptive coding

[Figure: block diagram with the labels "coded bitstream", "Parser", "Intra block", "Inter block" and "Screen"]

Fine-Grained Acceleration

• For each iteration, a different set of functions can be used

• The time available for switching the accelerator invocation schedule is very short

• End-to-end time per iteration is under 10 μs, i.e. over 100k iterations / s

[Figure: accelerators Acc. 1 to Acc. 6]
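To see how tight this window is, the quoted rate can be converted into a per-iteration time and cycle budget. The Python sketch below is only illustrative: the clock frequencies are assumptions chosen for the example and do not come from the slides.

```python
def iteration_period_us(iterations_per_s: float) -> float:
    """Time available for one iteration, in microseconds."""
    return 1e6 / iterations_per_s


def cycles_per_iteration(clock_hz: float, iterations_per_s: float) -> float:
    """Clock cycles that fit into one iteration at the given clock rate."""
    return clock_hz / iterations_per_s


ITERATIONS_PER_S = 100_000                 # rate quoted on the slide
ASSUMED_CLOCKS_HZ = [50e6, 100e6, 200e6]   # hypothetical, not from the slides

if __name__ == "__main__":
    print(f"period: {iteration_period_us(ITERATIONS_PER_S):.1f} us")
    for clock in ASSUMED_CLOCKS_HZ:
        budget = cycles_per_iteration(clock, ITERATIONS_PER_S)
        print(f"{clock / 1e6:.0f} MHz -> {budget:.0f} cycles per iteration")
```

Whatever the actual clock is, any schedule switch has to fit inside this per-iteration budget together with the useful work.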

Quasi-static scheduling

• Quasi-static scheduling is a middle ground between dynamic and static scheduling

• Applicable when the application consists of sequential, static parts

• Minimizes run-time computations

[Figure: static, quasi-static and dynamic scheduling compared on two axes labeled "Flexibility" (high) and "Overhead" (low)]

Quasi-static scheduling

[Figure, built up over four slides: an application consisting of parts A, B, C and D; a schedule part repository with entries B:, C: and D:; and a time chart for proc. 1, proc. 2 and proc. 3 that first holds A1 to A3, then gains B1 to B3, then D1 to D3]
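The idea of assembling a run-time schedule from precomputed schedule parts can be sketched in Python. This is a minimal sketch, not the authors' implementation: the names `Slot`, `REPOSITORY`, `append_part` and `build_schedule` are hypothetical, all offsets and durations are invented, and it assumes for simplicity that each part starts only after the previous part has finished on every processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    proc: int     # processor index (proc. 1, 2 or 3)
    task: str     # task label, e.g. "B1"
    start: int    # start offset inside the part, in time units
    length: int   # duration, in time units


# Hypothetical repository: each part is a small static schedule computed
# offline. All numbers here are made up for illustration.
REPOSITORY = {
    "A": [Slot(1, "A1", 0, 2), Slot(2, "A2", 0, 3), Slot(3, "A3", 1, 2)],
    "B": [Slot(1, "B1", 0, 1), Slot(2, "B2", 0, 2), Slot(3, "B3", 1, 1)],
    "C": [Slot(3, "C1", 0, 2), Slot(1, "C2", 0, 1)],
    "D": [Slot(3, "D1", 0, 1), Slot(1, "D2", 0, 2), Slot(2, "D3", 1, 2)],
}


def part_length(part):
    """Time at which the last slot of a part finishes."""
    return max(s.start + s.length for s in part)


def append_part(schedule, end_time, name):
    """Append a stored part after end_time; return the new end time."""
    part = REPOSITORY[name]
    for s in part:
        begin = end_time + s.start
        schedule.append((s.proc, s.task, begin, begin + s.length))
    return end_time + part_length(part)


def build_schedule(part_names):
    """Concatenate the parts selected at run time into one schedule."""
    schedule, end_time = [], 0
    for name in part_names:
        end_time = append_part(schedule, end_time, name)
    return schedule, end_time


if __name__ == "__main__":
    # The run-time decision is only which parts to append (here A, B, D).
    schedule, total = build_schedule(["A", "B", "D"])
    for proc, task, begin, end in schedule:
        print(f"proc. {proc}: {task} [{begin}, {end})")
    print("total length:", total)
```

The only run-time work is selecting a part and copying its slots with an offset, which is what keeps the overhead low compared with fully dynamic scheduling.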

Proposed solution

• In this work we propose a dedicated circuit for quasi-static scheduling

• A dedicated circuit is needed because quasi-static scheduling of fine-grained accelerators is not feasible with a software scheduler *

• Appends a new schedule part in 3 clock cycles

• Performs dispatching independently

• Area is 3300 gates when

– supporting 4 accelerators

– supporting 13 alternative schedule parts, stored in the memory of the circuit

* Boutellier et al. (2009), Journal of Signal Processing Systems
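The 3-cycle append cost can be put in relation to the cycle budget of one iteration. The Python sketch below is a rough estimate only: the budget values and the number of appends per iteration are assumptions, and it further assumes that the circuit and the budget are counted in the same clock.

```python
APPEND_CYCLES = 3  # cycles to append one schedule part, as stated on the slide


def append_overhead(budget_cycles: int, appends_per_iteration: int = 1) -> float:
    """Fraction of an iteration's cycle budget spent appending schedule parts."""
    if budget_cycles <= 0:
        raise ValueError("budget_cycles must be positive")
    return appends_per_iteration * APPEND_CYCLES / budget_cycles


if __name__ == "__main__":
    # Hypothetical budgets; the slides do not state a clock frequency.
    for budget in (500, 1000, 2000):
        share = append_overhead(budget)
        print(f"{budget} cycles per iteration -> {share:.2%} spent on appending")
```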

Experiments

1. MPEG-4 SP video decoding

2. Fine-grain accelerator scheduling

• Both experiments were performed on an Altera Cyclone III FPGA

• The CPU and accelerators were Nios II processors

• Experiment 1 performed decoding of real video

• In Experiment 2 the accelerators just moved data around

Results

              Static schedule   Quasi-static sch.
Experiment 1  141 Mcycles       47 Mcycles
Experiment 2  1.13 Mcycles      0.78 Mcycles

Our circuit enables quasi-static multiprocessor scheduling with negligible overhead, which is not feasible in software
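The cycle counts above can be turned into speedup factors and relative savings. The short Python sketch below uses only the four reported numbers; the helper names are illustrative.

```python
# Cycle counts in Mcycles as reported: (static schedule, quasi-static schedule)
RESULTS_MCYCLES = {
    "Experiment 1": (141.0, 47.0),
    "Experiment 2": (1.13, 0.78),
}


def speedup(static: float, quasi: float) -> float:
    """How many times fewer cycles the quasi-static schedule needs."""
    return static / quasi


def reduction_percent(static: float, quasi: float) -> float:
    """Cycles saved relative to the static schedule, in percent."""
    return 100.0 * (static - quasi) / static


if __name__ == "__main__":
    for name, (static, quasi) in RESULTS_MCYCLES.items():
        print(f"{name}: speedup {speedup(static, quasi):.2f}x, "
              f"{reduction_percent(static, quasi):.1f}% fewer cycles")
```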

Thank you for your attention. Questions?