Using GCC Analysis Techniques to Enable Parallel Memory Accesses in HLS

Johanna Rohde, Christian Hochberger

Date: 7.9.2017

TU Darmstadt, Computer Systems Group

Outline

● Background

– SpartanMC Toolchain

– GCC

– Hardware Plugin

● Approach

– Scalar Evolution Analysis

– Memory Model

● Evaluation

● Conclusion

Background: SpartanMC SoC Kit

● 18-bit soft core

– Trade-off between 8 and 32 bit

– FPGAs use 18-bit widths for BRAMs and DSPs

– 2 additional bits increase code density

● Graphical system configurator

● Peripheral components

● Simulator

● Supported FPGAs

– Xilinx: Spartan 3/6, 7 Series, …

– Altera: Cyclone IV

– Lattice: ECP 3/5

Interested? www.spartanmc.de

Background: Compiler Toolchain

● GCC

– Support for 2-address machines

– Wide range of optimizations

    ● e.g. polyhedral framework

– Widely used

– Currently using GCC 7.1

Background: Compile Flow

[Figure: GCC compile flow. Recovered labels: C frontend (language specific); Tree, GIMPLE and RTL (intermediate representations); pass manager with SSA passes and RTL passes; object files (target specific); plugin (GIMPLE pass); DFG.]

Background: Hardware Accelerator Plugin

● A specific SpartanMC design is generated for every application

● Hardware accelerators are integrated automatically

– Static code analysis at compile time

– Transparent to the user

– No annotations or further information needed

Background: Optimizations

● Loop pipelining

– Overlapping iterations

– Problem: the order of memory accesses must be preserved

– This increases the initiation interval

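[Figure: data-flow graphs of two overlapping loop iterations (Iteration 1, Iteration 2). Each contains a read (R) of in_1 and a write (W) of result_1, the operator nodes =, + and *, and the operands value_1 and 2.]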
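The ordering problem can be illustrated with a minimal C sketch. The function `scale_add` and its loop body are hypothetical and only loosely follow the operand names in the figure; they are not taken from the benchmarks. Because nothing tells the compiler that `in` and `result` are disjoint, the read of the next iteration may depend on the write of the current one.

```c
#include <stddef.h>

/* Hypothetical loop: one read (R) of in[i] and one write (W) of result[i]
 * per iteration. If the two pointers overlap, the read of iteration i+1
 * must not be moved ahead of the write of iteration i, which limits how
 * far the iterations can be overlapped. */
void scale_add(const int *in, int *result, int value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = in[i] * 2 + value;
}
```

Calling it with `result == in + 1` on the buffer `{1, 0, 0, 0}` (with `value = 0`, `n = 3`) leaves `{1, 2, 4, 8}`: each iteration reads what the previous one wrote, so the accesses cannot be reordered.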

Approach: Chain of Recurrences

● Expresses the evolution of a variable

● Basic Recurrence (BR)

$f = \{\varphi_0, \Theta, f_1\}, \quad \Theta \in \{+, *\}$

$f(i) = \{\varphi_0, +, f_1\}(i) = \varphi_0 + \sum_{j=0}^{i-1} f_1(j)$

$f(i) = \{\varphi_0, *, f_1\}(i) = \varphi_0 \cdot \prod_{j=0}^{i-1} f_1(j)$

for $0 \le i < N$

● Tree of Recurrence (TREC)

$\Phi = \{\Phi_a, +, \Phi_b\}$ or $\Phi = c$
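As a small worked instance of the definitions above, the following C sketch evaluates a basic recurrence whose step $f_1$ is a constant. The type `basic_rec` and the function `br_eval` are names invented for this illustration; they are not part of GCC.

```c
/* Operator of a basic recurrence: Theta in {+, *}. */
typedef enum { CR_ADD, CR_MUL } cr_op;

/* Hypothetical representation of {init, op, step} with a constant step. */
typedef struct {
    long  init;  /* phi_0 */
    cr_op op;    /* Theta */
    long  step;  /* f_1, assumed constant in this sketch */
} basic_rec;

/* Evaluate f(i) by applying the operator i times, which matches
 * phi_0 + sum f_1(j) for CR_ADD and phi_0 * prod f_1(j) for CR_MUL. */
long br_eval(basic_rec f, unsigned i)
{
    long v = f.init;
    for (unsigned j = 0; j < i; j++)
        v = (f.op == CR_ADD) ? v + f.step : v * f.step;
    return v;
}
```

For example, `{3, +, 2}` evaluates to 11 at `i = 4`, and `{3, *, 2}` evaluates to 48 at `i = 4`.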

Approach: Scalar Evolution Analysis

● Implemented in GCC

● Calculates a TREC for every expression

● Example:

i → {0, +, 1}_1

i*M → {0, +, 1}_1 * M = {0, +, M}_1

i*M + j → {{0, +, M}_1, +, 1}_2

    for (int i = 0; i < N; i++) {        // loop 1
        for (int j = 0; j < M; j++) {    // loop 2
            A[i*M+j] = …;
        }
    }
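The TREC for `i*M + j` can be read as a recipe for updating the index incrementally: add M once per iteration of loop 1 and add 1 once per iteration of loop 2. The following C sketch checks this reading against the directly computed index. The function `scev_mismatches` is a hypothetical helper for this illustration and does not use GCC's scalar evolution API.

```c
/* Counts the (i, j) pairs for which the incrementally updated index
 * differs from the directly computed index i*M + j. */
long scev_mismatches(long N, long M)
{
    long mismatches = 0;
    long outer = 0;                      /* {0, +, M}_1 */
    for (long i = 0; i < N; i++) {       /* loop 1 */
        long idx = outer;                /* {{0, +, M}_1, +, 1}_2 */
        for (long j = 0; j < M; j++) {   /* loop 2 */
            if (idx != i * M + j)
                mismatches++;
            idx += 1;                    /* step of loop 2 */
        }
        outer += M;                      /* step of loop 1 */
    }
    return mismatches;
}
```

A return value of 0 for any loop bounds means the recurrence and the closed-form index agree on every iteration.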

Approach: Memory Access Model

● Anatomy of a (regular) memory access

– innermost_loop_behavior struct

    ● tree base_address

    ● tree offset → always 0

    ● tree init

    ● tree step

– A = {(base + offset + init), +, step}

● Pair

– Dependency between

    ● Read + Write

    ● Write + Write
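A minimal C sketch of this model follows. In GCC the fields of `innermost_loop_behavior` are `tree` nodes; here they are replaced by plain integers so the example is self-contained. The names `access_model`, `access_address` and `forms_pair` are assumptions made for this illustration.

```c
/* Simplified stand-in for the four fields named on the slide. */
typedef struct {
    long base_address;
    long offset;        /* always 0 according to the slide */
    long init;
    long step;
} access_model;

/* Address touched in iteration n: A(n) = (base + offset + init) + n * step. */
long access_address(access_model a, long n)
{
    return a.base_address + a.offset + a.init + n * a.step;
}

/* Two accesses form a pair when at least one of them is a write
 * (Read + Write or Write + Write); two reads never conflict. */
int forms_pair(int a_is_write, int b_is_write)
{
    return a_is_write || b_is_write;
}
```

With `base_address = 1000`, `init = 8` and `step = 4`, iteration 3 touches address 1020.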

Approach: Memory Access Model

● Static Pairs

– Analyzable at compile time

– Base addresses are equal

– Step sizes are equal

● Dynamic Pairs

– Analyzable at runtime

– Step sizes are equal

● Critical Pairs

– Not analyzable

$\Delta\mathit{base} + \Delta\mathit{init} + n \cdot \mathit{step} = 0$
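The classification rules and the conflict equation can be sketched in C as follows. This is an illustration under stated assumptions, not the plugin's implementation: base addresses are modelled by a symbolic identifier `base_id` (equal identifiers mean the bases are known to be equal at compile time), and all type and function names are hypothetical.

```c
typedef enum { PAIR_STATIC, PAIR_DYNAMIC, PAIR_CRITICAL } pair_class;

typedef struct {
    int  base_id;  /* hypothetical: symbolic identity of the base address */
    long init;
    long step;
} access_t;

/* Equal step and equal base: static. Equal step only: dynamic.
 * Different step sizes: critical (not analyzable). */
pair_class classify_pair(access_t a, access_t b)
{
    if (a.step != b.step)
        return PAIR_CRITICAL;
    return (a.base_id == b.base_id) ? PAIR_STATIC : PAIR_DYNAMIC;
}

/* Solve delta + n * step = 0 for an integer n, where delta stands for
 * (delta base + delta init). Returns 1 and stores n if a solution exists. */
int solve_distance(long delta, long step, long *n)
{
    if (step == 0) {
        *n = 0;
        return delta == 0;   /* both accesses always hit the same address */
    }
    if (delta % step != 0)
        return 0;            /* the accesses never meet */
    *n = -delta / step;
    return 1;
}
```

For a static pair the equation can be solved at compile time, because the base difference is zero and the init difference is a constant. For a dynamic pair the same check would run once the base addresses are known at runtime.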

Evaluation: Benchmarks

● 11 benchmarks with 23 loops

● Different areas of application

– Cryptography

– Image processing

– Signal processing

● Benchmarks: AES, Dijkstra, Binary Tree, Haar-wavelet, Base64, Fletcher, BilinearFilter, GrayscaleFilter, BitReverse, IIR, JPEG

Evaluation: Coverage

Evaluation: Classification

Evaluation: First-Last Dependencies

[Figure: data-flow graphs of Iteration 1 and Iteration 2. Each contains a read (R) of in_1 and a write (W) of result_1, the operator nodes =, + and *, and the operands value_1 and 2.]

Evaluation: Memory Carried Dependencies

[Figure: data-flow graph of Iteration 1 with reads (R) of in_1 and in_2 and writes (W) of result_1 and result_2, the operator nodes = and +, and the operand value_1.]

Evaluation: Step Size

● Condition for static/dynamic classification is equal step size

– Distance is constant

● Special case: Distance is increasing

– $A_{sml} = \{(base_{sml} + offset_{sml} + init_{sml}), +, step_{sml}\}$

– $A_{lrg} = \{(base_{lrg} + offset_{lrg} + init_{lrg}), +, step_{lrg}\}$

– Runtime condition:

    ● $A_{lrg}(n) - A_{sml}(0) > 0$
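The runtime condition can be evaluated literally with the address function of the memory access model. The sketch below does only that: the slide does not state which iteration index n is meant, so n is left as a caller-supplied parameter, and the names `access_t`, `addr_at` and `distance_condition` are hypothetical.

```c
/* Plain-integer stand-in for one access {(base + offset + init), +, step}. */
typedef struct {
    long base;
    long offset;
    long init;
    long step;
} access_t;

/* A(n) = (base + offset + init) + n * step */
static long addr_at(access_t a, long n)
{
    return a.base + a.offset + a.init + n * a.step;
}

/* Literal evaluation of A_lrg(n) - A_sml(0) > 0, where lrg is the access
 * with the larger step and sml the one with the smaller step. */
int distance_condition(access_t lrg, access_t sml, long n)
{
    return addr_at(lrg, n) - addr_at(sml, 0) > 0;
}
```

Because the inputs are ordinary runtime values, a check of this kind costs a few additions, one multiplication and one comparison before the loop starts.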

Evaluation: Step Size

● General

– 9% → 4% critical pairs

● First-Last Dependencies

– 32% → 9% critical pairs

● Memory Carried Dependencies

– 6% → 3% critical pairs

Conclusion

● Over 90% of all dependencies can be considered for decoupling

– 91% of first-last dependencies

– 97% of memory carried dependencies

● Analysis is already implemented

● Future work

– Implementation of static/dynamic decoupling

Thank You!

Background: Compile Flow

[Figure: GCC compile flow. Recovered labels: C frontend (language specific); Tree, GIMPLE and RTL (intermediate representations); pass manager with SSA passes and RTL passes; object files (target specific); DFG.]

● Dead Code Elimination

● Forward Propagation

● Dead Store Elimination

● Redundancy Elimination

● Loop Optimization

● Constant Propagation

● ...

Benchmarks

● 17 benchmarks with 43 loops

● Different areas of application

– Cryptography

– Image processing

– Signal processing

● Benchmarks: AES, Base64, Bit Reverse, Bilinear filter, Binary Tree, CRC, Dijkstra, Euclid, Haar-wavelet, Fletcher, Grayscale Filter, IDCT, IIR, JPEG, RSA, Mandelbrot, MatrixMult

Benchmarks

● 20 Loops excluded

– No loop pipelining

– No store operation

– Single store operation
