Page 1: OrthrusPE: Runtime Reconfigurable Processing Elements for

OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks

Nael Fasfous

Technical University of Munich

Department of Electrical and Computer Engineering

Chair of Integrated Systems

Page 2: OrthrusPE: Runtime Reconfigurable Processing Elements for

Outline

• Optimization of Convolutional Neural Networks

• Challenges of Binary Neural Networks

• Motivation for Runtime Reconfigurable BNN Processing Elements

• OrthrusPE: Dual Modes

• Experimentation and Results

Nael Fasfous | Chair of Integrated Systems | TUM

Page 5: OrthrusPE: Runtime Reconfigurable Processing Elements for

Optimization of Convolutional Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

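[Figure: taxonomy of CNN optimization techniques in three branches]

• Structural: Low-Rank Decomp. (Tucker Decomp., CP Decomp.); Pruning (Channel-wise, Filter-wise, Element-wise); Quantization (Floating Point, Fixed Point, Binarization, Pow. 2 Log Base, Weight Sharing)

• Algorithmic: Winograd (Winograd Sparsity); FFT (Spectral Sparsity); GEMM

• Hardware: Vectorization; Partitioning; Dataflow (OS, IS, RS, WS)

Further labels in the figure whose parent node cannot be recovered from the text: Bypass, Filter Sharing, Array Mapping, In, Out, IFmap, Filter, IO, OW, IW.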

Page 10: OrthrusPE: Runtime Reconfigurable Processing Elements for

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

• Binary Weights

• Binary Activations

• Binary Weights AND Activations

    • Replace multiplications by XNOR ops

    • Replace accumulations by popcount ops

    • Reduce memory requirements (1/16 of FP16)

• Severe Information Loss

Datasets named on the slide: MNIST, CIFAR-10, SVHN; ImageNet
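As a rough illustration of replacing multiplications by XNOR and accumulations by popcount, the sketch below computes the dot product of two vectors of +1/-1 values using only bitwise operations. The encoding (+1 as bit 1, -1 as bit 0), the packing into Python integers and the helper names `pack_bits` and `xnor_popcount_dot` are assumptions made for this example, not details taken from the slides.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given in packed form."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs agree
    matches = bin(agree).count("1")     # popcount of the agreement word
    return 2 * matches - n              # (#matches) - (#mismatches)


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate
result = xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a))
print(reference, result)
```

Each packed value also needs one bit of storage instead of sixteen, which is where the 1/16 memory figure relative to FP16 comes from.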

Page 11: OrthrusPE: Runtime Reconfigurable Processing Elements for

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

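Naïve Binarization

\[ \mathrm{sign}(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases} \]

\[ \begin{pmatrix} 1 & -2 & 4 \\ 10 & 0.1 & 3 \\ -5 & 7 & -3 \end{pmatrix} \xrightarrow{\ \mathrm{sign}(x)\ } \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]

Severe information loss: 10 and 0.1 have the same effect on the network.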
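The naïve binarization on this slide can be reproduced in a few lines. This is a minimal sketch using the slide's example matrix; the function name `sign01` and the nested-list representation are choices made for the example.

```python
def sign01(x):
    """sign(x) as defined on the slide: 0 for x < 0, 1 for x >= 0."""
    return 1 if x >= 0 else 0


weights = [
    [1, -2, 4],
    [10, 0.1, 3],
    [-5, 7, -3],
]

binarized = [[sign01(v) for v in row] for row in weights]
for row in binarized:
    print(row)
```

After binarization the entries for 10 and 0.1 (second row, first two columns) are indistinguishable, which is the information loss the slide points out.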

Page 16: OrthrusPE: Runtime Reconfigurable Processing Elements for

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

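Approximation through binary bases

\[ \mathrm{sign}(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases} \]

\[ \begin{pmatrix} 1 & -2 & 4 \\ 10 & 0.1 & 3 \\ -5 & 7 & -3 \end{pmatrix} \]

\[ \mathrm{sign}(x - 5): \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \qquad \mathrm{sign}(x - 3): \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad \mathrm{sign}(x - 4): \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \]

Captured Information:

• 0.1 is less positive than 10

• 4 and 3 lie between 10 and 0.1

• 4 and 3 are a single unit away from each other

• Information can be extracted for negative numbers, e.g. sign(x + 3)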
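The shifted binary bases can be sketched as follows. The code applies the slide's thresholds 5, 3 and 4 to the example matrix and then counts, per element, how many thresholds it clears; that count (`levels`) is an illustrative summary added for this example and is not part of the slides.

```python
def sign01(x):
    """sign(x) as defined on the slide: 0 for x < 0, 1 for x >= 0."""
    return 1 if x >= 0 else 0


def binary_base(matrix, threshold):
    """Element-wise sign(x - threshold), giving one binary base."""
    return [[sign01(v - threshold) for v in row] for row in matrix]


x = [
    [1, -2, 4],
    [10, 0.1, 3],
    [-5, 7, -3],
]

bases = {t: binary_base(x, t) for t in (5, 3, 4)}

# How many of the three thresholds each element clears.
levels = [
    [sum(bases[t][r][c] for t in bases) for c in range(3)]
    for r in range(3)
]
print(levels)
```

In `levels`, 10 clears all three thresholds and 0.1 clears none, while 4 and 3 fall in between and differ by one step, matching the captured information listed on the slide.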

Page 19: OrthrusPE: Runtime Reconfigurable Processing Elements for

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

[Figure: the example matrix is binarized with sign(x − 5), sign(x − 3) and sign(x − 4); each resulting binary base feeds its own binConv, whose output is scaled by α1, α2 and α3 respectively, in place of a single full-precision Conv.]

• α: learned during training to induce further diversity in information.

• BNNs with only 4-5% accuracy degradation from full-precision CNN on ImageNet [1].

[1] X. Lin et al., “Towards accurate binary convolutional neural network,” NIPS, 2017.

Page 20: OrthrusPE: Runtime Reconfigurable Processing Elements for

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

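[Figure: an Xi × Yi × Ci input feature map convolved with Co filters of size K × K × Ci]

\[ A^{l} = \mathrm{Conv}(W^{l}, A^{l-1}) \]

\[ W^{l} \in \mathbb{R}^{K \times K \times C_i \times C_o}, \qquad A^{l-1} \in \mathbb{R}^{X_i \times Y_i \times C_i} \]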

Page 23: OrthrusPE: Runtime Reconfigurable Processing Elements for

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

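[Figure: the Xi × Yi × Ci input feature map is stacked into N binary bases and each K × K × Ci filter into M binary bases, for Co output channels]

\[ \mathbb{B} = \{0, 1\} \]

\[ B^{l} \in \mathbb{B}^{K \times K \times C_i \times M \times C_o}, \qquad H^{l-1} \in \mathbb{B}^{X_i \times Y_i \times C_i \times N} \]

\[ A^{l} = \sum_{m=1}^{M} \sum_{n=1}^{N} \alpha_m \beta_n \, \mathrm{BinConv}(B^{l}_{m}, H^{l-1}_{n}) \]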

Page 26: OrthrusPE: Runtime Reconfigurable Processing Elements for

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

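[Figure: input feature map with N binary bases and filters with M binary bases, as on the previous slide]

\[ \mathbb{B} = \{0, 1\} \]

\[ B^{l}_{m} \in \mathbb{B}^{K \times K \times C_i \times C_o}, \qquad H^{l-1}_{n} \in \mathbb{B}^{X_i \times Y_i \times C_i} \]

\[ A^{l} = \sum_{m=1}^{M} \sum_{n=1}^{N} \alpha_m \beta_n \, \mathrm{BinConv}(B^{l}_{m}, H^{l-1}_{n}) \]

\[ a_{m,n} = \sum_{c_i=1}^{C_i} p_{m,n,c_i} \]

\[ p_{m,n,c_i} = \sum_{k_x=1}^{K} \sum_{k_y=1}^{K} \mathrm{xnor}\left(b_{k_x,k_y},\, h_{x_i+k_x,\, y_i+k_y}\right) \]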

Page 29: OrthrusPE: Runtime Reconfigurable Processing Elements for

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

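[Figure: input feature map with N binary bases and filters with M binary bases, with operands h^{l-1} and b^{l}; the labels "Binary" and "Fixed-point" mark which parts of the computation use which number format]

\[ \mathbb{B} = \{0, 1\} \]

\[ B^{l}_{m} \in \mathbb{B}^{K \times K \times C_i \times C_o}, \qquad H^{l-1}_{n} \in \mathbb{B}^{X_i \times Y_i \times C_i} \]

\[ A^{l} = \sum_{m=1}^{M} \sum_{n=1}^{N} \alpha_m \beta_n \, \mathrm{BinConv}(B^{l}_{m}, H^{l-1}_{n}) \]

\[ a_{m,n} = \sum_{c_i=1}^{C_i} p_{m,n,c_i} \]

\[ p_{m,n,c_i} = \mathrm{popcnt}\left(\mathrm{xnor}\left(b_{k_x,k_y},\, h_{x_i+k_x,\, y_i+k_y}\right)\right) \]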
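The nesting of these equations can be sketched for a single output position of one output channel. The code follows the slide's formulas literally: `p_term` is the XNOR/popcount over one K × K window, `a_term` accumulates over the input channels, and `output_pixel` applies the α and β scaling. The bit-packing of each window into an integer, the function names and all numeric values (including the α and β values) are assumptions made for this example.

```python
def popcnt(word):
    """Number of set bits in a non-negative integer."""
    return bin(word).count("1")


def p_term(b_bits, h_bits, k):
    """popcnt(xnor(b, h)) over one K x K window of one channel.

    Both operands are K*K-bit integers (binary stage).
    """
    mask = (1 << (k * k)) - 1
    return popcnt(~(b_bits ^ h_bits) & mask)


def a_term(b_channels, h_channels, k):
    """a_{m,n}: accumulate p over the C_i input channels (multi-bit integer)."""
    return sum(p_term(b, h, k) for b, h in zip(b_channels, h_channels))


def output_pixel(B, H, alphas, betas, k):
    """Sum over m and n of alpha_m * beta_n * a_{m,n} (scaled, non-binary stage)."""
    return sum(
        alphas[m] * betas[n] * a_term(B[m], H[n], k)
        for m in range(len(B))
        for n in range(len(H))
    )


K = 3
# M = 2 weight bases and N = 1 activation base, each with C_i = 2 channels.
B = [
    [0b101101101, 0b111000111],
    [0b000000000, 0b111111111],
]
H = [
    [0b101101101, 0b000000000],
]
alphas = [0.5, 0.25]   # hypothetical scaling factors
betas = [1.0]          # hypothetical scaling factor

print(output_pixel(B, H, alphas, betas, K))
```

Only `p_term` works on single bits; the channel accumulation and the α/β scaling already operate on multi-bit values, which is the mix of binary and fixed-point work the slide title asks about.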

Page 31: OrthrusPE: Runtime Reconfigurable Processing Elements for

More FP in Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

• Binary Weight and Activation Bases (Scale/Shift)

• First Layer Remains Non-Binarized

• Batch Normalization

Motivation for Reconfigurable Processing Elements: BNN inference needs both Fixed-Point Ops and Binary Ops

Page 32: OrthrusPE: Runtime Reconfigurable Processing Elements for

OrthrusPE: Runtime Reconfigurable PEs for BNNs

Nael Fasfous | Chair of Integrated Systems | TUM

Dual Modes

• Fixed-precision mode: First layer, Batch-norm, Scale and Shift Operations

• Binary mode: SIMD Binary Hadamard Products and Popcounts

• Achieved with high resource reuse
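The two modes can be pictured with a small behavioural model. This is only a software sketch of the idea of one unit switching between a multiply-accumulate and a bitwise SIMD operation; the class `DualModePE`, the `Mode` enum, the `step` interface and the chosen word width are all hypothetical and say nothing about how the hardware is actually wired.

```python
from enum import Enum


class Mode(Enum):
    FIXED = 0    # fixed-precision mode: multiply-accumulate
    BINARY = 1   # binary mode: bitwise XNOR across the whole word


class DualModePE:
    """Behavioural sketch of a processing element with a runtime mode switch."""

    def __init__(self, width):
        self.width = width   # word width in bits for binary mode (assumed parameter)
        self.acc = 0         # accumulator used in fixed-precision mode

    def step(self, mode, x, y):
        if mode is Mode.FIXED:
            # e.g. first layer, batch-norm, scale and shift
            self.acc += x * y
            return self.acc
        # SIMD binary Hadamard product: one XNOR per bit position
        mask = (1 << self.width) - 1
        return ~(x ^ y) & mask


pe = DualModePE(width=8)
mac1 = pe.step(Mode.FIXED, 3, 4)
mac2 = pe.step(Mode.FIXED, 2, 5)
xnor = pe.step(Mode.BINARY, 0b10110010, 0b10011010)
print(mac1, mac2, format(xnor, "08b"))
```

The same object serves both operation types and only the mode argument changes between calls, which mirrors the resource-reuse argument on the slide.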

Page 35: OrthrusPE: Runtime Reconfigurable Processing Elements for

OrthrusPE: Binary Mode

Nael Fasfous | Chair of Integrated Systems | TUM

Efficient SIMD Binary Hadamard Product Execution

Input: $h^{l-1} \subset H^{l-1}_{n}$

1 1 1 1 1 1 0 X
0 0 0 0 0 0 0 X
1 0 1 0 1 0 0 X
X X X X X X X X

Kernel: $b^{l} \subset B^{l}_{m}$

1 0 1
1 0 1
1 0 1

[Figure: five 3 × 3 sliding windows of the input (columns 0-2, 1-3, 2-4, 3-5 and 4-6) are packed side by side across Sig A and Sig B, followed by three don't-care bits (X). Sig C holds the kernel replicated five times, also followed by three don't-care bits (5 × 9 = 45 bits plus 3 X bits per operand). The DSP takes Sig A, Sig B and Sig C and produces Sig P.]
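The packing shown on this slide can be imitated in software. The sketch below flattens the five 3 × 3 windows of the example input into one wide word, replicates the kernel five times in a second word, and XNORs the two in a single operation. Only the 45 meaningful bits are modelled (the three don't-care bits are left out); the bit order, the per-lane popcount step and all function names are assumptions made for this example.

```python
def window_bits(rows, col, k=3):
    """Flatten the k x k window starting at column `col` into a k*k-bit int (row-major)."""
    word = 0
    for r in range(k):
        for c in range(col, col + k):
            word = (word << 1) | rows[r][c]
    return word


def pack(words, lane_bits):
    """Concatenate equally sized words, first word in the most significant lane."""
    packed = 0
    for w in words:
        packed = (packed << lane_bits) | w
    return packed


def simd_xnor(a, b, total_bits):
    """Bitwise XNOR across the whole packed word."""
    return ~(a ^ b) & ((1 << total_bits) - 1)


def lane_popcounts(word, lanes, lane_bits):
    """Popcount of each lane, most significant lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [
        bin((word >> (lane_bits * (lanes - 1 - i))) & lane_mask).count("1")
        for i in range(lanes)
    ]


rows = [
    [1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
]
kernel = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]

LANES, LANE_BITS = 5, 9
windows = [window_bits(rows, c) for c in range(LANES)]
sig_ab = pack(windows, LANE_BITS)                            # packed input windows
sig_c = pack([window_bits(kernel, 0)] * LANES, LANE_BITS)    # replicated kernel
sig_p = simd_xnor(sig_ab, sig_c, LANES * LANE_BITS)          # one wide XNOR
counts = lane_popcounts(sig_p, LANES, LANE_BITS)
print(counts)
```

One wide XNOR yields five Hadamard products at once; each entry of `counts` is the number of agreeing positions between the kernel and one window.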

Page 36: OrthrusPE: Runtime Reconfigurable Processing Elements for

OrthrusPE: Dual Modes

Nael Fasfous | Chair of Integrated Systems | TUM

Runtime Reconfigurability

[Figure labels: Binary Mode, Fixed-Precision Mode, Reconfiguration Signals]

Using the same hardware resource for two distinct, critical BNN operations

Page 40: OrthrusPE: Runtime Reconfigurable Processing Elements for

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Synthesized Four Throughput-Equivalent Configurations:

• OrthrusPE

• OrthrusPE-DS (Dual-Static): SIMD Binary Hadamard Products on Static DSP

• Hybrid (Common): Binary operations on LUTs, fixed-point operations on DSP

• All-LUT: Execution restricted to LUTs

Page 41: OrthrusPE: Runtime Reconfigurable Processing Elements for

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Resource Utilization

• OrthrusPE and OrthrusPE-DS are more resource-efficient across all target accelerator frequencies.

Page 42: OrthrusPE: Runtime Reconfigurable Processing Elements for

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Resource Utilization

• OrthrusPE versus its closest FINN configuration @ 200 MHz:

    • 16 extra bit accumulations

    • 3 MACs (through reconfigurability)

    • 32% fewer LUTs

Page 43: OrthrusPE: Runtime Reconfigurable Processing Elements for

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Dynamic Power Estimation

• OrthrusPE is more efficient across all frequencies

• Results scale, as accelerators use 100s to 1000s of PEs

Page 44: OrthrusPE: Runtime Reconfigurable Processing Elements for

Conclusion and Future Work

Nael Fasfous | Chair of Integrated Systems | TUM

• Accurate BNNs cannot be achieved without fixed-point operations and reliance on DSP blocks.

• OrthrusPE improves the efficiency of computation by executing both fixed-point and binary ops on FPGA hard blocks.

• Accurate BNNs solve many of the computation and memory challenges for deep neural network workloads on edge devices.

Page 45: OrthrusPE: Runtime Reconfigurable Processing Elements for

Nael Fasfous | Chair of Integrated Systems | TUM

Thank you for your attention