Using RISC-V in high computing, ultra-low power, programmable circuits for inference on battery operated edge devices

Martin Croome, VP Business Development, GreenWaves Technologies

RISC-V Day in Shanghai, 30 June 2018



What is this talk about?

RISC-V Foundation 2

The IoT pipe (NB-IoT, LTE-M, Sigfox, LoRa, etc.): B/day to kB/day

Market demand: battery operated sensors producing rich sensor data

• 8-bit, 160x120 @ 10 fps = 4.6 Mbit/s: face detection, presence detection, counting, emotion detection
• 24-bit @ 50 kHz = 1.2 Mbit/s: vibration analysis, fault detection
• Linear PCM = 1.4 Mbit/s: keyword spotting, beam forming, speech pre-processing

30 June 2017
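The raw sensor data rates quoted on this slide follow from samples x bits x rate. The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and reading "Linear PCM" as 2-channel, 16-bit, 44.1 kHz audio is an assumption. Under these assumptions the vibration and audio figures are reproduced. For the image stream the same formula gives about 1.5 Mbit/s at 10 fps and about 4.6 Mbit/s at 30 fps, so the image figure may assume a different frame rate or pixel depth than the one printed.

```c
#include <assert.h>

/* Raw (uncompressed) bit rate of a sampled source, in Mbit/s.
 * samples_per_unit: pixels per frame, or channels per audio sample period
 * bits_per_sample:  sample depth in bits
 * units_per_second: frame rate (fps) or sampling rate (Hz)
 * Hypothetical helper, not part of any GAP8 SDK. */
double raw_rate_mbit_s(double samples_per_unit, double bits_per_sample,
                       double units_per_second)
{
    assert(samples_per_unit > 0 && bits_per_sample > 0 && units_per_second > 0);
    return samples_per_unit * bits_per_sample * units_per_second / 1e6;
}
```

For example, `raw_rate_mbit_s(1, 24, 50000)` covers the 24-bit, 50 kHz vibration sensor and `raw_rate_mbit_s(160 * 120, 8, 10)` covers the image sensor as printed.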


What is this talk about?

The IoT pipe (NB-IoT, LTE-M, Sigfox, LoRa, etc.): B/day to kB/day

Market demand: battery operated sensors producing rich sensor data

• 8-bit, 160x120 @ 10 fps = 4.6 Mbit/s
• 24-bit @ 50 kHz = 1.2 Mbit/s
• Linear PCM = 1.4 Mbit/s

Reducing the sensor data to B/day to kB/day: CNN, SVM, Bayesian, Boosting, Cepstral analysis

Issue: this takes far more MIPS than an MCU can deliver, yet it must fit within an MCU power envelope.

General Patterns for content understanding

• Extract descriptors from raw data (usually highly parallel)
  • 2D: Corners, blobs, HOG, DOG, …
  • 1D: LPC coefficients, Cepstral coeffs, …
• Use descriptors to classify data among representative families (also highly parallel)
  • Machine learning (CNN, SVM, Boost), Bayesian, …
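The two-step pattern above (extract descriptors, then classify them) can be sketched in a few lines. This is a deliberately simplified stand-in, not one of the descriptors or classifiers named on the slide: the descriptor is the mean absolute amplitude of a 1D frame and the classifier is nearest-centroid. All names are hypothetical. The point is the structure: every frame and every descriptor is handled independently, which is why both steps parallelize well.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: one descriptor per frame (here: mean absolute amplitude).
 * Frames are independent, so the outer loop could be split across cores. */
void extract_descriptors(const short *samples, size_t n_frames,
                         size_t frame_len, float *desc)
{
    assert(samples && desc && frame_len > 0);
    for (size_t f = 0; f < n_frames; f++) {
        long acc = 0;
        for (size_t i = 0; i < frame_len; i++) {
            int s = samples[f * frame_len + i];
            acc += s < 0 ? -s : s;
        }
        desc[f] = (float)acc / (float)frame_len;
    }
}

/* Step 2: assign each descriptor to the nearest class centroid.
 * Each descriptor is again classified independently of the others. */
void classify(const float *desc, size_t n, const float *centroids,
              size_t n_classes, int *label)
{
    assert(desc && centroids && label && n_classes > 0);
    for (size_t f = 0; f < n; f++) {
        size_t best = 0;
        float best_d = -1.0f;
        for (size_t c = 0; c < n_classes; c++) {
            float d = desc[f] - centroids[c];
            if (d < 0) d = -d;
            if (best_d < 0 || d < best_d) { best_d = d; best = c; }
        }
        label[f] = (int)best;
    }
}
```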

GAP8: Ultra Low Power IoT Processor

Architecture efficiency
• Extended RISC-V ISA
• Low contention shared memory 8+1 core clustered architecture
• Tight synchronization
• CNN based pattern matching engine (HWCE)

Performance
• up to 12 GOPS
• up to 0.4 GOPS @ 1 mW
• up to 40 MOPS @ 300 uW
• 3 uW stand-by power consumption

HW features
• Smart IOs
• Voltage regulator/DVFS
• RTC
• Secured execution
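The performance figures can be turned into energy per operation with E = P / throughput. The sketch below assumes the quoted "up to" numbers are sustained throughput at the stated power, which the slide does not confirm; the function names are hypothetical. Under that assumption, 0.4 GOPS at 1 mW corresponds to 2.5 pJ per operation (400 GOPS/W) and 40 MOPS at 300 uW to 7.5 pJ per operation (about 133 GOPS/W).

```c
#include <assert.h>

/* Energy per operation in picojoules, from throughput (operations per
 * second) and power (watts): E = P / throughput. */
double energy_per_op_pj(double ops_per_second, double watts)
{
    assert(ops_per_second > 0 && watts > 0);
    return watts / ops_per_second * 1e12;
}

/* Efficiency in GOPS per watt for the same operating point. */
double gops_per_watt(double ops_per_second, double watts)
{
    assert(ops_per_second > 0 && watts > 0);
    return ops_per_second / 1e9 / watts;
}
```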

GAP8 hierarchical power architecture

• Quasi stand-by (uWs): monitoring. Smart I/Os, voltage regulator & RTC, SRAM in retentive mode.
• Low computing power (mWs): event qualification, protocol stack, system control. Extended RISC-V.
• High computing power (10 to 50 mWs): data analysis & classification. Extended RISC-V, efficient 8 core parallelization, HW synchronization, shared instruction cache, CNN HW engine.

Figure label (appears twice): primary energy consumption

GAP8: Open Source Origin

• Best in class Instruction Set Architecture (ISA), UC Berkeley originated
• Open Source Computing Platform created by ETHZ and UniBo
• GAP8: engineered as Ultra-low power IoT Application Processor

SW development flow

[Block diagram of GAP8]
• FC clock & voltage domain: Fabric Controller (L1, I$), L2 Memory, ROM, Debug, PMU, RTC, Micro DMA, peripherals (LVDS, Serial I/Q, UART, SPI, I2C, I2S, CPI, HyperBus, GPIO / PWM)
• Cluster clock & voltage domain: Core 0 to Core 7, HWCE, Shared L1 Memory, Shared Instruction Cache, Logarithmic Interconnect, Cluster DMA, H/W SYNC, Debug

• Identical cores: single GCC/GDB toolchain (including support for extended ISA)
• CNN graph translators (TF2GAP8, ONNX2GAP8 in development)
• Code generators for common algorithms (CNN layers, Matrix, FIR, FFT, HoG, MFCC, …)
• GAP8 AutoTiler
  • Separates kernel parallelization / vectorization and data flow
  • Automatic code generation for data flow
• OpenMP or Native API
• GAPUINO development board
• Classic MCU development
  • PULP OS, ARM™ Mbed, FreeRTOS, other OS’s in development
  • Drivers
  • Cluster APIs

Arm and Mbed are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

Automated Memory Management

Basic Kernels (usually seen as libraries)
How to handle a parametric tile
• Vectorization + Parallelization
• No assumption on where actual data are located

User Kernels (can be grouped and organized as generators)
Passing actual data to basic kernels and having data circulating between them
• A multi dimensional iteration space (2D; 3D; 4D) and a traversal order
• Each argument is a sub space of the iteration space and has actual dimensions, location (L2, external) and properties
• Given a memory budget the auto tiler “tiles” each argument and generates a fully pipelined implementation interleaving processing and data movements
• Basic Kernels are inserted at defined locations in the iteration space (prologue, body, epilog, …)
• Generated tiles are passed to Basic Kernels
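The tiling idea can be illustrated with a hand-written sketch. This is not the AutoTiler's generated code or its API: every name here is hypothetical, `memcpy` stands in for DMA transfers between L2/external memory and local memory, and the loop is sequential where a real pipeline would overlap moving tile i+1 with processing tile i. It only shows how a memory budget fixes the tile size and how a basic kernel sees one tile at a time without knowing where the full array lives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical basic kernel: processes one tile held in local memory. */
typedef void (*basic_kernel_t)(const short *in, short *out, size_t n);

/* Largest number of rows per tile such that one input and one output tile,
 * each double-buffered, fit in the budget (bytes). The sequential loop
 * below only uses one buffer of each; the sizing leaves room for overlap. */
size_t tile_rows(size_t rows, size_t cols, size_t budget_bytes)
{
    size_t per_row = 2 /* in + out */ * 2 /* double buffer */
                     * cols * sizeof(short);
    assert(cols > 0 && budget_bytes >= per_row);
    size_t t = budget_bytes / per_row;
    return t < rows ? t : rows;
}

/* Walk a rows x cols iteration space tile by tile; returns the tile count. */
size_t run_tiled(const short *src, short *dst, size_t rows, size_t cols,
                 short *local_in, short *local_out, size_t t_rows,
                 basic_kernel_t kernel)
{
    size_t n_tiles = 0;
    assert(t_rows > 0);
    for (size_t r = 0; r < rows; r += t_rows, n_tiles++) {
        size_t h = rows - r < t_rows ? rows - r : t_rows; /* last tile may be short */
        memcpy(local_in, src + r * cols, h * cols * sizeof(short));  /* "DMA in" */
        kernel(local_in, local_out, h * cols);
        memcpy(dst + r * cols, local_out, h * cols * sizeof(short)); /* "DMA out" */
    }
    return n_tiles;
}

/* Example basic kernel: ReLU on 16-bit data. */
void relu_kernel(const short *in, short *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = in[i] > 0 ? in[i] : 0;
}
```

With a 5 x 4 array of 16-bit values and an 80-byte budget, `tile_rows` returns 2, so `run_tiled` processes tiles of 2, 2 and 1 rows.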

Automated Memory Management

Flow:
• Basic Kernels, User Kernels, groups of User Kernels, Generators
• C programs with calls to Autotiler’s Model API, C libraries
• Autotiler Library (Constraints Solver, C Code Generator)
• Compile & run on PC
• C code for the target, handling data movements and Basic Kernels dispatch on the cluster’s cores

#include "AutoTilerLib.h"
#include "CNN_Generator.h"

/* Autotiler model for MNIST: two 5x5 convolution + ReLU + 2x2 pooling
   layers followed by a linear layer. */
void Mnist()
{
    CNN_TiledConvNxNReLUPool2x2_SW_fp("Conv5x5RLMP_0", 5, 1, 32, 28, 28, 1);
    CNN_TiledConvNxNReLUPool2x2_SW_fp("Conv5x5RLMP_1", 5, 32, 64, 12, 12, 1);
    CNN_TiledLinearLayer("LinearLayerRL_2", 64, 4, 4, 10, 1, 0, 0);
}
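The numeric arguments in the model above chain consistently if they are read as filter size, input feature maps, output feature maps, input width and input height. That reading is an assumption; the slide does not document the generator signatures. With an unpadded 5x5 convolution followed by 2x2 pooling, a 28x28 input becomes 12x12 and a 12x12 input becomes 4x4, which matches the 12, 12 of the second call and the 64, 4, 4 of the linear layer. The helper below is hypothetical and only reproduces that arithmetic.

```c
#include <assert.h>

/* Output side length of an unpadded ("valid") k x k convolution followed by
 * 2x2 pooling with stride 2. Hypothetical helper for checking layer sizes. */
int conv_pool_out(int in, int k)
{
    assert(k > 0 && in >= k);
    return (in - k + 1) / 2;
}
```

Under the same reading, the linear layer takes 64 x 4 x 4 = 1024 inputs and produces 10 outputs, one per digit class.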

Algorithm Benchmarks

Cycles per produced output, by number of cores

Application                          1        2        4        8        Note
1D FFT 1024 Radix4                   28.2     14.3     7.8      4.7
2D FFT 256 x 256 Radix4              78.9     41.9     22.6     13.3     0.88 MHz/Frame
Byte 5x5 Conv                        18.5     9.3      4.7      2.2
Short 5x5 Conv                       37.8     18.9     9.5      4.6
Binary 5x5 Conv                      20.8     10.5     5.3      2.8
Short MaxPool2x2                     8.2      4.2      2.1      1.1
Short MatMult 32x32                  41.9     20.9     14.0     5.2
Short 2048 to 1 Fully Connected      3112.0   1616.0   847.0    495.0
CannyEdge                            99.5     50.9     26.2     12.7     VGA: 3.9 MHz/Frame
AES-CTR 128b                         15.3     7.7      4.0      2.1      0.47 MHz/Mbs-1
64 Mel Coefficients                  542.7    299.4    176.7    101.3    10 ms slots: 0.64 MHz
HoG, 8x8 Cells, 2x2 Blocks, 9 Bins   65.0     35.0     18.0     9.0      VGA: 2.76 MHz/Frame
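Speedup and parallel efficiency can be read directly off the cycle counts in the benchmark table: speedup is the 1-core count divided by the n-core count, and efficiency is speedup divided by n. The helper names below are hypothetical. As an example, the 1D FFT goes from 28.2 to 4.7 cycles per output on 8 cores, a speedup of about 6.0 and an efficiency of about 75%.

```c
#include <assert.h>

/* Speedup of an n-core run over the 1-core run, from cycles per output. */
double speedup(double cycles_1core, double cycles_ncores)
{
    assert(cycles_1core > 0 && cycles_ncores > 0);
    return cycles_1core / cycles_ncores;
}

/* Parallel efficiency: fraction of the ideal n-fold speedup achieved. */
double parallel_efficiency(double cycles_1core, double cycles_ncores,
                           int n_cores)
{
    assert(n_cores > 0);
    return speedup(cycles_1core, cycles_ncores) / n_cores;
}
```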

Algorithm Benchmarks

[Chart; surviving label: 7.1]

CNN based text recognition

• Trainable parameters: 421 263
• Neurons: 1 511 904
• 33 ms per image

Dronet – Autonomous Drone

Power envelope breakdown @ 165 MHz, 12 images/sec

Unique energy efficiency vs performance

[Chart: energy efficiency versus computing power]
• Computing power axis: 100s of MOPS, several GOPS, TFLOPS
• Regions: best in class ULP MCUs; high end low power MCUs, mid-range application processors; embedded vision processors, dedicated CNN processors
• GAP8: uAs asleep, mWs awake
• Other chart label: 10s of mWs

20X, from:
• Extended Instruction Set (ISA)
• Efficient parallelization
• Shared instruction cache
• HW Convolution Engine
• Ultra fast HW state changes

Comparison of the latest optimized ARM CMSIS-NN library versus a GAP8 implementation of an identical CNN graph trained on CIFAR-10 images. Source: ARM processors blog.

Target     Clock      Time      Cycles       Active Power
STM32 F7   216 MHz    99.1 ms   21 400 000   60 mW
GAP8 *     15.4 MHz   99.1 ms   1 500 000    3.7 mW
GAP8 *     175 MHz    8.7 ms    1 500 000    70 mW
GAP8 **    4.7 MHz    99.1 ms   460 000      0.8 mW

Running on GAP8 cluster
* No Hardware Convolution Engine
** With Hardware Convolution Engine

Annotations: 16 X reduction; STM32 H7, 216 MHz, 40 nm: 11 X
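The comparison table can be cross-checked with two relations: cycles = clock x time, and energy per inference = active power x time. The helpers below are a hypothetical sketch of that arithmetic. The cycle counts agree with the clock and time columns (for example 216 MHz x 99.1 ms is about 21.4 million cycles). The energy per inference comes to about 5.9 mJ for the STM32 F7 and about 0.37 mJ for GAP8 at 15.4 MHz, a ratio of roughly 16, which appears to be what the "16 X reduction" annotation refers to. With the Hardware Convolution Engine the same calculation gives about 0.08 mJ.

```c
#include <assert.h>

/* Energy for one inference in millijoules: mW * ms = uJ, so divide by 1000. */
double energy_mj(double power_mw, double time_ms)
{
    assert(power_mw > 0 && time_ms > 0);
    return power_mw * time_ms / 1000.0;
}

/* Cycle count implied by a clock and a run time: 1 MHz * 1 ms = 1000 cycles. */
double cycles(double clock_mhz, double time_ms)
{
    assert(clock_mhz > 0 && time_ms > 0);
    return clock_mhz * time_ms * 1000.0;
}
```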

Unique energy efficiency vs performance

HWCE: Boosted convolution

@1.0V, 50 MHz. Input: W=32, H=100

              Conv 3x3   Conv 5x5
SW time       129.7 us   332.1 us
SW power      12.58 mW   12.80 mW
HWCE time     69.2 us    60.8 us
HWCE power    4.95 mW    5.1 mW

                                      Conv 3x3   Conv 5x5
Speed gain                            1.87       5.46
Power gain (intrinsic)                2.54       2.51
Power gain combined with speed gain   4.76       13.71
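The gains in the second table follow from the measurements in the first. Speed gain is SW time over HWCE time, intrinsic power gain is SW power over HWCE power, and the combined figure is their product, which is the ratio of energy per convolution (power x time). The function names below are hypothetical; the numbers are those on the slide.

```c
#include <assert.h>

/* Ratio of software run time to HWCE run time. */
double speed_gain(double sw_time_us, double hwce_time_us)
{
    assert(sw_time_us > 0 && hwce_time_us > 0);
    return sw_time_us / hwce_time_us;
}

/* Ratio of software power to HWCE power. */
double power_gain(double sw_power_mw, double hwce_power_mw)
{
    assert(sw_power_mw > 0 && hwce_power_mw > 0);
    return sw_power_mw / hwce_power_mw;
}

/* Combined gain = energy ratio = (P_sw * t_sw) / (P_hwce * t_hwce). */
double energy_gain(double sw_time_us, double sw_power_mw,
                   double hwce_time_us, double hwce_power_mw)
{
    return speed_gain(sw_time_us, hwce_time_us)
         * power_gain(sw_power_mw, hwce_power_mw);
}
```

For the 5x5 case, `energy_gain(332.1, 12.80, 60.8, 5.1)` reproduces the 13.71 shown in the table.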

Conclusion

GAP8’s Extended RISC-V ISA and flexible, programmable architecture enable massive deployment of edge intelligence
• by dramatically reducing rich sensing device installation costs through true autonomy
• and by reducing solution cost with system on a chip integration

Built on top of 2 major HW open source initiatives

Architectural innovation enabled by PULP, RISC-V and Open Source


Thank You!


Backup Slides


People Counting


Advanced Power Management

MCU sleep mode (uW range)
• Embedded DC/DC, low current
• Real Time Clock 32 kHz only
• L2 Memory partially retentive

MCU active mode (1 mW range)
• Embedded DC/DC, high current
• Voltage can dynamically change
• One clock gen active, frequency can dynamically change
• Systematic clock gating

MCU + Parallel processor active mode (10-40 mW range)
• Embedded DC/DC, high current
• Voltage can dynamically change
• Two clock gen active, frequencies can dynamically change
• Systematic clock gating

Ultra fast switching time from one mode to another
Ultra fast voltage and frequency change time

Highly optimized system level power consumption

Source of Energy Efficiency?

Data analysis & classification:
• extended RISC-V
• efficient 8 core parallelization
• HW synchronization
• shared instruction cache
• CNN HW engine

Gain factors shown: 3-5x, 1.4x, 4x, 1.5x

Overall, in practice on targeted algorithms, typically 20x

[Block diagram of GAP8]
• Cluster: 8 eRISC-V cores, Logarithmic Interconnect, Shared L1 Memory, Shared Instruction Cache, Dbg Unit, DMA, CNN-HWE, HW Sync
• Fabric controller: eRISC-V, I$, L1, Micro DMA, Clk, Dbg, ROM
• L2 Memory
• Peripherals: LVDS, UART, SPI, I2S, I2C, // 10b, GPIOs, HyperBus

System Cost

[Chart: system cost versus computing power]
• Computing power axis: 100s of MOPS, several GOPS, TFLOPS
• Regions: best in class ULP MCUs; high end low power MCUs, mid-range application processors; embedded vision processors, dedicated CNN processors
• GAP8: 2-3X (System-On-a-Chip, high integration)