

Mixed-Signal Techniques for Embedded Machine Learning Systems

Boris Murmann

June 16, 2019

Applications

Cloud vs. edge trade-offs: speed of response, bandwidth utilized, privacy, power consumed.

Conversational interfaces: …natural? …seamless? …real-time?

Task Complexity, Memory and Classification Energy

[Chart: classification energy per task, spanning pJ to J]
› Face Detection: SVM
› Digit Recognition / Image Classification (10 classes): Binary CNN
› Image Classification (1000 classes): MobileNet v1
› Self-Driving: ResNet-50 HD

Task Complexity, Memory and Classification Energy

[Same chart, with an annotation marking my main interest]

Edge Inference System

[Block diagram: Sensors → A/D → Wake-up Detector, then A/D → Pre-Processing → Deep Neural Network. Inside the DNN accelerator, the memory hierarchy runs DRAM → SRAM buffer → PE array → register file → MAC compute.]

Opportunities for Analog/Mixed-Signal Design

[Same block diagram, annotated with the mixed-signal opportunities: analog feature extraction at the sensor/wake-up front end, analog MAC in the compute unit, and in-memory computing in the memory hierarchy.]

Outline


Data-Compressive Imager for Object Detection

› Omid-Zohoor & Young, TCSVT 2018 & ISSCC 2019

Mixed-Signal ConvNet

› Bankman, ISSCC 2018 & JSSC 2019

RRAM-based ConvNet with In-Memory Compute

› Ongoing work

Wake-Up Detector with Hand-Crafted Features

[Signal chain: Sensor → A/D → Feature Extractor → Simple Classifier → Label. A training algorithm uses labeled training data to configure the classifier. The feature extractor reduces the high-dimensional data (the "data deluge") to a low-dimensional representation.]

Analog Feature Extractor

Low-rate and/or low-resolution ADC

Low data-rate digital I/O

Reduced memory requirements

[Signal chain with the feature extractor moved into the analog domain, ahead of the A/D: Sensor → Feature Extractor → A/D → Simple Classifier → Label.]

Histogram of Oriented Gradients

Analog Gradient Computation

Pixel neighborhood: $H_1$ and $H_2$ are the horizontal neighbors of pixel $P$; $V_1$ and $V_2$ are the vertical neighbors.

$G_H = H_1 - H_2$ (horizontal), $\quad G_V = V_1 - V_2$ (vertical)

Bright patch: $G_H = 400\,\mathrm{mV} - 100\,\mathrm{mV} = \mathbf{300\,mV}$

Dark patch (same scene at 1/4 the light): $G_H = \tfrac{1}{4}\cdot 400\,\mathrm{mV} - \tfrac{1}{4}\cdot 100\,\mathrm{mV} = \mathbf{75\,mV}$

High Dynamic Range Images

Gradient magnitude varies significantly across the image

Would require high-resolution ADCs (≥ 9b) to digitize the computed gradients
› But we want to produce as little data as possible

Ratio-Based (“Log”) Gradients

$G_H = \dfrac{H_1}{H_2}, \qquad G_V = \dfrac{V_1}{V_2}$

Under a uniform illumination change, both pixels scale by the same factor $\alpha$:

$G_H = \dfrac{\alpha\,H_1}{\alpha\,H_2} = \dfrac{H_1}{H_2}$ → Illumination Invariant
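To make the invariance concrete, here is a minimal Python sketch (illustrative values, not from the talk) contrasting linear and ratio gradients when the scene illumination scales by a factor alpha:

```python
def linear_gradient(p1, p2):
    """Linear gradient: difference of adjacent pixel values."""
    return p1 - p2

def ratio_gradient(p1, p2):
    """Ratio ('log') gradient: quotient of adjacent pixel values."""
    return p1 / p2

h1, h2 = 400, 100   # bright patch, pixel values in mV
alpha = 0.25        # illumination drops to 1/4 (dark patch)

print(linear_gradient(h1, h2))                  # 300
print(linear_gradient(alpha * h1, alpha * h2))  # 75.0 -> tracks illumination
print(ratio_gradient(h1, h2))                   # 4.0
print(ratio_gradient(alpha * h1, alpha * h2))   # 4.0  -> invariant to alpha
```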

Ratio Quantization

The pixel ratio $V_{pix1}/V_{pix2}$ is quantized directly:

$G_H = Q\!\left(\dfrac{H_1}{H_2}\right), \qquad G_V = Q\!\left(\dfrac{V_1}{V_2}\right)$

Ratios near 1 map to "no edge"; ratios beyond the thresholds map to "positive edge" or "negative edge" (*empirically determined thresholds). The slide shows a 1.5b (3-level) quantizer with thresholds at 1/1.3 and 1.3, and a finer 2.75b quantizer whose outermost thresholds sit at 1/4.2 and 4.2.
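A minimal sketch of the three-level (1.5b) ratio quantizer described above; the threshold value 1.3 follows the slide, but treat the exact numbers as illustrative:

```python
def quantize_ratio_1p5b(v_pix1, v_pix2, t=1.3):
    """Three-level (1.5b) quantizer for the pixel ratio v_pix1/v_pix2.

    Returns +1 (positive edge), -1 (negative edge), or 0 (no edge).
    Thresholds at t and 1/t are empirically determined (t = 1.3 per the slide).
    """
    r = v_pix1 / v_pix2
    if r > t:
        return +1      # positive edge
    if r < 1.0 / t:
        return -1      # negative edge
    return 0           # no edge: ratio close to 1
```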

HOG Feature Compression with 1.5b Gradients

Fewer angle bins

Quantizing histogram magnitudes

25× less data in the HOG features compared to the 8-bit image
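As a back-of-the-envelope check of how such compression arises, the sketch below compares the bit count of an 8-bit QVGA image against quantized HOG features. The cell size, bin count, and magnitude resolution are assumed for illustration and do not reproduce the design's exact 25× figure:

```python
# Data reduction of quantized HOG features vs. an 8-bit image.
# Cell size, angle bins, and magnitude bits are illustrative assumptions.
H, W = 240, 320          # QVGA frame
raw_bits = H * W * 8     # 8-bit image

cell = 8                 # pixels per HOG cell side (assumed)
bins = 4                 # angle bins (reduced, per the slide)
mag_bits = 4             # quantized histogram magnitude (assumed)
hog_bits = (H // cell) * (W // cell) * bins * mag_bits

print(raw_bits / hog_bits)   # ~32x with these assumptions (slide reports 25x)
```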

Log vs. Linear Gradients

[Image sequence repeated over three slides, comparing log and linear gradients under progressively less illumination; the log gradients remain stable as illumination drops.]

Prototype Chip

Row Buffers with Pixel Binning (Image Pyramid)

Ratio-to-Digital Converter (RDC)

Data-Driven Spec Derivation

Circuit errors enter the computed ratio as $\dfrac{H_1 + \epsilon_1}{H_2 + \epsilon_2}$; the error budget is derived from its effect on end-to-end detection accuracy.
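Data-driven spec derivation can be read as: inject the error terms into the ratio computation and find the largest error that leaves end-to-end accuracy essentially unchanged. A hedged sketch, where pixel_pairs, evaluate_accuracy, and the Gaussian error model are all placeholder assumptions:

```python
import numpy as np

def noisy_ratio(h1, h2, sigma, rng):
    """Ratio gradient with additive circuit error terms eps1, eps2."""
    eps1, eps2 = rng.normal(0.0, sigma, size=2)
    return (h1 + eps1) / (h2 + eps2)

def derive_spec(pixel_pairs, evaluate_accuracy, sigmas, trials=100):
    """Sweep the error magnitude (sigmas, ascending) and return the largest
    sigma whose mean accuracy stays within 1% of the error-free baseline.
    evaluate_accuracy() is an application-level hook, e.g. one that runs
    the downstream object detector on the gradient features."""
    rng = np.random.default_rng(0)
    baseline = evaluate_accuracy([h1 / h2 for h1, h2 in pixel_pairs])
    spec = 0.0
    for sigma in sigmas:
        accs = [evaluate_accuracy(
                    [noisy_ratio(h1, h2, sigma, rng) for h1, h2 in pixel_pairs])
                for _ in range(trials)]
        if baseline - np.mean(accs) > 0.01:
            break          # accuracy degraded; previous sigma is the spec
        spec = sigma
    return spec
```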

Chip Summary

• 0.13 µm CIS 1P4M
• 5 µm 4T pixels
• QVGA 320(V) x 240(H)
• 229 mW @ 30 FPS

Supply Voltages
Pixel: 2.5V
Analog: 1.5V, 2.5V
Digital: 0.9V

Sample Images

Results using Deformable Parts Model detection & custom database (PascalRAW)

Comparison to State of the Art

Information Preservation

Use Log Gradients as ConvNet Input?

Ongoing work; comparable performance using ResNet-10 (PascalRAW dataset)

Digital ConvNet Fabric

Can We Play Mixed-Signal Tricks in a ConvNet?

BinaryNet

Courbariaux et al., NIPS 2016

Weights and activations constrained to +1 and −1; multiplication becomes XNOR

Minimizes D/A and A/D overhead

Nice option for small/medium-size problems and mixed-signal exploration
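Why XNOR: with w, x ∈ {−1, +1} encoded as bits (+1 → 1, −1 → 0), the product w·x is +1 exactly when the bits agree, which is XNOR. A minimal sketch of a packed binary dot product:

```python
def binary_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of two n-element {-1,+1} vectors packed as bits
    (bit i = 1 encodes +1, bit i = 0 encodes -1).

    XNOR marks agreements; dot = agreements - disagreements
                                = 2 * popcount(xnor) - n.
    """
    xnor = ~(w_bits ^ x_bits) & ((1 << n) - 1)  # 1 where bits agree
    return 2 * bin(xnor).count("1") - n

# Example: w = [+1, -1, +1], x = [+1, +1, -1] -> dot = 1 - 1 - 1 = -1
print(binary_dot(0b101, 0b011, 3))  # -1
```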

Mixed-Signal Binary CNN Processor

Bankman et al., ISSCC 2018

1. Binary CNN with "CMOS-inspired" topology, engineered for minimal circuit-level path loading
2. Hardware architecture amortizes memory access across many computations, with all memory on chip (328 KB)
3. Energy-efficient switched-capacitor neuron for wide vector summation, replacing the digital adder tree

Original BinaryNet Topology

Zhao et al., FPGA 2017

88.54% accuracy on CIFAR-10
1.67 MB weight memory (68% in FC layers)
27.9 mJ/classification on FPGA

Mixed-Signal BinaryNet Topology

Sacrificed accuracy for regularity and energy efficiency

86.05% accuracy on CIFAR-10
328 KB weight memory
3.8 µJ per classification

Neuron

Naïve Sequential Computation

Weight-Stationary
[Array annotations: 256, ×2 (north/south), ×4 (neuron mux)]

Weight-Stationary and Data-Parallel
[Annotation: parallel broadcast]

Complete Architecture

Neuron Function

$z = \operatorname{sgn}\!\left(\sum_{i=0}^{1023} w_i x_i + b\right)$

$w$: filter weights; $x$: image patch (filter and patch are each 2×2×256, i.e., 1024 elements); $b$: filter bias in 9b sign-magnitude form:

$b = (-1)^s \sum_{i=0}^{7} 2^i m_i$
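A behavioral Python sketch of this neuron function (the chip realizes the sum with capacitors rather than arithmetic; the example vectors are random):

```python
import numpy as np

def sign_magnitude_bias(s, m):
    """9b sign-magnitude bias: b = (-1)^s * sum_{i=0}^{7} 2^i * m_i."""
    magnitude = sum((1 << i) * bit for i, bit in enumerate(m))  # m: 8 bits, LSB first
    return -magnitude if s else magnitude

def neuron(w, x, b):
    """z = sgn(sum_i w_i * x_i + b) with w_i, x_i in {-1, +1}.
    w and x are the flattened 2x2x256 filter and input patch (1024 elements).
    sgn(0) is mapped to +1 here (an arbitrary tie-break)."""
    return 1 if int(np.dot(w, x)) + b >= 0 else -1

rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=1024)
x = rng.choice([-1, 1], size=1024)
b = sign_magnitude_bias(s=1, m=[1, 0, 1, 0, 0, 0, 0, 0])  # b = -5
print(neuron(w, x, b))
```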

Switched-Capacitor Implementation

$\dfrac{v_{\mathrm{diff}}}{V_{DD}} = \dfrac{C_u}{C_{tot}}\left(\sum_{i=0}^{1023} w_i x_i + b\right), \qquad b = (-1)^s \sum_{i=0}^{7} 2^i m_i$

[Annotations mark the weights × inputs section and the bias & offset calibration section. Batch normalization is folded into the weight signs and the bias.]

Behavioral Simulations

Significant margin in noise, offset, and mismatch ($V_{FS}$ = 460 mV):
$C_u$ = 1 fF, $v_{os}$ = 4.6 mV RMS, $v_n$ = 460 µV RMS
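A behavioral Monte Carlo sketch in the spirit of these simulations, using the slide's noise and offset numbers; mapping the ±1 sum to volts via VFS/N and using uniformly random test vectors are simplifying assumptions:

```python
import numpy as np

# Behavioral Monte Carlo of the switched-capacitor neuron using the
# slide's numbers: VFS = 460 mV, v_os = 4.6 mV RMS, v_n = 460 uV RMS.
VFS, N = 0.460, 1024
V_OS, V_N = 4.6e-3, 460e-6

rng = np.random.default_rng(0)
trials, flips = 10_000, 0
for _ in range(trials):
    w = rng.choice([-1, 1], size=N)
    x = rng.choice([-1, 1], size=N)
    pre = int(np.dot(w, x))                 # ideal pre-activation (even integer)
    v_ideal = VFS * pre / N                 # ideal differential neuron voltage
    v_real = v_ideal + rng.normal(0, V_OS) + rng.normal(0, V_N)
    if pre != 0 and np.sign(v_real) != np.sign(v_ideal):
        flips += 1                          # output sign flipped by noise/offset
print(f"sign-flip rate with random vectors: {flips / trials:.2%}")
```

Note that random ±1 vectors concentrate the pre-activation near zero and thus stress the decision threshold; the measured chip (next slides) shows the margins suffice in practice, with mean accuracy matching the perfect digital model.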

“Memory-Cell-Like” Processing Element

1 fF metal-oxide-metal fringe capacitor
Standard-cell-based, 42 transistors
24107 F²

Die Photo

TSMC 28nm HPL 1P8M
6 mm² area
328 KB SRAM
10 MHz clock

Supply Voltages
VDD (digital logic): 0.6V – 1.0V
VMEM (SRAM): 0.53V – 1.0V
VNEU (neuron array): 0.6V
VCOMP (comparators): 0.8V

Measured Classification Accuracy

10 chips, 180 runs each through 10,000 CIFAR-10 test images: μ = 86.05%, σ = 0.40%

VDD = 0.8V, VMEM = 0.8V
3.8 μJ/classification
237 FPS, 899 μW
0.43 μJ in 1.8V I/O

Mean accuracy μ = 86.05%, the same as the perfect digital model

Comparison to Synthesized Digital

[Chart comparing the synthesized-digital BinarEye (Moons et al., CICC 2018) against this mixed-signal design]

Digital vs. Mixed-Signal Binary CNN Processor

[Chart: energy at 86.05% CIFAR-10 accuracy for three designs: synthesized digital (Moons et al., CICC 2018), hand-designed digital (projected), and mixed-signal (Bankman et al., ISSCC 2018)]

CIFAR-10 Energy vs. Accuracy*

Neuromorphic › [1] TrueNorth, Esser PNAS 2016
GPU › [2] Zhao FPGA 2017
FPGA › [2] Zhao FPGA 2017; [3] Umuroglu FPGA 2017
MCU › [4] CMSIS-NN, Lai arXiv 2018
Memory-like, mixed-signal › [5] Bankman ISSCC 2018
BinarEye, digital › [6] Moons CICC 2018
In-memory, mixed-signal › [7] Jia arXiv 2018

*energy excludes off-chip DRAM

Limitations of Mixed-Signal BinaryNet


Poor programmability

Relatively limited accuracy (even on CIFAR-10) due to 1b arithmetic

Energy advantage over customized digital is not revolutionary

› Same SRAM, essentially same dataflow

Need a more “analog” memory system to unleash larger gains

› In-memory computing

BinaryNet Synapse versus Resistive RAM

Switched-capacitor BinaryNet synapse:
• 0.93 fJ per 1b-MAC in 28 nm
• 24107 F²
• Single-bit

Resistive RAM (Tsai, 2018):
• Energy: TBD
• 25 F² (~960× smaller cell area)
• Multi-bit (?)

Matrix-Vector Multiplication with Resistive Cells

Typically use two cells per weight to achieve pos/neg weights (other schemes possible)
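A sketch of the two-cells-per-weight scheme: each signed weight is split into a pair of conductances (G+, G−), each column sums currents, and the difference of the two column currents recovers the signed dot product. Conductance normalization and names are illustrative, assuming weights in [−1, 1]:

```python
import numpy as np

def program_differential(W, g_on=1.0, g_off=0.0):
    """Map signed weights (in [-1, 1]) to conductance pairs: positive part
    on the G+ array, negative part on the G- array (two cells per weight)."""
    Gp = np.where(W > 0,  W, 0.0) * (g_on - g_off) + g_off
    Gn = np.where(W < 0, -W, 0.0) * (g_on - g_off) + g_off
    return Gp, Gn

def mvm(Gp, Gn, v_in):
    """Column currents I = G^T v; the signed result is their difference."""
    return Gp.T @ v_in - Gn.T @ v_in

W = np.array([[ 1.0, -0.5 ],
              [-1.0,  0.25]])      # rows: inputs, cols: outputs
v = np.array([0.3, 0.7])
print(mvm(*program_differential(W), v))   # [-0.4, 0.025]
print(W.T @ v)                            # matches the ideal product
```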

RRAM Density – BinaryNet Example

Array: 1024 rows × 256 columns, two cells per weight (2×)

25 F² cell @ F = 90 nm → cell side length s = 0.45 µm

2 × 1024 × 0.45 µm × 256 × 0.45 µm ≈ 0.106 mm² per layer

≈ 0.84 mm² for the complete 8-layer network
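The same arithmetic, spelled out (reproduces the slide's numbers):

```python
# Area arithmetic for the 8-layer BinaryNet mapped onto RRAM.
F = 90e-9                    # feature size [m]
cell_area = 25 * F**2        # 25 F^2 RRAM cell
s = cell_area ** 0.5         # cell side length -> 0.45 um
rows, cols = 1024, 256
per_layer = 2 * rows * s * cols * s   # two cells per weight (pos/neg)
print(s * 1e6)               # 0.45 (um)
print(per_layer * 1e6)       # ~0.106 (mm^2 per layer)
print(8 * per_layer * 1e6)   # ~0.84 (mm^2 for the 8-layer network)
```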

Ongoing Research

What is the best architecture?

How many levels can be stored in each cell?

What is the most efficient readout?

Can we cope with nonidealities using training techniques?

(Content deleted for online distribution)

VGG-7 Experiment (4.8 Million Parameters)

[Network: 3×3 conv layers with filter stacks 3×3×3, 3×3×128, 3×3×128, 3×3×256, 3×3×256, 3×3×512, followed by an 8192 → 10 fully connected classifier]

2-bit weights, 2-bit activations

Accuracy on CIFAR-10:
› 2b quantization only: 93%
› 2b quantization + RRAM/ADC model: 92%

Work in progress!
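A minimal sketch of a symmetric 2-bit uniform quantizer of the kind used in such experiments; the four-level codebook here is an illustrative choice, not necessarily the one used in this work:

```python
import numpy as np

def quantize_2b(x):
    """Symmetric 2-bit quantizer: nearest of 4 levels in [-1, 1]
    (an illustrative codebook; the experiment's exact scheme may differ)."""
    levels = np.array([-1.0, -1/3, 1/3, 1.0])
    x = np.clip(x, -1.0, 1.0)
    idx = np.argmin(np.abs(x[..., None] - levels), axis=-1)
    return levels[idx]

w = np.random.default_rng(0).normal(0, 0.5, size=(3, 3, 128))
w_q = quantize_2b(w / np.max(np.abs(w)))   # normalize, then quantize
print(np.unique(w_q))                      # the four codebook levels
```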

Energy Model for Column in Conv6 Layer

Optimal energy/MAC = 0.38 fJ

Summary


Analog feature extraction is attractive for wake-up detectors

Adding analog compute in ConvNets can be beneficial when it simultaneously lets us reduce data movement
› In-memory analog compute looks most promising
› Can consider SRAM or emerging memories (e.g., RRAM)

Expect significant progress as more application drivers for "machine learning at the edge" emerge
