20
Distribution A. Approved for public release; distribution is unlimited Distribution A. Approved for public release; distribution is unlimited NARESH UNIVERSITY OF ILLINOIS AT SHANBHAG URBANA-CHAMPAIGN

naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimitedDistribution A. Approved for public release; distribution is unlimited

NARESH

UNIVERSITY OF ILLINOIS AT

SHANBHAG

URBANA-CHAMPAIGN

Page 2: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views,opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views orpolicies of the Department of Defense or the U.S. Government.

Page 3: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

THE TEAM

Mike Burkland Naveen Verma(Princeton)

Pavan Hanumolu(UIUC)

Naresh Shanbhag(UIUC)

[systems & ckts][mixed-signal ICs][systems & ckts]

• L.-Y. Chen• P. Deaville• B. Zhang

• A. Patil, H. Hua, S. Gonugondla, K.-H. Kim

Ajey Jacob

[devices, foundry]

• S. Soss, D. Brown

• N. Gaul, B. Paul, B. Lanza

• J. Broussard• L. Suantak• J. Goulding• S. Cole• T. Gangwer• R. Kelley• C. Contreras

[DoD systems]

Raytheon MS GLOBALFOUNDRIES Princeton University of Illinois at Urbana-Champaign

Page 4: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

REALIZING ARTIFICIAL INTELLIGENCE @ THE EDGE

AI at the Edge

model

data

AI in the Cloud(autonomous cognitive agent)

• client-server model• efficiency challenged• security + privacy issues

• energy-latency efficient, securebut……

• volume + resource constraineddecisions

[source: pexels.com][source: pexels.com]

Page 5: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

THE DATA MOVEMENT PROBLEM𝑬𝑬𝒎𝒎𝒎𝒎𝒎𝒎𝑬𝑬𝒎𝒎𝒎𝒎𝒎𝒎

≈ ~𝟏𝟏𝟏𝟏𝟏𝟏 × (SRAM) → ~𝟓𝟓𝟏𝟏𝟏𝟏 × (DRAM) → ~𝟏𝟏𝟏𝟏𝟏𝟏𝟏𝟏 × (Flash)

[Canziani, Arxiv’16]

The Memory Wall in von Neumann Architectures

Page 6: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

fundamental question

how do we design intelligent autonomous machinesthat operate at the limits of energy-delay-accuracy?

Page 7: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

Proceedings of IEEE, Special Issue on non von Neumann Computing, January 2019.

Systems on Nanoscale Information fabriCswww.sonic-center.org

[2013-’17](STARnet Program by Semiconductor Research Corporation & DARPA)

Page 8: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

SHANNON-INSPIRED MODEL OF COMPUTING

use information-based metrics e.g., mutual information 𝐼𝐼(𝑌𝑌𝑜𝑜; �𝑌𝑌)1design low SNR fabrics, e.g., deep in-memory architecture (DIMA)2

develop statistical error-compensation (SEC) techniques3

(noisy channel)

DIMA

[ISSCC’18]

Page 9: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

integratethe MRAM device

within a

deep in-memory architecture (DIMA)

using

Shannon-inspired compute models

FRANC OBJECTIVEGLOBALFOUNDRIES PRINCETON & UIUC RAYTHEON MISSILE SYSTEMS

thermally limited

MRAM device

deep in-memory arch.

Shannon-inspired compute models

autonomous platforms

Page 10: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

[ICASSP 2014 UIUC, JSSC 2017 Princeton, JSSC 2018 UIUC]

read functions of datastandard bit-cell array(preserves storage density)

THE DEEP IN-MEMORY ARCHITECTURE (DIMA)

Page 11: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

SRAM DIMA PROTOTYPES

Multi-functionalinference processor

(65nm CMOS)

Random forest processor

(65nm CMOS)

On-chip training processor

(65nm CMOS)

Fully (128) row-parallel compute

(130nm CMOS)

100X EDP reduction over von Neumann equivalent* @ iso-accuracy* 8b fixed-function digital architecture with identical SRAM size

[Feb. JSSC’18] [ESSCIRC’17, JSSC July’18] [ISSCC’18, JSSC Nov.’18] [VLSI’16, JSSC’17]

Page 12: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

MRAM OPPORTUNITIES & CHALLENGES

high conductance → large array currentssmall TMR → high sensitivity readouthuge 𝑅𝑅𝑀𝑀𝑀𝑀𝑀𝑀 − 𝑅𝑅𝑀𝑀𝑀𝑀𝑀𝑀 limits cell compute modelshigh cell density → severe pitch-matching const.high MTJ process variation restricts bit-line SNR

MTJ resistance distribution

Free Layer

Ref. Layer

Tunnel Barrier

Select Transistor

Top Electrode

BottomElectrode

WL

Source Line

pMTJ stack 40Mb MRAM demo

Challenges:Opportunities:high density → high on-chip compute densitynon-volatility → ultra low-power duty-cycled operationin production → enables system prototyping &

reduces time to DoD availability

Page 13: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

four DIMAs proposed – two selected for prototypingBit-line Sensing (DIMA-BLS) Current Drive Time-based

Sensing (DIMA-CDTS)

𝑥𝑥[𝑖]

𝑀𝑀 × 𝑁𝑁 single-shot matrix vector multiply Functional read: 5-bit × 4-bit dot products on SLs Compute cell: 1T-1MTJ ( 4 × 1 multiplication) SL current sensing via time-based 5-bit ADCs SNR vs. energy trade-off due to sensing circuits

𝑀𝑀 × 𝑁𝑁 single-shot matrix vector multiply Functional read: binary dot products on BLs Compute cell: 2T-2MTJ (1b XNOR/AND) BL voltage sensing via 4-bit SAR ADCs SNR vs. energy trade-off due to sensing circuits

Page 14: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

EDP GAINS OVER VON NEUMANN EQUIVALENT

(includes all peripherals)

43× EDP reduction (256-dimensional dot products)

250×-to-1000× EDP reduction (128-dimensional dot products)

DIMA-BLS

DIMA-CDTS

MVM: matrix-vector multiply

Page 15: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

SHANNON-INSPIRED COMPUTE MODELS

• relaxes MRAM-based DIMA’s compute SNR requirements → enables substantial energy savings

• two methods: stochastic data-driven hardware resilience & coded DIMA

MRAM-based DIMA

use information-based metrics develop error-compensation techniques

Page 16: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

STOCHASTIC DATA-DRIVEN HARDWARE RESILIENCE

• noise-tolerance enhanced well beyond MTJ variations

• can be exploited for energy-SNR tradeoff

MRAM-based DIMA InferenceModel Parameters𝜃𝜃𝑀𝑀𝑀𝑀𝑀𝑀𝑆𝑆𝑆𝑆(𝑥𝑥,𝐺𝐺)G (MRAM variation, ADC noise) gi

DIMA-aware Stochastic Training

[B. Zhang, ICASSP’19]

Simulated MRAM-based BNN (CIFAR-10):

Page 17: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

CODED DIMA - ENHANCING DIMA’S COMPUTE SNRMTJ variation aware coding enables HIGH system SNR in spite of LOW circuit SNR

part

ial d

ecim

atio

n-in

-tim

eA

rchi

tect

ure

for m

appi

ng

FFT

on D

IMA

N-point Input Sequence

N/ M-pointInput

Sequence

N/ M-pointInput

Sequence

Input shuffling

N/ M-pointDFT

N/ M-pointDFT

Merge N/ M-point DFTs

N-point DFTCoefficients

N- p

oint

DFT

DFT, DHT, QFT

Coefficient precision (𝐵𝐵𝑊𝑊)

Out

put S

NR(d

B)

Limited byinput SNR

Limited byMRAM device

variation

Limited byquantization

noise

Input SNR: 40 dB; = 8; = 128

18dB

61 dB

41 dB

MRAM-DI MAWrite data

Functional read

Calculate error

Calibrate

write-time compensation (WTC)of MRAM variations

WTC

MTJ

𝐵𝐵𝑥𝑥 𝑁𝑁

Page 18: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

CURRENT STATUS

Technology transition via SRAM-based DIMA DIMA prototype designs taped out

(2019) [single-bank] validate EDPgains & calibrate models

(2020) [multi-bank] enable scaling and system integration

DIMA technology transition to RMS foralgorithm prototyping & evaluation

facilitates future MRAM-based DIMAintegration by RMS

verified & quantified RMS’s mission capability enabled by MRAM-based DIMA MRAM-based DIMA demonstrates > 2 orders-of-magnitude EDP reduction

[H. Jia, arXiv:1811.04047, Princeton]

Page 19: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

Distribution A. Approved for public release; distribution is unlimited

NEXT STEPS & SUMMARY

Mission: To realize > 200X in EDP gains in DoD workloads by integrating MRAM devicewithin Deep In-memory Architectures using Shannon-inspired compute models

FRANC Program: “to provide the foundation for new materials technology and newintegration approaches to be exploited in pursuit of novel compute architectures”

Multi-functioned DIMAs for Diverse Applications Parameterized DIMAs for Platform-design Tools

Summary

[Srivastava, et al., ISCA’18]

Page 20: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES