
NTU CMOS Emerging Technology Group for Technology Direction and System Integration

Asst. Prof. Hao Yu ([email protected])

School of Electrical and Electronic Engineering

Nanyang Technological University, Singapore

http://www.ntucmosetgp.net


NTU CMOS Emerging Technology Group:

Research Summary I

Research areas

Energy-efficient ICs for data link/analytic: wired/wireless I/Os, accelerators

• 2011 MOE Tier-2 (PI): 800K-S$ on 3D I/O

• 2010 NRF POC (PI): 250K-S$ on CMOS 60GHz Terminal

• 2011 MOE Tier-1 (PI): 200K-S$ on CMOS THz Terminal

• 2012 A*STAR PSF Tier-2 (CO-PI): 800K-S$ on 3D Server

• 2012 NRF CRP (CO-PI): 6M-S$ on NVM server

• 2012 HiSilicon (PI): 102K-S$ on CMOS 60GHz PA

• 2014-2015 Intel (PI): 30K-S$ on high-speed link

• 2014-2015 Huawei (PI): 400K-S$ on big-data server

• 2014-2018 MediaTek JIP: High-speed link and power management

Multi-modal ICs for data collection: optical/IR/THz/chemical biomedical sensors

• 2010 DSO-DIRP (CO-PI): 590K-S$ on 3D MEMS Sensor

• 2012 NRF POC (CO-PI): 250K-S$ on Ion ISFET Sensor

• 2013-2015 TSMC (PI): 50mm^2 65nm CMOS imager/ISFET tapeout

• 2014-2015 JTC (PI): 180K-S$ wireless sensor with distributed machine learning

• 2011-2014 Advantest JIP: THz imaging

Manpower Training: 7 PhDs (4 to graduate in 2014), 3 MS/MEngs, 10 Research Staff


NTU CMOS Emerging Technology Group:

Research Summary II

Research Contributions

first CMOS meta-material 60GHz/140GHz/280GHz transceiver for near-field communication/imaging (2012)

first CMOS dual-mode sensor for DNA sequencing (2013)

first 3D/2.5D multi-core server (2014)

Collaborators: Prof. Dennis Sylvester/Prof. Wei Lu (Univ. of Michigan), Prof. Paul Franzon (NCSU), Prof. Sung Kyu Lim (Georgia Tech.), Prof. Sheldon Tan (UC-Riverside), Prof. Peter So (MIT), Prof. Krish Chakrabarty (Duke University), Prof. Xin Li (CMU), Dr. Tanay Karnik (Intel Lab), Dr. Ron Ho (Oracle Lab), Dr. Jinjun Xiong (IBM TJ-Watson Lab), Dr. Joshua Yang (HP Lab), Dr. Louis Scheffer (Howard Hughes Medical Institute), and Dr. William Yang (Huawei Shannon Lab).

Invited Talks: Intel Lab (Hillsboro), HP Lab (Palo Alto), IBM TJ-Watson Lab (Yorktown Heights), Qualcomm R&D (San Diego), TSMC R&D (Taiwan)

Publications: 90 conference papers (VLSI-SYMP/CICC/RFIC/IMS/DAC), 40 SCI journal papers (IEEE/ACM Trans.), 1 best paper award in ACM Trans., 1 keynote talk, 4 books by Springer/CRC, and 5 patent applications


Energy-Efficient ICs for Data Analytics

• 1 Core = Microprocessor (=6 Giga Flops @1.5GHz)
  • 4 FPUs + RegFiles

• 1 Chip = 742 Cores (=4.5 Tera Flops/s)
  • 213 MB of L1 I&D + 93 MB of L2

• 1 Node = 1 Chip + 16 DRAMs (16GB)

• 1 Group = 12 Nodes + 12 Routers (=54 Tera Flops/s)

• 1 Rack = 32 Groups (=1.7 Peta Flops/s)
  • 384 nodes / rack

• 1 Data Center (=1 Exa Flops/s)
  • 3.6EB of Disk Storage
  • 3.6PB = 0.0036 bytes/flops
  • 583 Racks

Bandwidth at 100Gbps, space of 20,000 sq. ft. and power of 68 MW !!!

Thousand cores in big memory
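The hierarchy on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check under stated assumptions: decimal units throughout (1 TFlops = 1000 GFlops), 6 GFlops per core taken directly from the slide, and the "3.6PB" figure read as total DRAM capacity (16GB per node summed over all nodes). That last reading is an assumption, not something the slide states.

```python
# Consistency check of the exascale hierarchy, using the slide's numbers.
# Assumptions (not from the slide): decimal units, and "3.6PB" read as total DRAM.
GFLOPS_PER_CORE = 6.0      # per core at 1.5GHz (slide)
CORES_PER_CHIP = 742       # slide
NODES_PER_GROUP = 12       # slide; one chip per node
GROUPS_PER_RACK = 32       # slide
RACKS = 583                # slide
DRAM_GB_PER_NODE = 16      # slide: 16 DRAMs, 16GB per node
POWER_MW = 68              # slide

chip_tflops = CORES_PER_CHIP * GFLOPS_PER_CORE / 1e3   # slide rounds to 4.5
group_tflops = NODES_PER_GROUP * chip_tflops           # slide quotes 54 (from 4.5 x 12)
rack_pflops = GROUPS_PER_RACK * group_tflops / 1e3     # slide rounds to 1.7
dc_eflops = RACKS * rack_pflops / 1e3                  # slide rounds to 1

nodes_per_rack = NODES_PER_GROUP * GROUPS_PER_RACK
total_nodes = RACKS * nodes_per_rack
dram_pb = total_nodes * DRAM_GB_PER_NODE / 1e6         # GB -> PB
bytes_per_flop = (dram_pb * 1e15) / (dc_eflops * 1e18)
gflops_per_watt = (dc_eflops * 1e9) / (POWER_MW * 1e6)

print(f"chip  : {chip_tflops:.2f} TFlops")
print(f"group : {group_tflops:.1f} TFlops")
print(f"rack  : {rack_pflops:.2f} PFlops ({nodes_per_rack} nodes)")
print(f"center: {dc_eflops:.3f} EFlops, {dram_pb:.2f} PB DRAM, "
      f"{bytes_per_flop:.4f} bytes/flops, {gflops_per_watt:.1f} GFlops/W")
```

Under these assumptions the rack, data-center, and bytes/flops figures come out matching the slide after rounding. The group figure comes out slightly below 54 because the slide multiplies the already rounded 4.5 Tera Flops per chip.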


Big-data-analytic by Logic-Memory-Integration:

GHz TSV and TSI I/Os

[H. Yu-NTU: ICCAD’06, DAC’13, DATE’13-14, ASPDAC’13-14, ICCAD’14, TVLSI’08, TODAES’09, TCAD’13-14, TC’14]

Figure panels: through-silicon via (TSV); through-silicon interposer (TSI)


8Gbps, 0.8mW TSI I/Os for 2.5D Integration

A forward-clock (FC) I/O with CTLE equalization

Lowest JTB variation of 35MHz, data rate of 8Gb/s, energy efficiency of 0.81mW/Gb/s with the full-rate structure, and area of 0.16mm^2

[H. Yu-NTU: 3D-IC’13, DATE’14, ISLPED’14, ICCAD’14]
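The link figures quoted on this slide relate to each other by simple unit conversions. The short sketch below works them through; it assumes the 0.81mW/Gb/s efficiency applies at the full 8Gb/s rate, and the variable names are illustrative only.

```python
# Unit conversions for the TSI I/O figures quoted on the slide.
# Assumption: the 0.81 mW/Gb/s efficiency is measured at the 8 Gb/s data rate.
DATA_RATE_GBPS = 8.0
ENERGY_EFF_MW_PER_GBPS = 0.81

# Total link power at the quoted data rate.
link_power_mw = ENERGY_EFF_MW_PER_GBPS * DATA_RATE_GBPS

# Energy per bit: (mW -> W) / (Gb/s -> b/s), then J -> pJ.
energy_per_bit_pj = (link_power_mw * 1e-3) / (DATA_RATE_GBPS * 1e9) * 1e12

# Unit interval (bit period) in picoseconds.
unit_interval_ps = 1e3 / DATA_RATE_GBPS

print(f"link power     : {link_power_mw:.2f} mW")
print(f"energy per bit : {energy_per_bit_pj:.2f} pJ/bit")
print(f"unit interval  : {unit_interval_ps:.0f} ps")
```

Note that 1 mW/Gb/s is numerically the same as 1 pJ/bit, so the "0.8mW" in the slide title is most naturally read as a per-Gb/s figure rather than total link power.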

Data-analytic Multi-core Microprocessor with 2.5D TSI I/Os

Multiple MIPS cores with H.264 video accelerators

Multiple external memory blocks

TSI I/Os between cores and memory

GF 65nm + IME TSI process

[H. Yu-NTU: 3D-IC’13, DATE’14, ISLPED’14, ICCAD’14]

140G/280G CMOS wired I/Os:

High-loss conventional T-lines and switches can be replaced by a low-loss, low-crosstalk surface-wave plasmonic interconnect (waveguide)

Modulator and coupler sizes can be reduced at 140G to save area, with larger bandwidth and a higher data rate (>10Gbps)

[H. Yu-NTU: APL’15]

CMOS sub-THz Wired I/Os

60G/140G/280G CMOS wireless I/Os:

In-phase sub-THz signal generation, transmission and detection with widest tuning range, highest output power, and best sensitivity

CMOS sub-THz Wireless I/Os

[H. Yu-NTU: IMS’12-14, ASSCC’12-13, RFIC’13, CICC’13-14, ESSCIRC’14, TMTT’13-15]

Figure (block diagram): 2056 (H) × 1600 (V) pixel array; row decoder/driver; two single-slope column ADC banks, one for odd columns (PIX_OUT<1,3,5…1027>) and one for even columns (PIX_OUT<0,2,4…1026>), each with column decoder/driver, SRAM, IDAC and a 10-bit output; SREG; ramp generator. Die size: 5mm × 5mm.

Data-analytic CMOS Image Sensor with 2.5D I/Os

1.1um 4-way-shared pixel; column-parallel with CDS readout + LVDS I/O; 60fps, 3 megapixels

TSMC 65nm BSI
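The sensor parameters on this slide imply a raw output bandwidth that the LVDS I/O has to carry. The estimate below is a rough sketch: it assumes every pixel of the 2056 × 1600 array is read out at the full 10-bit depth on every frame, with no blanking or encoding overhead, and that the odd and even column banks share the load equally. None of these assumptions is stated on the slide.

```python
# Rough raw-bandwidth estimate for the image sensor described on the slide.
# Assumptions: full-array readout every frame, 10 bits per pixel, no blanking
# or line-coding overhead, traffic split evenly between the two column banks.
COLS, ROWS = 2056, 1600
BITS_PER_PIXEL = 10
FRAMES_PER_SECOND = 60
OUTPUT_BANKS = 2           # odd-column and even-column outputs

pixels_per_frame = COLS * ROWS
raw_bits_per_second = pixels_per_frame * BITS_PER_PIXEL * FRAMES_PER_SECOND
per_bank_bits_per_second = raw_bits_per_second // OUTPUT_BANKS

print(f"pixels per frame : {pixels_per_frame / 1e6:.2f} Mpixel")
print(f"raw data rate    : {raw_bits_per_second / 1e9:.2f} Gb/s")
print(f"per output bank  : {per_bank_bits_per_second / 1e9:.2f} Gb/s")
```

The pixel count works out to about 3.3 Mpixel, consistent with the "3Mega" label. The resulting data rate of roughly 2 Gb/s is an upper-level estimate of the traffic, not a measured figure.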

Figure (architecture): External Memory 900kB (1280*720); Integrator; Squared Integrator; Integrated Image Buffer 21b, 21kB (84x96); Squared Integrated Image Buffer 21b, 21kB (84x96); Face Detection; Feature Memory 26kB (2135x96b); Face Buffer; Principal Components Analysis; Eigen Face Memory 97kB (50 faces); Extreme Learning Machine; ELM Memory 151kB; Final Decision. Stages labelled: face detection, face recognition. Chip boundary marked. Die size: 3mm × 3mm, TSMC 40nm.

Data-analytic Accelerator for Face Recognition

ISSCC’15: Collaboration with UMICH Dennis Sylvester

10mW, 5fps, 1280x720 CMOS SoC for face detection, recognition and tracking
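The architecture for this SoC includes an integrated image buffer of 84x96 entries at 21 bits each. A minimal sketch of a standard integral image (summed-area table) shows why 21 bits is a natural width, assuming 8-bit input pixels. The pixel depth, the row/column orientation of the 84x96 window, and the function names are assumptions for illustration; this is not the chip's actual datapath.

```python
# Integral image (summed-area table) sketch with a bit-width check.
# Assumptions: 8-bit pixels and an 84-row by 96-column window.

def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def window_sum(ii, top, left, height, width):
    """Sum of any rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

ROWS, COLS, MAX_PIXEL = 84, 96, 255
worst_case = [[MAX_PIXEL] * COLS for _ in range(ROWS)]
ii = integral_image(worst_case)

max_entry = ii[ROWS][COLS]                 # largest value the table can hold
bits_needed = max_entry.bit_length()       # width needed for the worst case
buffer_bytes = ROWS * COLS * bits_needed // 8

print(f"max entry {max_entry}, {bits_needed} bits, buffer {buffer_bytes} bytes")
```

With these assumptions the worst-case entry fits in 21 bits and the buffer comes to roughly 21kB, in line with the figure label. The four-lookup `window_sum` is what makes rectangle features cheap to evaluate in detection pipelines of this kind.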

Big-data-analytic by Logic-in-Memory:

Non-volatile Computing

[H. Yu-NTU: ISLPED’12-13, DATE’14, ASPDAC’14, TNANO’12, TVLSI’14, Springer’14]

Figure: domain-wall (DW) nanowire logic-in-memory circuits. Panels: DW nanowire shift/read/write structure (SHF1, SHF2, RD, WR1, WR2; Load A, Load B, Output); DW full adder (inputs A, B, Cin; outputs SUM, Cout); nanowire memory array with word-line decoder, bit-line decoder, column mux & sense amplifiers (BL, BLB); parallel output by distributing bits into separate nanowires (1st bit, 2nd bit, ..., 8th bit); sigmoid function by DWM-LUT, mapping x to ~1/(1+e^-x).

Nonvolatile-in-memory Imaging Accelerator

Machine learning for super-resolution imaging

Comparisons with conventional architecture

1. All operations involved in machine learning on a neural network can be mapped to a logic-in-memory architecture built from non-volatile domain-wall nanowires.

2. I/O traffic in the proposed DW-NN is greatly alleviated, with an energy-efficiency improvement of 92x and a throughput improvement of 11.6x compared to a conventional image-processing system on a general-purpose processor.
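One of the operations mapped into memory on this slide is the sigmoid activation, realised as a lookup table in domain-wall memory (DWM-LUT). The sketch below illustrates the general lookup-table idea in software. The table size (256 entries), the input range [-8, 8), and the 8-bit output are assumptions chosen for illustration and are not taken from the slides.

```python
import math

# Lookup-table approximation of the sigmoid 1/(1+e^-x).
# Hypothetical parameters: 256 entries, inputs clamped to [-8, 8), 8-bit outputs.
ENTRIES = 256
X_MIN, X_MAX = -8.0, 8.0
OUT_BITS = 8
OUT_MAX = 2 ** OUT_BITS - 1
STEP = (X_MAX - X_MIN) / ENTRIES

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each entry stores the quantized sigmoid at the centre of its input bin.
LUT = [round(sigmoid(X_MIN + (i + 0.5) * STEP) * OUT_MAX) for i in range(ENTRIES)]

def lut_sigmoid(x):
    """Approximate sigmoid(x) with one table read; out-of-range inputs saturate."""
    index = int((x - X_MIN) / STEP)
    index = max(0, min(ENTRIES - 1, index))
    return LUT[index] / OUT_MAX

# Worst-case error over a sweep of inputs, including out-of-range values.
max_error = max(abs(lut_sigmoid(k / 100) - sigmoid(k / 100))
                for k in range(-1000, 1001))
print(f"max abs error over [-10, 10]: {max_error:.4f}")
```

The appeal of a table in this setting is that the activation becomes a single memory read, so no exponential has to be computed in logic. Accuracy is set by the table size and output width, which trade directly against storage.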


Multi-modal ICs for Data Collection

System miniaturization for point-of-care personal data collection: microscope, NMR, flow cytometer, PCR, network analyzer


Multi-modal CMOS sensor: electrical, optical and chemical

Microfluidic channel: molecules, tissues, cells, and biofilms

LoC system: high throughput, non-invasive, large array, on-chip processing for portable diagnosis

CMOS based Multi-modal Lab-on-chip


CMOS Sub-THz Imaging System

[H. Yu-NTU: IMS’12-14, ASSCC’12-13, RFIC’13, CICC’13-14, ESSCIRC’14, TMTT’13-15]

CMOS Optofluidic Imaging System for Cell Micro-flow Cytometer

[H. Yu-NTU: PLS-ONE’14, LoC’14]

CMOS Dual-mode ISFET Sensor for DNA Sequencing and Food Safety

Parameters                     Specifications
Process                        Standard TSMC 0.18μm CIS
Pixel Type                     Dual-Mode (Image and Chemical)
Pixel Size                     10μm×10μm
Pixel Optical Sensing Area     20.1μm^2 (FF=18.1%)
Pixel Chemical Sensing Area    22.3μm^2 (FF=20.1%)
Array Size                     64×64
Die Area                       2.5mm×5mm
ADC ENOB                       11.4 bits
ADC SNDR                       70.35dB
FPN                            0.3%
Frame Rate                     1200fps
Total Power Consumption        32mA @ 3.3V

[H. Yu-NTU: IEEE VLSI-SYMP’14 Highlighted, ISMM’14 Keynote Talk]
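Several entries in the specification table can be related to one another with standard formulas. The sketch below derives the ADC effective number of bits from the quoted SNDR using the usual ideal-quantizer relation ENOB = (SNDR - 1.76)/6.02, converts the supply current into power, and computes the pixel throughput. It assumes the 32mA figure is the total draw from a single 3.3V supply and that the full 64×64 array is read every frame.

```python
# Derived figures from the dual-mode ISFET sensor specification table.
# Assumptions: single 3.3 V supply carrying the full 32 mA; full-array readout.
SNDR_DB = 70.35
SUPPLY_V = 3.3
SUPPLY_MA = 32
ARRAY_ROWS, ARRAY_COLS = 64, 64
FRAME_RATE_FPS = 1200

# Standard relation between SNDR and effective number of bits.
enob_bits = (SNDR_DB - 1.76) / 6.02

# Power in milliwatts: mA x V.
total_power_mw = SUPPLY_MA * SUPPLY_V

# Pixel samples converted per second across the whole array.
pixels_per_second = ARRAY_ROWS * ARRAY_COLS * FRAME_RATE_FPS

print(f"ENOB from SNDR : {enob_bits:.2f} bits")
print(f"total power    : {total_power_mw:.1f} mW")
print(f"pixel rate     : {pixels_per_second / 1e6:.2f} Mpixel/s")
```

The derived ENOB rounds to the 11.4 bits listed in the table, which suggests the two ADC entries are consistent with each other.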

Measurement: Accurate Large-arrayed Local pH Correlation

Figure: contact image and pH map. Scales: pH scale bar (0 to 14); digital output (12-bit), ticks at 1600, 2400, 3200.

Thank You!

EDB for JIP Sponsorship; MTK and GF for 65nm tapeout

http://www.ntucmosetgp.net

Email: [email protected]

Skype: hao.yu.ntu
