ICCAD ’ 03 Review CSE 597B Lin Li. Outline Overview Archive download URL Best paper award Paper from our group Interesting tutorial Paper in related areas

ICCAD’03 Review

CSE 597B

Lin Li

Outline

Overview
    Archive download URL
    Best paper award
    Paper from our group
    Interesting tutorial

Paper in related areas
    Power and energy optimization
    Interconnect-centric SoC design
    Reliability issues
    Performance optimization
    Simulation at the nanometer scale

Other areas in ICCAD

Archive Download URL

Papers and presentation slides can be downloaded from:

http://www.iccad.com/archive.html

Best Paper Award

6C.1 - Noise Analysis for Optical Fiber Communication Systems
Alper Demir, KOC University, Sariyer-Istanbul, Turkey

8B.1 - Block-Based Static Timing Analysis with Uncertainty
Anirudh Devgan, Chandramouli Kashyap, IBM Research at Austin, IBM Microelectronics

Paper from Our Group

1A.1 - Adaptive Error Protection for Energy Efficiency
Lin Li, N. Vijaykrishnan, Mahmut Kandemir, Mary Jane Irwin

3C.1 - Array Composition and Decomposition for Optimizing Embedded Applications
Guilin Chen, Mahmut Kandemir, Ugur Sezer, Avanti Nadgir

Interesting Tutorial

2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology
Kerry Bernstein, Ching-Te Chuang, Rajiv V. Joshi, Ruchir Puri, IBM T.J. Watson

11B.1 - Formal Methods for Dynamic Power Management
Rajesh K. Gupta, Sandeep Shukla, Sandy Irani, UCSD, UCI, and VT

2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology

Introduction
CMOS device scaling
New devices for high-performance logic
    Planar device structures
        Partially-depleted (PD) SOI
        Fully-depleted (FD) SOI
        Strained-Si & high-k gate
    Emerging technologies
        Double-gate MOSFETs
        3D integration and interconnects
        Carbon Nanotube Transistor (CNT)
        Molecular computing
CAD challenges
    Challenges of advanced device technologies
    Major issues
        Power crisis
        Coping with variability


11B.1 - Formal Methods for Dynamic Power Management

Gives an overview of the formal methods that have been explored in solving the system-level Dynamic Power Management (DPM) problem.

Shows how formal reasoning frameworks can unify apparently disparate DPM techniques.

Covers approaches that treat the DPM problem as one of stochastic optimization with probabilistic guarantees on performance.

Power and Energy Optimization

Using dynamic voltage scaling in embedded systems (Session 1B)

Using software techniques in embedded systems (Session 3C)

Energy issues in systems design (Session 7B)

Power-aware design (Session 8C)

1B.1 - Generalized Network Flow Techniques for Dynamic Voltage Scaling in Hard Real-Time Systems

Vishnu Swaminathan, Krishnendu Chakrabarty ECE@Duke

Energy consumption must be carefully balanced with real-time responsiveness in hard real-time systems.

Present an optimal offline dynamic voltage scaling (DVS) scheme for dynamic power management in such systems.

[Figure: Generalized network flow model for the DVS problem. Node columns are labeled Jobs (j1 ... jn), Speeds (s1h, s1i, s1l, ..., snh, sni, snl), and Intervals; each arc from node i to node j carries a parameter tuple beginning lij, uij, Cij.]

1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages

Shaoxiong Hua, Gang Qu ECE@UMCP

For a multiple-voltage DVS system to serve a set of applications {(ei, di, pi): i = 1, 2, …, n} without missing their deadlines:

    if the system has m voltages {v1, v2, …, vm}, determine the value of each vi to minimize the energy consumption;

    determine m and the value of each vi.

1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages (Cont’d)

Voltage set-up is the fundamental problem for a multiple-voltage DVS system.
    Application-specific 2-voltage DVS system: analytic solutions and a linear search algorithm
    m-voltage DVS system: an analytic solution does not exist; an approximation method is used
A multiple-voltage system can come very close to the maximal energy saving achievable by DVS.
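To make the 2-voltage case concrete, here is a minimal sketch of how a single job could be split between a high and a low voltage so that it finishes exactly at its deadline. This is not the paper's algorithm. It assumes (as labeled in the comments) that energy per cycle is proportional to the square of the supply voltage, that the clock frequency at each voltage is known, and that switching overhead is zero; the function names are hypothetical.

```python
def cycles_at_high_voltage(cycles, deadline, f_lo, f_hi):
    """Cycles to run at f_hi so the job finishes exactly at the deadline.

    Assumption: the rest of the cycles run at f_lo, with no switching cost.
    """
    if cycles / f_hi > deadline:
        raise ValueError("deadline cannot be met even at the high voltage")
    if cycles / f_lo <= deadline:
        return 0.0  # the low voltage alone is fast enough
    # Solve x / f_hi + (cycles - x) / f_lo = deadline for x.
    return (deadline - cycles / f_lo) / (1.0 / f_hi - 1.0 / f_lo)


def job_energy(cycles_hi, cycles_lo, v_hi, v_lo, c_eff=1.0):
    """Assumed energy model: c_eff * V^2 per executed cycle."""
    return c_eff * (v_hi ** 2 * cycles_hi + v_lo ** 2 * cycles_lo)


if __name__ == "__main__":
    total = 100.0
    x = cycles_at_high_voltage(total, deadline=1.5, f_lo=50.0, f_hi=100.0)
    mixed = job_energy(x, total - x, v_hi=2.0, v_lo=1.0)
    all_high = job_energy(total, 0.0, v_hi=2.0, v_lo=1.0)
    print(f"cycles at high voltage: {x:.1f}")
    print(f"energy mixed: {mixed:.1f}, energy all-high: {all_high:.1f}")
```

Under this model, the closer the available voltages sit to the speed the workload actually needs, the less energy is wasted, which is why the choice of voltage values matters.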

1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems

Le Yan, Jiong Luo, Niraj K. Jha EE@Princeton

New scheduling algorithm that combines DVS and adaptive body biasing (ABB) to simultaneously optimize both dynamic power consumption and leakage power consumption for real-time distributed embedded systems.


A novel two-phase approach

    Phase I: Optimal tradeoff between supply and threshold voltages

    Phase II: Trade off energy consumption and clock period

[Flowchart of Phase II, entered from Phase I. Boxes: Initializations; Allocate slack to reference task (the task with the highest energy_derivative); Allocate slack to each other task (energy_derivative higher than reference level); Invalidate this slack allocation; Return. Decisions: "EST+WCET>LFT?" and "Extensible tasks exist?".]

3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning

Jinfeng Liu, Pai H. Chou ECE@UCI

Goal
    Energy minimization for distributed embedded processors
Combined optimization
    Selection of optimal compression algorithm
    Functional partitioning

3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning (Cont’d)

[Figure: timelines of nodes N1 and N2 (RECV, PROC, SEND, and IDLE phases), both at 150MHz. Non-optimal without compression.]

A bad partitioning scheme that produces extra I/O load, without compression.

[Figure: the same partitioning with decompression (DECO) and compression (COMP) phases added on each node, both at 80MHz. Optimal with compression.]

However, it could turn out optimal with compression, if the data from N1 to N2 can be compressed well.

3C.4 - Energy-Aware Fault Tolerance in Fixed-Priority Real-Time Embedded Systems

Ying Zhang, Krishnendu Chakrabarty, Vishnu Swaminathan ECE@Duke

Goal: low power, fault-tolerant real-time systems

Fault tolerance is achieved via checkpointing

Power management is carried out using dynamic voltage scaling (DVS).

7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers

Ali Iranli, Hanif E. Fatemi, Massoud Pedram EE@USC

A hierarchical formulation for energy optimization of wireless transceivers is proposed.

A game-theoretic approach to solving this energy minimization problem is proposed; it reduces energy consumption by 15% for BER = 10^-5.

The proposed hierarchical framework can be used in general for energy optimization of server-client systems.

7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers (Cont’d)

Transceiver energy optimization as a Stackelberg game

Transmitter (leader)
    Leader’s policy: transmit power & modulation level
    Leader’s cost function: overall energy consumption

Receiver (follower)
    Follower’s policy: truncation length
    Follower’s cost function: receiver's energy consumption
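The leader/follower structure of a Stackelberg game can be sketched by brute-force enumeration: for every leader action, compute the follower's best response, then let the leader pick the action whose anticipated outcome is cheapest. The sketch below is a generic illustration, not the paper's formulation. The action sets and both cost functions are made-up toy values chosen only to show the mechanics.

```python
def solve_stackelberg(leader_actions, follower_actions, leader_cost, follower_cost):
    """Return (leader_action, follower_action, leader_cost) at the
    Stackelberg solution of a small finite game, found by enumeration."""
    best = None
    for a in leader_actions:
        # The follower reacts optimally to the leader's announced action.
        b = min(follower_actions, key=lambda f: follower_cost(a, f))
        cost = leader_cost(a, b)
        if best is None or cost < best[2]:
            best = (a, b, cost)
    return best


if __name__ == "__main__":
    # Hypothetical toy model: the leader picks a transmit power level p,
    # the follower picks a truncation length t.
    power_levels = [1, 2, 3]
    truncation_lengths = [1, 2, 3, 4]

    # Toy follower cost: its own effort t plus a penalty when p + t
    # misses a made-up quality target of 4.
    def receiver_energy(p, t):
        return t + 10 * abs(4 - p - t)

    # Toy leader cost: a stand-in for overall energy consumption.
    def overall_energy(p, t):
        return 2 * p + t

    print(solve_stackelberg(power_levels, truncation_lengths,
                            overall_energy, receiver_energy))
```

Enumeration is only practical for small discrete action sets; it is used here because it makes the order of play explicit.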

7B.2 - Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization

Girish V. Varatkar, Radu Marculescu ECE@CMU

Recent work in ES community: performance and energy are crucial!

Voltage selection
    Task scheduling algorithm should use the foresight that voltage selection is going to follow the scheduling step
    Schedule should provide the maximum slowing-down potential

This work brings the communication aspect into the picture
    A ‘communication-centric’ approach
    A ‘voltage selection’ approach

7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches

Praveen G. Kalla, Xiaobo Sharon Hu, Joerg Henkel CSE@Notre Dame

LRU to LRU-SEQ (Sequential LRU)
    Constraining sequential fetches to the same bank (same way) avoids bank transitions.
    It also increases the sleep time for the banks, overcoming break-even time requirements.
    LRU nature has to be maintained, else associativity is lost (hit-ratio is affected).
    Distance between the last fetched line and the present line is a parameter that will affect the performance of this policy.

7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches (Cont’d)

FOR (every cache access) DO
    IF (access == HIT) THEN
        P_way = C_way
    ELSE
        dist = abs(Curr_Addr - Prev_Addr);
        IF (dist <= SEQ_DST) THEN
            C_way = P_way
        ELSE
            C_way = LRU_Way
        END
    END
    Update LRU state for access.
END

P_( ) : Previous_( )
C_( ) : Current_( )

State Holder 1: P_way (entire cache)
State Holder 2: P_line (each cache way)
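The way-selection step of the LRU-SEQ pseudocode can be written as a small runnable function. This is a minimal sketch that follows the pseudocode literally. The function name, the argument names, and the choice to pass the LRU victim in from outside are assumptions made for illustration, and the LRU bookkeeping itself is left out.

```python
def lru_seq_select(hit, hit_way, curr_addr, prev_addr, p_way, lru_way, seq_dst):
    """One LRU-SEQ way-selection step.

    hit      -- True if the access hit in the cache
    hit_way  -- way that hit (ignored on a miss)
    p_way    -- way used by the previous access (State Holder 1)
    lru_way  -- victim the plain LRU policy would pick (assumed given)
    seq_dst  -- SEQ_DST threshold on the address distance
    Returns (c_way, p_way): the way used now and the updated previous way.
    """
    if hit:
        c_way = hit_way
        p_way = c_way              # remember where the last hit landed
    else:
        dist = abs(curr_addr - prev_addr)
        if dist <= seq_dst:
            c_way = p_way          # sequential fetch: stay in the same way/bank
        else:
            c_way = lru_way        # non-sequential fetch: fall back to plain LRU
    # The caller is expected to update the LRU state for this access.
    return c_way, p_way


if __name__ == "__main__":
    print(lru_seq_select(True, 2, 100, 96, 0, 3, 8))      # hit in way 2
    print(lru_seq_select(False, None, 104, 100, 2, 3, 8)) # sequential miss
    print(lru_seq_select(False, None, 500, 100, 2, 3, 8)) # distant miss
```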

7B.4 - Compiler-Based Register Name Adjustment for Low-Power Embedded Processors

Peter Petrov, Alex Orailoglu CSE@UCSD

Compiler-driven register name adjustment for low power was proposed
    Register names reassigned without incurring any performance or power overhead
    No hardware support required whatsoever
    Efficient algorithm for Register Name Adjustment proposed, with an additional frequency skew enhancing phase

8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches

Nam S. Kim, David Blaauw, Trevor N. Mudge EECS@UMICH

The cost-effective number of VTHs for cache leakage reduction depends on the target access time, but 1 or 2 high VTHs are enough for leakage reduction.

Cache leakage: another design constraint in processor design
    Trade-off among delay / area / leakage
    Incorporates realistic cache miss statistics into the leakage optimization


ITRS 2002 projections with doubling of # of transistors every two years

Using high-k dielectric reduces gate-oxide leakage

[Figure: cache sub-bank organization, showing Abus buffer w/ repeater, decoder, memory cell, word-line, bit-line pair, sense-amp w/ I/O circuits, and Dbus buffer w/ repeater, annotated with threshold voltages VTH1 to VTH4.]

Circuit model based on CACTI

70nm Berkeley predictive technology model

Interconnect R/C annotated

Repeaters used to minimize interconnect delay

8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips

Krishna Sekar, Kanishka Lahiri, Sujit Dey ECE@UCSD

Described design techniques for dynamically customizing a general-purpose configurable platform

Dynamic platform management helps combine benefits of general-purpose & application-specific approaches

Benefits
    Improved application performance
    More efficient platform resource usage
    Improved energy efficiency


[Figure: design spectrum from General-Purpose Processors through General-Purpose Configurable Platforms, Customized Platforms, and Domain Specific Platforms to ASIC / Custom SoC. One axis: improving performance, power, size. Other axis: improving flexibility, time-to-market, engg. cost, time-in-market. Platform customization techniques turn general-purpose configurable platforms into customized platforms.]

[Figure: Dynamic Platform Management. Inputs: processing requirements, performance objectives, and data properties of Application 1, Application 2, and Application 3, plus power constraints. Output: an optimized platform configuration for a general-purpose configurable platform made up of an embedded processor, programmable voltage regulator, flexible on-chip SRAM, programmable PLL, re-configurable cache, parameterized co-processor, PLD, and on-chip communication architecture.]

Interconnect-Centric SoC Design

1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips

Ruibing Lu, Cheng-Kok Koh ECE@Purdue

Single Arbitration, Multiple Bus Accesses
    Automatically delivers multiple bus transactions
    High bandwidth

Bus transactions can be performed even without explicit bus access grant from the arbiter
    Communication latency increases only slightly even with high arbitration latency

1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips (Cont’d)

[Figure: SAMBA-bus structure. Modules M1, M2, M3, and M4 each connect through an interface unit to two sub-buses: a forward sub-bus and a backward sub-bus.]

1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology

Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng et al. CSE@UCSD

The Y-architecture for on-chip interconnect is based on pervasive use of 0-, 120-, and 240-degree oriented semi-global and global wiring.

Communication capability (throughput of meshes) better than Manhattan architecture and X-architecture.

Better total wire length compared to both H and X clock tree structures and better path length compared to the H tree.

Achieve 8.5% less IR drop than an equally-resourced power network in Manhattan architecture.

1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology

7 x 7 meshes with different interconnect architectures.

(a) A 7 by 7 mesh using Y-architecture

(b) A 7 by 7 mesh using Manhattan-architecture

(c) A 7 by 7 mesh using X-architecture
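To get a feel for how the three wiring architectures differ, the shortest point-to-point route length under each set of allowed wire orientations can be computed in closed form. The sketch below is a geometric illustration only, not the paper's throughput or wire-length analysis. It assumes unrestricted routing in the plane, Manhattan wires at 0 and 90 degrees, X-architecture wires at 0, 45, 90, and 135 degrees, and Y-architecture wires at 0, 60, and 120 degrees (the same line orientations as 0, 120, and 240 degrees). The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)


def manhattan_length(dx, dy):
    """Shortest route using only horizontal and vertical wires."""
    return abs(dx) + abs(dy)


def x_arch_length(dx, dy):
    """Shortest route using horizontal, vertical, and 45-degree wires."""
    lo, hi = sorted((abs(dx), abs(dy)))
    return (hi - lo) + SQRT2 * lo   # straight run plus one diagonal run


def y_arch_length(dx, dy):
    """Shortest route using wires at 0, 60, and 120 degrees."""
    x, y = abs(dx), abs(dy)
    if y <= SQRT3 * x:
        # Target lies within 60 degrees of horizontal: 0- and 60-degree wires.
        return x + y / SQRT3
    # Steeper than 60 degrees: combine 60- and 120-degree wires.
    return 2.0 * y / SQRT3


if __name__ == "__main__":
    for dx, dy in [(3, 4), (5, 0), (0, 5), (4, 4)]:
        print((dx, dy),
              round(manhattan_length(dx, dy), 3),
              round(x_arch_length(dx, dy), 3),
              round(y_arch_length(dx, dy), 3))
```

Note that the Y-architecture has no vertical wires, so a purely vertical connection is longer than in the Manhattan case, while most oblique connections are shorter.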

Reliability Issues

3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation

Sanjay Pant, David Blaauw, Savithri Sundareswaran UMICH, Motorola

Power supply integrity issues
    Functional failure: voltage fluctuations inject noise in the circuit
    Performance failure: gate delay is becoming increasingly sensitive to supply voltage; a ±10% variation in supply can result in a 30% delay increase

Proposed approach
    Vectorless
    Conservative in estimating worst-case drop/delay increase
    Takes into account both IR and L·dI/dt drops


[Figure: existing analysis flow, with blocks labeled Library Charac., Input Vectors, i/p Vector Search, Simulator, Power Grid, Worst Voltage Drop, STA, and Worst-Case Timing.]

Voltage drop estimation
    Worst drop is highly dependent on input vectors
    Slow simulation times allow only a few vectors to be tried

Worst-case voltage budget analysis
    Highly conservative
    Worst-case drop is localized
    Ignores voltage shifts between distant driver-receiver pairs

Steps of the proposed approach
    Divide chip into blocks
    Compute unit pulse response
    Express delay/voltage using spatial/temporal superposition
    Formulate delay/voltage maximization as linear optimization
    Characterize gate delay

[Figure: power grid (VDD) and ground grid (GND), with block currents i(t) as the variables and the resulting voltage V(t).]
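The "superposition plus linear optimization" idea can be illustrated with a deliberately simplified special case. Assume the voltage drop at one node is a weighted sum of block currents, each block current is bounded by its own peak, and the chip as a whole has a total-current bound. With only these constraints the linear program has a greedy solution: give current to the most sensitive blocks first. Everything here (the single time step, the constraint set, the names and numbers) is an assumption for illustration and not the paper's formulation.

```python
def worst_case_drop(sensitivity, block_peak, total_peak):
    """Maximize sum(sensitivity[k] * i[k]) subject to
    0 <= i[k] <= block_peak[k] and sum(i) <= total_peak.

    sensitivity[k] is the assumed drop at the observed node per unit of
    current drawn by block k. Returns (worst_drop, currents).
    """
    order = sorted(range(len(sensitivity)),
                   key=lambda k: sensitivity[k], reverse=True)
    currents = [0.0] * len(sensitivity)
    remaining = total_peak
    for k in order:
        if sensitivity[k] <= 0.0 or remaining <= 0.0:
            break  # remaining blocks cannot increase the drop
        currents[k] = min(block_peak[k], remaining)
        remaining -= currents[k]
    drop = sum(s * i for s, i in zip(sensitivity, currents))
    return drop, currents


if __name__ == "__main__":
    drop, currents = worst_case_drop(
        sensitivity=[0.5, 0.2, 0.1],   # hypothetical values
        block_peak=[1.0, 2.0, 3.0],
        total_peak=2.5,
    )
    print(f"worst-case drop: {drop:.3f} with block currents {currents}")
```

A general formulation with more constraints and many time steps would need a real LP solver; the greedy step works here only because every current has the same weight in the single shared constraint.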

5B.2 - Fault-Tolerant Techniques for Ambient Intelligent Distributed Systems

Diana Marculescu ECE@CMU

Novel techniques for harnessing redundancy as a way of increasing fault-tolerance
    Assume a large number of networked devices
    Idle devices can act as surrogates for failing ones via application migration or remapping

Scheduling techniques for optimizing system lifetime
    Determine optimal migration schedule, under realistic battery models

8C.2 - Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems

Phillip Stanley-Marbell, Diana Marculescu ECE@CMU

Introduce the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability.

Performance Optimization

5B.1 - Cache Optimization For Embedded Processor Cores: An Analytical Approach

Arijit Ghosh, Tony Givargis CS@UCI

An efficient algorithm to directly compute cache parameters satisfying desired performance criteria.

5B.3 - Performance Efficiency of Context-Flow System-On-Chip Platform

Rami Beidas, Jianwen Zhu ECE@Toronto

A new programming model, called context-flow, that is simple, safe, highly parallelizable yet transparent to the underlying architectural details.

Simulation at the Nanometer Scale

7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation

Iris Bahar, Joseph Mundy, Jie Chen Brown

Based on Markov random fields
    Propose a new architectural framework designed to handle faulty processes prevalent with nanoscale devices

Dynamically defect tolerant
    Adapts to errors as a natural consequence of probability maximization
    Removes need to actually detect faults

Can handle both structure- and signal-based faults


Carbon Nanotubes (CNTs)
    Excellent conductors
    Diodes, FETs, and memory arrays using CNTs have been demonstrated
    Physical placement of CNTs is an issue
    Alumina substrates have been proposed to fabricate arrays of CNTs

[Figure: carbon nanotube junctions in the off and on states.]

Molecular devices
    Direct use of molecules and their electronic states
    Conduction achieved by changes in physical configuration or electronic state
    Diodes and memory have been demonstrated

[Figure: molecular switch turned on by an additional electron.]

Quantum Cellular Automata (QCA)
    Based on local interaction of quantum dots arranged in cells
    Logic function is encoded into spatial patterns of the cells
    Information propagates through chains of QCA devices

7A.2 - Modeling of Ballistic Carbon Nanotube Field Effect Transistors for Efficient Circuit Simulation

Arijit Raychowdhury, Saibal Mukhopadhyay, Kaushik Roy ECE@Purdue

Circuit/SPICE level model for Ballistic CNFETs

Removes self-consistent solutions of Poisson’s and Schrödinger's Equations

Proposed model closely replicates the self-consistent numerical simulations

The model has been used to simulate simple adders/multipliers


Carbon nanotubes are graphite sheets rolled in the form of tubes. They act as channel material for FETs.

Source: IBM


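[Figure: two CNFET structures, each with a top gate, ZrO2 gate dielectric, and bottom gate. In one, source (S) and drain (D) contacts form Schottky barriers to an intrinsic CNT, with a band diagram annotated b=Eg/2. In the other, n+ regions contact the intrinsic CNT.]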

• Performance of CNFETs can be evaluated only through circuit simulations

• SPICE compatible compact modeling is essential for circuit simulations

7A.3 - Circuit Simulation of Nanotechnology Devices with Non-Monotonic I-V Characteristics

Jiayong Le, Larry Pileggi, Anirudh Devgan ECE@CMU

Describes a circuit-level simulator that can accommodate an important class of nanotechnology devices characterized by non-monotonic I-V characteristics.

Other Areas in ICCAD

Placement, Routing, and Floorplanning
Analog Design and Methodology
Verification
    Formal Verification
    Dynamic Verification
Timing Analysis
    Delay and Signal Modeling
    Statistical Static Timing
    Retiming for Global Interconnects

Other Areas in ICCAD (Cont’d)

CAD Algorithms for Emerging Technologies
    Reversible Logic Synthesis
    DNA Probe Array Layout
    MEMS
Design for Customized Processors
Synthesis
Testing
