ICCAD’03 Review
CSE 597B
Lin Li
Outline
Overview
  Archive download URL
  Best paper award
  Papers from our group
  Interesting tutorials
Papers in related areas
  Power and energy optimization
  Interconnect-centric SoC design
  Reliability issues
  Performance optimization
  Simulation at the nanometer scale
Other areas in ICCAD
Archive Download URL
Papers and presentation slides can be downloaded from:
http://www.iccad.com/archive.html
Best Paper Award
6C.1 - Noise Analysis for Optical Fiber Communication Systems
Alper Demir, Koç University, Sariyer-Istanbul, Turkey
8B.1 - Block-Based Static Timing Analysis with Uncertainty
Anirudh Devgan, Chandramouli Kashyap, IBM Research at Austin, IBM Microelectronics
Papers from Our Group
1A.1 - Adaptive Error Protection for Energy Efficiency
Lin Li, N. Vijaykrishnan, Mahmut Kandemir, Mary Jane Irwin
3C.1 - Array Composition and Decomposition for Optimizing Embedded Applications
Guilin Chen, Mahmut Kandemir, Ugur Sezer, Avanti Nadgir
Interesting Tutorials
2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology
Kerry Bernstein, Ching-Te Chuang, Rajiv V. Joshi, Ruchir Puri, IBM T.J. Watson
11B.1 - Formal Methods for Dynamic Power Management
Rajesh K. Gupta, Sandeep Shukla, Sandy Irani, UCSD, UCI, and VT
2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology
Introduction
CMOS device scaling
New devices for high-performance logic
  Planar device structures: partially-depleted (PD) SOI, fully-depleted (FD) SOI, strained-Si and high-k gate
Emerging technologies
  Double-gate MOSFETs
  3D integration and interconnects
  Carbon nanotube transistors (CNT)
  Molecular computing
CAD challenges
  Challenges of advanced device technologies
  Major issues: the power crisis and coping with variability
11B.1 - Formal Methods for Dynamic Power Management
Overviews the formal methods that have been explored for solving the system-level Dynamic Power Management (DPM) problem.
Shows how formal reasoning frameworks can unify apparently disparate DPM techniques.
Presents approaches that treat the DPM problem as one of stochastic optimization with probabilistic guarantees on performance.
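One classic, formally analyzable DPM policy is the timeout policy whose timeout equals the break-even time: competitive analysis shows its energy over an idle period is never more than twice that of an offline oracle that knows the idle length in advance. The sketch below checks this under a simplified two-state (idle/sleep) device model; all power and transition numbers are hypothetical, and the tutorial may well formulate the problem differently (for example, with the stochastic models listed above).

```python
def break_even_time(p_idle, p_sleep, e_transition):
    """Idle length at which sleeping and staying idle cost the same energy."""
    return e_transition / (p_idle - p_sleep)


def timeout_energy(idle_len, timeout, p_idle, p_sleep, e_transition):
    """Energy of an online timeout policy over one idle period."""
    if idle_len <= timeout:
        return p_idle * idle_len                  # never went to sleep
    return (p_idle * timeout                      # waited for the timeout
            + e_transition                        # sleep + wake-up overhead
            + p_sleep * (idle_len - timeout))     # remaining time asleep


def oracle_energy(idle_len, p_idle, p_sleep, e_transition):
    """Offline optimum: knows idle_len in advance and picks the cheaper option."""
    return min(p_idle * idle_len, e_transition + p_sleep * idle_len)


def competitive_ratio(idle_len, p_idle, p_sleep, e_transition):
    """Online/offline energy ratio when the timeout is the break-even time."""
    t_be = break_even_time(p_idle, p_sleep, e_transition)
    online = timeout_energy(idle_len, t_be, p_idle, p_sleep, e_transition)
    return online / oracle_energy(idle_len, p_idle, p_sleep, e_transition)


if __name__ == "__main__":
    # Hypothetical device: 1 W idle, 0.1 W asleep, 4 J to sleep and wake up.
    for idle in (1.0, 4.0, 10.0, 100.0):
        print(idle, round(competitive_ratio(idle, 1.0, 0.1, 4.0), 3))
```

The worst case occurs when the device goes to sleep just before the idle period ends, paying both the full timeout and the transition energy.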
Power and Energy Optimization
Using dynamic voltage scaling in embedded systems (Section 1B)
Using software techniques in embedded systems (Section 3C)
Energy issues in systems design (Section 7B)
Power-aware design (Section 8C)
1B.1 - Generalized Network Flow Techniques for Dynamic Voltage Scaling in Hard Real-Time Systems
Vishnu Swaminathan, Krishnendu Chakrabarty ECE@Duke
Energy consumption must be carefully balanced with real-time responsiveness in hard real-time systems.
Present an optimal offline dynamic voltage scaling (DVS) scheme for dynamic power management in such systems.
[Figure: a source s and a sink t connected through columns of job nodes (j1 to jn), speed nodes (s1h, s1l, ..., snh, snl), and interval nodes (1 to 2n-1); each arc (i, j) is annotated with parameters lij, uij, Cij and a fourth per-arc parameter, with arc costs involving terms such as Vh^2 c1h and Vl^2 c1l.]
Generalized Network Flow Models for the DVS problem
1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages
Shaoxiong Hua, Gang Qu ECE@UMCP
For a multiple-voltage DVS system to serve a set of applications {(ei, di, pi): i=1, 2, …, n} without missing their deadlines:
  If the system has m voltages {v1, v2, …, vm}, determine the value of each vi to minimize the energy consumption.
  More generally, determine both m and the value of each vi.
1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages (Cont’d)
Voltage set-up is the fundamental problem for multiple-voltage DVS system. application-specific 2-voltage DVS system: analytic solutions and
a linear search algorithm m-voltage DVS system: analytic solution does
not exist, an approximation method Multiple-voltage can be very close to the
maximal energy saving by DVS.
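To make the 2-voltage set-up problem concrete, the sketch below scores every candidate voltage pair on a grid by the expected energy of a task set and keeps the best pair, a plain exhaustive (linear) search. It assumes clock frequency proportional to voltage (normalized so f = v), energy per cycle proportional to v², and describes each application by a cycle count, a deadline, and a weight such as its probability of running. These are our simplifying assumptions, not necessarily the paper's model, and the search is not the paper's algorithm.

```python
import itertools
import math


def task_energy(cycles, deadline, v_lo, v_hi):
    """Minimum energy to run `cycles` by `deadline` using only v_lo and v_hi.

    Assumes f = v (normalized) and energy per cycle proportional to v**2.
    """
    if cycles / v_hi > deadline:
        return math.inf                     # infeasible even at the high voltage
    if cycles / v_lo <= deadline:
        return cycles * v_lo ** 2           # the low voltage alone meets the deadline
    # Mix the two voltages so the task finishes exactly at the deadline:
    #   x_hi / v_hi + (cycles - x_hi) / v_lo = deadline
    x_hi = (cycles / v_lo - deadline) / (1 / v_lo - 1 / v_hi)
    return x_hi * v_hi ** 2 + (cycles - x_hi) * v_lo ** 2


def best_voltage_pair(tasks, grid):
    """Search all voltage pairs; tasks are (cycles, deadline, weight > 0)."""
    best = (math.inf, None)
    for v_lo, v_hi in itertools.combinations(sorted(grid), 2):
        expected = sum(w * task_energy(c, d, v_lo, v_hi) for c, d, w in tasks)
        if expected < best[0]:
            best = (expected, (v_lo, v_hi))
    return best


if __name__ == "__main__":
    grid = [0.25 * k for k in range(1, 5)]          # hypothetical 0.25 .. 1.0
    tasks = [(6, 8, 0.5), (4, 8, 0.3), (9, 10, 0.2)]  # hypothetical workload
    print(best_voltage_pair(tasks, grid))
```

Mixing the two voltages that bracket a task's ideal voltage is what makes a small number of levels approach the continuous-DVS optimum.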
1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems
Le Yan, Jiong Luo, Niraj K. Jha EE@Princeton
New scheduling algorithm that combines DVS and adaptive body biasing (ABB) to simultaneously optimize both dynamic power consumption and leakage power consumption for real-time distributed embedded systems.
1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems
A novel two-phase approach
  Phase I: optimal tradeoff between supply and threshold voltages
  Phase II: tradeoff between energy consumption and clock period
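The Phase I tradeoff can be illustrated with textbook device models: raising Vth (via body biasing) cuts subthreshold leakage exponentially but slows the gate, which must then be compensated by a higher Vdd. The sketch below grid-searches (Vdd, Vth) for the lowest energy per cycle under a clock-period limit, using an alpha-power-law delay model; every constant is hypothetical and normalized, and this is not the authors' algorithm. Sweeping the period limit gives a rough feel for the Phase II energy/clock-period tradeoff.

```python
import math

# Hypothetical, normalized device constants (illustrative only, not from the paper).
C_EFF = 1.0        # switched capacitance per cycle
I_LEAK0 = 50.0     # subthreshold leakage prefactor
N_VT = 0.039       # subthreshold slope factor times thermal voltage (V)
ALPHA = 1.5        # velocity-saturation index in the alpha-power delay law
K_DELAY = 1.0      # delay prefactor


def cycle_time(vdd, vth):
    """Alpha-power-law gate delay, used as the clock period."""
    return K_DELAY * vdd / (vdd - vth) ** ALPHA


def energy_per_cycle(vdd, vth):
    """Dynamic switching energy plus leakage energy integrated over one cycle."""
    dynamic = C_EFF * vdd ** 2
    leakage = vdd * I_LEAK0 * math.exp(-vth / N_VT) * cycle_time(vdd, vth)
    return dynamic + leakage


def best_operating_point(t_max, vdd_grid, vth_grid):
    """Lowest-energy (Vdd, Vth) whose cycle time meets t_max.

    Returns (energy, vdd, vth), or None if no grid point is feasible.
    """
    best = None
    for vdd in vdd_grid:
        for vth in vth_grid:
            if vth >= vdd or cycle_time(vdd, vth) > t_max:
                continue
            e = energy_per_cycle(vdd, vth)
            if best is None or e < best[0]:
                best = (e, vdd, vth)
    return best


if __name__ == "__main__":
    vdd_grid = [0.6 + 0.05 * k for k in range(9)]    # about 0.60 V .. 1.00 V
    vth_grid = [0.15 + 0.05 * k for k in range(6)]   # about 0.15 V .. 0.40 V
    for t_max in (1.5, 2.0, 3.0):
        print(t_max, best_operating_point(t_max, vdd_grid, vth_grid))
```

Searching Vdd and Vth jointly can never do worse than DVS alone at any fixed Vth in the grid, which is the motivation for combining DVS with ABB.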
1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems
[Figure: Phase II slack-allocation flowchart. After initializations and Phase I, the task with the highest energy_derivative becomes the reference task; slack is allocated to the reference task and to each other task whose energy_derivative is higher than the reference level; if EST+WCET > LFT, this slack allocation is invalidated; the loop repeats while extensible tasks exist.]
3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning
Jinfeng Liu, Pai H. Chou ECE@UCI
Goal: energy minimization for distributed embedded processors
Combined optimization
  Selection of the optimal compression algorithm
  Functional partitioning
3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning
[Figure: two nodes N1 and N2, each running RECV, PROC, and SEND stages on data D passed from N1 to N2. Top: a bad partitioning scheme that produces extra I/O load, without compression; non-optimal, both nodes at 150 MHz with idle periods. Bottom: the same partitioning with decompression (DECOMP) and compression (COMP) stages added on each node; optimal with compression, both nodes at 80 MHz.]
However, the same partitioning can become optimal with compression if the data sent from N1 to N2 compresses well.
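The compression half of this tradeoff can be sketched with a simple energy model: compression spends CPU cycles on both ends to shrink the bytes that cross the link, so it pays off only when communication is expensive relative to computation. The sketch below picks the codec with the lowest total energy; the codec names, ratios, and per-byte costs are all hypothetical, and functional partitioning is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    """A compression option; all parameters are hypothetical per-byte costs."""
    name: str
    ratio: float                   # compressed size / original size
    comp_cycles_per_byte: float    # sender-side CPU work
    decomp_cycles_per_byte: float  # receiver-side CPU work


def transfer_energy(n_bytes, codec, e_cycle, e_byte_link):
    """CPU energy for (de)compression plus link energy for the bytes actually sent."""
    cpu = n_bytes * (codec.comp_cycles_per_byte + codec.decomp_cycles_per_byte) * e_cycle
    link = n_bytes * codec.ratio * e_byte_link
    return cpu + link


def best_codec(n_bytes, codecs, e_cycle, e_byte_link):
    """Codec that minimizes total energy for one transfer."""
    return min(codecs, key=lambda c: transfer_energy(n_bytes, c, e_cycle, e_byte_link))


if __name__ == "__main__":
    codecs = [
        Codec("none", 1.0, 0.0, 0.0),
        Codec("fast", 0.6, 5.0, 2.0),
        Codec("strong", 0.4, 40.0, 10.0),
    ]
    for e_link in (5.0, 20.0, 1000.0):   # cheap vs. expensive communication
        print(e_link, best_codec(1000, codecs, 1.0, e_link).name)
```

With these made-up numbers, the choice moves from no compression to the stronger codec as the energy per transmitted byte grows.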
3C.4 - Energy-Aware Fault Tolerance in Fixed-Priority Real-Time Embedded Systems
Ying Zhang, Krishnendu Chakrabarty, Vishnu Swaminathan ECE@Duke
Goal: low power, fault-tolerant real-time systems
Fault tolerance is achieved via checkpointing
Power management is carried out using dynamic voltage scaling (DVS).
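Checkpointing and DVS compete for the same slack: checkpoints add overhead and fault recovery needs time, while slowing the processor stretches execution. The sketch below uses a simplified single-task model (our assumption, not the paper's fixed-priority analysis): n equally spaced checkpoints each cost C, and each of up to k faults re-executes one segment, so the worst-case time is R(n) = E + nC + kE/n, which is minimized near n = sqrt(kE/C). It then picks the slowest speed from a hypothetical set whose worst case still meets the deadline, assuming execution time scales as 1/speed and checkpoint cost does not.

```python
import math


def worst_case_time(exec_time, ckpt_cost, n_ckpts, k_faults):
    """R(n) = E + n*C + k*E/n for n equally spaced checkpoints and k faults."""
    return exec_time + n_ckpts * ckpt_cost + k_faults * exec_time / n_ckpts


def best_checkpoints(exec_time, ckpt_cost, k_faults):
    """Integer n next to the continuous optimum sqrt(k * E / C)."""
    n_star = math.sqrt(k_faults * exec_time / ckpt_cost)
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    return min(candidates,
               key=lambda n: worst_case_time(exec_time, ckpt_cost, n, k_faults))


def lowest_feasible_speed(exec_time, ckpt_cost, k_faults, deadline, speeds):
    """Slowest normalized speed (and checkpoint count) meeting the deadline."""
    for s in sorted(speeds):
        e = exec_time / s                    # execution stretches at lower speed
        n = best_checkpoints(e, ckpt_cost, k_faults)
        if worst_case_time(e, ckpt_cost, n, k_faults) <= deadline:
            return s, n
    return None                              # infeasible even at full speed


if __name__ == "__main__":
    # Hypothetical task: E = 100, C = 1, tolerate k = 1 fault.
    for d in (120, 150, 160, 250):
        print(d, lowest_feasible_speed(100, 1, 1, d, [0.5, 0.75, 1.0]))
```

A looser deadline leaves room both for fault recovery and for a lower speed, which is where the energy saving comes from.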
7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers
Ali Iranli, Hanif E. Fatemi, Massoud Pedram EE@USC
A hierarchical formulation for energy optimization of wireless transceivers is proposed
A game-theoretic approach is proposed to solve this energy minimization problem; it reduces energy consumption by 15% for BER = 10^-5
The proposed hierarchical framework can be used in general for energy optimization of server-client systems
7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers
[Figure: transceiver energy optimization as a Stackelberg game. Leader: the transmitter; policy: transmit power and modulation level; cost function: overall energy consumption. Follower: the receiver; policy: truncation length; cost function: the receiver's energy consumption.]
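In a Stackelberg game, the leader commits to a policy first and the follower best-responds, so the leader optimizes its own cost assuming the follower's best response. The sketch below solves a finite version by backward induction. The leader's action stands in for a transmit power level (modulation level is omitted) and the follower's for a truncation length; both cost functions are made-up placeholders, not the paper's energy models.

```python
def stackelberg(leader_actions, follower_actions, leader_cost, follower_cost):
    """Solve a finite two-player Stackelberg game by backward induction.

    Returns (leader_action, follower_response, leader_cost).
    """
    best = None
    for a in leader_actions:
        # The follower observes the leader's move and minimizes its own cost.
        b = min(follower_actions, key=lambda f: follower_cost(a, f))
        cost = leader_cost(a, b)
        if best is None or cost < best[2]:
            best = (a, b, cost)
    return best


# Hypothetical cost models, for illustration only.
def receiver_energy(power, trunc_len):
    # Longer truncation costs decoding energy; higher transmit power eases decoding.
    return trunc_len + 10.0 / (power * trunc_len)


def total_energy(power, trunc_len):
    # Overall energy: transmitter power plus the receiver's energy.
    return 2.0 * power + receiver_energy(power, trunc_len)


if __name__ == "__main__":
    print(stackelberg([1, 2, 3], [1, 2, 3], total_energy, receiver_energy))
```

The leader never picks the follower's action directly; it only anticipates it, which is what separates this from a single joint minimization.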
7B.2 - Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization
Girish V. Varatkar, Radu Marculescu ECE@CMU
Recent work in the embedded systems (ES) community: performance and energy are both crucial!
Voltage selection: the task scheduling algorithm should use the foresight that voltage selection is going to follow the scheduling step, so the schedule should provide the maximum potential for slowing down tasks
This work brings the communication aspect into the picture
  A 'communication-centric' approach
  A 'voltage selection' approach
7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches
Praveen G. Kalla, Xiaobo Sharon Hu, Joerg Henkel CSE@Notre Dame
From LRU to LRU-SEQ (Sequential LRU)
  Constraining sequential fetches to the same bank (same way) avoids bank transitions. It also increases the sleep time of the banks, helping them meet break-even time requirements.
  The LRU nature has to be maintained, or else associativity is lost (the hit ratio is affected)!
  The distance between the last fetched line and the present line is a parameter that affects the performance of this policy.
7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches
FOR (every cache access) DO
  IF (access == HIT) THEN
    P_way = C_way
  ELSE
    dist = abs(Curr_Addr - Prev_Addr)
    IF (dist <= SEQ_DST) THEN
      C_way = P_way
    ELSE
      C_way = LRU_Way
    END
  END
  Update LRU state for access.
END
P_( ): Previous_( ); C_( ): Current_( )
State Holder 1: P_way (entire cache)
State Holder 2: P_line (each cache way)
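A small trace-driven sketch of the pseudocode above makes the tradeoff visible: on a miss close to the previous fetch, the new line goes into the previously used way instead of the LRU way, which reduces way (bank) transitions but can evict useful lines. Assumptions of this sketch: addresses are line-granular, sets are indexed by address modulo the number of sets, distance is measured in lines, the previous line/way is tracked for the whole cache (the per-way P_line state holder is simplified away), and the previous way is recorded on every access.

```python
class ICache:
    """Set-associative instruction cache with LRU or LRU-SEQ replacement (sketch)."""

    def __init__(self, num_sets, num_ways, seq_dist=None):
        self.num_sets = num_sets
        self.seq_dist = seq_dist              # None -> plain LRU
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Per-set recency order, least recently used way first.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        self.prev_line = None
        self.prev_way = None
        self.way_transitions = 0
        self.misses = 0

    def access(self, line_addr):
        s, tag = line_addr % self.num_sets, line_addr // self.num_sets
        if tag in self.tags[s]:
            way = self.tags[s].index(tag)     # hit: the current way is where the line lives
        else:
            self.misses += 1
            sequential = (self.seq_dist is not None
                          and self.prev_line is not None
                          and abs(line_addr - self.prev_line) <= self.seq_dist)
            # LRU-SEQ: keep sequential fetches in the previous way.
            way = self.prev_way if sequential else self.lru[s][0]
            self.tags[s][way] = tag
        # Update LRU state: the accessed way becomes most recently used.
        self.lru[s].remove(way)
        self.lru[s].append(way)
        if self.prev_way is not None and way != self.prev_way:
            self.way_transitions += 1
        self.prev_line, self.prev_way = line_addr, way
        return way


if __name__ == "__main__":
    lru, seq = ICache(4, 2), ICache(4, 2, seq_dist=1)
    for line in list(range(8)) * 2:           # a straight-line loop body, run twice
        lru.access(line)
        seq.access(line)
    print("LRU    :", lru.way_transitions, "transitions,", lru.misses, "misses")
    print("LRU-SEQ:", seq.way_transitions, "transitions,", seq.misses, "misses")
```

On the second pass, LRU-SEQ has evicted lines that plain LRU keeps, which is the associativity loss the slide warns about.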
7B.4 - Compiler-Based Register Name Adjustment for Low-Power Embedded Processors
Peter Petrov, Alex Orailoglu CSE@UCSD
Compiler-driven register name adjustment for low power is proposed
  Register names are reassigned without incurring any performance or power overhead
  No hardware support required whatsoever
  Efficient algorithm for Register Name Adjustment, with an additional frequency-skew-enhancing phase
8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches
Nam S. Kim, David Blaauw, Trevor N. Mudge EECS@UMICH
The cost-effective number of VTH values for cache leakage reduction depends on the target access time, but one or two high VTH's are enough for leakage reduction
Cache leakage is another design constraint in processor design: a trade-off among delay, area, and leakage
Realistic cache miss statistics are incorporated into the leakage optimization
8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches
ITRS 2002 projections, with the number of transistors doubling every two years
Using a high-k dielectric reduces gate-oxide leakage
8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches
[Figure: cache sub-bank organization: address-bus (Abus) buffer with repeaters, decoder, word-lines, memory cells, bit-line pairs, sense amplifiers with I/O circuits, and data-bus (Dbus) buffer with repeaters, with components grouped into four threshold-voltage classes (VTH1 to VTH4).]
Circuit model based on CACTI
70nm Berkeley predictive technology model
Interconnect R/C annotated
Repeaters used to minimize interconnect delay
8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips
Krishna Sekar, Kanishka Lahiri, Sujit Dey ECE@UCSD
Described design techniques for dynamically customizing a general-purpose configurable platform
Dynamic platform management helps combine benefits of general-purpose & application-specific approaches
Benefits
  Improved application performance
  More efficient platform resource usage
  Improved energy efficiency
8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips
[Figure: spectrum of implementation platforms: general-purpose processors, general-purpose configurable platforms, customized platforms (obtained through platform customization techniques), domain-specific platforms, and ASICs/custom SoCs. Moving toward ASICs improves performance, power, and size; moving toward general-purpose processors improves flexibility, time-to-market, engineering cost, and time-in-market.]
8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips
[Figure: dynamic platform management. A general-purpose configurable platform (embedded processor, programmable voltage regulator, programmable PLL, flexible on-chip SRAM, re-configurable cache, parameterized co-processor, PLD, and on-chip communication architecture) is turned into an optimized platform configuration by dynamic platform management, driven by power constraints and by the processing requirements, performance objectives, and data properties of applications 1, 2, and 3.]
Interconnect-Centric SoC Design
1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips
Ruibing Lu, Cheng-Kok Koh ECE@Purdue
Single Arbitration, Multiple Bus Accesses
  Automatically delivers multiple bus transactions, giving high bandwidth
  Bus transactions can be performed even without an explicit bus access grant from the arbiter
  Communication latency increases only slightly, even with high arbitration latency
1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips
[Figure: SAMBA-bus organization: masters M1 to M4 attached through interface units to two sub-buses, a forward sub-bus and a backward sub-bus.]
1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology
Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng et al. CSE@UCSD
The Y-architecture for on-chip interconnect is based on pervasive use of 0-, 120-, and 240-degree oriented semi-global and global wiring.
Better communication capability (mesh throughput) than the Manhattan architecture and the X-architecture.
Better total wire length than both H and X clock tree structures, and better path length than the H tree.
Achieves 8.5% less IR drop than an equally-resourced power network in the Manhattan architecture.
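Part of the Y-architecture's wiring advantage is plain geometry: a two-pin connection is routed along the two allowed wire directions that bracket it, so finer angular spacing means shorter detours. The sketch below computes that routed length for wire directions spaced 90 degrees (Manhattan), 45 degrees (X), or 60 degrees (Y, since 0-, 120-, and 240-degree wires give orientations 60 degrees apart) and averages it over uniformly random directions. This is a point-to-point illustration of our own; it does not reproduce the paper's mesh-throughput, clock-tree, or IR-drop results.

```python
import math

MANHATTAN = math.pi / 2   # 0- and 90-degree wires
X_ARCH = math.pi / 4      # adds 45- and 135-degree wires
Y_ARCH = math.pi / 3      # 0-, 120-, 240-degree wires: orientations 60 degrees apart


def routed_length(dx, dy, sector):
    """Shortest wire length when wires may only run at multiples of `sector`.

    The connection vector is decomposed onto the two allowed directions
    that bound it (law of sines in the resulting triangle).
    """
    r = math.hypot(dx, dy)
    if r == 0:
        return 0.0
    theta = math.atan2(dy, dx) % sector
    return r * (math.sin(sector - theta) + math.sin(theta)) / math.sin(sector)


def average_ratio(sector, samples=100_000):
    """Mean routed length / Euclidean length over uniformly random directions."""
    total = 0.0
    for i in range(samples):
        phi = 2 * math.pi * (i + 0.5) / samples
        total += routed_length(math.cos(phi), math.sin(phi), sector)
    return total / samples


if __name__ == "__main__":
    for name, s in (("Manhattan", MANHATTAN), ("X", X_ARCH), ("Y", Y_ARCH)):
        print(name, round(average_ratio(s), 4))
```

The averages match the closed form 2(1 - cos s) / (s sin s): about 1.273 for Manhattan, 1.055 for X, and 1.103 for Y, so Y sits between the other two on this metric.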
1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology
7 x 7 meshes with different interconnect architectures.
(a) A 7 by 7 mesh using Y-architecture
(b) A 7 by 7 mesh using Manhattan-architecture
(c) A 7 by 7 mesh using X-architecture
Reliability Issues
3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation
Sanjay Pant, David Blaauw, Savithri Sundareswaran UMICH, Motorola
Power supply integrity issues
  Functional failure: voltage fluctuations inject noise into the circuit
  Performance failure: gate delay is becoming increasingly sensitive to supply voltage; a ±10% variation in supply can result in a 30% delay increase
Proposed approach
  Vectorless
  Conservative in estimating the worst-case drop/delay increase
  Takes into account both IR and L·di/dt drops
3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation
[Figure: conventional flow: input vectors (from a vector search) drive a power-grid simulator to find the worst voltage drop, which STA uses together with library characterization to obtain worst-case timing.]
Voltage drop estimation
  The worst drop is highly dependent on input vectors
  Slow simulation times allow only a few vectors to be tried
Worst-case voltage budget analysis
  Highly conservative, because the worst-case drop is localized
  Ignores voltage shifts between distant driver-receiver pairs
3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation
Divide the chip into blocks
Compute the unit pulse response
Characterize gate delay
Express delay/voltage using spatial/temporal superposition
Formulate delay/voltage maximization as a linear optimization, with the block currents i(t) as variables
[Figure: power grid (VDD) and ground grid (GND), with block currents i(t) and node voltages V(t).]
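The superposition-plus-linear-optimization step can be sketched as follows: if each block's unit pulse response at the observed node is known, the drop is a weighted sum of block currents, and the worst case is a linear program over those currents. Assuming only two kinds of constraints, a per-block peak current and a chip-wide current budget per time step (our simplification; the paper's constraint set may be richer and then needs a general LP solver), the LP splits into one fractional knapsack per time step, which a greedy assignment solves exactly. The response values in the example are hypothetical.

```python
def worst_case_drop(h, i_max, i_total):
    """Upper bound on the voltage drop at one grid node at one observation time.

    h[b][k]  : drop at the node from a unit current pulse drawn by block b,
               observed k time steps after the pulse (precomputed responses).
    i_max[b] : peak current of block b in any time step.
    i_total  : chip-wide current budget per time step.
    Returns (worst_drop, currents), where currents[b][k] is the current that
    block b draws k steps before the observation time.
    """
    n_blocks, horizon = len(h), len(h[0])
    currents = [[0.0] * horizon for _ in range(n_blocks)]
    worst = 0.0
    for k in range(horizon):
        # Constraints for different k are independent, so each step is a
        # fractional knapsack: give current to the most sensitive blocks first.
        budget = i_total
        for b in sorted(range(n_blocks), key=lambda j: h[j][k], reverse=True):
            if budget <= 0 or h[b][k] <= 0:
                break
            amount = min(i_max[b], budget)
            currents[b][k] = amount
            worst += h[b][k] * amount
            budget -= amount
    return worst, currents


if __name__ == "__main__":
    # Two hypothetical blocks, two time steps of response.
    drop, pattern = worst_case_drop([[0.5, 0.2], [0.3, 0.4]], [1.0, 1.0], 1.5)
    print(drop, pattern)
```

Because no input vectors are enumerated, the result is a conservative bound rather than a simulated drop, which matches the vectorless goal on the slide.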
5B.2 - Fault-Tolerant Techniques for Ambient Intelligent Distributed Systems
Diana Marculescu ECE@CMU
Novel techniques for harnessing redundancy to increase fault tolerance
  Assume a large number of networked devices
  Idle devices can act as surrogates for failing ones via application migration or remapping
Scheduling techniques for optimizing system lifetime
  Determine the optimal migration schedule under realistic battery models
8C.2 - Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems
Phillip Stanley-Marbell, Diana Marculescu ECE@CMU
Introduce the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability.
Performance Optimization
5B.1 - Cache Optimization For Embedded Processor Cores: An Analytical Approach
Arijit Ghosh, Tony Givargis CS@UCI
An efficient algorithm to directly compute cache parameters satisfying desired performance criteria.
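One standard way to compute cache parameters directly, instead of simulating every configuration, is LRU stack-distance analysis: for a fixed number of sets, a single pass over the trace yields the miss count for every associativity at once, because a reference hits in an A-way LRU cache exactly when fewer than A distinct lines of its set were used since its previous reference. The sketch below uses this to find the smallest associativity that meets a miss budget. It illustrates analytical cache sizing in general and is not the authors' algorithm; the trace format (line addresses, modulo set indexing) is assumed.

```python
def misses_by_assoc(trace, num_sets, max_assoc):
    """LRU misses for every associativity 1..max_assoc, in one pass."""
    stacks = [[] for _ in range(num_sets)]   # per-set LRU stacks, MRU first
    hist = [0] * (max_assoc + 1)             # hist[max_assoc]: cold or very deep refs
    for line in trace:
        stack = stacks[line % num_sets]
        if line in stack:
            dist = stack.index(line)         # number of more recent lines in this set
            stack.remove(line)
        else:
            dist = max_assoc                 # cold miss: misses at any associativity
        hist[min(dist, max_assoc)] += 1
        stack.insert(0, line)
    # With associativity a, a reference misses iff its stack distance is >= a.
    return {a: sum(hist[a:]) for a in range(1, max_assoc + 1)}


def min_associativity(trace, num_sets, max_assoc, miss_budget):
    """Smallest associativity whose miss count fits the budget, or None."""
    misses = misses_by_assoc(trace, num_sets, max_assoc)
    for a in range(1, max_assoc + 1):
        if misses[a] <= miss_budget:
            return a
    return None


if __name__ == "__main__":
    trace = [0, 8, 16, 0, 8, 16, 1, 2, 3] * 4   # hypothetical line-address trace
    for sets in (2, 4, 8):
        print(sets, misses_by_assoc(trace, sets, 4))
```

Repeating the pass for each candidate number of sets covers the whole sets-by-ways design space without a separate simulation per configuration.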
5B.3 - Performance Efficiency of Context-Flow System-On-Chip Platform
Rami Beidas, Jianwen Zhu ECE@Toronto
A new programming model, called context-flow, that is simple, safe, highly parallelizable yet transparent to the underlying architectural details.
Simulation at the Nanometer Scale
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation
Iris Bahar, Joseph Mundy, Jie Chen, Brown
Based on Markov random fields
Proposes a new architectural framework designed to handle the faulty processes prevalent with nanoscale devices
Dynamically defect tolerant
  Adapts to errors as a natural consequence of probability maximization
  Removes the need to actually detect faults
  Can handle both structure- and signal-based faults
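The Markov-random-field idea can be illustrated on a single gate: give lower energy to joint input/output states that are logically valid, turn energies into probabilities with a Gibbs distribution P(x) = exp(-U(x)/kT) / Z, and the most probable states are the correct ones, with their total probability rising as kT falls. The energy function below (-1 for valid NAND states, 0 otherwise) and the NAND example are our own assumptions for illustration, not taken from the paper.

```python
import itertools
import math


def nand(a, b):
    return 1 - (a & b)


def clique_energy(x0, x1, x2):
    """Hypothetical energy: -1 for states consistent with x2 = NAND(x0, x1), else 0."""
    return -1.0 if x2 == nand(x0, x1) else 0.0


def gibbs_distribution(kt):
    """P(x) = exp(-U(x) / kT) / Z over all 8 joint states (x0, x1, x2)."""
    weights = {s: math.exp(-clique_energy(*s) / kt)
               for s in itertools.product((0, 1), repeat=3)}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def prob_correct(kt):
    """Total probability of the logically valid NAND states."""
    dist = gibbs_distribution(kt)
    return sum(p for s, p in dist.items() if s[2] == nand(s[0], s[1]))


if __name__ == "__main__":
    for kt in (2.0, 1.0, 0.5, 0.1):
        print(kt, round(prob_correct(kt), 5))
```

Correct operation is never guaranteed, only made most probable, which is why such a framework does not need to detect individual faults.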
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation
Carbon Nanotubes (CNTs)
  Excellent conductors
  Diodes, FETs, and memory arrays using CNTs have been demonstrated
  Physical placement of CNTs is an issue
  Alumina substrates have been proposed to fabricate arrays of CNTs
[Figure: carbon nanotube crossbar showing an off junction and an on junction.]
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation
Molecular devices
  Direct use of molecules and their electronic states
  Conduction is achieved by changes in physical configuration or electronic state
  Diodes and memory have been demonstrated
[Figure: molecular switch, labeled "switch on" and "additional electron".]
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation
Quantum Cellular Automata (QCA)
  Based on local interaction of quantum dots arranged in cells
  The logic function is encoded into spatial patterns of the cells
  Information propagates through chains of QCA devices
7A.2 - Modeling of Ballistic Carbon Nanotube Field Effect Transistors for Efficient Circuit Simulation
Arijit Raychowdhury, Saibal Mukhopadhyay, Kaushik Roy ECE@Purdue
Circuit/SPICE level model for Ballistic CNFETs
Avoids the self-consistent solution of Poisson's and Schrödinger's equations
The proposed model closely replicates the self-consistent numerical simulations
The model has been used to simulate simple adders/multipliers
7A.2 - Modeling of Ballistic Carbon Nanotube Field Effect Transistors for Efficient Circuit Simulation
Carbon nanotubes are graphite sheets rolled in the form of tubes. They act as channel material for FETs.
Source: IBM
7A.2 - Modeling of Ballistic Carbon Nanotube Field Effect Transistors for Efficient Circuit Simulation
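[Figure: CNFET device structures and band diagram. Schottky-barrier CNFET: source (S) and drain (D) contacts on an intrinsic CNT channel, ZrO2 top gate and bottom gate, Schottky barrier height of Eg/2. MOSFET-like CNFET: n+ doped source and drain regions around an intrinsic CNT channel, ZrO2 top gate and bottom gate.]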
7A.2 - Modeling of Ballistic Carbon Nanotube Field Effect Transistors for Efficient Circuit Simulation
• Performance of CNFETs can be evaluated only through circuit simulations
• SPICE compatible compact modeling is essential for circuit simulations
7A.3 - Circuit Simulation of Nanotechnology Devices with Non-Monotonic I-V Characteristics
Jiayong Le, Larry Pileggi, Anirudh Devgan ECE@CMU
Describes a circuit level simulator that can accommodate an important class of nanotechnology devices that are characterized by nonmonotonic I-V characteristics.
Other Areas in ICCAD
Placement, routing, and floorplanning
Analog design and methodology
Verification
  Formal verification
  Dynamic verification
Timing analysis
  Delay and signal modeling
  Statistical static timing
  Retiming for global interconnects
Other Areas in ICCAD (Cont'd)
CAD algorithms for emerging technologies
  Reversible logic synthesis
  DNA probe array layout
  MEMS
Design for customized processors
Synthesis
Testing