Giga-Scale System-On-A-Chip

International Center on System-on-a-Chip (ICSOC)

Jason Cong

University of California, Los Angeles

Tel: 310-206-2775, Email: [email protected]

(Other participants are listed inside)

Project Summary

• Develop new design methodology to enable efficient giga-scale integration for system-on-a-chip (SOC) designs

• Project includes three major components

– SOC synthesis tools and methodologies

– SOC verification, test, and diagnosis

– SOC design driver – network processor

Research Team by Institutions

• US

– UCLA: Jason Cong

– UC Santa Barbara: Tim Cheng

• Taiwan

– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu

– NCTU: Jing-Yang Jou

• China

– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue

– Peking Univ.: Xu Cheng

– Zhejiang Univ.: Xiaolang Yan

Current Research Team

• US

– UCLA: Jason Cong

– UC Santa Barbara: Tim Cheng

• Taiwan

– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu

– NCTU: Jing-Yang Jou

• China

– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue

– Peking Univ.: Xu Cheng

– Zhejiang Univ.: Xiaolang Yan

• Several new faculty members in the 7 institutions

• Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)

Thrust 1 -- SOC Synthesis Environment/Methodology (Led by Jason Cong)

[Figure: SOC synthesis flow diagram with the following blocks]

• Design Spec (VHDL/C)

• VHDL/C Co-Simulation

• Design Partitioning

• Code Generation for Retargetable Compiler and Assembler Generator

• DSP Synthesis and Optimization

• FPGA Synthesis and Technology Mapping

• ASIC Synthesis

• Interconnect-Driven High-level Synthesis

• Synthesis for IP Reuse

• Physical Synthesis for Full-Chip Assembly

• Embedded Processors, DSPs, Embedded FPGAs, Customized Logic

Interconnect Bottleneck in Nanometer Designs

[Figure: 28.3 mm x 28.3 mm die with axis marks at 11.4, 22.8, and 28.3 mm, showing the regions reachable in 1, 2, 3, 4, and 5 clock cycles]

• ITRS’01 0.07um technology

• 5.63 GHz across-chip clock

• 800 mm2 die (28.3mm x 28.3mm)

• IPEM BIWS estimations

– Buffer size: 100x

– Driver/receiver size: 100x

• On semi-global layer (tier 3):

– Can travel up to 11.4 mm in one cycle

– Need 5 clock cycles from corner to corner

• 2nd challenge: Single-cycle full chip synchronization is no longer possible

– Not supported by the current CAD toolset

– About to happen soon
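The corner-to-corner figure above can be checked with a short calculation. This is a minimal sketch that assumes the signal follows a Manhattan (horizontal plus vertical) route across the die; the slide does not state the routing model, and the function name is illustrative only.

```python
import math

# Values taken from the slide.
DIE_EDGE_MM = 28.3         # die is 28.3 mm x 28.3 mm
REACH_PER_CYCLE_MM = 11.4  # distance a signal travels in one clock cycle (tier 3)


def cycles_to_cross(distance_mm, reach_mm=REACH_PER_CYCLE_MM):
    """Number of whole clock cycles needed to cover distance_mm."""
    return math.ceil(distance_mm / reach_mm)


# Assumption: corner-to-corner wires are routed in Manhattan fashion,
# so the route length is one die width plus one die height.
corner_to_corner_mm = 2 * DIE_EDGE_MM

print("route length (mm):", corner_to_corner_mm)
print("cycles needed:", cycles_to_cross(corner_to_corner_mm))
```

Under this assumption the route is 56.6 mm long, which rounds up to the 5 cycles quoted on the slide.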

Regular Distributed Register Architecture (2)

[Figure: RDR architecture. A regular grid of islands, each of size Wi x Hi, connected by global interconnect. Each island contains a Local Computational Cluster (LCC) with functional units such as ADD, MUL, and MUX, a register file, and an FSM. The register file is divided into banks labeled 1 cycle, 2 cycle, …, k cycle.]

• Cluster with area constraint

• Use register banks: Registers in each island are partitioned into k banks for 1-cycle, 2-cycle, …, k-cycle interconnect communication in each island

• Highly regular
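One way to picture the register-bank idea is a lookup from island-to-island distance to a bank index. The sketch below is illustrative only: the Manhattan center-to-center distance model, the function names, and the example island size of 5.0 mm are all assumptions, not details given on the slide.

```python
import math


def transfer_cycles(src, dst, island_w_mm, island_h_mm, reach_mm):
    """Cycles for a transfer between two islands given as (column, row).

    Assumption: delay is set by the Manhattan distance between island
    centers, and any transfer takes at least one cycle.
    """
    dx = abs(src[0] - dst[0]) * island_w_mm
    dy = abs(src[1] - dst[1]) * island_h_mm
    return max(1, math.ceil((dx + dy) / reach_mm))


def bank_for_transfer(src, dst, island_w_mm, island_h_mm, reach_mm, k):
    """Pick the register bank (1..k) that matches the transfer latency."""
    cycles = transfer_cycles(src, dst, island_w_mm, island_h_mm, reach_mm)
    if cycles > k:
        raise ValueError("transfer needs more cycles than there are banks")
    return cycles


# Hypothetical 5.0 mm x 5.0 mm islands, 11.4 mm reach per cycle, k = 4 banks.
for dst in [(0, 0), (1, 0), (3, 1), (4, 4)]:
    print(dst, "-> bank", bank_for_transfer((0, 0), dst, 5.0, 5.0, 11.4, k=4))
```

Because every island uses the same bank layout, the mapping depends only on the relative position of the two islands, which is consistent with the "highly regular" property noted above.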

MCAS: Placement-Driven Architectural Synthesis Using RDR Architecture

[Figure: MCAS synthesis flow, illustrated on a CDFG with 12 operations (additions/subtractions 1, 2, 5, 6, 9, 10 and multiplications 3, 4, 7, 8, 11, 12)]

• Inputs: C / VHDL, RDR Arch. Spec., target clock period, resource constraints

• Flow steps

– CDFG generation (produces the CDFG)

– Resource allocation

– Functional unit binding (produces the Interconnected Component Graph, ICG)

– Scheduling-driven placement (produces location information)

– Placement-driven rebinding & scheduling

– Register and port binding

– Datapath & FSM generation

• Outputs: RTL VHDL files, floorplan constraints, multi-cycle path constraints

• Example binding in the figure, before rebinding: Alu1: 1, 5, 10; Alu2: 2, 6, 9; Mul1: 4, 8, 11; Mul2: 3, 7, 12

• Example binding in the figure, after placement-driven rebinding & scheduling (schedule shown over Cycle1 to Cycle7): Alu1: 1, 5, 10; Alu2: 2, 6, 9; Mul1: 4, 8, 12; Mul2: 3, 7, 11

Experimental Results (3)

MCAS basic flow vs. Synopsys’ Behavioral Compiler (BC) on Virtex-II

• Synopsys Behavioral Compiler setting: default (optimizing latency)

• Average latency ratio of MCAS vs. BC: 69%

Design  Flow         Cycles  Reg  ALU  MULT  fmax (MHz)  LUTs  Latency (ns)  MCAS vs. BC
pr      Synopsys BC      25   28    5     8       95.87   877        260.78
pr      MCAS             27   34    6     2       86.07  1477        313.69      120.29%
wang    Synopsys BC      29   36    7     8       63.02  1143        460.17
wang    MCAS             14   35    5     8      140.31  1523         99.78       21.68%
mcm     Synopsys BC      43  142   23     7       51.09  3256        841.60
mcm     MCAS             34   35    6     3       53.59  2561        634.44       75.39%
honda   Synopsys BC      29   44    8    14       52.13  2112        556.31
honda   MCAS             23   42    6     8       71.95  2606        319.65       57.46%

[Figure: two bar charts comparing Synopsys BC and MCAS on pr, wang, mcm, and honda; one with axis 0 to 3500 and one with axis 0.00 to 900.00]
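The latency and ratio columns of the results can be reproduced from the cycle counts and fmax values. The sketch below assumes latency is simply cycles divided by fmax and that the four rows pair up with the designs pr, wang, mcm, and honda in that order; the variable and function names are illustrative.

```python
# (design, flow, cycles, fmax in MHz), copied from the results above.
ROWS = [
    ("pr", "BC", 25, 95.87), ("pr", "MCAS", 27, 86.07),
    ("wang", "BC", 29, 63.02), ("wang", "MCAS", 14, 140.31),
    ("mcm", "BC", 43, 51.09), ("mcm", "MCAS", 34, 53.59),
    ("honda", "BC", 29, 52.13), ("honda", "MCAS", 23, 71.95),
]


def latency_ns(cycles, fmax_mhz):
    """Latency in ns, assuming latency = cycles x clock period at fmax."""
    return cycles / fmax_mhz * 1000.0


def latency_ratios(rows):
    """Map each design to (MCAS latency) / (BC latency)."""
    lat = {(design, flow): latency_ns(c, f) for design, flow, c, f in rows}
    designs = []
    for design, *_ in rows:
        if design not in designs:
            designs.append(design)
    return {d: lat[(d, "MCAS")] / lat[(d, "BC")] for d in designs}


ratios = latency_ratios(ROWS)
average_ratio = sum(ratios.values()) / len(ratios)

for design, ratio in ratios.items():
    print(f"{design}: {ratio:.2%}")
print(f"average: {average_ratio:.0%}")
```

Small differences in the last digit relative to the slide are expected, since the slide's fmax values are already rounded.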

Optimality Study of Large-Scale Circuit Placement

• Construction of Placement Examples with Known Optimal (PEKO) [C. Chang et al, 2003]

– Construct instances with a known optimal solution using the characteristics of the original problem

• First quantitative evaluation of the optimality of algorithms for the circuit placement problem

• Existing placement algorithms can be 70% to 150% away from the optimal
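The "away from the optimal" figure is a relative gap against the known optimal wirelength. The following is a toy sketch, assuming half-perimeter wirelength (HPWL) as the metric; the four-cell instance, the cell names, and the function names are made up for illustration and are not PEKO benchmarks.

```python
def hpwl(nets, positions):
    """Total half-perimeter wirelength of all nets for a given placement."""
    total = 0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def optimality_gap(wirelength, optimal_wirelength):
    """Relative distance from the optimum, e.g. 0.7 means 70% away."""
    return (wirelength - optimal_wirelength) / optimal_wirelength


# Toy instance: four unit cells on a 2 x 2 grid connected in a ring.
nets = [("a", "b"), ("b", "d"), ("d", "c"), ("c", "a")]

# Every 2-pin net has length 1 here, the smallest possible for distinct
# grid slots, so this placement is optimal by construction.
optimal = {"a": (0, 0), "b": (1, 0), "d": (1, 1), "c": (0, 1)}

# A worse placement obtained by swapping cells b and d.
swapped = {"a": (0, 0), "d": (1, 0), "b": (1, 1), "c": (0, 1)}

print("optimal HPWL:", hpwl(nets, optimal))
print("swapped HPWL:", hpwl(nets, swapped))
print("gap:", optimality_gap(hpwl(nets, swapped), hpwl(nets, optimal)))
```

The same gap computation applies to any placer output once the optimal wirelength of the instance is known in advance.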

High Interest in the Community

• Coverage in two EE Times articles

– Placement tools criticized for hampering IC designs [Feb’03]

– IC placement benchmarks needed, researchers say [April’03]

• More than 60 downloads from our website

– Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.

– CMU, SUNY, UCB, UCSB, UCSD, UIC, UMichigan, UWaterloo, etc.

• Used in every placement since its publication

• http://ballade.cs.ucla.edu/~pubbench

1. Synthesis & Verification

• Hardware/Software Partition:

– Propose a SSS based H/S partition algorithm (ASICON2003)

– Better solution than SA and less runtime than Tabu

• High-level Synthesis:

– Re-synthesis algorithm after floorplanning for timing optimization (ASICON2003)

– Based on initial scheduling, do floorplanning

– After floorplanning, do re-scheduling and re-allocation by force-balance method

• Controller Synthesis:

– A Heuristic State Minimization Algorithm For Incompletely Specified Finite State Machine (ASICON2003, JCST)

2. Floorplanning & Interconnect Planning

• Based on the proposed Corner Block List (CBL) representation, propose several extensions (Extended Corner Block List (ECBL), CCBL, and SUB-CBL) to speed up floorplanning and handle more complicated L/T-shaped and rectilinear-shaped blocks.

• Propose floorplanning algorithms with some geometric constraints, such as boundary, abutment, and L/T-shaped blocks.

• Propose integrated floorplanning and buffer planning algorithms with consideration of congestion.

– Using research results from UCLA on interconnect planning

• About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS and Transactions.

3. P/G Network Analysis & Optimization

• Propose an Area Minimization of Power Distribution Network Using Efficient Nonlinear Programming Techniques (ICCAD2001, accepted by IEEE Trans. on CAD)

• Propose a decoupling capacitance optimization algorithm for Robust On-Chip Power Delivery (ASPDAC2004, ASICON2003)

4. Global Routing & Special Routing

• Propose several global routing algorithms that optimize congestion, timing, or both timing and congestion

• Papers were published in ASPDAC, ISCAS, and IEEE Transactions.

5. Parasitic R/L/C Extraction

• 3-D R/C Extraction using Boundary Element Method (BEM)

– Quasi-Multiple Medium (QMM) BEM algorithms

– Hierarchical Block BEM (HBBEM) technique

• Fast 3-D Inductance Extraction (FIE)

• Papers were published in ASPDAC, ASICON and IEEE Transactions on MTT

Thrust 2 -- SOC Verification, Test, and Diagnosis (Led by Tim Cheng)

Verification and Testing

• Enabling techniques for semi-formal functional verification

– Integrated framework for simulation, vector generation and model checking

– Scalable constraint-solving techniques

– Automatic/semi-automatic functional vector generation from HDL code

• Testing and diagnosis for heterogeneous SOC

– Self-testing using on-chip programmable components

– Self-testing for on-chip analog/mixed-signal components

– New test techniques for deep-submicron embedded memories

Key Results - Verification

• Developed and released ATPG-based SAT solvers for circuits (Univ. of California, Santa Barbara)

– Integrating structural ATPG and SAT techniques with new conflict learning

– CSAT: Fast combinational solver (released in March 2003)

   • Demonstrated 10-100X speedup over state-of-the-art SAT solvers on industrial test cases (reported by Intel and Calypto)

   • Has been integrated into Intel’s FV verification system and a startup’s verification engine

   • Publications: DATE2003 and DAC2003

– Satori2: Fast sequential solver (released in Dec. 2003)

   • Demonstrated 10X-200X speedup over a commercial, sequential ATPG engine on public benchmark circuits

   • Publications: ICCAD2003, HLDVT2003 and ASPDAC2004

Key Results - Testing

A new Statistical Delay Testing and Diagnosis framework consisting of five major components (UCSB):

[Figure: framework block diagram with the blocks Defect Injection & Simulation; Statistical Timing Analysis Framework (cell-based characterization), containing Static Timing Analysis and Dynamic Timing Simulator; Path Filtering; Critical Path Selection; ATPG/Pattern Selection; Diagnosis]

• Selection/Generation of high quality tests for target paths [ITC’01][DATE 2004]

– Identifying tests that activate longer delay along the target path

• Delay fault diagnosis based on statistical timing model [DATE’03, VTS’03, DAC’03]

– Ref: Krstic, Wang, Cheng, & Abadir, DATE’03 (Best Paper Award in Test)

• Statistical timing analysis

• Statistical critical path selection [DAC’02, ICCAD’02]

– Selecting statistical long & true paths whose tests maximize detection of parametric failures

• Path coverage metric [ASPDAC’03]

– Estimating the quality of a path set

Key Results - Testing

• On-Chip Jitter Extraction for Bit-Error-Rate (BER) Testing of Multi-GHz Signal (UCSB)

– Using on-chip, single-shot measurement unit to sample signal periods for spectral analysis

– Demonstrated, through simulation, accurate extraction of multiple sinusoids and random jitter components for a 3GHz signal

– Publications: ASPDAC2004 and DATE2004

Thrust 3 – Design Driver: Network Security Processor (Led by Prof. C. W. Wu)

• Applications: IPSec, SSL, VPN, etc.

• Functionalities:

– Public key: RSA, ECC

– Secret key: AES

– Hashing (Message authentication): HMAC (SHA-1/MD5)

– Truly random number generator (FIPS 140-1, 140-2 compliant)

• Target technology: 0.18μm or below

• Clock rate: 200MHz or higher (internal)

• 32-bit data and instruction word

• 10Gbps (OC192)

• Power: 1 to 10mW/MHz at 3V (LP to HP)

• Die size: 50mm2

• On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)

Encryption Modules (PKEM)

• Public key encryption module

– Operations:

   • 32-bit word-based modular multiplication

   • Multiplication over GF(p) and GF(2^m)

• An RSA cryptography engine with small area overhead and high speed

• Scalable word-width

• TSMC 0.35μm

• 34K gates (1.7×1.8 mm2)

• 100MHz clock

• Scalable key length

• Throughput

– 512-bit key: 1.79Kbps/MHz

– 1024-bit key: 470bps/MHz
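Word-based modular multiplication can be illustrated with a word-serial Montgomery multiplication, which processes one 32-bit word of an operand per iteration. This is a generic textbook sketch and an assumption on our part: the slide does not say which algorithm the module uses, and the function names are illustrative. It needs Python 3.8+ for the modular inverse via `pow`.

```python
def montgomery_multiply(a, b, n, word_bits=32):
    """Return a * b * R^-1 mod n, where R = 2^(word_bits * num_words).

    Requires an odd modulus n and 0 <= a, b < n. One word of `a` is
    consumed per loop iteration.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    mask = (1 << word_bits) - 1
    num_words = (n.bit_length() + word_bits - 1) // word_bits
    # n_prime = -n^-1 mod 2^word_bits, precomputed once per modulus.
    n_prime = (-pow(n, -1, 1 << word_bits)) & mask
    t = 0
    for i in range(num_words):
        a_i = (a >> (word_bits * i)) & mask
        t += a_i * b
        # Choose m so that the low word of t + m*n is zero.
        m = ((t & mask) * n_prime) & mask
        t = (t + m * n) >> word_bits
    return t - n if t >= n else t


def modmul(a, b, n, word_bits=32):
    """Plain a * b mod n built from two Montgomery multiplications."""
    num_words = (n.bit_length() + word_bits - 1) // word_bits
    r = 1 << (word_bits * num_words)
    r2 = (r * r) % n
    # First product carries a factor R^-1; multiplying by R^2 removes it.
    return montgomery_multiply(montgomery_multiply(a, b, n, word_bits), r2, n, word_bits)


n = (1 << 127) - 1  # an odd example modulus
a = 0x1234567890ABCDEF1234
b = 0xFEDCBA9876543210FEDC
print(modmul(a, b, n) == (a * b) % n)
```

The loop count grows with the number of words in the modulus, which is one reason throughput per MHz drops as the key length increases.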

Encryption Modules (SKEM)

• Secret key encryption module

– Operations:

   • Matrix operations, manipulation

• AES cryptography

• 32-bit external interface

• 58K gates

• Over 200MHz clock

• Throughput: 2Gbps

• Support key length of 128/192/256 bits

Technology    TSMC 0.25μm CMOS
Package       128CQFP
Core Size     1,279 x 1,271 μm2
Gate Count    63.4K
Max. Freq.    250MHz
Throughput    2.977 Gbps (128-bit key)
              2.510 Gbps (192-bit key)
              2.169 Gbps (256-bit key)
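The throughput figures can be turned into an implied number of clock cycles per AES block. This sketch assumes the quoted throughputs are measured at the 250MHz maximum frequency, which the slide does not state explicitly; the 128-bit block size is fixed by AES, and the middle key length is taken to be the standard 192 bits. Names are illustrative.

```python
AES_BLOCK_BITS = 128   # AES always encrypts 128-bit blocks
MAX_FREQ_MHZ = 250.0   # assumption: throughput quoted at max frequency

# Key length in bits -> throughput in Gbps, from the chip summary.
THROUGHPUT_GBPS = {128: 2.977, 192: 2.510, 256: 2.169}


def cycles_per_block(throughput_gbps, freq_mhz=MAX_FREQ_MHZ,
                     block_bits=AES_BLOCK_BITS):
    """Implied clock cycles spent on each 128-bit block."""
    blocks_per_second = throughput_gbps * 1e9 / block_bits
    return freq_mhz * 1e6 / blocks_per_second


for key_bits, gbps in THROUGHPUT_GBPS.items():
    print(f"{key_bits}-bit key: {cycles_per_block(gbps):.2f} cycles per block")
```

The implied cycle counts rise by about two from one key length to the next, which matches the two extra AES rounds per key-length step (10, 12, and 14 rounds).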

Journal Publications

• C.-T. Huang and C.-W. Wu, "High-speed easily testable Galois-field inverter", IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 9, pp. 909-918, Sept. 2000.

• S.-A. Hwang and C.-W. Wu, "Unified VLSI systolic array design for LZ data compression", IEEE Trans. VLSI Systems, vol. 9, no. 4, pp. 489-499, Aug. 2001.

• C.-H. Wu, J.-H. Hong, and C.-W. Wu, "VLSI design of RSA cryptosystem based on the Chinese Remainder Theorem", J. Inform. Science and Engineering, vol. 17, no. 6, pp. 967-979, Nov. 2001.

• J.-H. Hong and C.-W. Wu, "Cellular array modular multiplier for the RSA public-key cryptosystem based on modified Booth's algorithm", IEEE Trans. VLSI Systems, vol. 11, no. 3, pp. 474-484, June 2003.

• C.-P. Su, T.-F. Lin, C.-T. Huang, and C.-W. Wu, "A high-throughput low-cost AES processor", IEEE Communications Magazine, vol. 41, no. 12, pp. 86-91, Dec. 2003.

Conference Publications

• J.-H. Hong and C.-W. Wu, "Radix-4 modular multiplication and exponentiation algorithms for the RSA public-key cryptosystem", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Yokohama, Jan. 2000, pp. 565-570.

• J.-H. Hong, P.-Y. Tsai, and C.-W. Wu, "Interleaving schemes for a systolic RSA public-key cryptosystem based on an improved Montgomery's algorithm", in Proc. 11th VLSI Design/CAD Symp., Pingtung, Aug. 2000, pp. 163-166.

• C.-H. Wu, J.-H. Hong, and C.-W. Wu, "An RSA cryptosystem based on the Chinese Remainder Theorem", in Proc. 11th VLSI Design/CAD Symp., Pingtung, Aug. 2000, pp. 167-170.

• C.-H. Wu, J.-H. Hong, and C.-W. Wu, "RSA cryptosystem design based on the Chinese Remainder Theorem", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Yokohama, Jan. 2001, pp. 391-395.

• Y.-C. Lin, C.-P. Su, C.-W. Wang, and C.-W. Wu, "A word-based RSA public-key crypto-processor core", in Proc. 12th VLSI Design/CAD Symp., Hsinchu, Aug. 2001.

• T.-F. Lin, C.-P. Su, C.-T. Huang, and C.-W. Wu, "A high-throughput low-cost AES cipher chip", in Proc. 3rd IEEE Asia-Pacific Conf. ASIC, Taipei, Aug. 2002, pp. 85-88.

• Y.-T. Lin, C.-P. Su, C.-T. Huang, C.-W. Wu, S.-Y. Huang, and T.-Y. Chang, "Low-power embedded memory architecture design for SOC", in Proc. 13th VLSI Design/CAD Symp., Taitung, Aug. 2002, pp. 306-309.

• M.-C. Sun, C.-P. Su, C.-T. Huang, and C.-W. Wu, "Design of a scalable RSA and ECC crypto-processor", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Kitakyushu, Jan. 2003, pp. 495-498 (Best Paper Award).

• C.-P. Su, T.-F. Lin, C.-T. Huang, and C.-W. Wu, "A highly efficient AES cipher chip", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Kitakyushu, Jan. 2003, pp. 561-562 (Design Contest Special Feature Award).

• J.-H. Hong, C.-L. Liu, B.-Y. Tsai, and C.-W. Wu, "A radix-4 modular multiplier for fast RSA public-key cryptosystem", in Proc. 14th VLSI Design/CAD Symp., Hualien, Aug. 2003, pp. 553-556.

• M.-Y. Wang, C.-P. Su, C.-T. Huang, and C.-W. Wu, "An HMAC processor with integrated SHA-1 and MD5 algorithms", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Yokohama, Jan. 2004 (to appear).