Copyright 2006 FTL Systems
Asynchronous Design Automation Enabling FTL Systems’ StarStream Compact Supercomputer
StarStream and Asynchronous Merlin research and development solely at commercial expense; evaluation of early release systems funded in part by the US Navy
Specifications subject to change without notice
StarStream and Merlin are trademarks of FTL Systems, all rights reserved
StarStream and Merlin are the subject of FTL’s pending and issued patents
StarStream and Asynchronous versions of Merlin not yet offered for sale
Dr. Robert G Babb II
Northrop Grumman Information Technology
babb@seki.com
Dr. John Willis
FTL Systems
jwillis@ftlsys.com
Outline
1. Company Roles
2. Why Use Asynchronous for High Performance Computing?
3. Design Automation Strategy using Merlin
4. StarStream Compact Supercomputers Enabled
5. Value of Asynchronous Technology
6. Lessons Learned
Company Roles
• FTL Systems: Provides an end-to-end design automation flow with behavioral asynchronous capability, plus StarStream compact supercomputer processors, computers and programming compilers
• Northrop Grumman (IT): Provides government-oriented applications and on-site operational services
• Cisco: Provides high-bandwidth network and file system capability
• Many other companies and North American / European governmental organizations are involved
(Intellectual property remains owned by the respective developers; no endorsements expressed or implied)
Why Use Asynchronous for High Performance Computing?: Power Management
• High performance processors are limited by power in and heat out:
• Optimal per-die dissipation of 10-40 watts
• 100-200 watt dissipation feasible with economical technology
• 90nm die running at 2 to 6 GHz can exceed 100-200 watts
• Some power dissipation results from leakage currents (largely speed-independent), addressed by other means
• High die temperature leads to higher cost and lower reliability
• Current designs manage power by limiting logic duty cycle (clock-gating…)
• Asynchronous logic facilitates managing power by disabling localized parts of the processor on a cycle-by-cycle basis
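The duty-cycle argument on this slide can be related to the standard first-order CMOS power model. The symbols below are not from the talk and are assumed for illustration only: α is the fraction of cycles in which a region switches (its activity or duty cycle), C the switched capacitance, V_dd the supply voltage, f the operating frequency and I_leak the leakage current.

```latex
P_{\text{total}} \approx
  \underbrace{\alpha \, C \, V_{dd}^{2} \, f}_{\text{dynamic (switching)}}
  + \underbrace{V_{dd} \, I_{\text{leak}}}_{\text{leakage (largely speed-independent)}}
```

Under this model, clock gating and localized asynchronous disabling both act on α for a region of the die, while the leakage term is unaffected, which is consistent with leakage being "addressed by other means".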
Why Use Asynchronous for High Performance Computing?: Complexity Management
• High-end microprocessors use 1,000M to 2,000M transistors
• Supercomputing processors such as StarStream use 5,000M to 10,000M transistors (exceeding current single-die fabrication, even at 45nm)
• Timing closure of even a 5,000M transistor design running at 1 GHz to 10 GHz is time-consuming (frequently months), expensive (a significant part of design cost) and error-prone (field failures are often traced to marginal timing at a particular voltage, temperature, process point…)
• Asynchronous technology can reduce back-end design rule closure time, money and risk
• However, fully delay-insensitive asynchronous logic is generally not suitable for supercomputing performance requirements
Why Use Asynchronous for High Performance Computing?: Complications, The Down-Side
• Transit times associated with various kinds of completion signals increase the effective cycle time, especially between physically distant parts of a processor
• Most asynchronous logic techniques significantly increase both transistor count and logical wiring complexity (greatly complicating designs that already do not fit on a single die)
• Meta-stability physics associated with merging of local time domains induces statistical reliability problems, especially with large and fast systems
• Standard cell libraries (FPGA & ASIC) are seldom complete for asynchronous logic technologies (resulting in less optimal realization or specialized cell library development)
• Few designers have the expertise or design time to deal with asynchronous logic manually
• Most design automation technology does not support complex asynchronous designs using standard design languages (such as VHDL) and design flow interfaces
All are solvable challenges; this talk focuses on a commercial solution to the last two issues
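The meta-stability bullet above can be made concrete with the commonly used first-order synchronizer model, MTBF = exp(t_r / τ) / (T0 · f_clk · f_data). The sketch below is not from the talk: the function names and every numeric value are hypothetical, chosen only to show why "large and fast" systems are the hard case.

```python
import math

def synchronizer_mtbf(resolve_time_s, tau_s, t0_s, clock_hz, data_hz):
    """Mean time between failures (seconds) of one domain crossing.

    First-order model: exp(t_r / tau) / (T0 * f_clk * f_data).
    tau and T0 are device-dependent constants (assumed values here).
    """
    return math.exp(resolve_time_s / tau_s) / (t0_s * clock_hz * data_hz)

def system_mtbf(per_crossing_mtbf_s, crossings):
    """Assuming independent, identical crossings, failure rates add."""
    return per_crossing_mtbf_s / crossings

# Hypothetical operating points: doubling the rates halves the time
# available for a metastable state to resolve.
slow = synchronizer_mtbf(1.0e-9, 50e-12, 100e-12, 1.0e9, 100e6)
fast = synchronizer_mtbf(0.5e-9, 50e-12, 100e-12, 2.0e9, 200e6)

print(f"slow crossing MTBF: {slow:.3g} s")
print(f"fast crossing MTBF: {fast:.3g} s")
print(f"fast system, 1000 crossings: {system_mtbf(fast, 1000):.3g} s")
```

Because resolution time sits in the exponent, speed dominates the result, and the number of crossings then divides whatever margin remains.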
Design Automation Strategy: Flow
[Figure: design automation flow]
Inputs to FTL Systems’ design automation tool:
• StarStream behavioral processor & system specification (using standard VHDL with minimal extensions)
• Logical technology specific & realization technology specific implementations of “type systems”
StarStream implementations produced:
• Software-only implementation for compiler testing & early performance evaluation
• Early Release version (runs at greater than 400 MHz with modest cost, risk)
• Asynchronous quasi-custom logic version (runs at greater than 1.5 GHz)
• Clocked logic quasi-custom logic version (runs at greater than 1.5 GHz)
Design Automation Strategy: Abstraction of Logic & Realization Technology
• Conventional asynchronous design practice embodies a specific asynchronous logic design style, and often the realization, directly in the design
• This implies an early (perhaps premature) selection of the best logic technology and realization technology
• It also requires that all designers have expert knowledge of the asynchronous logic technologies employed
New Approach:
• Capture the behavioral design intent distinct from implementation and realization decisions
• Define one or more systems of data type representations and operator implementations, one for each logic or realization technology
• Design automation tool semi-automatically selects suitable technologies to apply to each segment of the design to best meet design constraints (timing, size, power)
Design Automation Strategy: Example
Processor behavioral model specifies addition of two values
Each visible type system specifies:
1. Data representation (for example a particular approach to representing bundled data)
2. Implementation of a spanning set of operators (such as addition) specific to the data representation
3. Type conversion operators suitable for implicit or explicit type conversion from other logic or realization type (logic) systems
The combination of localized usage, hardware scheduling, constraints and optimization strategies automatically (or semi-automatically) chooses and applies the most suitable logic and realization technology.
Addition may be expressed using a particular bundled data approach, a specific adder algorithm and specific devices (much as a compiler chooses specific registers & assembly)
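The selection step described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Merlin's algorithm or data model: the class names, the `choose_type_system` function, the three example type systems and all cost figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Cost:
    delay_ns: float
    transistors: int
    power_mw: float

    def plus(self, other: "Cost") -> "Cost":
        return Cost(self.delay_ns + other.delay_ns,
                    self.transistors + other.transistors,
                    self.power_mw + other.power_mw)

@dataclass
class TypeSystem:
    name: str
    representation: str             # 1. data representation
    operators: Dict[str, Cost]      # 2. operator implementations
    converts_from: Dict[str, Cost]  # 3. conversions from other type systems

@dataclass
class Constraints:
    max_delay_ns: float
    max_transistors: int
    max_power_mw: float

def choose_type_system(op: str, producer: str, candidates: List[TypeSystem],
                       limits: Constraints) -> Optional[TypeSystem]:
    """Pick the lowest-power type system that implements `op`, can accept
    data from `producer`, and meets every constraint (None if none does)."""
    best, best_cost = None, None
    for ts in candidates:
        if op not in ts.operators:
            continue
        cost = ts.operators[op]
        if ts.name != producer:
            conversion = ts.converts_from.get(producer)
            if conversion is None:
                continue  # no conversion operator available
            cost = cost.plus(conversion)
        if (cost.delay_ns > limits.max_delay_ns
                or cost.transistors > limits.max_transistors
                or cost.power_mw > limits.max_power_mw):
            continue
        if best_cost is None or cost.power_mw < best_cost.power_mw:
            best, best_cost = ts, cost
    return best

# Hypothetical type systems and cost estimates for a behavioral "add".
CANDIDATES = [
    TypeSystem("bundled_data", "single-rail data with request/acknowledge",
               {"add": Cost(1.0, 900, 4.0)}, {"clocked": Cost(0.3, 120, 0.5)}),
    TypeSystem("dual_rail", "two wires per bit",
               {"add": Cost(1.4, 2000, 6.0)}, {"clocked": Cost(0.4, 300, 0.8)}),
    TypeSystem("clocked", "single-rail data with a global clock",
               {"add": Cost(0.8, 700, 7.0)}, {}),
]

relaxed = choose_type_system("add", "clocked", CANDIDATES, Constraints(1.5, 1500, 10.0))
tight = choose_type_system("add", "clocked", CANDIDATES, Constraints(1.0, 1500, 10.0))
print(relaxed.name, tight.name)
```

The behavioral model only says "add"; the representation, adder implementation and any conversion are chosen per use against the constraints, which is the compiler analogy the slide draws.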
Application of Logic Technology to Design
Distinct asynchronous logic technologies may be applied to design regions or specific state
Application via operational constraints or interactive graphical user interface
Single tool combines verification (digital and analog) and synthesis into a tight iterative loop
[Figure labels: “Flat” regions; Objects in region]
StarStream Supercomputers Enabled: Processors
• Nominal 0.5 integer TeraFlops in Early Release, rising to greater than ~1.5 TeraFlops with asynchronous or clocked production logic
• Automatically compiled from a common behavioral model: implementation-specific effort is largely verification, production & test costs
• CPU has ~5-8 times the number of transistors in next generation microprocessors (6,000M to 10,000M transistors)
• Early Release processors use 29 packages, each a large 90nm die
• Module power and cost are at the high end of the microprocessor range
StarStream Supercomputers Enabled: Systems
• Approx. 19” cube suitable for deskside use or standard 19” rack mounting
• Cubes assemble via electro-optic switches into PetaOP systems using a few tens of racks
• Fully asynchronous cubes are more easily assembled into multi-cube systems than cubes with inherent clock domains; asynchronous scales!
[Figure: cube dimensions 12U H, 15” D, 19” W]
StarStream Supercomputers Enabled: Status
• Early Release silicon is being populated on boards and is in bring-up test now
• Early Release systems are expected to be in circulation for software verification and testing in mid-2006
• Design focus is shifting from Early Release to Production systems
Value of Asynchronous Technology
• As physical size, operating speed and design complexity grow while device dimensions decrease, asynchronous systems become increasingly critical to the feasibility of faster computers
• Incorporating asynchronous technology into the first systems adds very significantly to cost compared with conventional computer system research and development; subsequent systems are likely to be less expensive
• Asynchronous technologies are good for localizing power management
• Expect to have commercially relevant comparative data on asynchronous and clocked logic from distinct implementations of the same high performance computer system within the next year
Lessons Learned so Far (1/2)
The Merlin & StarStream effort has been underway for eight years and has already taught several important lessons:
• The largest effort is on compilers, behavioral architecture design & design automation tools; hardware comes down to diligent analog engineering and pre-fabrication verification
• Design teams can be significantly smaller and more efficient (tens rather than hundreds of people), resulting in agility
• Effective use of the approach requires designers with even broader knowledge and deeper experience than conventional approaches (most of those involved hold PhDs with 10-25 years of experience; it is extremely hard for technicians and new graduates to get traction)
• Business model focuses on computer systems; EDA tools are an enabler
Lessons Learned so Far (2/2)
• VHDL is suitable for capturing the designer’s behavioral intent; VHDL’s extensible type system is of significant value. Behavioral synthesis must recognize domains in which VHDL over-constrains timing.
• Significant logic technology and device realization research and development is required: handshake latency and growth in transistor count that are fine for low-power, comparatively small systems are not suitable for large, high-performance systems
• Highly automated behavioral synthesis, simulation, formal verification and design rule checking are required, with special characteristics supporting asynchronous design. Manual approaches are too error-prone for large designs with high verification and fabrication costs