Why do so many chips fail?

Ira Chayut, Verification Architect
(opinions are my own and do not necessarily represent the opinion of my employer)
Failure rate of first silicon is rising
“… research by Collett International revealed that 52% of complex application specific integrated circuits (ASICs) required a respin and the reason was largely due to functional errors.” (http://www.techonline.com/community/ed_resource/feature_article/36655)
Who is to blame? (There must be someone to blame!)
Management – they didn’t provide enough resources
HW Engineering – they created the functional errors
Verification – they didn’t catch the functional errors
Architecture – they didn’t focus on testability
Marketing – they kept changing the specs
People don’t kill chips, complexity kills chips
http://www.cs.utexas.edu/users/dburger/teaching/cs395t-s99/papers/2_src.pdf (1999). The projected numbers are a bit lower than current reality: a dual-core AMD Opteron has 233 million transistors and the Intel Itanium 2 has 592 million.
Complexity increases exponentially
[Chart: transistors per chip, 1995–2015, in millions of transistors]
• Chip component count increases exponentially over time (Moore’s law)
• Interactions increase super-exponentially
• IP reuse and parallel design teams facilitate more functions with fewer HW engineers per function and more functions per chip
• Verification effort gets combinatorially more difficult as functions are added
Why verification is not able to keep up
Verification effort gets combinatorially more difficult as functions are added
BUT
Verification staffing/time cannot be made combinatorially larger to compensate
AND
Chip lifetimes are too short to allow for complete testing
THUS
Chips will continue to have ever-increasing functional errors as chips get more complex
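A toy calculation in Python illustrates the combinatorial-growth claim above (the function counts are hypothetical): with n functions on a chip, there are C(n, 2) pairwise interactions and C(n, 3) three-way interactions that could, in principle, need verification. Doubling the function count far more than doubles the interactions.

```python
from math import comb

# Hypothetical illustration: count the k-way interactions among
# n on-chip functions. Pairwise interactions grow quadratically,
# three-way interactions grow cubically.
for n in (10, 20, 40, 80):
    pairs = comb(n, 2)    # 2-way interactions
    triples = comb(n, 3)  # 3-way interactions
    print(f"{n:3d} functions: {pairs:6d} pairs, {triples:7d} triples")
```

Staffing, by contrast, can at best grow linearly, which is the gap the slides describe.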
Limiting the number of architectural and functional errors
Thorough unit-level verification testing
Small simulations run faster
Avoids combinatorial explosion of interactions
Well defined interfaces between blocks with assertions and formal verification techniques to reduce inter-block problems
Emulation or FPGA prototyping to accelerate testing
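In hardware flows the interface assertions mentioned above would typically be written in SystemVerilog, but the idea can be sketched in Python: a checker that watches an assumed valid/ready handshake at a block boundary and flags a protocol violation. The protocol rule shown (once `valid` is raised it must stay high until `ready` accepts it) is a common convention, used here only as an example.

```python
def check_handshake(trace):
    """Check a valid/ready handshake rule over a list of
    (valid, ready) samples, one tuple per clock cycle.

    Rule: once 'valid' is asserted, it must remain asserted
    until 'ready' is seen in the same cycle.
    """
    pending = False  # a valid was raised but not yet accepted
    for cycle, (valid, ready) in enumerate(trace):
        if pending and not valid:
            return f"violation at cycle {cycle}: valid dropped before ready"
        pending = bool(valid and not ready)
    return "ok"

print(check_handshake([(1, 0), (1, 1), (0, 0)]))  # ok
print(check_handshake([(1, 0), (0, 0)]))          # violation at cycle 1
```

Placing such checkers (or their SystemVerilog assertion equivalents) on every inter-block interface localizes failures to the offending block instead of surfacing them as full-chip mysteries.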
How to live with functional errors
Successful companies have learned how to ship chips with functional and architectural errors: time-to-market pressures and chip complexity force the delivery of chips that are not perfect (even if perfection were possible). How can this be done better?
DRAMs have long been made with spare components, so that a less-than-perfect chip can still provide full device function and ship
How to do the same with architectural features? How can full device function exist in the presence of architectural or implementation omissions or errors?
Architecture support
Embrace Perl’s motto: “There's More Than One Way to Do It” — allow for multiple ways of accomplishing all critical specified functions
Analogous to Design for Test (DFT) and Design for Verification (DFV), we should start thinking about Architect for Verification (AFV)
[Thanks to Dave Whipp for the AFV phrase and acronym]
In some problem domains, such as networking, upper-layer protocols can recover from some silicon errors, though at a performance penalty when this path is taken
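The "More Than One Way to Do It" principle above can be sketched as a fallback chain: a critical function with several implementations, where a broken or disabled path is skipped in favor of the next one. All names here (the checksum paths) are hypothetical illustrations, not a real chip's API.

```python
def checksum_engine(data):
    """Hypothetical dedicated-engine path, broken in this revision."""
    raise NotImplementedError("checksum engine disabled by errata")

def checksum_fallback(data):
    """Hypothetical slower path that computes the same result."""
    return sum(data) & 0xFFFF

def checksum(data):
    """Try each implementation of the critical function in order."""
    for impl in (checksum_engine, checksum_fallback):
        try:
            return impl(data)
        except NotImplementedError:
            continue  # this path is fused off or broken; try the next
    raise RuntimeError("no working checksum path")

print(checksum([1, 2, 3]))  # 6, via the fallback path
```

The cost is extra silicon and design effort for the redundant paths; the benefit is that a single broken implementation no longer kills the feature, or the chip.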
Architect support, continued
A programmable abstraction layer between the real hardware and user’s API can hide functional warts — hardware catches specific operations and either directs them to one of multiple hardware implementations, or signals a software trap
Pyramid minicomputers hid the assembly language from users, so the compiler could work around problems
Transmeta maps standard machine language to a hidden processor architecture, so the translation software can work around problems
Soft hardware can allow chip redesign after silicon is frozen (and shipped!)
Summary
Ever increasing chip complexity prevents total testing before tape-out (or even before shipping)
AFV techniques can keep chip verification from falling victim to combinatorial explosion
We have to accept that there will be architectural and functional failures in every advanced chip that is built
Architecture support is needed to allow failures to be worked around or fixed post-silicon