Mitigating the Dangers of Multi-Clock Designs
National Software and Complex Electronic Hardware Conference
August 20-21, 2008, Denver
David Landoll, Applications Architect, Mentor Graphics Corp.
Mentor Confidential
Mentor Graphics Around the World
[Map legend: R&D Sites, Sales Offices]
4400 employees, 44 sales offices, 28 R&D locations, ~60% R&D outside NA
2008 National SW & CEH Conference
Who is Mentor Graphics?
2007 revenues $880M
Projecting ~$915M for 2008
Focus on growth through internal technology development
  - R&D 29% of total revenues, invested >$1.5B since 1996
Product portfolio that includes numerous market leaders
  - Calibre, Questa, 0-In, Expedition, Catapult, TestKompress…
Acquisitions that build on leadership positions
Mentor Graphics Serves the Mil-Aero Market
PCB Design
• Board Station
• Expedition Enterprise flow
• PADS
• Data Management System
FPGA
• HDL Designer
• Precision synthesis
• ModelSim simulator
• Questa Adv. Verification
ASIC
• C synthesis
• Questa adv verification platform
• Veloce HW-assisted verification
• Seamless HW/SW co-design
• Advanced M/S mixed signal design
• Physical Implementation
• Manufacturing tests
Cable/Harness Systems
• Capital platform
• VeSys family
• TransDesign
Embedded SW
• EDGE Developer Suite
• Nucleus Secure RTOS
• Inflexion Application Platform
Today’s FPGAs
[Figure: block diagram of an Onboard Computer System on Chip / FPGA. Blocks shown include a Leon CPU, FPU, Memory Controller, EDAC Controller, DMA Controller, UART, Timers, IRQ Ctrl, I/O port, AHB/APB Bridge, AMBA AHB and AMBA APB buses, CAN Controller, CAN Switch, HDLC Controllers, SpaceWire Controller, Boot PROM, Configuration PROM, SDRAM, Data Memory and Parity Memory on a 32-bit data bus, Dual CAN transceiver, CLK Generator, JTAG, PPS in/out, and a 1.5V Linear Regulator. Callouts: Design 1: DMA Controller; Design 2: AMBA AHB Interconnect (Arbiter, Decoder, Masters #1-#2, Slaves #1-#3, Address and Control Mux, Read data mux, Write data mux, with HADDR/HWDATA/HRDATA signals); Design 3: CPU; Design 4: UART.]
• Fabrication advances provide more available silicon area
• More functionality can weigh less and take up less space
• Integrating/reusing capabilities lowers cost
Avionic Integration – New challenges
[Figure: integrated avionics functions: Flight Management, Integrated Avionics Processing, Maintenance, Diagnostics, Image Processing, Flight Control & Avoidance Systems, Communications, Weather Radar]
“Boeing 787: Integration’s Next Step. From its central processor to its common data network, surveillance system and navigation system, the theme of the Boeing 787 Dreamliner is integration.” (James W. Ramsey)
Today’s Avionics…
Today’s CEH challenges:
■ Complex systems
■ Complex intercommunication
■ New ARINC serial communication
■ Contain “Multi-clock designs”
All Multi-clock designs have CDCs
■ So, what are “CDCs”?
■ Why should you care?
Clock Domain Crossing Errors: Unpredictable Loss of Data
■ CDC problems
  - corrupt control and data signals
  - are subtle, intermittent, unpredictable
  - are the 2nd major cause of respins
  - are difficult to reproduce and debug
  - are temperature, voltage, and process sensitive
  - will only occur in hardware; often in the final design
■ Traditional verification techniques do not work for CDC signals
A CDC Verification methodology is needed to reduce the risk of CDC related data errors
FPGA/ASIC Verification Trends
[Chart: IC/ASIC Designs Requiring Re-Spins by Type of Flaw. Axis: Percent of Designs Requiring Two or More Silicon Spins (0% to 100%). Categories: Logic/Functional, Clocking, Tuning Analog Circuit, Fast Path, Yield/Reliability, Delays/Glitches, Slow Path, Mixed-Signal Interface, Power Consumption, IR Drops, Firmware, Other. Series: Market Study 2002, Market Study 2004. Labeled values: 75%, 71%.]
Source: 2004/2002 IC/ASIC Functional Verification Study, Collett International Research, Used with Permission
[Chart: Effort Allocation of Dedicated Verification Engineers by Type of Activity. Activities: Verification, Debug, Testbench Development. Labeled values: 60%, 40%.]
■ Traditional methods failing to catch bugs
■ Debugging becomes the main bottleneck
■ Usually indeterminate amount of time
■ 50% of ASICs require more than one spin
■ Principal contributors:
  ■ Functional bugs
  ■ Clocking related bugs
[Collet 2005]
Agenda
Digital Clocks, Registers, Metastability Explained
Why are Clock Domain Crossings Dangerous?
Where are CDCs likely on today’s CEH?
Mitigating the Clock Domain Crossing issue
Verifying the mitigation was performed correctly
Recommendations
CDC = Clock Domain Crossing
Metastability: What the heck is it, anyway?
■ What is a clock?
  - Periodic pulsing signal
  - Digital logic uniformly connected to this signal
  - Acts as the Symphony Conductor, keeping logic in sync
  - Action happens across the logic at one specific point
    ■ Typically the “rising edge”
Vee, Vss: GND (0V); Vcc, Vdd: +5V, +3.3V
Metastability: What the heck is it, anyway?
■ What’s in a register?
  - (Also known as a latch, flip-flop, etc)
  - Contain transistors that “trap” the input value at the appropriate time
    ■ E.g. rising edge of the clock
  - How does this happen?
Metastability: The Physics of a Register
■ Let’s take a look at a register
  - CMOS D-type transmission gate flipflop
[Figure: Transistor Model of a D FlipFlop, with D, CLK, and Q shown in the schematic and in a timing diagram; the slide sequence steps through successive D and CLK values]

-- simple D-type flip-flop
-- Q takes the value of D on each rising edge of CLK
process(CLK)
begin
  if rising_edge(CLK) then
    Q <= D;
  end if;
end process;
Only works if D has a “good value” at the rising edge of the clock (no setup/hold time violations)
Metastability: The Physics of a Register
■ When setup/hold conditions are violated, the output of a storage element becomes unpredictable
■ This effect is called metastability
■ If not contained, metastability can propagate…
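$$\mathrm{MTBF} = \frac{1}{f_{clk} \times f_{in} \times t_d}$$
$f_{clk}$ = Clock Frequency
$f_{in}$ = Input Signal Frequency
$t_d$ = Duration of critical time window
[Figure: register (D, CLK, Q) with a timing diagram marking the setup/hold window around the CLK edge]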
Metastability is UNAVOIDABLE in designs with multiple asynchronous clocks
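As a rough worked example of the MTBF relation on this slide, the sketch below plugs in illustrative numbers. The function name `mtbf_seconds` and the values (100 MHz clock, 1 MHz input, 100 ps critical window) are assumptions chosen for illustration, not figures from the talk.

```python
def mtbf_seconds(f_clk_hz, f_in_hz, t_d_s):
    """Mean time between metastable events for one unsynchronized register.

    Implements MTBF = 1 / (f_clk * f_in * t_d) as given on the slide.
    """
    return 1.0 / (f_clk_hz * f_in_hz * t_d_s)


if __name__ == "__main__":
    # Assumed example values: 100 MHz sampling clock, 1 MHz input toggle
    # rate, 100 ps setup/hold (critical) window.
    mtbf = mtbf_seconds(100e6, 1e6, 100e-12)
    print(f"MTBF = {mtbf * 1e6:.1f} microseconds")
```

With these assumed values the register sees a setup/hold violation roughly every 100 microseconds, which is why an unsynchronized crossing cannot be treated as a rare corner case.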
Clock Domain Crossings: Guaranteed to Cause Metastability
When 2 or more designs run on disparate clocks:
  - The clocks will continually skew, guaranteeing setup/hold violations
  - Signals from one design to another are “Clock Domain Crossings” (CDCs)
[Figure: a Tx signal leaves the Sensor System (Clock A) and is registered in the Guidance System (Clock B); the timing diagram shows the Clock Domain Crossing signal changing inside the receiving register's setup/hold window]
Signals that cross asynchronous clock domains (CDC signals) WILL violate setup and hold conditions
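The claim that asynchronous clocks keep drifting into the setup/hold window can be checked with a small timing sketch. It assumes the data toggles on every Clock A edge and counts how many Clock B sampling edges fall within a given window of a data edge. The function name, the periods (10 ns and 7.3 ns), and the 150 ps window are hypothetical values picked for illustration.

```python
def count_setup_hold_violations(period_a_ps, period_b_ps, window_ps, n_edges):
    """Count Clock B edges that sample within window_ps of a Clock A data edge.

    Times are integer picoseconds so the arithmetic is exact. Data is assumed
    to change at every multiple of period_a_ps (worst case: toggles each edge).
    """
    violations = 0
    for k in range(1, n_edges + 1):
        t_sample = k * period_b_ps
        phase = t_sample % period_a_ps
        # Distance from this sampling edge to the nearest data transition.
        distance = min(phase, period_a_ps - phase)
        if distance < window_ps:
            violations += 1
    return violations


if __name__ == "__main__":
    # Assumed example: 100 MHz transmit clock, ~137 MHz receive clock.
    hits = count_setup_hold_violations(10_000, 7_300, 150, 1_000)
    print(f"{hits} of 1000 sampling edges violate the setup/hold window")
```

Because the two periods are unrelated, the sampling phase sweeps across the whole data period, so some fraction of edges always lands in the window no matter how the clocks start out.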
Aircraft CEH: Where can CDC issues occur?
■ Any time 2 or more systems (or parts within a system) are run by unique clocks (e.g. PLLs)
■ Could be in ANY system
■ Obvious Examples
  - ARINC 818
    ■ Digital Video Bus, based on Fibre Channel
  - ARINC 664 (AFDX)
    ■ Ethernet Data Bus
  - Why these?
    ■ Both “recover” the clock from the incoming serial bitstream
[Figure: Incoming Serial Data entering a SERDES]
Aircraft CEH: Where can CDC issues occur?
■ ARINC 818
  - Digital Video Bus, based on Fibre Channel
■ ARINC 664 (AFDX)
  - Ethernet Data Bus
■ Recovered Clocks
  - Technology is widely used and understood,
  - But…the design needs to handle the clock…
    ■ Best solution: Transfer data to a more stable clock domain
[Figure: Incoming Serial Data entering a SERDES (Serializer / Deserializer); annotations on the recovered clock: “Clock period and duty cycle can vary” and “Clock can shut off”]
Aircraft CEH: Where can these issues occur?
[Figure: Serializer / Deserializer (a.k.a. SERDES)]
Agenda
■ Digital Clocks, Registers, Metastability Explained
■ Why are Clock Domain Crossings Dangerous?
■ Where are CDCs likely on today’s CEH?
■ Mitigating the Clock Domain Crossing issue
■ Verifying the mitigation was performed correctly
■ Recommendations
Mitigating Clock Domain Crossing Issues
■ Problem:
  - Signals crossing a clock domain will violate set-up/hold
  - Impact: Control/data signals will be dropped/corrupted
    ■ Loss of Data
■ Approaches:
  - Avoid having systems that have multiple clocks
  - Only allow one clock, one edge
  - Only one problem…
These systems ALREADY have multiple clocks…
Mitigating Clock Domain Crossing Issues
Problem:
  - Signals crossing a clock domain will violate set-up/hold
  - Impact: Control/data signals will be dropped/corrupted
    ■ Loss of Data
Approaches:
  - Avoid having systems that have multiple clocks
    ■ So, although sensible, it’s becoming impossible
  - Design around the problem
    ■ Designer can add “synchronizers” to the design
    ■ Metastability still happens, but nobody else sees it
      - E.g. 2DFF, FIFO, etc.
      - “Fences in” metastability
Isolate Metastability: Synchronizers
■ Designers add synchronizers to reduce the probability that a metastable value propagates into the receiving logic
■ Synchronizers are sub-circuits that can prevent metastable values from being sampled across clock domains
  - Take unpredictable metastable signals and create predictable behavior
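A minimal behavioral sketch of the two-flop (2DFF) synchronizer mentioned earlier may help here. It is a simplified model, not a circuit description: the class and function names are hypothetical, and a metastable first flop is modeled as resolving either to the new value or to the old one.

```python
class TwoFlopSynchronizer:
    """Two back-to-back registers clocked in the receiving domain."""

    def __init__(self):
        self.ff1 = 0  # first stage: may go metastable
        self.ff2 = 0  # second stage: what downstream logic sees

    def clock(self, d, captured=True):
        """Advance one receive-clock edge and return the synchronized output.

        captured=False models a setup/hold violation in which the first
        flop resolved back to its old value instead of taking d.
        """
        self.ff2 = self.ff1  # ff1 has had a full cycle to settle
        if captured:
            self.ff1 = d
        return self.ff2


def crossing_latency(first_edge_captures):
    """Receive-clock edges until a 0->1 input change reaches the output."""
    sync = TwoFlopSynchronizer()
    edges = 0
    captured = first_edge_captures
    while True:
        edges += 1
        out = sync.clock(1, captured)
        captured = True  # on later edges the input is stable
        if out == 1:
            return edges


if __name__ == "__main__":
    print("latency if first edge captures:", crossing_latency(True))
    print("latency if first edge misses:  ", crossing_latency(False))
```

In this model the output is always a clean 0 or 1, but the crossing takes either two or three receive-clock edges depending on how the first flop resolved. Downstream logic has to tolerate that one-cycle uncertainty.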
Mitigating Clock Domain Crossing Issues: Isolate Metastability: Synchronizers
[Figure: timing diagram of a Tx signal in Clock A crossing to Rx in Clock B over cycles i-1 through i+3, with the metastability window marked]
When metastability occurs, the delay through a synchronizer becomes unpredictable
Synchronizer Delays Can Reconverge with Unexpected Results
■ CDC signals cross with an assumed relationship
■ Can be combinational, sequential, or deeply sequential
■ Unpredictable delays on CDC paths lead to reconvergence errors
  - Designs need logic to correctly handle reconvergence
  - Can occur on single-bit or multiple-bit signals
[Figure: FSM input states S1 to S4; outputs tx_d0, tx_d1, tx_d2 cross through Sync 0, Sync 1, Sync 2. Without encoding, the receiver can see an “Invalid Command”. With a Gray Encoder before the synchronizers and a Gray Decoder after them, it sees a “Valid Command – but delayed”.]
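The reconvergence hazard on a multi-bit bus can be enumerated directly. The sketch below assumes each bit passes through its own synchronizer and may independently arrive one cycle early or late, so in the transition cycle every bit shows either its old or its new value. The helper names are hypothetical.

```python
from itertools import product


def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def possible_samples(old, new, width):
    """All values the receiver may see while the bus changes old -> new.

    Each bit is taken independently from either the old or the new word,
    modeling per-bit synchronizers with unpredictable one-cycle skew.
    """
    seen = set()
    for picks in product((0, 1), repeat=width):
        value = 0
        for bit, pick in enumerate(picks):
            source = new if pick else old
            value |= source & (1 << bit)
        seen.add(value)
    return seen


if __name__ == "__main__":
    # Plain binary count 3 -> 4 flips all three bits.
    print("binary:", sorted(possible_samples(3, 4, 3)))
    # Gray-coded count 3 -> 4 flips exactly one bit.
    print("gray:  ", sorted(possible_samples(to_gray(3), to_gray(4), 3)))
```

With plain binary the receiver can momentarily observe any of the eight codes, including ones that were never sent. With Gray coding only the old and the new code are possible, which matches the slide's "valid command, but delayed" case.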
And, Synchronizers Fail if Misused
■ Synchronization between clock domains requires a transfer protocol
  - Ensures data is predictably transferred between domains
■ These protocols must be verified
■ When protocol is violated
  - Data is lost
  - Simulation may not show a failure
  - Silicon will eventually show a functional error
Synchronizer won’t function properly if the required Transfer Protocol is violated
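One simple transfer protocol is a stability requirement: the transmitted signal must hold its value long enough for the receiving clock to sample it. The sketch below checks a list of transition times against such a rule. The function name is hypothetical, and the default of 1.5 receive-clock periods is an assumption based on a commonly quoted rule of thumb, not a figure from the talk.

```python
def find_protocol_violations(change_times_ns, rx_period_ns, min_stable_cycles=1.5):
    """Return (earlier, later) pairs of transitions that are too close together.

    change_times_ns: sorted times at which the CDC signal changes value.
    A violation means the signal changed again before it had been stable
    for min_stable_cycles receive-clock periods.
    """
    min_stable_ns = min_stable_cycles * rx_period_ns
    violations = []
    for earlier, later in zip(change_times_ns, change_times_ns[1:]):
        if later - earlier < min_stable_ns:
            violations.append((earlier, later))
    return violations


if __name__ == "__main__":
    # Assumed trace: a 5 ns pulse at t=40 ns crossing into a 10 ns clock domain.
    print(find_protocol_violations([0, 40, 45, 100], rx_period_ns=10))
```

A pulse that short can fall entirely between two receive-clock edges, so a correctly built synchronizer still drops it. That is the "data is lost" case above.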
Must Verify All Three CDC Problems
Clock domain crossings need:
  - Structured synchronization
  - Transfer protocols
  - Global reconvergence checking
[Figure: example circuit annotated with “Missing sync problem”, “Possible protocol problem”, and “Reconvergence problem”]
Mitigating Clock Domain Crossing Issues
■ Problem:
  - Signals crossing a clock domain will violate set-up/hold
  - Impact: Control/data signals will be dropped/corrupted
■ Approaches:
  - Avoid having systems that have multiple clocks
  - Designer can add “synchronizers” to the design
  - Designer-added synchronizers + full CDC verification
    ■ Assures synchronizers are present and used correctly
Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for both clock domains
3. When multi-clock design is required, plan for proper verification
   1. How do we accomplish this?
Agenda
■ Digital Clocks, Registers, Metastability Explained
■ Why are Clock Domain Crossings Dangerous?
■ Where are CDCs likely on today’s CEH?
■ Mitigating the Clock Domain Crossing issue
■ Verifying the mitigation was performed correctly
■ Recommendations
Verifying CDC Synchronization
■ Problem:
  - Missing synchronizers will create metastability
  - Correctly placed but misused synchronizers won’t work
  - Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
  - Simulation
    ■ Digital logic simulators do NOT model transistor behavior
    ■ Do not model “metastability”
For example …
[Figure: two timing diagrams (CLK, D, Q) comparing Q in simulation with Q in silicon]
Setup Violation: Simulation captures a ‘1’ while silicon produces either a ‘1’ or ‘0’
Hold Violation: Simulation captures a ‘0’ while silicon produces either a ‘1’ or ‘0’
Simulation Does NOT Reflect Silicon Behavior
Verifying CDC Synchronization
■ Problem:
  - Missing synchronizers will create metastability
  - Correctly placed but misused synchronizers won’t work
  - Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
  - Simulation
    ■ Won’t model CDCs correctly to detect errors
  - Static Timing Analysis
    ■ Can be used to identify signals that cross domains
    ■ Can be used as input for a manual review
    ■ But…Won’t detect missing or incorrectly used synchronizers, or reconvergence
Verifying CDC Synchronization
■ Problem:
  - Missing synchronizers will create metastability
  - Correctly placed but misused synchronizers won’t work
  - Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
  - Simulation
    ■ Won’t model CDCs correctly to detect errors
  - Static Timing Analysis
    ■ Identifies signals for manual review, but otherwise useless
  - Manual Design Reviews
    ■ Error prone (and very time consuming)
    ■ Typically only identifies synchronizer structures, misses reconvergence and invalid sync protocol usage
    ■ Evidence suggests at least some synchronizers will be missed
For Example… Trivial Reconvergence Error
■ Reconverging synchronized CDC signals: timing is unpredictable.
■ Need to verify the downstream logic can handle variations
  - Manually identifying the reconvergence is very hard
  - Manually identifying all possible behaviors is harder
  - Manually assuring logic will behave correctly is typically intractable
Verifying CDC Synchronization
■ Problem:
  - Missing synchronizers will create metastability
  - Correctly placed but misused synchronizers won’t work
  - Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
  - Simulation: Won’t model CDCs correctly to detect errors
  - Timing Analysis: Identifies signals for review, but otherwise useless
  - Manual Design Reviews: error prone, incomplete
  - Lab Verification?
    ■ Problem is intermittent, debug is impossible
  - SPICE simulation? It *does* model transistors, but…
    ■ Where will you get the “SPICE deck”? (transistor level model)
    ■ Would be far too slow on a large FPGA
Verifying CDC Synchronization
■ Problem:
  - Missing synchronizers will create metastability
  - Correctly placed but misused synchronizers won’t work
  - Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
  - So, we need a new method that reliably:
    ■ Identifies ALL CDC signals, structures, reconvergence
    ■ Assures ALL connected, functioning correctly
    ■ Creates reports for manual reviews
■ The EDA industry has responded
  - 6 commercial tools now available…and counting
  - But…most won’t identify all 3 of our CDC issues
CDC Verification Technology
Caveat: To the best of my knowledge…
Tools compared: Mentor 0-In CDC, Cadence Conformal, Synopsys Leda, Atrenta Spyglass-CDC, Real Intent CIV, Aldec CDC
Criteria: Structural Verification, Protocols Verification, Reconvergence Verification, DO-254 Robustness
[Table: each tool is rated per criterion with a graphical marker for Minimal Support, Moderate Support, or Strong Support; five of the DO-254 Robustness entries are shown as “?”]
Mentor’s CDC Verification Technology: Who’s using our technology?
■ DO-254
  - Honeywell, Inc.
  - L-3 Communications
  - Lockheed Martin Co
  - Ministry of Aerospace & Aeronautics
  - Northrop Grumman Corp
  - Raytheon
  - Rockwell Collins Inc.
  - SAAB Group
  - Thales
■ Commercial
  - Widely used in commercial space
■ The market leader in CDC verification
Example Value from One Customer
■ Design
  - IEEE standard serial communications core
  - Used in 50-60 other COMMERCIAL ASIC products
  - Widely deployed (millions in use daily)
■ Placed core in a sensor guidance system
  - Found issues in the lab
  - Debugged FPGA for weeks
  - Suspected a CDC issue, but not sure…
■ Deployed Mentor’s CDC solution
  - Results same day
  - Found 199 serious CDC bugs!
    ■ 45 Missing Synchronizers
    ■ 83 Incorrect Synchronizers
    ■ 76 Reconverging Signals
    ■ 11 other problems
  - Most resulting from “more stressful” usage
■ In production:
  - Commercial ASIC: Customer issue (device is erratic, locks up)
  - Avionics: Could result in an Airworthiness Directive
Summary: Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification
During verification
1. Watch for multiple clocks in designs (Tip: Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)
Utilize commercial tools designed for detecting these problems
1. Verify all 3 classes of CDC problems
   1. Structural Verification
   2. Protocols Verification
   3. Reconvergence Verification
2. Use reports to aid manual reviews
3. Use CDC tools to support ROBUSTNESS
In Conclusion …
■ Every multi-clock design is subject to metastability
■ ARINC 664, 818 standards require multi-clock designs
■ Traditional verification methodologies CANNOT assure robustness
■ To properly mitigate the dangers of CDC, we strongly recommend a solution that:
  - Supports Manual Reviews
  - Automatically reports all sources of CDC problems
  - Has a proven CDC verification methodology & customer success
Further Learning
■ Visit our web site: www.mentor.com/go/do-254
■ Has numerous resources & papers including:
— “Automating Clock-Domain Crossing Verification for DO-254 (and other Safety-Critical) Designs”
— “Achieving Quality and Traceability in FPGA/ASIC Flows for DO-254 Aviation Projects”
— “The Use of Advanced Verification Methods to Address DO-254 Design Assurance”
— “Effective Functional Verification Methodologies for DO-254 Level A/B and Other Safety-Critical Devices”
— “Assessing the ModelSim Tool for Use in DO-254 and ED-80 Projects”
— “DO-254 Compliant Design and Verification with VHDL-AMS”
David Landoll
Applications Architect
Mentor Graphics
[email protected]
www.mentor.com/go/do-254
Appendix
The 0-In CDC Verification Solution
Multi-clock designs often exhibit design flaws that simulation alone can’t find!
■ Structural CDC analysis
  - Automatically recognizes a large set of synchronizers
  - Comprehensive modal analysis
■ Protocol verification
  - Automatic generation of CDC protocol assertions
  - These can either be proven or verified through simulation
■ Reconvergence verification
  - Structural analysis to identify potential reconvergence issues
  - Metastability simulation to verify the design correctly handles reconvergence
■ Tuned for capacity, debug, and ease of use
0-In CDC: The only complete RTL-level CDC solution
Background: What’s a “SERDES” (and why do I care?)
[Figure: the Avionics Computer Clock drives a “Transmit Clock” that shifts out the Outgoing Signal]
Transmission: pretty straightforward
• Take the data, bit by bit, and shift it out fast enough
• Done by a “serializer”
Background: What’s a “SERDES” (and why do I care?)
[Figure: Incoming Signal bitstream 00111110101100000101 sampled by the “Receive PLL Clock” to derive the “Receive Clock”]
Reception: This is where things get clever
• Run a really fast clock and try to identify the characters coming in
• “Lock” the clock (Recovered Clock) on whatever shift allowed you to start seeing characters
• Declare yourself “synced” after recognizing n characters without errors
• This is done by the “Deserializer”
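The alignment and sync steps described above can be sketched in a few lines. The slide does not name the line code; this sketch assumes 8b/10b coding as used by Fibre Channel, where the two K28.5 comma encodings (0011111010 and 1100000101) appear to match the bitstream shown on the slide. Function names are hypothetical, and only comma characters are treated as valid, to keep the example short.

```python
# Assumed 10-bit alignment characters: the two encodings of 8b/10b K28.5.
K28_5_COMMAS = ("0011111010", "1100000101")


def find_alignment(bits):
    """Return the first bit offset where a comma character starts, else None."""
    for offset in range(len(bits) - 9):
        if bits[offset:offset + 10] in K28_5_COMMAS:
            return offset
    return None


def is_synced(bits, n_required, valid=K28_5_COMMAS):
    """True if n_required consecutive valid characters follow the alignment."""
    offset = find_alignment(bits)
    if offset is None:
        return False  # garbage in: no alignment, hence no usable recovered clock
    chars = [bits[i:i + 10] for i in range(offset, len(bits) - 9, 10)]
    return len(chars) >= n_required and all(c in valid for c in chars[:n_required])


if __name__ == "__main__":
    stream = "00111110101100000101"  # bit pattern from the slide
    print(find_alignment("101" + stream))  # alignment after 3 stray bits
    print(is_synced(stream, 2))
    print(is_synced("0" * 20, 1))  # transmitter sending garbage
```

The last case is the one that matters for CDC: when no alignment is found there is no trustworthy recovered clock, so logic clocked from it cannot be assumed to run.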
Background: What’s a “SERDES” (and why do I care?)
Reception: So, what’s the problem?
• If the transmitter sends garbage, you don’t get a “recovered clock”…
• Running a digital design when the clock turns off doesn’t work very well…
• The solution is to “synchronize” the incoming data to another stable clock domain ASAP