SUBMITTED FOR PUBLICATION TO: , FEBRUARY 14, 2006
A Stimulus-free Probabilistic Model for Single-Event-Upset Sensitivity
Abstract
With shrinking device sizes and rapidly rising operating frequencies, the effect of cosmic radiation
and alpha particles, known as Single-Event-Upsets (SEUs) or Single-Event-Transients (SETs), is a
growing concern in logic circuits. Accurate understanding and estimation of the Single-Event-Upset
sensitivities of individual nodes is necessary to achieve better soft-error hardening techniques at
the logic level of design abstraction. We propose a probabilistic framework to study the effect of
inputs, circuit structure and gate delays on the Single-Event-Upset sensitivities of nodes in logic
circuits as a single joint probability distribution function (pdf). To model the effect of timing, we
consider signals at their possible arrival times as the random variables of interest. The underlying
joint probability distribution function consists of two components: ideal random variables without
the effect of SEUs, and random variables affected by the SEU. We use a Bayesian Network, a minimal,
compact directed graph for efficient probabilistic modeling of uncertainty, to represent the joint
pdf. The attractive feature of this model is that it not only uses conditional independence to arrive
at a sparse structure, but also exploits it for smart probabilistic inference. We show that exact
inference (exponential complexity) and approximate, non-simulative, stimulus-free inference (linear
in the number of nodes and samples) on benchmark circuits yield accurate estimates in reasonably
small computation time.
I. INTRODUCTION
High-energy neutrons present in cosmic radiation and alpha particles from packaging materials
give rise to single event upsets (SEUs), resulting in soft errors in logic circuits. When particles
hit a semiconductor material, electron-hole pairs are generated, which may be collected by a P-N
junction, resulting in a short current pulse that causes a logic upset, or Single Event Upset (SEU),
in the signal value. An SEU may occur in an internal node of a combinational circuit and propagate
to an output latch. When a latch captures the SEU, it may cause a bit flip, which can alter the state
of the system, resulting in a soft error. In current technology, soft errors are of serious concern
in memories, whereas in logic circuits the soft error rate is comparatively low due to logical,
electrical and temporal masking effects. However, as process technology scales below 100 nanometers
and operating frequencies increase, these masking barriers diminish due to low supply voltages,
shrinking device geometry and small noise margins. This will result in unacceptable soft error
failure rates in logic circuits even for mainstream applications [1].
The soft error susceptibility of a node $j$ with respect to a latch $L$, $SES_{j\_Q_L}$, is the soft
error rate at the latch output $Q_L$ contributed by node $j$. The propagation of an SEU generated by
a particle hit at an internal node $j$ to an output $i$, which causes a bit flip at the output of a
latch $L$, is depicted in Fig. 1.
We model the SEU propagation as follows. Let $T_{j\_i}$ be a Boolean variable which takes logic
value 1 if an SEU at a node $j$ causes an error at an output node $i$. Then $P(T_{j\_i})$ (measured
as the probability of $T_{j\_i}$ being equal to 1) is the conditional probability of occurrence of an
error at output node $i$ given an SEU at node $j$. Let $P(SEU_j)$ be the probability that a particle
hit at node $j$ generates an SEU of sufficient strength, and let $P(Q_L \mid T_{j\_i})$ be the
probability that an error at output node $i$ causes an erroneous signal at latch output $Q_L$.
Mathematically, $SES_{j\_Q_L}$ is expressed by Eq. 1.
$$SES_{j\_Q_L} = R_H \, P(SEU_j) \, P(T_{j\_i}) \, P(Q_L \mid T_{j\_i}) \qquad (1)$$
where $R_H$ is the particle hit rate on a chip, which is fairly uniform in space and time.
$P(SEU_j)$ depends on $V_{dd}$, $V_{th}$ and also on temperature. $P(Q_L \mid T_{j\_i})$ is a
function of latch characteristics and the switching frequency.
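As a quick numerical illustration of Eq. 1, the sketch below simply multiplies the four factors. The function name and all numeric values are hypothetical placeholders chosen for illustration; they are not data from this work.

```python
def soft_error_susceptibility(r_h, p_seu, p_t, p_latch):
    """Eq. 1: SES = R_H * P(SEU_j) * P(T_j_i) * P(Q_L | T_j_i)."""
    for p in (p_seu, p_t, p_latch):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return r_h * p_seu * p_t * p_latch

# Hypothetical values, for illustration only.
ses = soft_error_susceptibility(r_h=1e-3, p_seu=1.0, p_t=0.75, p_latch=0.2)
print(ses)
```

Because the factors multiply, a node with a small sensitization probability contributes little to the latch error rate even when the hit rate is high, which is why the rest of the paper concentrates on the $P(T_{j\_i})$ term.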
In this work, we explore $P(T_{j\_i})$ by accurately considering the effect of (1) SEU duration,
(2) gate delay, (3) timing, (4) re-convergence in the circuit structure and, most importantly,
(5) inputs. Several works on soft error analysis estimate the overall output signal errors due to
SEUs at the internal nodes [8], [9], [13], [15], [16]. Note that our focus is to identify the SEU
locations that cause soft errors at the output(s) with high probabilities and not on the overall
soft error rates. Knowledge of the relative contribution of individual nodes to output error will
help designers to apply selective radiation hardening techniques. This model can easily be fused
with the modeling of the latches [15], [17], considering parameters such as latching window, setup
and hold time, $V_{th}$ and $V_{dd}$ [8], [9], [15], for a comprehensive model capturing processing,
electrical and logical effects.
Fig. 1. SEU Propagation. [Figure: a particle hit (rate $R_H$) at node $j$ of the combinational logic
produces an SEU of width $\delta_j$ with probability $P(SEU_j)$; the SEU propagates to the $i$th
output with probability $P(T_{j\_i})$ and causes a bit flip at latch output $Q_L$ with probability
$P(Q_L \mid T_{j\_i})$; $d$ is the delay associated with gates.]
We model the internal dependency of the signals, taking timing issues into consideration, so that
the SEU sensitization probability $P(T_{j\_i})$ captures the effect of circuit structure, circuit
path delay and also the input space. We use a circuit expansion algorithm similar to that presented
in [5], [21] to embed time-related information in the circuit topology without affecting its original
functionality. A fan-out dependent delay model is assumed, where the gate delay of each node is equal
to its fan-out. We also use a logical effort based delay model, where gate delays depend not only on
fan-out but also on input capacitance and parasitic capacitance. Due to the temporal nature of SEUs,
not all SEUs cause soft errors. From the expanded circuit, we generate a list of SEUs (the possible
SEU list) that are possibly sensitized to the circuit outputs at the time frame when output signals
are latched. From the expanded circuit and the possible SEU list, we construct an error detection
circuit and model SEUs in large combinational circuits using a Timing-aware Logic-induced Soft Error
Sensitivity model (TALI-SES), which is a complete joint probability model, represented as a
Bayesian Network.
Fig. 2. Recent Works on SEU Modeling. [Figure: chart titled "SEU Modeling" with the labels $R_H$,
$P(SEU_j)$, $P(T_{j\_i})$, $P(Q_L \mid T_{j\_i})$, Simulative, Probabilistic and Other, citing
[1], [2], [4], [5], [8]-[13], [15]-[18], [23] and this work.]
Bayesian Networks are causal graphical probabilistic models representing the joint probability
function over a set of random variables. A Bayesian Network is a directed acyclic graph (DAG) whose
nodes describe random variables and whose node-to-node arcs denote direct causal dependencies. A
directed link captures the direct cause-and-effect relationship between two random variables. Each
node is quantified by the conditional probability of the states of that node given the states of its
parents, or its direct causes. The attractive feature of this graphical representation of the joint
probability distribution is that it not only makes conditional dependency relationships among the
nodes explicit, but also serves as a computational mechanism for efficient probabilistic updating.
Bayesian networks have traditionally been used in medical diagnosis, artificial intelligence and
image analysis, and, in VLSI specifically, in switching models [3] and single stuck-at-fault/error
models [6], but their use in timing-aware modeling of Single-Event-Upsets is new. We first explore
an exact inference scheme, also known as the clustering technique [19], where the original DAG is
transformed into a special tree of cliques such that the total message passing between cliques
updates the overall probability of the system. We then explore a stochastic inference strategy,
named Probabilistic Logic Sampling (PLS) [24], where a full instantiation of the probabilistic
network is collected based on a simplified importance function. The sampling is stopped when the
probabilities of the nodes converge. It is worth pointing out that, unlike simulative approaches
that sample the inputs, importance sampling based procedures generate instantiations for the whole
network, not just for the inputs. These samples can be looked upon as Markov Chain sampling of the
circuit state space.
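To make the idea of sampling the whole network concrete, the following is a minimal forward-sampling sketch in the spirit of logic sampling. The three-node network (two root inputs feeding one NAND gate), the priors and the fixed sample count are hypothetical; the importance function and the convergence-based stopping rule mentioned above are not reproduced.

```python
import random

def nand(a, b):
    return 1 - (a & b)

def sample_network(rng):
    """Draw one full instantiation in topological order (roots first)."""
    x1 = int(rng.random() < 0.5)   # hypothetical prior P(x1 = 1) = 0.5
    x2 = int(rng.random() < 0.5)   # hypothetical prior P(x2 = 1) = 0.5
    y = nand(x1, x2)               # deterministic CPT of a NAND gate
    return {"x1": x1, "x2": x2, "y": y}

def estimate(node, n_samples, seed=0):
    """Estimate P(node = 1) as the fraction of samples in which node is 1."""
    rng = random.Random(seed)
    hits = sum(sample_network(rng)[node] for _ in range(n_samples))
    return hits / n_samples

p_y = estimate("y", 20000)
print(p_y)  # close to the exact value 0.75
```

Every sample assigns a value to every node, so marginals of internal nodes and outputs are obtained from the same set of samples without a separate simulation run per node.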
The remainder of this paper is organized as follows. Section II is a summary of prior work on soft
error modeling and analysis. In Section III, we give an outline of our modeling. We discuss our
model in detail in Section IV, explaining the timing issues, features of Bayesian network-based
modeling, and the proposed TALI-SES model, which can be used to estimate the SEU sensitivities of
individual nodes. This is followed by Section V on Bayesian inference, where we discuss both exact
and approximate (stochastic) inference schemes. In Section VI we give experimental results using
both exact and stochastic inference. Using exact inference, we can characterize the input space to
achieve zero output error even in the presence of some of the SEUs. Exact inference works well for
small circuits. To handle larger circuits, we use a stochastic inference scheme; comparing our
results with logic simulation results, we find that our modeling is accurate (close-to-zero error)
and efficient.
II. BACKGROUND
Figure 2 gives a list of recent works on soft error analysis. An estimation method for soft error
failure rates resulting from Single Event Upsets proposed in [1] computes the soft error
susceptibility of a node based on the rate at which a Single Event Upset (SEU) occurs at the node
($R_{SEU}$), the probability that it is sensitized to an output ($P_{sensitized}$) and the
probability that it is captured by a latch ($P_{latched}$). In [2], Alexandrescu et al. present an
SET fault simulation technique to evaluate the soft error probability caused by transient pulses. A
model that captures the effects of technology trends on Soft Error failure Rates (SER), considering
different types of masking phenomena such as electrical masking, latching window masking and logical
masking, is presented in [4]. Another model, which analyzes Single Event Upsets with zero-delay
logic simulation and is accurate and faster than timing simulators, is presented in [5]. As
discussed in the previous section, this model uses a circuit expansion algorithm to incorporate gate
delays and a fault list generation algorithm to get a reduced list of SETs. All of the above methods
use simulation techniques, which are highly input pattern dependent.
Zhao et al. propose a methodology to evaluate the softness, or vulnerability, of nodes in a circuit
due to compound noise effects by considering the effects of electrical, logical and timing
masking [13]. They use a probabilistic approach to estimate the effect of logical masking. However,
this method cannot be used for circuits with re-convergent paths and will not be able to handle
larger circuits. Also, this method does not capture the effect of gate delays.
A selective triple modular redundancy (STMR) technique for achieving radiation tolerance in
FPGA designs is discussed in [12]. A mathematical model to estimate the possible propagation of
glitches due to transient faults has been presented in [23]. However, results are shown only for a
very small circuit.
Karnik et al. suggest that soft error rate should be considered as a design parameter along
with power, performance and area due to its increasing impact on circuits and systems with the
scaling of process technology [8]. They propose a methodology to quantify the impact of supply
voltage, transistor size, circuit topology, doping as well as circuit structure on the Soft Error
Rate (SER) of a chip. The effect of threshold voltage on the SER of memories and combinational logic
has been studied in [9]. Zhang et al. [15] proposed a composite soft error rate analysis method
(SERA) to capture the effect of supply voltage, clock period, latching window, logic depth, circuit
topology and input vector on soft error rate. However, they resort to logic and circuit level
simulation to capture these probabilities. This method uses a conditional probability based
parameter extraction technique obtained from device and logic simulation. In their work,
combinational circuits are assumed to have unbalanced re-convergent paths. However, other design
considerations usually drive optimal circuit design to have balanced paths by adding buffers
wherever re-convergence is necessary. For circuits with balanced paths, soft error analysis based on
the approximations given in [15] might not be the best choice.
Seifert et al. discuss the importance of latch design on the soft error rate (SER) of core
logic [16]. They also analyze the impact of technology scaling on SER at the device, circuit and
chip levels. The relation between soft error rate and technology feature size, based on device level
simulation, has been studied in [18].
Since all the state-of-the-art techniques have resorted to simulation for logical and device
level effects (known to be expensive and pattern-sensitive, especially for low probability events),
we felt the need to explore the input data-driven uncertainty in a comprehensive manner through a
probabilistic model that captures the effect of primary inputs, gate delay and SEU duration on
logical masking. There is future scope for these kinds of models to be fused with other models [8],
[9], [15], [16], [18] for capturing device effects such as electrical masking, threshold voltage and
supply voltage.
III. OVERVIEW
We model the effect of single event upsets produced at an internal node of a circuit on the
circuit output by computing the joint probability distribution described by Eq. 2.
$$P(T_{j\_i}) = \sum_{\{I_l\},\, X_k,\, k \neq j} P(T_{j\_i}, I_0, \ldots, I_l, \ldots, I_N, X_1, \ldots, X_k, \ldots, X_M) \qquad (2)$$
where $P(T_{j\_i})$ is the probability that an SEU generated at an internal node $j$ causes an
erroneous signal at output $i$. $T_{j\_i}$ is a test signal which compares the ideal output signal
at the $i$th output with the corresponding error-sensitized output caused by an SEU at the $j$th
node. $T_{j\_i} = 1$ indicates the occurrence of an error at output $i$ due to an SEU at $j$.
$P(T_{j\_i})$ depends on the $N$ input signals $I_0, \ldots, I_N$, the $M$ internal signals
$X_1, \ldots, X_M$, and the type of SEU at $j$ ($SEU_1$, caused by a 0-1-0 transition, or $SEU_0$,
caused by a 1-0-1 transition). Ideally, the real effect of an SEU at the $i$th output is the product
of $P(T_{j\_i})$ and $P(SEU_j)$, where $P(SEU_j)$ is the probability that a particle hit at a node
$j$ produces an SEU at that node; it depends on process parameters such as $V_{dd}$ and $V_{th}$,
and also dynamically on temperature. With reduced supply voltages and diminishing dimensions, this
probability will be very close to one. In this work, we assume that a particle hit occurring at a
node generates an SEU and hence $P(SEU_j) = 1$. In Eq. 2, the probability $P(T_{j\_i})$ does not
consider the transient nature of the SEU. For example, the SEU effect may reach the output for a
short time span, but the output signal can be restored to its correct value before it is sampled by
the latch. SEU propagation depends on the gate delays and the SEU duration. Let $t_h$ be the time
when an SEU originates at a node, $\delta$ be the SEU duration, $t_s$ be the time when outputs are
sampled and $\Pi$ be the set of propagation delays ($t_d$) of sensitized paths from the node to the
circuit outputs. Nodes satisfying the following condition do not cause soft errors [5]:
$$t_h + \delta + t_d < t_s \quad \forall\, t_d \in \Pi \qquad (3)$$
Even though the above empirical formula does not take into account the setup and hold time
requirements which affect latching window masking, we use it in our modeling because it is accurate
as far as the logical masking effect, circuit structure and gate delays are concerned.
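The masking condition of Eq. 3 can be sketched as a one-line predicate. The function name and the numbers below are hypothetical and serve only to show the two outcomes of the inequality.

```python
def is_temporally_masked(t_h, delta, path_delays, t_s):
    """Eq. 3: masked if t_h + delta + t_d < t_s for every sensitized path delay t_d."""
    return all(t_h + delta + t_d < t_s for t_d in path_delays)

# Hypothetical numbers: outputs sampled at t_s = 6, SEU width delta = 1,
# one sensitized path with delay t_d = 1.
early = is_temporally_masked(t_h=2, delta=1, path_delays=[1], t_s=6)
late = is_temporally_masked(t_h=4, delta=1, path_delays=[1], t_s=6)
print(early, late)  # True False
```

In the first call the upset dies out before the sampling instant (2 + 1 + 1 < 6); in the second it is still present at the output when it is sampled (4 + 1 + 1 = 6), so it remains a candidate for a soft error.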
To capture the effect of gate delays and SEU duration, we perform a time-space transformation of
the original circuit by means of a circuit expansion algorithm similar to that presented in [5].
Our model captures not only the effect of gate delays, but also the effect of differences in path
delays (arrival times) between the input signals of gates, assuming the gate delay is equal to its
fan-out. In the expanded circuit, each gate is replicated several times, corresponding to the time
instants at which the gate output is evaluated. The circuit outputs are also replicated.
Thus each of the random variables in Eq. 2 represents a set of variables at different time frames.
$I_i = \{I_{i,0}, I_{i,1}\}$, where $I_{i,0}$ is the value of input signal $I_i$ at the time instant
just before the start of a clock cycle and $I_{i,1}$ is the new input signal after the clock pulse
is applied. Signal $I_{i,1}$ remains the same throughout the clock cycle.
$X_i = \{X_{i,t_k}\}\ \forall\, t_k$, where $t_k$ is a signal evaluation time.
Only the final output values, that is, the output signals arriving at the latching window, are
captured by the latch. An SEU has no effect if it does not cause a bit flip in the final outputs. We
arrive at a reduced SEU list by considering only those SEUs whose effects reach the final outputs,
i.e., the outputs at the sampling time frames. We also modify the expanded circuit by removing parts
of the circuit which do not generate and propagate soft-error-causing SEUs (discussed in
Section IV A).
TALI-SES is a Directed Acyclic Graph that we build from the expanded circuit and the reduced
SEU list to capture the effect of each SEU at a node on the output. This model consists of the ideal
time-space transformed circuit without any SEUs and a set of duplicate logic blocks to propagate
the SEU effects. Outputs from the SEU-sensitized duplicate blocks are compared with the
corresponding outputs of the ideal circuit. If those signal values are not the same, the SEU causes
an error at the output. We discuss TALI-SES construction in Section IV D. The salient features of
modeling SEUs by a Bayesian Network are as follows.
1. We provide a comprehensive model for the underlying error framework using a graphical
probabilistic Bayesian Network based model, TALI-SES, that is causal, minimal and exact.
2. We can model the effect of timing and the transient nature of SEUs along with accurate
modeling of re-convergence in the circuit.
3. This model captures the data-driven uncertainty in the modeling of soft errors. It can be
used where exact input patterns are not known a priori and, where data traces are available,
a probabilistic model of the inputs can be built by a learning algorithm [31], [32].
4. We infer error probabilities by (1) exact inference, which transforms the graph into a special
junction tree structure and relies on a local message passing scheme, and (2) smart
stochastic non-simulative inference algorithms, which provide any-time estimates
and give an excellent accuracy-time trade-off for larger circuits.
5. Bayesian Networks are a unique tool in which the effect of an observation at a child node can be
used to obtain a probability space of the parents. This is called backward reasoning. Our
model can be used to generate the input space for which an SEU occurring at a particular node
$j$ has no impact on the outputs. Note that in such a case, hardening techniques will
not be needed for node $j$. Similarly, we can find the input space for which an SEU at a node $j$
causes a high error probability at the outputs. If the data trace is similar to the second type of
input space, extensive hardening techniques need to be applied to $j$.
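For intuition about the test signal $T_{j\_i}$ and the input-space characterization in item 5, the sketch below uses brute-force enumeration on c17. It is a zero-delay reference computation under uniformly distributed inputs, not the Bayesian network inference proposed here, and it ignores timing. The netlist is the standard ISCAS-85 c17 (six 2-input NAND gates); the function names are hypothetical.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def c17(x1, x2, x3, x6, x7, flip=None):
    """Evaluate c17; if `flip` names a gate, its output is inverted (the SEU)."""
    def gate(name, a, b):
        out = nand(a, b)
        return out ^ 1 if name == flip else out
    n10 = gate(10, x1, x3)
    n11 = gate(11, x3, x6)
    n16 = gate(16, x2, n11)
    n19 = gate(19, n11, x7)
    return {22: gate(22, n10, n16), 23: gate(23, n16, n19)}

def sensitization(node, output):
    """Return P(T = 1) under uniform inputs and the input patterns with T = 0."""
    patterns = list(product((0, 1), repeat=5))
    masked = []
    for bits in patterns:
        # Test signal T: XOR of the ideal output and the SEU-affected output.
        t = c17(*bits)[output] ^ c17(*bits, flip=node)[output]
        if t == 0:
            masked.append(bits)
    return 1 - len(masked) / len(patterns), masked

p, masked_inputs = sensitization(node=16, output=22)
print(p, len(masked_inputs))  # 0.75 8
```

The eight masked patterns are exactly those with inputs 1 and 3 both at logic 1, which force gate 10 to 0 and block the upset at gate 22. This is the kind of input space that backward reasoning is meant to recover without enumerating all patterns.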
IV. THE PROPOSED MODEL
In this section, we first focus on handling the timing aware feature of our probabilistic model,
followed by the fault list construction. We conclude the section with discussion about the model
itself, given the timing-aware graph and the fault list.
A. Timing Issues
We first expand the circuit by a time-space transformation of the original circuit, without
changing its functionality. The approach is similar to the method discussed in [5], [21]. Fig. 3 is
the expanded circuit of benchmark c17. A gate in the original circuit $C$ will have many replicate
gates in the expanded circuit $C'$, corresponding to the different time frames at which the gate
output is evaluated. The set of output evaluation times $\{T\}$ of each gate in the circuit is
calculated based on a variable delay model. We assume that the delay associated with a gate is equal
to its fan-out. For each gate $g$ whose output is evaluated at time $t \in \{T\}$, a replicate node
$(g, t)$ is constructed. In addition to these replicate gates, we insert some duplicate gates (shown
by filled gate symbols in Fig. 3). We explain the reasons for adding these duplicate gates later in
this section.
Fig. 3. Time-space transformed circuit of benchmark c17, modeling all SEUs. [Figure: replicate and
duplicate gates labeled (gate, time), arranged by time frame from t = 0 to t = 6.]
Fig. 4. Modified time-space transformed circuit of benchmark c17, modeling only the possibly
sensitized SEUs. [Figure: reduced set of gates labeled (gate, time), arranged by time frame from
t = 0 to t = 6.]
The inputs of $(g, t)$ are the replicate nodes of the gates which are the inputs of $g$ in the
original circuit and belong to the time frames $t' < t$. We denote the value of signal $i$ at time
$t$ by $(i, t)$. The random variable that represents the value of a signal $i$ at time $t$ is
denoted by $X_{i,t}$. The circuit outputs reach steady state values, $X_{22,0}$ and $X_{23,0}$, at
$t = 0$, after the application of the previous inputs,
$\{X_{1,0}, X_{2,0}, X_{3,0}, X_{6,0}, X_{7,0}\}$. Let the new inputs
$\{X_{1,1}, X_{2,1}, X_{3,1}, X_{6,1}, X_{7,1}\}$ be applied at $t = 1$. $X_{10,2}$ is the signal
value at the output of gate 10 at time instant 2.
We insert a few duplicate gates (for example, $(10,4)$, $(10,5)$, $(19,5)$, etc., shown by filled
gate symbols) for the following reasons:
• Input signals of certain gates in the circuit might have different arrival times due to the
difference in path delays. In order to model the effect of any SEU generated at the junction
of the gates at time instants later than the signal's arrival time, we insert additional duplicate
nodes for those internal signals with smaller path delay. For example, in Fig. 3, the input signals
to gate 22 have path delays 2 and 5, respectively. The final output signal $(22,6)$ is evaluated
with input signals $(16,5)$ and $(10,5)$. If no SEU originated at the output of gate 10 between
time instants 2 and 5, $(10,2)$ and $(10,5)$ would be the same. However, if an SEU occurs at
node 10 at $t = 5$, $(10,2)$ and $(10,5)$ may differ depending on the inputs, which can cause a
wrong output signal at $(22,6)$. We model the effect of an SEU at $(10,5)$ by introducing a
duplicate gate $(10,5)$ whose inputs are $(1,1)$ and $(3,1)$. Similarly, $(10,3)$, $(10,4)$,
$(19,4)$ and $(19,5)$ are other duplicate gates.
• Duplicate gates also model the masking effect of some of the SEUs generated in the signal
path of the input having the smaller path delay. For example, duplicate gate $(10,5)$ masks the
effect of an SEU originating at the output of gate 10 at time $t = 2$. Thus we can arrive at a
reduced SEU list, which is further explained later in this section.
The steps for constructing the timing-aware expanded circuit, based on the fan-out dependent delay
model, are the following:
1. Arrange the gates in order of levels, with the level of the input gates equal to zero.
2. Include all gates that are present in the original circuit. The output signals of these gates
represent the steady state signal values at $t = 0$, before the application of new inputs.
3. Add additional input nodes representing the new input signal values at $t = 1$.
4. For each level of the circuit, starting from level $l_i = 1$, repeat the following step:
for each gate $g$ in level $l_i$, create replicate gates at time frame $t = t_p + f_g$, where
$t_p$ is the maximum time frame of the previously inserted parent gates of $g$ and $f_g$ is the
fan-out of gate $g$. Update the time frames of gate $g$.
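The time-frame bookkeeping in the steps above can be sketched as follows for c17. This is one possible reading of step 4 and is an assumption: a replica is created for every distinct arrival time of a parent signal, so that the latest frame of each gate equals $t_p + f_g$ with $t_p$ the maximum parent frame. The netlist is the standard ISCAS-85 c17, a primary output is assumed to drive one load, and the duplicate gates discussed above are not generated. All names are hypothetical.

```python
# Standard ISCAS-85 c17 netlist (all gates are 2-input NANDs): gate -> its inputs.
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
PRIMARY_INPUTS = {1, 2, 3, 6, 7}
PRIMARY_OUTPUTS = {22, 23}

def fan_out(netlist, outputs):
    """Fan-out of each gate; a primary output is assumed to drive one load."""
    counts = {g: 0 for g in netlist}
    for ins in netlist.values():
        for src in ins:
            if src in counts:
                counts[src] += 1
    for g in outputs:
        counts[g] += 1
    return counts

def evaluation_times(netlist, inputs, outputs):
    """Time frames at which each gate output is re-evaluated after t = 1."""
    delay = fan_out(netlist, outputs)          # gate delay = fan-out
    times = {i: {1} for i in inputs}           # new inputs are applied at t = 1
    for g in sorted(netlist):                  # c17 gate ids happen to be in level order
        arrivals = set().union(*(times[s] for s in netlist[g]))
        times[g] = {t + delay[g] for t in arrivals}
    return {g: sorted(times[g]) for g in netlist}

frames = evaluation_times(C17, PRIMARY_INPUTS, PRIMARY_OUTPUTS)
t_s = max(max(frames[o]) for o in PRIMARY_OUTPUTS)
print(frames, t_s)
```

Under these assumptions the latest output frame is $t_s = 6$, in line with the final outputs $(22,6)$ and $(23,6)$ discussed in the text.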
Output signals of a circuit are sampled at $t = t_s$, where $t_s$ is the maximum of the latest
signal arrival times of the output signals. SEUs which do not satisfy Eq. 3 affect circuit outputs,
resulting in soft errors. These SEUs are the upsets generated at the outputs of gates which are in
the fan-in cones of the final outputs (outputs evaluated at time $t_s$). SEUs occurring at certain
other gates, which are not in the fan-in cones of the final outputs, may also affect circuit
outputs. These nodes arise due to the SEU duration time $\delta$. For example, in Fig. 3, we see
that the final outputs are generated at time instant $t = 6$. If an SEU occurs at signal 19 at 4 ns
and lasts for one time unit, it will be capable of tampering with the value of node 23 at 6 ns. Note
that we assume that $\delta$ is one time unit. The fault list will be different if we change the
value of $\delta$. Thus we can see that SEUs which are sensitized to outputs at time frames between
$t_s - \delta$ and $t_s$ may cause soft errors, depending on the input signals and circuit
structure.
Considering the above factors, we modify the expanded circuit by including only those gates
that propagate SEUs to the outputs between time instants $t_s - \delta$ and $t_s$. Thus we get a
considerable reduction in the circuit size. Fig. 4 is the modified expanded circuit of c17, which
models all SEUs possibly sensitized to a final output.
Next, we discuss how to generate a list of possible SEUs affecting the circuit outputs. Not all
gates in Fig. 4 are SEU sensitive. As discussed above, a duplicate node introduces an additional
delay of at least one time unit. If the delay introduced by a duplicate gate is greater than or equal
to $\delta$, the SEU duration time, the effect of SEUs originating at any of the gates in the fan-in
cone of the duplicate gate is nullified and the correct signal value is restored at the output of
the duplicate gate; hence those SEUs have no effect. Thus we create a reduced list of SEUs by
traversing the modified expanded circuit from each of the circuit outputs at time instants between
$t_s - \delta$ and $t_s$, until a duplicate gate or an input node is reached.
TABLE I
GATE DELAYS BASED ON LOGICAL EFFORT
Gate Type        Delay
Inverter         $fanout + P_{inv}$
n-input NAND     $\frac{n+2}{3} \cdot fanout + n P_{inv}$
n-input NOR      $\frac{2n+1}{3} \cdot fanout + P_{inv}$
2-input XOR      $4 \cdot fanout + 4 n P_{inv}$
B. Delay Modeling Based on Logical Effort
We extend this work by using a logical effort based delay model, which depends on fan-out, input
capacitance and parasitic delay. In this section we explain how gate delays are calculated
based on logical effort [30]. The delay of a logic gate can be expressed as the sum of two
components, effort delay and parasitic delay. Effort delay is the product of logical effort and
electrical effort, where logical effort is defined as the relative ability of a gate topology to
deliver current and electrical effort is the ratio of output capacitance to input capacitance.
Electrical effort is sometimes called fan-out. Mathematically, gate delay is expressed as
$d = f + p = gh + p$, where $f$ is the effort delay, $p$ is the parasitic delay, $g$ is the logical
effort and $h$ is the electrical effort. Logical effort is defined to be 1 for an inverter. Hence
logical effort is the ratio of the input capacitance of a gate to the input capacitance of an
inverter delivering the same output current. It can be estimated by counting capacitance in units of
transistor width. Parasitic delay represents the delay of a gate driving no load, and it depends on
diffusion capacitance. The parasitic delay of an inverter is $P_{inv} \approx 1$. From the above
considerations, we compute basic CMOS gate delays and use these delay values in our model.
Table I shows the delay expressions for basic gates.
Circuit expansion is performed in a similar way as explained in the above section. Each gate
is replicated several times, corresponding to the time frames at which new gate output signals are
evaluated. Here, the gate output evaluation time is based on delay values calculated as above. This
is illustrated in Fig. 5, which shows how benchmark circuit c17 is expanded with the logical effort
based gate delay model. The delay of a 2-input NAND gate with a fan-out of one is calculated as
3.33 time units, and that of a 2-input NAND gate with a fan-out of two is 4.67 time units. The final
output is evaluated at time $T_s = 13.67$. From this expanded circuit, we arrive at a reduced
circuit by traversing backward from outputs evaluated at $T_s$ and $T_s - \delta$ until a duplicate
gate or an input is reached, thereby modeling only the possibly sensitized SEUs.
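The delay figures quoted above can be reproduced with a short sketch of $d = gh + p$ for the inverter and the n-input NAND gate. It assumes $P_{inv} = 1$ time unit, takes the electrical effort $h$ equal to the fan-out, and assumes that a primary output drives one load; the function names are hypothetical.

```python
P_INV = 1.0  # parasitic delay of an inverter, taken as 1 time unit (assumption)

def inverter_delay(fanout):
    return 1.0 * fanout + P_INV            # g = 1, p = P_inv

def nand_delay(n_inputs, fanout):
    g = (n_inputs + 2) / 3                 # logical effort of an n-input NAND
    p = n_inputs * P_INV                   # parasitic delay
    return g * fanout + p                  # d = g*h + p with h = fan-out

d1 = nand_delay(2, 1)
d2 = nand_delay(2, 2)
# Longest c17 path once the new inputs arrive at t = 1: gate 11 -> 16 -> 22,
# with fan-outs 2, 2 and 1 respectively.
t_s = 1 + d2 + d2 + d1
print(round(d1, 2), round(d2, 2), round(t_s, 2))  # 3.33 4.67 13.67
```

Accumulating the unrounded delays gives 13.666..., which rounds to the quoted sampling time of 13.67.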
We report results obtained from the logical-effort based delay model in a separate section
(Section VI C).
C. Bayesian Networks
A Bayesian network [25] is a Directed Acyclic Graph (DAG) in which the nodes of the network
represent random variables and a set of directed links connects pairs of nodes. The links represent
causal dependencies among the variables. Each node except the root nodes has a conditional
probability table (CPT); each root node has a prior probability table. The CPT quantifies the effect
the parents have on the node. Bayesian networks compute the joint probability distribution over
all the variables in the network, based on the conditional probabilities and the observed evidence
about a set of nodes.
Fig. 6 illustrates a small Bayesian network. The exact joint probability distribution over the
variables in this network is given by Eq. 4.
$$P(x_6, x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4, x_3, x_2, x_1)\, P(x_5 \mid x_4, x_3, x_2, x_1)\, P(x_4 \mid x_3, x_2, x_1)\, P(x_3)\, P(x_2)\, P(x_1) \qquad (4)$$
Fig. 5. Time-space transformed circuit of benchmark c17 with Logical Effort Based Delay Model.
[Figure: gates labeled (gate, time), arranged by time frame from t = 0.0 to t = 13.67.]
Fig. 6. A small Bayesian network. [Figure: DAG over the nodes $X_1$ to $X_6$.]
In this BN, the random variable $X_6$ is independent of $X_1$, $X_2$ and $X_3$ given the states of
its parent nodes, $X_4$ and $X_5$. This conditional independence can be expressed by Eq. 5.
$$P(x_6 \mid x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4) \qquad (5)$$
Mathematically, this is denoted as $I(X_6, \{X_4, X_5\}, \{X_1, X_2, X_3\})$. In general, in a
Bayesian network, given the parents of a node $n$, $n$ is independent of all of its non-descendants
in the network. Let $U$ be the set of all random variables in a network. Using conditional
independencies such as the one in Eq. 5, we can arrive at the minimal factored representation shown
in Eq. 6.
$$P(x_6, x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4)\, P(x_5 \mid x_3, x_2)\, P(x_4 \mid x_2, x_1)\, P(x_3)\, P(x_2)\, P(x_1) \qquad (6)$$
In general, if $x_i$ denotes some value of the variable $X_i$ and $pa(x_i)$ denotes some set of values
for the parents of $X_i$, the minimal factored representation of the exact joint probability distribution over $m$
random variables can be expressed as in Eq. 7.
$$P(X) = \prod_{k=1}^{m} P(x_k \mid pa(x_k)) \tag{7}$$
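The factorization of Eq. 6 and Eq. 7 can be checked numerically on the six-node example. The sketch below is illustrative only: the parent sets follow Eq. 6, while the CPT numbers returned by the helper `p_one` are hypothetical values chosen for the demonstration. It builds the joint distribution as a product of local conditionals and confirms both that it sums to one and that the conditional independence of Eq. 5 holds.

```python
from itertools import product

# Parent sets taken from Eq. 6; everything numeric below is hypothetical.
PARENTS = {
    "x1": (), "x2": (), "x3": (),
    "x4": ("x1", "x2"),
    "x5": ("x2", "x3"),
    "x6": ("x4", "x5"),
}
ORDER = ["x1", "x2", "x3", "x4", "x5", "x6"]  # a topological order


def p_one(parent_vals):
    """Hypothetical P(node = 1 | parents), used only for illustration."""
    if not parent_vals:  # root node: prior probability
        return 0.5
    return 0.9 if all(parent_vals) else 0.2


def joint(assign):
    """Eq. 7: product of the local conditional probabilities."""
    prob = 1.0
    for node in ORDER:
        p1 = p_one(tuple(assign[p] for p in PARENTS[node]))
        prob *= p1 if assign[node] else 1.0 - p1
    return prob


def prob(event):
    """Sum the joint over all assignments consistent with `event`."""
    total = 0.0
    for values in product((0, 1), repeat=len(ORDER)):
        assign = dict(zip(ORDER, values))
        if all(assign[k] == v for k, v in event.items()):
            total += joint(assign)
    return total


total = prob({})
rest = {"x5": 1, "x4": 0, "x3": 1, "x2": 0, "x1": 1}
full = prob({"x6": 1, **rest}) / prob(rest)                      # P(x6 | all others)
reduced = prob({"x6": 1, "x5": 1, "x4": 0}) / prob({"x5": 1, "x4": 0})  # P(x6 | x5, x4)
```

With these assumed numbers, `full` and `reduced` agree, which is exactly the statement of Eq. 5 for one particular assignment.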
In Fig. 6, it can be seen that nodes $x_4$ and $x_5$ are dependent since they have a common parent.
Even though this dependency is not shown in the initial Bayesian network graph structure, during
the Bayesian inference process these dependencies are taken care of by a process known as moralization,
where each pair of unconnected nodes having a common child node is connected by an
undirected edge, making every parent-child set a complete subgraph. We explain this aspect of
Bayesian inference in detail under Section V.
Fig. 7. (a) An illustrative SEU sensitivity logic for a subset of c17. (b) Timing-aware-Logic-induced-DAG model of the SEU sensitivity logic in (a)
D. TALI: Timing-aware-Logic-induced Soft error model
In this section, we describe the proposed Bayesian network based model, which can be
used to estimate the soft error sensitivity of logic blocks. This model captures the dependence of
SEU sensitivity on the input pattern, circuit structure and the gate delays. Note that this probabilistic
modeling does not require any assumptions on the inputs and can be used with any biased workload
patterns. The proposed model, the Timing-Aware-Logic-Induced-Soft-Error-Sensitivity (TALI-SES)
model, is a Directed Acyclic Graph (DAG) representing the time-space transformed, SEU-encoded
combinational circuit $\langle C', J \rangle$, where $C'$ is the expanded circuit created by the time-space
transformation discussed in Section A, and $J$ is the set of possible SEUs (also discussed in Section A).
The error detection circuit consists of the expanded circuit $C'$, an error sensitization logic
for each SEU and a detection unit $T$ consisting of several comparator gates. We explain it with the
help of a small example shown in Fig. 7(a), which is the error detection circuit for a small portion
of benchmark c17. The error sensitization logic for an SEU at node $j$ consists of the duplicate
descendant nodes of $j$. In Fig. 7(a), the block with the dotted square is the sensitization logic
for $(16,5)^1_s$ [an $SEU^1$ at node 16 at time $t = 5$]. It consists of nodes $(22,5)_s$ and $(22,6)_s$ descending
from node $(16,5)$ of the time-space transformed circuit. For simplicity, we show the modeling of
only one SEU in this example. Our model can handle any number of SEUs simultaneously. Each
SEU sensitization logic has an additional input to model the SEU, for example the input $SEU^1_{16,5}$. This
input signal value is set to logic one in order to model the effect of a 0-1-0 SEU occurring at node
16 at time frame 5.
As discussed previously in Section A, an SEU lasting for a duration $\delta$ can cause an erroneous
output if its effect reaches the output at any instant between the sampling time $t_s$ and time frame
$t_s - \delta$. In this work we assume $\delta$ to be one. Hence we get error-sensitized outputs at time frame $t_s$
and, for some SEUs, at $t_s - 1$ also, if there exist re-convergent paths between the SEU location and an
output. We need to compare the SEU-free output signals evaluated at the sampling time $t_s$ with the
corresponding SEU-sensitized output signals arriving at $t_s - 1$ and $t_s$. Hence these signals are sent
to a detection unit $T$. The comparators in the detection unit compare the error-sensitized
outputs with the corresponding error-free (ideal) outputs and generate test signals. For example, the test
signals for an SEU at node $j$ at time $t$ are $T_{(j,t),(i,t_s)}$ and $T_{(j,t),(i,t_s-1)}$. If any of these test signal
values is 1, it indicates the occurrence of an error. The probability $P(T_{(j,t),i})$, which is a measure of
the effect of the SEU $(j,t)_s$ on the output node $i$, is computed as a joint probability, as explained
below.
Let $A$ be the event that an SEU at node $j$ causes a bit-flip at output $i$ at time $t_s$ and let $B$ be the
event that an SEU at node $j$ causes a bit-flip at output $i$ at time $t_s - 1$. $P(A = 1)$ is the probability
of occurrence of an error at time $t_s$. $P(A = 0)$ is the probability that the SEU does not cause an error at
$t_s$. $P(B)$ can be explained in a similar way. The error probability due to an SEU at node $j$ at time $t$
w.r.t. output $i$ is the joint probability
$$P(A \cup B) = P(A = 1, B = 0) + P(A = 0, B = 1) + P(A = 1, B = 1) \tag{8}$$
which is expressed as:
$$P(T_{(j,t),i}) = P(T_{(j,t),(i,t_s)} \cup T_{(j,t),(i,t_s-1)}) \tag{9}$$
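Because the four joint outcomes of $(A, B)$ are exhaustive and mutually exclusive, Eq. 8 can equivalently be read as the complement of the no-error outcome. The identity below is standard probability rather than a step stated in the text, and the numeric values in the comments are hypothetical:

```latex
\begin{aligned}
P(A \cup B) &= 1 - P(A = 0, B = 0) \\
            &= P(A = 1) + P(B = 1) - P(A = 1, B = 1).
\end{aligned}
% Hypothetical values: P(A=1,B=0) = 0.125, P(A=0,B=1) = 0.1875, P(A=1,B=1) = 0.125
% give P(A \cup B) = 0.4375, P(A=0,B=0) = 0.5625,
% and 0.25 + 0.3125 - 0.125 = 0.4375 from the second line.
```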
An SEU can have an effect on more than one output. The overall effect of an SEU $(j,t)_s$ on the
outputs is computed as $P(T_{(j,t)}) = \max_{\forall i} \{P(T_{(j,t),i})\}$. In the example, the SEU $(16,5)_s$ is sensitized
to outputs 22,6 and 23,6. Hence the two test signals for this SEU are $T_{(16,5),(22,6)}$ and $T_{(16,5),(23,6)}$.
An SEU occurring at node $j$ at time $t$, which is either $SEU^1$ or $SEU^0$ (but not both), can cause a
bit-flip at the output with probability $P(T^1_{(j,t)})$ or $P(T^0_{(j,t)})$. In order to compute the SEU sensitivity of
a node, we take the worst case probability, which is the maximum of the above two probabilities:
$$P(T_{(j,t)}) = \max \{P(T^1_{(j,t)}), P(T^0_{(j,t)})\}$$
More than one SEU can originate at a node at different time frames. Considering the effect of
SEUs at node $j$ at all time frames, we compute the worst case output error probability due to node
$j$ as $P(T_j) = \max_{\forall t} \{P(T_{(j,t)})\}$, which is the maximum probability over all time frames.
These detection probabilities depend on the circuit structure, the inputs, dependencies
amongst the inputs, gate delays and the SEU duration. In this work we assume random
inputs for experimentation and validation of our model.
We construct the TALI-SES Bayesian Network of the SEU detection circuit from nodes that
are random variables representing signal values of the SEU detection circuit. A signal $i$ in the
detection circuit is represented by the random variable $X_i$ in the Bayesian Network.
In the TALI-SES DAG structure, the parents of each node are its Markov boundary elements.
Hence the TALI-SES is a boundary DAG. For the definitions of Markov boundary and boundary DAG,
please refer to [25]. Note that TALI-SES is a boundary DAG because of the causal relationship
between the inputs and the outputs of a gate that is induced by logic. It has been proven in [25] that
if a graph structure is a boundary DAG $D$ of a dependency model $M$, then $D$ is a minimal I-map of
$M$. This theorem, along with the definitions of conditional independencies in [25] (we omit the
details), specifies the structure of the Bayesian network. Thus the TALI-SES DAG is a minimal I-map
and hence a Bayesian network (BN).
Quantification of TALI-SES-BN: TALI-SES-BN consists of nodes that are random variables
of the underlying probabilistic model, and its edges denote direct dependencies. All the edges are
quantified with the conditional probabilities making up the joint probability function over all the
random variables. The overall joint probability function that is modeled by a Bayesian network
can be expressed as the product of the conditional probabilities. If $X' = \{X'_1, X'_2, \dots, X'_m\}$
is the node set of the TALI-SES Bayesian Network, then
$$P(X') = \prod_{k=1}^{m} P(x'_k \mid Pa(X'_k)) \tag{10}$$
where $Pa(X'_k)$ is the set of nodes that have directed edges to $X'_k$. A complete specification of the
conditional probability of a two-input AND gate output will have $2^3$ entries, with 2 states for
each variable. These conditional probability specifications are determined by the gate type. By
specifying the appropriate conditional probability we ensure that the spatial dependencies among
sets of nodes (not limited to pair-wise dependencies) are effectively modeled.
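To make the $2^3$-entry table concrete, the sketch below enumerates the conditional probability table of a two-input gate from its truth function. It is an illustration under simple assumptions: the gate is deterministic, so every entry is 0 or 1, and the names `GATES` and `gate_cpt` are hypothetical helpers, not part of any tool used in the paper.

```python
from itertools import product

# Truth functions for a few two-input gate types (illustrative selection).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "XOR": lambda a, b: a ^ b,
}


def gate_cpt(gate):
    """Return {(a, b, y): P(Y = y | A = a, B = b)} for a deterministic gate."""
    fn = GATES[gate]
    return {
        (a, b, y): 1.0 if fn(a, b) == y else 0.0
        for a, b, y in product((0, 1), repeat=3)
    }


and_cpt = gate_cpt("AND")
for (a, b, y), p in sorted(and_cpt.items()):
    print(f"P(Y={y} | A={a}, B={b}) = {p}")
```

Each parent configuration contributes two entries that sum to one, giving the eight entries mentioned above.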
V. BAYESIAN INFERENCE
We explore two inference schemes for the TALI-SES model. The first is a cluster-based
exact inference scheme and the second is a stochastic inference algorithm, which is an
approximate, non-simulative, scalable anytime method.
A. Junction Tree Based Inference
We demonstrate this inference scheme with a running example shown in Fig. 8. The combinational
circuit is shown in Fig. 8a and a subset of the time transformed circuit is shown in Fig. 8b.
The Bayesian Network captures the effect of an SEU of "zero" at node 5 at time instant 2
(denoted by the random variable $X_{5,2S0}$) on the output signal 6 at time instant 3 (denoted by the random
variable $X_{6,3}$). Note that the error in output signal $X_{6,3}$ is $T_{6,(5,2)}$, which is an XOR combination of
$X_{6,3}$ and $X_{6,3S}$, where $X_{6,3S}$ is the node that captures the effect of the SEU at node 5 at time instant 2. This
is the original TALI-SES Bayesian Network that we further process for exact inference.
The first step of the exact inference process is to create an undirected graph structure called
the moral graph (denoted by $D_m$) given the Bayesian network DAG structure (denoted here by
$D$). The moral graph represents the Markov structure of the underlying joint function [29]. The
dependencies that are preserved in the original DAG are also preserved in the moral graph [29].
From a DAG, which is the structure of a Bayesian network, a moral graph is obtained by adding
undirected edges between the parents of a common child node and dropping the directions of the
links. Fig. 9a shows the undirected moral graph; the dashed edges are added at this stage.
This step ensures that every parent-child set is a complete subgraph. The moral graph is undirected
and, due to the added links, some of the independencies displayed in the DAG will not be graphically
visible in the moral graph. Some of the independencies that are lost in the transformation contribute
to the increased computational efficiency but do not affect the accuracy [29]. The independencies
that are graphically seen in the moral graph are used in the inference process to ensure local message
passing.
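The moralization step described above can be sketched in a few lines. The example below is only an illustration: it uses the parent sets of the small network of Eq. 6 rather than the running example of Fig. 8, and the function name `moralize` is a hypothetical helper.

```python
from itertools import combinations


def moralize(parents):
    """Build the moral graph of a DAG.

    `parents` maps each node to the list of its parents. The result is a
    set of undirected edges, each stored as a frozenset of two node names.
    """
    edges = set()
    for child, pars in parents.items():
        # Drop the direction of every parent -> child link.
        for p in pars:
            edges.add(frozenset((p, child)))
        # "Marry" every pair of parents that share this child.
        for u, v in combinations(pars, 2):
            edges.add(frozenset((u, v)))
    return edges


# Parent sets of the six-node network factored in Eq. 6.
dag = {
    "x1": [], "x2": [], "x3": [],
    "x4": ["x1", "x2"],
    "x5": ["x2", "x3"],
    "x6": ["x4", "x5"],
}
moral_edges = moralize(dag)
```

For this network the six original links are kept and three edges are added (x1-x2, x2-x3 and x4-x5), so each parent-child set becomes a complete subgraph.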
The moral graph is said to be triangulated if it is chordal. An undirected graph G is called
chordal or triangulated if every one of its cycles of length greater than or equal to 4 possesses a
chord [29]; that is, we add additional links to the moral graph so that cycles longer than three nodes
are broken into cycles of three nodes. Note that in this particular example, the moral graph is chordal
and no additional links are needed. The junction tree is defined as a tree whose nodes represent
cliques (collections of completely connected nodes) of the chordal graph, and in which there is a
unique path between any two cliques. The junction tree possesses a property called running intersection,
which ensures that if two cliques share a common variable, the variable is present in all the
cliques that lie on the unique path between them. Fig. 9b shows the junction tree. Note that every
clique in the moral graph is a node in the junction tree (for example, $C1 = [X_{6,3}, X_{5,2}, X_{4,2}]$). Also, C6
and C7 have node $X_{4,2}$ in common, hence $X_{4,2}$ is present in all the cliques, namely C2, C1 and C4,
that lie on the unique path between C6 and C7. This property of the junction tree is utilized
for probabilistic inference so that local operations between neighboring cliques guarantee global
probabilistic consistency.
Fig. 8. (a) A small logic circuit (nodes X1 to X6) (b) Time transformed Bayesian Network (nodes X1,1, X2,1, X3,1, X4,2, X5,2, X5,2S0, X6,3, X6,3S, T6_(5,2))
Chordal graphs are essential as they guarantee the existence of at least one junction tree. Hence
chordalization is a necessary step. There are many algorithms to obtain junction tree from chordal
graph and we use a tool HUGIN [19] that uses minimum-fill-in heuristics to obtain a minimal
chordal and junction tree structure.
Since every child-parent team is present together in one of the cliques, we initialize the clique
joint probabilities by the original joint probability of a child-parent team. We then use a message
passing scheme to obtain consistent probabilities. Suppose we have two leaf cliques in the junction
tree, say C7 and C3 in our example in Fig. 9b. Both cliques are initialized based on the child-parent
team (C3 by nodes 3, 2 and 5, and C7 by nodes 1, 2 and 4). Similarly, C6, C1 and C5 are initialized.
The initial clique probability of clique $C_i$ is termed $\phi_{C_i}$ and is also called the potential of the clique.
Let us now consider two neighboring cliques to understand the key feature of the Bayesian
updating scheme. Let two cliques $C_l$ and $C_m$ have probability potentials $\phi_{C_l}$ and $\phi_{C_m}$, respectively.
Let $S$ be the set of nodes that separates cliques $C_l$ and $C_m$ (example: $S = \{X_{6,3}, X_{6,3S}\}$ between
cliques C4 and C5 in Fig. 9b). The two neighboring cliques have to agree on the probabilities of
the node set $S$, which is their separator. To achieve this, we first compute the marginal probability
of $S$ from the probability potential of clique $C_l$ and then use it to scale the probability potential of
$C_m$. The transmission of this scaling factor, which is needed in updating, is referred to as message
passing. New evidence is absorbed into the network by passing such local messages. The pattern
of the messages is such that the process is multi-threadable and partially parallelizable. Because the
junction tree has no cycles, messages along each branch can be treated independently of the others.
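The scaling step between two neighboring cliques can be illustrated on a deliberately tiny case. The sketch below assumes a hypothetical chain A -> S -> B with binary variables, cliques $C_l = \{A, S\}$ and $C_m = \{S, B\}$, and made-up probabilities; it is not the clique structure of Fig. 9 and does not reproduce the HUGIN implementation.

```python
# Hypothetical local probabilities for the chain A -> S -> B.
P_A1 = 0.5
P_S1_GIVEN_A = {0: 0.25, 1: 0.75}
P_B1_GIVEN_S = {0: 0.125, 1: 0.5}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one equals `value`."""
    return p_one if value else 1.0 - p_one


# Initial potentials: C_l holds P(a) P(s | a); C_m holds P(b | s).
phi_l = {(a, s): bern(P_A1, a) * bern(P_S1_GIVEN_A[a], s)
         for a in (0, 1) for s in (0, 1)}
phi_m = {(s, b): bern(P_B1_GIVEN_S[s], b)
         for s in (0, 1) for b in (0, 1)}


def separator_marginal(phi):
    """Marginalize the potential of C_l = {A, S} onto the separator S."""
    mu = {0: 0.0, 1: 0.0}
    for (_a, s), value in phi.items():
        mu[s] += value
    return mu


def absorb(phi, mu_new, mu_old):
    """Scale the potential of C_m = {S, B} by mu_new(s) / mu_old(s)."""
    return {(s, b): value * mu_new[s] / mu_old[s]
            for (s, b), value in phi.items()}


mu_old = {0: 1.0, 1: 1.0}            # separator potential before any message
mu_new = separator_marginal(phi_l)   # message sent from C_l to C_m
phi_m_new = absorb(phi_m, mu_new, mu_old)

# After absorption C_m holds the joint P(s, b); sum out S to get P(B = 1).
p_b1 = sum(v for (s, b), v in phi_m_new.items() if b == 1)
```

After the single message, the potential of $C_m$ is a proper joint distribution over its variables and agrees with $C_l$ on the separator, which is the local consistency the text refers to.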
Note that since the junction tree has no cycles and is not directional, we can introduce
evidence at any node in any clique and propagate it in any direction. This is in sharp
contrast with simulative approaches, where information always flows from the inputs to the
outputs. Thus, we are able to use it for input space characterization for achieving zero output
error due to SEUs. We instantiate a desired observation at an output node (say, zero error)
and backtrack to the inputs that can create such a situation. If the input trace has a large distance from
the characterized input space, we can conclude that zero error is reasonably unlikely. Note that
this aspect of probabilistic inference is already used in medical diagnosis but is new in the context
of input space modeling for soft errors.
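The backtracking idea can be illustrated by brute-force enumeration on a single gate. The sketch below is a toy stand-in for the junction-tree computation: it assumes a hypothetical NAND gate with independent inputs `a` and `b`, an SEU that flips input `a`, and a test signal equal to the XOR of the ideal and upset outputs. It then conditions on the evidence "no error" and returns the posterior over the inputs.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def posterior_inputs_given_no_error(p_a1=0.5, p_b1=0.5):
    """P(a, b | T = 0) by enumeration, for a hypothetical SEU flipping input a."""
    weights = {}
    for a, b in product((0, 1), repeat=2):
        prior = (p_a1 if a else 1 - p_a1) * (p_b1 if b else 1 - p_b1)
        ideal = nand(a, b)        # error-free output
        upset = nand(1 - a, b)    # output with the SEU applied to input a
        if ideal ^ upset == 0:    # keep only states compatible with T = 0
            weights[(a, b)] = prior
    norm = sum(weights.values())
    return {state: w / norm for state, w in weights.items()}


posterior = posterior_inputs_given_no_error()
p_b1_given_no_error = sum(p for (a, b), p in posterior.items() if b == 1)
p_a1_given_no_error = sum(p for (a, b), p in posterior.items() if a == 1)
```

In this toy case the flip is masked only when `b` is 0, so the evidence pins `b` to 0 while leaving `a` at its prior, which mirrors the kind of input characterization described above.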
This exact inference is expensive in terms of time; hence, for larger circuits, we explore
a stochastic sampling algorithm, namely Probabilistic Logic Sampling (PLS). This algorithm has
been proven to converge to the correct probability estimates [24], without the added baggage of
high space complexity.
Fig. 9. (a) Chordal Graph (b) Junction Tree
Cliques shown in (b): C1 = [X6,3, X4,2, X5,2]; C2 = [X4,2, X5,2, X2,1]; C3 = [X5,2, X2,1, X3,1]; C4 = [X6,3, X4,2, X6,3S]; C5 = [X6,3, T6_(5,2), X6,3S]; C6 = [X6,3, X4,2, X5,2S0]; C7 = [X4,2, X1,1, X2,1]. Separators labeled in (b): [X5,2, X2,1] and [X6,3, X6,3S].
B. Probabilistic Logic Sampling (PLS)
Probabilistic logic sampling is one of the earliest and simplest stochastic sampling algorithms
proposed for Bayesian Networks [24]. Probabilities are inferred from a complete set of samples
or instantiations that are generated for each node in the network according to the local conditional
probabilities stored at each node. The advantages of this inference are that: (1) its complexity
scales linearly with network size, (2) it is an any-time algorithm, providing an adequate accuracy-time
trade-off, and (3) the samples are not based on inputs and the approach is input pattern insensitive.
The salient aspects of the algorithm are as follows.
1. Each sampling iteration stochastically instantiates all the nodes, guided by the link structure,
to create a network instantiation.
2. At each node $x_k$, generate a random sample of its state based on the conditional probability
$P(x_k \mid Pa(x_k))$, where $Pa(x_k)$ represents the states of the parent nodes. This is the local
importance sampling function.
3. The probabilities of all the query nodes are estimated by the relative frequencies of the states
in the stochastic sampling trace.
4. If the states of some of the nodes are known (evidence), such as in diagnostic backtracking,
network instantiations that are incompatible with the evidence set are disregarded.
5. Repeat steps 1, 2, 3 and 4 until the probabilities converge.
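The five steps above can be sketched as a small forward sampler. This is a minimal illustration, not the GeNie implementation: the network is a hypothetical single NAND gate with random inputs `a` and `b`, an upset copy `y_s` in which input `a` is flipped, and a test node `t` that XORs the ideal and upset outputs. The names `NETWORK`, `sample_once` and `pls_estimate` are assumptions made for this example.

```python
import random

# Nodes in topological order: (name, parents, P(node = 1 | parent states)).
NETWORK = [
    ("a", (), lambda: 0.5),
    ("b", (), lambda: 0.5),
    ("y", ("a", "b"), lambda a, b: float(1 - (a & b))),          # ideal NAND
    ("y_s", ("a", "b"), lambda a, b: float(1 - ((1 - a) & b))),  # input a upset
    ("t", ("y", "y_s"), lambda y, y_s: float(y ^ y_s)),          # comparator
]


def sample_once(rng):
    """Steps 1 and 2: instantiate every node from its local conditional."""
    state = {}
    for name, parents, p_one in NETWORK:
        p = p_one(*(state[q] for q in parents))
        state[name] = 1 if rng.random() < p else 0
    return state


def pls_estimate(query, n_samples, evidence=None, seed=0):
    """Steps 3 to 5: relative frequency of query = 1 among compatible samples."""
    rng = random.Random(seed)
    evidence = evidence or {}
    kept = hits = 0
    for _ in range(n_samples):
        state = sample_once(rng)
        if any(state[k] != v for k, v in evidence.items()):
            continue  # step 4: discard samples that contradict the evidence
        kept += 1
        hits += state[query]
    return hits / kept if kept else float("nan")


p_error = pls_estimate("t", 9999)                            # predictive query
p_b1_no_error = pls_estimate("b", 9999, evidence={"t": 0})   # diagnostic query
```

The diagnostic query shows the inefficiency noted in the text that follows: roughly half of the generated samples are thrown away because they contradict the evidence.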
The above scheme is efficient for predictive inference, when there is no evidence for any node,
but is not efficient for diagnostic reasoning due to the need to generate, but disregard, samples that
do not satisfy the given evidence. We adopt the tool GeNie [20] for inference using Probabilistic
Logic Sampling.
Complexity: The computational complexity of the exact method is exponential in the
number of variables in the largest clique. The space complexity of the exact inference is $n \cdot 2^{|C_{max}|}$ [3],
where $n$ is the number of nodes in the Bayesian Network and $|C_{max}|$ is the number of variables in
the largest clique. The time complexity is given by $p \cdot 2^{|C_{max}|}$ [3], where $p$ is the number of cliques.
The time complexity of the stochastic inference scheme is linear in $n$, the number of
nodes in the expanded circuit; specifically, it is $O(n\,|N_{SEU}|\,N)$, where $N_{SEU}$ is the number of SEUs
and $N$ is the number of samples.
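As a rough worked instance of the exact-inference expressions, one can plug in the counts read off the running example of Fig. 8 and Fig. 9: $n = 9$ network nodes, $p = 7$ cliques, and three binary variables in every clique. This is only an illustrative reading of the bounds as counts of table entries, not a measurement reported in the paper:

```latex
n \cdot 2^{|C_{max}|} = 9 \cdot 2^{3} = 72, \qquad
p \cdot 2^{|C_{max}|} = 7 \cdot 2^{3} = 56 .
```

Adding one variable to the largest clique doubles both figures, which is why the exact scheme is reserved for small circuits.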
VI. EXPERIMENTAL RESULTS
We demonstrate the modeling of SEUs based on TALI-SES using ISCAS benchmark circuits.
The logical relationship between the inputs and the output of a gate determines the conditional
probability of a child node, given the states of its parents, in the TALI-DAG.
In Table II we report the total number of gates in the actual circuit (column 2), the total number
of gates in the modified expanded circuit (column 3), and the total number of nodes in the resulting
TALI-SES (column 4). Column 5 lists the maximum time frames of the circuits.
TABLE II
SIZE OF ORIGINAL AND TIME-EXPANDED ISCAS CIRCUITS FOR FANOUT-DEPENDENT DELAY MODEL

Circuit   Gates   Gates (expanded)   # of nodes (TALI)   Time frames
c432        196         476                1989               55
c499        243         464                1596               30
c880        443         729                2552               51
c1355       587        1440                3388               55
c1908       913        1524               18118               79
c2670      1426        2584                4097               81
c3540      1719        3795               15670               93
c5315      2485        4887               13228               90
c6288      2448       30113               31157              263
c7552      3719       10006               45907               88
We compute the SEU sensitivity $P(T_j)$ of an individual node in a circuit as follows:
1. Compute the output error probability at output node $i$ due to an SEU at node $j$ at time $t$ by
taking the joint probabilities as discussed in Section IV-D.
$$P(T_{(j,t),i}) = P(T_{(j,t),(i,t_s)} \cup T_{(j,t),(i,t_s-1)}) \tag{11}$$
2. Considering the effect of all SEUs at node $j$ at all possible time frames, compute the probability
of occurrence of an error at the $i$th output due to SEUs at node $j$ by Eq. 12.
$$P(T_{j,i}) = \max_{\forall t} \{P(T_{(j,t),i})\} \tag{12}$$
3. Compute the worst case SEU sensitivity of a node $j$ due to an $SEU^1$ and an $SEU^0$, over all
outputs, by Eq. 13.
$$P(T_j) = \max_{\forall i} \{P(T^1_{j,i}), P(T^0_{j,i})\} \tag{13}$$
              SEU1                        SEU0
node j   P(Tj,22)   P(Tj,23)       P(Tj,22)   P(Tj,23)
10        0.2813     0              0.4375     0
11        0.0625     0.2344         0.3125     0.6563
16        0.3125     0.1875         0.4375     0.4375
19        0          0.375          0          0.4375
22        0.4375     0              0.5625     0
23        0          0.4375         0          0.5625

TABLE III
ESTIMATED P(Tj,i) VALUES OF NODES IN BENCHMARK C17 FROM EXACT INFERENCE
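The last aggregation step (Eq. 13) can be illustrated directly on the values of Table III, which already hold the per-output maxima over time frames from Eq. 12. The sketch below simply takes the maximum over both SEU polarities and both outputs; the dictionary layout and the helper name `node_sensitivity` are choices made for this example.

```python
# P(T_{j,i}) values copied from Table III (c17, exact inference).
TABLE_III = {
    10: {"SEU1": {22: 0.2813, 23: 0.0}, "SEU0": {22: 0.4375, 23: 0.0}},
    11: {"SEU1": {22: 0.0625, 23: 0.2344}, "SEU0": {22: 0.3125, 23: 0.6563}},
    16: {"SEU1": {22: 0.3125, 23: 0.1875}, "SEU0": {22: 0.4375, 23: 0.4375}},
    19: {"SEU1": {22: 0.0, 23: 0.375}, "SEU0": {22: 0.0, 23: 0.4375}},
    22: {"SEU1": {22: 0.4375, 23: 0.0}, "SEU0": {22: 0.5625, 23: 0.0}},
    23: {"SEU1": {22: 0.0, 23: 0.4375}, "SEU0": {22: 0.0, 23: 0.5625}},
}


def node_sensitivity(entry):
    """Eq. 13: worst case over SEU polarity and over all outputs."""
    return max(p for per_output in entry.values() for p in per_output.values())


sensitivity = {node: node_sensitivity(entry) for node, entry in TABLE_III.items()}
for node, p in sensitivity.items():
    print(f"P(T_{node}) = {p}")
```

Under this reading, node 11 is the most sensitive node of c17, driven by the $SEU^0$ entry for output 23.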
A. Exact Inference
In this section, we explore a small circuit, c17, with exact inference, where we transform the
original graph into a junction tree and compute probabilities by local message passing between the
neighboring cliques of the junction tree, as outlined in Section V-A. Note that this inference is
proven to be exact [25], [29] (zero estimation error).
Table III tabulates the results of the TALI-SES model of benchmark c17 using exact inference.
In this table, we report the probabilities of error at output nodes 22 and 23 due to an SEU at each
node $j$ (column 1), namely 10, 11, 16, 19, 22 and 23. Columns 2 and 3 of Table III give the error
probabilities due to $SEU^1$ (0-1-0 transition) at output nodes 22 and 23, respectively. Similarly, columns 4 and
5 give the error probabilities due to $SEU^0$ (1-0-1 transition) at output nodes 22 and 23, respectively. We
compare the error-free outputs at 22 and 23 at sampling time $t_s$ with the corresponding error-sensitized
outputs arriving at time frames $t_s - 1$ and $t_s$ due to SEUs generated at a node at all possible time
frames (as discussed in Section IV-D). Columns 2, 3, 4 and 5 of Table III report the maximum
of the error probabilities due to SEUs originating at individual nodes over all time frames. From this
table it can be seen that, for this benchmark circuit, $SEU^0$s have a higher impact on the output error
probabilities than $SEU^1$s. The error probability at output node 22 due to an $SEU^1$ at node 11 is very
low (0.0625), whereas the error probability at output 22 due to an $SEU^0$ at node 11 is 0.3125. It also shows
that the effect of SEUs is not the same over all outputs. For example, an $SEU^1$ at node 19 causes
no error at output 22, whereas the error probability due to this SEU at output node 23 is 0.4375. Note
that nodes 22 and 23 are the output nodes. SEUs occurring at these nodes at sampling time $t_s$ or
time $t_s - 1$ will be latched by an output latch, and are expected to cause a very high error probability.
However, from Table III it is observed that the probability of occurrence of an error due to an $SEU^1$ at
node 23 is only 0.4375. Similarly, the probability of occurrence of an error due to an $SEU^1$ at node 22 is
also 0.4375. This is due to the type of input pattern. In this work, we assume random inputs. This
result shows the dependence of $P(T_{j,i})$ on the input pattern.
A.1 Input Space Characterization
In this section, we describe the input space characterization for a particular observation, exploring
the diagnostic (backtracking) feature of the TALI-SES model. This feature is distinctive:
instead of predicting the effect of inputs and an SEU at a node on the outputs, we try
to answer queries like "What input behavior will make an SEU at node j definitely cause a bit-flip
at the circuit outputs?" or "What input behavior will be more conducive to no error at the output given
that there is an SEU at node j?" Resolving queries like these aids the designer in observing the input
space and helps perform input clustering or modeling. Let us take the example of the c17 benchmark.
We explore the input space for studying the effect of $SEU^0$ and $SEU^1$ at node 19 on errors at both
the outputs (22 and 23). One can characterize the input space for any one of the outputs (or, in general,
the effect of an SEU at any node on any other subset of nodes). Fig. 10a characterizes the input space for
an $SEU^0$ at node 19 such that no bit-flip occurs at the outputs. This is done by setting the output
error probability to zero (by giving "evidence" to the detection nodes in the Bayesian Network) and
then back propagating the probabilities. We plot the probabilities of each of the inputs 1, 2, 3, 6 and 7
that give no output error for an $SEU^0$ at node 19. Each column in the plot represents an input. The
lighter color represents the probability of that input being 0 and the black color represents the probability
of that input being 1 (the sum of these two parts should always be one). One can see that for obtaining
zero output error with an $SEU^0$ at node 19, input 1 can be random, inputs 2 and 7 have a 65% probability
of being at logic one, and inputs 3 and 6 have a 30% probability of being at logic one. Note that the input space
is nearly random (p(1) = p(0) = 0.5) when an $SEU^1$ at node 19 produces zero output error at both the
outputs. Similar characteristics are shown in Fig. 10c and 10d, which characterize the input space with
respect to output errors when an $SEU^0$ or $SEU^1$ occurs at node 11. Once again it can be seen that
zero output error is more likely under random inputs for $SEU^1$ than for $SEU^0$.
Fig. 10. Input probabilities for achieving zero output errors (at nodes 22 and 23) in the presence of SEUs: (a) SEU0 at node 19 (b) SEU1 at node 19 (c) SEU0 at node 11 (d) SEU1 at node 11 for the c17 benchmark. Each panel plots PROBABILITIES against INPUTS (In 1, In 2, In 3, In 6, In 7), with series P(in) = 1 and P(in) = 0.
Fig. 11. (a) SEU List, Fanout Dependent Delay Model: No. of Gates/SEUs per benchmark, with series Listed SEUs and Gates (b) SEU Sensitivity Range, Fanout Dependent Delay Model, with Delta = 1 and Input Bias = 0.5: No. of SEU Sensitive Gates per benchmark, for the ranges 0.0 < p <= 0.3, 0.3 < p <= 0.6 and p > 0.6
B. Larger Benchmarks
We use approximate inference for larger circuits using Probabilistic Logic Sampling [24], which
is a pattern-independent random Markov chain sampling method and has shown good results in many large
industry-size applications.
In Fig. 11(a), we plot the number of gates and the number of possibly sensitized SEUs for the
ISCAS benchmarks. This reduced SEU list was created based on the fanout-dependent delay model,
assuming an SEU duration $\delta$ equal to one time unit. We get a considerable reduction in the
number of listed SEUs compared to the number of gates in a circuit. This is because the reduced
SEU list is generated by traversing backward from the final outputs evaluated at sampling time $t_s$
and $t_s - 1$, and only those gates that lie between the final outputs and the duplicate gates need to be
considered for SEU sensitivity analysis. Depending on the input pattern and the circuit structure,
only a few of these SEUs actually cause soft errors. Based on the estimated SEU sensitivity $P(T_j)$,
calculated as in Eq. 13, we classify the SEU sensitive gates in a circuit into three categories: gates
where $P(T_j)$ is (i) less than or equal to 0.3, (ii) between 0.3 and 0.6, and (iii) above 0.6. This
is plotted in Fig. 11(b). These results are helpful for applying selective redundancy measures or for
modifying $P(SEU_j)$ (by changing device features), giving higher priority to nodes that are in the
high sensitivity range than to those in the lower sensitivity ranges. From Fig. 11(b), it can be seen
that the SEU sensitive nodes of circuit c432 are equally distributed within the three probability
ranges (i), (ii) and (iii), whereas all the SEU sensitive nodes in circuit c1355 lie within the middle
range, where $P(T_j)$ is between 0.3 and 0.6. The results for c7552 show that $P(T_j)$ of most of the SEU
sensitive nodes is in the lowest range (less than or equal to 0.3), which indicates that gates in this
circuit do not require extensive hardening techniques, whereas the majority of SEU sensitive gates in
c2670 require extensive hardening techniques since $P(T_j)$ is very high (above 0.6) for these nodes.
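The three-way classification can be sketched as a simple bucketing function. The range boundaries follow the text (at most 0.3, above 0.3 up to 0.6, above 0.6); the labels "low", "mid" and "high" are names chosen for this example, and the sample input is the set of c17 node sensitivities obtained by applying Eq. 13 to Table III, used here only as convenient test data.

```python
from collections import Counter


def sensitivity_range(p):
    """Map an SEU sensitivity P(T_j) to one of the three ranges in the text."""
    if p <= 0.3:
        return "low"   # range (i): P(T_j) <= 0.3
    if p <= 0.6:
        return "mid"   # range (ii): 0.3 < P(T_j) <= 0.6
    return "high"      # range (iii): P(T_j) > 0.6


# Worst-case sensitivities of the c17 nodes, derived from Table III via Eq. 13.
c17_sensitivity = {10: 0.4375, 11: 0.6563, 16: 0.4375,
                   19: 0.4375, 22: 0.5625, 23: 0.5625}

ranges = {node: sensitivity_range(p) for node, p in c17_sensitivity.items()}
histogram = Counter(ranges.values())

# Nodes that would be prioritized for hardening under this classification.
priority_nodes = sorted(n for n, r in ranges.items() if r == "high")
```

A per-circuit histogram of this kind is what Fig. 11(b) reports for the larger benchmarks.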
We implemented the SEU simulator based on the work done in [5], with a fanout-dependent
delay model, for the ground truth. We performed the simulation with 500,000 random vectors,
obtained by changing the seed after every 50,000 vectors, to get the ground-truth SEU probabilities.
For our probabilistic framework, we use the Probabilistic Logic Sampling [24] inference scheme. We
compute the SEU sensitivities $P_j$ of gates in ISCAS benchmark circuits using Probabilistic Logic
Sampling (PLS) [24] with 9999 samples and compare our results with the ground-truth simulation results.
Table IV gives the average estimation error $E_{mean}$ [column 2] and the maximum estimation
error $E_{max}$ [column 3]. Here $E_{mean}$ of a circuit is the average of the difference between the SEU
detection probabilities (or SEU sensitivities) obtained from simulation and the estimated probabilities
from PLS sampling, over all possible SEU sensitive nodes in the circuit. Similarly, $E_{max}$ of a circuit
is the maximum of the difference between the SEU sensitivities obtained from simulation and the estimated
SEU sensitivities from PLS sampling, over all possible SEU sensitive nodes in the circuit.
The estimation time $T_{bn}$ [column 4] is the time taken by the PLS scheme for belief propagation. We
estimated the SEU sensitivities of all the ISCAS'85 benchmarks with an average belief propagation
time of 140.49 sec, whereas the average time taken for logic simulation of these circuits is 33
hours. The estimation error over all benchmarks is below 0.0034, which shows an excellent accuracy-time
trade-off. $T_{bn}$ is the total elapsed time, including memory and I/O access. This time is obtained by
the ftime command in the WINDOWS environment on a Pentium-4 2.0 GHz PC. It is evident from
the results that, using a graph-based, causal, compact probabilistic framework (a Bayesian Network),
we are able to accurately model the Single-Event-Upset (SEU) sensitivities of logic circuit signals,
accounting for temporal and spatial dependencies. The attractive feature of this stimulus-free approach
is that it uses conditional independencies in modeling spatial correlations and the time-space
transformation for capturing temporal dependencies.

TABLE IV
SEU SENSITIVITY ESTIMATION ERRORS AND TIME FOR 9999 SAMPLES

Circuit   Emean    Emax     Tbn (sec)
c432      0.0031   0.0069     18.57
c499      0.0024   0.0198     13.43
c880      0.0027   0.0090     27.58
c1355     0.0027   0.0120     28.84
c1908     0.0028   0.0120    176.63
c2670     0.0034   0.0130     34.70
c3540     0.0023   0.0101    148.07
c5315     0.0045   0.0112    121.62
c7552     0.0035   0.0100    513.05
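The two error metrics reported in Table IV can be sketched as follows. This is an illustration under one stated assumption: the "difference" in the definitions is read as an absolute difference per node. The node names and probability values are hypothetical and are not results from the paper.

```python
def estimation_errors(simulated, estimated):
    """Return (E_mean, E_max) over the nodes present in `simulated`.

    Assumes the per-node difference is taken in absolute value.
    """
    diffs = [abs(simulated[node] - estimated[node]) for node in simulated]
    return sum(diffs) / len(diffs), max(diffs)


# Hypothetical per-node SEU sensitivities (not data from the paper).
simulated = {"g1": 0.25, "g2": 0.5, "g3": 0.75}      # ground-truth simulation
estimated = {"g1": 0.25, "g2": 0.5625, "g3": 0.71875}  # sampling estimate

e_mean, e_max = estimation_errors(simulated, estimated)
print(f"E_mean = {e_mean}, E_max = {e_max}")
```

Reporting both numbers is useful because a small mean can hide a few nodes with comparatively large errors, as the gap between the two columns of Table IV suggests.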
C. Results with Delay Model based on Logical Effort
In this section we give estimation results from our model with logical effort based gate delay
modeling. In Table V, we list the number of nodes in the TALI Bayesian network and the estimation
time in seconds for some of the ISCAS benchmarks. The number of TALI nodes depends on the SEU
list as well as the circuit size, whereas the estimation time directly depends on the number of nodes
and the number of samples. We show results for Probabilistic Logic Sampling (PLS) with 9999
samples.
Figure 12(a) shows the number of possibly sensitized SEUs vs. the number of gates in ISCAS
A STIMULUS-FREE PROBABILISTIC MODEL FOR SINGLE-EVENT-UPSET SENSITIVITY 35
TABLE V
SIZE OF TALI-M ODEL AND ESTIMATION TIME FOR LOGICAL-EFFORT BASED DELAY MODEL
# of nodes(TALI)
EstimationTime(s)
c432 2390 22.32c499 7814 65.75c880 1097 12.49
c1355 1773 15.092c1908 2279 22.22c3540 14370 135.79
benchmarks. From this graph, it can be seen that the number of SEUs in the reduced SEU list is
low compared to the fanout-dependent delay model. This is due to the higher gate delay values with
logical-effort-based delay modeling, since we take into account the input capacitance as well as the
parasitic delay in addition to fanout. Due to the increased gate delays, the relative effect of an SEU at
an internal gate on a primary output during the latching period is smaller, since most of the signals get
enough time to restore to their ideal values. Figure 12(b) shows the SEU sensitivity ranges of gates
in the circuits, with an input bias of 0.5 and SEU width equal to one time unit. As with
fanout-dependent delay modeling, here also we classify the SEU-sensitive gates in a circuit into 3
categories: gates with estimated sensitivity values (1) less than 0.3, (2) between 0.3 and 0.6 and
(3) above 0.6. Given any delay library for a logic circuit, our model can be used to classify the gates
in the circuit in the order of their SEU sensitivity values, capturing logical masking effect, circuit
structure, input pattern and SEU duration.
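The three-way classification can be sketched as a simple binning step. The bin boundaries below follow the legend of Fig. 12(b) (0 < p <= 0.3, 0.3 < p <= 0.6, p > 0.6); the function names, the class labels, the treatment of p = 0 as a separate class, and the example values are assumptions for illustration, not data from the benchmarks.

```python
from collections import Counter


def sensitivity_class(p):
    """Map an estimated SEU sensitivity p to one of the three ranges."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p == 0.0:
        return "not sensitive"   # assumption: excluded from the three ranges
    if p <= 0.3:
        return "low"
    if p <= 0.6:
        return "medium"
    return "high"


def sensitivity_histogram(sensitivities):
    """Count gates per range; `sensitivities` maps gate name -> probability."""
    return Counter(sensitivity_class(p) for p in sensitivities.values())


# Hypothetical per-gate estimates, not taken from the paper's results.
example = {"g1": 0.12, "g2": 0.45, "g3": 0.81, "g4": 0.3, "g5": 0.0}
counts = sensitivity_histogram(example)
```

A count per range of this kind is what one bar group per benchmark in a chart like Fig. 12(b) would summarize.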
Please note that the above estimated probability values are relatively high when we consider the
overall soft error susceptibility of individual gates. To get a comprehensive model, the electrical
masking effect, the latching-window masking effect and also the SEU generation and propagation
characteristics of individual gates are to be incorporated with our model. Modeling the electrical
masking effect needs circuit-level simulation techniques, which we are trying to integrate with our
current approach as a future direction.
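In its simplest reading, this combination is a product of per-gate factors, which is how the response to the reviewers later in this document describes it. The sketch below assumes the factors are independent and can simply be multiplied; the function name, the parameter names and the numbers are hypothetical.

```python
def soft_error_susceptibility(hit_rate, p_seu_given_hit, p_logical,
                              p_electrical, p_latch):
    """Combine the logic-level sensitivity with the remaining factors.

    hit_rate        : particle hit rate (a rate, not a probability)
    p_seu_given_hit : probability that a hit generates an SEU at the node
    p_logical       : logic-level SEU sensitivity (the quantity estimated here)
    p_electrical    : probability that the pulse survives electrical masking
    p_latch         : probability that the erroneous value is latched
    Assumes the factors are independent, which is a simplification.
    """
    for p in (p_seu_given_hit, p_logical, p_electrical, p_latch):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if hit_rate < 0:
        raise ValueError("hit rate must be non-negative")
    return hit_rate * p_seu_given_hit * p_logical * p_electrical * p_latch


# Hypothetical numbers chosen only to show the arithmetic.
ses = soft_error_susceptibility(2.0, 0.5, 0.4, 0.5, 0.25)
```

Because every additional factor is at most 1, the logic-level estimate alone is an upper bound on the combined value for a given hit rate, which is consistent with the remark that the reported values are relatively high.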
[Figure 12: two bar charts over the benchmarks c432, c499, c880, c1355, c1908 and c3540. (a) Vertical axis: No. of Gates/SEUs; series: Listed SEUs, Gates. (b) Vertical axis: No. of SEU Sensitive Gates; series: 0.0<p<=0.3, 0.3<p<=0.6, p>0.6.]
Fig. 12. (a) SEU List-Logical Effort Delay Model (b) SEU Sensitivity Range-Logical Effort Delay Model with Delta=1 and Input Bias = 0.5
VII. CONCLUSION
We are able to effectively model Single-Event-Upsets in logic circuits (ISCAS benchmarks)
to estimate the SEU sensitivity of individual nodes in a circuit, capturing spatial and temporal
signal correlations and especially emphasizing the effect of inputs, gate delay, SEU duration and
circuit structure. We show results with exact and approximate inference. Using exact inference, we
characterize the input space which gives zero output error even in the presence of some SEUs.
Results from approximate inference show excellent accuracy-time trade-offs. We report SEU
sensitivity estimates for the fanout-dependent delay model as well as for the logical-effort-based
delay model. Given an appropriate delay library of gates in a circuit, our model is capable of
estimating SEU sensitivities of individual gates in the circuit, and these results can be used for
classifying gates for application of mitigation schemes. Future effort includes modeling with biased
input patterns and with different SEU widths $\delta$, to study the effect of these factors on SEU
sensitivities. We are also investigating the effect of threshold voltage and supply voltage on the
electrical masking of transient pulses caused by particle bombardment.
VIII. LIST OF SYMBOLS
I- $SES_{j \to Q_L}$ : Soft Error Susceptibility of a node $j$ with respect to a latch output $Q_L$.
II- $T_{j \to i}$ : A Boolean variable to identify the occurrence of an error at output $i$ conditional to an SEU at node $j$.
III- $R_H$ : Particle Hit Rate on a chip.
IV- $P(SEU_j)$ : Probability that a particle hit at node $j$ generates an SEU at that node.
V- $P(Q_L \mid T_{j \to i})$ : Probability that an error at output node $i$ due to an SEU at node $j$ causes an erroneous signal at latch output $Q_L$.
VI- $X_{i,t}$ : Random variable representing the value of signal $i$ at time $t$.
VII- $P(x_i)$ : Probability that signal $X_i$ takes the value $x_i$.
VIII- $\delta$ : Width of an SEU (SEU duration).
IX- $t_s$ : Sampling time.
X- $f_g$ : Fan-out of a gate $g$.
XI- SEU0 : An SEU causing a 1-0-1 transition of a signal.
XII- SEU1 : An SEU causing a 0-1-0 transition of a signal.
XIII- j; t1s : An SEU1 at node j at timet.
XIV- j; t0s : An SEU0 at node j at timet.
XV- $P(T_{(j,t) \to (i,t_s)})$ : Probability of an SEU at node $j$ at time $t$ causing an error at output $i$ at time $t_s$.
XVI- $P(T_{(j,t) \to (i,t_s - \delta)})$ : Probability of an SEU at node $j$ at time $t$ causing an error at output $i$ at time $t_s - \delta$.
XVII- $P(T_{(j,t) \to i})$ : Probability of an SEU at node $j$ at time $t$ causing an error at output $i$.
XVIII- $P(T_j)$ : Worst case SEU sensitivity of a node $j$.
REFERENCES
[1] K. Mohanram and N. A. Touba, "Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits," International Test Conference, pp. 893–901, 2003.
[2] D. Alexandrescu, L. Anghel and M. Nicolaidis, "New Methods for Evaluating the Impact of Single Event Transients in VDSM ICs," Proc. Defect and Fault Tolerance Symposium, pp. 99–107, 2002.
[3] S. Bhanja and N. Ranganathan, "Cascaded Bayesian inferencing for switching activity estimation with correlated inputs," accepted for publication in IEEE Transactions on VLSI, 2004.
[4] P. Shivakumar et al., "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic," Proc. International Conference on Dependable Systems and Networks, pp. 389–398, 2002.
[5] M. Violante, "Accurate Single-Event-Transient Analysis via Zero-Delay Logic Simulation," IEEE Transactions on Nuclear Science, Vol. 50, No. 6, pp. 2113–2118, 2003.
[6] T. Rejimon and S. Bhanja, "An Accurate Probabilistic Model for Error Detection," Proc. IEEE International Conference on VLSI Design, pp. 717–722, Jan. 2005.
[7] M. Sonza Reorda and M. Violante, "Fault List Compaction through Static Timing Analysis for Efficient Fault Injection Experiments," Proc. Defect and Fault Tolerance Symposium, pp. 263–271, 2002.
[8] T. Karnik and P. Hazucha, "Characterization of soft errors caused by single event upsets in CMOS processes," IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 2, pp. 128–143, Apr.-Jun. 2004.
[9] V. Degalahal, R. Rajaram, N. Vijaykrishnan, Y. Xie and M. J. Irwin, "The effect of threshold voltages on soft error rate," 5th International Symposium on Quality Electronic Design, March 2004.
[10] Y. S. Dhillon, A. U. Diril and A. Chatterjee, "Soft-Error Tolerance Analysis and Optimization of nanometer circuits," Proceedings of Design, Automation and Test in Europe, Vol. 1, pp. 288–293, Mar. 2005.
[11] S. Krishnaswamy, G. S. Viamontes, I. L. Markov, and J. P. Hayes, "Accurate Reliability Evaluation and Enhancement via Probabilistic Transfer Matrices," Design Automation and Test in Europe (DATE), March 2005.
[12] P. K. Samudrala, J. Ramos and S. Katkoori, "Selective Triple Modular Redundancy (STMR) Based Single-Event-Upset (SEU) Tolerant Synthesis for FPGAs," IEEE Transactions on Nuclear Science, Vol. 51, No. 5, Oct. 2004.
[13] Chong Zhao, Xiaoliang Bai and S. Dey, "A scalable soft spot analysis methodology for compound noise effects in nano-meter circuits," Proceedings of Design Automation Conference, pp. 894–899, Jun. 2004.
[14] T. Rejimon and S. Bhanja, "A Stimulus-Free Probabilistic Model for Single-Event-Upset Sensitivity," Proc. IEEE International Conference on VLSI Design, Jan. 2006.
[15] M. Zhang and N. R. Shanbhag, "A Soft Error Rate Analysis (SERA) Methodology," International Conference on Computer Aided Design, November 2004.
[16] N. Seifert et al., "Impact of Scaling on Soft-Error Rates in Commercial Microprocessors," IEEE Transactions on Nuclear Science, Vol. 49, No. 6, pp. 3100–3106, Dec. 2002.
[17] P. Hazucha et al., "Measurement and Analysis of SER-Tolerant Latch in a 90-nm Dual-V_T CMOS Process," IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, pp. 1536–1543, Sept. 2004.
[18] P. Hazucha and C. Svensson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," IEEE Transactions on Nuclear Science, Vol. 47, No. 6, pp. 2586–2594, Dec. 2000.
[19] URL http://www.hugin.com
[20] "GeNie", URL http://www.sis.pitt.edu/~genie/genie2
[21] S. Manich and J. Figueras, "Maximizing the weighted switching activity in combinational CMOS circuits under the variable delay model," European Design and Test Conference, pp. 597–602, 1997.
[22] M. Nicolaidis, "Time Redundancy based Soft-Error Tolerance to Rescue nanometer Technologies," VLSI Test Symposium, pp. 86–94, 1999.
[23] M. Omana, G. Papasso, D. Rossi, C. Metra, "A Model for Transient Fault Propagation in Combinatorial Logic," On-Line Testing Symposium, pp. 111–115, 2003.
[24] M. Henrion, "Propagation of uncertainty by probabilistic logic sampling in Bayes' networks," Uncertainty in Artificial Intelligence, 1988.
[25] J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference," Morgan Kaufmann Publishers, Inc., 1988.
[26] P. Robinson, W. Lee, R. Aguero and S. Gabriel, "Anomalies due to single event upsets," Journal of Spacecraft and Rockets, vol. 31, no. 2, pp. 166–171, Mar.-Apr. 1994.
[27] J. T. Wallmark and S. M. Marcus, "Minimum size and maximum packaging density of non-redundant semiconductor devices," Proceedings of IRE, vol. 50, pp. 286–298, March 1962.
[28] G. H. Johnson, J. H. Hohl, R. D. Schrimpf and K. F. Galloway, "Simulating Single-event burnout in n-channel power MOSFETs," IEEE Transactions on Electron Devices, vol. 40, pp. 1001–1008, 1993.
[29] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, D. J. Spiegelhalter, "Probabilistic Networks and Expert Systems," Springer-Verlag New York, Inc., 1999.
[30] I. Sutherland, R. Sproull and D. Harris, "Logical Effort: Designing Fast CMOS Circuits," Morgan Kaufmann, February 1999.
[31] N. Ramalingam and S. Bhanja, "Causal Probabilistic Input Dependency Learning for Switching Model in VLSI Circuits," ACM GLSVLSI, 2005.
[32] R. Marculescu, D. Marculescu, M. Pedram, "Sequence Compaction for Power Estimation: Theory and Practice," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, No. 7, pp. 973–993, July 1999.
IX. RESPONSE TO REVIEWERS
Associate Editor’s Comments: This paper requires a major revision. Reviewers 1 and 3 like
the paper but want the presentation to be improved. Reviewer 2 is rather critical and has strong
objections. In summary, all three reviewers find the paper hard to understand, point to unnecessary
or complicated notation, and require improved writing. All comments should be responded to in
their entirety while preparing the revision. A summary of point-by-point responses outlining the
corresponding changes in the manuscript will be useful.
We thank the AE for compiling all the key issues. We have revised our manuscript significantly
based on the reviewers’ comments, addressing all the relevant issues. We first provide a summary
of all the major revisions and then the detailed responses describing the revisions based on each of
the reviewers’ comments.
A. Summary of Revisions
In this subsection, we report all the significant changes made to the original manuscript. We
thank all the reviewers and the AE for helping us re-model and restructure and, most importantly,
re-think and re-visit most of the concepts to improve the quality of this paper.
1. We show that the TALI-SES model can handle any delay model at logic level and show results
using a logical-effort-based delay model, which depends on gate input capacitance, parasitic
capacitance as well as fanout. We added two sections:
• Section IVB on logical-effort-based delay modeling
• Section VIC, results based on this delay model.
2. We modified our algorithm and re-generated all the results based on the reviewers’ suggestions.
Earlier we took the worst case probability as the maximum of
$\{P_{(j,t) \to (i,t_s)}, P_{(j,t) \to (i,t_s-1)}\}$.
In this version, we realize that we need to compute joint probability values for computing
the output error probability at an output node $i$ due to an SEU at node $j$ at time $t$. This is
explained below:
The effect of an SEU at node $j$ at time $t$ may be propagated to an output $i$ at sampling time
$t_s$ or $t_s - 1$, if there are re-convergent paths between nodes $j$ and $i$. In either case it can
cause a bit-flip at the output. We took the worst case probability, which is the maximum
of $\{P_{(j,t) \to (i,t_s)}, P_{(j,t) \to (i,t_s-1)}\}$. However, this might not be true for some SEUs. Ideally,
it should be a joint probability, as explained below. Let $A$ be the event that an SEU at node
$j$ causes a bit-flip at output $i$ at time $t_s$, and let $B$ be the event that an SEU at node $j$ causes
a bit-flip at output $i$ at time $t_s - 1$. $P(A=1)$ is the probability of occurrence of an error
at time $t_s$. $P(A=0)$ is the probability that the SEU doesn’t cause an error at $t_s$. $P(B)$ can be
explained in a similar way. The SEU sensitivity of node $j$ w.r.t. output $i$ is the joint probability
$$P(A \cup B) = P(A=1, B=0) + P(A=0, B=1) + P(A=1, B=1)$$
We re-ran all benchmark circuits by taking the joint probability values
$P(T_{(j,t) \to (i,t_s)} \cup T_{(j,t) \to (i,t_s-1)})$, as discussed in Section IV D on page 20, and modified
our results (in Table II, Table III and Table IV).
3. We appended Excel sheets as supplemental documents giving exhaustive SEU sensitivity
estimates of individual nodes in benchmark circuits.
4. We modified the bar charts in Fig. 11 (a) and (b) as suggested by the reviewers.
5. Last but not least, we modified the write-up and cleared up some confusion in the notation.
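The difference between the earlier worst-case maximum and the joint (union) probability of item 2 can be checked with a small numerical sketch. The three joint probabilities passed in are hypothetical inputs; in the model they would come from inference on the Bayesian network. The function names are likewise illustrative.

```python
def union_probability(p_a_only, p_b_only, p_both):
    """P(A or B) from the three joint terms:
    P(A=1,B=0) + P(A=0,B=1) + P(A=1,B=1)."""
    total = p_a_only + p_b_only + p_both
    if min(p_a_only, p_b_only, p_both) < 0 or total > 1 + 1e-12:
        raise ValueError("joint probabilities must be non-negative and sum to <= 1")
    return total


def max_of_marginals(p_a_only, p_b_only, p_both):
    """Earlier worst-case value: max{P(A=1), P(B=1)}."""
    p_a = p_a_only + p_both
    p_b = p_b_only + p_both
    return max(p_a, p_b)


# Hypothetical joint probabilities for one (SEU, output) pair.
joint = (0.2, 0.1, 0.3)
p_union = union_probability(*joint)
p_max = max_of_marginals(*joint)
```

For these numbers the union is about 0.6 while the maximum of the marginals is 0.5. The union is never smaller than the maximum, and the two coincide only when one of the single-event terms P(A=1,B=0) or P(A=0,B=1) is zero.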
B. Detailed Response
In this subsection, we respond to each of the reviewers separately and also indicate the revisions
in the main manuscript.
Response to Reviewer 1’s comments:
• This is a good paper introducing the use Bayesian networks for estimating SET probability.
It may improve the state of the art in this domain.
We thank you very much for reviewing this work and for appreciating our effort. We
appreciate your valuable comments and have made the appropriate modifications and added new
results as suggested by you.
• The delay model is simplistic.
We agree with the reviewer that the delay model is simplistic. However, time-space transformation
and Bayesian network modeling can be done for any type of delay model without
affecting the model structure. Given an accurate delay library for the gates in a logic circuit, our
model can be used to estimate SEU sensitivities of gates in a circuit, capturing logical
masking effect, circuit structure, input pattern and SEU duration. We have added new results
using a logical-effort-based delay model considering input capacitance, fanout and
parasitic capacitance. We included two additional sections: (1) Section IVB, which explains
modeling using the logical-effort-based delay model, and (2) Section VIC, giving estimated results
with this delay model.
Calculation of gate delays based on logical effort has been explained in several works; we used
reference [30]. Here we summarize it briefly.
Delay of a logic gate can be expressed as the sum of two components, effort delay and
parasitic delay. Effort delay is the product of logical effort and electrical effort, where
logical effort is defined as the relative ability of a gate topology to deliver current and
electrical effort is the ratio of output capacitance to input capacitance. Electrical effort is
sometimes called fanout. Mathematically, gate delay is expressed as $d = f + p = gh + p$,
where $f$ is the effort delay, $p$ is the parasitic delay, $g$ is the logical effort and $h$ is the
electrical effort. Logical effort is defined to be 1 for an inverter. Hence the logical effort of a gate
is the ratio of its input capacitance to the input capacitance of an inverter delivering the same
output current. It can be estimated by counting capacitance in units of transistor width. Parasitic
delay represents the delay of a gate driving no load, and it depends on diffusion capacitance.
The parasitic delay of an inverter is $P_{inv} \approx 1$. From the above considerations, we compute basic
CMOS gate delays and use these delay values in our model. Table VI shows the delay
expressions for basic gates.
TABLE VI
GATE DELAYS BASED ON LOGICAL EFFORT
Gate Type        Delay
Inverter         $fanout + P_{inv}$
n-input NAND     $\frac{n+2}{3} \times fanout + n P_{inv}$
n-input NOR      $\frac{2n+1}{3} \times fanout + P_{inv}$
2-input XOR      $4 \times fanout + 4n P_{inv}$
• The presentation needs some improvements. At the bottom of page 12 you say that gate
(10,4) is additional. This is not justified from the previous discussion. If you add this gate
because of a transient pulse of a one unit duration, this aspect is discussed in page 14. So,
saying in page 12 that you add page (10,4) is confusing for the reader. As a matter of fact:
1- Remove from page 12 the above statement.
Done
2- Either remove from figure 3 gate (10,4) and add later a new figure in which you add gate
(10,4) or mention in page 11 that the reasons for adding this gate in figure 3 are given later.
Similar problems with gate (19,4).
Thank you very much for pointing this out and helping us to clear this. We have mentioned
in page 12 that the reasons for adding duplicate gates like (10,4) and (19,4) are explained
later. We explain the reasons in page 13, which we repeat below.
The duplicate gates are introduced due to differences in path delays between the input signals
of a gate. Addition of these gates serves two purposes.
1. Model the effect of any SEUs originating at an input having lesser path delay during
the period while it waits for the arrival of the other input signal. Example: Duplicate
gate (10,5) captures the effect of an SEU originating at the output of gate 10 at time
$t = 5$.
2. Model the masking effect of some of the SEUs generated in the signal path of the input
having lesser path delay. Example: Duplicate gate (10,5) masks the effect of an SEU
originating at the output of gate 10 at time $t = 2$.
We have explained these points in page 13.
• In page 17 explain why the output error probability due to node j is equal to the maximum
of 0 SEU and 1 SEU and not the sum.
An SEU occurring at node $j$ at time $t$, which is either SEU0 or SEU1 (not both) depending
on the location of particle bombardment, can cause a bit-flip at the output with probability
$P(T^0_{j,t})$ or $P(T^1_{j,t})$. We take the worst case probability, which is the maximum of the above
two probabilities. This is explained in page 21.
• Same page, explain why you take the maximum of soft error probability at time ts and at
time ts-d.
We are extremely thankful to the reviewer for pointing this out. We thought about it and
found that a correction is needed here, which we explain below. We appreciate the
reviewer’s effort to understand our work and to give us helpful suggestions.
The effect of an SEU at node $j$ at time $t$ may be propagated to an output $i$ at time $t_s$
or $t_s - 1$, if there are re-convergent paths between nodes $j$ and $i$. In either case it can
cause a bit-flip at the output. We took the worst case probability, which is the maximum
of $\{P_{(j,t) \to (i,t_s)}, P_{(j,t) \to (i,t_s-1)}\}$. However, this might not be true for some SEUs. Ideally,
it should be a joint probability, as explained below. Let $A$ be the event that an SEU at node
$j$ causes a bit-flip at output $i$ at time $t_s$, and let $B$ be the event that an SEU at node $j$ causes
a bit-flip at output $i$ at time $t_s - 1$. $P(A=1)$ is the probability of occurrence of an error
at time $t_s$. $P(A=0)$ is the probability that the SEU doesn’t cause an error at $t_s$. $P(B)$ can be
explained in a similar way. The SEU sensitivity of node $j$ w.r.t. output $i$ is the joint probability
$$P(A \cup B) = P(A=1, B=0) + P(A=0, B=1) + P(A=1, B=1)$$
We modified our results by taking joint probability values, as described by the above equation.
We have explained this in pages 20 and 21.
• Same page, what is LIPEM.
We are sorry that we gave the wrong model name. It should be TALI-SES instead of
LIPEM. It is corrected. Thank you for pointing out that mistake.
• Consider the example of figure 4. Probability of X6 uses the probability of parent nodes x4,
x5. These probabilities are dependent. How the model takes into account such dependencies?
Perhaps this is described in references given in the paper, but for self-sufficiency, it is
better to add a sentence on this aspect.
Thank you for the comment. We’ve added an explanation of this in page 18,
which we further explain below:
These dependencies are taken care of during Bayesian inferencing. The first step in the Bayesian
inference scheme is moralization, where an undirected edge is added between each pair of
disconnected vertices with common children and, when this has been completed, all
directed edges are replaced by undirected ones. The resulting graph structure is called the moral
graph. This step ensures that every parent-child set is a complete subgraph. In the example of
figure 4, nodes x4 and x5 are connected by an undirected edge during moralization, thereby
preserving their dependencies.
We have explained this in detail in Section V A under Bayesian inference schemes.
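The moralization step described above can be sketched in a few lines. The function below is a generic textbook version, not the inference engine used for the reported results; the dictionary layout and the small example graph (with a hypothetical common ancestor `x1`) are assumptions for illustration.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edge set of the moral graph.

    `parents` maps each node to the list of its parents in the DAG.
    Every parent-child edge is kept (without direction), and every pair
    of parents sharing a child is "married" by an extra undirected edge.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))      # drop edge direction
        for u, v in combinations(pars, 2):
            edges.add(frozenset((u, v)))          # connect co-parents
    return edges


# Hypothetical fragment: x4 and x5 share the child x6, as in the
# reviewer's question, and are both driven by x1.
dag = {"x4": ["x1"], "x5": ["x1"], "x6": ["x4", "x5"]}
moral_edges = moralize(dag)
```

In the resulting edge set, `x4` and `x5` are directly connected, so the set {x4, x5, x6} forms a complete subgraph, as the response states.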
Response to Reviewer 2’s comments:
My detailed comments are grouped into several categories listed below:
1. Methodology and problem formulation:
• doesn’t include electrical masking, which can also affect susceptibility of nodes - equation
3, not really the best way to represent latching-window masking; it should depend on setup
and hold time and duration of the glitch, too; - this equation doesn’t cover all cases when
a glitch is not masked (very inaccurate)
We agree with the reviewer that our model doesn’t cover the electrical masking effect. We would
like to point out that our focus here is to model the effect of logical masking, gate delays,
circuit re-convergence, the temporal nature of SEUs and input pattern. Modeling electrical
masking needs circuit-level simulation or estimation techniques, which we are currently
trying to integrate with our approach as a future direction. There are works on
soft error rate analysis that model electrical masking (example: Dhillon et al. [10]). Our
model can be fused with any such model to get the exact SEU sensitivity values.
We also agree with the reviewer that the empirical equation 3 doesn’t consider latching-window
masking effects, which depend on set-up and hold time requirements. However,
this equation is good enough for modeling the logical masking effect and other timing
factors such as the temporal nature of SEUs and gate delays. The probabilities estimated with
our TALI model, when multiplied with other factors such as (1) particle bombardment rate,
(2) probability of occurrence of an SEU when a particle hit occurs at a node, (3) electrical
masking factor and (4) latching probability, will give the overall soft error susceptibility of
individual gates in a circuit.
• a lot of inconsistency in notation, same things defined more than once in a different way
(e.g. $T_{j \to i}$, pages 2 and 8; also, sometimes they use $P(T_{j \to i} \mid SEU)$, sometimes $P(T_{j \to i})$)
Thank you for pointing out the inconsistency. We are now using the term $P(T_{j \to i})$ as the
conditional probability of getting an erroneous signal at the $i$th output conditional to an SEU
at node $j$. The conditional aspect is implemented by using an extra node in the TALI-SES
model to inject an erroneous signal (SEU) at node $j$.
• assumption about gate delay - page 3, they assume gate delay is proportional to fan-out
- page 10, the authors assume is equal to fan-out; in any case, it is pretty inaccurate, it will
depend on fan-in and parasitic delay also
In this work, we assumed that gate delay is equal to fan-out. We’ve corrected it in page
3. Our model can handle any delay model. We can plug in an average delay value for
each gate type. We incorporate a logical-effort-based delay model, which takes care of input
capacitance, parasitic capacitance as well as fanout. We included two additional sections:
(1) Section IVB, which explains modeling using the logical-effort-based delay model, and
(2) Section VIC, giving estimated results with this delay model. Thus we show that, given an
appropriate delay library of gates in a circuit, our model is capable of estimating SEU
sensitivities of individual gates in the circuit.
We have explained logical-effort-based delay modeling under the response to reviewer 1’s
comments.
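To make the logical-effort delay concrete, the sketch below evaluates $d = gh + p$ for the inverter and n-input NAND entries of Table VI, taking the electrical effort $h$ to be the fanout and assuming $P_{inv} = 1$. The function name and gate-type strings are hypothetical; the NOR and XOR entries of Table VI could be added in the same way.

```python
from fractions import Fraction


def gate_delay(gate_type, fanout, n_inputs=1, p_inv=1):
    """Logical-effort delay d = g*h + p, with h taken as the fanout.

    Only the inverter and n-input NAND rows of Table VI are covered;
    p_inv = 1 is an assumed normalization of the inverter parasitic delay.
    """
    if gate_type == "inv":
        g, p = Fraction(1), p_inv
    elif gate_type == "nand":
        g, p = Fraction(n_inputs + 2, 3), n_inputs * p_inv
    else:
        raise ValueError(f"unsupported gate type: {gate_type}")
    return g * fanout + p


d_inv = gate_delay("inv", fanout=3)                  # 1*3 + 1
d_nand2 = gate_delay("nand", fanout=2, n_inputs=2)   # (4/3)*2 + 2
```

Exact fractions are used so that the delay values can later be scaled to integer time steps without rounding surprises; that choice is a convenience of this sketch, not something the paper prescribes.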
2. Experimental results:
• Figure 9. - I would say this represents logical masking, but it doesn’t really say much about
the circuit and the soft-error susceptibility of different gates
We agree with the reviewer that the probabilities estimated by our model do not cover
all factors which contribute to the soft error susceptibility. We haven’t modeled the electrical
masking effect, which needs circuit-level simulation. As we mentioned before, the scope of
this work is limited to logic-level modeling, and we model the logical masking effect, circuit
re-convergence, inputs, and timing issues such as gate delays, SEU duration, path delays, etc.
In this work, we estimate the SEU sensitivity of gates in a circuit in terms of the probability
that an SEU generated at an internal node produces an erroneous signal at the latch inputs.
This probability, when multiplied with the particle bombardment rate, electrical masking
probability and the latching probability, will give the actual soft error susceptibility of
individual gates. The existing estimation techniques for SEU sensitization (logical masking)
do not handle circuits with re-convergence. Our estimation technique is accurate and
efficient when compared with simulation. This model can be combined with other latching
and electrical masking models to get the exact soft error susceptibility values.
• Figure (10a) - these three bars are redundant - one bar with two colors will give same
information
We modified the charts by giving the number of SEUs relative to the number of gates.
• Figure 10 - maybe SEU numbers relative to number of gates in the circuit would explain
better the influence of SEU in different circuits, instead of presenting just SEU numbers
Done
• worst-case probabilities average would be better, maybe, min, median, too
We have appended Excel sheets showing exhaustive SEU sensitivity values (average, minimum,
maximum and median values) of individual gates in the benchmark circuits.
• except for the smallest circuit, no results showing probabilities that they claimed they
calculated they just show number of SEUs.
As said before, we have appended Excel sheets showing all probabilities.
• Table III no units in last column
Time is in seconds. We’ve added that.
• fonts in figures are too small
We modified the figure.
3. Missing references:
- Krishnaswamy et al. "Accurate Reliability Evaluation and Enhancement via Probabilistic Transfer Matrices", DATE 2005
- Dhillon et al. "Soft-Error Tolerance Analysis and Optimization of Nanometer Circuits", DATE 2005
- Bahar et al. "A Probabilistic-Based Design Methodology for Nanoscale Computation", ICCAD 2003
Thank you for your suggestions. We added the first two references. However, the third
reference mentioned here is not related to our work.
4. Minor things:
page 2, last paragraph: do you mean "SEU" duration? Also, (4) for inputs should be actually
(5)
It should be SEU duration. We corrected the errors.
page 6, last paragraph, last line: "doping as well as circuit structure on the SER of a product"
doesn’t make any sense
It actually means that the authors propose a methodology to quantify the impact of various
factors (such as supply voltage, transistor size, circuit topology, doping as well as circuit structure)
on the Soft Error Rate (SER) of a chip. We’ve made it clear in page 6.
page 13, 3rd paragraph: "SEUs which satisfy Eq. 3 affect circuit outputs resulting in soft
errors" is misleading since the authors say the opposite when eq. (3) is introduced
We thank you for bringing it to our attention. It should be "SEUs which do not satisfy Eq. 3
affect circuit outputs resulting in soft errors". It is corrected.
Reviewer 3 Comments:
• Author proposed a probabilistic framework to study the single-event-upset in logic circuit.
This model has some nice and interesting features, such as the consideration of the effect
of inputs, gate delays, SEU durations, and the exploration of input space that leads to zero
error. Although most of the technique adopted in this paper is not new, author did a good
job in constructing such a probabilistic framework with above considerations using both
accurate and approximated inference of Bayesian network.
Thank you for appreciating our effort.
• Few problems still exists in this paper. The writing of the paper should be further improved,
especially section 3.
We’ve modified the writing, especially Section III, and made it clearer.
In page 8, the 2nd line. It says "$P(T_{j \to i})$ is the probability that an SEU generated at an
internal node j caused an erroneous signal at output i." It looks like $P(T_{j \to i})$ here is the
conditional probability that a error occurs when an SEU happens. However, in equation 1,
$P(T_{j \to i} \mid SEU_j)$ is used to represent such a conditional probability. It confused me here.
Thank you for pointing out the inconsistency. We are now using the term $P(T_{j \to i})$ as the
conditional probability of getting an erroneous signal at the $i$th output conditional to an SEU
at node $j$. The conditional aspect is implemented by using an extra node in the TALI-SES
model to inject an erroneous signal (SEU) at node $j$.
Another problem is the example time-space transformation of circuit. In Fig 3, for t=6, the
input of 22,6 is 10,5 and 16,5. But, isn’t gate delay equal to its fanout here? Then, the input
of 22,6 should be 10,5 and 16,4. Same thing happens in Figure 4, 6. Maybe I did not get it
correctly, but it is worth some more explanations here. Also, is the input delay also equal
to the fanouts?
The output signal evaluation time of a gate is equal to the sum of the gate delay (in this example,
we take gate delay = fanout) and the latest arrival time of its input signals. Taking the fan-out
of primary output gates (here, gates 22 and 23) as 1, signal (22,6) is derived from signals
(10,5) and (16,5). Ideally, gate delay depends on input capacitance and parasitic capacitance.
We give results for a logical-effort-based delay model and show that TALI-SES can be used
with any type of delay model.
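The evaluation-time rule stated above (gate delay plus the latest input arrival time) can be sketched as follows. The netlist is a hypothetical four-gate example and does not reproduce Fig. 3; primary inputs are assumed to arrive at time 0, the delay of each gate is taken equal to its fanout, and gates that drive no other gate are given fanout 1, as the response does for primary outputs.

```python
def fanouts(fanin):
    """Fanout of each gate: number of gates it drives, with a minimum of 1
    so that primary-output gates get fanout 1."""
    counts = {gate: 0 for gate in fanin}
    for sources in fanin.values():
        for s in sources:
            if s in counts:
                counts[s] += 1
    return {gate: max(c, 1) for gate, c in counts.items()}


def evaluation_times(fanin, delay, input_time=0):
    """Evaluation time of every gate output: its delay plus the latest
    arrival time among its inputs. Signals absent from `fanin` are
    treated as primary inputs arriving at `input_time`."""
    times = {}

    def arrival(signal):
        if signal not in fanin:
            return input_time
        if signal not in times:
            times[signal] = delay[signal] + max(arrival(s) for s in fanin[signal])
        return times[signal]

    for gate in fanin:
        arrival(gate)
    return times


# Hypothetical netlist: gate -> list of input signals.
netlist = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "g4": ["g2", "d"]}
delay = fanouts(netlist)              # gate delay = fanout in this example
times = evaluation_times(netlist, delay)
```

Here `g3` is evaluated at time 3, one unit after its later input `g2` (time 2), while its other input `g1` has been stable since time 1. A gap of this kind between input arrival times is the situation in which the duplicate gates described earlier are introduced.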