
SUBMITTED FOR PUBLICATION TO: , FEBRUARY 14, 2006 1

A Stimulus-free Probabilistic Model for Single-Event-Upset Sensitivity

Abstract

With shrinking device sizes and rapidly rising operating frequencies, the effect of cosmic radiation and alpha particles, known as Single-Event Upsets (SEUs) or Single-Event Transients (SETs), is a growing concern in logic circuits. Accurate understanding and estimation of the Single-Event-Upset sensitivities of individual nodes is necessary to achieve better soft-error hardening techniques at the logic level of design abstraction. We propose a probabilistic framework to study the effect of inputs, circuit structure and gate delays on the Single-Event-Upset sensitivities of nodes in logic circuits as a single joint probability distribution function (pdf). To model the effect of timing, we consider signals at their possible arrival times as the random variables of interest. The underlying joint probability distribution function consists of two components: ideal random variables without the effect of SEUs, and the random variables affected by the SEU. We represent the joint pdf by a Bayesian Network, a minimal, compact directed graph for efficient probabilistic modeling of uncertainty. The attractive feature of this model is that it not only uses conditional independence to arrive at a sparse structure, but also exploits the same property for efficient probabilistic inference. We show that both exact inference (exponential complexity) and approximate, non-simulative, stimulus-free inference (linear in the number of nodes and samples) on benchmark circuits yield accurate estimates in reasonably small computation time.

I. INTRODUCTION

High-energy neutrons present in cosmic radiation and alpha particles from packaging materials give rise to single event upsets (SEUs), resulting in soft errors in logic circuits. When particles hit a semiconductor material, electron-hole pairs are generated, which may be collected by a P-N junction, resulting in a short current pulse that causes a logic upset, or Single Event Upset (SEU), in the signal value. An SEU may occur at an internal node of a combinational circuit and propagate to an output latch. When a latch captures the SEU, it may cause a bit flip, which can alter the state of the system, resulting in a soft error. In current technology, soft errors are of serious concern in memories, whereas in logic circuits the soft error rate is comparatively low due to logical, electrical and temporal masking effects. However, as process technology scales below 100 nanometers and operating frequencies increase, these masking barriers diminish due to low supply voltages, shrinking device geometry and small noise margins. This will result in unacceptable soft error failure rates in logic circuits, even for mainstream applications [1].

The soft error susceptibility of a node $j$ with respect to a latch $L$, $SES_{j\_Q_L}$, is the soft error rate at the latch output $Q_L$ contributed by node $j$. The propagation of an SEU, generated by a particle hit at an internal node $j$, to an output $i$, where it causes a bit flip at the output of a latch $L$, is depicted in Fig. 1.

We model the SEU propagation as follows. Let $T_{j\_i}$ be a Boolean variable that takes logic value 1 if an SEU at node $j$ causes an error at output node $i$. Then $P(T_{j\_i})$ (measured as the probability of $T_{j\_i}$ being equal to 1) is the conditional probability of occurrence of an error at output node $i$ given an SEU at node $j$. Let $P(SEU_j)$ be the probability that a particle hit at node $j$ generates an SEU of sufficient strength, and let $P(Q_L \mid T_{j\_i})$ be the probability that an error at output node $i$ causes an erroneous signal at latch output $Q_L$. Mathematically, $SES_{j\_Q_L}$ is expressed by Eq. 1.

$$SES_{j\_Q_L} = R_H \, P(SEU_j) \, P(T_{j\_i}) \, P(Q_L \mid T_{j\_i}) \qquad (1)$$

where $R_H$ is the particle hit rate on a chip, which is fairly uniform in space and time. $P(SEU_j)$ depends on $V_{dd}$, $V_{th}$ and also on temperature. $P(Q_L \mid T_{j\_i})$ is a function of latch characteristics and the switching frequency.

In this work, we explore $P(T_{j\_i})$ by accurately considering the effect of (1) SEU duration, (2) gate delay, (3) timing, (4) re-convergence in the circuit structure and, most importantly, (5) inputs. Several works on soft error analysis estimate the overall output signal errors due to SEUs at the internal nodes [8], [9], [13], [15], [16]. Note that our focus is on identifying the SEU locations that cause soft errors at the output(s) with high probability, and not on the overall soft error rate. Knowledge of the relative contribution of individual nodes to output error will help designers apply selective radiation hardening techniques. This model can easily be fused with models of the latches [15], [17] that consider parameters such as latching window, setup and hold time, $V_{th}$ and $V_{dd}$ [8], [9], [15], to obtain a comprehensive model capturing process, electrical and logical effects.

Fig. 1. SEU Propagation. (A particle hit at node $j$, at rate $R_H$, generates an SEU of width $\Delta_j$ with probability $P(SEU_j)$; the SEU propagates through the combinational logic, whose gates have delays $d$, to the $i$th output with probability $P(T_{j\_i})$ and causes a bit flip at latch output $Q_L$ with probability $P(Q_L \mid T_{j\_i})$.)

We model the internal dependency of the signals, taking timing issues into consideration, so that the SEU sensitization probability $P(T_{j\_i})$ captures the effect of circuit structure, circuit path delay and the input space. We use a circuit expansion algorithm similar to that presented in [5], [21] to embed time-related information in the circuit topology without affecting its original functionality. A fan-out dependent delay model is assumed, where the gate delay of each node is equal to its fan-out. We also use a logical effort based delay model, where gate delays depend not only on fan-out but also on input capacitance and parasitic capacitance. Due to the temporal nature of SEUs, not all SEUs cause soft errors. From the expanded circuit, we generate a list of SEUs (the possible SEU list) that are possibly sensitized to the circuit outputs at the time frame when output signals are latched. From the expanded circuit and the possible SEU list, we construct an error detection circuit and model SEUs in large combinational circuits using a Timing-Aware Logic-Induced Soft Error Sensitivity model (TALI-SES), which is a complete joint probability model, represented as a Bayesian Network.

Fig. 2. Recent Works on SEU Modeling. (The figure groups prior work by the term of Eq. 1 it addresses: $R_H$, $P(SEU_j)$, $P(T_{j\_i})$ with simulative and probabilistic approaches, $P(Q_L \mid T_{j\_i})$, and other. Works shown include [1], [2], [4], [5], [8], [9], [10], [11], [12], [13], [15], [16], [18], [23] and this work.)

Bayesian Networks are causal graphical probabilistic models representing the joint probability function over a set of random variables. A Bayesian Network is a directed acyclic graph (DAG) whose nodes describe random variables and whose node-to-node arcs denote direct causal dependencies. A directed link captures the direct cause-and-effect relationship between two random variables. Each node is quantified by the conditional probability of the states of that node given the states of its parents, or its direct causes. The attractive feature of this graphical representation of the joint probability distribution is that it not only makes conditional dependency relationships among the nodes explicit, but also serves as a computational mechanism for efficient probabilistic updating. Bayesian networks have traditionally been used in medical diagnosis, artificial intelligence and image analysis, and, in VLSI, specifically in switching models [3] and single stuck-at-fault/error models [6], but their use in timing-aware modeling of Single-Event Upsets is new. We first explore an exact inference scheme, also known as the clustering technique [19], where the original DAG is transformed into a special tree of cliques such that message passing between cliques updates the overall probability of the system. We then explore a stochastic inference strategy, named Probabilistic Logic Sampling (PLS) [24], where full instantiations of the probabilistic network are collected based on a simplified importance function. The sampling is stopped when the probabilities of the nodes converge. It is worth pointing out that, unlike simulative approaches that sample the inputs, importance-sampling-based procedures generate instantiations for the whole network, not just for the inputs. These samples can be looked upon as Markov Chain sampling of the circuit state space.
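The idea of instantiating the whole network, rather than only the inputs, can be sketched in a few lines. The following is a minimal illustration of forward (logic) sampling, not the authors' implementation: the six-node network, its NAND-gate conditional probabilities, the uniform input priors, the fixed sample count (used in place of a convergence test) and all names are assumptions made for this example.

```python
import random

# Hypothetical network (assumption): X1..X3 are inputs, X4..X6 are NAND gates.
PARENTS = {"X1": (), "X2": (), "X3": (),
           "X4": ("X1", "X2"), "X5": ("X2", "X3"), "X6": ("X4", "X5")}
ORDER = ["X1", "X2", "X3", "X4", "X5", "X6"]  # parents always precede children

def p_one(node, sample):
    """P(node = 1 | sampled parents): uniform prior for roots, NAND otherwise."""
    if not PARENTS[node]:
        return 0.5
    a, b = (sample[p] for p in PARENTS[node])
    return 0.0 if (a and b) else 1.0

def logic_sampling(n_samples, seed=0):
    """Draw full instantiations of the network and average them per node."""
    rng = random.Random(seed)
    counts = {node: 0 for node in ORDER}
    for _ in range(n_samples):
        sample = {}
        for node in ORDER:  # every node is instantiated, not just the inputs
            sample[node] = 1 if rng.random() < p_one(node, sample) else 0
            counts[node] += sample[node]
    return {node: c / n_samples for node, c in counts.items()}

estimates = logic_sampling(20000)
```

For this toy network the exact values are $P(X_4 = 1) = 0.75$ and $P(X_6 = 1) = 0.375$, so the estimates can be checked against them; the cost grows linearly with the number of nodes and the number of samples.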

The remainder of this paper is organized as follows. Section II summarizes prior work on soft error modeling and analysis. In Section III, we give an outline of our modeling. We discuss our model in detail in Section IV, explaining the timing issues, the features of Bayesian network-based modeling, and the proposed TALI-SES model, which can be used to estimate the SEU sensitivities of individual nodes. This is followed by Section V on Bayesian inference, where we discuss both exact and approximate (stochastic) inference schemes. In Section VI we give experimental results using both exact and stochastic inference. Using exact inference, we can characterize the input space to achieve zero output error even in the presence of some of the SEUs. Exact inference works well for small circuits. To handle larger circuits we use a stochastic inference scheme; comparing our results with logic simulation results, we find that our modeling is accurate (close-to-zero error) and efficient.

II. BACKGROUND

Figure 2 lists recent work on soft error analysis. The estimation method for soft error failure rates resulting from Single Event Upsets proposed in [1] computes the soft error susceptibility of a node based on the rate at which a Single Event Upset (SEU) occurs at the node ($R_{SEU}$), the probability that it is sensitized to an output ($P_{sensitized}$) and the probability that it is captured by a latch ($P_{latched}$). In [2], Alexandrescu et al. present an SET fault simulation technique to evaluate the soft error probability caused by transient pulses. A model that captures the effects of technology trends on Soft Error failure Rates (SER), considering different types of masking phenomena such as electrical masking, latching-window masking and logical masking, is presented in [4]. Another model, which analyzes Single Event Upsets with zero-delay logic simulation and is accurate and faster than timing simulators, is presented in [5]. As discussed in the previous section, this model uses a circuit expansion algorithm to incorporate gate delays and a fault list generation algorithm to obtain a reduced list of SETs. All of the above methods use simulation techniques, which are highly input-pattern dependent.

Zhao et al. propose a methodology to evaluate the softness, or vulnerability, of nodes in a circuit due to compound noise effects by considering the effects of electrical, logical and timing masking [13]. They use a probabilistic approach to estimate the effect of logical masking. However, this method cannot be used for circuits with re-convergent paths and will not be able to handle larger circuits. Also, this method does not capture the effect of gate delays.

A selective triple modular redundancy (STMR) technique for achieving radiation tolerance in FPGA designs is discussed in [12]. A mathematical model to estimate the possible propagation of glitches due to transient faults has been presented in [23]. However, its authors show results only for a very small circuit.

Karnik et al. suggest that soft error rate should be considered as a design parameter along with power, performance and area, due to its increasing impact on circuits and systems with the scaling of process technology [8]. They propose a methodology to quantify the impact of supply voltage, transistor size, circuit topology, doping and circuit structure on the Soft Error Rate (SER) of a chip. The effect of threshold voltage on the SER of memories and combinational logic has been studied in [9]. Zhang et al. [15] proposed a composite soft error rate analysis method (SERA) to capture the effect of supply voltage, clock period, latching window, logic depth, circuit topology and input vector on soft error rate. However, they resort to logic and circuit level simulation to capture these probabilities. This method uses a conditional probability based parameter extraction technique obtained from device and logic simulation. In their work, combinational circuits are assumed to have unbalanced re-convergent paths. However, other design considerations usually drive optimal circuit design to have balanced paths by adding buffers wherever re-convergence is necessary. For circuits with balanced paths, soft error analysis based on the approximations given in [15] might not be the best choice.

Seifert et al. discuss the importance of latch design for the soft error rate (SER) of core logic [16]. They also analyze the impact of technology scaling on SER at the device, circuit and chip level. The relation between soft error rate and technology feature size, based on device level simulation, has been studied in [18].

Since all the state-of-the-art techniques have resorted to simulation for logical and device level effects (known to be expensive and pattern-sensitive, especially for low probability events), we felt the need to explore the input data-driven uncertainty in a comprehensive manner through a probabilistic model that captures the effect of primary inputs, the effect of gate delay and the effect of SEU duration on logical masking. There is future scope for models of this kind to be fused with other models [8], [9], [15], [16], [18] that capture device effects such as electrical masking, threshold voltage and supply voltage.

III. OVERVIEW

We model the effect of single event upsets produced at an internal node of a circuit on the circuit output by computing the joint probability distribution described by Eq. 2.

$$P(T_{j\_i}) = \sum_{\{I_l\},\, X_k,\, k \neq j} P(T_{j\_i}, I_0, \ldots, I_l, \ldots, I_N, X_1, \ldots, X_k, \ldots, X_M) \qquad (2)$$

where $P(T_{j\_i})$ is the probability that an SEU generated at an internal node $j$ causes an erroneous signal at output $i$. $T_{j\_i}$ is a test signal which compares the ideal output signal at the $i$th output with the corresponding error-sensitized output caused by an SEU at the $j$th node. $T_{j\_i} = 1$ indicates the occurrence of an error at output $i$ due to an SEU at $j$. $P(T_{j\_i})$ depends on the $N$ input signals $I_0, \ldots, I_N$, the $M$ internal signals $X_1, \ldots, X_M$, and the type of SEU at $j$ (SEU1 caused by a 0-1-0 transition or SEU0 caused by a 1-0-1 transition). Ideally, the real effect of an SEU at the $i$th output is the product of $P(T_{j\_i})$ and $P(SEU_j)$, where $P(SEU_j)$ is the probability that a particle hit at node $j$ produces an SEU at that node; it depends on process parameters such as $V_{dd}$ and $V_{th}$ and, dynamically, on temperature. With reduced supply voltages and diminishing dimensions, this probability will be very close to one. In this work, we assume that a particle hit occurring at a node generates an SEU and hence $P(SEU_j) = 1$. In Eq. 2, the probability $P(T_{j\_i})$ does not consider the transient nature of the SEU. For example, the SEU effect may reach the output for a short time span, but the output signal can be restored to its correct value before it is sampled by the latch. SEU propagation depends on the gate delays and the SEU duration. Let $t_h$ be the time when an SEU originates at a node, $\delta$ the SEU duration, $t_s$ the time when outputs are sampled and $\Pi$ the set of propagation delays ($t_d$) of sensitized paths from the node to the circuit outputs. Nodes satisfying the following condition do not cause soft errors [5]:

$$t_h + \delta + t_d < t_s \quad \forall\, t_d \in \Pi. \qquad (3)$$

Even though the above empirical formula does not take into account the setup and hold time requirements that affect latching-window masking, we use it in our modeling because it is accurate as far as the logical masking effect, circuit structure and gate delays are concerned.
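The condition in Eq. 3 is easy to state as a predicate. The sketch below is only an illustration; the function name and the example numbers are assumptions. The numbers borrow the c17 setting used later in the paper (sampling time 6, SEU duration of one time unit), and the path delay of one time unit for the last gate is assumed here.

```python
def is_temporally_masked(t_h, delta, path_delays, t_s):
    """Eq. 3: the SEU dies out before the sampling time t_s on every sensitized path."""
    return all(t_h + delta + t_d < t_s for t_d in path_delays)

# Assumed example values: an SEU of duration 1 one gate (delay 1) away from an
# output that is sampled at t_s = 6.
late_seu = is_temporally_masked(t_h=4, delta=1, path_delays=[1], t_s=6)   # 4 + 1 + 1 = 6, not < 6
early_seu = is_temporally_masked(t_h=2, delta=1, path_delays=[1], t_s=6)  # 2 + 1 + 1 = 4 < 6
```

Under these assumed values the late SEU is not masked and may cause a soft error, while the early one is masked. A node with no sensitized path (empty `path_delays`) is reported as masked.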

To capture the effect of gate delays and SEU duration, we perform a time-space transformation of the original circuit by means of a circuit expansion algorithm similar to that presented in [5]. Our model captures not only the effect of gate delays, but also the effect of differences in path delays (arrival times) between the input signals of gates, assuming that the delay of a gate is equal to its fan-out. In the expanded circuit, each gate is replicated several times, corresponding to the time instants at which the gate output is evaluated. The circuit outputs are also replicated.

Thus each of the random variables in Eq. 2 represents a set of variables at different time frames. $I_i = \{I_{i,0}, I_{i,1}\}$, where $I_{i,0}$ is the value of input signal $I_i$ just before the clock cycle begins and $I_{i,1}$ is the new input signal after the clock pulse is applied. Signal $I_{i,1}$ remains the same throughout the clock cycle. $X_i = \{X_{i,t_k}\}\ \forall\, t_k$, where $t_k$ is a signal evaluation time.

Only the final output values, i.e. the output signals arriving at the latching window, are captured by the latch. An SEU is effect-less if it does not cause a bit flip in the final outputs. We arrive at a reduced SEU list by considering only those SEUs whose effects reach the final outputs, i.e. the outputs at the sampling time frames. We also modify the expanded circuit by removing the parts of the circuit that do not generate or propagate soft-error-causing SEUs (discussed in Section IV A).

TALI-SES is a Directed Acyclic Graph that we build from the expanded circuit and the reduced SEU list to capture the effect of each SEU at a node on the output. This model consists of the ideal time-space transformed circuit without any SEUs and a set of duplicate logic blocks that propagate the SEU effects. Outputs from the SEU-sensitized duplicate blocks are compared with the corresponding outputs of the ideal circuit. If those signal values are not the same, the SEU causes an error at the output. We discuss TALI-SES construction in Section IV D. The salient features of modeling SEUs by a Bayesian Network are as follows.

1. We provide a comprehensive model for the underlying error framework using a graphical probabilistic Bayesian Network based model, TALI-SES, that is causal, minimal and exact.

2. We can model the effect of timing and the transient nature of SEUs, along with accurate modeling of re-convergence in the circuit.

3. This model captures the data-driven uncertainty in the modeling of soft errors. It can be used where exact input patterns are not known a priori and, where data traces are available, the probabilistic model can be built from them by a learning algorithm [31], [32].

4. We infer error probabilities by (1) exact inference, which transforms the graph into a special junction tree structure and relies on a local message passing scheme, and by (2) smart stochastic non-simulative inference algorithms, which provide any-time estimates and an excellent accuracy-time trade-off for larger circuits.

5. Bayesian Networks are a unique tool in that the effect of an observation at a child node can be used to obtain a probability space of the parents. This is called backward reasoning. Our model can be used to generate the input space for which an SEU occurring at a particular node $j$ has no impact on the outputs. Note that in such a case, hardening techniques will not be needed for node $j$. Similarly, we can find the input space for which an SEU at node $j$ causes a high error probability at the outputs. If the data trace is similar to the second type of input space, extensive hardening techniques need to be applied to $j$.
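The comparison of an ideal circuit with an SEU-sensitized copy, and the backward question of which inputs leave the output unaffected, can be illustrated on benchmark c17 by brute force. This sketch is not the Bayesian inference used in this work: it enumerates all input vectors, ignores timing (zero delay), and assumes uniform, independent inputs. The function names are made up for this example; the netlist is the standard ISCAS-85 c17.

```python
from itertools import product

# c17 (ISCAS-85): every gate is a 2-input NAND; gate id -> input signal ids.
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
INPUTS = (1, 2, 3, 6, 7)

def simulate(inputs, flipped=None):
    """Zero-delay evaluation; `flipped` is a node whose value an SEU inverts."""
    v = dict(inputs)
    for gate in sorted(C17):  # gate ids are already in topological order
        a, b = C17[gate]
        v[gate] = 1 - (v[a] & v[b])
        if gate == flipped:
            v[gate] ^= 1
    return v

def sensitization(node, output):
    """Return P(T = 1) under uniform inputs and the input vectors with T = 0."""
    vectors = [dict(zip(INPUTS, bits)) for bits in product((0, 1), repeat=len(INPUTS))]
    # T compares the ideal output with the SEU-sensitized output (an XOR).
    safe = [x for x in vectors
            if simulate(x)[output] == simulate(x, flipped=node)[output]]
    return 1 - len(safe) / len(vectors), safe

p_16_22, safe_16_22 = sensitization(16, 22)
p_16_23, safe_16_23 = sensitization(16, 23)
```

In this zero-delay setting an upset at node 16 reaches output 22 only when node 10 is at logic 1, so the "safe" input space is exactly the set of vectors with inputs 1 and 3 both at 1. The enumeration is exponential in the number of inputs, which is why it only serves as an illustration for a circuit this small.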

IV. THE PROPOSED MODEL

In this section, we first focus on the timing-aware aspect of our probabilistic model, followed by the construction of the fault list. We conclude the section with a discussion of the model itself, given the timing-aware graph and the fault list.

A. Timing Issues

We first expand the circuit by a time-space transformation of the original circuit, without changing its functionality. The approach is similar to the method discussed in [5], [21]. Fig. 3 shows the expanded circuit of benchmark c17. A gate in the original circuit $C$ has many replicate gates in the expanded circuit $C'$, corresponding to the different time frames at which the gate output is evaluated. The set of output evaluation times $\{T\}$ of each gate in the circuit is calculated based on a variable delay model. We assume that the delay associated with a gate is equal to its fan-out. For each gate $g$ whose output is evaluated at time $t \in \{T\}$, a replicate node $(g,t)$ is constructed. In addition to these replicate gates, we insert some duplicate gates (shown by filled gate symbols in Fig. 3). We explain the reasons for adding these duplicate gates later in this section.

Fig. 3. Time-space transformed circuit of benchmark c17, modeling all SEUs. (Replicate nodes $(g,t)$ are arranged in time frames $t = 0$ to $t = 6$.)

Fig. 4. Modified time-space transformed circuit of benchmark c17, modeling only the possibly sensitized SEUs. (Time frames $t = 0$ to $t = 6$.)

The inputs of $(g,t)$ are the replicate nodes of the gates that are the inputs of $g$ in the original circuit and belong to time frames $t' < t$. We denote the value of signal $i$ at time $t$ by $(i,t)$. The random variable that represents the value of signal $i$ at time $t$ is denoted by $X_{i,t}$. The circuit outputs reach the steady-state values $X_{22,0}$ and $X_{23,0}$ at $t = 0$, after the application of the previous inputs $\{X_{1,0}, X_{2,0}, X_{3,0}, X_{6,0}, X_{7,0}\}$. Let the new inputs $\{X_{1,1}, X_{2,1}, X_{3,1}, X_{6,1}, X_{7,1}\}$ be applied at $t = 1$. $X_{10,2}$ is the signal value at the output of gate 10 at time instant 2.

We insert a few duplicate gates (for example $(10,4)$, $(10,5)$, $(19,5)$, etc., shown by filled gate symbols) for the following reasons:

• The input signals of certain gates in the circuit might have different arrival times due to the difference in path delays. In order to model the effect of any SEU generated at the junction of the gates at time instants later than the signal's arrival time, we insert additional duplicate nodes for those internal signals with smaller path delay. For example, in Fig. 3, the input signals to gate 22 have path delays 2 and 5 respectively. The final output signal $(22,6)$ is evaluated with input signals $(16,5)$ and $(10,5)$. If no SEU originates at the output of gate 10 between time instants 2 and 5, $(10,2)$ and $(10,5)$ are the same. However, if an SEU occurs at node 10 at $t = 5$, $(10,2)$ and $(10,5)$ may differ depending on the inputs, which can cause a wrong output signal at $(22,6)$. We model the effect of an SEU at $(10,5)$ by introducing a duplicate gate $(10,5)$ whose inputs are $(1,1)$ and $(3,1)$. Similarly, $(10,3)$, $(10,4)$, $(19,4)$ and $(19,5)$ are other duplicate gates.

• Duplicate gates also model the masking effect of some of the SEUs generated in the signal path of the input having the smaller path delay. For example, duplicate gate $(10,5)$ masks the effect of an SEU originating at the output of gate 10 at time $t = 2$. Thus we can arrive at a reduced SEU list, which is further explained later in this section.

The steps for constructing the timing-aware expanded circuit, based on the fan-out dependent delay model, are the following:

1. Arrange gates in the order of levels, with the level of input gates equal to zero.

2. Include all gates that are present in the original circuit. The output signals of these gates represent the steady-state signal values at $t = 0$, before the application of new inputs.

3. Add additional input nodes representing the new input signal values at $t = 1$.

4. For each level of the circuit, starting from level $l_i = 1$, repeat the following step: for each gate $g$ in level $l_i$, create replicate gates at time frame $t = t_p + f_g$, where $t_p$ is the maximum time frame of the previously inserted parent gates of $g$ and $f_g$ is the fan-out of gate $g$. Update the time frames of gate $g$.
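The time frames produced by these steps can be sketched for c17 as follows. This is a simplified illustration under stated assumptions, not the expansion algorithm of [5], [21]: it records one evaluation time per parent time frame (step 4 as written uses only the maximum, which corresponds to the largest element of each set), it assumes that a primary output drives a single load, and it does not insert duplicate gates. The function names are hypothetical.

```python
# c17 (ISCAS-85): every gate is a 2-input NAND; gate id -> input signal ids.
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
PRIMARY_INPUTS = (1, 2, 3, 6, 7)
PRIMARY_OUTPUTS = (22, 23)

def fanout(gate):
    """Number of gates driven by `gate`; a primary output is assumed to drive one load."""
    n = sum(gate in ins for ins in C17.values())
    return n if n else 1

def evaluation_times():
    """Map each signal to the set of time frames at which it is evaluated."""
    times = {pi: {1} for pi in PRIMARY_INPUTS}  # new inputs are applied at t = 1
    for gate in sorted(C17):                    # gate ids are in level order for c17
        times[gate] = {t + fanout(gate)         # fan-out dependent delay
                       for parent in C17[gate] for t in times[parent]}
    return times

times = evaluation_times()
t_s = max(max(times[o]) for o in PRIMARY_OUTPUTS)  # sampling time of the outputs
```

With these assumptions gate 16 is evaluated at time frames 3 and 5, gate 22 at 3, 4 and 6, and the outputs are sampled at time 6, in agreement with the path delays of 2 and 5 quoted above for the inputs of gate 22.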

The output signals of a circuit are sampled at $t = t_s$, where $t_s$ is the maximum of the latest signal arrival times of the output signals. SEUs which do not satisfy Eq. 3 can affect circuit outputs, resulting in soft errors. These SEUs are the upsets generated at the outputs of gates that are in the fan-in cones of the final outputs (outputs evaluated at time $t_s$). SEUs occurring at certain other gates, which are not in the fan-in cones of the final outputs, may also affect circuit outputs. These nodes arise due to the SEU duration $\delta$. For example, in Fig. 3 the final outputs are generated at time instant $t = 6$. If an SEU occurs at signal 19 at 4 ns and lasts for one time unit, it is capable of tampering with the value of node 23 at 6 ns. Note that we assume that $\delta$ is one time unit. The fault list will be different if we change the value of $\delta$. Thus SEUs which are sensitized to outputs at time frames between $t_s$ and $t_s - \delta$ may cause soft errors, depending on the input signals and circuit structure.

Considering the above factors, we modify the expanded circuit by including only those gates that propagate SEUs to the outputs between time instants $t_s$ and $t_s - \delta$. Thus we get a considerable reduction in the circuit size. Fig. 4 shows the modified expanded circuit of c17, which models all SEUs possibly sensitized to a final output.

Next, we discuss how to generate the list of possible SEUs affecting the circuit outputs. Not all gates in Fig. 4 are SEU sensitive. As discussed above, a duplicate node introduces an additional delay of at least one time unit. If the delay introduced by a duplicate gate is greater than or equal to $\delta$, the SEU duration, the effect of SEUs originating at any of the gates in the fan-in cone of the duplicate gate is nullified and the correct signal value is restored at the output of the duplicate gate; hence those SEUs are effect-less. Thus we create a reduced list of SEUs by traversing the modified expanded circuit backward from each of the circuit outputs at time instants between $t_s$ and $t_s - \delta$, until a duplicate gate or an input node is reached.

TABLE I
GATE DELAYS BASED ON LOGICAL EFFORT

Gate Type        Delay
Inverter         $fanout + P_{inv}$
n-input NAND     $\frac{n+2}{3} \times fanout + n P_{inv}$
n-input NOR      $\frac{2n+1}{3} \times fanout + P_{inv}$
2-input XOR      $4 \times fanout + 4n P_{inv}$

B. Delay Modeling Based on Logical Effort

We extend this work by using a logical effort based delay model, in which gate delay depends on fan-out, input capacitance and parasitic delay. In this section we explain how gate delays are calculated based on logical effort [30]. The delay of a logic gate can be expressed as the sum of two components, effort delay and parasitic delay. Effort delay is the product of logical effort and electrical effort, where logical effort is defined as the relative ability of a gate topology to deliver current and electrical effort is the ratio of output capacitance to input capacitance. Electrical effort is sometimes called fan-out. Mathematically, gate delay is expressed as $d = f + p = gh + p$, where $f$ is the effort delay, $p$ is the parasitic delay, $g$ is the logical effort and $h$ is the electrical effort. Logical effort is defined to be 1 for an inverter. Hence logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current. It can be estimated by counting capacitance in units of transistor width. Parasitic delay represents the delay of a gate driving no load, and it depends on diffusion capacitance. The parasitic delay of an inverter is $P_{inv} \approx 1$. From the above considerations, we compute basic CMOS gate delays and use these delay values in our model. Table I shows the delay expressions for basic gates.


Circuit expansion is performed in the same way as explained in the previous section. Each gate is replicated several times, corresponding to the time frames at which new gate output signals are evaluated. Here, the gate output evaluation time is based on the delay values calculated as above. This is illustrated in Fig. 5, which shows how benchmark circuit c17 is expanded with the logical effort based gate delay model. The delay of a 2-input NAND gate with a fan-out of one is calculated as 3.33 time units, and that of a 2-input NAND gate with a fan-out of two is 4.67 time units. The final output is evaluated at time $T_s = 13.67$. From this expanded circuit, we arrive at a reduced circuit by traversing backward from the outputs evaluated between $T_s$ and $T_s - \delta$ until a duplicate gate or an input is reached, thereby modeling only the possibly sensitized SEUs.
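The quoted delay values can be reproduced from the NAND row of Table I. The sketch below is an illustration only: it assumes $P_{inv} = 1$, new inputs applied at $t = 1$, a single load on each primary output, and it tracks just the latest arrival time of each signal. The function names are made up for this example.

```python
# c17 (ISCAS-85): every gate is a 2-input NAND; gate id -> input signal ids.
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
PRIMARY_INPUTS = (1, 2, 3, 6, 7)

def nand_delay(n_inputs, fanout, p_inv=1.0):
    """Table I: d = g*h + p with g = (n + 2)/3, h = fanout and p = n * P_inv."""
    return (n_inputs + 2) / 3 * fanout + n_inputs * p_inv

def fanout(gate):
    """Number of gates driven by `gate`; a primary output is assumed to drive one load."""
    n = sum(gate in ins for ins in C17.values())
    return n if n else 1

def latest_arrival():
    """Latest arrival time of every signal under the logical effort delay model."""
    arrival = {pi: 1.0 for pi in PRIMARY_INPUTS}  # new inputs applied at t = 1
    for gate in sorted(C17):                      # gate ids are in level order for c17
        arrival[gate] = (max(arrival[p] for p in C17[gate])
                         + nand_delay(2, fanout(gate)))
    return arrival

arrival = latest_arrival()
T_s = max(arrival[22], arrival[23])
```

Rounded to two decimals, `nand_delay(2, 1)` and `nand_delay(2, 2)` give 3.33 and 4.67, and `T_s` comes out as 13.67, matching the values stated above.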

We report the results obtained with the logical effort based delay model in a separate section (Section VI C).

C. Bayesian Networks

A Bayesian network [25] is a Directed Acyclic Graph (DAG) in which the nodes represent random variables and a set of directed links connects pairs of nodes. The links represent causal dependencies among the variables. Each node except the root nodes has a conditional probability table (CPT); each root node has a prior probability table. The CPT quantifies the effect the parents have on the node. Bayesian networks compute the joint probability distribution over all the variables in the network, based on the conditional probabilities and the observed evidence about a set of nodes.

Fig. 6 illustrates a small Bayesian network. The exact joint probability distribution over the variables in this network is given by Eq. 4.

$$P(x_6, x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4, x_3, x_2, x_1)\, P(x_5 \mid x_4, x_3, x_2, x_1)\, P(x_4 \mid x_3, x_2, x_1)\, P(x_3)\, P(x_2)\, P(x_1). \qquad (4)$$


Fig. 5. Time-space transformed circuit of benchmark c17 with Logical Effort Based Delay Model. (Time frames shown: $t$ = 0.0, 1.0, 4.33, 5.67, 7.66, 9.00, 10.34, 12.33 and 13.67.)

Fig. 6. A small Bayesian network. (Nodes $X_1$ to $X_6$.)

In this BN, the random variable $X_6$ is independent of $X_1$, $X_2$ and $X_3$ given the states of its parent nodes, $X_4$ and $X_5$. This conditional independence can be expressed by Eq. 5.

$$P(x_6 \mid x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4) \qquad (5)$$

Mathematically, this is denoted as $I(X_6, \{X_4, X_5\}, \{X_1, X_2, X_3\})$. In general, in a Bayesian network, given the parents of a node $n$, $n$ and its descendants are independent of all other nodes in the network. Let $U$ be the set of all random variables in a network. Using conditional independencies such as the one in Eq. 5, we can arrive at the minimal factored representation shown in Eq. 6.

$$P(x_6, x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4)\, P(x_5 \mid x_3, x_2)\, P(x_4 \mid x_2, x_1)\, P(x_3)\, P(x_2)\, P(x_1). \qquad (6)$$

In general, if $x_i$ denotes some value of the variable $X_i$ and $pa(x_i)$ denotes some set of values for the parents of $X_i$, the minimal factored representation of the exact joint probability distribution over $m$ random variables can be expressed as in Eq. 7.

$$P(X) = \prod_{k=1}^{m} P(x_k \mid pa(x_k)) \qquad (7)$$
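Eq. 7 can be checked numerically on the network of Eq. 6. In the sketch below the parent sets are read off Eq. 6, while the conditional probability tables are hypothetical: the root nodes are given uniform priors and $X_4$, $X_5$, $X_6$ are treated as deterministic NAND gates, purely for illustration. Marginals are obtained by summing the factored joint over all assignments, which is only feasible for a network this small.

```python
from itertools import product

# Parent sets read off Eq. 6; the CPTs below are assumptions for illustration.
PARENTS = {"X1": (), "X2": (), "X3": (),
           "X4": ("X1", "X2"), "X5": ("X2", "X3"), "X6": ("X4", "X5")}
ORDER = ["X1", "X2", "X3", "X4", "X5", "X6"]

def cpt(node, value, assignment):
    """P(node = value | parents): uniform prior for roots, NAND gate otherwise."""
    if not PARENTS[node]:
        return 0.5
    a, b = (assignment[p] for p in PARENTS[node])
    nand = 0 if (a and b) else 1
    return 1.0 if value == nand else 0.0

def joint(assignment):
    """Eq. 7: product of each node's conditional probability given its parents."""
    p = 1.0
    for node in ORDER:
        p *= cpt(node, assignment[node], assignment)
    return p

def all_assignments():
    for values in product((0, 1), repeat=len(ORDER)):
        yield dict(zip(ORDER, values))

def marginal(node, value):
    """Sum the factored joint over every assignment consistent with node = value."""
    return sum(joint(a) for a in all_assignments() if a[node] == value)

total_mass = sum(joint(a) for a in all_assignments())
p_x6 = marginal("X6", 1)
```

With these assumed CPTs the joint sums to one and $P(X_6 = 1) = 0.375$, since $X_6$ is 1 exactly when $X_2 = 1$ and at least one of $X_1$, $X_3$ is 1.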

Fig. 7. (a) An illustrative SEU sensitivity logic for a subset of c17. (b) Timing-aware-Logic-induced-DAG model of the SEU sensitivity logic in (a).

In Fig. 6, it can be seen that nodes $x_4$ and $x_5$ are dependent since they have a common parent. Even though this dependency is not shown in the initial Bayesian network graph structure, it is taken care of during Bayesian inference by a process known as moralization, where each pair of unconnected nodes having a common child node is connected by an undirected link, making every parent-child set a complete subgraph. We explain this aspect of Bayesian inference in detail in Section V.

D. TALI: Timing-aware-Logic-induced Soft error model

In this section, we first describe the proposed Bayesian network based model, which can be used to estimate the soft error sensitivity of logic blocks. This model captures the dependence of SEU sensitivity on the input pattern, circuit structure and the gate delays. Note that this probabilistic modeling does not require any assumptions on the inputs and can be used with any biased workload patterns. The proposed model, the Timing-Aware-Logic-Induced-Soft-Error-Sensitivity (TALI-SES) model, is a Directed Acyclic Graph (DAG) representing the time-space transformed, SEU-encoded combinational circuit $\langle C', J \rangle$, where $C'$ is the expanded circuit created by time-space transformation as discussed in Section A and $J$ is the set of possible SEUs (also discussed in Section A). The error detection circuit consists of the expanded circuit $C'$, an error sensitization logic for each SEU and a detection unit $T$ consisting of several comparator gates. We explain it with the help of a small example shown in Fig. 7(a), which is the error detection circuit for a small portion of benchmark c17. The error sensitization logic for an SEU at node $j$ consists of the duplicate descendant nodes from $j$. In Fig. 7(a), the block with the dotted square is the sensitization logic for $(16,5)^1_s$ [an SEU1 at node 16 at time $t = 5$]. It consists of nodes $(22,5)_s$ and $(22,6)_s$ descending from node $(16,5)$ of the time-space transformed circuit. For simplicity, we show the modeling of only one SEU in this example. Our model can handle any number of SEUs simultaneously. Each SEU sensitization logic has an additional input to model the SEU, for example the input $SEU^1_{16,5}$. This input signal value is set to logic one in order to model the effect of a 0-1-0 SEU occurring at node 16 at time frame 5.

As discussed previously in Section A, an SEU lasting for a duration $\delta$ can cause an erroneous output if its effect reaches the output at any instant between the sampling time $t_s$ and time frame $t_s - \delta$. In this work we assume $\delta$ to be one. Hence we get error-sensitized outputs at time frame $t_s$ and, for some SEUs, at $t_s - 1$ also, if there exist re-convergent paths between the SEU location and an output. We need to compare the SEU-free output signals evaluated at the sampling time $t_s$ with the corresponding SEU-sensitized output signals arriving at $t_s - 1$ and $t_s$. Hence these signals are sent to a detection unit $T$. The comparators in the detection unit compare the error-sensitized outputs with the corresponding ideal (error-free) outputs and generate test signals. For example, the test signals for an SEU at node $j$ at time $t$ are $T_{(j,t),(i,t_s)}$ and $T_{(j,t),(i,t_s-1)}$. If any of these test signal values is 1, it indicates the occurrence of an error. The probability $P(T_{(j,t),i})$, which is a measure of the effect of SEU $(j,t)_s$ on the output node $i$, is computed as a joint probability, as explained below.

Let $A$ be the event that an SEU at node $j$ causes a bit-flip at output $i$ at time $t_s$ and let $B$ be the event that an SEU at node $j$ causes a bit-flip at output $i$ at time $t_s - 1$. $P(A = 1)$ is the probability of occurrence of an error at time $t_s$. $P(A = 0)$ is the probability that the SEU does not cause an error at $t_s$. $P(B)$ can be explained in a similar way. The error probability due to an SEU at node $j$ at time $t$ w.r.t. output $i$ is the joint probability

$$P(A \cup B) = P(A = 1, B = 0) + P(A = 0, B = 1) + P(A = 1, B = 1) \quad (8)$$

which is expressed as:

$$P(T_{(j,t),i}) = P(T_{(j,t),(i,t_s)} \cup T_{(j,t),(i,t_s-1)}). \quad (9)$$
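As a small worked illustration of Eq. 8, the sketch below sums the three joint terms for two binary test signals. The joint distribution `joint_ab` is hypothetical and only serves to show the arithmetic; in the model these values would come from inference on the Bayesian network.

```python
def error_probability(joint_ab):
    """Eq. 8: P(A or B) from the joint distribution of the two test signals.

    joint_ab maps (a, b), with a, b in {0, 1}, to P(A = a, B = b).
    """
    return joint_ab[(1, 0)] + joint_ab[(0, 1)] + joint_ab[(1, 1)]

# Hypothetical joint distribution of the two test signals (made-up numbers).
joint_ab = {(0, 0): 0.60, (1, 0): 0.15, (0, 1): 0.05, (1, 1): 0.20}
p_err = error_probability(joint_ab)

# Equivalent forms: the complement of "no error in either frame", and
# inclusion-exclusion P(A) + P(B) - P(A, B).
p_a = joint_ab[(1, 0)] + joint_ab[(1, 1)]
p_b = joint_ab[(0, 1)] + joint_ab[(1, 1)]
p_incl_excl = p_a + p_b - joint_ab[(1, 1)]
```

Because the two test signals are generally correlated, the joint entries cannot be replaced by the product of the marginals $P(A)$ and $P(B)$.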

An SEU can have an effect on more than one output. The overall effect of an SEU $(j,t)_s$ on the outputs is computed as $P(T_{(j,t)}) = \max_{\forall i} \{P(T_{(j,t),i})\}$. In the example, the SEU $(16,5)_s$ is sensitized to outputs $(22,6)$ and $(23,6)$. Hence the two test signals for this SEU are $T_{(16,5),(22,6)}$ and $T_{(16,5),(23,6)}$.

An SEU occurring at node $j$ at time $t$, which is either SEU1 or SEU0 (but not both), can cause a bit-flip at the output with probability $P(T^1_{(j,t)})$ or $P(T^0_{(j,t)})$. In order to compute the SEU sensitivity of a node, we take the worst-case probability, which is the maximum of the above two probabilities.

$$P(T_{(j,t)}) = \max\{P(T^1_{(j,t)}), P(T^0_{(j,t)})\}$$

More than one SEU can originate at a node at different time frames. Considering the effect of SEUs at node $j$ at all time frames, we compute the worst-case output error probability due to node $j$ as $P(T_j) = \max_{\forall t} \{P(T_{(j,t)})\}$, which is the maximum probability over all time frames.

These detection probabilities depend on the circuit structure, the inputs, dependencies amongst the inputs, gate delays and the SEU duration. In this work we assume random inputs for experimentation and validation of our model.

We construct the TALI-SES Bayesian Network of the SEU detection circuit from nodes that are random variables representing signal values of the SEU detection circuit. A signal $i$ in the detection circuit is represented by the random variable $X_i$ in the Bayesian Network.

In the TALI-SES DAG structure the parents of each node are its Markov boundary elements. Hence the TALI-SES is a boundary DAG. For the definitions of Markov boundary and boundary DAG, please refer to [25]. Note that TALI-SES is a boundary DAG because of the causal relationship between the inputs and the outputs of a gate that is induced by logic. It has been proven in [25] that if a graph structure is a boundary DAG $D$ of a dependency model $M$, then $D$ is a minimal I-map of $M$. This theorem, along with the definitions of conditional independencies in [25] (we omit the details), specifies the structure of the Bayesian network. Thus the TALI-SES DAG is a minimal I-map and thus a Bayesian network (BN).

Quantification of TALI-SES-BN: TALI-SES-BN consists of nodes that are random variables of the underlying probabilistic model, and edges denote direct dependencies. All the edges are quantified with the conditional probabilities making up the joint probability function over all the random variables. The overall joint probability function that is modeled by a Bayesian network can be expressed as the product of the conditional probabilities. If $X' = \{X'_1, X'_2, \cdots, X'_m\}$ is the node set of the TALI-SES Bayesian Network, then

$$P(X') = \prod_{k=1}^{m} P(x'_k \mid Pa(X'_k)) \quad (10)$$

where $Pa(X'_k)$ is the set of nodes that have directed edges to $X'_k$. A complete specification of the conditional probability of a two-input AND gate output will have $2^3$ entries, with 2 states for each variable. These conditional probability specifications are determined by the gate type. By specifying the appropriate conditional probability we ensure that the spatial dependencies among sets of nodes (not limited to pair-wise dependencies) are effectively modeled.
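The $2^3$-entry table for a two-input AND gate can be written out explicitly. The sketch below is illustrative only: the helper name `gate_cpt` and the `(in1, in2, out)` key layout are assumptions, not part of the paper or of any tool it uses.

```python
from itertools import product

def gate_cpt(gate):
    """Deterministic CPT P(out | in1, in2) for a two-input logic gate.

    Returns a dict keyed by (in1, in2, out) with 2**3 = 8 entries.
    """
    cpt = {}
    for a, b in product((0, 1), repeat=2):
        y = gate(a, b)
        for out in (0, 1):
            cpt[(a, b, out)] = 1.0 if out == y else 0.0
    return cpt

and_cpt = gate_cpt(lambda a, b: a & b)
nand_cpt = gate_cpt(lambda a, b: 1 - (a & b))
```

Each pair of entries sharing the same input values sums to one, so the table is a valid conditional distribution; changing the gate type only changes which entries are 1.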

V. BAYESIAN INFERENCE

We explore two inference schemes for the TALI-SES. The first inference scheme is cluster-based exact inference and the second is based on a stochastic inference algorithm, which is an approximate, non-simulative, scalable anytime method.

A. Junction Tree Based Inference

We demonstrate this inference scheme with a running example shown in Fig. 8. The combinational circuit is shown in Fig. 8a and a subset of the time-transformed circuit is shown in Fig. 8b. The Bayesian Network captures the effect of an SEU of "zero" at node 5 at time instant 2 (denoted by the random variable $X_{5,2S0}$) on the output signal 6 at time instant 3 (denoted by the random variable $X_{6,3}$). Note that the error in output signal $X_{6,3}$ is $T_{6,(5,2)}$, which is an XOR combination of $X_{6,3}$ and $X_{6,3S}$, where $X_{6,3S}$ is the node that captures the effect of the SEU at node 5 at time instant 2. This is the original TALI-SES Bayesian Network that we further process for exact inference.

The first step of the exact inference process is to create an undirected graph structure called the moral graph (denoted by $D^m$) given the Bayesian network DAG structure (denoted here by $D$). The moral graph represents the Markov structure of the underlying joint function [29]. The dependencies that are preserved in the original DAG are also preserved in the moral graph [29]. From a DAG, which is the structure of a Bayesian network, a moral graph is obtained by adding undirected edges between the parents of a common child node and dropping the directions of the links. Fig. 9a shows the undirected moral graph; the dashed edges are added at this stage. This step ensures that every parent-child set is a complete subgraph. The moral graph is undirected and, due to the added links, some of the independencies displayed in the DAG are not graphically visible in the moral graph. The independencies that are lost in the transformation contribute to the increased computational efficiency but do not affect the accuracy [29]. The independencies that are graphically seen in the moral graph are used in the inference process to ensure local message passing.
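The moralization step described above can be sketched in a few lines. The example uses the six-node DAG of Fig. 6 (parent sets as in Eq. 6) rather than the Fig. 8 network, and the function name `moralize` is an illustrative choice.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    parents maps each node to the tuple of its parents.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:                        # drop the directions of DAG links
            edges.add(frozenset((p, child)))
        for p, q in combinations(pars, 2):    # connect parents of a common child
            edges.add(frozenset((p, q)))
    return edges

# DAG of the small network in Fig. 6 (parent sets as in Eq. 6).
dag = {"X1": (), "X2": (), "X3": (),
       "X4": ("X1", "X2"), "X5": ("X2", "X3"), "X6": ("X4", "X5")}
moral = moralize(dag)
original = {frozenset((p, c)) for c, ps in dag.items() for p in ps}
added = moral - original
```

For this DAG the added edges are X1-X2, X2-X3 and X4-X5, after which every parent-child set is a complete subgraph.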

The moral graph is said to be triangulated if it is chordal. An undirected graph $G$ is called chordal or triangulated if every one of its cycles of length greater than or equal to 4 possesses a chord [29]; that is, we add additional links to the moral graph so that cycles longer than 3 nodes are broken into cycles of three nodes. Note that in this particular example, the moral graph is chordal and no additional links are needed. The junction tree is defined as a tree with nodes representing cliques (collections of completely connected nodes) of the chordal graph; between any two cliques in the tree $T$ there is a unique path. A junction tree possesses a property called running intersection, which ensures that if two cliques share a common variable, the variable is present in all the cliques that lie on the unique path between them. Fig. 9b shows the junction tree. Note that every clique in the moral graph is a node in the junction tree (for example, $C1 = [X_{6,3}, X_{5,2}, X_{4,2}]$). Also, C6 and C7 have node $X_{4,2}$ in common, hence $X_{4,2}$ is present in all the cliques, namely C2, C1 and C4, that lie on the unique path between C6 and C7. This property of the junction tree is utilized for probabilistic inference so that local operations between neighboring cliques guarantee global probabilistic consistency.

Fig. 8. (a) A small logic circuit (b) Time-transformed Bayesian Network.

Chordal graphs are essential as they guarantee the existence of at least one junction tree. Hence chordalization is a necessary step. There are many algorithms to obtain a junction tree from a chordal graph; we use the tool HUGIN [19], which uses minimum-fill-in heuristics to obtain a minimal chordal graph and junction tree structure.

Since every child-parent team is present together in one of the cliques, we initialize the clique joint probabilities by the original joint probability of a child-parent team. We then use a message passing scheme to obtain consistent probabilities. Consider the two leaf cliques C7 and C3 of the junction tree in Fig. 9b. Both cliques are initialized based on a child-parent team (C3 by nodes 3, 2 and 5, and C7 by nodes 1, 2 and 4). Similarly, C6, C1 and C5 are initialized. The initial clique probability of clique $C_i$ is termed $\phi_{C_i}$ and is also called the potential of the clique.

Let us now consider two neighboring cliques to understand the key feature of the Bayesian updating scheme. Let two cliques $C_l$ and $C_m$ have probability potentials $\phi_{C_l}$ and $\phi_{C_m}$, respectively. Let $S$ be the set of nodes that separates cliques $C_l$ and $C_m$ (example: $S = \{X_{6,3}, X_{6,3S}\}$ between cliques C4 and C5 in Fig. 9b). The two neighboring cliques have to agree on the probabilities of the node set $S$, which is their separator. To achieve this we first compute the marginal probability of $S$ from the probability potential of clique $C_l$ and then use that to scale the probability potential of $C_m$. The transmission of this scaling factor, which is needed in updating, is referred to as message passing. New evidence is absorbed into the network by passing such local messages. The pattern of the messages is such that the process is multi-threadable and partially parallelizable. Because the junction tree has no cycles, messages along each branch can be treated independently of the others.
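A single message between two neighboring cliques can be sketched as follows. This is a minimal illustration on a hypothetical three-variable chain A -> B -> C with cliques $C_l = \{A, B\}$ and $C_m = \{B, C\}$ and separator $S = \{B\}$; the probabilities are made up, and the update shown (scale $\phi_{C_m}$ by the ratio of new to old separator marginal) is the standard junction-tree form, assumed here rather than taken from the HUGIN tool itself.

```python
from itertools import product

# Hypothetical chain A -> B -> C with binary variables; numbers are made up.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # key: (a, b)
p_c_given_b = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.9}  # key: (b, c)

# Clique potentials initialised from the child-parent families.
phi_l = {(a, b): p_a[a] * p_b_given_a[(a, b)] for a, b in product((0, 1), repeat=2)}
phi_m = dict(p_c_given_b)
phi_s = {0: 1.0, 1: 1.0}          # separator potential, initialised to 1

# Message from C_l to C_m: marginalise phi_l onto S, then rescale phi_m.
new_s = {b: sum(phi_l[(a, b)] for a in (0, 1)) for b in (0, 1)}
phi_m = {(b, c): phi_m[(b, c)] * new_s[b] / phi_s[b] for b, c in phi_m}
phi_s = new_s

# After the update phi_m holds the joint P(B, C); marginalise to get P(C).
p_c = {c: sum(phi_m[(b, c)] for b in (0, 1)) for c in (0, 1)}
```

After this one message both cliques agree on the marginal of the separator B, which is the local consistency the text describes.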

Note that since the junction tree has no cycles and is not directional, we can introduce evidence at any node in any clique and propagate it in any direction. This is in sharp contrast with simulative approaches, where the flow of information always propagates from the inputs to the outputs. Thus, we are able to use it for input space characterization for achieving zero output error due to SEUs. We instantiate a desired observation at an output node (say zero error) and backtrack to the inputs that can create such a situation. If the input trace has a large distance from the characterized input space, we can conclude that zero error is reasonably unlikely. Note that this aspect of probabilistic inference is already used in medical diagnosis but is new in the context of input space modeling for soft errors.

This exact inference is expensive in terms of time and hence, for larger circuits, we explore a stochastic sampling algorithm, namely Probabilistic Logic Sampling (PLS). This algorithm has been proven to converge to the correct probability estimates [24], without the added baggage of high space complexity.

Fig. 9. (a) Chordal graph (b) Junction tree. Cliques in (b): C1 = [X6,3, X4,2, X5,2], C2 = [X4,2, X5,2, X2,1], C3 = [X5,2, X2,1, X3,1], C4 = [X6,3, X4,2, X6,3S], C5 = [X6,3, T6_(5,2), X6,3S], C6 = [X6,3, X4,2, X5,2S0], C7 = [X4,2, X1,1, X2,1]. Separators shown: [X5,2, X2,1] and [X6,3, X6,3S].

B. Probabilistic Logic Sampling (PLS)

Probabilistic logic sampling is the earliest and simplest of the stochastic sampling algorithms proposed for Bayesian Networks [24]. Probabilities are inferred from a complete set of samples or instantiations that are generated for each node in the network according to the local conditional probabilities stored at each node. The advantages of this inference are that: (1) its complexity scales linearly with network size, (2) it is an any-time algorithm, providing an adequate accuracy-time trade-off, and (3) the samples are not based on inputs and the approach is input pattern insensitive. The salient aspects of the algorithm are as follows.

1. Each sampling iteration stochastically instantiates all the nodes, guided by the link structure, to create a network instantiation.

2. At each node $x_k$, generate a random sample of its state based on the conditional probability $P(x_k \mid Pa(x_k))$, where $Pa(x_k)$ represents the states of the parent nodes. This is the local importance sampling function.

3. The probabilities of all the query nodes are estimated by the relative frequencies of the states in the stochastic sampling trace.

4. If the states of some of the nodes are known (evidence), such as in diagnostic backtracking, network instantiations that are incompatible with the evidence set are disregarded.

5. Repeat steps 1, 2, 3 and 4 until the probabilities converge.

The above scheme is efficient for predictive inference, when there is no evidence for any node, but is not efficient for diagnostic reasoning due to the need to generate, but disregard, samples that do not satisfy the given evidence. We adopt the tool GeNie [20] for inference using Probabilistic Logic Sampling.
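The five steps above can be sketched as a short forward-sampling routine with rejection of samples that contradict the evidence. This is an illustrative sketch, not the GeNie implementation: the network structure is that of Fig. 6, and the choice of fair-coin inputs and NAND gates for the internal nodes is an assumption made so that the exact answer is easy to verify by hand.

```python
import random

# Topologically ordered nodes with their parents (structure of Fig. 6 / Eq. 6).
NETWORK = [
    ("x1", ()), ("x2", ()), ("x3", ()),
    ("x4", ("x1", "x2")), ("x5", ("x2", "x3")), ("x6", ("x4", "x5")),
]

def p_one(node, parent_vals):
    """Hypothetical CPT: roots are fair coins, other nodes are NAND gates."""
    if not parent_vals:
        return 0.5
    return 0.0 if all(parent_vals) else 1.0

def logic_sampling(query, n_samples, evidence=None, seed=0):
    """Estimate P(query = 1 | evidence) by forward sampling with rejection."""
    rng = random.Random(seed)
    evidence = evidence or {}
    kept = hits = 0
    for _ in range(n_samples):
        state = {}
        for node, pars in NETWORK:             # steps 1-2: instantiate in order
            p1 = p_one(node, tuple(state[p] for p in pars))
            state[node] = 1 if rng.random() < p1 else 0
        if any(state[k] != v for k, v in evidence.items()):
            continue                           # step 4: discard incompatible samples
        kept += 1
        hits += state[query]                   # step 3: relative frequency
    return hits / kept if kept else float("nan")

estimate = logic_sampling("x6", 20000)
diagnostic = logic_sampling("x2", 5000, evidence={"x6": 1})
```

With these gates, x6 = 1 exactly when x2 = 1 and at least one of x1, x3 is 1, so the exact value of P(x6 = 1) is 0.375 and the estimate should be close to it. The diagnostic query shows the cost noted in the text: samples with x6 = 0 are generated and then thrown away.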

Complexity: The computational complexity of the exact method is exponential in the number of variables in the largest clique. The space complexity of exact inference is $n \cdot 2^{|C_{max}|}$ [3], where $n$ is the number of nodes in the Bayesian Network and $|C_{max}|$ is the number of variables in the largest clique. The time complexity is given by $p \cdot 2^{|C_{max}|}$ [3], where $p$ is the number of cliques.

The time complexity of the stochastic inference scheme is linear in $n$, the number of nodes in the expanded circuit; specifically, it is $O(n\,|N_{SEU}|\,N)$, where $N_{SEU}$ is the number of SEUs and $N$ is the number of samples.

VI. EXPERIMENTAL RESULTS

We demonstrate the modeling of SEUs based on TALI-SES using ISCAS benchmark circuits. The logical relationship between the inputs and the output of a gate determines the conditional probability of a child node, given the states of its parents, in the TALI-DAG.

In Table II we report the total number of gates in the actual circuit (column 2), the total number of gates in the modified expanded circuit (column 3), and the total number of nodes in the resulting TALI-SES (column 4). Column 5 lists the maximum time frames of the circuits.


TABLE II

SIZE OF ORIGINAL AND TIME-EXPANDED ISCAS CIRCUITS FOR FANOUT-DEPENDENT DELAY MODEL

Circuit   Gates   Gates (expanded)   # of nodes (TALI)   Time frames
c432        196          476               1989               55
c499        243          464               1596               30
c880        443          729               2552               51
c1355       587         1440               3388               55
c1908       913         1524              18118               79
c2670      1426         2584               4097               81
c3540      1719         3795              15670               93
c5315      2485         4887              13228               90
c6288      2448        30113              31157              263
c7552      3719        10006              45907               88

We compute the SEU sensitivity $P(T_j)$ of an individual node in a circuit as follows:

1. Compute the output error probability at output node $i$ due to an SEU at node $j$ at time $t$ by taking the joint probabilities as discussed in Section IV D.

$$P(T_{(j,t),i}) = P(T_{(j,t),(i,t_s)} \cup T_{(j,t),(i,t_s-1)}) \quad (11)$$

2. Considering the effect of all SEUs at node $j$ at all possible time frames, compute the probability of occurrence of an error at the $i$th output due to SEUs at node $j$ by Eq. 12.

$$P(T_{j,i}) = \max_{\forall t} \{P(T_{(j,t),i})\} \quad (12)$$

3. Compute the worst-case SEU sensitivity of a node $j$ due to SEU1 and SEU0 over all outputs by Eq. 13.

$$P(T_j) = \max_{\forall i} \{P(T^1_{j,i}), P(T^0_{j,i})\} \quad (13)$$


TABLE III

ESTIMATED P(T_j,i) VALUES OF NODES IN BENCHMARK C17 FROM EXACT INFERENCE

                    SEU1                      SEU0
node j    P(T_j,22)   P(T_j,23)    P(T_j,22)   P(T_j,23)
10         0.2813      0            0.4375      0
11         0.0625      0.2344       0.3125      0.6563
16         0.3125      0.1875       0.4375      0.4375
19         0           0.375        0           0.4375
22         0.4375      0            0.5625      0
23         0           0.4375       0           0.5625
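Step 3 (Eq. 13) can be applied directly to the per-output values reported in Table III. The sketch below copies those values and takes the worst case over both SEU polarities and both outputs; the dictionary layout and function name are illustrative choices.

```python
# P(T_j,i) values copied from Table III (exact inference on c17).
# node -> {"SEU1": (output 22, output 23), "SEU0": (output 22, output 23)}
TABLE_III = {
    10: {"SEU1": (0.2813, 0.0),    "SEU0": (0.4375, 0.0)},
    11: {"SEU1": (0.0625, 0.2344), "SEU0": (0.3125, 0.6563)},
    16: {"SEU1": (0.3125, 0.1875), "SEU0": (0.4375, 0.4375)},
    19: {"SEU1": (0.0, 0.375),     "SEU0": (0.0, 0.4375)},
    22: {"SEU1": (0.4375, 0.0),    "SEU0": (0.5625, 0.0)},
    23: {"SEU1": (0.0, 0.4375),    "SEU0": (0.0, 0.5625)},
}

def node_sensitivity(per_polarity):
    """Eq. 13: worst case over both SEU polarities and all outputs."""
    return max(p for outputs in per_polarity.values() for p in outputs)

sensitivity = {node: node_sensitivity(v) for node, v in TABLE_III.items()}
```

For these values the worst case for every node comes from an SEU0, and node 11 has the largest $P(T_j)$ (0.6563).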

A. Exact Inference

In this section, we explore a small circuit, c17, with exact inference, where we transform the original graph into a junction tree and compute probabilities by local message passing between the neighboring cliques of the junction tree as outlined in Section V A. Note that this inference is proven to be exact [25], [29] (zero estimation error).

Table III tabulates the results of the TALI-SES of benchmark c17 using exact inference. In this table, we report the probabilities of error at output nodes 22 and 23 due to an SEU at each node $j$ (column 1), namely 10, 11, 16, 19, 22 and 23. Columns 2 and 3 of Table III give the error probabilities due to SEU1 (0-1-0 transition) at output nodes 22 and 23, respectively. Similarly, columns 4 and 5 give the error probabilities due to SEU0 (1-0-1 transition) at output nodes 22 and 23, respectively. We compare the error-free outputs at 22 and 23 at sampling time $t_s$ with the corresponding error-sensitized outputs arriving at time frames $t_s - 1$ and $t_s$ due to SEUs generated at a node at all possible time frames (as discussed in Section IV D). Columns 2, 3, 4 and 5 of Table III report the maximum of the error probabilities due to SEUs originating at individual nodes over all time frames. From this table it can be seen that for this benchmark circuit SEU0s have a higher impact on the output error probabilities than SEU1s. The error probability at output node 22 due to an SEU1 at node 11 is very low (0.0625), whereas the error probability at output 22 due to an SEU0 at node 11 is 0.3125. It also shows that the effects of SEUs are not the same over all outputs. For example, an SEU1 at node 19 causes no error at output 22, whereas the error probability due to this SEU at output node 23 is 0.4375. Note that nodes 22 and 23 are the output nodes. SEUs occurring at these nodes at sampling time $t_s$ or time $t_s - 1$ will be latched by an output latch, and are expected to cause a very high error probability. However, from Table III, it is observed that the probability of occurrence of an error due to an SEU1 at node 23 is only 0.4375. Similarly, the probability of occurrence of an error due to an SEU1 at node 22 is also 0.4375. This is due to the type of input pattern. In this work, we assume random inputs. This result shows the dependence of $P(T_{j,i})$ on the input pattern.

A.1 Input Space Characterization

In this section, we describe the input space characterization for a particular observation, exploring the diagnostic (backtracking) feature of the TALI-SES model. This feature is distinctive: instead of predicting the effect of the inputs and of an SEU at a node on the outputs, we try to answer queries like "What input behavior will make an SEU at node j definitely cause a bit-flip at the circuit outputs?" or "What input behavior will be more conducive to no error at the output given that there is an SEU at node j?" Resolving queries like these aids the designer in observing the input space and helps perform input clustering or modeling. Let us take the example of the c17 benchmark. We explore the input space for studying the effect of SEU0 and SEU1 at node 19 on errors at both outputs (22 and 23). One can characterize the input space for any one of the outputs (or, in general, the effect of an SEU at any node on any other subset of nodes). Fig. 10a characterizes the input space for an SEU0 at node 19 such that no bit-flip occurs at the outputs. This is done by setting the output error probability to zero (by giving "evidence" to the detection nodes in the Bayesian Network) and then back-propagating the probabilities. We plot the probabilities of each of the inputs 1, 2, 3, 6 and 7 that give no output error for an SEU0 at 19. Each column in the plot represents an input. The lighter color represents the probability of that input = 0 and the black color represents the probability of input = 1 (the sum of these two parts should always be one). One can see that for obtaining zero output error with an SEU0 at 19, input 1 can be random, inputs 2 and 7 have a 65% probability of being at logic one, and inputs 3 and 6 have a probability of 30% for logic 1. Note that the input space is nearly random (p(1) = p(0) = 0.5) when an SEU1 at node 19 produces zero output error at both outputs. Similar characteristics are shown in Fig. 10c and 10d for characterizing the input space with respect to output errors while SEU0 or SEU1 occurs at node 11. Once again it can be seen that zero output error for SEU1 is more likely under random inputs than for SEU0.

Fig. 10. Input probabilities for achieving zero output errors (at nodes 22 and 23) in the presence of SEUs: (a) SEU0 at node 19 (b) SEU1 at node 19 (c) SEU0 at node 11 (d) SEU1 at node 11, for the c17 benchmark. [Bar charts; x-axis: inputs In 1, In 2, In 3, In 6, In 7; y-axis: probabilities; legend: P(in) = 1, P(in) = 0.]

Fig. 11. (a) SEU list, fanout-dependent delay model (b) SEU sensitivity range, fanout-dependent delay model, with Delta = 1 and input bias = 0.5. [Bar charts over ISCAS benchmarks c432 to c7552; (a) y-axis: no. of gates/SEUs, legend: listed SEUs, gates; (b) y-axis: no. of SEU-sensitive gates, legend: 0.0 < p <= 0.3, 0.3 < p <= 0.6, p > 0.6.]
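The idea behind input space characterization, conditioning on "no output error" and reading back the input probabilities, can be illustrated by brute-force enumeration on a toy circuit. The two-gate circuit below is hypothetical (it is not c17), the SEU is modeled simply as a flip of the internal node, and enumeration stands in for the junction-tree back-propagation used in the paper.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def output(a, b, c, seu):
    """Hypothetical two-gate circuit: n = NAND(a, b), y = NAND(n, c).

    When seu is True the internal node n is flipped before it drives y.
    """
    n = nand(a, b)
    if seu:
        n ^= 1
    return nand(n, c)

def input_posterior_given_no_error():
    """P(input = 1 | test signal = 0) under uniform random inputs."""
    ok = [v for v in product((0, 1), repeat=3)
          if output(*v, seu=False) == output(*v, seu=True)]
    return {name: sum(v[i] for v in ok) / len(ok)
            for i, name in enumerate(("a", "b", "c"))}

posterior = input_posterior_given_no_error()
```

In this toy case the flip at n is masked only when c = 0, so the posterior pins c to zero while a and b stay at 0.5; this mirrors how some inputs in Fig. 10 remain random while others are strongly biased.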

B. Larger Benchmarks

We use approximate inference for larger circuits using Probabilistic Logic Sampling [24], which is pattern-independent random Markov chain sampling and has shown good results in many large industry-size applications.

In Fig. 11(a), we plot the number of gates and the number of possibly sensitized SEUs for the ISCAS benchmarks. This reduced SEU list was created based on the fanout-dependent delay model and assuming an SEU duration $\delta$ equal to one time unit. We get a considerable reduction in the number of listed SEUs compared to the number of gates in a circuit. This is because the reduced SEU list is generated by traversing backward from the final outputs evaluated at sampling time $t_s$ and $t_s - 1$, and only those gates that lie between the final outputs and the duplicate gates need to be considered for SEU sensitivity analysis. Depending on the input pattern and the circuit structure, only a few of these SEUs actually cause soft errors. Based on the estimated SEU sensitivity $P(T_j)$ calculated as in Eq. 13, we classify the SEU-sensitive gates in a circuit into three categories: gates where $P(T_j)$ is (i) less than or equal to 0.3, (ii) between 0.3 and 0.6 and (iii) above 0.6. This is plotted in Fig. 11(b). These results are helpful for applying selective redundancy measures or for modifying $P(SEU_j)$ (by changing device features), giving higher priority to nodes that are in the high sensitivity range than to those in the lower sensitivity ranges. From Fig. 11(b), it can be seen that the SEU-sensitive nodes of circuit c432 are equally distributed within the three probability ranges (i), (ii) and (iii), whereas all the SEU-sensitive nodes in circuit c1355 lie within the middle range, where $P(T_j)$ is between 0.3 and 0.6. Results for c7552 show that $P(T_j)$ of most of the SEU-sensitive nodes is in the lowest range (less than or equal to 0.3), which indicates that gates in this circuit do not require extensive hardening techniques, whereas the majority of SEU-sensitive gates in c2670 require extensive hardening techniques since $P(T_j)$ is very high (above 0.6) for these nodes.
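The three-way classification used for Fig. 11(b) amounts to a simple binning of the estimated $P(T_j)$ values. The sketch below shows one way to do it; the range labels ("low", "middle", "high"), the gate names and the example sensitivities are hypothetical.

```python
def sensitivity_range(p):
    """Map an estimated P(T_j) to one of the three ranges of Fig. 11(b)."""
    if p <= 0.3:
        return "low"       # (i)   P(T_j) <= 0.3
    if p <= 0.6:
        return "middle"    # (ii)  0.3 < P(T_j) <= 0.6
    return "high"          # (iii) P(T_j) > 0.6

def histogram(sensitivities):
    """Count the gates falling into each sensitivity range."""
    counts = {"low": 0, "middle": 0, "high": 0}
    for p in sensitivities.values():
        counts[sensitivity_range(p)] += 1
    return counts

# Hypothetical per-gate estimates, for illustration only.
example = {"g1": 0.12, "g2": 0.30, "g3": 0.45, "g4": 0.61, "g5": 0.95}
counts = histogram(example)
```

Gates landing in the "high" bin would be the first candidates for selective hardening.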

We implemented the SEU simulator based on the work done in [5] with a fanout-dependent delay model for the ground truth. We performed the simulation with 500,000 random vectors, obtained by changing the seed after every 50,000 vectors, to get the ground-truth SEU probabilities. For our probabilistic framework, we use the Probabilistic Logic Sampling (PLS) [24] inference scheme. We compute the SEU sensitivities $P_j$ of gates in ISCAS benchmark circuits using PLS with 9999 samples and compare our results with the ground-truth simulation results. Table IV gives the average estimation error $E_{mean}$ (column 2) and the maximum estimation error $E_{max}$ (column 3). Here $E_{mean}$ of a circuit is the average of the difference between the SEU detection probabilities (or SEU sensitivities) obtained from simulation and the estimated probabilities from PLS over all possible SEU-sensitive nodes in the circuit. Similarly, $E_{max}$ of a circuit is the maximum of the difference between the SEU sensitivities obtained from simulation and the estimated SEU sensitivities from PLS over all possible SEU-sensitive nodes in the circuit. Estimation time $T_{bn}$ (column 4) is the time taken by the PLS scheme for belief propagation. We estimated the SEU sensitivities of all the ISCAS'85 benchmarks with an average belief propagation time of 140.49 sec, whereas the average time taken for logic simulation of these circuits is 33 hours. Estimation error over all benchmarks is below 0.0034, which shows an excellent accuracy-time trade-off. $T_{bn}$ is the total elapsed time, including memory and I/O access. This time is obtained by the ftime command in the WINDOWS environment on a Pentium-4 2.0 GHz PC. It is evident from the results that using a graph-based, causal, compact probabilistic framework, the Bayesian Network, we are able to accurately model the Single-Event-Upset (SEU) sensitivities of logic circuit signals, accounting for temporal and spatial dependencies. The exciting feature of this stimulus-free approach is that it uses conditional independencies in modeling spatial correlations and time-space transformation for capturing temporal dependencies.

TABLE IV

SEU SENSITIVITY ESTIMATION ERRORS AND TIME FOR 9999 SAMPLES

Circuit   E_mean    E_max     T_bn (sec)
c432      0.0031    0.0069      18.57
c499      0.0024    0.0198      13.43
c880      0.0027    0.0090      27.58
c1355     0.0027    0.0120      28.84
c1908     0.0028    0.0120     176.63
c2670     0.0034    0.0130      34.70
c3540     0.0023    0.0101     148.07
c5315     0.0045    0.0112     121.62
c7552     0.0035    0.0100     513.05
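The error metrics $E_{mean}$ and $E_{max}$ defined above can be computed as sketched below. The sketch assumes that "difference" means the absolute per-node difference, which the text does not state explicitly, and the node names and sensitivity values are hypothetical.

```python
def estimation_errors(simulated, estimated):
    """Return (E_mean, E_max) over the nodes present in both dictionaries.

    Assumption: the per-node error is the absolute difference.
    """
    diffs = [abs(simulated[n] - estimated[n]) for n in simulated if n in estimated]
    return sum(diffs) / len(diffs), max(diffs)

# Hypothetical sensitivities for three nodes; not taken from the paper.
sim = {"n1": 0.250, "n2": 0.500, "n3": 0.125}
est = {"n1": 0.254, "n2": 0.497, "n3": 0.125}
e_mean, e_max = estimation_errors(sim, est)
```

In practice `sim` would hold the simulation-based ground truth and `est` the PLS estimates for the same set of SEU-sensitive nodes.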

C. Results with Delay Model based on Logical Effort

In this section we give estimation results from our model with logical-effort-based gate delay modeling. In Table V, we list the number of nodes in the TALI Bayesian network and the estimation time in seconds for some of the ISCAS benchmarks. The number of TALI nodes depends on the SEU list as well as the circuit size, whereas the estimation time directly depends on the number of nodes and the number of samples. We show results for Probabilistic Logic Sampling (PLS) with 9999 samples.

TABLE V

SIZE OF TALI MODEL AND ESTIMATION TIME FOR LOGICAL-EFFORT BASED DELAY MODEL

Circuit   # of nodes (TALI)   Estimation time (s)
c432            2390               22.32
c499            7814               65.75
c880            1097               12.49
c1355           1773               15.092
c1908           2279               22.22
c3540          14370              135.79

Figure 12(a) shows the number of possibly sensitized SEUs vs. the number of gates in ISCAS benchmarks. From this graph, it can be seen that the number of SEUs in the reduced SEU list is low compared to the fanout-dependent delay model. This is due to the high gate delay values with logical-effort-based delay modeling, since we take into account the input capacitance as well as the parasitic delay in addition to fanout. Due to the increased gate delays, the relative effect of an SEU at an internal gate on a primary output during the latching period is smaller, since most of the signals get enough time to restore to their ideal values. Figure 12(b) shows the SEU sensitivity ranges of gates in the circuits, with an input bias of 0.5 and SEU width equal to one time unit. As with fanout-dependent delay modeling, here also we classify the SEU-sensitive gates in a circuit into three categories: gates with estimated sensitivity values (1) less than 0.3, (2) between 0.3 and 0.6 and (3) above 0.6. Given any delay library for a logic circuit, our model can be used to classify the gates in the circuit in the order of their SEU sensitivity values, capturing the logical masking effect, circuit structure, input pattern and SEU duration.

Please note that the above estimated probability values are relatively high when we consider the overall soft error susceptibility of individual gates. To get a comprehensive model, the electrical masking effect, the latching-window masking effect and the SEU generation and propagation characteristics of individual gates are to be incorporated into our model. Modeling the electrical masking effect needs circuit-level simulation techniques, which we are trying to integrate with our current approach as a future direction.


Fig. 12. (a) SEU list, logical effort delay model (b) SEU sensitivity range, logical effort delay model, with Delta = 1 and input bias = 0.5. [Bar charts over benchmarks c432, c499, c880, c1355, c1908 and c3540; (a) y-axis: no. of gates/SEUs, legend: listed SEUs, gates; (b) y-axis: no. of SEU-sensitive gates, legend: 0.0 < p <= 0.3, 0.3 < p <= 0.6, p > 0.6.]

VII. CONCLUSION

We are able to effectively model Single-Event-Upsets in logic circuits (ISCAS benchmarks) to estimate the SEU sensitivity of individual nodes in a circuit, capturing spatial and temporal signal correlations and especially emphasizing the effect of inputs, gate delay, SEU duration and circuit structure. We show results with exact and approximate inference. Using exact inference we characterize the input space that gives zero output error even in the presence of some SEUs. Results from approximate inference show excellent accuracy-time trade-offs. We report SEU sensitivity estimates for the fanout-dependent delay model as well as for the logical-effort-based delay model. Given an appropriate delay library of gates in a circuit, our model is capable of estimating the SEU sensitivities of individual gates in the circuit, and these results can be used for classifying gates for the application of mitigation schemes. Future effort includes modeling with biased input patterns and with different SEU widths $\delta$, to study the effect of these factors on SEU sensitivities. We are also investigating the effect of threshold voltage and supply voltage on the electrical masking of transient pulses caused by particle bombardment.


VIII. LIST OF SYMBOLS

I- $SES_{j \to Q_L}$ : Soft Error Susceptibility of a node j with respect to a latch output $Q_L$.

II- $T_{j \to i}$ : A Boolean variable to identify the occurrence of an error at output i conditional to an SEU at node j.

III- $R_H$ : Particle Hit Rate on a chip.

IV- $P(SEU_j)$ : Probability that a particle hit at node j generates an SEU at that node.

V- $P(Q_L \mid T_{j \to i})$ : Probability that an error at output node i due to an SEU at node j causes an erroneous signal at latch output $Q_L$.

VI- $X_{i,t}$ : Random variable representing the value of signal i at time t.

VII- $P(x_i)$ : Probability that signal $X_i$ takes the value $x_i$.

VIII- $\delta$ : Width of an SEU (SEU duration).

IX- $t_s$ : Sampling time.

X- $f_g$ : Fan-out of a gate g.

XI- SEU0 : An SEU causing a $1 \to 0 \to 1$ transition of a signal.

XII- SEU1 : An SEU causing a $0 \to 1 \to 0$ transition of a signal.

XIII- j; t1s : An SEU1 at node j at timet.

XIV- j; t0s : An SEU0 at node j at timet.

XV- $P(T_{(j,t) \to (i,t_s)})$ : Probability of an SEU at node j at time t causing an error at output i at time $t_s$.

XVI- $P(T_{(j,t) \to (i,t_s-\delta)})$ : Probability of an SEU at node j at time t causing an error at output i at time $t_s - \delta$.

XVII- $P(T_{(j,t) \to i})$ : Probability of an SEU at node j at time t causing an error at output i.

XVIII- $P(T_j)$ : Worst case SEU sensitivity of a node j.


REFERENCES

[1] K. Mohanram and N. A. Touba, "Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits," International Test Conference, pp. 893–901, 2003.

[2] D. Alexandrescu, L. Anghel and M. Nicolaidis, "New Methods for Evaluating the Impact of Single Event Transients in VDSM ICs," Proc. Defect and Fault Tolerance Symposium, pp. 99–107, 2002.

[3] S. Bhanja and N. Ranganathan, "Cascaded Bayesian inferencing for switching activity estimation with correlated inputs," accepted for publication in IEEE Transaction on VLSI, 2004.

[4] P. Shivakumar et al., "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic," Proc. International Conference on Dependable Systems and Networks, pp. 389–398, 2002.

[5] M. Violante, "Accurate Single-Event-Transient Analysis via Zero-Delay Logic Simulation," IEEE Transactions on Nuclear Science, Vol. 50, No. 6, pp. 2113–2118, 2003.

[6] T. Rejimon and S. Bhanja, "An Accurate Probabilistic Model for Error Detection," Proc. IEEE International Conference on VLSI Design, pp. 717–722, Jan. 2005.

[7] M. Sonza Reorda and M. Violante, "Fault List Compaction through Static Timing Analysis for Efficient Fault Injection Experiments," Proc. Defect and Fault Tolerance Symposium, pp. 263–271, 2002.

[8] T. Karnik and P. Hazucha, "Characterization of soft errors caused by single event upsets in CMOS processes," IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 2, pp. 128–143, Apr.-Jun. 2004.

[9] V. Degalahal, R. Rajaram, N. Vijaykrishnan, Y. Xie and M. J. Irwin, "The effect of threshold voltages on soft error rate," 5th International Symposium on Quality Electronic Design, March 2004.

[10] Y. S. Dhillon, A. U. Diril and A. Chatterjee, "Soft-Error Tolerance Analysis and Optimization of nanometer circuits," Proceedings of Design, Automation and Test in Europe, Vol. 1, pp. 288–293, Mar. 2005.

[11] S. Krishnaswamy, G. S. Viamontes, I. L. Markov and J. P. Hayes, "Accurate Reliability Evaluation and Enhancement via Probabilistic Transfer Matrices," Design Automation and Test in Europe (DATE), March 2005.

[12] P. K. Samudrala, J. Ramos and S. Katkoori, "Selective Triple Modular Redundancy (STMR) Based Single-Event-Upset (SEU) Tolerant Synthesis for FPGAs," IEEE Transactions on Nuclear Science, Vol. 51, No. 5, Oct. 2004.

[13] Chong Zhao, Xiaoliang Bai and S. Dey, "A scalable soft spot analysis methodology for compound noise effects in nano-meter circuits," Proceedings of Design Automation Conference, pp. 894–899, Jun. 2004.

[14] T. Rejimon and S. Bhanja, "A Stimulus-Free Probabilistic Model for Single-Event-Upset Sensitivity," Proc. IEEE International Conference on VLSI Design, Jan. 2006.

[15] M. Zhang and N. R. Shanbhag, "A Soft Error Rate Analysis (SERA) Methodology," International Conference on Computer Aided Design, November 2004.

[16] N. Seifert et al., "Impact of Scaling on Soft-Error Rates in Commercial Microprocessors," IEEE Transactions on Nuclear Science, Vol. 49, No. 6, pp. 3100–3106, Dec. 2002.

[17] P. Hazucha et al., "Measurement and Analysis of SER-Tolerant Latch in a 90-nm Dual-$V_T$ CMOS Process," IEEE Transactions on Solid-State Circuits, Vol. 39, No. 9, pp. 1536–1543, Sept. 2004.

[18] P. Hazucha and C. Stevenson, "Impact of CMOS technology scaling on the atmospheric neutron soft error rate," IEEE Transactions on Nuclear Science, Vol. 47, No. 6, pp. 2586–2594, Dec. 2000.

[19] URL http://www.hugin.com

[20] "GeNie", URL http://www.sis.pitt.edu/~genie/genie2

[21] S. Manich and J. Figueras, "Maximizing the weighted switching activity in combinational CMOS circuits under the variable delay model," European Design and Test Conference, pp. 597–602, 1997.

[22] M. Nicolaidis, "Time Redundancy based Soft-Error Tolerance to Rescue nanometer Technologies," VLSI Test Symposium, pp. 86–94, 1999.

[23] M. Omana, G. Papasso, D. Rossi and C. Metra, "A Model for Transient Fault Propagation in Combinatorial Logic," On-Line Testing Symposium, pp. 111–115, 2003.

[24] M. Henrion, "Propagation of uncertainty by probabilistic logic sampling in Bayes' networks," Uncertainty in Artificial Intelligence, 1988.

[25] J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference," Morgan Kaufmann Publishers, Inc., 1988.

[26] P. Robinson, W. Lee, R. Aguero and S. Gabriel, "Anomalies due to single event upsets," Journal of Spacecraft and Rockets, Vol. 31, No. 2, pp. 166–171, Mar.-Apr. 1994.

[27] J. T. Wallmark and S. M. Marcus, "Minimum size and maximum packaging density of non-redundant semiconductor devices," Proceedings of IRE, Vol. 50, pp. 286–298, March 1962.

[28] G. H. Johnson, J. H. Hohl, R. D. Schrimpf and K. F. Galloway, "Simulating Single-event burnout in n-channel power MOSFETs," IEEE Transactions on Electron Devices, Vol. 40, pp. 1001–1008, 1993.

[29] R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D. J. Spiegelhalter, "Probabilistic Networks and Expert Systems," Springer-Verlag New York, Inc., 1999.

[30] I. Sutherland, R. Sproull and D. Harris, "Logical Effort: Designing Fast CMOS Circuits," Morgan Kaufmann, February 1999.

[31] N. Ramalingam and S. Bhanja, "Causal Probabilistic Input Dependency Learning for Switching Model in VLSI Circuits," ACM GLSVLSI, 2005.

[32] R. Marculescu, D. Marculescu and M. Pedram, "Sequence Compaction for Power Estimation: Theory and Practice," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 18, No. 7, pp. 973–993, July 1999.


IX. RESPONSE TO REVIEWERS

Associate Editor’s Comments: This paper requires a major revision. Reviewers 1 and 3 like

the paper but want the presentation to be improved. Reviewer 2 is rather critical and has strong

objections. In summary, all three reviewers find the paper hard to understand, point to unnecessary

or complicated notation, and require improved writing. All comments should be responded to in

their entirety while preparing the revision. A summary of point-by-point responses outlining the

corresponding changes in the manuscript will be useful.

We thank the AE for compiling all the key issues. We have revised our manuscript significantly

based on the reviewers’ comments, addressing all the relevant issues. We first provide a summary

of all the major revisions and then the detailed responses describing the revisions based on each of

the reviewers’ comments.

A. Summary of Revisions

In this subsection, we report all the significant changes made to the original manuscript. We

thank all the reviewers and the AE for helping us re-model, restructure and, most importantly,

re-think and re-visit most of the concepts to improve the quality of this paper.

1. We show that the TALI-SES model can handle any delay model at the logic level and show results

using a logical-effort-based delay model, which depends on gate input capacitance, parasitic

capacitance and fanout. We added two sections:

• Section IVB, on logical-effort-based delay modeling

• Section VIC, on results based on this delay model

2. We modified our algorithm and re-generated all the results based on the reviewers’ suggestions.

Earlier we took the worst-case probability as the maximum of $\{P(T_{(j,t) \to (i,t_s)}),\ P(T_{(j,t) \to (i,t_s-1)})\}$.

In this version, we realize that we need to compute joint probability values for computing

the output error probability at an output node i due to an SEU at node j at time t. This is

explained below:

The effect of an SEU at node j at time t may be propagated to an output i at sampling time

$t_s$ or $t_s - 1$, if there are re-convergent paths between nodes j and i. In either case it can

cause a bit-flip at the output. We took the worst-case probability, which is the maximum

of $\{P(T_{(j,t) \to (i,t_s)}),\ P(T_{(j,t) \to (i,t_s-1)})\}$. However, this might not be true for some SEUs. Ideally,

it should be a joint probability, as explained below. Let A be the event that an SEU at node

j causes a bit-flip at output i at time $t_s$ and let B be the event that an SEU at node j causes

a bit-flip at output i at time $t_s - 1$. $P(A = 1)$ is the probability that an error occurs

at time $t_s$, and $P(A = 0)$ is the probability that the SEU does not cause an error at $t_s$. $P(B)$ can be

explained in a similar way. The SEU sensitivity of node j w.r.t. output i is the joint probability

$$P(A \cup B) = P(A = 1, B = 0) + P(A = 0, B = 1) + P(A = 1, B = 1).$$

We re-ran all benchmark circuits by taking joint probability values $P(T_{(j,t) \to (i,t_s)} \cup T_{(j,t) \to (i,t_s-1)})$, as discussed in Section IV D on page 20, and modified our results (in Table II, Table III and Table IV).

3. We appended Excel sheets as supplemental documents giving exhaustive SEU sensitivity

estimates of individual nodes in benchmark circuits.

4. We modified the bar charts in Fig. 11 (a) and (b) as suggested by the reviewers.

5. Last but not least, we modified the write-up and cleared up some confusion in the notation.

B. Detailed Response

In this subsection, we respond to each of the reviewers separately and also indicate the revisions

in the main manuscript.

Response to Reviewer 1’s comments:

• This is a good paper introducing the use of Bayesian networks for estimating SET probability.

It may improve the state of the art in this domain.

We thank you very much for reviewing this work and for appreciating our effort. We

appreciate your valuable comments and provide appropriate modifications and new results as

you suggested.

• The delay model is simplistic.

We agree with the reviewer that the delay model is simplistic. However, the time-space transformation

and Bayesian network modeling can be done for any type of delay model, without

affecting the model structure. Given an accurate delay library for gates in a logic circuit, our

model can be used to estimate SEU sensitivities of gates in a circuit, capturing logical

masking effect, circuit structure, input pattern and SEU duration. We have added new results

using a logical-effort-based delay model that considers input capacitance, fanout and

parasitic capacitance. We included two additional sections: (1) Section IVB, which explains

modeling using the logical-effort-based delay model, and (2) Section VIC, giving estimated results

with this delay model.

Calculation of gate delays based on logical effort has been explained in several works;

we used reference [30]. Here we summarize it briefly.

Delay of a logic gate can be expressed as the sum of two components, effort delay and

parasitic delay. Effort delay is the product of logical effort and electrical effort, where

logical effort is defined as the relative ability of a gate topology to deliver current and

electrical effort is the ratio of output capacitance to input capacitance. Electrical effort is

sometimes called fanout. Mathematically, gate delay is expressed as $d = f + p = gh + p$,

where $f$ is the effort delay, $p$ is the parasitic delay, $g$ is the logical effort and $h$ is the electrical effort.

Logical effort is defined to be 1 for an inverter. Hence logical effort is the ratio of the input

capacitance of a gate to the input capacitance of an inverter delivering the same output

current. It can be estimated by counting capacitance in units of transistor width. Parasitic

delay represents the delay of a gate driving no load and it depends on diffusion capacitance.

Parasitic delay of an inverter is $P_{inv} \approx 1$. From the above considerations, we compute basic

CMOS gate delays and use these delay values in our model. Table VI shows the delay

expressions for basic gates.

TABLE VI

GATE DELAYS BASED ON LOGICAL EFFORT

Gate Type        Delay
Inverter         $fanout + P_{inv}$
n-input NAND     $\frac{n+2}{3} \cdot fanout + n P_{inv}$
n-input NOR      $\frac{2n+1}{3} \cdot fanout + P_{inv}$
2-input XOR      $4 \cdot fanout + 4n P_{inv}$

• The presentation needs some improvements. At the bottom of page 12 you say that gate

(10,4) is additional. This is not justified from the previous discussion. If you add this gate

because of a transient pulse of a one unit duration, this aspect is discussed in page 14. So,

saying in page 12 that you add gate (10,4) is confusing for the reader. As a matter of fact:

1- Remove from page 12 the above statement.

Done

2- Either remove from figure 3 gate (10,4) and add later a new figure in which you add gate

(10,4) or mention in page 11 that the reasons for adding this gate in figure 3 are given later.

Similar problems with gate (19,4).

Thank you very much for pointing this out and helping us to clear this. We have mentioned

in page 12 that the reasons for adding duplicate gates like (10,4) and (19,4) are explained

later. We explain the reasons in page 13, which we repeat below.

The duplicate gates are introduced due to differences in path delays between the input signals

of a gate. Addition of these gates serves two purposes.

1. Model the effect of any SEUs originating at an input having the lesser path delay during

the period while it waits for the arrival of the other input signal. Example: Duplicate

gate (10,5) captures the effect of an SEU originating at the output of gate 10 at time

t = 5.

2. Model the masking effect of some of the SEUs generated in the signal path of the input

having the lesser path delay. Example: Duplicate gate (10,5) masks the effect of an SEU

originating at the output of gate 10 at time t = 2.

We have explained these points on page 13.

• In page 17 explain why the output error probability due to node j is equal to the maximum

of 0 SEU and 1 SEU and not the sum.

An SEU occurring at node j at time t, which is either SEU0 or SEU1 (not both) depending

on the location of particle bombardment, can cause a bit-flip at the output with probability

$P(T^0_{j,t})$ or $P(T^1_{j,t})$. We take the worst-case probability, which is the maximum of the above

two probabilities. This is explained on page 21.

• Same page, explain why you take the maximum of soft error probability at time ts and at

time ts-d.

We are extremely thankful to the reviewer for pointing this out. We thought about it and

found that a correction is needed here, which we explain below. We appreciate the

reviewer’s effort to understand our work and to give us helpful suggestions.

The effect of an SEU at node j at time t may be propagated to an output i at time $t_s$

or $t_s - 1$, if there are re-convergent paths between nodes j and i. In either case it can

cause a bit-flip at the output. We took the worst-case probability, which is the maximum

of $\{P(T_{(j,t) \to (i,t_s)}),\ P(T_{(j,t) \to (i,t_s-1)})\}$. However, this might not be true for some SEUs. Ideally,

it should be a joint probability, as explained below. Let A be the event that an SEU at node

j causes a bit-flip at output i at time $t_s$ and let B be the event that an SEU at node j causes

a bit-flip at output i at time $t_s - 1$. $P(A = 1)$ is the probability that an error occurs

at time $t_s$, and $P(A = 0)$ is the probability that the SEU does not cause an error at $t_s$. $P(B)$ can be

explained in a similar way. The SEU sensitivity of node j w.r.t. output i is the joint probability

$$P(A \cup B) = P(A = 1, B = 0) + P(A = 0, B = 1) + P(A = 1, B = 1).$$

We modified our results

by taking joint probability values, as described by the above equation.
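
The gap between the earlier maximum and the joint probability $P(A \cup B)$ can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical joint distribution over the two events; the function names and the numbers are illustrative only and do not come from the benchmark results.

```python
def union_probability(joint):
    """P(A or B) from a joint table: joint[(a, b)] = P(A = a, B = b), a, b in {0, 1}."""
    if abs(sum(joint.values()) - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    return joint[(1, 0)] + joint[(0, 1)] + joint[(1, 1)]


def max_of_marginals(joint):
    """The earlier worst-case estimate: max(P(A = 1), P(B = 1))."""
    p_a = joint[(1, 0)] + joint[(1, 1)]
    p_b = joint[(0, 1)] + joint[(1, 1)]
    return max(p_a, p_b)


# Hypothetical joint distribution of errors at t_s (A) and t_s - 1 (B).
joint = {(0, 0): 0.5, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.1}

p_union = union_probability(joint)
p_max = max_of_marginals(joint)
print(p_union, p_max)
```

In this example the maximum of the marginals is smaller than the union, so the earlier estimate would under-report the sensitivity whenever the two error events do not fully overlap.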

We have explained this on pages 20 and 21.

• Same page, what is LIPEM.

We are sorry that we gave the wrong model name. It should be TALI-SES instead of

LIPEM. It is corrected. Thank you for pointing out that mistake.

• Consider the example of figure 4. Probability of X6 uses the probability of parent nodes x4,

x5. These probabilities are dependent. How the model takes into account such dependen-

cies? Perhaps this is described in references given in the paper, but for self-sufficiency, it is

better to add a sentence on this aspect.

We thank the reviewer for this comment. We have added an explanation on page 18,

which we elaborate below:

These dependencies are taken care of during Bayesian inference. The first step in the Bayesian

inference scheme is moralization, where an undirected edge is added between each pair of

disconnected vertices with common children and, when this has been completed, all directed

edges are replaced by undirected ones. The resulting graph structure is called the moral

graph. This step ensures that every parent-child set is a complete subgraph. In the example of

figure 4, nodes x4 and x5 are connected by an undirected edge during moralization, thereby

preserving their dependencies.
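
The moralization step described above can be sketched as follows. This is a minimal illustration, not the inference tool's implementation: the network is given as a map from each node to its parents, and only the relation "X6 has parents X4 and X5" is taken from the reviewer's example; the parents assumed for X4 and X5 are hypothetical.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edge set of the moral graph.

    `parents` maps each node to a list of its parent nodes.
    Edges are represented as frozensets of two node names.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))   # drop the edge direction
        for u, v in combinations(pars, 2):
            edges.add(frozenset((u, v)))       # connect parents sharing a child
    return edges


# X6 <- {X4, X5} follows the example in the text; the rest is assumed.
dag = {
    "X4": ["X1", "X2"],
    "X5": ["X2", "X3"],
    "X6": ["X4", "X5"],
}
moral = moralize(dag)
print(frozenset(("X4", "X5")) in moral)
```

After moralization X4, X5 and X6 form a complete subgraph, so a later clique-based inference step sees the parents of X6 jointly rather than as independent variables.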


We have explained this in detail in Section V A under Bayesian inference schemes.

Response to Reviewer 2’s comments:

My detailed comments are grouped into several categories listed below:

1. Methodology and problem formulation:

• doesn’t include electrical masking, which can also affect susceptibility of nodes - equation

3, not really the best way to represent latching-window masking; it should depend on setup

and hold time and duration of the glitch, too; - this equation doesn’t cover all cases when

a glitch is not masked (very inaccurate)

We agree with the reviewer that our model doesn’t cover the electrical masking effect. We would

like to point out that our focus here is to model the effect of logical masking, gate delays,

circuit re-convergence, the temporal nature of SEUs and input pattern. Modeling electrical

masking needs circuit-level simulation or estimation techniques, which we are currently

trying to integrate with our approach as a future direction. There are works on

soft error rate analysis that model electrical masking (for example, Dhillon et al. [10]). Our

model can be fused with any such model to get the exact SEU sensitivity values.

We also agree with the reviewer that the empirical equation 3 doesn’t consider latching-window

masking effects, which depend on set-up and hold time requirements. However,

this equation is good enough for modeling the logical masking effect and other timing

factors such as the temporal nature of SEUs and gate delays. The probabilities estimated with

our TALI model, when multiplied with other factors such as (1) particle bombardment rate,

(2) probability of occurrence of an SEU when a particle hit occurs at a node, (3) electrical

masking factor and (4) latching probability, will give the overall soft error susceptibility of

individual gates in a circuit.

• a lot of inconsistency in notation, same things defined more than once in a different way


(e.g. $T_{j \to i}$, pages 2 and 8; also, sometimes they use $P(T_{j \to i} \mid SEU)$, sometimes $P(T_{j \to i})$)

Thank you for pointing out the inconsistency. We are now using the term $P(T_{j \to i})$ as the

conditional probability of getting an erroneous signal at the ith output conditional to an SEU

at node j. The conditional aspect is implemented by using an extra node in the TALI-SES

model to inject an erroneous signal (SEU) at node j.

• assumption about gate delay - page 3, they assume gate delay is proportional to fan-out

- page 10, the authors assume is equal to fan-out; in any case, it is pretty inaccurate, it will

depend on fan-in and parasitic delay also

In this work, we assumed that gate delay is equal to fan-out. We’ve corrected it on page

3. Our model can handle any delay model. We can plug in an average delay value for

each gate type. We incorporate a logical-effort-based delay model, which takes care of input

capacitance, parasitic capacitance as well as fanout. We included two additional sections:

(1) Section IVB, which explains modeling using the logical-effort-based delay model, and (2) Section

VIC, giving estimated results with this delay model. Thus we show that, given an

appropriate delay library of gates in a circuit, our model is capable of estimating SEU

sensitivities of individual gates in the circuit.

We have explained logical-effort-based delay modeling under the response to Reviewer 1’s comments.
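
The logical-effort delay expression d = gh + p summarized in the response to Reviewer 1 can be sketched as below. This is a minimal illustration under stated assumptions: the logical-effort values for the inverter, NAND and NOR follow the fanout coefficients of Table VI, the parasitic delay is passed in explicitly, and the function names and the choice of fanout 3 are hypothetical.

```python
def logical_effort(gate_type, n_inputs=1):
    """Logical effort g, matching the fanout coefficients listed in Table VI."""
    if gate_type == "inv":
        return 1.0
    if gate_type == "nand":
        return (n_inputs + 2) / 3
    if gate_type == "nor":
        return (2 * n_inputs + 1) / 3
    raise ValueError(f"unsupported gate type: {gate_type}")


def gate_delay(g, h, p):
    """Gate delay d = g * h + p (effort delay plus parasitic delay)."""
    return g * h + p


P_INV = 1.0  # parasitic delay of an inverter, taken as 1 delay unit

# Inverter driving a fanout of 3: 1 * 3 + P_INV
d_inv = gate_delay(logical_effort("inv"), h=3, p=P_INV)

# 2-input NAND driving a fanout of 3: (4/3) * 3 + 2 * P_INV
d_nand2 = gate_delay(logical_effort("nand", 2), h=3, p=2 * P_INV)

print(d_inv, d_nand2)
```

Delays computed this way would typically be quantized to the time unit of the model before being used as evaluation times; that rounding step is not shown here.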

2. Experimental results:

• Figure 9. - I would say this represents logical masking, but it doesn’t really say much about

the circuit and the soft-error susceptibility of different gates

We agree with the reviewer that the probabilities estimated by our model do not cover

all factors which contribute to the soft error susceptibility. We haven’t modeled the electrical

masking effect, which needs circuit-level simulation. As we mentioned before, the scope of

this work is limited to logic-level modeling and we model the logical masking effect, circuit

re-convergence, inputs, and timing issues such as gate delays, SEU duration, path delays, etc.

In this work, we estimate the SEU sensitivity of gates in a circuit in terms of the probability

that an SEU generated at an internal node produces an erroneous signal at the latch inputs.

This probability, when multiplied with the particle bombardment rate, electrical masking

probability and the latching probability, will give the actual soft error susceptibility of individual

gates. The existing estimation techniques for SEU sensitization (logical masking)

do not handle circuits with re-convergence. Our estimation technique is accurate and efficient

when compared with simulation. This model can be combined with other latching

and electrical masking models to get the exact soft error susceptibility values.

• Figure (10a) - these three bars are redundant - one bar with two colors will give same

information

We modified the charts by giving the number of SEUs relative to the number of gates.

• Figure 10 - maybe SEU numbers relative to number of gates in the circuit would explain

better the influence of SEU in different circuits, instead of presenting just SEU numbers

Done.

• worst-case probabilities average would be better, maybe, min, median, too

We have appended Excel sheets showing exhaustive SEU sensitivity values (average, minimum,

maximum and median values) of individual gates in the benchmark circuits.

• except for the smallest circuit, no results showing probabilities that they claimed they calculated

they just show number of SEUs.

As said before, we have appended Excel sheets showing all probabilities.

• Table III no units in last column

Time is in seconds. We’ve added that.

• fonts in figures are too small

We modified the figure.

3. Missing references:

- Krishnaswamy et al. "Accurate Reliability Evaluation and Enhancement via Probabilistic Transfer Matrices", DATE 2005

- Dhillon et al. "Soft-Error Tolerance Analysis and Optimization of Nanometer Circuits", DATE 2005

- Bahar et al. "A Probabilistic-Based Design Methodology for Nanoscale Computation", ICCAD 2003

Thank you for your suggestions. We added the first two references. However, the third reference

mentioned here is not related to our work.

4. Minor things:

page 2, last paragraph: do you mean "SEU" duration? Also, (4) for inputs should be actually

(5)

It should be SEU duration. We corrected the errors.

page 6, last paragraph, last line: "doping as well as circuit structure on the SER of a product"

doesn’t make any sense

It actually means that the authors propose a methodology to quantify the impact of various

factors (such as supply voltage, transistor size, circuit topology, doping as well as circuit structure)

on the Soft Error Rate (SER) of a chip. We’ve made it clear in page 6.

page 13, 3rd paragraph: ”SEUs which satisfy Eq. 3 affect circuit outputs resulting in soft

errors” is misleading since the authors say the opposite when eq. (3) is introduced

We thank you for bringing it to our attention. It should be “SEUs which do not satisfy Eq. 3

affect circuit outputs resulting in soft errors”. It is corrected.

Response to Reviewer 3’s comments:

• Author proposed a probabilistic framework to study the single-event-upset in logic circuit.

This model has some nice and interesting features, such as the consideration of the effect

of inputs, gate delays, SEU durations, and the exploration of input space that leads to zero

error. Although most of the technique adopted in this paper is not new, author did a good

job in constructing such a probabilistic framework with above considerations using both

accurate and approximated inference of Bayesian network.

Thank you for appreciating our effort.

• A few problems still exist in this paper. The writing of the paper should be further improved,

especially section 3.

We’ve modified the writing, especially Section III, and made it clearer.

In page 8, the 2nd line. It says "$P(T_{j \to i})$ is the probability that an SEU generated at an

internal node j caused an erroneous signal at output i." It looks like $P(T_{j \to i})$ here is the

conditional probability that an error occurs when an SEU happens. However, in equation 1,

$P(T_{j \to i} \mid SEU_j)$ is used to represent such a conditional probability. It confused me here.

Thank you for pointing out the inconsistency. We are now using the term $P(T_{j \to i})$ as the

conditional probability of getting an erroneous signal at the ith output conditional to an SEU

at node j. The conditional aspect is implemented by using an extra node in the TALI-SES

model to inject an erroneous signal (SEU) at node j.

Another problem is the example time-space transformation of circuit. In Fig 3, for t=6, the

input of 22,6 is 10,5 and 16,5. But, isn’t gate delay equal to its fanout here? Then, the input

of 22,6 should be 10,5 and 16,4. Same thing happens in Figure 4, 6. Maybe I did not get it

correctly, but it is worth some more explanations here. Also, is the input delay also equal

to the fanouts?

Output signal evaluation time of a gate is equal to the sum of the gate delay (in this example,

we take gate delay = fanout) and the latest arrival time of its input signals. Taking the fan-out

of primary output gates (here, gates 22 and 23) as 1, signal 22,6 is derived from signals

10,5 and 16,5. Ideally, gate delay depends on input capacitance and parasitic capacitance.

We give results for the logical-effort-based delay model and show that TALI-SES can be used

with any type of delay model.
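
The evaluation-time rule stated above (gate delay plus the latest input arrival time, with gate delay taken as fanout) can be sketched as follows. This is a minimal illustration on a hypothetical four-gate netlist, not the circuit of Fig. 3; the function names, the netlist and the choice of time 0 for primary inputs are all assumptions.

```python
def fanout_delays(netlist):
    """Delay of each gate = number of gates it drives; undriven (output) gates get 1."""
    counts = {gate: 0 for gate in netlist}
    for inputs in netlist.values():
        for signal in inputs:
            if signal in counts:
                counts[signal] += 1
    return {gate: max(count, 1) for gate, count in counts.items()}


def evaluation_times(netlist, delay, primary_input_time=0):
    """Evaluation time of each gate = latest input arrival time + gate delay."""
    times = {}

    def time_of(signal):
        if signal not in netlist:            # primary input
            return primary_input_time
        if signal not in times:
            latest = max(time_of(s) for s in netlist[signal])
            times[signal] = latest + delay[signal]
        return times[signal]

    for gate in netlist:
        time_of(gate)
    return times


# Hypothetical netlist: gate name -> list of input signals (i1..i3 are primary inputs).
netlist = {
    "a": ["i1", "i2"],
    "b": ["i2", "i3"],
    "c": ["a", "b"],   # primary output
    "d": ["b", "i3"],  # primary output
}
delays = fanout_delays(netlist)
times = evaluation_times(netlist, delays)
print(delays, times)
```

Swapping `fanout_delays` for any other per-gate delay table leaves `evaluation_times` unchanged, which mirrors the point that the time-space transformation does not depend on a particular delay model.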
