Module 8
Testing of Embedded System

Version 2 EE IIT, Kharagpur

Lesson 38
Testing Embedded Systems


Instructional Objectives

After going through this lesson the student would be able to:

•  Distinguish between the terms testing and verification

•  Describe the common types of faults that occur in embedded systems

•  Explain the various types of models that are used to represent the faults

•  Describe the methodology of testing systems with embedded cores

•  Distinguish among terms like DFT, BIST and on-line testing

•  Explain the need and mechanism of Automatic Test Pattern Generation in the context of testing embedded hardware-software systems

Testing Embedded Systems

1. Introduction

What is testing?

•  Testing is an organized process to verify the behavior, performance, and reliability of a device or system against designed specifications.

•  It ensures that a device or system is as defect-free as possible.

•  Expected behavior, performance, and reliability must be both formally described and measurable.

Verification vs. Testing [1]

•  Verification or debugging is the process of removing defects ("bugs") in the design phase to ensure that the synthesized design, when manufactured, will behave as expected.

•  Testing is a manufacturing step to ensure that the manufactured device is defect-free.

•  Testing is one of the detective measures of quality, and verification one of the corrective measures.

Point by point:

•  Verification verifies the correctness of the design; testing verifies the correctness of the manufactured system.

•  Verification is performed by simulation, hardware emulation, or formal methods; testing is a two-part process: (1) test generation, a software process executed once during design, and (2) test application, electrical tests applied to hardware.


•  Verification is performed once, prior to manufacturing; test application is performed on every manufactured device.

•  Verification is responsible for the quality of the design; testing is responsible for the quality of the devices.

What is an "embedded system"?

Embedded systems are electronically controlled systems in which hardware and software are combined [2-3]. These are computers incorporated in consumer products or other devices to perform application-specific functions. The end user is usually not even aware of their existence. Embedded systems can contain a variety of computing devices, such as microcontrollers, application-specific integrated circuits, and digital signal processors. Most systems used in real life, such as power plant systems, medical instrument systems, home appliances, air traffic control stations, routers and firewalls, telecommunication exchanges, robotics and industrial automation, smart cards, personal digital assistants (PDAs) and cellular phones, are examples of embedded systems.

Real-Time System

Most, if not all, embedded systems are "real-time". The terms "real-time" and "embedded" are often used interchangeably. A real-time system is one in which the correctness of a computation depends not only on its logical correctness, but also on the time at which the result is produced.

•  In hard real-time systems, if the timing constraints of the system are not met, a system crash could be the consequence. For example, in mission-critical applications where failure is not an option, time deadlines must be followed.

•  In soft real-time systems, no catastrophe will occur if a deadline fails, and the time limits are negotiable.

In spite of the progress of hardware/software codesign, hardware and software in embedded systems are usually considered separately in the design process. There is a strong interaction between hardware and software in their failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not "break" in the traditional sense; however, it can perform inappropriately due to faults in the underlying hardware, as well as specification or design flaws in either the hardware or the software. At the same time, the software can be exploited to test for and respond to the presence of faults in the underlying hardware. It is necessary to understand the importance of testing embedded systems, as their functions have become complicated; however, the studies related to embedded system test are not yet adequate.

2. Embedded Systems Testing

Test methodologies and test goals differ in the hardware and software domains. Embedded software development uses specialized compilers and development software that offer means for debugging. Developers build application software on more powerful computers and eventually test the application in the target processing environment.


In contrast, hardware testing is concerned mainly with functional verification and self-test after the chip is manufactured. Hardware developers use tools to simulate the correct behavior of circuit models. Vendors design chips for self-test, which mainly ensures proper operation of circuit models after their implementation. Test engineers who are not the original hardware developers test the integrated system.

This conventional, divided approach to software and hardware development does not address the embedded system as a whole during the system design process. It instead focuses on these two critical issues of testing separately. New problems arise when developers integrate the components from these different domains.

In theory, unsatisfactory performance of the system under test should lead to a redesign. In practice, a redesign is rarely feasible because of the cost and delay involved in another complete design iteration. A common engineering practice is to compensate for problems within the integrated system prototype by using software patches. These changes can unintentionally affect the behavior of other parts of the computing system.

At a higher abstraction level, executable specification languages provide an excellent means to assess embedded-system designs. Developers can then test system-level prototypes with either formal verification techniques or simulation. A current shortcoming of many approaches, however, is that the transition from testing at the system level to testing at the implementation level is largely ad hoc. To date, system testing at the implementation level has received attention in the research community only as coverification, which simulates both hardware and software components conjointly. Coverification runs simulations of specifications on powerful computer systems. Commercially available coverification tools link hardware simulators and software debuggers in the implementation phase of the design process.

Since embedded systems are frequently employed in mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Some embedded systems, such as those in automotive applications, are exposed to extremely harsh environments. Preparing embedded systems to meet the new and more stringent safety and reliability requirements of these applications is a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for on-line testing.

3. Faults in Embedded Systems

Incorrectness in hardware systems may be described in different terms: defect, error and fault. These three terms are often confused. We define them as follows [1]:

Defect: A defect in a hardware system is the unintended difference between the implemented hardware and its intended design. This may be a process defect, material defect, age defect or packaging defect.

Error: A wrong output signal produced by a defective system is called an error. An error is an "effect" whose cause is some "defect". Errors induce failures, that is, deviations from appropriate system behavior. If the failure can lead to an accident, it is a hazard.

Fault: A representation of a "defect" at an abstraction level is called a fault. Faults are physical or logical defects in the design or implementation of a device.


3.1 Hardware Fault Model (Gate Level Fault Models)

As the complexity and integration of hardware increase with technology, defects are too numerous and very difficult to analyze. A fault model helps us to identify the targets for testing and analysis of failure. Further, the effectiveness of the model in terms of its relation to actual failures should be established by experiments. Faults in a digital system can be classified into three groups: design, fabrication, and operational faults. Design faults are made by human designers or CAD software (simulators, translators, or layout generators), and occur during the design process. These faults are not directly related to the testing process. Fabrication defects are due to an imperfect manufacturing process. Defects in the hardware itself, bad connections, bridges, improper semiconductor doping and irregular power supply are examples of physical faults. Physical faults are also called defect-oriented faults. Operational or logical faults occur due to environmental disturbances during normal operation of the embedded system. Such disturbances include electromagnetic interference, operator mistakes, and extremes of temperature and vibration. Some design defects and manufacturing faults escape detection and combine with wearout and environmental disturbances to cause problems in the field.

Hardware faults are classified as stuck-at faults, bridging faults, open faults, power disturbance faults, spurious current faults, memory faults, transistor faults, etc. The most commonly used fault model is the "stuck-at fault model" [1]. This is modeled by having a line segment stuck at logic 0 or 1 (stuck-at 1 or stuck-at 0).

Stuck-at Fault: This is due to flaws in the hardware, and it represents faults of the signal lines. A signal line is the input or output of a logic gate. Each connecting line can have two types of faults: stuck-at-0 (s-a-0) or stuck-at-1 (s-a-1). In general, several stuck-at faults can be simultaneously present in the circuit. A circuit with n lines can have 3^n - 1 possible stuck-line combinations, as each line can be in one of three states: s-a-0, s-a-1 or fault-free. Even a moderate value of n will give a large number of multiple stuck-at faults. It is common practice, therefore, to model only single stuck-at faults. An n-line circuit can have at most 2n single stuck-at faults. This number can be further reduced by the fault collapsing technique.
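The fault-count arithmetic above (3^n - 1 multiple stuck-at combinations versus 2n single stuck-at faults) can be checked with a few lines of code; this is a minimal sketch, and the function names are illustrative:

```python
# Each of the n lines is in one of three states: s-a-0, s-a-1, or fault-free.
# Excluding the all-fault-free case gives 3**n - 1 multiple-fault combinations.
def multiple_stuck_at_faults(n: int) -> int:
    return 3 ** n - 1

# Restricting to single faults: each line can be s-a-0 or s-a-1, so 2*n faults.
def single_stuck_at_faults(n: int) -> int:
    return 2 * n

for n in (3, 10, 20):
    print(n, multiple_stuck_at_faults(n), single_stuck_at_faults(n))
```

Even for a modest n = 20 the multiple-fault count exceeds 3.4 billion, while the single-fault count is only 40, which is why single stuck-at faults (further reduced by fault collapsing) are the practical target.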

A single stuck-at fault is characterized by the following properties:

1.  The fault occurs in only one line.

2.  The faulty line is permanently set to either 0 or 1.

3.  The fault can be at an input or output of a gate.

4.  Every fan-out branch is to be considered as a separate line.

Figure 38.1 gives an example of a single stuck-at fault. A stuck-at-1 fault marked at the output of the OR gate implies that the faulty signal remains 1 irrespective of the input state of the OR gate.
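The effect of such a fault can be simulated directly. The sketch below models the gate-level structure of Fig. 38.1 (two AND gates feeding an OR gate whose output is stuck-at-1) and enumerates the input patterns that detect the fault, i.e., those for which the good and faulty circuits disagree; the function names are illustrative:

```python
from itertools import product

# Fault-free circuit of Fig. 38.1: two AND gates feed an OR gate.
def good_circuit(a, b, c, d):
    return (a & b) | (c & d)

# Same circuit with the OR-gate output line stuck-at-1: whatever the gates
# compute, the observed output is forced to logic 1.
def faulty_circuit(a, b, c, d):
    _ = (a & b) | (c & d)  # computed value is overridden by the fault
    return 1

# A pattern detects the fault exactly when the true response is 0
# but the faulty response is 1.
detecting = [p for p in product((0, 1), repeat=4)
             if good_circuit(*p) != faulty_circuit(*p)]
print(len(detecting))  # prints 9: the patterns whose true response is 0
```

Of the 16 possible input patterns, the 9 whose true response is 0 expose the stuck-at-1 fault; any one of them suffices as a test for this fault.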


 

[Figure: two AND gates drive an OR gate; the OR-gate output is marked stuck-at-1, so the faulty response is 1 where the true response is 0.]

Fig. 38.1 An example of a stuck-at fault

Bridging faults: These are due to a short between a group of signals. The logic value of the shorted net may be modeled as 1-dominant (OR bridge), 0-dominant (AND bridge), or intermediate, depending upon the technology in which the circuit is implemented.

Stuck-Open and Stuck-Short faults: The MOS transistor is considered as an ideal switch, and two types of faults are modeled. In a stuck-open fault, a single transistor is permanently stuck in the open state; in a stuck-short fault, a single transistor is permanently shorted irrespective of its gate voltage. These are caused by bad connection of the signal line.

Power disturbance faults: These are caused by inconsistent power supplies and affect the

whole system.

Spurious current faults: These are caused by exposure to heavy ions and affect the whole system.

Operational faults are usually classified according to their duration:

Permanent faults exist indefinitely if no corrective action is taken. These are mainly manufacturing faults and do not frequently occur due to changes in system operation or environmental disturbances.

Intermittent faults appear, disappear, and reappear frequently. They are difficult to predict, but their effects are highly correlated. Most of these faults are due to marginal design or manufacturing steps. These faults occur under atypical environmental disturbances.

Transient faults appear for an instant and disappear quickly. They are not correlated with each other. They occur due to random environmental disturbances. Power disturbance faults and spurious current faults are transient faults.

3.2 Software-Hardware Covalidation Fault Model

A design error is a difference between the designer's intent and an executable specification of the design. Executable specifications are often expressed using high-level hardware-software languages. Design errors may range from simple syntax errors confined to a single line of a design description to a fundamental misunderstanding of the design specification which may impact a large segment of the description. A design fault describes the behavior of a set of design errors, allowing a large set of design errors to be modeled by a small set of design faults. The majority of covalidation fault models are behavioral-level fault models. Existing covalidation fault models can be classified by the style of behavioral description upon which the models are based. Many different internal behavioral formats are possible [8]. The covalidation fault models


currently applied to hardware-software designs have their origins in either the hardware [9] or the software [10] domains.

3.2.1 Textual Fault Models

A textual fault model is one that is applied directly to the original textual behavioral description. The simplest textual fault model is the statement coverage metric introduced in software testing [10], which associates a potential fault with each line of code and requires that each statement in the description be executed during testing. This coverage metric is accepted as having limited accuracy, in part because fault effect observation is ignored. Mutation analysis is a textual fault model which was originally developed in the field of software test, and has also been applied to hardware validation. A mutant is a version of a behavioral description which differs from the original by a single potential design error. A mutation operator is a function which is applied to the original program to generate a mutant.
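The idea can be made concrete with a toy example; this is a sketch of mutation analysis in general, not of any particular tool, and the functions are hypothetical:

```python
# Original behavioral description: the design intent is a logical AND.
def original(a, b):
    return a and b

# Mutant: a mutation operator replaced AND with OR, modeling a single
# potential design error.
def mutant(a, b):
    return a or b

# A test case "kills" the mutant when the original and the mutant disagree.
tests = [(0, 0), (0, 1), (1, 0), (1, 1)]
killing = [(a, b) for a, b in tests if original(a, b) != mutant(a, b)]
print(killing)  # the patterns (0, 1) and (1, 0) kill this mutant
```

A test set that kills every mutant generated by the chosen mutation operators is judged adequate with respect to this fault model; here, a test set containing only (0, 0) and (1, 1) would leave the mutant alive and be judged inadequate.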

3.2.2 Control-Dataflow Fault Models

A number of fault models are based on the traversal of paths through the control-dataflow graph (CDFG) representing the system behavior. In order to apply these fault models to a hardware-software design, both hardware and software components must be converted into a CDFG description. Applying these fault models to the CDFG representing a single process is a well-understood task. Existing CDFG fault models are restricted to the testing of single processes. The earliest control-dataflow fault models include the branch coverage and path coverage [10] models used in software testing.

The branch coverage metric associates potential faults with each direction of each conditional in the CDFG. The branch coverage metric has been used for behavioral validation for coverage evaluation and test generation [11, 12]. The path coverage metric is a more demanding metric than the branch coverage metric because path coverage reflects the number of control-flow paths taken. The assumption is that an error is associated with some path through the control flow graph, and all control paths must be executed to guarantee fault detection.

Many CDFG fault models consider the requirements for fault activation without explicitly considering fault effect observability. Researchers have developed observability-based behavioral fault models [13, 14] to alleviate this weakness.

3.2.3 State Machine Fault Models

Finite state machines (FSMs) are the classic method of describing the behavior of a sequential system, and fault models have been defined to be applied to state machines. The commonly used fault models are state coverage, which requires that all states be reached, and transition coverage, which requires that all transitions be traversed. State machine transition tours, paths covering each transition of the machine, are applied to microprocessor validation [15]. The most significant problem with the use of state machine fault models is the complexity resulting from the state space size of typical systems. Several efforts have been made to alleviate this problem by identifying a subset of the state machine which is critical for validation [16].
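State and transition coverage can be illustrated on a two-state machine; the FSM and the input sequence below are made up for illustration:

```python
# A small FSM: (current state, input symbol) -> next state.
FSM = {
    ("S0", "a"): "S1",
    ("S0", "b"): "S0",
    ("S1", "a"): "S0",
    ("S1", "b"): "S1",
}
ALL_STATES = {"S0", "S1"}

def run(seq, start="S0"):
    """Run an input sequence and record visited states and transitions."""
    state = start
    states, transitions = {start}, set()
    for sym in seq:
        transitions.add((state, sym))
        state = FSM[(state, sym)]
        states.add(state)
    return states, transitions

# The sequence a, b, a, b is a transition tour: it traverses all 4 transitions.
states, transitions = run(["a", "b", "a", "b"])
print(len(states) / len(ALL_STATES), len(transitions) / len(FSM))  # 1.0 1.0
```

The state-space problem mentioned above is visible even here: full transition coverage of a machine with |S| states and |I| inputs requires traversing |S| x |I| transitions, which quickly becomes impractical for realistic systems.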


3.2.4 Application-Specific Fault Models

A fault model which is designed to be generally applicable to arbitrary design types may not be as effective as a fault model which targets the behavioral features of a specific application. To justify the cost of developing and evaluating an application-specific fault model, the market for the application must be very large and the fault modes of the application must be well understood. For this reason, application-specific fault models are seen in microprocessor test and validation [17, 18].

3.3 Interface Faults

To manage the high complexity of hardware-software design and covalidation, efforts have been made to separate the behavior of each component from the communication architecture [19]. Interface covalidation becomes more significant with the onset of core-based design methodologies which utilize pre-designed, pre-verified cores. Since each core component is pre-verified, the system covalidation problem focuses on the interface between the components. A case study of the interface-based covalidation of an image compression system has been presented [20].

4. Testing of Embedded Core-Based System-on-Chips (SOCs)

The system-on-chip test is a single composite test comprising the individual core tests of each core, the UDL tests, and interconnect tests. Each individual core or UDL test may involve surrounding components. Certain operational constraints (e.g., safe mode, low power mode, bypass mode) are often required, which necessitates access and isolation modes.

In a core-based system-on-chip [5], the system integrator designs the User Defined Logic (UDL) and assembles the pre-designed cores provided by the core vendor. A core is typically a hardware description of a standard IC, e.g., a DSP, RISC processor, or DRAM core. Embedded cores represent intellectual property (IP), and in order to protect IP, core vendors do not release the detailed structural information to the system integrator. Instead, a set of test patterns is provided by the core vendor that guarantees a specific fault coverage. Though the cores are tested as part of overall system performance by the system integrator, the system integrator treats the core as a black box. These test patterns must be applied to the cores in a given order, using a specific clock strategy.

The core internal test developed by a core provider needs to be adequately described, ported and ready for plug and play, i.e., for interoperability, with the system chip test. For an internal test to accompany its corresponding core and be interoperable, it needs to be described in a commonly accepted, i.e., standard, format. Such a standard format is currently being developed by IEEE P1500 and is referred to as standardization of a core test description language [22].

In SOCs, cores are often embedded in several layers of user-defined or other core-based logic, and direct physical access to their peripheries is not available from chip I/Os. Hence, an electronic access mechanism is needed. This access mechanism requires additional logic, such as a wrapper around the core, and wiring, such as a test access mechanism, to connect core peripheries to the test sources and sinks. The wrapper performs switching between normal mode


and the test mode(s), and the wiring connects the wrapper which surrounds the core to the test source and sink. The wrapper can also be utilized for core isolation. Typically, a core needs to be isolated from its surroundings in certain test modes. Core isolation is often required on the input side, the output side, or both.

[Figure: a test pattern source and sink connected through test access mechanisms to a wrapper surrounding the embedded core.]

Fig. 38.2 Overview of the three elements in an embedded-core test approach: (1) test pattern source, (2) test access mechanism, and (3) core test wrapper [5].

A conceptual architecture for testing embedded-core-based SOCs is shown in Figure 38.2. It consists of three structural elements:

1. Test Pattern Source and Sink

The test pattern source generates the test stimuli for the embedded core, and the test pattern sink compares the response(s) to the expected response(s). Test pattern source as well as sink can be implemented either off-chip by external Automatic Test Equipment (ATE), on-chip by Built-In Self-Test (or Embedded ATE), or as a combination of both. Source and sink do not need to be of the same type; e.g., the source of an embedded core can be implemented off-chip, while the sink of the same core is implemented on-chip. The choice of a certain type of source or sink is determined by (1) the type of circuitry in the core, (2) the type of pre-defined tests that come with the core, and (3) quality and cost considerations. The type of circuitry of a certain core and the type of predefined tests that come with the core determine which implementation options are left open for test pattern source and sink. The actual choice of a particular source or sink is in general determined by quality and cost considerations. On-chip sources and sinks provide better accuracy and performance-related defect coverage, but at the same time increase the silicon area and hence might reduce manufacturing yield.

2. Test Access Mechanism

The test access mechanism takes care of on-chip test pattern transport. It can be used (1) to transport test stimuli from the test pattern source to the core-under-test, and (2) to transport test responses from the core-under-test to the test pattern sink. The test access mechanism is, by definition, implemented on-chip. Although for one core often the same type of test access mechanism is used for both stimulus and response transportation, this is not required, and various combinations may co-exist. Designing a test access mechanism involves making a trade-off between the transport capacity (bandwidth) of the mechanism and the test application cost it induces. The bandwidth is limited by the bandwidth of source and sink and the amount of silicon area one wants to spend on the test access mechanism itself.


3. Core Test Wrapper

The core test wrapper forms the interface between the embedded core and its system chip environment. It connects the core terminals both to the rest of the IC and to the test access mechanism. By definition, the core test wrapper is implemented on-chip.

The core test wrapper should have the following mandatory modes.

•  Normal operation (i.e., non-test) mode of the core. In this mode, the core is connected to its system-IC environment and the wrapper is transparent.

•  Core test mode. In this mode the test access mechanism is connected to the core, such that test stimuli can be applied at the core's inputs and responses can be observed at the core's outputs.

•  Interconnect test mode. In this mode the test access mechanism is connected to the interconnect wiring and logic, such that test stimuli can be applied at the core's outputs and responses can be observed at the core's inputs.

Apart from these mandatory modes, a core test wrapper might have several optional modes, e.g., a detach mode to disconnect the core from its system chip environment and the test access mechanism, or a bypass mode for the test access mechanism. Depending on the implementation of the test access mechanism, some of the above modes may coincide. For example, if the test access mechanism uses existing functionality, normal operation and core test mode may coincide.
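The routing behavior of the three mandatory wrapper modes can be sketched behaviorally. This is a simplified model, not an implementation of any wrapper standard, and all class and method names are hypothetical:

```python
class CoreTestWrapper:
    """Behavioral sketch of the mandatory core test wrapper modes."""
    MODES = ("normal", "core_test", "interconnect_test")

    def __init__(self):
        self.mode = "normal"  # in normal mode the wrapper is transparent

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown wrapper mode: {mode}")
        self.mode = mode

    def core_input(self, system_value, tam_stimulus):
        # In core test mode the test access mechanism (TAM), not the system
        # environment, drives the core's inputs.
        return tam_stimulus if self.mode == "core_test" else system_value

    def core_output_side(self, core_value, tam_stimulus):
        # In interconnect test mode stimuli are applied at the core's outputs
        # to exercise the interconnect wiring and logic.
        return tam_stimulus if self.mode == "interconnect_test" else core_value

w = CoreTestWrapper()
print(w.core_input(0, 1))   # normal mode: the system value passes through
w.set_mode("core_test")
print(w.core_input(0, 1))   # core test mode: the TAM stimulus drives the core
```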

Pre-designed cores have their own internal clock distribution systems. Different cores have different clock propagation delays, which might result in clock skew for inter-core communication. The system-IC designer should take care of this clock skew issue in the functional communication between cores. However, clock skew might also corrupt the data transfer over the test access mechanism, especially if this mechanism is shared by multiple cores. The core test wrapper is the best place to have provisions for clock skew prevention in the test access paths between the cores.

In addition to the test integration and interdependence issues, the system chip composite test requires adequate test scheduling. Effective test scheduling for SOCs is challenging because it must address several conflicting goals: (1) total SOC testing time minimization, (2) power dissipation, (3) precedence constraints among tests and (4) area overhead constraints [2]. Also, test scheduling is necessary to run intra-core and inter-core tests in a certain order so as not to impact the initialization and final contents of individual cores.

5. On-Line Testing

On-line testing addresses the detection of operational faults, and is found in computers that support critical or high-availability applications [23]. The goal of on-line testing is to detect fault effects, that is, errors, and take appropriate corrective action. On-line testing can be performed by external or internal monitoring, using either hardware or software; internal monitoring is referred to as self-testing. Monitoring is internal if it takes place on the same substrate as the circuit under test (CUT); nowadays, this usually means inside a single IC, a system-on-a-chip (SOC).

There are four primary parameters to consider in the design of an on-line testing scheme:


•  Error coverage (EC): This is defined as the fraction of all modeled errors that are detected, usually expressed in percent. Critical and highly available systems require very good error detection or error coverage to minimize the impact of errors that lead to system failure.

•  Error latency (EL): This is the difference between the first time the error is activated and the first time it is detected. EL is affected by the time taken to perform a test and by how often tests are executed. A related parameter is fault latency (FL), defined as the difference between the onset of the fault and its detection. Clearly, FL ≥ EL, so when EL is difficult to determine, FL is often used instead.

•  Space redundancy (SR): This is the extra hardware or firmware needed to perform on-line testing.

•  Time redundancy (TR): This is the extra time needed to perform on-line testing.

An ideal on-line testing scheme would have 100% error coverage, an error latency of 1 clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT, and impose no functional or structural restrictions on the CUT. To cover all of the fault types described earlier, two different modes of on-line testing are employed: concurrent testing, which takes place during normal system operation, and non-concurrent testing, which takes place while normal operation is temporarily suspended. These operating modes must often be overlapped to provide a comprehensive on-line testing strategy at acceptable cost.

5.1 Non-concurrent testing

This form of testing is either event-triggered (sporadic) or time-triggered (periodic), and is characterized by low space and time redundancy. Event-triggered testing is initiated by key events or state changes in the life of a system, such as start-up or shutdown, and its goal is to detect permanent faults. It is usually advisable to detect and repair permanent faults as soon as possible. Event-triggered tests resemble manufacturing tests.

Time-triggered testing is activated at predetermined times in the operation of the system. It is often done periodically to detect permanent faults using the same types of tests applied by event-triggered testing. This approach is especially useful in systems that run for extended periods, where no significant events occur that can trigger testing. Periodic testing is also essential for detecting intermittent faults. Periodic testing can identify latent design or manufacturing flaws that only appear under the right environmental conditions.

5.2 Concurrent testing

Non-concurrent testing [23] cannot detect transient or intermittent faults whose effects disappear quickly. Concurrent testing, on the other hand, continuously checks for errors due to such faults. However, concurrent testing is not by itself particularly useful for diagnosing the source of errors, so it is often combined with diagnostic software. It may also be combined with non-concurrent testing to detect or diagnose complex faults of all types.

A common method of providing hardware support for concurrent testing, especially for detecting control errors, is a watchdog timer. This is a counter that must be reset by the system on a repetitive basis to indicate that the system is functioning properly. A watchdog timer is based on the assumption that the system is fault-free, or at least alive, if it is able to perform the simple task of resetting the timer at appropriate intervals, which implies that control flow is correctly traversing timer reset points.
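The watchdog idea can be sketched in a few lines. This models the counter behaviorally with discrete ticks rather than a hardware clock, and the class and method names are illustrative:

```python
class WatchdogTimer:
    """Behavioral sketch of a watchdog timer counting discrete ticks."""

    def __init__(self, timeout_ticks):
        self.timeout = timeout_ticks
        self.count = 0
        self.expired = False

    def kick(self):
        # The system resets ("kicks") the timer at its reset points.
        self.count = 0

    def tick(self):
        # One unit of time elapses; expiry means the reset point was missed.
        self.count += 1
        if self.count >= self.timeout:
            self.expired = True

# A healthy control loop reaches the reset point every iteration.
wd = WatchdogTimer(timeout_ticks=3)
for _ in range(10):
    wd.tick()
    wd.kick()
print(wd.expired)   # False: control flow keeps traversing the reset point

# A hung system never kicks, so the counter expires.
wd2 = WatchdogTimer(timeout_ticks=3)
for _ in range(5):
    wd2.tick()
print(wd2.expired)  # True: the error is detected
```

In hardware the expiry signal would typically raise an interrupt or force a system reset; the point of the sketch is only the control-flow assumption: a system that keeps kicking the timer is presumed alive.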


For critical or highly available systems, it is essential to have a comprehensive approach to on-line testing that covers all expected permanent, intermittent, and transient faults. In recent years, built-in self-test (BIST) has emerged as an important method for testing manufacturing faults, and it is increasingly promoted for on-line testing as well.

6. Test Pattern Generation

6.1 Test Plan

A test plan is generated to verify the device specification; it comprises decisions on the test type, fault coverage, test time, etc. For example, the test pattern generator and response analyzer may reside on automatic test equipment (ATE) or on-chip, depending on the test environment. For production testing in industry, ATE may be the option, while on-site testing may require on-chip testers (BIST).

6.2 Test Programming

The test program comprises modules for the generation of the test vectors and the corresponding expected responses from a circuit with normal behavior. CAD tools are used to automate the generation of optimized test vectors for this purpose [1,24]. Fig. 38.3 illustrates the basic steps in the development of a test program.

[Figure: chip specifications (test types, timing specs, pin assignments), the test plan, test generation (vectors from logic-design simulators), and the physical design feed a test program generator, which produces the test program.]

Fig. 38.3 Test program generation

6.3 Test Pattern Generation

Test pattern generation is the process of generating a (minimal) set of input patterns to stimulate the inputs of a circuit, such that detectable faults can be sensitized and their effects propagated to the output. The process can be done in two phases: (1) derivation of a test, and (2) application of a test. For (1), appropriate models for the circuit (gate or transistor level) and for the faults must be chosen. The test must be constructed so that the output signal from a faulty circuit differs from that of a good circuit. This can be computationally very expensive, but the task is performed offline and only once, at the end of the design stage. A test set can be generated either by algorithmic methods


(with or without heuristics), or by pseudo-random methods. On the other hand, for (2), a test is subsequently applied many times to each integrated circuit and thus must be efficient both in space (storage requirements for the patterns) and in time. The main considerations in evaluating a test set are: (i) the time to construct a minimal test set; (ii) the size of the test set; (iii) the time involved to carry out the test; and (iv) the equipment required (if external). Most algorithmic test pattern generators are based on the concept of sensitized paths.

The sensitized path method is a heuristic approach to generating tests for general combinational logic networks. The circuit is assumed to have only a single fault in it. The method consists of two parts:

1. The creation of a SENSITIZED PATH from the fault to the primary output. This involves assigning logic values to the gate inputs in the path from the fault site to a primary output, such that the fault effect is propagated to the output.

2. The JUSTIFICATION operation, where the assignments made to gate inputs on the sensitized path are traced back to the primary inputs. This may require several backtracks and iterations.

In the case of sequential circuits the same logic is applied, but before that the sequential elements are explicitly driven to a required state using scan-based design-for-test (DFT) circuitry [1,24]. The best-known algorithms are the D-algorithm, PODEM, and FAN [1,24]. Three steps can be identified in most automatic test pattern generation (ATPG) programs: (a) listing the signals on the inputs of a gate controlling the line on which a fault should be detected; (b) determining the primary input conditions necessary to obtain these signals (back propagation) and sensitizing the path to the primary outputs such that the signals and faults can be observed; (c) repeating this procedure until all detectable faults in a given fault set have been covered.
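As a toy illustration of test derivation, a test for a single stuck-at fault can be found by exhaustively comparing a good circuit with a faulty copy; a pattern is a test exactly when the two outputs differ. The circuit z = (x1 AND x2) OR (x3 AND x4) is an assumed example for illustration only (real ATPG algorithms avoid exhaustive search):

```python
from itertools import product

def good(x1, x2, x3, x4):
    # assumed example circuit: z = (x1 AND x2) OR (x3 AND x4)
    return (x1 & x2) | (x3 & x4)

def faulty(x1, x2, x3, x4):
    # the same circuit with input line x1 stuck-at-0
    return (0 & x2) | (x3 & x4)

def find_tests():
    """Exhaustively find all input patterns whose good/faulty outputs differ."""
    return [v for v in product((0, 1), repeat=4) if good(*v) != faulty(*v)]

tests = find_tests()
# A pattern detects x1 s-a-0 only if it sets x1=1, x2=1 (sensitizing the fault)
# and x3 AND x4 = 0 (so the OR gate propagates the effect to the output z).
assert tests == [(1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0)]
```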

6.4 ATPG for Hardware-Software Covalidation

Several automatic test generation (ATG) approaches have been developed, which vary in the class of search algorithm used, the fault model assumed, the search space technique used, and the design abstraction level used. In order to perform test generation for the entire system, both hardware and software component behaviors must be described in a uniform manner. Although many behavioral formats are possible, ATG approaches have focused on CDFG and FSM behavioral models.

Two classes of search algorithms have been explored, fault directed and coverage directed. Fault-directed techniques successively target a specific fault and construct a test sequence to detect that fault. Each new test sequence is merged with the current test sequence (typically through concatenation), and the resulting fault coverage is evaluated to determine if test generation is complete. Fault-directed algorithms have the advantage that they are complete, in the sense that a test sequence will be found for a fault if a test sequence exists, assuming that sufficient CPU time is allowed. For test generation, each CDFG path can be associated with a set of constraints which must be satisfied to traverse the path. Because the operations found in a hardware-software description can be either boolean or arithmetic, the solution method chosen must be able to handle both types of operations. Constraint logic programming (CLP) techniques [27] are capable of handling a broad range of constraints, including non-linear constraints on both boolean and arithmetic variables. State machine testing has been accomplished by defining a transition tour, which is a path that traverses each state machine transition at least once [26]. Transition tours have been generated by iteratively improving an existing partial tour by


concatenating onto it the shortest path to an uncovered transition [26]. A significant limitation of state machine test generation techniques is the time complexity of the state enumeration process performed during test generation.

Coverage-directed algorithms seek to improve coverage without targeting any specific fault. These algorithms heuristically modify an existing test set to improve total coverage, and then evaluate the fault coverage produced by the modified test set. If the modified test set corresponds to an improvement in fault coverage, the modification is accepted; otherwise, the modification is either rejected or another heuristic is used to determine its acceptability. The modification method is typically either random or directed random. An example of such a technique is presented in [25], which uses a genetic algorithm to successively improve the population of test sequences.

7. Embedded Software Testing

7.1 Software Unit Testing

The unit module is either an isolated function or a class. Unit testing is done by the development team, typically the developer, and is usually done in peer-review mode. Test data/test cases are developed based on the specification of the module. A test case consists of either:

•  Data-intensive testing: applying a large range of data variation for function parameter values, or
•  Scenario-based testing: exercising different method invocation sequences to perform all possible use cases as found in the requirements.

Points of Observation are returned value parameters, object property assessments, and source code coverage. Since it is not easy to track down trivial errors in a complex embedded system, every effort should be made to locate and remove them at the unit-test level.

7.2 Software Integration Testing

All the unit modules are integrated together. Now the module to be tested is a set of functions or a cluster of classes. The essence of integration testing is the validation of the interface. The same type of Points of Control applies as for unit testing (data-intensive main function calls or method-invocation sequences), while Points of Observation focus on interactions between lower-level modules using information flow diagrams.

First, performance tests can be run that should provide a good indication of the validity of the architecture. As for functional testing, the earlier the better. Each subsequent step will then include performance testing. White-box testing is also the method used during this step. Therefore, software integration testing is the responsibility of the developer.

7.3 Software Validation Testing

This can be considered one of the activities that occur toward the end of each software integration. Partial use-case instances, also called partial scenarios, begin to drive the test implementation. The test implementation is less aware of and influenced by the implementation details of the module. Points of Observation include resource usage evaluation, since the module


is a significant part of the overall system. This is considered white-box testing. Therefore, software validation testing is also the responsibility of the developer.

7.4 System Unit Testing

Now the module to be tested is a full system that consists of the user code as tested during software validation testing plus all real-time operating system (RTOS) and platform-related pieces such as tasking mechanisms, communications, interrupts, and so on. The Point of Control protocol is no longer a call to a function or a method invocation, but rather a message sent/received using the RTOS message queues, for example. Test scripts usually bring the module under test into the desired initial state; then generate ordered sequences of samples of messages; and validate messages received by comparing (1) message content against expected messages and (2) time of reception against timing constraints. The test script is distributed and deployed over the various virtual testers. System resources are monitored to assess the system's ability to sustain embedded system execution. For this aspect, grey-box testing is the preferred testing method. In most cases, only knowledge of the interface to the module is required to implement and execute appropriate tests. Depending on the organization, system unit testing is either the responsibility of the developer or of a dedicated system integration team.
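The two validation checks described above, (1) message content against expectations and (2) reception time against timing constraints, can be sketched as a test oracle (the message format, names, and deadline values are assumptions, not tied to any particular RTOS):

```python
# Sketch of a system-unit-test oracle: each received message is checked for
# (1) expected content and (2) arrival within its timing constraint.
# The (payload, timestamp) message format and deadlines are illustrative.

def validate(received, expected):
    """received: list of (payload, timestamp); expected: list of (payload, deadline).
    Returns a list of failure records, empty if the run passes."""
    failures = []
    for i, ((got, t), (want, deadline)) in enumerate(zip(received, expected)):
        if got != want:
            failures.append((i, 'content', got, want))      # wrong message content
        if t > deadline:
            failures.append((i, 'timing', t, deadline))     # missed the deadline
    return failures

expected = [('ACK', 5.0), ('DATA', 10.0)]
# On-time, correct messages pass ...
assert validate([('ACK', 4.2), ('DATA', 9.1)], expected) == []
# ... while a late second message is flagged as a timing failure.
assert validate([('ACK', 4.2), ('DATA', 11.5)], expected) == [(1, 'timing', 11.5, 10.0)]
```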

7.5 System Integration Testing

The module to be tested starts from a set of components within a single node and eventually encompasses all system nodes, up to a set of distributed nodes. The Points of Control and Observation (PCOs) are a mix of RTOS- and network-related communication protocols, such as RTOS events and network messages. In addition to a component, a Virtual Tester can also play the role of a node. As for software integration, the focus is on validating the various interfaces. Grey-box testing is the preferred testing method. System integration testing is typically the responsibility of the system integration team.

7.6 System Validation Testing

The module to be tested is now a complete implementation subsystem or the complete embedded system. The objectives of this final aspect are several:

•  Meet external-actor functional requirements. Note that an external actor might be a device in a telecom network (say, if our embedded system is an Internet router), a person (if the system is a consumer device), or both (an Internet router that can be administered by an end user).
•  Perform final non-functional testing, such as load and robustness testing. Virtual testers can be duplicated to simulate load, and can be programmed to generate failures in the system.
•  Ensure interoperability with other connected equipment. Check conformance to applicable interconnection standards. Going into the details of these objectives is beyond the scope of this article. Black-box testing is the preferred method: the tester typically concentrates on both frequently used and potentially risky or dangerous use-case instances.


8. Interaction Testing Technique between Hardware and Software in Embedded Systems

In embedded systems, where hardware and software are combined, unexpected situations can occur owing to interaction faults between hardware and software. As the functions of embedded systems get more complicated, it gets more difficult to detect the faults that cause such troubles. Hence, the fault injection technique is strongly recommended, in that it observes system behavior by injecting faults into the target system so as to detect interaction faults between hardware and software in an embedded system.

The test data selection technique discussed in [21] first simulates the behavior of the embedded system as a software program derived from the requirement specification. Then hardware faults, after being converted to software faults, are injected into the simulated program. Finally, effective test data are selected to detect faults caused by the interactions between hardware and software.
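A minimal software-implemented fault injection sketch along these lines follows (the sensor scenario, names, and the stuck-at-zero fault are invented for illustration):

```python
# Sketch of software-implemented fault injection: a hardware fault (a sensor
# register stuck at 0) is converted into a software fault and injected into the
# simulated program; test data are judged by whether they expose the fault.

def controller(read_sensor):
    """Simulated embedded control function: open the valve above a threshold."""
    return 'OPEN' if read_sensor() > 50 else 'CLOSED'

def make_sensor(value, stuck_at_zero=False):
    # The injected fault models the hardware defect in software.
    return (lambda: 0) if stuck_at_zero else (lambda: value)

def detects_fault(sensor_value):
    """A test datum is effective if good and faulty behaviors differ."""
    good = controller(make_sensor(sensor_value))
    bad = controller(make_sensor(sensor_value, stuck_at_zero=True))
    return good != bad

assert detects_fault(80) is True    # effective test datum: behaviors diverge
assert detects_fault(20) is False   # ineffective: the fault stays silent
```

The selection step in [21] amounts to keeping only the test data for which the injected fault changes the observable behavior, as `detects_fault` illustrates.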

9. Conclusion

Rapid advances in test development techniques are needed to reduce the test cost of million-gate SOC devices. In this chapter a number of state-of-the-art techniques for testing embedded systems have been discussed. Modular test techniques for digital, mixed-signal, and hierarchical SOCs must develop further to keep pace with design complexity and integration density. The test data bandwidth needs of analog cores are significantly different from those of digital cores; therefore, unified top-level testing of mixed-signal SOCs remains a major challenge. This chapter also described a granularity-based embedded software testing technique.

References

[1]  M. L. Bushnell and V. D. Agrawal, "Essentials of Electronic Testing", Kluwer Academic Publishers, Norwell, MA, 2000.
[2]  E. A. Lee, "What's Ahead for Embedded Software?", IEEE Computer, pp. 18–26, September 2000.
[3]  E. A. Lee, "Computing for embedded systems", in Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Budapest, Hungary, May 2001.
[4]  Semiconductor Industry Association, "International Technology Roadmap for Semiconductors, 2001 Edition", http://public.itrs.net/Files/2001ITRS/Home.html
[5]  Y. Zorian, E. J. Marinissen, and S. Dey, "Testing Embedded-Core Based System Chips", IEEE Computer, vol. 32, pp. 52–60, 1999.
[6]  M.-C. Hsueh, T. K. Tsai, and R. K. Iyer, "Fault Injection Techniques and Tools", IEEE Computer, pp. 75–82, April 1997.
[7]  V. Encontre, "Testing Embedded Systems: Do You Have The GuTs for It?", www-128.ibm.com/developerworks/rational/library/content/03July/1000/1050/1050.pdf
[8]  D. D. Gajski and F. Vahid, "Specification and design of embedded hardware-software systems", IEEE Design and Test of Computers, vol. 12, pp. 53–67, 1995.
[9]  S. Dey, A. Raghunathan, and K. D. Wagner, "Design for testability techniques at the behavioral and register-transfer level", Journal of Electronic Testing: Theory and Applications (JETTA), vol. 13, pp. 79–91, October 1998.
[10]  B. Beizer, "Software Testing Techniques", Second Edition, Van Nostrand Reinhold, 1990.


[11]  G. Al Hayek and C. Robach, "From specification validation to hardware testing: A unified method", in International Test Conference, pp. 885–893, October 1996.
[12]  A. von Mayrhauser, T. Chen, J. Kok, C. Anderson, A. Read, and A. Hajjar, "On choosing test criteria for behavioral level hardware design verification", in High Level Design Validation and Test Workshop, pp. 124–130, 2000.
[13]  L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil, "A formal evaluation of data flow path selection criteria", IEEE Trans. on Software Engineering, vol. SE-15, pp. 1318–1332, 1989.
[14]  S. C. Ntafos, "A comparison of some structural testing strategies", IEEE Trans. on Software Engineering, vol. SE-14, pp. 868–874, 1988.
[15]  J. Laski and B. Korel, "A data flow oriented program testing strategy", IEEE Trans. on Software Engineering, vol. SE-9, pp. 33–43, 1983.
[16]  Q. Zhang and I. G. Harris, "A domain coverage metric for the validation of behavioral VHDL descriptions", in International Test Conference, October 2000.

[17]  D. Moundanos, J. A. Abraham, and Y. V. Hoskote, "Abstraction techniques for validation coverage analysis and test generation", IEEE Transactions on Computers, vol. 47, pp. 2–14, January 1998.
[18]  N. Malik, S. Roberts, A. Pita, and R. Dobson, "Automaton: an autonomous coverage-based multiprocessor system verification environment", in IEEE International Workshop on Rapid System Prototyping, pp. 168–172, June 1997.
[19]  K.-T. Cheng and A. S. Krishnakumar, "Automatic functional test bench generation using the extended finite state machine model", in Design Automation Conference, pp. 1–6, 1993.
[20]  J. P. Bergmann and M. A. Horowitz, "Improving coverage analysis and test generation for large designs", in International Conference on Computer-Aided Design, pp. 580–583, 1999.
[21]  A. Sung and B. Choi, "An Interaction Testing Technique between Hardware and Software in Embedded Systems", in Proceedings of the Ninth Asia-Pacific Software Engineering Conference, 4–6 Dec. 2002, pp. 457–464.

[22]  IEEE P1500 Web Site, http://grouper.ieee.org/groups/1500/.
[23]  H. Al-Asaad, B. T. Murray, and J. P. Hayes, "Online BIST for embedded systems", IEEE Design & Test of Computers, vol. 15, no. 4, pp. 17–24, Oct.–Dec. 1998.
[24]  M. Abramovici, M. A. Breuer, and A. D. Friedman, "Digital Systems Testing and Testable Design", IEEE Press, 1990.
[25]  F. Corno, M. Sonza Reorda, G. Squillero, A. Manzone, and A. Pincetti, "Automatic test bench generation for validation of RT-level descriptions: an industrial experience", in Design Automation and Test in Europe, pp. 385–389, 2000.
[26]  R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, "Architecture validation for processors", in International Symposium on Computer Architecture, pp. 404–413, 1995.
[27]  P. Van Hentenryck, "Constraint Satisfaction in Logic Programming", MIT Press, 1989.

Problems

1.  How does testing differ from verification?
2.  What is an embedded system? Define hard real-time and soft real-time systems with examples.
3.  Why is testing an embedded system difficult?
4.  How does hardware testing differ from software testing?


5.  What is co-testing?

6.  Distinguish between defects, errors, and faults with examples.
7.  Calculate the total number of single and multiple stuck-at faults for a logic circuit with n lines.
8.  For the circuit shown in Fig. P1, which of the following tests detect the fault x1 s-a-0?

a)  (0,1,1,1)
b)  (1,0,1,1)
c)  (1,1,0,1)
d)  (1,0,1,0)

[Figure: a combinational circuit with inputs x1, x2, x3, x4 and output z.]

Fig. P1

9.  Define the following fault models using examples where possible:

a)  Single and multiple stuck-at fault
b)  Bridging fault
c)  Stuck-open and stuck-short fault
d)  Operational fault
10.  What is meant by a co-validation fault model?
11.  Describe different software fault models.
12.  Describe the basic structure of the core-based testing approach for embedded systems.
13.  What is concurrent or on-line testing? How does it differ from non-concurrent testing?
14.  Define error coverage, error latency, space redundancy, and time redundancy in the context of on-line testing.
15.  What is a test vector? How are test vectors generated? Describe different techniques for test pattern generation.
16.  Define the following for software testing:
a)  Software unit testing
b)  Software integration testing
c)  Software validation testing
d)  System unit testing
e)  System integration testing
f)  System validation testing



Lesson 39

Design for Testability


Instructional Objectives

After going through this lesson the student would be able to

•  Explain the meaning of the term 'Design for Testability' (DFT)
•  Describe some ad hoc and some formal methods of incorporating DFT in a system-level design
•  Explain the scan-chain based method of DFT
•  Highlight the advantages and disadvantages of scan-based designs and discuss alternatives

Design for Testability

1. Introduction

The embedded system is an information processing system that consists of hardware and software components. Nowadays, the number of embedded computing systems in areas such as telecommunications, automotive electronics, office automation, and military applications is steadily growing. This market expansion arises from greater memory densities as well as improvements in embeddable processor cores, intellectual-property modules, and sensing technologies. At the same time, these improvements have increased the amount of software needed to manage the hardware components, leading to a higher level of system complexity. Designers can no longer develop high-performance systems from scratch but must use sophisticated system modeling tools.

The increased complexity of embedded systems and the reduced access to internal nodes have made it not only more difficult to diagnose and locate faulty components, but also harder to measure the functions of embedded components. Creating testable designs is key to developing complex hardware and/or software systems that function reliably throughout their operational life. Testability can be defined with respect to a fault. A fault is testable if there exists a well-specified procedure (e.g., test pattern generation, evaluation, and application) to expose it, and the procedure is implementable with a reasonable cost using current technologies. Testability of the fault therefore represents the inverse of the cost of detecting the fault. A circuit is testable with respect to a fault set when each and every fault in this set is testable.

Design-for-testability techniques improve the controllability and observability of internal nodes, so that embedded functions can be tested. Two basic properties determine the testability of a node: 1) controllability, which is a measure of the difficulty of setting internal circuit nodes to 0 or 1 by assigning values to primary inputs (PIs), and 2) observability, which is a measure of the difficulty of propagating a node's value to a primary output (PO) [1-3]. A node is said to be testable if it is easily controlled and observed. For sequential circuits, some have added predictability, which represents the ability to obtain known output values in response to given input stimuli. The factors affecting predictability include initializability, races, hazards, oscillations, etc. DFT techniques include analog test busses and scan methods. Testability can also be improved with BIST circuitry, where signal generators and analysis circuitry are implemented on chip [1, 3-4]. Without testability, design flaws may escape detection until a


product is in the hands of users; equally, operational failures may prove difficult to detect and diagnose.

Increased embedded system complexity makes thorough assessment of system integrity by testing external black-box behavior almost impossible. System complexity also complicates test equipment and procedures. Design for testability should increase a system's testability, resulting in improved quality while reducing time to market and test costs.

Traditionally, hardware designers and test engineers have focused on proving the correct manufacture of a design and on locating and repairing field failures. They have developed several highly structured and effective solutions to this problem, including scan design and self-test. Design verification has been a less formal task, based on the designer's skills. However, designers have found that structured design-for-test features aiding manufacture and repair can significantly simplify design verification. These features reduce verification cycles from weeks to days in some cases.

In contrast, software designers and test engineers have targeted design validation and verification. Unlike hardware, software does not break during field use. Design errors, rather than incorrect replication or wear-out, cause operational bugs. Efforts have focused on improving specifications and programming styles rather than on adding explicit test facilities. For example, modular design, structured programming, formal specification, and object orientation have all proven effective in simplifying test.

Although these different approaches are effective when we can cleanly separate a design's hardware and software parts, problems arise when boundaries blur. For example, in the early design stages of a complex system, we must define system-level test strategies. Yet, we may not have decided which parts to implement in hardware and which in software. In other cases, software running on general-purpose hardware may initially deliver certain functions that we subsequently move to firmware or hardware to improve performance. Designers must ensure a testable, finished design regardless of implementation decisions. Supporting hardware-software codesign requires "cotesting" techniques, which draw hardware and software test techniques together into a cohesive whole.

2. Design for Testability Techniques

Design for testability (DFT) refers to those design techniques that make the task of subsequent testing easier. There is definitely no single methodology that solves all embedded system-testing problems. There is also no single DFT technique that is effective for all kinds of circuits. DFT techniques can largely be divided into two categories, i.e., ad hoc techniques and structured (systematic) techniques.

DFT methods for digital circuits:

•  Ad-hoc methods
•  Structured methods:
•  Scan
•  Partial scan
•  Built-in self-test (discussed in Lesson 34)
•  Boundary scan (discussed in Lesson 34)


•  Monostables and self-resetting logic should be avoided. A monostable (one-shot) multivibrator produces a pulse of constant duration in response to the rising or falling transition of the trigger input. Its pulse duration is usually controlled externally by a resistor and a capacitor (with current technology, they can also be integrated on chip). One-shots are used mainly for 1) pulse shaping, 2) switch-on delays, 3) switch-off delays, 4) signal delays. Since a one-shot is not controlled by clocks, synchronization and precise duration control are very difficult, which in turn reduces testability by ATE. Counters and dividers are better candidates for delay control.
•  Redundant gates must be avoided.
•  High fanin/fanout combinations must be avoided, as a large fan-in makes the inputs of the gate difficult to observe and makes the gate output difficult to control.
•  Gated clocks should be avoided. These degrade the controllability of circuit nodes.

  There is a lack of experts and tools.

  Test generation is often manual

  This method cannot guarantee for high fault coverage.

  It may increase design iterations.

  This is not suitable for large circuits

2.2 Scan Design Approaches for DFT

2.2.1 Objectives of Scan Design

•  Scan design is implemented to provide controllability and observability of internal state variables for testing a circuit.
•  It is also effective for circuit partitioning.
•  A scan design with full controllability and observability turns the sequential test problem into a combinational one.

2.2.2 Scan Design Requirements

•  The circuit is designed using pre-specified design rules.
•  Test structure (hardware) is added to the verified design:
•  One (or more) test control (TC) pin at the primary input is required.
•  Flip-flops are replaced by scan flip-flops (SFFs) and are connected so that they behave as a shift register in the test mode. The output of one SFF is connected to the input of the next SFF. The input of the first flip-flop in the chain is directly connected to an input pin (denoted SCANIN), and the output of the last flip-flop is directly connected to an output pin (denoted SCANOUT). In this way, all the flip-flops can be loaded with a known value, and their value can be easily


accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the scan insertion operation.

•  The input/output of each scan shift register must be available on PI/PO.
•  Combinational ATPG is used to obtain tests for all testable faults in the combinational logic.
•  Shift register tests are applied, and ATPG tests are converted into scan sequences for use in manufacturing test.

[Figure: primary inputs and primary outputs connect to the combinational logic; three scan flip-flops (SFFs), driven by CLK and the test control signal TC, form a chain from SCANIN to SCANOUT.]

Fig. 39.1 Scan structure to a design

Fig. 39.1 shows a scan structure connected to a design. The scan flip-flops (FFs) must be interconnected in a particular way. This approach effectively turns the sequential testing problem into a combinational one, and the circuit can be fully tested by compact ATPG patterns. Unfortunately, there are two types of overhead associated with this technique that designers care about very much: hardware overhead (including three extra pins, multiplexers for all FFs, and extra routing area) and performance overhead (including multiplexer delay and FF delay due to extra load).

2.2.3 Scan Design Rules

•  Only clocked D-type master-slave flip-flops should be used for all state variables.
•  At least one PI pin must be available for test. It is better if more pins are available.
•  All clock inputs to flip-flops must be controlled from primary inputs (PIs); there must be no gated clocks. This is necessary for the FFs to function as a scan register.
•  Clocks must not feed the data inputs of flip-flops. A violation of this can lead to a race condition in the normal mode.


2.2.4 Scan Overheads

The use of scan design produces two types of overheads. These are area overhead and  performance overhead. The scan hardware requires extra area and slows down the signals.

  IO pin overhead: At least one primary pin necessary for test.

  Area overhead: Gate overhead = [4 nsff / (ng + 10 nff)] x 100%, where ng = number of combinational gates, nff = number of flip-flops, and nsff = number of scan flip-flops. For full scan, the number of scan flip-flops equals the number of original circuit flip-flops. Example: ng = 100k gates, nff = 2k flip-flops, overhead = [4 x 2000 / (100000 + 20000)] x 100% = 6.7%. For a more accurate estimate, scan wiring and layout area must also be taken into consideration.

  Performance overhead: The multiplexer of the scan flip-flop adds two gate delays to the combinational path. The fanout of each flip-flop also increases by 1, which can increase the clock period.
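The gate-overhead estimate can be mechanized directly from the formula above; a small sketch (the function name and the full-scan default are illustrative, not from the text):

```python
def scan_gate_overhead(ng, nff, nsff=None):
    """Scan gate overhead in percent: [4*nsff / (ng + 10*nff)] * 100.

    ng: number of combinational gates, nff: number of flip-flops,
    nsff: number of scan flip-flops (defaults to nff, i.e. full scan).
    """
    if nsff is None:
        nsff = nff  # full scan: every flip-flop is replaced by an SFF
    return 4 * nsff / (ng + 10 * nff) * 100

# The worked example from the text: 100k gates, 2k flip-flops.
print(round(scan_gate_overhead(ng=100_000, nff=2_000), 1))  # 6.7 (%)
```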

2.3 Scan Variations

There have been many variations of scan as listed below, few of these are discussed here.

  MUXed Scan

  Scan path

  Scan-Hold Flip-Flop

  Serial scan

  Level-Sensitive Scan Design (LSSD)

  Scan set
  Random access scan

2.3.1 MUX Scan

  It was invented at Stanford in 1973 by M. Williams & Angell.

  In this approach a MUX is inserted in front of each FF to be placed in the scan chain. 


 

2.3.2 Scan Path

[Figure omitted] Fig. 39.3 Logic diagram of the two-port raceless D-FF: two latches L1 and L2 with system clock C1 and scan clock C2; data input DI and scan input SI; data output DO and scan output SO.

  This approach gives a lower hardware overhead (due to the dense layout) and less performance penalty (due to the removal of the MUX in front of the FF) compared to the MUX scan approach. The real figures, however, depend on the circuit style and technology selected, and on the physical implementation.

2.3.3 Level-Sensitive Scan Design (LSSD)

  This approach was introduced by Eichelberger and T. Williams in 1977 and 1978.

  It is a latch-based design used at IBM.

  It guarantees race-free and hazard-free system operation as well as testing.

  It is insensitive to component timing variations such as rise time, fall time, and delay. It is

faster and has a lower hardware complexity than SR modification.

  It uses two latches (one for normal operation and one for scan) and three clocks. Furthermore, to enjoy the luxury of race-free and hazard-free system operation and test, the designer has to follow a set of complicated design rules.

  A logic circuit is level sensitive (LS) iff the steady-state response to any allowed input change is independent of the delays within the circuit. Also, the response is independent of the order in which the inputs change.


 

[Figure omitted] Fig. 39.4 A polarity-hold latch: data input D, clock C, output L; while C = 1 the latch follows D, and while C = 0 it holds its value (the accompanying excitation table lists the next value of L as a function of D, C, and L).

[Figure omitted] Fig. 39.5 The polarity-hold shift-register latch (SRL): latch L1 receives system data DI under clock C and scan data SI under clock A; clock B transfers the contents of L1 into L2; the outputs are +L1 and +L2.

LSSD requires that the circuit be LS, so we need LS memory elements as defined above. Figure 39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) is not dependent on the rise/fall time of C, but only on C being '1' for a period of time greater than or equal to the data propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL) used in LSSD as the scan cell.

The scan cell is controlled in the following way:

• Normal mode: A = B = 0, C = 0 → 1.

• SR (test) mode: C = 0, AB = 10 → 01 to shift SI through L1 and L2.

Advantages of LSSD

1. Correct operation independent of AC characteristics is guaranteed.
2. The FSM is reduced to combinational logic as far as testing is concerned.
3. Hazards and races are eliminated, which simplifies test generation and fault simulation.


 

Drawbacks of LSSD

1. Complex design rules are imposed on designers. There is no freedom to vary from the overall schemes. It increases the design complexity and hardware costs (4-20% more hardware and 4 extra pins).

2. Asynchronous designs are not allowed in this approach.

3. Sequential routing of latches can introduce irregular structures.

4. Faults changing the combinational function to a sequential one may cause trouble, e.g., bridging and CMOS stuck-open faults.

5. Test application becomes a slow process, and normal-speed testing of the entire test sequence is impossible.

6. It is not good for memory intensive designs.

2.3.4 Random Access Scan

  This approach was developed by Fujitsu and was used by Fujitsu, Amdahl, and TI.

  It uses an address decoder. By using an address decoder, we can select a particular FF and either set it to any desired value or read out its value. Figure 39.6 shows the random access structure and Figure 39.7 shows the RAM cell [1, 6-7].

[Figure omitted] Fig. 39.6 The random access structure: combinational logic with PIs and POs; the state flip-flops form an addressable array (nff bits) of scan cells selected through an address decoder (log2 nff address bits), with SCANIN, SCANOUT, TC, and CK signals.


 

[Figure omitted] Fig. 39.7 The RAM cell: a scan flip-flop with data input D from the combinational logic and output Q to the combinational logic; scan data SD is written from SCANIN and read out to SCANOUT under control of the cell select (SE), TC, and CK signals.

  The difference between this approach and the previous ones is that the state vector can now be accessed in a random sequence. Since neighboring patterns can be arranged so that they differ in only a few bits, and only a few response bits need to be observed, the test application time can be reduced.

  In this approach test length is reduced.

  This approach provides the ability to `watch' a node in normal operation mode, which isimpossible with previous scan methods.

  This is suitable for delay and embedded memory testing.
  The major disadvantage of the approach is high hardware overhead due to the address decoder, the gates added to the SFFs, the address register, extra pins, and routing.

2.3.5 Scan-Hold Flip-Flop

  A special type of scan flip-flop with an additional latch, designed for low-power testing applications.

  It was proposed by DasGupta et al. [5]. Figure 39.8 shows a hold latch cascaded with the SFF.

  The control input HOLD keeps the output steady at previous state of flip-flop.

  For HOLD = 0, the latch holds its state and for HOLD = 1, the hold latch becomestransparent.

  For normal mode operation, TC = HOLD =1 and for scan mode, TC = 1 and Hold = 0.

  Hardware overhead increases by about 30% due to the extra hardware of the hold latch.

  This approach reduces power dissipation and isolates the asynchronous part during scan.

  It is suitable for delay test [8].


 

[Figure omitted] Fig. 39.8 Scan-hold flip-flop (SHFF): an SFF (inputs D, SD, TC, CK) followed by a hold latch controlled by the HOLD signal; the latch outputs Q and Q' feed the logic, and the scan output drives the SD input of the next SHFF.

Partial Scan Design

  In this approach only a subset of the flip-flops is scanned. The main objectives of this approach are to minimize the area overhead and the scan sequence length while still achieving the required fault coverage.

  In this approach sequential ATPG is used to generate test patterns. Sequential ATPG has a number of difficulties, such as poor initializability and poor controllability and observability of the state variables. The number of gates, the number of FFs, and the sequential depth give little idea regarding testability, and the presence of cycles makes testing difficult. Therefore the sequential circuit must be simplified in such a way that test generation becomes easier.

  Removal of selected flip-flops from scan improves performance and allows limited scan design rule violations.

  It also allows automation in scan flip-flop selection and test generation.

  Figure 39.9 shows a design using partial scan architecture [1].

  Sequential depth is calculated as the maximum number of FFs encountered from a PI line to a PO line.


References

[1]  M. L. Bushnell and V. D. Agrawal, “Essentials of Electronic Testing”, Kluwer Academic Publishers, Norwell, MA, 2000.

[2]  M. Abramovici, M. A. Breuer, and A. D. Friedman, “Digital Systems Testing and Testable Design”, IEEE Press, 1990.

[3]  V. D. Agrawal, C. R. Kime, and K. K. Saluja, “A Tutorial on Built-In Self-Test, Part 1: Principles,” IEEE Design and Test of Computers, Vol. 10, No. 1, Mar. 1993, pp. 73-82.

[4]  V. D. Agrawal, C. R. Kime, and K. K. Saluja, “A Tutorial on Built-In Self-Test, Part 2: Applications,” IEEE Design and Test of Computers, Vol. 10, No. 2, June 1993, pp. 69-77.

[5]  S. DasGupta, R. G. Walther, and T. W. Williams, “An Enhancement to LSSD and Some Applications of LSSD in Reliability,” in Proc. of the International Fault-Tolerant Computing Symposium.

[6]  B. R. Wilkins, Testing Digital Circuits, An Introduction, Berkshire, UK: Van Nostrand Reinhold, 1986.

[7]  T. W. Williams, editor, VLSI Testing, Amsterdam, The Netherlands: North-Holland, 1986.

[8]  A. Krstic and K.-T. Cheng, Delay Fault Testing for VLSI Circuits, Boston: Kluwer Academic Publishers, 1998.

Review Questions

1.  What is Design-for-Testability (DFT)? What are the different kinds of DFT techniques used for digital circuit testing?

2.  What guidelines must be followed for ad-hoc testing? Describe the drawbacks of ad-hoc testing.

3.  Describe a full scan structure implemented in a digital design. What are the scan overheads?

4.  Suppose that your chip has 100,000 gates and 2,000 flip-flops. A combinational ATPG produced 500 vectors to fully test the logic. A single scan-chain design will require about 10^6 clock cycles for testing. Find the scan test length if 10 scan chains are implemented. Given that the circuit has 10 PIs and 10 POs, and only one extra pin can be added for test, how much more gate overhead will be needed for the new design?

5.  For a circuit with 100,000 gates and 2,000 flip-flops connected in a single chain, what will be the gate overhead for a scan design where scan-hold flip-flops are used?

6.  Calculate the syndromes for the carry and sum outputs of a full adder cell. Determine whether there is any single stuck fault on any input for which one of the outputs is syndrome-untestable. If there is, suggest an implementation, possibly with added inputs, which makes the cell syndrome-testable.

7.  Describe the operation of a level-sensitive scan design implemented in a digital design. What are the design rules to be followed to make the design race-free and hazard-free? What are the advantages and disadvantages of LSSD?


 

8.  Consider the random-access scan architecture. How would you organize the test data to minimize the total test time? Describe a simple heuristic for ordering these data.

9.  Make a comparison of different scan variations in terms of scan overhead.

10. Consider the combinational circuit below, which has been partitioned into 3 cones (two CONE X's and one CONE Y) and one Exclusive-OR gate.

[Figure omitted] Circuit diagram: primary inputs A, B, C, D, E; two copies of CONE X (with inputs A/G, B/H, C/F), one CONE Y (with inputs C, D, E), and an Exclusive-OR gate; internal signals F, G, H; outputs J and K.

For those two cones, we have the following information.

•  CONE X has a structure which can be tested 100% by using the following 4 vectors and its output is also specified.

A/G  B/H  C/F  OUTPUT
0    0    1    0
0    1    1    0
1    1    0    1
1    0    0    1

•  CONE Y has a structure which can be tested 100% by using the following 4 vectors and its output is also specified.

C D E OUTPUT

0 0 1 0

0 1 0 1

1 0 1 1

1 1 1 0

Derive a smallest test set to test this circuit so that each partition is applied the required 4 test vectors. Also, the XOR gate should be exhaustively tested.


 

Fill in the blank entries below. (You may not add additional vectors).

A B C D E F G H J K

0 0 1 1 0

0 1 1 0

1 1 0 1 1

1 0 0 1


Lesson

40 Built-In-Self-Test (BIST) for Embedded Systems


Instructional Objectives

After going through this lesson the student would be able to

•  Explain the meaning of the term ‘Built-in Self-Test (BIST)’

•  Identify the main components of BIST functionality

•  Describe the various methods of test pattern generation for designing embedded systems with BIST

•  Define what a Signature Analysis Register is and describe some methods of designing such units

•  Explain what a Built-in Logic Block Observer (BILBO) is and describe how to use this block for designing BIST

Built-In-Self-Test (BIST) for Embedded Systems

1. Introduction

BIST is a design-for-testability technique that places the testing functions physically with the circuit under test (CUT), as illustrated in Figure 40.1 [1]. The basic BIST architecture requires the addition of three hardware blocks to a digital circuit: a test pattern generator, a response analyzer, and a test controller. The test pattern generator generates the test patterns for the CUT. Examples of pattern generators are a ROM with stored patterns, a counter, and a linear feedback shift register (LFSR). A typical response analyzer is a comparator with stored responses or an LFSR used as a signature analyzer. It compacts and analyzes the test responses to determine correctness of the CUT. A test control block is necessary to activate the test and analyze the responses. However, in general, several test-related functions can be executed through a test controller circuit.

[Figure omitted] Fig. 40.1 A typical BIST architecture: a hardware pattern generator feeds the CUT through a MUX (selected by the Test signal, with the PIs as the other input); the CUT outputs go to the POs and to an output response compactor, whose signature is compared against a reference signature from a ROM; the test controller sequences the test and reports Good/Faulty.

As shown in Figure 40.1, the wires from primary inputs (PIs) to the MUX and the wires from the circuit output to primary outputs (POs) cannot be tested by BIST. In normal operation, the CUT receives its inputs from other modules and performs the function for which it was designed. During test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT,


and the test responses are evaluated by an output response compactor. In the most common type of BIST, test responses are compacted in the output response compactor to form (fault) signatures. The response signatures are compared with reference golden signatures generated or stored on-chip, and the error signal indicates whether the chip is good or faulty.

Four primary parameters must be considered in developing a BIST methodology for embedded systems; these correspond with the design parameters for on-line testing techniques discussed in an earlier chapter [2].

  Fault coverage: This is the fraction of faults of interest that can be exposed by the test patterns produced by the pattern generator and detected by the output response monitor. In the presence of input bit stream errors there is a chance that the computed signature matches the golden signature and the circuit is reported as fault-free. This undesirable property is called masking or aliasing.

  Test set size: This is the number of test patterns produced by the test generator, and is closely linked to fault coverage: generally, large test sets imply high fault coverage.

  Hardware overhead: The extra hardware required for BIST is considered to be overhead. In most embedded systems, high hardware overhead is not acceptable.

  Performance overhead: This refers to the impact of BIST hardware on normal circuit performance, such as its worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.

Issues for BIST

  Area overhead: Additional active area due to the test controller, pattern generator, response evaluator, and testing of the BIST hardware.

  Pin overhead: At least 1 additional pin is needed to activate the BIST operation. The input MUX adds extra pin overhead.

  Performance overhead: Extra path delays are added due to BIST.

  Yield loss increases due to increased chip area.

  Design effort and time increase due to designing the BIST.

  The BIST hardware complexity increases when the BIST hardware is made testable.

Benefits of BIST

  It reduces testing and maintenance cost, as it requires simpler and less expensive ATE.

  BIST significantly reduces the cost of automatic test pattern generation (ATPG).

  It reduces storage and maintenance of test patterns.

  It can test many units in parallel.

  It takes shorter test application times.

  It can test at functional system speed.

BIST can be used for non-concurrent, on-line testing of the logic and memory parts of a system [2]. It can readily be configured for event-triggered testing, in which case the BIST control can be tied to the system reset so that testing occurs during system start-up or shutdown. BIST can also be designed for periodic testing with low fault latency. This requires incorporating a testing process into the CUT that guarantees the detection of all target faults within a fixed time.

On-line BIST is usually implemented with the twin goals of complete fault coverage and low

fault latency. Hence, the test generation (TG) and response monitor (RM) are generally designed


to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable set

size. These goals are met by different techniques in different parts of the system.

TG and RM are often implemented by simple, counter-like circuits, especially linear-feedback shift registers (LFSRs) [3]. The LFSR is simply a shift register formed from standard flip-flops, with the outputs of selected flip-flops being fed back (modulo-2) to the shift register's inputs. When used as a TG, an LFSR is set to cycle rapidly through a large number of its states. These states, whose choice and order depend on the design parameters of the LFSR, define the test patterns. In this mode of operation, an LFSR is seen as a source of (pseudo) random tests that are, in principle, applicable to any fault and circuit types. An LFSR can also serve as an RM by counting (in a special sense) the responses produced by the tests. An LFSR RM's final contents after applying a sequence of test responses forms a fault signature, which can be compared to a known or generated good signature to see if a fault is present. Ensuring that the fault coverage is sufficiently high and the number of tests is sufficiently low are the main problems with random BIST methods. Two general approaches have been proposed to preserve the cost advantages of LFSRs while making the generated test sequence much shorter. Test points can be inserted in the CUT to improve controllability and observability; however, they can also result in performance loss. Alternatively, some determinism can be introduced into the generated test sequence, for example, by inserting specific "seed" tests that are known to detect hard faults.

A typical BIST architecture using an LFSR is shown in Figure 40.2 [4]. Since the output patterns of the LFSR are time-shifted and repeated, they become correlated; this reduces the effectiveness of the fault detection. Therefore a phase shifter (a network of XOR gates) is often used to decorrelate the output patterns of the LFSR. The response of the CUT is usually compacted by a multiple-input signature register (MISR) into a small signature, which is compared with a known fault-free signature to determine whether the CUT is faulty.

[Figure omitted] Fig. 40.2 A generic BIST architecture based on an LFSR, an MISR, and a phase shifter: the LFSR drives n scan chains (l bits each) through the phase shifter, and the scan chain outputs are compacted by the MISR.

2. BIST Test Pattern Generation Techniques

2.1 Stored patterns

An automatic test pattern generation (ATPG) and fault simulation technique is used to generate the test patterns. A good test pattern set is stored in a ROM on the chip. When BIST is activated, test patterns are applied to the CUT and the responses are compared with the corresponding stored patterns. Although stored-pattern BIST can provide excellent fault coverage, it has limited applicability due to its high area overhead.


2.2 Exhaustive patterns

Exhaustive pattern BIST eliminates the test generation process and has very high fault coverage.

To test an n-input block of combinational logic, it applies all 2^n possible input patterns to the block. Even with high clock speeds, the time required to apply the patterns may make exhaustive-pattern BIST impractical for a circuit with n > 20.

[Figure omitted] Fig. 40.3 Exhaustive pattern generator: a binary counter built from D flip-flops (outputs Q1, Q2, Q3) with Clock and Reset inputs.

2.3 Pseudo-exhaustive patterns

In pseudo-exhaustive pattern generation, the circuit is partitioned into several smaller sub-circuits based on the output cones of influence, possibly overlapping blocks with fewer than n inputs. Then all possible test patterns are exhaustively applied to each sub-circuit. The main goal of pseudo-exhaustive test is to obtain the same fault coverage as exhaustive testing and, at the same time, minimize the testing time. Since close to 100% fault coverage is guaranteed, there is no need for fault simulation for exhaustive testing and pseudo-exhaustive testing. However, such a method requires extra design effort to partition the circuits into pseudo-exhaustive testable sub-circuits. Moreover, the delivery of test patterns and test responses is also a major consideration. The added hardware may also increase the overhead and decrease the performance.

[Figure omitted] Fig. 40.4 Pseudo-exhaustive pattern generator: two five-bit binary counters drive the eight inputs X1-X8; counter 1 feeds X1-X3, counter 2 feeds X6-X8, and a 2-bit 2-1 MUX selects which counter drives the shared inputs X4 and X5 (0 for counter 1, 1 for counter 2); output h depends on the cone X1-X5 and output f on the cone X4-X8.


Circuit partitioning for pseudo-exhaustive pattern generation can be done by cone segmentation, as shown in Figure 40.4. Here, a cone is defined as the fan-ins of an output pin. If the size of the largest cone is K, the patterns must guarantee that the patterns applied to any K inputs contain all possible combinations. In Figure 40.4, the total circuit is divided into two cones based on the cones of influence. For cone 1, the PO h is influenced by X1, X2, X3, X4 and X5, while PO f is influenced by inputs X4, X5, X6, X7 and X8. Therefore the total number of test patterns needed for exhaustive testing of cone 1 and cone 2 is (2^5 + 2^5) = 64, but the original circuit with 8 inputs would require 2^8 = 256 test patterns for an exhaustive test.
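The cone arithmetic above can be checked directly; a trivial sketch (the function name is illustrative):

```python
def pseudo_exhaustive_count(cone_sizes):
    """Total patterns when each output cone is tested exhaustively:
    the sum of 2^k over the cone input counts k."""
    return sum(2**k for k in cone_sizes)

# Two 5-input cones (outputs h and f) vs. exhaustive test of all 8 inputs:
print(pseudo_exhaustive_count([5, 5]))  # 64 patterns
print(2**8)                             # 256 patterns
```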

2.4 Pseudo-Random Pattern Generation

A string of 0's and 1's is called a pseudo-random binary sequence when the bits appear to be random in the local sense, but they are in some way repeatable. The linear feedback shift register (LFSR) pattern generator is most commonly used for pseudo-random pattern generation. In general, this requires more patterns than deterministic ATPG, but fewer than the exhaustive test. In contrast with other methods, pseudo-random pattern BIST may require a long test time and necessitate evaluation of fault coverage by fault simulation. This pattern type, however, has the potential for lower hardware and performance overheads and less design effort than the preceding methods. In pseudo-random test patterns, each bit has an approximately equal probability of being a 0 or a 1. The number of patterns applied is typically of the order of 10^3 to 10^7 and is related to the circuit's testability and the fault coverage required.

Linear feedback shift register reseeding [5] is an example of a BIST technique that is based on controlling the LFSR state. LFSR reseeding may be static, that is, the LFSR stops generating patterns while loading seeds, or dynamic, that is, test generation and seed loading can proceed simultaneously. The length of the seed can be either equal to the size of the LFSR (full reseeding) or less than the LFSR (partial reseeding). In [5], a dynamic reseeding technique that allows partial reseeding is proposed to encode test vectors. A set of linear equations is solved to obtain the seeds, and test vectors are ordered to facilitate the solution of this set of linear equations.

[Figure omitted] Fig. 40.5 Standard linear feedback shift register: D flip-flops Xn-1, Xn-2, ..., X1, X0 in a shift chain, with feedback taps weighted by coefficients hn-1, hn-2, ..., h2, h1 XORed into the input.

Figure 40.5 shows a standard, external exclusive-OR linear feedback shift register. There are n flip-flops (Xn-1, ..., X0), and this is called an n-stage LFSR. It can be a near-exhaustive test pattern generator, as it cycles through 2^n - 1 states, excluding the all-0 state. This is known as a maximal-length LFSR. Figure 40.6 shows the implementation of an n-stage LFSR with an actual digital circuit [1].
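The maximal-length behaviour is easy to check in software; a sketch of an external-XOR (Fibonacci-style) LFSR, assuming the primitive polynomial x^4 + x^3 + 1 for a 4-stage example (the tap choice is illustrative, not taken from Figure 40.6):

```python
def lfsr_states(taps, seed, nbits):
    """Simulate an external-XOR (Fibonacci) LFSR.

    taps: 0-indexed bit positions XORed together to form the feedback.
    Yields successive register states until the seed state recurs.
    """
    state = seed
    while True:
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << nbits) - 1)
        if state == seed:
            return

# x^4 + x^3 + 1 is primitive, so taps at bits 3 and 2 give a
# maximal-length 4-stage LFSR: all 2^4 - 1 nonzero states appear.
states = list(lfsr_states(taps=(3, 2), seed=0b0001, nbits=4))
print(len(states))  # 15
```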


 

[Figure omitted] Fig. 40.7 Weighted pseudo-random pattern generator: an 8-stage LFSR (X7 ... X0) whose tap outputs, combined through gating, feed a 1-of-4 MUX that selects a weight of 1/16, 1/8, 1/4, or 1/2 under weight-select inputs W1 and W2, with an optional inversion stage.

[Figure omitted] Fig. 40.8 Weighted pseudo-random patterns: (a) an LFSR whose outputs feed AND/OR gates to obtain weights such as 1/8, 3/4, 1/2, and 7/8; (b) a cellular automaton producing outputs with weights such as 0.8, 0.6, 0.5, 0.4, and 0.3.

Figure 40.7 shows a weighted pseudo-random pattern generator implemented with programmable probabilities of generating zeros and ones at the PIs. As we know, an LFSR generates patterns with equal probability of 1s and 0s. As shown in Figure 40.8(a), if a 3-input AND gate is used, the probability of 1s becomes 0.125; if a 2-input OR gate is used, the probability becomes 0.75. Second, one can use cellular automata to produce patterns of desired weights, as shown in Figure 40.8(b).
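The weights quoted above follow directly from the independence of the LFSR bits; a small sketch (the function names are illustrative):

```python
def prob_one_and(p_inputs):
    """P(output = 1) of an AND gate with independent inputs,
    where p_inputs[i] = P(input i = 1)."""
    p = 1.0
    for pi in p_inputs:
        p *= pi
    return p

def prob_one_or(p_inputs):
    """P(output = 1) of an OR gate with independent inputs:
    1 minus the probability that every input is 0."""
    q = 1.0
    for pi in p_inputs:
        q *= (1 - pi)
    return 1 - q

print(prob_one_and([0.5] * 3))  # 0.125: 3-input AND on LFSR bits
print(prob_one_or([0.5] * 2))   # 0.75:  2-input OR on LFSR bits
```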

2.7 Cellular Automata for Pattern Generation

Cellular automata are excellent for pattern generation because they have a better randomness distribution than LFSRs, and there is no shift-induced bit value correlation. A cellular automaton is a collection of cells with regular connections. Each pattern generator cell has a few logic gates and a flip-flop, and is connected only to its local neighbors. If Ci is the state of the current CA cell, Ci+1 and Ci-1 are the states of its neighboring cells. The next state of cell Ci is determined by (Ci-1, Ci, and Ci+1). The cell is replicated to produce the cellular automaton. The two commonly used CA structures are shown in Figure 40.9.
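The CA cells most often used for BIST pattern generation compute rule 90 (next = left XOR right) and rule 150 (next = left XOR self XOR right); assuming those are the two structures intended here, a minimal sketch of a hybrid 90/150 automaton (the particular rule assignment, null boundaries, and seed are illustrative choices, not taken from Figure 40.9):

```python
def ca_step(state, rules):
    """One step of a 1-D hybrid cellular automaton with null boundaries.
    rule 90:  next = left XOR right
    rule 150: next = left XOR self XOR right
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right
        if rules[i] == 150:
            bit ^= state[i]
        nxt.append(bit)
    return nxt

# This 4-cell 90/150 hybrid cycles through all 15 nonzero states,
# like a maximal-length LFSR, but without shift correlation.
rules = [90, 150, 90, 150]
state, seen = [0, 0, 0, 1], set()
while tuple(state) not in seen:
    seen.add(tuple(state))
    state = ca_step(state, rules)
print(len(seen))  # 15
```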


patterns, then the CUT response to the RM will be 1 billion bits. This is not manageable in practice. So it is necessary to compact this enormous amount of circuit responses to a manageable size that can be stored on the chip. The response analyzer compresses a very long test response into a single word. Such a word is called a signature. The signature is then compared with the prestored golden signature obtained from the fault-free responses using the same compression mechanism. If the signature matches the golden copy, the CUT is regarded as fault-free. Otherwise, it is faulty. There are different response analysis methods, such as ones count, transition count, syndrome count, and signature analysis.

Compression: A reversible process used to reduce the size of the response. It is difficult in hardware.

Compaction: An irreversible (lossy) process used to reduce the size of the response.

a)  Parity compression: It computes the parity of a bit stream.
b)  Syndrome: It counts the number of 1's in the bit stream.
c)  Transition count: It counts the number of times the 0→1 and 1→0 conditions occur in the bit stream.
d)  Cyclic Redundancy Check (CRC): It is also called a signature. It computes a CRC check word on the bit stream.

Signature analysis – Compact the good machine response into a good machine signature. The actual signature is generated during testing and compared with the good machine signature.

Aliasing: Compression is like a function that maps a large input space (the responses) into a small output space (the signatures). It is a many-to-one mapping. Errors may occur in the input bit stream; therefore, a faulty response may have a signature that matches the golden signature, and the circuit is reported as fault-free. Such a situation is referred to as aliasing or masking. The aliasing probability is the probability that a faulty response is treated as fault-free. It is defined as follows.

Let us assume that the possible input patterns are uniformly distributed over the possible mapped signature values. There are 2^m input patterns and 2^r signatures, so 2^(m-r) input patterns map into a given signature. Then the aliasing or masking probability is

P(M) = (number of erroneous inputs that map into the golden signature) / (number of faulty input responses)
     = (2^(m-r) - 1) / (2^m - 1)
     ≈ 2^(m-r) / 2^m = 1 / 2^r   (for large m)
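A quick numeric check of the aliasing formula (the helper name is illustrative; m is the response length in bits, r the signature length):

```python
def aliasing_probability(m, r):
    """P(M) = (2^(m-r) - 1) / (2^m - 1): the chance that a faulty
    m-bit response falls on the golden r-bit signature."""
    return (2**(m - r) - 1) / (2**m - 1)

# As m grows, P(M) approaches 1/2^r regardless of the response length:
for m in (16, 32, 64):
    print(m, aliasing_probability(m, r=8))
```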

The aliasing probability is the major consideration in response analysis. Due to the many-to-one mapping property of the compression, it is unlikely that diagnosis can be done after compression; therefore, the diagnosis resolution is very poor after compression. In addition to the aliasing probability, hardware overhead and hardware compatibility are also important issues. Here, hardware compatibility refers to how well the BIST hardware can be incorporated in the CUT or DFT.


 

3.2 Transition Count Testing

[Figure omitted] Fig. 40.11 Transition count compression circuit structure: test patterns drive the CUT; a D flip-flop delays the CUT output by one cycle, and a counter accumulates the transitions between the current and delayed output bits.

For an N-bit test length with r transitions, the masking probability is derived as follows. For a test length of N, there are N-1 transition positions, so C(N-1, r) is the number of transition placements among sequences that have r transitions. Since the first output bit can be either one or zero, this total must be multiplied by 2; the total number of sequences with the same transition count is therefore 2·C(N-1, r). Only one of them is fault-free. The masking probability is

P(M) = (2·C(N-1, r) - 1) / (2^N - 1)
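The count 2·C(N-1, r) can be verified by brute force for small N; the sketch below (illustrative only, names our own) enumerates all length-N output sequences and checks the formula:

```python
from math import comb

def count_transitions(bits: str) -> int:
    """Number of 0->1 / 1->0 transitions in a bit string."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def transition_masking_probability(N: int, r: int) -> float:
    """P(M) = (2*C(N-1, r) - 1) / (2^N - 1): of the 2*C(N-1, r)
    sequences with r transitions, one is fault-free."""
    return (2 * comb(N - 1, r) - 1) / (2**N - 1)

# Brute-force check for N = 6: for every transition count r,
# exactly 2*C(5, r) of the 64 possible sequences have that count.
N = 6
for r in range(N):
    matching = [s for s in range(2**N)
                if count_transitions(format(s, f"0{N}b")) == r]
    assert len(matching) == 2 * comb(N - 1, r)
```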

3.3 Syndrome Testing

The syndrome is defined as the probability of ones at the CUT output response. The syndrome is 1/8 for a 3-input AND gate and 7/8 for a 3-input OR gate if the inputs have equal probability of ones and zeros. Figure 40.12 shows a BIST circuit structure for the syndrome count. It is very similar to ones counting and transition counting; the difference is that the final count is divided by the number of patterns applied. The most distinctive feature of syndrome testing is that the syndrome is independent of the implementation: it is determined solely by the function of the circuit.

Fig. 40.12 Syndrome testing circuit structure (a random pattern counter drives the CUT; a syndrome counter accumulates the ones count)


The original design of syndrome testing applies exhaustive patterns. Hence the syndrome is S = K/2^n, where n is the number of inputs and K is the number of minterms. A circuit is syndrome testable if all single stuck-at faults are syndrome detectable. An interesting property of syndrome testing is that any function can be designed to be syndrome testable.
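The quoted syndromes are easy to reproduce by exhaustive enumeration, which is exactly how the original scheme applies patterns. A short Python sketch (our own illustration):

```python
from itertools import product

def syndrome(fn, n: int) -> float:
    """Syndrome S = K / 2^n: the fraction of the 2^n exhaustive
    input patterns for which the circuit output is 1
    (K = number of minterms)."""
    outs = [fn(*bits) for bits in product((0, 1), repeat=n)]
    return sum(outs) / len(outs)

# A 3-input AND has one minterm, a 3-input OR has seven.
assert syndrome(lambda a, b, c: a & b & c, 3) == 1 / 8
assert syndrome(lambda a, b, c: a | b | c, 3) == 7 / 8
```

Note that any implementation of the same function yields the same syndrome, which is the implementation-independence property mentioned above.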

3.4 LFSR Structure

•  Two types of LFSR are used, external and internal. Both types use D-type flip-flops and exclusive-OR logic, as shown in Figure 40.13.
•  In an external type LFSR, the XOR gates are placed outside the shift path. It is also called a type 1 LFSR [1].
•  In internal type LFSRs, also called type 2 LFSRs, the XOR gates are placed in between the flip-flops.

Fig. 40.13 Two types of LFSR: (a) external type, (b) internal type (four stages D3-D0)

One of the most important properties of LFSRs is their recurrence relationship. The recurrence relation guarantees that the states of an LFSR are repeated in a certain order. For a given sequence of numbers a0, a1, a2, ..., am, ..., we can define a generating function:

G(x) = a0 + a1·x + a2·x^2 + ... + am·x^m + ... = Σ (m=0..∞) am·x^m

where am = 1 or 0, depending on the output stage and the time m. The initial states are a(-n), a(-n+1), ..., a(-2), a(-1). The recurrence relation defining {am} is

am = Σ (i=1..n) ci·a(m-i)

where ci = 0 means the output is not fed back, and ci = 1 otherwise.

 


Substituting the recurrence into G(x):

G(x) = Σ (m=0..∞) [Σ (i=1..n) ci·a(m-i)]·x^m
     = Σ (i=1..n) ci·x^i · [Σ (m=0..∞) a(m-i)·x^(m-i)]
     = Σ (i=1..n) ci·x^i · [a(-i)·x^(-i) + ... + a(-1)·x^(-1) + G(x)]

Solving for G(x):

G(x) = [Σ (i=1..n) ci·x^i · (a(-i)·x^(-i) + ... + a(-1)·x^(-1))] / [1 - Σ (i=1..n) ci·x^i]

 

G(x) has thus been expressed in terms of the initial state and the feedback coefficients. The denominator of G(x),

f(x) = 1 - Σ (i=1..n) ci·x^i,

is called the characteristic polynomial of the LFSR.
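The recurrence can be simulated directly. The sketch below (a hypothetical 4-stage example; the tap choice, giving f(x) = 1 + x^3 + x^4, is ours) implements am = Σ ci·a(m-i) mod 2 and exhibits the maximal period of 2^n - 1:

```python
def lfsr_sequence(taps, seed, length):
    """Simulate the recurrence a_m = XOR of a_{m-i} over the taps i
    with c_i = 1.  `seed` holds the initial states a_{-n} .. a_{-1}."""
    state = list(seed)                 # state[-i] is a_{m-i}
    out = []
    for _ in range(length):
        bit = 0
        for i in taps:
            bit ^= state[-i]           # GF(2) addition
        out.append(bit)
        state = state[1:] + [bit]      # shift in the new bit
    return out

# f(x) = 1 + x^3 + x^4 is primitive, so any nonzero seed cycles
# through all 2^4 - 1 = 15 nonzero states before repeating.
seq = lfsr_sequence(taps=(3, 4), seed=[0, 0, 0, 1], length=30)
assert seq[:15] == seq[15:]            # period 15
assert sum(seq[:15]) == 8              # m-sequence: 2^(n-1) ones per period
```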

3.5 LFSR for Response Compaction: Signature Analysis

•  A cyclic redundancy check code (CRCC) generator (LFSR) is used as the response compacter.
•  In this method, the data bits from the circuit POs to be compacted are treated as the coefficients of a decreasing-order polynomial.
•  The CRCC divides the PO polynomial by its characteristic polynomial, which leaves the remainder of the division in the LFSR. The LFSR must be initialized to a seed value (usually 0) before testing.
•  After testing, the signature in the LFSR is compared to the known good-machine signature.

For an output sequence of length N, there is a total of 2^N - 1 faulty sequences. Let the input sequence be represented as P(x) = Q(x)·G(x) + R(x), where G(x) is the characteristic polynomial, Q(x) is the quotient, and R(x) is the remainder or signature. For the aliasing faulty sequences, the remainder R(x) is the same as the fault-free one. Since P(x) is of order N and G(x) is of order r, Q(x) has order N-r; hence there are 2^(N-r) possible Q(x), and thus 2^(N-r) sequences P(x) with the same signature. One of them is fault-free. Therefore the aliasing probability is

P(M) = (2^(N-r) - 1) / (2^N - 1) ≈ 1/2^r   for large N.

The masking probability is independent of the input sequence.
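Signature computation is simply polynomial division over GF(2). The sketch below (our own helper; polynomials are represented as Python ints with bit i holding the coefficient of x^i) computes R(x) and demonstrates both single-bit error detection and aliasing by a multiple of G(x):

```python
def signature(response_bits, g_poly: int) -> int:
    """Remainder R(x) of the response polynomial P(x) divided by the
    characteristic polynomial G(x) over GF(2)."""
    p = 0
    for b in response_bits:                     # highest degree first
        p = (p << 1) | b
    deg_g = g_poly.bit_length() - 1
    while p and p.bit_length() - 1 >= deg_g:    # long division; XOR = subtraction
        p ^= g_poly << (p.bit_length() - 1 - deg_g)
    return p

G = 0b101011                 # x^5 + x^3 + x + 1, as in Figure 40.14
good = [1, 0, 1, 1, 0, 0, 1, 0]

# A single-bit error always changes the signature ...
flipped = [1 - good[0]] + good[1:]
assert signature(flipped, G) != signature(good, G)

# ... but an error pattern that is a multiple of G(x) aliases:
err = [(0b10101100 >> (7 - i)) & 1 for i in range(8)]   # G(x) * x^2
faulty = [a ^ b for a, b in zip(good, err)]
assert signature(faulty, G) == signature(good, G)
```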

Figure 40.14 illustrates a modular LFSR as a response compactor.

Fig. 40.14 Modular LFSR as a response compactor (five D flip-flops X0-X4 clocked together; characteristic polynomial x^5 + x^3 + x + 1)


•  Any divisor polynomial G(x) with two or more non-zero coefficients will detect all single-bit errors.

3.6 Multiple-Input Signature Register (MISR)

•  The problem with the ordinary LFSR response compacter is too much hardware overhead if one is put on each primary output (PO).
•  The multiple-input signature register (MISR) is the solution: it compacts all outputs into one LFSR. It works because the LFSR is linear and obeys the superposition principle.
•  All responses are superimposed in one LFSR. The final remainder is the XOR sum of the remainders of the polynomial divisions of each PO by the characteristic polynomial.

Fig. 40.15 Multiple input signature register (an LFSR supplies test patterns to the CUT; an m-bit MISR compacts the responses Ri(x) into states Si(x), and the final signature is compared against the golden signature)

Figure 40.15 illustrates an m-stage MISR. After test cycle i, the test responses are stable on the CUT outputs, but the shifting clock has not yet been applied. Let Ri(x) be the (m-1)th-degree polynomial representing the test responses after test cycle i, and Si(x) the polynomial representing the state of the MISR after test cycle i.

Ri(x) = r(i,m-1)·x^(m-1) + r(i,m-2)·x^(m-2) + ... + r(i,1)·x + r(i,0)
Si(x) = s(i,m-1)·x^(m-1) + s(i,m-2)·x^(m-2) + ... + s(i,1)·x + s(i,0)

S(i+1)(x) = [Ri(x) + x·Si(x)] mod G(x)

where G(x) is the characteristic polynomial.

Assume the initial state of the MISR is 0. So,

S0(x) = 0
S1(x) = [R0(x) + x·S0(x)] mod G(x) = R0(x) mod G(x)
S2(x) = [R1(x) + x·S1(x)] mod G(x) = [R1(x) + x·R0(x)] mod G(x)
...
Sn(x) = [x^(n-1)·R0(x) + x^(n-2)·R1(x) + ... + x·R(n-2)(x) + R(n-1)(x)] mod G(x)
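The per-cycle update S(i+1)(x) = [Ri(x) + x·Si(x)] mod G(x) is easy to simulate, and the simulation makes the superposition property concrete. A sketch (our own; polynomials as ints, with a hypothetical 4-stage MISR and G(x) = x^4 + x + 1):

```python
def misr_signature(responses, g_poly: int) -> int:
    """Apply S_{i+1}(x) = [R_i(x) + x*S_i(x)] mod G(x), with S_0 = 0.
    Each per-cycle response R_i(x) must have degree < deg G."""
    deg_g = g_poly.bit_length() - 1
    s = 0
    for r in responses:
        s = r ^ (s << 1)          # R_i(x) + x*S_i(x) over GF(2)
        if s >> deg_g:            # one conditional XOR reduces mod G(x)
            s ^= g_poly
    return s

G = 0b10011                       # x^4 + x + 1
a = [0b1010, 0b0111, 0b1100, 0b0001]
b = [0b0011, 0b1111, 0b0000, 0b1010]

# Superposition: with a zero initial state the MISR is linear, so the
# signature of the XOR of two response streams is the XOR of their
# individual signatures.
both = [x ^ y for x, y in zip(a, b)]
assert misr_signature(both, G) == misr_signature(a, G) ^ misr_signature(b, G)
```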

 

This is the signature left in the MISR after n patterns are applied. Consider an m-stage MISR collecting responses over n clock cycles. The error polynomial is then of degree (m+n-2), which


gives (2^(m+n-1) - 1) non-zero error values. G(x) has (2^(n-1) - 1) nonzero multiples that result in polynomials of degree <= m+n-2. The probability of masking is therefore

P(M) = (2^(n-1) - 1) / (2^(m+n-1) - 1) ≈ 1/2^m   for large n.

 

3.7 Logic BIST Architecture

•  Test-per-clock system
   •  More hardware, less test time.
   •  BILBO: Built-In Logic Block Observer.
•  Test-per-scan system
   •  Less hardware, more test time.
   •  STUMPS: Self-Test Using a MISR and Parallel Shift registers.
•  Circular self-test path
   •  Lowest hardware, lowest fault coverage.

3.7.1 Test-Per-Clock BIST

Two different test-per-clock BIST structures are shown in Figure 40.16. For every test clock, the LFSR generates a test vector and the Signature Analyzer (MISR) compresses a response vector. In every clock period some new set of faults is tested. This system requires more hardware but takes less test time. It can be used for exhaustive, pseudo-exhaustive, pseudorandom, and weighted-pseudorandom testing.

Fig. 40.16 Test-per-clock BIST structures (LFSR drives the CUT and a MISR compacts the responses; the second structure inserts a shift register in the path)

3.7.2 Built-In Logic Block Observer (BILBO) [1]

Built-in logic block observation is a well-known approach for pipelined architectures. It adds some extra hardware to the existing registers to make them multifunctional: each can act as a D flip-flop register, a pattern generator, a response compacter, or a scan chain. All FFs are reset to 0. The circuit diagram of a BILBO module is shown in Figure 40.17. The BILBO has two control signals (B1 and B2).


4. BIST for Structured Circuits

Structured design techniques are the key to the high integration of VLSI circuits. Structured circuits include read-only memories (ROMs), random access memories (RAMs), programmable logic arrays (PLAs), and many others. In this section we focus on PLAs, because they are tightly coupled with the logic circuits, while memories are usually treated as a separate category. Due to the regularity of their structure and the simplicity of their design, PLAs are commonly used in digital systems. PLAs are efficient and effective for the implementation of arbitrary logic functions, combinational or sequential. Therefore, in this section, we discuss BIST for PLAs.

A PLA is conceptually a two-level AND-OR realization of a Boolean function. Figure 40.21 shows the general structure of a PLA. A PLA typically consists of four parts: the input decoders, the AND plane, the OR plane, and the output buffers. The input decoders are usually implemented as single-bit decoders which produce the direct and complement forms of the inputs. The AND plane generates all the product terms. The OR plane sums the required product terms to form the output bits. In the physical implementation, the two planes are realized as a NAND-NAND or NOR-NOR structure.
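The two-level AND-OR behaviour can be modelled in a few lines. In the sketch below (a hypothetical 3-input, 2-output personality; all names are ours), each AND-plane row lists its literals as (input index, required value) pairs, and each OR-plane row lists the product lines it sums:

```python
def pla_eval(and_plane, or_plane, inputs):
    """Evaluate a PLA: the AND plane forms product terms from the
    inputs, the OR plane sums selected product terms into outputs."""
    products = [all(inputs[i] == v for i, v in term) for term in and_plane]
    return [any(products[p] for p in row) for row in or_plane]

# P0 = a AND b,  P1 = (NOT b) AND c
AND = [((0, 1), (1, 1)), ((1, 0), (2, 1))]
# f0 = P0 OR P1,  f1 = P0
OR = [(0, 1), (0,)]

assert pla_eval(AND, OR, [1, 1, 0]) == [True, True]
assert pla_eval(AND, OR, [0, 0, 1]) == [True, False]
```

In this model a crosspoint fault corresponds to adding or deleting one (input, value) literal in the AND plane, or one product-line entry in the OR plane, which is why PLA fault models differ from plain gate-level stuck-at models.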

Fig. 40.21 A general structure of a PLA (PLA inputs feed the input decoders; the AND plane, or first NOR plane, drives the product lines into the OR plane, or second NOR plane, whose output buffers produce the PLA outputs)

As mentioned earlier in the fault model section, PLAs have the following faults: stuck-at faults, bridging faults, and crosspoint faults. Test generation for PLAs is more difficult than that for conventional logic, because PLAs have more complicated fault models. Further, a typical PLA may have as many as 50 inputs, 67 outputs, and 190 product terms [10-11]. Functional testing of such PLAs can be a difficult task. PLAs often contain unintentional and unidentifiable redundancy which may cause fault masking. Furthermore, PLAs are often embedded in logic, which complicates test application and response observation. Therefore, many people have proposed the use of BIST to handle the testing of PLAs.

5. BIST Applications

Manufacturers are increasingly employing BIST in real products. Examples of such applications are given below to illustrate the use of BIST in the semiconductor, communications, and computer industries.


5.1 Exhaustive Test in the Intel 80386 [12]

The Intel 80386 has BIST logic for the exhaustive test of three control PLAs and three control ROMs. For the PLAs, the exhaustive patterns are generated by LFSRs embedded in the input registers. For the ROMs, the patterns are generated by the microprogram counter, which is part of the normal logic. The largest PLA has 19 input bits; hence the test length is 512K clock cycles. The test responses are compressed by MISRs at the outputs. The contents of the MISRs are continuously shifted out to an LFSR. At the end of testing, the contents of the LFSRs are compared.

5.2 Pseudorandom Test in the IBM RISC/6000 [13]

The RISC/6000 has an extensive BIST structure covering the entire system. In accordance with IBM tradition, the RISC/6000 has full serial scan; hence the BIST it uses is pseudorandom testing in the form of STUMPS. For embedded RAMs, it performs self-test and delay testing. For the BIST, it has an on-chip processor (COP) on each chip. The COP contains an LFSR for pattern generation, a MISR for response compression, and a counter for address counting in the RAM BIST. The COP accounts for less than 3% of the chip area.

5.3 Embedded Cache Memory BIST in the MC68060 [14]

The MC68060 takes two test approaches for its embedded memories. First, it uses ad hoc direct memory access for manufacturing testing, because that was the only memory test approach that met all the design goals. The ad hoc direct memory access uses additional logic to make the address, data-in, data-out, and control lines of each memory accessible through the package pins. An additional set of control signals selects which memory is activated. The approach makes each memory visible through the chip pins as though it were a stand-alone memory array. For the burn-in test, the BIST hardware is built around the ad hoc test logic. This two-scheme approach is used because it meets the burn-in requirements with little additional logic.

5.4 ALU-Based Programmable MISR in the MC68HC11 [15]

Broseghini and Lenhert implemented an ALU-based self-test system on a MC68HC11-family microcontroller. A fully programmable pseudorandom pattern generator and MISR are used to reduce the test length and the aliasing probability. They added microcode to configure the ALU into an LFSR or MISR; the adder is transformed into an LFSR by forcing the carry input to 0. With such a feature, the hardware overhead is minimized: it is only 25% of that of an implementation with dedicated hardware.

References

[1]  M. L. Bushnell and V. D. Agrawal, "Essentials of Electronic Testing," Kluwer Academic Publishers, Norwell, MA, 2000.
[2]  H. Al-Asaad, B. T. Murray, and J. P. Hayes, "Online BIST for embedded systems," IEEE Design & Test of Computers, vol. 15, no. 4, Oct.-Dec. 1998, pp. 17-24.
[3]  M. Abramovici, M. A. Breuer, and A. D. Friedman, "Digital Systems Testing and Testable Design," IEEE Press, 1990.
[4]  R. Zurawski, "Embedded Systems Handbook," Taylor & Francis, 2005.


 

Lesson 41

Boundary Scan Methods and Standards


Instructional Objectives

After going through this lesson the student would be able to

•  Explain the meaning of the term Boundary Scan
•  List the IEEE 1149 series of standards with their important features
•  Describe the architecture of IEEE 1149.1 boundary scan and explain the functionality of each of its components
•  Explain, with the help of an example, how a board-level design can be equipped with the boundary scan feature
•  Describe the advantages and disadvantages of the boundary scan technique

Boundary Scan Methods and Standards

1. Boundary Scan History and Family

Boundary Scan is a family of test methodologies aimed at resolving many test problems: from chip level to system level, from logic cores to interconnects between cores, and from digital circuits to analog or mixed-mode circuits. It is now widely accepted in industry and has become an industry standard in most large IC system designs. Boundary scan, as defined by the IEEE Std. 1149.1 standard [1-3], is an integrated method for testing interconnects on printed circuit boards that is implemented at the IC level.

Earlier, most Printed Circuit Board (PCB) testing was done using bed-of-nails in-circuit test equipment. Advances in VLSI technology now enable microprocessors and Application-Specific Integrated Circuits (ASICs) to be packaged in fine-pitch, high-pin-count packages. The miniaturization of device packaging, the development of surface-mount packaging, and the use of double-sided and multi-layer boards to accommodate the extra interconnects between the increasingly dense devices all reduce the physical accessibility of test points for a traditional bed-of-nails in-circuit tester, and pose a great challenge to the testing of manufacturing defects. The long-term solution to this reduction in physical probe access was to build the access inside the device, i.e., a boundary scan register.

In 1985, a group of European companies formed the Joint European Test Action Group (JETAG), and by 1988 the Joint Test Action Group (JTAG) had been formed by several companies to tackle these challenges. JTAG developed a specification for boundary-scan testing that was standardized in 1990 by the IEEE as IEEE Std. 1149.1-1990. In 1993 a revision of the standard was introduced (1149.1a); it contained many clarifications, corrections, and enhancements. In 1994, a supplement containing a description of the Boundary-Scan Description Language (BSDL) was added to the standard. Since that time, this standard has been adopted by major electronics companies all over the world. Applications are found in high-volume, high-end consumer products, telecommunication products, defense systems, computers, peripherals, and avionics. Now, due to its economic advantages, smaller companies that cannot afford expensive in-circuit testers are also using boundary scan. Figure 41.1 gives an overview of the boundary scan family, now known as the IEEE 1149.x standards.


Number        Description                                              Year
IEEE 1149.1   Testing of digital chips and interconnections            Std 1149.1 - 1990
              between chips
IEEE 1149.1a  Added supplement A; rewrite of the chapter               Std 1149.1a - 1993
              describing the boundary register
IEEE 1149.1b  Supplement B: formal description of the                  Std 1149.1b - 1994
              Boundary-Scan Description Language (BSDL)
IEEE 1149.1c  Corrections, clarifications, and enhancements of         Std 1149.1 - 2001
              IEEE Std 1149.1a and Std 1149.1b; combines
              1149.1a and 1149.1b
IEEE 1149.2   Extended Digital Serial Interface; has merged            Obsolete
              with the 1149.1 group
IEEE 1149.3   Direct Access Testability Interface                      Obsolete
IEEE 1149.4   Test of mixed-signal and analog assemblies               Std 1149.4 - 1999
IEEE 1149.5   Standard Module Test and Maintenance (MTM) Bus           Std 1149.5 - 1995
              Protocol; deals with test at the system level
IEEE 1149.6   Includes AC-coupled and/or differential nets             Std 1149.6 - 2002
IEEE 1532     A derivative standard for in-system programming          2000
              (ISP) of digital devices

Fig. 41.1 IEEE 1149 Family

The Std. 1149.1, usually referred to as the digital boundary scan, is the one that has been used most widely. It can be divided into two parts: 1149.1a, the digital Boundary Scan standard, and 1149.1b, the Boundary Scan Description Language (BSDL) [1,6]. Std. 1149.1a defines the chip-level test architecture for digital circuits, and Std. 1149.1b is a hardware description language used to describe the boundary scan architecture. The 1149.2 defines an extended digital serial interface at the chip level; it has merged with the 1149.1 group. The 1149.3 defines a direct access interface, in contrast to 1149.2; unfortunately this work has been discontinued. The IEEE Std. 1149.4 deals with the mixed-signal test bus [4]. This standard extends the test structure defined in IEEE Std. 1149.1 to allow testing and measurement of mixed-signal circuits; it describes the architecture and the means of control and access to analog and digital test data. The Std. 1149.5 defines the bus protocol at the module level. By combining this level with Std. 1149.1a one can easily carry out the testing of a PC board.

The IEEE Std. 1149.6 for Boundary-Scan Testing of Advanced Digital Networks was released in 2002. This standard augments 1149.1 for the testing of conventional digital networks and 1149.4 for analog networks. The 1149.6 standard defines boundary-scan structures and methods


required to test advanced digital networks that are not fully covered by IEEE Std. 1149.1, such as networks that are AC-coupled, differential, or both.

The IEEE Std. 1532 was developed for In-System Configuration of programmable devices [5]. This extension of 1149.1 standardizes programming access and methodology for programmable integrated circuit devices. Devices such as CPLDs and FPGAs, regardless of vendor, that implement this standard may be configured (written), read back, erased, and verified, singly or concurrently, with a standardized set of resources based upon the algorithm description contained in the 1532 BSDL file. JTAG Technologies programming tools contain support for 1532-compliant devices and automatically generate the applications.

Clearly, the testing of mixed-mode circuits at the various levels of integration will be a critical test issue for system-on-chip design; there is therefore a demand to combine all the boundary scan standards into an integrated one.

2. Boundary Scan Architecture

The boundary-scan test architecture provides a means to test interconnects between integrated circuits on a board without using physical test probes. It adds a boundary-scan cell, which includes a multiplexer and latches, to each pin of the device. Figure 41.2 [1] illustrates the main elements of a universal boundary-scan device:

•  A Test Access Port (TAP) with a set of four dedicated test pins: Test Data In (TDI), Test Mode Select (TMS), Test Clock (TCK), Test Data Out (TDO), and one optional test pin, Test Reset (TRST*).
•  A boundary-scan cell on each device primary input and primary output pin, connected internally to form a serial boundary-scan register (Boundary Scan).
•  A TAP controller with inputs TCK, TMS, and TRST*.
•  An n-bit (n >= 2) instruction register holding the current instruction.
•  A 1-bit Bypass register (Bypass).
•  An optional 32-bit Identification register capable of being loaded with a permanent device identification code.


 

Fig. 41.2 Main Elements of an IEEE 1149.1 Device Architecture (TDI feeds the boundary-scan register, an internal register, the bypass register, the identification register, and the instruction register; the TAP controller, driven by TMS, TCK, and the optional TRST*, selects which register drives TDO)

The test access ports (TAP) define the bus protocol of boundary scan and are the additional I/O pins needed by each chip employing Std. 1149.1a. The TAP controller is a 16-state finite state machine that controls each step of the operation of boundary scan. Each instruction to be carried out by the boundary scan architecture is stored in the Instruction Register; the various control signals associated with the instruction are then provided by a decoder. Several Test Data Registers are used to store test data or system-related information such as the chip ID, the company name, etc.

2.1 Bus Protocol

The Test Access Ports (TAPs) are general-purpose ports that provide access to the test functions of the IC between the application circuit and the chip's I/O pads. The TAP includes four mandatory pins, TCK, TDI, TDO, and TMS, and one optional pin, TRST*, as described below. All TAP inputs and outputs shall be dedicated connections to the component (i.e., the pins used shall not be used for any other purpose).

•  Test Clock Input (TCK): a clock independent of the system clock of the chip, so that test operations can be synchronized between the various parts of a chip. It also synchronizes the operations between the various chips on a printed circuit board. As a convention, the


test instructions and data are loaded from the system input pins on the rising edge of TCK and driven through the system output pins on its falling edge. TCK is pulsed by the equipment controlling the test and not by the tested device. It can be pulsed at any frequency (up to a maximum of some MHz), and even at varying rates.

•  Test Data Input (TDI): an input line that allows the test instruction and test data to be loaded into the instruction register and the various test data registers, respectively.
•  Test Data Output (TDO): an output line used to serially shift the data from the JTAG registers out to the equipment controlling the test.
•  Test Mode Select (TMS): the test control input to the TAP controller. It controls the transitions of the test interface state machine. The test operations are controlled by the sequence of 1s and 0s applied to this input. Usually this is the most important input that has to be controlled by the external testers or the on-board test controller.
•  Test Reset Input (TRST*): the optional TRST* pin is used to initialize the TAP controller; that is, if the TRST* pin is used, the TAP controller can be asynchronously reset to the Test-Logic-Reset state when a 0 is applied at TRST*. This pin can also be used to reset the circuit under test, although this is not recommended.

2.2 Boundary Scan Cell

The IEEE Std. 1149.1a specifies the design of four test data registers, as shown in Figure 41.2. Two mandatory test data registers, the bypass and the boundary-scan registers, must be included in any boundary scan architecture. The boundary scan register, though its name may be a little confusing, refers to the collection of the boundary scan cells. The other registers, such as the device identification register and the design-specific test data registers, can be added optionally.

Fig. 41.3 Basic Boundary-Scan Cell (BC_1): Data_In (PI) and Scan_In (SI) feed a Capture Scan cell and an Update Hold cell, two D flip-flops clocked by ClockDR and UpdateDR under ShiftDR control; the Mode signal (0 = functional mode, 1 = test mode) selects the source of Data_Out (PO), and Scan_Out (SO) continues the chain


Figure 41.3 [1] shows a basic universal boundary-scan cell, known as a BC_1. The cell has four modes of operation: normal, update, capture, and serial shift. The memory elements are two D-type flip-flops with front-end and back-end multiplexing of data. It is important to note that the circuit shown in Figure 41.3 is only an example of how the requirements defined in the Standard could be realized; the IEEE 1149.1 Standard does not mandate the design of the circuit, only its functional specification. The four modes of operation are as follows:

1)  During normal mode, also called serial mode, Data_In is passed straight through to Data_Out.
2)  During update mode, the content of the Update Hold cell is passed through to Data_Out. This allows signal values already present in the output scan cells to be passed out through the device output pins, and signal values already present in the input scan cells to be passed into the internal logic.
3)  During capture mode, the Data_In signal is routed to the Capture Scan cell and the value is captured by the next ClockDR (a derivative of TCK). This allows signal values on the device input pins to be loaded into input cells, and signal values passing from the internal logic to the device output pins to be loaded into output cells.
4)  During shift mode, the Scan_Out of one Capture Scan cell is passed to the Scan_In of the next Capture Scan cell via a hard-wired path.

The test clock, TCK, is fed in via yet another dedicated device input pin, and the various modes of operation are controlled by a dedicated Test Mode Select (TMS) serial control signal. Note that neither the capture nor the shift operation interferes with the normal passing of data from the parallel-in terminal to the parallel-out terminal. This allows on-the-fly capture of operational values and the shifting out of these values for inspection without interference. This application of the boundary-scan register has tremendous potential for real-time monitoring of the operational status of a system, rather like an electronic camera taking snapshots, and is one reason why TCK is kept separate from any system clocks.
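The four modes can be captured in a small behavioural model. This sketch (our own simplification of the BC_1 of Figure 41.3, not the Standard's required implementation) keeps the two flip-flops explicit:

```python
class BC1Cell:
    """Behavioural model of a BC_1 boundary-scan cell: a Capture
    Scan flip-flop plus an Update Hold flip-flop."""

    def __init__(self):
        self.capture_ff = 0    # Capture Scan cell
        self.update_ff = 0     # Update Hold cell

    def clock_dr(self, data_in, scan_in, shift_dr):
        """Rising edge of ClockDR: capture Data_In (ShiftDR = 0)
        or shift in Scan_In (ShiftDR = 1); returns Scan_Out."""
        self.capture_ff = scan_in if shift_dr else data_in
        return self.capture_ff

    def update_dr(self):
        """Rising edge of UpdateDR: move the captured value into the hold FF."""
        self.update_ff = self.capture_ff

    def data_out(self, data_in, mode):
        """Mode = 0: functional (transparent) path;
        Mode = 1: test mode, Data_Out driven from the Update Hold cell."""
        return self.update_ff if mode else data_in

# Capture a pin value, then update: in test mode Data_Out now holds
# the captured value, while the functional path stays transparent.
cell = BC1Cell()
cell.clock_dr(data_in=1, scan_in=0, shift_dr=0)   # capture
cell.update_dr()                                  # update
assert cell.data_out(data_in=0, mode=1) == 1
assert cell.data_out(data_in=0, mode=0) == 0
```

Chaining Scan_Out of one cell into Scan_In of the next and pulsing clock_dr with shift_dr=1 moves captured values one position along the boundary-scan register, exactly as in shift mode above.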

2.3 Boundary Scan Path

At the device level, the boundary-scan elements contribute nothing to the functionality of the internal logic; in fact, the boundary-scan path is independent of the function of the device. The value of the scan path is at the board level, as shown in Figure 41.4 [1]. The figure shows a board containing four boundary-scan devices. There is an edge-connector input called TDI connected to the TDI of the first device. TDO from the first device is permanently connected to TDI of the second device, and so on, creating a global serial scan path terminating at the edge-connector output called TDO. TCK is connected in parallel to each device TCK input, and TMS is connected in parallel to each device TMS input. All boundary-scan cell data registers are serially loaded and read from this single chain.

device TCK input. TMS is connected in parallel to each device TMS input. All cell boundarydata registers are serially loaded and read from this single chain. 


 


The advantage of this configuration is that only two pins on the PCB/MCM are needed for boundary scan data register support. The disadvantage is the very long shifting sequences needed to deliver test patterns to each component and to shift out the test responses, which leads to expensive time on the external tester. As shown in Figure 41.5 [1], the single scan chain can instead be broken into two parallel boundary scan chains which share a common test clock (TCK). The extra pin overhead is one more pin. As there are two boundary scan chains, the test patterns are half as long and the test time is roughly halved. Both chains share common TDI and TDO pins, so when the top two chips are being shifted, the bottom two chips must be disabled so that they do not drive their TDO lines; the opposite must hold when the bottom two chips are being tested.

Fig. 41.4 MCM with Serial Boundary Scan Chain (serial test data in at TDI, serial test data out at TDO; TCK and TMS bussed in parallel to all chips)


 

Fig. 41.5 MCM with two parallel boundary scan chains (shared TDI, TDO, and TCK; separate TMS1 and TMS2 select which chain is active)

2.4 TAP Controller

The operation of the test interface is controlled by the Test Access Port (TAP) controller. This is a 16-state finite state machine whose state transitions are controlled by the TMS signal; the state-transition diagram is shown in Figure 41.7. The TAP controller can change state only at the rising edge of TCK, and the next state is determined by the logic level of TMS. In other words, a state transition in Figure 41.7 follows the edge labelled 1 when the TMS line is set to 1; otherwise the edge labelled 0 is followed. The output signals of the TAP controller correspond to a subset of the labels associated with the various states. As shown in Figure 41.2, the TAP consists of four mandatory terminals plus one optional terminal. The main functions of the TAP controller are:

•  To reset the boundary scan architecture,

•  To select the output of instruction or test data to shift out to TDO,

•  To provide control signals to load instructions into Instruction Register,

•  To provide signals to shift test data from TDI and test response to TDO, and 

•  To provide signals to perform test functions such as capture and application of test data.
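Because the next state depends only on the current state and the TMS level sampled at each rising edge of TCK, the controller can be modeled as a plain transition table. The sketch below encodes the full 16-state machine defined by the standard; the assertions check two well-known properties (five consecutive TMS=1 clocks reset the TAP from any state, and the sequence 0,1,0,0 from reset reaches Shift-DR).

```python
# IEEE 1149.1 TAP controller as a 16-state transition table.
# NEXT[state] = (next state if TMS == 0, next state if TMS == 1)
NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",    "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",      "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",      "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",      "Update-DR"),
    "Pause-DR":         ("Pause-DR",      "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",      "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",    "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",      "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",      "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",      "Update-IR"),
    "Pause-IR":         ("Pause-IR",      "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",      "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def step(state, tms):
    """Advance one rising edge of TCK with the given TMS level (0 or 1)."""
    return NEXT[state][tms]

def run(state, tms_bits):
    for b in tms_bits:
        state = step(state, b)
    return state

# Five consecutive TMS=1 clocks reach Test-Logic-Reset from any state.
assert all(run(s, [1] * 5) == "Test-Logic-Reset" for s in NEXT)
# From reset, TMS sequence 0,1,0,0 enters Shift-DR.
assert run("Test-Logic-Reset", [0, 1, 0, 0]) == "Shift-DR"
```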


Fig. 41.6 Top level view of TAP Controller (a 16-state Moore-type FSM with inputs TMS, TCK and the optional TRST*, and outputs ClockDR, ShiftDR, UpdateDR, Reset*, Select, ClockIR, ShiftIR, UpdateIR, and Enable)

Figure 41.6 shows a top-level view of the TAP Controller. TMS and TCK (and the optional TRST*) go to a 16-state finite-state machine controller, which produces the various control signals. These signals include dedicated signals to the Instruction register (ClockIR, ShiftIR, UpdateIR) and generic signals to all data registers (ClockDR, ShiftDR, UpdateDR). The data register that actually responds is the one enabled by the conditional control signals generated at the parallel outputs of the Instruction register, according to the particular instruction.

The other signals (Reset, Select, and Enable) are distributed as follows:

•  Reset is distributed to the Instruction register and to the target Data Register 

•  Select is distributed to the output multiplexer 

•  Enable is distributed to the output driver amplifier 

It must be noted that the Standard uses the term Data Register to mean any target register except the Instruction register.


4. Capture-DR: In this state, data can be loaded in parallel into the data registers selected by the current instruction.

5. Shift-DR: In this state, test data are scanned serially through the data registers selected by the current instruction. The TAP controller may stay in this state as long as TMS=0. For each clock cycle, one data bit is shifted into (out of) the selected data register through TDI (TDO).

6. Exit1-DR: All parallel-loaded (from the Capture-DR state) or shifted (from the Shift-DR state) data are held in the selected data register in this state.

7. Pause-DR: The boundary scan logic pauses here to wait for some external operation. For example, when long test data must be loaded into the chip(s) under test, the external tester may need to reload the data from time to time. Pause-DR allows the boundary scan architecture to wait for more data to shift in.

8. Exit2-DR: This state marks the end of the Pause-DR operation and allows the TAP controller to go back to the Shift-DR state for more data to shift in.

9. Update-DR: The test data stored in the first stage of the boundary scan cells is loaded into the second stage in this state.

2.5 Bypass and Identification Registers

Figure 41.8 shows a typical design for a Bypass register. It is a 1-bit register, selected by the Bypass instruction, that provides a basic serial-shift function. There is no parallel output (which means that the Update-DR control has no effect on the register), but there is a defined effect with the Capture-DR control: the register captures a hard-wired value of logic 0.

Fig. 41.8 Bypass register (a single D flip-flop clocked by ClockDR; a multiplexer controlled by ShiftDR selects between the TDI input and a hard-wired 0, and the flip-flop output drives TDO)
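Behaviorally, the cell amounts to one flip-flop: Capture-DR forces a 0 into it, and each Shift-DR clock moves TDI toward TDO with a one-cycle delay. A minimal sketch of that behavior:

```python
class BypassRegister:
    """1-bit scan register: Capture-DR loads a hard-wired 0;
    each Shift-DR clock shifts TDI -> flip-flop -> TDO."""
    def __init__(self):
        self.q = 0

    def capture(self):
        # Capture-DR: the register captures a hard-wired logic 0.
        self.q = 0

    def shift(self, tdi):
        # Shift-DR: one TCK rising edge; old contents appear on TDO.
        tdo, self.q = self.q, tdi
        return tdo

reg = BypassRegister()
reg.capture()
out = [reg.shift(b) for b in [1, 0, 1, 1]]
print(out)  # [0, 1, 0, 1] -- the captured 0 emerges first, then the input stream one bit late
```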

2.6 Instruction Register

As shown in Figure 41.9, an Instruction register has a shift scan section that can be connected between TDI and TDO, and a hold section that holds the current instruction. There may be some decoding logic beyond the hold section, depending on the width of the register and the number of different instructions. The control signals to the Instruction register originate from the TAP controller and either cause a shift-in/shift-out through the Instruction register shift section, or cause the contents of the shift section to be passed across to the hold section (parallel Update operation). It is also possible to load (Capture) internal hard-wired values into the shift section of the Instruction register. The Instruction register must be at least two bits long to allow coding of the four mandatory instructions (Extest, Bypass, Sample, Preload), but the maximum length of the Instruction register is not defined. In capture mode, the two least significant bits must capture a 01 pattern. (Note: by convention, the least significant bit of any register connected between the device TDI and TDO pins is always the bit closest to TDO.) The values captured into the higher-order bits of the Instruction register are not defined in the Standard. One possible use of these higher-order bits is to capture an informal identification code if the optional 32-bit Identification register is not implemented. In practice, the only mandated bits for the Instruction register capture are the 01 pattern in the two least significant bits. We will return to the value of capturing this pattern later in the tutorial.

Fig. 41.9 Instruction register (a scan register between TDI and TDO captures the 01 pattern in its two least-significant bits; a hold register keeps the current instruction, and decode logic together with the TAP controller's IR control signals routes the DR select and control signals to the selected target register; higher-order bits may hold status bits, an informal ident, or power-up self-test results)

2.7 Instruction Set

The IEEE 1149.1 Standard describes four mandatory instructions: Extest, Bypass, Sample, and Preload, and six optional instructions: Intest, Idcode, Usercode, Runbist, Clamp and HighZ.

Whenever a register is selected to become active between TDI and TDO, it is always possible to perform three operations on the register: parallel Capture, followed by serial Shift, followed by parallel Update. The order of these operations is fixed by the state-sequencing design of the TAP controller. For some target Data registers, some of these operations will effectively be null operations (no-ops).


Standard Instructions

Instruction      Selected Data Register
Mandatory:
  Extest         Boundary scan (formerly all-0s code)
  Bypass         Bypass (initialized state, all-1s code)
  Sample         Boundary scan (device in functional mode)
  Preload        Boundary scan (device in functional mode)
Optional:
  Intest         Boundary scan
  Idcode         Identification (initialized state if present)
  Usercode       Identification (for PLDs)
  Runbist        Result register
  Clamp          Bypass (output pins in safe state)
  HighZ          Bypass (output pins in high-Z state)

NB. All unused instruction codes must default to Bypass.
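The mapping from instruction to target register, together with the default-to-Bypass rule, amounts to a small decode function. In the sketch below only the all-1s Bypass code is mandated by the standard; the other opcodes are illustrative assumptions for a 2-bit Instruction register.

```python
# 2-bit instruction decode with the default-to-Bypass rule for unused codes.
KNOWN = {
    "00": "Boundary scan",   # Extest (formerly the all-0s code)
    "01": "Boundary scan",   # Sample/Preload (illustrative opcode)
    "11": "Bypass",          # Bypass: all-1s code, mandated by the standard
}

def selected_register(opcode):
    # Any unused instruction code must default to Bypass.
    return KNOWN.get(opcode, "Bypass")

print(selected_register("11"))  # Bypass
print(selected_register("10"))  # Bypass -- an unused code defaults
```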

EXTEST: This instruction is used to test the interconnect between two chips. The code for Extest used to be defined as the all-0s code. The EXTEST instruction places an IEEE 1149.1 compliant device into an external boundary test mode and selects the boundary scan register to be connected between TDI and TDO. During this instruction, the boundary scan cells associated with outputs are preloaded with test patterns to test downstream devices. The input boundary cells are set up to capture the input data for later analysis.

BYPASS: A device's boundary scan chain can be skipped using the BYPASS instruction, allowing the data to pass through the 1-bit Bypass register instead. The Bypass instruction must be assigned the all-1s code and, when executed, causes the Bypass register to be placed between the TDI and TDO pins. This allows efficient testing of a selected device without incurring the overhead of traversing other devices. The BYPASS instruction allows an IEEE 1149.1 compliant device to remain in functional mode while serial data is transferred through it from the TDI pin to the TDO pin without affecting the operation of the device.
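The saving is easy to quantify: with every non-target device bypassed, each contributes only its 1-bit Bypass register to the scan path. The device count and register length below are illustrative assumptions.

```python
def scan_path_bits(device_dr_bits, target_index):
    """Bits between TDI and TDO when every device except the target is in
    Bypass (1-bit register) and the target selects its own data register."""
    n = len(device_dr_bits)
    return device_dr_bits[target_index] + (n - 1) * 1

# Ten devices, each with a 160-bit boundary scan register:
full_chain = sum([160] * 10)                          # all devices scanned
one_device = scan_path_bits([160] * 10, target_index=3)
print(full_chain, one_device)  # 1600 169 -- bits per shift, full chain vs one target
```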

SAMPLE/PRELOAD: The Sample and Preload instructions, and their predecessor the Sample/Preload instruction, select the Boundary-Scan register when executed. The instruction sets up the boundary-scan cells either to sample (capture) values or to preload known values into the boundary-scan cells prior to some follow-on operation. During this instruction, the boundary scan register can be accessed via a data scan operation to take a sample of the functional data entering and leaving the device. This instruction is also used to preload test data into the boundary-scan register prior to loading an EXTEST instruction.

INTEST: With this instruction the boundary scan register (BSR) is connected between the TDI and TDO signals. The chip's internal core-logic signals are sampled and captured by the BSR cells on entry to the Capture-DR state, as shown in the TAP state transition diagram. The contents of the BSR are shifted out via the TDO line on exit from the Shift-DR state. As the contents of the BSR (the captured data) are shifted out, new data are shifted in on entry to the Shift-DR state. The new contents of the BSR are applied to the chip's core-logic signals during the Update-DR state.


IDCODE: This is used to select the Identification register between TDI and TDO, preparatory to loading the internally held 32-bit identification code and reading it out through TDO. The 32 bits identify the manufacturer of the device, its part number, and its version number.

USERCODE: This instruction selects the same 32-bit register as IDCODE, but allows an alternative 32 bits of identity data to be loaded and serially shifted out. This instruction is used for dual-personality devices, such as Complex Programmable Logic Devices and Field Programmable Gate Arrays.

RUNBIST: An important optional instruction is Runbist. Because of the growing importance of internal self-test structures, the behavior of Runbist is defined in the Standard. The self-test routine must be self-initializing (i.e., no external seed values are allowed), and the execution of Runbist essentially targets a self-test result register between TDI and TDO. At the end of the self-test cycle, the targeted data register holds the pass/fail result. With this instruction one can control the execution of a memory BIST from the TAP controller, thus reducing the hardware overhead of the BIST controller.

CLAMP: Clamp is an instruction that uses boundary-scan cells to drive preset values, established initially with the Preload instruction, onto the outputs of devices, and then selects the Bypass register between TDI and TDO (unlike the Preload instruction, which leaves the boundary-scan register selected until a new instruction is executed or the device is returned to the Test-Logic-Reset state). Clamp would be used, for example, to set up safe guarding values on the outputs of certain devices in order to avoid bus contention problems.

HIGH-Z: This instruction is similar to Clamp, but it leaves the device output pins in a high-impedance state rather than driving fixed logic-1 or logic-0 values. HighZ also selects the Bypass register between TDI and TDO.

3. On Board Test Controller

So far, the test architecture of boundary scan inside the chip under test has been discussed. A major problem remains: who is going to control the whole boundary scan test procedure? In general there are two solutions: using an external tester, or using a special on-board controller. The former is usually expensive because it involves an IC tester; the latter provides an economical way to complete the whole test procedure. As is clear from the above description, in addition to the test data, the most important signal that a test controller has to provide is the TMS signal. There are two methods of providing this signal on a board: the star configuration and the ring configuration, shown in Figure 41.10. In the star configuration, TMS is broadcast to all chips, so all chips must execute the same operation at any time. In the ring structure, the test controller provides one independent TMS signal for each chip, giving great flexibility in the test procedure.


Fig. 41.10 BUS master for chips with BS: (a) star structure, with TDI, TCK, TMS and TDO bused from the bus master to all application chips; (b) ring structure, with a separate TMS line (TMS1 ... TMSN) from the bus master to each chip

4. How Boundary Scan Testing Is Done

In a board design there can be many JTAG compliant devices. All these devices can be connected together to form a single scan chain, as illustrated in Figure 41.11. Alternatively, multiple scan chains can be established so that devices can be checked in parallel. Figure 41.11 shows the onboard TAP controllers connected to an offboard TAP control device, such as a personal computer, through a TAP access connector. The offboard TAP control device can perform different tests during board manufacturing without the need for bed-of-nails equipment.


Fig. 41.11 Single Boundary Scan Chain on a Board (the TDI/TDO pins of each device's TAP are daisy-chained, TMS and TCK are bused, and the chain is brought out to a test connector driven by an offboard TAP control device, i.e., test software on a PC or workstation)

5. Simple Board Level Test Sequence

One of the first tests performed on a PCB is called the infrastructure test. This test is used to determine whether all the components are installed correctly. It relies on the fact that the last two bits of the instruction register (IR) capture are always "01". By shifting out the IR of each device in the chain, it can be determined whether the device is properly installed. This is accomplished by sequencing the TAP controller for an IR read.

After the infrastructure test is successful, the board-level interconnect test can begin. This is accomplished through the EXTEST command. This test can be used to check for "opens" and "shorts" on the PCB. The test patterns are preloaded into the output pins of the driving devices, then propagated to the receiving devices and captured in the input boundary scan cells. The result can then be shifted out through the TDO pin for analysis.

These patterns can be generated and analyzed automatically via software programs. This feature is normally offered through tools like Automatic Test Pattern Generation (ATPG) or Boundary Scan Test Pattern Generation (BTPG).
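The infrastructure check can be sketched as a test on the concatenated IR capture stream. Since the bit closest to TDO shifts out first, each device's capture word arrives LSB first and must begin 1 then 0 (the mandated 01 pattern). The IR lengths and capture words below are illustrative assumptions.

```python
def infrastructure_ok(captured_bits, ir_lengths):
    """captured_bits: bits shifted out of TDO, first-shifted bit first.
    ir_lengths: IR lengths, device nearest TDO first. Each device's IR
    capture arrives LSB first, so its first two bits must be 1 then 0."""
    pos = 0
    for length in ir_lengths:
        word = captured_bits[pos:pos + length]
        if len(word) < length or word[0] != 1 or word[1] != 0:
            return False  # device missing, dead, or mis-installed
        pos += length
    return True

# Three devices with 4-bit IRs; each capture word is LSB first: 1, 0, x, x
stream = [1, 0, 0, 0,  1, 0, 1, 1,  1, 0, 0, 1]
print(infrastructure_ok(stream, [4, 4, 4]))    # True
print(infrastructure_ok([0] * 12, [4, 4, 4]))  # False -- broken chain
```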

6. Boundary Scan Description Language

Boundary Scan Description Language (BSDL) has been approved as IEEE Std. 1149.1b (the original boundary scan standard being IEEE Std. 1149.1a) [1,6]. This VHDL-compatible language can greatly reduce the effort needed to incorporate boundary scan into a chip, and hence is quite useful when a designer wishes to design boundary scan in his own style. Basically, for those parts that are mandatory in Std. 1149.1a, such as the TAP controller and the BYPASS register, the designer does not need to describe them; they can be automatically generated. The designer only has to describe the specifications related to his own design, such as the length of the boundary scan register, the user-defined boundary scan instructions, the decoder for those instructions, and the I/O pin assignment. In general these descriptions are quite easy to prepare. In fact, many CAD tools already implement the boundary scan generation procedure, so a designer may not even need to write the BSDL file: the tools can automatically generate the needed boundary scan circuitry for any circuit design as long as the I/O of the design is specified.

Any manufacturer of a JTAG compliant device must provide a BSDL file for that device. The BSDL file contains information on the function of each of the pins on the device: which are used as I/Os, power, or ground. BSDL files describe the Boundary Scan architecture of a JTAG-compliant device and are written in VHDL. The BSDL file includes:

1. Entity Declaration: a VHDL construct used to identify the name of the device described by the BSDL file.

2. Generic Parameter: specifies which package is described by the BSDL file.

3. Logical Port Description: lists all of the pads on a device, and states whether each pin is an input (in bit;), output (out bit;), bidirectional (inout bit;) or unavailable for boundary scan (linkage bit;).

4. Package Pin Mapping: shows how the pads on the device die are wired to the pins on the device package.

5. Use Statements: call VHDL packages that contain attributes, types, constants, etc. referenced in the BSDL file.

6. Scan Port Identification: identifies the JTAG pins: TDI, TDO, TMS, TCK and TRST (if used).

7. TAP Description: provides additional information on the device's JTAG logic: the Instruction Register length, instruction opcodes, device IDCODE, etc. These characteristics are device specific.

8. Boundary Register Description: provides the structure of the Boundary Scan cells on the device. Each pin on a device may have up to three Boundary Scan cells, each cell consisting of a register and a latch.


Fig. 41.12 Example to illustrate BSDL: (a) core logic with inputs D1-D6 and CLK and outputs Q1-Q6; (b) the same core after boundary scan insertion, with numbered boundary scan cells on the functional pins and the added TDI, TCK, TMS and TDO pins feeding the TAP controller

7. Benefits and Penalties of Boundary Scan

The decision whether to use boundary scan usually involves economics. Designers often hesitate to use boundary scan because of the additional silicon involved, and in many cases it may appear that the penalties outweigh the benefits for an ASIC. However, in an analysis spanning all assembly levels and all test phases during the system's life, the benefits will usually outweigh the penalties.

Benefits

The benefits provided by boundary-scan include the following:

•  lower test generation costs
•  reduced test time
•  reduced time to market
•  simpler and less costly testers
•  compatibility with tester interfaces
•  accommodation of high-density packaging devices

By providing access to the scan chain I/Os, the need for physical test points on the board is eliminated or greatly reduced, leading to significant savings as a result of simpler board layouts, less costly test fixtures, reduced time on in-circuit test systems, increased use of standard interfaces, and faster time-to-market. In addition to board testing, boundary scan allows programming almost all types of CPLDs and flash memories, regardless of size or package type, on the board, after PCB assembly. In-system programming saves money and improves throughput by reducing device handling, simplifying inventory management, and integrating the programming steps into the board production line.


Penalties

The penalties incurred in using boundary-scan include the following:

•  extra silicon due to boundary scan circuitry
•  added pins
•  additional design effort
•  degradation in performance due to gate delays through the additional circuitry
•  increased power consumption

Boundary Scan Example

Since boundary-scan design is new to many designers, an example of the gate count for a circuit with boundary scan is discussed here. This provides an estimate of the circuitry sizes required to implement the IEEE 1149.1 standard, but without the extensions defined in the standard. The example uses a library-based gate array design environment. The gate counts given are based on commercial cells and relate to a 10,000-gate design in a 40-pin package. Table 1 gives the gate requirements.

Logic Element                          Gate Equivalent
Variable Size
  Boundary-scan Register (40 cells)    680 approx.
Fixed Sizes
  TAP controller                       131
  Instruction Register (2 bits)        28
  Bypass Register                      9
  Miscellaneous Logic                  20 approx.
Total                                  868 approx.

Table 1: Gate requirements for a gate array boundary-scan design

It must be noted from Table 1 that the boundary-scan implementation requires 868 gates, an estimated 8 percent overhead. It should also be noted that the cells used in this example were created prior to publication of the IEEE 1149.1 standard. If specific cell designs had been available to support the standard, or if the vendor had placed the boundary-scan circuitry in areas of the ASIC not available to the user, the design would have required less.
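The quoted figure can be reproduced from Table 1; note that the 8 percent comes out exactly if the overhead is taken relative to the final design size (core plus test logic), which is an assumption about how the estimate was made.

```python
# Check the Table 1 totals and the quoted ~8 percent overhead.
cells = {"Boundary-scan Register (40 cells)": 680,
         "TAP controller": 131,
         "Instruction Register (2 bits)": 28,
         "Bypass Register": 9,
         "Miscellaneous Logic": 20}
bs_gates = sum(cells.values())              # 868 gates of test logic
core_gates = 10_000                         # design size from the example
overhead = bs_gates / (core_gates + bs_gates)
print(bs_gates, round(100 * overhead, 1))   # 868 8.0
```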

9. Conclusion

Board-level testing has become more complex with the increasing use of fine-pitch, high pin count devices. However, with the use of boundary scan, board-level testing can be implemented more efficiently and at lower cost. This standard provides a unique opportunity to simplify the design debug and test processes by enabling a simple and standard means of automatically creating and applying tests at the device, board, and system levels. Boundary scan is the only solution for MCMs and limited-access SMT/ML boards. The standard supports external testing with an ATE. The IEEE 1532-2000 In-System Configuration (ISC) standard makes use of 1149.1 boundary-scan structures within CPLD and FPGA devices.


[Exercise figure: two ICs, IC1 and IC2, in a single scan chain from TDI through IC1 and IC2 to TDO; primary inputs A and B drive IC1, nets C and D connect IC1 to IC2, and primary outputs E and F leave IC2.]

This circuit has two primary inputs, two primary outputs, and two nets that connect the ICs to one another. There is only one TAP, which connects the TDI and TDO of both ICs. Prepare a test plan for this circuit.

15. Consider a board composed of 100 40-pin Boundary-Scan devices, 2,000 interconnects, an 8-bit Instruction Register per device, a 32-bit Identification Register per device, and a 10 MHz test application rate. Compute the test time to execute a test session.

16. What is BSDL? What are the different parts of a BSDL file?


Lesson 42
On-line Testing of Embedded Systems


Instructional Objectives

After going through this lesson the student would be able to

•  Explain the meaning of the term On-line Testing

•  Describe the main issues in on-line testing and identify applications where on-line testing is required for embedded systems

•  Distinguish among concurrent and non-concurrent testing and their relations with BIST and on-line testing

•  Describe an application of on-line testing for System-on-Chip

On-line Testing of Embedded Systems

1. Introduction

EMBEDDED SYSTEMS are computers incorporated in consumer products or other devices to perform application-specific functions. The product user is usually not even aware of the existence of these systems. From toys to medical devices, from ovens to automobiles, the range of products incorporating microprocessor-based, software-controlled systems has expanded rapidly since the introduction of the microprocessor in 1971. The lure of embedded systems is clear: they promise previously impossible functions that enhance the performance of people or machines. As these systems gain sophistication, manufacturers are using them in increasingly critical applications—products that can result in injury, economic loss, or unacceptable inconvenience when they do not perform as required.

Embedded systems can contain a variety of computing devices, such as microcontrollers, application-specific integrated circuits, and digital signal processors. A key requirement is that these computing devices continuously respond to external events in real time. Makers of embedded systems take many measures to ensure safety and reliability throughout the lifetime of products incorporating the systems. Here, we consider techniques for identifying faults during normal operation of the product—that is, online-testing techniques. We evaluate them on the basis of error coverage, error latency, space redundancy, and time redundancy.

2. Embedded-system test issues

Cost constraints in consumer products typically translate into stringent constraints on product components. Thus, embedded systems are particularly cost sensitive. In many applications, low production and maintenance costs are as important as performance.

Moreover, as people become dependent on computer-based systems, their expectations of these systems' availability increase dramatically. Nevertheless, most people still expect significant downtime with computer systems—perhaps a few hours per month. People are much less patient with computer downtime in other consumer products, since the items in question did not demonstrate this type of failure before embedded systems were added. Thus, complex consumer products with high availability requirements must be quickly and easily repaired. For this reason, automobile manufacturers, among others, are increasingly providing online detection and diagnosis, capabilities previously found only in very complex and expensive applications such as aerospace systems. Using embedded systems to incorporate functions previously considered exotic in low-cost, everyday products is a growing trend.

Since embedded systems are frequently components of mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Embedded systems in automotive applications are exposed to extremely harsh environments, even beyond those experienced by most portable devices. These applications are proliferating rapidly, and their more stringent safety and reliability requirements pose a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for online testing.

Embedded systems consist of hardware and software, each usually considered separately in the design process, despite progress in the field of hardware-software co-design. A strong synergy exists between hardware and software failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not "break" in the common sense of the term. However, it can perform inappropriately due to faults in the underlying hardware, or due to specification or design flaws in either the hardware or the software. At the same time, one can exploit the software to test for and respond to the presence of faults in the underlying hardware.

Online software testing aims at detecting design faults (bugs) that escape detection before the embedded system is incorporated and used in a product. Even with extensive testing and formal verification of the system, some bugs escape detection. Residual bugs in well-tested software typically behave as intermittent faults, becoming apparent only in rare system states. Online software testing relies on two basic methods: acceptance testing and diversity [1]. Acceptance testing checks for the presence or absence of well-defined events or conditions, usually expressed as true-or-false conditions (predicates), related to the correctness or safety of preceding computations. Diversity techniques compare replicated computations, either with minor variations in data (data diversity) or with procedures written by separate, unrelated design teams (design diversity).

This chapter focuses on digital hardware testing, including techniques by which hardware tests itself, built-in self-test (BIST). Nevertheless, we must consider the role of software in detecting, diagnosing, and handling hardware faults. If we can use software to test hardware, why should we add hardware to test hardware? There are two possible answers. First, it may be cheaper or more practical to use hardware for some tasks and software for others. In an embedded system, programs are stored online in hardware-implemented memories such as ROMs (for this reason, embedded software is sometimes called firmware). This program storage space is a finite resource whose cost is measured in exactly the same way as other hardware. A function such as a test is "soft" only in the sense that it can easily be modified or omitted in the final implementation.
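As a small illustration of the acceptance-testing method mentioned above, a predicate can guard the result of a preceding computation. The filter routine and its plausibility bound here are hypothetical, not from the text.

```python
# Acceptance test: a true-or-false predicate on the result of a preceding
# computation; failing the predicate signals a possible fault and triggers
# recovery. The filter and its bound are illustrative stand-ins.

def filter_sample(raw_mv):
    """Stand-in for the real computation (a hypothetical sensor filter)."""
    return 0.25 * raw_mv

def acceptance_test(result_mv):
    """Predicate: a plausible filtered sensor value lies in [0, 5000] mV."""
    return 0.0 <= result_mv <= 5000.0

def guarded_filter(raw_mv):
    out = filter_sample(raw_mv)
    if not acceptance_test(out):
        raise RuntimeError("acceptance test failed: possible fault")
    return out

print(guarded_filter(1000.0))  # 250.0 -- result passes the predicate
```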

The second answer involves the time that elapses between a fault's occurrence and a problem arising from that fault. For instance, a fault may induce an erroneous system state that can ultimately lead to an accident. If the elapsed time between the fault's occurrence and the corresponding accident is short, the fault must be detected immediately. Acceptance tests can detect many faults and errors in both software and hardware. However, their exact fault coverage is hard to measure, and even when coverage is complete, acceptance tests may take a long time to detect some faults. BIST typically targets relatively few hardware faults, but it detects them quickly.

These two issues, cost and latency, are the main parameters in deciding whether to use hardware or software for testing and which hardware or software technique to use. This decision requires system-level analysis. We do not consider software methods here. Rather, we emphasize the appropriate use of widely implemented BIST methods for online hardware testing. These methods are components in the hardware-software trade-off.


3. Online testing

Faults are physical or logical defects in the design or implementation of a digital device.

Under certain conditions, they lead to errors—that is, incorrect system states. Errors induce

failures, deviations from appropriate system behavior. If the failure can lead to an accident, it is a hazard. Faults can be classified into three groups: design, fabrication, and operational. Design

faults are made by human designers or CAD software (simulators, translators, or layout generators) during the design process. Fabrication defects result from an imperfect

manufacturing process. For example, shorts and opens are common manufacturing defects inVLSI circuits. Operational faults result from wear or environmental disturbances during normal

system operation. Such disturbances include electromagnetic interference, operator mistakes, and extremes of temperature and vibration. Some design defects and manufacturing faults escape

detection and combine with wear and environmental disturbances to cause problems in the field.

Operational faults are usually classified by their duration:

•  Permanent faults remain in existence indefinitely if no corrective action is taken. Many

are residual design or manufacturing faults. The rest usually occur during changes in

system operation such as system start-up or shutdown or as a result of a catastrophic environmental disturbance such as a collision.

•   Intermittent faults appear, disappear, and reappear repeatedly. They are difficult to predict, but their effects are highly correlated. When intermittent faults are present, the

system works well most of the time but fails under atypical environmental conditions.

•  Transient faults appear and disappear quickly and are not correlated with each other. They are most commonly induced by random environmental disturbances.
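Purely as an illustration of this taxonomy (the function, the decision thresholds, and the idea of classifying from a detection log are ours, not from the text), the three duration classes can be told apart by how detections recur across test periods:

```c
/* Illustrative only: classify an operational fault from a log of test
 * outcomes over consecutive test periods (1 = error detected).
 * The decision rules are simplistic stand-ins for the definitions above. */
typedef enum {
    FAULT_NONE,
    FAULT_TRANSIENT,      /* a single, isolated detection */
    FAULT_INTERMITTENT,   /* detections that come and go repeatedly */
    FAULT_PERMANENT       /* detected in every period once present */
} fault_class;

fault_class classify_fault(const int *detected, int periods)
{
    int hits = 0, i;
    for (i = 0; i < periods; i++)
        hits += detected[i] ? 1 : 0;
    if (hits == 0)       return FAULT_NONE;
    if (hits == periods) return FAULT_PERMANENT;
    if (hits > 1)        return FAULT_INTERMITTENT;
    return FAULT_TRANSIENT;
}
```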

One generally uses online testing to detect operational faults in computers that support critical or high-availability applications. The goal of online testing is to detect fault effects, or errors, and 

take appropriate corrective action. For example, in some critical applications, the system shuts

down after an error is detected. In other applications, error detection triggers a reconfiguration

mechanism that allows the system to continue operating, perhaps with some performance degradation. Online testing can take the form of external or internal monitoring, using either 

hardware or software. Internal monitoring, also called self-testing, takes place on the same

substrate as the circuit under test (CUT). Today, this usually means inside a single IC—a system on a chip. There are four primary parameters to consider in designing an online-testing scheme:

•  error coverage —the fraction of modeled errors detected, usually expressed as a percentage. Critical and highly available systems require very good error coverage to

minimize the probability of system failure.

•  error latency —the difference between the first time an error becomes active and the first

time it is detected. Error latency depends on the time taken to perform a test and how often tests are executed. A related parameter is fault latency, the difference between the

onset of the fault and its detection. Clearly, fault latency is greater than or equal to error 

latency, so when error latency is difficult to determine, test designers often consider fault latency instead.

•  space redundancy —the extra hardware or firmware needed for online testing.

•  time redundancy —the extra time needed for online testing.
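To make the latency parameter concrete: for a periodic test of duration d cycles repeated every p cycles, a simple worst-case bound on fault latency is p + d. The model below is ours, not from the text, and assumes the test always detects the fault when present:

```c
/* Worst-case fault latency, in cycles, for a periodic test: a fault that
 * arises just after one test run waits up to a full period p, then must
 * survive the next run of duration d before being flagged. Assumes the
 * test detects the fault whenever it is present (illustrative model). */
unsigned worst_case_fault_latency(unsigned period_p, unsigned duration_d)
{
    return period_p + duration_d;
}
```

For example, a 64-cycle test scheduled every 1000 cycles bounds fault latency at 1064 cycles; shortening the period lowers the latency at the cost of more time redundancy.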

The ideal online-testing scheme would have 100% error coverage, error latency of 1 clock 

cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT

and impose no functional or structural restrictions on it. Most BIST methods meet some of these constraints without addressing others. Considering all four parameters in the design of an online-testing scheme may create conflicting goals. High coverage requires high error latency, space redundancy, and/or time redundancy. Schemes with immediate detection (error latency equaling 1) minimize time redundancy but require more hardware. On the other hand, schemes with delayed detection (error latency greater than 1) reduce time and space redundancy at the expense of increased error latency. Several proposed delayed-detection techniques assume equiprobability of input combinations and try to establish a probabilistic bound on error latency [2]. As a result, certain faults remain undetected for a long time because tests for them rarely appear at the CUT’s inputs.

To cover all the operational fault types described earlier, test engineers use two different modes of online testing: concurrent and non-concurrent. Concurrent testing takes place during

normal system operation, and non-concurrent testing takes place while normal operation is

temporarily suspended. One must often overlap these test modes to provide a comprehensive online-testing strategy at acceptable cost.

4. Non-concurrent testing

This form of testing is either event-triggered (sporadic) or time-triggered (periodic) and is

characterized by low space and time redundancy. Event-triggered testing is initiated by key events or state changes such as start-up or shutdown, and its goal is to detect permanent faults.

Detecting and repairing permanent faults as soon as possible is usually advisable. Event-triggered tests resemble manufacturing tests. Any such test can be applied online, as long as the

required testing resources are available. Typically, the hardware is partitioned into components,

each exercised by specific tests. RAMs, for instance, are tested with manufacturing tests such as

March tests [3].

Time-triggered testing occurs at predetermined times in the operation of the system. It detects

 permanent faults, often using the same types of tests applied by event-triggered testing. The

periodic approach is especially useful in systems that run for extended periods during which no significant events occur to trigger testing. Periodic testing is also essential for detecting

intermittent faults. Such faults typically behave as permanent faults for short periods. Since they usually represent conditions that must be corrected, diagnostic resolution is important. Periodic testing can identify latent design or manufacturing flaws that appear only under certain

environmental conditions. Time-triggered tests are frequently partitioned and interleaved so that

only part of the test is applied during each test period.
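The March tests mentioned above sweep a fixed sequence of read/write "elements" up and down the address space. A minimal MATS+-style sketch over a RAM modeled as a C array (the function name and the word-wide data backgrounds are our choices, not from the text):

```c
#include <stddef.h>

/* Minimal MATS+-style March test over a word-oriented RAM model:
 *   (up: write 0); (up: read 0, write 1); (down: read 1, write 0)
 * Returns 0 if the memory passes, nonzero at the first mismatch. */
int march_test(volatile unsigned char *ram, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)            /* element 1: write 0 everywhere */
        ram[i] = 0x00;
    for (i = 0; i < n; i++) {          /* element 2: ascending read 0, write 1 */
        if (ram[i] != 0x00) return 1;
        ram[i] = 0xFF;
    }
    for (i = n; i-- > 0; ) {           /* element 3: descending read 1, write 0 */
        if (ram[i] != 0xFF) return 1;
        ram[i] = 0x00;
    }
    return 0;
}
```

Note that this version destroys the memory contents, which is exactly why content-preserving (transparent) variants matter for online use.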

5. Concurrent testing

 Non-concurrent testing cannot detect transient or intermittent faults whose effects disappear quickly. Concurrent testing, on the other hand, continuously checks for errors due to such faults.

However, concurrent testing is not particularly useful for diagnosing the source of errors, so test

designers often combine it with diagnostic software. They may also combine concurrent and non-concurrent testing to detect or diagnose complex faults of all types.

A common method of providing hardware support for concurrent testing, especially for 

detecting control errors, is a watchdog timer [4]. This is a counter that the system resets repeatedly to indicate that the system is functioning properly. The watchdog concept assumes

that the system is fault-free—or at least alive—if it can reset the timer at appropriate intervals. The ability to perform this simple task implies that control flow is correctly traversing timer-reset

points. One can monitor system sequencing very precisely by guarding the watchdog-reset

operations with software-based acceptance tests that check signatures computed while control


flow traverses various checkpoints. To implement this last approach in hardware, one can construct more complex hardware watchdogs.

A key element of concurrent testing for data errors is redundancy. For example, the

duplication-with-comparison (DWC) technique [5] detects any single error at the expense of 100%

space redundancy. This technique requires two copies of the CUT, which operate in tandem with identical inputs. Any discrepancy in their outputs indicates an error. In many applications,

DWC’s high hardware overhead is unacceptable. Moreover, it is difficult to prevent minor timing variations between duplicated modules from invalidating comparison.
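A behavioral sketch of DWC follows; the function-pointer "copies" and the injected fault are illustrative devices of ours, standing in for duplicated hardware modules:

```c
/* Duplication with comparison (DWC), modeled behaviorally: two nominally
 * identical copies of a unit compute on the same input, and any
 * disagreement raises an error flag. The sample units and the injected
 * fault below are illustrative only. */
typedef unsigned (*unit_fn)(unsigned);

static unsigned unit_good(unsigned x)   { return 2u * x + 1u; }
static unsigned unit_faulty(unsigned x) { return x == 3u ? 0u : 2u * x + 1u; }

int dwc_check(unit_fn a, unit_fn b, unsigned input, unsigned *result)
{
    unsigned ra = a(input);
    unsigned rb = b(input);
    *result = ra;
    return ra != rb;        /* nonzero = error detected */
}
```

Note that the fault is caught only when an input excites it, which is why DWC runs continuously during normal operation.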

A possible lower-cost alternative is time redundancy. A technique called double execution, or retry, executes critical operations more than once at diverse time points and compares their 

results. Transient faults are likely to affect only one instance of the operation and thus can be

detected. Another technique, re-computing with shifted operands (RESO) [5], achieves almost the same error coverage as DWC with 100% time redundancy but very little space redundancy.

However, no one has demonstrated the practicality of double execution and RESO for online

testing of general logic circuits.

A third, widely used form of redundancy is information redundancy—the addition of redundant coded information such as a parity-check bit [5]. Such codes are particularly effective

for detecting memory and data transmission errors, since memories and networks are susceptible to transient errors. Coding methods can also detect errors in data computed during critical

operations.
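A one-bit even-parity code, the simplest instance of such information redundancy, can be sketched as follows (function names are ours):

```c
/* Even parity over a data word: the stored check bit makes the total
 * number of 1s even, so any single-bit error flips the recomputed
 * parity and is detected. */
unsigned parity_bit(unsigned word)
{
    unsigned p = 0;
    while (word) {
        p ^= word & 1u;     /* fold each bit into the parity */
        word >>= 1;
    }
    return p;               /* 1 if the word holds an odd number of 1s */
}

int parity_error(unsigned word, unsigned stored_parity)
{
    return parity_bit(word) != stored_parity;   /* nonzero = error */
}
```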

6. Built-in self-test

For critical or highly available systems, a comprehensive online-testing approach that covers all expected permanent, intermittent, and transient faults is essential. In recent years, BIST has

emerged as an important method of testing manufacturing faults, and researchers increasingly

promote it for online testing as well.

BIST is a design-for-testability technique that places test functions physically on chip with the CUT, as illustrated in Figure 42.1. In normal operating mode, the CUT receives its inputs from other modules and performs the function for which it was designed. In test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT, and a response monitor 

evaluates the test responses. In the most common type of BIST, the response monitor compacts

the test responses to form fault signatures. It compares the fault signatures with reference

signatures generated or stored on chip, and an error signal indicates any discrepancies detected. We assume this type of BIST in the following discussion.

In developing a BIST methodology for embedded systems, we must consider four primary

 parameters related to those listed earlier for online-testing techniques:

•   fault coverage —the fraction of faults of interest that the test patterns produced by the test generator can expose and the response monitor can detect. Most monitors produce a fault-free signature for some faulty response sequences, an undesirable property called aliasing.

•  test set size —the number of test patterns produced by the test generator. Test set size is

closely linked to fault coverage; generally, large test sets imply high fault coverage.

However, for online testing, test set size must be small to reduce fault and error latency.

•  hardware overhead —the extra hardware needed for BIST. In most embedded systems, high hardware overhead is not acceptable.


•   performance penalty —the impact of BIST hardware on normal circuit performance, such as worst-case (critical) path delays. Overhead of this type is sometimes more important

than hardware overhead.
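To make the fault coverage figure concrete, the sketch below simulates every single stuck-at fault on a toy circuit f = (a AND b) OR c and reports the percentage of faults a given test set detects. The circuit, fault list, and names are our own illustration, not a method from the text:

```c
/* Single-stuck-at fault simulation for f = (a & b) | c.
 * Fault coverage = detected faults / total modeled faults. */
enum { LINE_A, LINE_B, LINE_C, LINE_F, NLINES };

static int eval(int a, int b, int c, int line, int stuck /* 0 or 1 */)
{
    if (line == LINE_A) a = stuck;      /* inject the fault, if any */
    if (line == LINE_B) b = stuck;
    if (line == LINE_C) c = stuck;
    int f = (a & b) | c;
    if (line == LINE_F) f = stuck;
    return f;                           /* line == -1 means fault-free */
}

/* tests[t] = {a, b, c}; returns coverage as an integer percentage */
int coverage_percent(int tests[][3], int ntests)
{
    int detected = 0, line, stuck, t;
    for (line = 0; line < NLINES; line++)
        for (stuck = 0; stuck <= 1; stuck++)
            for (t = 0; t < ntests; t++) {
                int a = tests[t][0], b = tests[t][1], c = tests[t][2];
                if (eval(a, b, c, -1, 0) != eval(a, b, c, line, stuck)) {
                    detected++;         /* some pattern exposes this fault */
                    break;
                }
            }
    return detected * 100 / (2 * NLINES);
}
```

A four-pattern set suffices for 100% coverage of this circuit's eight stuck-at faults, while a single pattern leaves most of them undetected.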

System designers can use BIST for non-concurrent, online testing of a system’s logic and 

memory [6]. They can readily configure the BIST hardware for event-triggered testing, tying the

BIST control to the system reset so that testing occurs during system start-up or shutdown. BIST

can also be designed for periodic testing with low fault latency. This requires incorporating a test process that guarantees the detection of all target faults within a fixed time.

Designers usually implement online BIST with the goals of complete fault coverage and low

fault latency. Hence, they generally design the test generator and the response monitor to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable test

set size. Different parts of the system meet these goals by different techniques.

Test generator and response monitor implementations often consist of simple, counter-like circuits, especially linear-feedback shift registers (LFSRs) [5]. An LFSR is formed from standard flip-flops, with outputs of selected flip-flops being fed back (modulo 2) to its inputs. When used as a

test generator, an LFSR is set to cycle rapidly through a large number of its states. These states, whose choice and order depend on the LFSR’s design parameters, define the test patterns. In this

mode of operation, an LFSR is a source of pseudorandom tests that are, in principle, applicable to any fault and circuit types. An LFSR can also serve as a response monitor by counting (in a

special sense) the responses produced by the tests. After receiving a sequence of test responses, an LFSR response monitor forms a fault signature, which it compares to a known or generated 

good signature to determine whether a fault is present.
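A behavioral sketch of a small LFSR in both roles follows; the 4-bit width and tap positions (characteristic polynomial x^4 + x^3 + 1, a maximal-length choice) are our own illustrative parameters:

```c
/* 4-bit LFSR in shift-left form, feedback from bits 3 and 0
 * (characteristic polynomial x^4 + x^3 + 1). Being maximal-length,
 * it cycles through all 15 nonzero states — the test patterns. */
unsigned lfsr4_next(unsigned s)
{
    unsigned fb = ((s >> 3) ^ s) & 1u;          /* XOR of tapped flip-flops */
    return ((s << 1) | fb) & 0xFu;
}

/* The same register as a serial response compactor: each response bit
 * is XORed into the feedback; the final state is the fault signature. */
unsigned lfsr4_compact(unsigned s, unsigned response_bit)
{
    unsigned fb = (((s >> 3) ^ s) ^ response_bit) & 1u;
    return ((s << 1) | fb) & 0xFu;
}
```

Because the LFSR's state transition is invertible, a single-bit error in the response stream always changes the final signature; aliasing requires a multi-bit error pattern that cancels itself out.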

Ensuring that fault coverage is sufficiently high and the number of tests is sufficiently low are the main problems with random BIST methods. Researchers have proposed two general

approaches to preserve the cost advantages of LFSRs while greatly shortening the generated test

sequence. One approach is to insert test points in the CUT to improve controllability and observability. However, this approach can result in performance loss. Alternatively, one can

introduce some determinism into the generated test sequence—for example, by inserting specific

“seed tests” known to detect hard faults.

Some CUTs, including data path circuits, contain hard-to-detect faults that are detectable by only a few test patterns, denoted T_hard. An N-bit LFSR can generate a sequence that eventually includes 2^N - 1 patterns (essentially all possibilities). However, the probability that the tests in T_hard will appear early in the sequence is low. In such cases, one can use deterministic testing, which tailors the generated test sequence to the CUT’s functional properties, instead of random

testing. Deterministic testing is especially suited to RAMs, ROMs, and other highly regular 

components. A deterministic technique called transparent BIST [3] applies BIST to RAMs while preserving the RAM contents—a particularly desirable feature for online testing. Keeping

hardware overhead acceptably low is the main difficulty with deterministic BIST.

A straightforward way to generate a specific test set is to store it in a ROM and address each

stored test pattern with a counter. Unfortunately, ROMs tend to be much too expensive for storing entire test sequences. An alternative method is to synthesize a finite-state machine that

directly generates the test set. However, the relatively large test set size and test vector width, as

well as the test set’s irregular structure, are much more than current FSM synthesis programs can handle.
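The ROM-plus-counter scheme from the start of this paragraph is trivial to model; the pattern contents and width below are invented for illustration:

```c
/* ROM-plus-counter test generator: a stored deterministic test set is
 * addressed by a modulo counter. Patterns here are made-up examples. */
#define NPATTERNS 4

static const unsigned char test_rom[NPATTERNS] = { 0x00, 0xFF, 0xAA, 0x55 };

unsigned char next_pattern(unsigned *counter)
{
    unsigned char p = test_rom[*counter];
    *counter = (*counter + 1) % NPATTERNS;   /* wrap for periodic reuse */
    return p;
}
```

The scheme's cost is exactly the point made above: the ROM must hold every pattern at full test-vector width, which quickly dominates the BIST hardware budget.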

Another group of test generator design methods, loosely called deterministic, attempt to embed a complete test set in a specific generated sequence. Again, the generated tests must meet

the coverage, overhead, and test size constraints we’ve discussed. An earlier article [7] presents a

representative BIST design method for data path circuits that meets these requirements. The test


generator’s structure, based on a twisted-ring counter, is tailored to produce a regular,

deterministic test sequence of reasonable size. One can systematically rescale the test generator as the size of a non-bit-sliced data path CUT, such as a carry-look-ahead adder, changes. Instead 

of using an LFSR, a straightforward way to compress test response data and produce a fault

signature is to use an FSM or an accumulator. However, FSM hardware overhead and accumulator aliasing are difficult parameters to control. Keeping hardware overhead acceptably

low and reducing aliasing are the main difficulties in response monitor design.

[Figure: a multiplexer selects between the normal inputs and the test pattern sequence from the test generator; the circuit under test (CUT) drives both its outputs and a response monitor, which raises an Error signal; a Control input switches between normal and test modes.]

Fig. 42.1 A General BIST Scheme

An Example

IEEE 1149.4-Based Architecture for OLT of a Mixed-Signal SoC

Analog/mixed-signal blocks like DC-DC converters, PLLs, ADCs, etc., and digital modules like application-specific processors, microcontrollers, UARTs, bus controllers, etc., typically exist in SoCs. These have been used as cores of the SoC benchmark “Controller for Electro-Hydraulic Actuators,” which serves as the case study here. It is to be noted that this case study is used only for illustration; the architecture is generic and applies to all mixed-signal SoCs.

All the digital blocks, like the instruction-specific processor, microcontroller, bus controller, etc., have been designed with OLT capability using the CAD tool described in [8]. Further, all these digital cores are IEEE 1149.1 compliant. In other words, all the digital cores are designed with a blanket comprising an on-line monitor and IEEE 1149.1 compliance circuitry. For the analog modules, the observers have been designed using ADCs and digital logic [9]. The test blanket for the analog/mixed-signal cores comprises IEEE 1149.4 circuitry. A dedicated test controller is designed and placed on-chip that schedules the various on-line tests during the operation of the SoC. The block diagram of the SoC used as the case study is illustrated in Figure 42.2.

The basic functionality of the SoC under consideration is discussed below.

Electronic Controller for the Electro-Hydraulic System

Actuator systems are vital in the flight control system, providing the motive force necessary to move the flight control surfaces. Hydraulic actuators are very common in space vehicle and flight control systems, where force/weight considerations are very important. This system positions the control surface of the aircraft, meeting performance requirements while acting against external loads. The actuator commands are processed in four identical analog servo loops, which command the four coils of the force motor driving the hydraulic servo valve used to control the


motion of the dual tandem hydraulic jack. The motion of the spool of the hydraulic servo valve (master control valve) regulates the flow of oil to the tandem jacks, thereby determining the ram position. The spool and ram positions are controlled by means of feedback loops. The actuator system is controlled by the on-board flight electronics. A lot of work has been done on on-line fault detection and diagnosis of the mechanical system; however, OLT of the electronic systems has hardly been looked into. It is to be noted that, as electro-hydraulic actuators are mainly used in mission-critical systems like avionics, for reliable operation on-line fault detection and diagnosis is required for both the mechanical and the electronic sub-systems.

The IEEE 1149.1 and 1149.4 circuitry is utilized to perform BIST of the interconnecting buses between the cores. It may be noted that on-line tests are carried out only for cores that are more susceptible to failures. However, the interconnecting buses are tested during startup and at intervals when the cores connected by them are idle. The test scheduling logic can be designed as suggested in [10].

The following three classes of tests are carried out in the SoC:

1. Interconnect test of the interconnecting buses (BIST)

Interconnect testing is meant to detect open circuits in the interconnect between the cores, and to detect and diagnose bridging faults anywhere in the interconnect—regardless of whether the lines normally carry digital or analog signals. This test is performed by the EXTEST instruction, and digital test patterns are generated from the pre-programmed test controller.
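One common way to realize such a test is to drive each net with a unique codeword (here simply the net's index, a "counting sequence"), so that an open or a bridge corrupts some received codeword. A behavioral sketch of the check, with the boundary-scan path abstracted as an array (names and the wired-OR bridge model are our assumptions):

```c
/* Interconnect check: each of nlines nets should deliver its own index
 * as a codeword. A bridging fault merges two codewords (e.g., wired-OR)
 * and an open typically yields a stuck value, so either corrupts some
 * received word. Returns 0 on pass, or the 1-based index of the first
 * failing net for diagnosis. */
int interconnect_check(const unsigned received[], int nlines)
{
    int i;
    for (i = 0; i < nlines; i++)
        if (received[i] != (unsigned)i)
            return i + 1;
    return 0;
}
```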

2. Parametric test of the interconnecting buses (BIST)

A parametric test permits analog measurements using analog stimuli and responses. This test is also performed by the EXTEST instruction. For this, only three values of analog voltage, viz. VH = VDD, VL = VDD/3, and VG = VSS, are given as test inputs by the controller, and the voltage at the output of the line under test is sampled after one-bit coarse digitization, as mentioned in the IEEE 1149.4 standard.
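A sketch of the pass/fail decision behind such a parametric test; the VDD value, the single mid-rail threshold, and the function names are assumptions for this sketch, not taken from the standard:

```c
/* Illustrative pass/fail check for a parametric line test. Voltages are
 * in millivolts; VDD and the mid-rail threshold are assumed values. */
#define VDD_MV 3300
#define VTH_MV (VDD_MV / 2)

int coarse_digitize(int sampled_mv)       /* one-bit digitization */
{
    return sampled_mv > VTH_MV;
}

/* A healthy line digitizes VH = VDD as 1, and VL = VDD/3 and VG = VSS as 0. */
int line_ok(int vh_sample_mv, int vl_sample_mv, int vg_sample_mv)
{
    return coarse_digitize(vh_sample_mv) == 1 &&
           coarse_digitize(vl_sample_mv) == 0 &&
           coarse_digitize(vg_sample_mv) == 0;
}
```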

3. Internal test of the cores (Concurrent tests)

This test is performed by the INTEST instruction, which enables the on-line monitors placed on each of the cores present in the SoC. This test can run concurrently with SoC operation and need not be synchronized to the start-up of the SoC’s normal operation. The asynchronous start-up/shutdown of the on-line testers facilitates power saving and gives the test circuitry higher reliability compared to the functional circuit.

7. References

1) M.R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995.

2) K.K. Saluja, R. Sharma, and C.R. Kime, “A Concurrent Testing Technique for Digital Circuits,” IEEE Trans. Computer-Aided Design, Vol. 7, No. 12, Dec. 1988, pp. 1250-1259.

3) M. Nicolaidis, “Theory of Transparent BIST for RAMs,” IEEE Trans. Computers, Vol. 45, No. 10, Oct. 1996, pp. 1141-1156.


4) A. Mahmood and E. McCluskey, “Concurrent Error Detection Using Watchdog Processors—A Survey,” IEEE Trans. Computers, Vol. 37, No. 2, Feb. 1988, pp. 160-174.

5) B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley, Reading, Mass., 1989.

6) B.T. Murray and J.P. Hayes, “Testing ICs: Getting to the Core of the Problem,” Computer, Vol. 29, No. 11, Nov. 1996, pp. 32-45.

7) H. Al-Asaad, J.P. Hayes, and B.T. Murray, “Scalable Test Generators for High-Speed Data Path Circuits,” J. Electronic Testing: Theory and Applications, Vol. 12, No. 1/2, Feb./Apr. 1998, pp. 111-125 (reprinted in On-Line Testing for VLSI, M. Nicolaidis, Y. Zorian, and D.K. Pradhan, eds., Kluwer, Boston, 1998).

8) S. Biswas, S. Mukhopadhyay, and A. Patra, “A Formal Approach to On-Line Monitoring of Digital VLSI Circuits: Theory, Design and Implementation,” J. Electronic Testing: Theory and Applications, Vol. 20, Oct. 2005, pp. 503-537.

9) S. Biswas, B. Chatterjee, S. Mukhopadhyay, and A. Patra, “A Novel Method for On-Line Testing of Mixed Signal ‘System On a Chip’: A Case Study of Base Band Controller,” 29th National System Conference, IIT Mumbai, India, 2005, pp. 2.1-2.23.

10) A.T. Dahbura, M.U. Uyar, and C.W. Yau, “An Optimal Test Sequence for the JTAG/IEEE P1149.1 Test Access Port Controller,” International Test Conference, USA, 1998, pp. 55-62.


[Fig. 42.2: Block diagram of the SoC case study. An application-specific processor with 16 kB data RAM connects over the system bus interface to an ADC/DAC front end driving the electro-hydraulic actuator system (simulated in LabVIEW on a PC). An XTAL timing block and clock divider supply the clocks; a battery/charger and DC/DC converter supply power to the cores. An on-chip test controller with a JTAG interface (TDI, TMS, TCK, TDO) drives the IEEE 1149.4/1149.1 boundary scan bus, the analog test buses AB1 and AB2, and the test voltages VH, VL, and VG; data and control paths link the cores.]