[IEEE 2011 IEEE 20th Asian Test Symposium (ATS) - New Delhi, India (2011.11.20-2011.11.23)] 2011 Asian Test Symposium - Power Aware Shift and Capture ATPG Methodology for Low Power

Power Aware Shift and Capture ATPG methodology for Low Power Designs

Shray Khullar, Swapnil Bahl

Technology Research and Development STMicroelectronics Noida, India


Abstract - Power management has emerged as a major design objective, both in functional and test mode, in most of the application domains that employ digital ICs. This paper presents a low power ATPG methodology for managing power both in shift and capture mode. The technique exploits the embedded clock gates and provides a good tradeoff between pattern count and reduction in switching activity without any significant coverage loss. The methodology also presents a novel method of selective scan chain reordering for scan compressed designs to reduce shift switching activity with minimal design flow constraints.

Keywords – capture power, shift power, scan reordering, low power

I. INTRODUCTION

Power consumption is increasing with SOC complexity and rising design performance. Today it is an issue in both the functional and manufacturing test domains. In many designs, power consumption during test can be significantly higher than during normal functional mode [1,2]. One reason is that test patterns cause as many nodes to switch as possible, while a typical functional mode activates only a few modules at the same time. Another reason is that successive functional input vectors applied to a given circuit during system mode are significantly correlated, while the correlation between consecutive test patterns can be very low [3].

In general, test power problems can be classified into two categories: average power and peak power issues. Increased average power raises the temperature of the circuit-under-test (CUT). This can cause hot spots during a test session, which may permanently damage the CUT. Other thermal effects, e.g., hot-carrier-induced defects, electromigration, or dielectric breakdown, are accelerated gradually and may affect performance. Peak power is the highest value of power at any given instant and can cause false failures during test if it exceeds the thermal and electrical limits of the chip. All of the above raise circuit reliability concerns, damage parts during test, or prevent good parts from passing test, leading to yield loss [4].

In integrated circuits (ICs), different design strategies are used to reduce power consumption during functional mode. Clock gating, multi-supply voltage (MSV) and power shut-off (PSO) techniques are now widely used for power management. A number of viable solutions have recently been proposed to cope with power problems during test. The test scheduling algorithms described in [5,6] try to determine the blocks of a complex design to be activated in parallel at each stage of the test session in order to reduce the number of concurrently tested modules. They reduce the average test power but increase the test time. Some recent papers have considered the use of clock gating for capture power reduction [7,8].

To reduce power in shift mode, various X-fill techniques have been proposed in the literature [9-10]. Scan chain and pattern re-ordering techniques to reduce the switching activity are presented in [11,12]. These techniques modify either the order in which test patterns of a given test sequence are applied to the CUT or the order in which the scan flops are chained to form the scan chain.

In this paper we propose a complete low power test vector generation methodology. The rest of the paper is organized as follows. Section II describes the low power ATPG flow. In section III, the capture power reduction technique based on embedded clock gates is discussed. Section IV describes the MT-Fill technique and its limitations in scan compressed designs. In section V, a novel method of restructuring scan chains is presented that reduces shift switching activity by reordering a minimal number of scan cells. Results are presented in section VI, and finally, section VII draws the conclusion.

II. LOW POWER ATPG FLOW

The preferred strategy to test the chip is to keep the complete chip in full power ON mode. This helps in optimizing the overall test time and pattern count. Additionally, we limit the switching activity of the design during the pattern generation process so that test power consumption stays within the chip power specification. This strategy is applicable to most designs, but it may not work for a few. For such power critical designs, we selectively switch ON a few power domains at a given time and test the complete chip in a sequential manner.

The power reduction in shift mode is based on utilizing the MT-fill technique along with limiting the care bits in a given pattern; this is discussed in sections IV and V. Capture power reduction is based on limiting the number of flops clocked by controlling the embedded clock gates. As one technique affects the other, enabling both reduction techniques simultaneously may affect the overall effectiveness. When both techniques are enabled, we give priority to capture mode power reduction and limit the maximum care bits to 25% for shift mode. The overall flow is described in figure 1. The flow can be seen in three parts: (A) power budget calculation, (B) low power ATPG pattern generation and (C) power validation of ATPG patterns.


1081-7735/11 $26.00 © 2011 IEEE

DOI 10.1109/ATS.2011.65


A. Power budget calculation

Power consumption mainly depends upon frequency, supply voltage and switching activity. Since capture frequency and voltage cannot be reduced, we rely on reducing the switching activity, i.e., on a power budget, to reduce power in test mode. In this paper, the power budget is defined as the maximum number of registers allowed to toggle in any given clock cycle. To calculate the power budget for capture mode, a vectorless analysis is performed in a power analysis tool. The design is read into the tool and capture mode constraints are applied. Power consumption is calculated for a range of switching activities applied on all design nets. The values range from 5% to 100% in steps of 5%. The power values obtained tend to follow a linear relationship with switching activity. By fitting a regression line, the relationship between power consumption and switching activity is translated into an equation of the form

$$P = m \cdot S_{act} + C \qquad (1)$$

where $P$ is the power consumption, $m$ is the slope of the line, $S_{act}$ is the switching activity and $C$ is a constant.

Given the chip’s functional power specification, we can calculate the power budget using (1). A 15% margin is also introduced to cater for the deviation between vectorless and vector based power calculation.
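The budget derivation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool flow: the power samples are synthetic, the function names `fit_line` and `power_budget` are hypothetical, the budget is expressed as a switching activity percentage, and applying the 15% margin by scaling the computed activity down is only one possible reading of the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = m * xs + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def power_budget(p_spec, m, c, margin=0.15):
    """Invert P = m * S_act + C for S_act at the power specification,
    then derate by the margin (assumed interpretation of the 15% margin)."""
    s_act = (p_spec - c) / m
    return max(0.0, s_act * (1.0 - margin))


# Synthetic vectorless results: switching activity 5%..100% in steps of 5,
# power in mW generated from an assumed line P = 4.0 * S + 20.0.
activity = list(range(5, 101, 5))
power = [4.0 * s + 20.0 for s in activity]

m, c = fit_line(activity, power)
budget = power_budget(p_spec=120.0, m=m, c=c)
print(f"m={m:.2f} mW/%, C={c:.2f} mW, budget={budget:.2f}% switching activity")
```

With these made-up numbers, a 120 mW specification maps to 25% switching activity before the margin and to roughly 21% after it. With real data the fit would come from the power analysis tool's reports rather than a generated line.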

Figure 1. Low Capture Power ATPG Flow

B. Low Power ATPG pattern generation

The calculated power budget is given to the ATPG engine to generate low power patterns. ATPG uses this value to limit the switching activity in any given capture clock cycle of each pattern, and filters out any pattern that exceeds the power budget. The ATPG tool also reports the switching activity obtained in capture mode for each pattern. To generate optimal low power ATPG patterns, it is very important to choose the correct power budget. An aggressive budget may impact the coverage and pattern count, while over-budgeting may result in false silicon failures due to excessive power consumption.

C. Power validation of ATPG patterns

To validate the absolute power consumption of ATPG vectors, the patterns are simulated using the SDF (Standard Delay Format) file and the power analysis tool is then used to calculate the absolute power. Since this step is very time consuming, it is impractical to perform it on the complete ATPG pattern set. Depending upon the design size, a few tens of ATPG patterns with the highest switching activities are selected for the validation process. Generally, we observe a very good correlation between the vectorless power and the absolute power. An example of power correlation is shown in figure 2.

Figure 2. Vectorless power correlation with absolute power

III. CAPTURE POWER REDUCTION

Current industrial low power ATPG tools utilize the clock gating cells to limit the number of flops allowed to toggle during capture mode [8].

Normally, clock gating cells are inserted during the synthesis step to reduce power during functional mode. By inserting clock gating circuitry in the chip’s design, the original clock is subdivided into many gated clocks. Also, a flop can have more than one level of clock gating. In figure 3, we show an example design with clock CLK driving 100 flip-flops. When clock gate insertion is performed, let us assume 3 clock gating cells are added, each driving 25 flops. The number of flops that can be driven by a single clock gate depends upon the design’s functionality. From figure 3, it can be seen that in order to apply a clock to a flip-flop driven directly by CLK, CG1, CG2 and CG3 can all be turned off, such that only the 25 flip-flops driven directly by CLK receive a clock pulse. Similarly, in order to apply a clock pulse to a flip-flop driven by CG3, CG1 and CG2 can be turned off, such that only the 25 flip-flops driven by CG3 and the 25 flip-flops driven directly by the clock receive a clock pulse.

In this way the low power ATPG tool enables only the clock gates required to test the faults targeted in a single pattern and disables the clock gates that do not contribute to fault testing. During pattern generation, the overall switching activity per clock cycle is kept strictly within the specified power budget. The power budget limitation directly impacts the pattern count and test coverage. A very aggressive power budget may result in a coverage drop, as the clock gating cells that must be enabled for fault detection may toggle more flops than the power budget allows. This also depends on the clock gating granularity in the design. Pattern count may also increase, as fewer faults can be targeted in a given pattern due to the restricted toggling of flops.
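The bookkeeping behind this budget check can be sketched as follows, using the figure 3 numbers (25 flops directly on CLK and 25 behind each of CG1, CG2 and CG3). The function names and the 50-flop budget are hypothetical, and a real ATPG tool works on the actual clock tree rather than on a flat table like this one.

```python
def clocked_flops(enabled_gates, flops_per_gate, ungated_flops):
    """Number of flops receiving a capture pulse for a set of enabled clock gates."""
    return ungated_flops + sum(flops_per_gate[g] for g in enabled_gates)


def gates_within_budget(required_gates, flops_per_gate, ungated_flops, budget):
    """Return True if enabling only the required gates respects the budget."""
    return clocked_flops(required_gates, flops_per_gate, ungated_flops) <= budget


# Figure 3 style example: 100 flops, 25 on CLK directly, 25 behind each gate.
flops_per_gate = {"CG1": 25, "CG2": 25, "CG3": 25}
ungated = 25

only_cg3 = clocked_flops({"CG3"}, flops_per_gate, ungated)   # CG1, CG2 off
all_off = clocked_flops(set(), flops_per_gate, ungated)      # all gates off
print(only_cg3, all_off)

# A pattern needing CG1 and CG2 together clocks 75 flops and is rejected
# under a 50-flop budget, so its faults must be split across patterns.
print(gates_within_budget({"CG1", "CG2"}, flops_per_gate, ungated, budget=50))
```

The rejected case shows why a tight budget inflates pattern count: faults behind CG1 and CG2 can no longer share a capture cycle. If a single gate drove more flops than the budget, the faults behind it could not be tested at all, which is the coverage loss tied to clock gating granularity.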

Figure 3. An example design with clock gaters

IV. SHIFT POWER REDUCTION TECHNIQUE: MT-FILL

One of the most efficient techniques to minimize shift power during test is minimum transition fill, or MT-fill [14]. This is purely a software method applied during pattern generation. In this technique, all the scan flops that are unspecified in a vector are assigned bit values in such a way as to produce minimum transitions during shift mode. The unspecified bits are referred to as don’t care bits. Similarly, bits that are specified during vector generation are termed care bits. By filling the last care bit value into the subsequent don’t care positions, we reduce shift transitions, i.e., switching activity. Thus, we can control both the average and peak shift power. The greater the percentage of don’t care bits in a vector, the more effective MT-fill is. For most designs MT-fill has proved very effective, but with the advent of scan compressed designs it has shown serious limitations.
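A minimal sketch of the fill rule described above is given below. The handling of don't care bits that precede the first care bit (they copy that first care bit) and of a cube with no care bits at all (filled with '0') are assumptions, as are the function names; the text only specifies that the last care bit is repeated into the following don't care positions.

```python
def mt_fill(cube):
    """Fill don't-care bits ('X') with the most recent care bit.
    Leading X's (before any care bit) copy the first care bit; an all-X
    cube is filled with '0'. Both boundary choices are assumptions."""
    first_care = next((b for b in cube if b != "X"), "0")
    filled, last = [], first_care
    for bit in cube:
        if bit != "X":
            last = bit
        filled.append(last)
    return "".join(filled)


def transitions(vector):
    """Count adjacent-bit transitions in a fully specified vector."""
    return sum(a != b for a, b in zip(vector, vector[1:]))


cube = "XX1XXX0XX1X"
filled = mt_fill(cube)
print(filled, transitions(filled))
```

For this cube the three care bits force two transitions, and the fill adds none. The fewer care bits a cube has, the longer the constant runs become, which is why the technique loses its effect when most bits are specified.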

A. Ineffectiveness of MT-FILL in Compression Architecture

Scan compression architectures reduce the effective scan chain length for every pattern and provide the necessary compression in both test data volume and test application time. Let us consider the example of a combinational scan compression architecture shown in figure 4. Using a decompressor block, the architecture is able to feed a large number of internal scan chains from a few external test inputs. The compressor block at the unload end compacts the unload test data from the internal scan chains to a few external test output pins [13].

Figure 4. Combinational scan compression architecture

Since each input bit is broadcast to multiple chains, the number of care bits pushed into the scan chains increases multifold, leaving very little room for techniques like MT-fill to be effective. Figures 5 and 6 compare the care-bit profiles of normal scan and scan compression architectures for a 30K scan cell design. It is evident that the average number of care bits in normal scan is less than 1%. The remaining 99% of bits can take any value, which gives MT-fill the degrees of freedom it needs. However, for the scan compressed design, the assigned bits per vector can be as high as 90% by virtue of the scan decompression logic, rendering MT-Fill ineffective unless specific measures are taken.

Figure 5. Care-bits distribution for normal/internal scan design

Figure 6. Care-bits distribution for scan compressed design

In the next section, we propose a scan chain restructuring technique to enhance the effectiveness of MT-fill.

[Figures 5 and 6 axes: Patterns (x), Care bits (%) (y)]

[Figure 4 labels: Scan Input Channels, De-Compressor, Internal Scan Chains, Compressor, Scan Output Channels]

[Figure 3 labels: E1, E2, E3, CG1, CG2, CG3]

V. SHIFT SWITCHING REDUCTION BY RESTRUCTURING SCAN CHAINS

Before discussing the restructuring technique, let us lay down some terminology in the context of shift switching activity. Power consumed in shift mode depends directly on the number of toggles happening in the scan chain. We can estimate the number of scan cell transitions in a given cycle as the product of the total number of cells and their transition probability. This is expressed in (2).

$$N_T = S \cdot \bar{T} \qquad (2)$$

where $N_T$ is the estimated number of transitions, $S$ is the total number of scan cells in the design and $\bar{T}$ is the average transition probability of the cells. Further, let us assume that the static probabilities of the jth scan cell being loaded with logic ‘1’ and logic ‘0’ in a given vector are $P_{j,1}$ and $P_{j,0}$ respectively. For example, if $P_{j,1} = 0.99$ and there are 1000 vectors in total, the jth scan cell will be loaded with ‘1’ for 990 vectors and with ‘0’ for the remaining 10 vectors only. Taking cell $j-1$ as the cell that feeds cell $j$ during shift (the scan input for $j = 1$), the probability of the jth cell in the scan chain making a transition from logic ‘0’ to logic ‘1’ can be defined as

$$T_j^{0 \to 1} = P_{j,0} \cdot P_{j-1,1} \qquad (3)$$

Similarly,

$$T_j^{1 \to 0} = P_{j,1} \cdot P_{j-1,0} \qquad (4)$$

Combining (3) and (4) we get

$$T_j = P_{j,0} \cdot P_{j-1,1} + P_{j,1} \cdot P_{j-1,0} \qquad (5)$$

Thus the power can be rewritten in the form

$$N_T = \sum_{j=1}^{S} \left( P_{j,0} \cdot P_{j-1,1} + P_{j,1} \cdot P_{j-1,0} \right) \qquad (6)$$

Equation (6) shows that the static probabilities of the cells and the order in which they appear in a scan chain govern the shift mode power consumption. For example, consider the five scan cells shown in figure 7 and let us assume the vector to be loaded is XX101. An ‘X’ signifies a don’t care bit, which can assume any value according to MT-Fill. The scan chain marked as unordered, i.e., connected as shown in figure 7, will observe at least two transitions, with the ‘X’s filled as ‘1’. However, if the scan chain is reordered so that cells S3 and S4 are swapped, the scan chain will observe only one transition, with the ‘X’s filled as ‘0’. Also, if over all the scan vectors the static probabilities of S3 and S5 being ‘1’ and of S4 being ‘0’ are very high, then the ordered scan chain is highly likely to produce minimal transitions.
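The XX101 example can be checked by brute force. The sketch below tries every filling of the don't care bits and reports the minimum number of adjacent-bit transitions for the unordered chain and for the chain with S3 and S4 swapped. The helper names are hypothetical, and exhaustive enumeration is used only because the example is tiny; it is not how an ATPG tool fills vectors.

```python
from itertools import product


def min_transitions(cube):
    """Minimum adjacent-bit transitions over all fillings of the X bits."""
    x_positions = [i for i, b in enumerate(cube) if b == "X"]
    best = None
    for fill in product("01", repeat=len(x_positions)):
        bits = list(cube)
        for pos, val in zip(x_positions, fill):
            bits[pos] = val
        count = sum(a != b for a, b in zip(bits, bits[1:]))
        if best is None or count < best[0]:
            best = (count, "".join(bits))
    return best


def reorder(cube, order):
    """Values seen along the chain when cells are stitched in `order`."""
    return "".join(cube[i] for i in order)


cube = "XX101"                   # values for S1..S5
unordered = min_transitions(cube)
ordered = min_transitions(reorder(cube, [0, 1, 3, 2, 4]))  # swap S3 and S4
print(unordered, ordered)
```

The unordered chain cannot do better than two transitions (filling 11101), while the swapped chain reaches one (filling 00011), matching the counts given in the text.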

Figure 7. Impact of static probability of scan cells

The proposed scan chain restructuring technique is based on the simple principle of stitching scan cells with identical and high static probabilities in the same chain. The higher the probability that the cells in a chain attain the same value, say logic ‘0’, the greater the reduction in shift switching activity.
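The effect of grouping can be illustrated numerically. The sketch below assumes that adjacent cells are statistically independent, so two neighbours with probabilities a and b of holding '1' differ with probability a(1 - b) + (1 - a)b. The cell probabilities and the function name are made up for illustration.

```python
def expected_transitions(p_one):
    """Expected adjacent-cell differences along a chain, given each cell's
    probability of holding '1' and assuming cells are independent."""
    return sum(a * (1 - b) + (1 - a) * b for a, b in zip(p_one, p_one[1:]))


# Hypothetical cells: four with P(1) = 0.95 and four with P(1) = 0.05.
static1 = [0.95] * 4
static0 = [0.05] * 4

interleaved = [p for pair in zip(static1, static0) for p in pair]
grouped = static1 + static0

print(round(expected_transitions(interleaved), 3))
print(round(expected_transitions(grouped), 3))
```

Interleaving the two kinds of cells gives roughly 6.3 expected differences across the seven neighbouring pairs, while grouping them gives about 1.5, most of which comes from the single boundary between the two groups. Placing the two groups in separate chains removes that boundary as well.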

We now explain the principle behind the selection of scan cells having high Pj,1 or Pj,0. As discussed in section III, when low capture power ATPG is enabled, all the flops controlling the enable pins of clock gates need to be deterministically specified, i.e., they either enable or disable the clock while respecting the power budget. A clock gate is enabled only in the few vectors in which faults belonging to its logic are targeted, and is disabled for the rest of the vectors. Hence, the scan cells driving the enable pins of clock gating cells (CGCs) have either a high Pj,1 or a high Pj,0. Figure 8 shows an example of a clock gating structure in a design. The scan flops S1 and S2 drive the enable of clock gate CG during capture mode, when SE is low. For all the vectors, except those in which faults in the logic cone of gClk are targeted, S1 and S2 are loaded with ‘1’ and ‘0’ respectively to disable the clock gating cell CG. S1 therefore has a high Pj,1 and S2 a high Pj,0.

Figure 8. Control of CGC enable pins

In the proposed technique, selected scan cells with high Pj,1 are stitched in a single chain; let us call these static-1 cells. Similarly, cells with high Pj,0 are stitched together to form a chain of static-0 cells.

Figure 9 describes the scan re-ordering flow. The flow can be divided into 3 parts: (A) identification of static-1 and static-0 cells, (B) chain stitching in scan compressed designs, and (C) generation of low power vectors.

A. Identification of static-1 and static-0 scan cells

The identification of static-1 and static-0 cells is a two step process. First, all the clock gates present in the path of the test clocks of the design are located. Second, a custom script forces a value of ‘0’ on the enable pin of every clock gate in the design and justifies it by assigning values to the driving flops. This process identifies not only the scan cells driving each clock gate’s enable pin but also the polarity each needs to disable the clock gate. On the basis of their polarity, the cells are segregated into static-1 and static-0 cells.
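The polarity extraction can be illustrated on a toy enable cone. The sketch below is not the authors' script: it enumerates all driver assignments instead of running ATPG justification, it classifies a driver only when every disabling assignment agrees on its value, and the enable function EN = (not S1) or S2 is a hypothetical cone chosen to match the S1 = 1, S2 = 0 disabling values quoted for figure 8.

```python
from itertools import product


def classify_enable_drivers(enable_fn, drivers):
    """Find driver values needed to force the enable to 0.
    A driver is 'static-1' ('static-0') if it is 1 (0) in every
    assignment that disables the gate; otherwise it is left unclassified."""
    disabling = [
        dict(zip(drivers, values))
        for values in product((0, 1), repeat=len(drivers))
        if enable_fn(**dict(zip(drivers, values))) == 0
    ]
    result = {}
    for d in drivers:
        seen = {a[d] for a in disabling}
        if seen == {1}:
            result[d] = "static-1"
        elif seen == {0}:
            result[d] = "static-0"
    return result


# Hypothetical enable cone in the spirit of Figure 8: EN = (not S1) or S2,
# so the gate is disabled only when S1 = 1 and S2 = 0.
def enable(S1, S2):
    return int((not S1) or S2)


print(classify_enable_drivers(enable, ["S1", "S2"]))
```

For an AND-type cone several assignments disable the gate, and this conservative rule leaves the drivers unclassified; a justification engine would instead pick one of the assignments.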

B. Chain stitching in scan compression designs

We then stitch static-1 cells and static-0 cells in separate scan segments. The switching activities of these segments are expected to be very low. The remaining scan cells, which are neither static-1 nor static-0, are termed random cells and are stitched in a non-specific scan order. It has been observed for many designs that approximately 5% of the total flops make up the static-1 and static-0 cells. Next, the required de-compressor and compressor structures are connected to the interface of the scan segments. Figure 10 shows an example of scan chain order with scan compression.

[Figure 8 labels: D/Q flops S1, S2, S3, S4; SE; CG; EN; Clk; gClk; To other logic]

[Figure 7 labels: cells S1 S2 S3 S4 S5 between Scan In and Scan Out; unordered load X X 1 0 1; ordered load X X 0 1 1]

In scan compressed designs, it is important to connect the chains with static-0 and static-1 cells to the de-compressor structure carefully, as they might significantly reduce the degrees of freedom for the care bits in a vector. This is because, owing to the structure of the de-compressor block, some scan segments tend to depend on each other while others are independent [13]. Since the static-1 and static-0 chains make up a major portion of the total care bits in a vector, they should be kept in segments that are maximally independent of each other for all modes of scan compression. This way ATPG is able to generate the most efficient test vectors.

Figure 9. Flow for low power test

Figure 10. Sample scan reorder in Scan compression design

C. Generation of low power test vectors

Apart from constraining the ATPG to respect the power budget during capture and using MT-Fill for don’t care bits, another restriction is imposed during pattern generation. As one can see in figures 5 and 6, the care bit profiles are highly non-uniform. This makes the MT-Fill technique less effective for the initial vectors and more effective later on. We constrain the tool to keep the number of care bits within a specified care bit limit. This makes the care bit profile very uniform. Figure 11 shows the care bit profile for a 30K scan cell design with a care bit limit of 25% imposed. From figure 11, it is clear that for most of the vectors approximately 75% of the bits are don’t care bits; since MT-Fill relies completely upon the number of don’t care bits in a vector, the effectiveness of shift switching reduction is uniform too.
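One way to picture the care bit limit is as a constraint on test cube merging. The sketch below is a simplified greedy compaction under that constraint, not the behaviour of any particular ATPG tool; the cubes, the function names and the first-fit merging policy are all assumptions.

```python
def compatible(a, b):
    """Two cubes are compatible if no position has conflicting care bits."""
    return all(x == y or "X" in (x, y) for x, y in zip(a, b))


def merge(a, b):
    """Merge two compatible cubes, keeping every care bit."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def care_fraction(cube):
    return sum(bit != "X" for bit in cube) / len(cube)


def compact(cubes, care_limit):
    """Greedy compaction: merge a cube into the first pattern that stays
    compatible and within the care-bit limit, else start a new pattern."""
    patterns = []
    for cube in cubes:
        for i, pat in enumerate(patterns):
            if compatible(pat, cube) and care_fraction(merge(pat, cube)) <= care_limit:
                patterns[i] = merge(pat, cube)
                break
        else:
            patterns.append(cube)
    return patterns


cubes = ["1XXXXXXX", "X0XXXXXX", "XX1XXXXX", "XXX0XXXX"]
print(compact(cubes, care_limit=1.0))   # no limit: one dense pattern
print(compact(cubes, care_limit=0.25))  # 25% limit: more, sparser patterns
```

Without a limit the four cubes collapse into one pattern with 50% care bits; with the 25% limit they become two patterns, each leaving 75% of the bits free for MT-fill. This is the same trade of pattern count for fill freedom that the text describes.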

Figure 11. Care bits assigned per pattern

VI. RESULTS

The proposed scan restructuring technique is based on the principle that scan cells driving a clock gate’s enable have a high static probability of being either ‘1’ or ‘0’. To confirm this, low power ATPG vectors were generated with a 10% capture budget for a design whose scan chains were reordered using the proposed technique. Figure 12 shows the percentage of patterns in which a static-1 cell is loaded with a ‘1’ and a static-0 cell is loaded with a ‘0’. As shown, a good correlation of 90.2% was observed.

Figure 12. Correlation of static-0 and static-1s in generated patterns

Table I shows the quality of results measured for 2 industrial designs. Chip A contains 15K scan flops with 10X scan compression and chip B contains 47K scan flops with 20X scan compression. Results are obtained for 3 scenarios: standard ATPG with no capture or shift power reduction method, standard ATPG with MT-fill to reduce shift power, and the proposed test generation methodology. As shown in the table, with the proposed technique the average shift switching is reduced further by 52.46% for design A and 64.62% for design B over the existing MT-Fill technique without scan restructuring. The peak shift switching also reduces by 20.79% for design A and by 25.22% for design B. The test coverage impact is negligible, and up to 19.2% pattern inflation is seen owing to the restricted power budget.

[Figure 12 axes: Reordered scan cells (x), % Correlation (y)]

[Figure 11 axes: Patterns (x), Care bits (%) (y)]

[Figure 10 labels: Decompressor; Compressor; reordered chain with static-1 cells (loaded 1 1 1 1 1); reordered chain with static-0 cells (loaded 0 0 0 0 0); unordered chains (loaded X)]

[Figure 9 flow: Synthesized Design; Identify static-1 and static-0 cells; Scan chain ordering and scan insertion; Connect scan segments to de-compressor and compressor blocks; ATPG with constraints (Capture Budget, MT-Fill, Care Bit Limit); Low Power Patterns. Stages: Identification of static-1 and static-0 cells; Chain stitching in scan compression designs; Generation of low power test vectors]

TABLE I. QOR FOR STANDARD AND PROPOSED TECHNIQUES

Chip | Quality Parameter | Standard ATPG | Standard ATPG with MT-FILL | Proposed Flow
--- | --- | --- | --- | ---
A (15K Scan Cells) | Test Coverage | 86.86% | 86.86% | 85.65%
A | Pattern Count | 957 | 971 | 1123
A | Avg Capture SA | 33.1% | 32.9% | 7.7%
A | Peak Capture SA | 35.7% | 35.4% | 11.0%
A | Avg Shift SA | 48.2% | 34.5% | 16.4%
A | Peak Shift SA | 55.4% | 45.2% | 35.8%
A | Pattern Inflation | - | 1.5% | 17.3%
B (47K Scan Cells) | Test Coverage | 95.83% | 95.82% | 96.29%
B | Pattern Count | 37689 | 38016 | 44931
B | Avg Capture SA | 29.6% | 28.9% | 11.4%
B | Peak Capture SA | 38.0% | 37.7% | 15.0%
B | Avg Shift SA | 45.7% | 34.2% | 12.1%
B | Peak Shift SA | 52.3% | 44.8% | 33.5%
B | Pattern Inflation | - | 0.9% | 19.2%
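The average shift reductions and pattern inflation figures quoted for Table I can be reproduced from the table entries. The short sketch below assumes, as the numbers suggest, that the shift reduction is measured against the MT-fill column and the pattern inflation against the standard ATPG column; the helper names are illustrative.

```python
def reduction(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


def inflation(base, new):
    """Relative pattern count increase in percent."""
    return 100.0 * (new - base) / base


# (MT-fill only, proposed flow) average shift SA values from Table I.
avg_shift = {"A": (34.5, 16.4), "B": (34.2, 12.1)}
# (standard ATPG, proposed flow) pattern counts from Table I.
patterns = {"A": (957, 1123), "B": (37689, 44931)}

for chip in ("A", "B"):
    print(chip,
          f"avg shift SA reduction {reduction(*avg_shift[chip]):.2f}%",
          f"pattern inflation {inflation(*patterns[chip]):.1f}%")
```

Under these assumptions the computed values agree with the 52.46% and 64.62% shift reductions and the 17.3% and 19.2% pattern inflation reported in the text.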

An analysis to measure the absolute shift and capture power reduction with the proposed technique was performed and the results are tabulated in table II. Chips A and B are in 32 nm technology with a shift frequency of 100 MHz. The capture frequencies are 400 MHz and 333 MHz for chips A and B respectively. Table II clearly shows that with no capture power reduction technique, vectors consume 2.3 to 3.3 times the functional power. For shift mode, we obtain a dynamic power reduction of up to 55.5% over standard MT-Fill. Even though the absolute shift power of vectors generated with the standard MT-fill technique is not alarming, they may still be at risk of peak power issues owing to high switching activity.

Since the proposed technique reorders less than 5% of the scan cells, the increase in routing congestion is expected to be within acceptable limits. To see the impact on routing congestion, a place and route flow using industrial tools was performed on the two chips, A and B, with standard and scan re-ordered netlists. An increase in total wire length of 5.44% and 7.62% is observed for designs A and B respectively.

VII. CONCLUSION

This paper describes the overall low power methodology adopted at ST. It also proposes a new method of shift power reduction that re-orders less than 5% of the scan flops. By combining the MT-fill technique with scan reordering of the flops that drive the clock gating cells and have a high static probability of ‘1’ or ‘0’, a large reduction in shift mode switching activity is obtained for scan compressed designs. The increase in routing congestion is observed to be too small to impact the overall routability of the design. Results obtained on industrial designs show an average reduction of 58.44% in average switching activity during shift with an average routing overhead of 6.53%.

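TABLE II. COMPARISON OF ABSOLUTE POWER CONSUMPTION

Chip | Quality Parameter | Standard ATPG | Standard ATPG with MT-FILL | Proposed Flow
--- | --- | --- | --- | ---
A | Func. Power: 120 mW | - | - | -
A | Dyn. Capture Power (mW) | 394 | 374 | 112
A | Dyn. Capture Power: Impact | 3.3X | 3.1X | 0.93X
A | Dyn. Shift Power (mW) | 163 | 131 | 75
A | Dyn. Shift Power: Reduction | - | 19.6% | 42.7%
B | Func. Power: 430 mW | - | - | -
B | Dyn. Capture Power (mW) | 1033 | 1006 | 418
B | Dyn. Capture Power: Impact | 2.4X | 2.3X | 0.97X
B | Dyn. Shift Power (mW) | 457 | 371 | 165
B | Dyn. Shift Power: Reduction | - | 18.8% | 55.5%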
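The derived rows of Table II can be recomputed from the absolute power values. In this sketch the "Impact" row is taken as capture power divided by functional power, the MT-fill "Reduction" is taken relative to standard ATPG, and the proposed flow's "Reduction" is taken relative to MT-fill. These baselines are inferred from the numbers, not stated in the table itself.

```python
func_power = {"A": 120.0, "B": 430.0}             # mW
capture = {"A": (394.0, 374.0, 112.0),            # standard, MT-fill, proposed
           "B": (1033.0, 1006.0, 418.0)}
shift = {"A": (163.0, 131.0, 75.0),
         "B": (457.0, 371.0, 165.0)}

for chip in ("A", "B"):
    ratios = [p / func_power[chip] for p in capture[chip]]
    std, mt, prop = shift[chip]
    mt_vs_std = 100.0 * (std - mt) / std
    prop_vs_mt = 100.0 * (mt - prop) / mt
    print(chip, [f"{r:.2f}X" for r in ratios],
          f"{mt_vs_std:.1f}%", f"{prop_vs_mt:.1f}%")
```

With these baselines the recomputed ratios and percentages match the tabulated ones to the printed precision, including capture power below functional power (0.93X and 0.97X) for the proposed flow.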

REFERENCES

[1] A. Crouch, “Design-for-Test for Digital IC's and Embedded Core Systems”, Prentice Hall, 1999.

[2] M.L. Bushnell and V.D. Agrawal, “Essentials of Electronic Testing”, Kluwer Academic Publishers, Boston, 2000.

[3] Seongmoon Wang; Gupta, S.K., "DS-LFSR: a new BIST TPG for low heat dissipation," Test Conference, 1997. Proceedings., International, pp. 848-857, 1-6 Nov. 1997.

[4] Ajami, A.H.; Banerjee, K.; Mehrotra, A.; Pedram, M., "Analysis of IR-drop scaling with implications for deep submicron P/G network designs," Quality Electronic Design, 2003. Proceedings. Fourth International Symposium on, pp. 35-40, 24-26 March 2003.

[5] Zorian, Y., "A distributed BIST control scheme for complex VLSI devices," VLSI Test Symposium, 1993. Digest of Papers., Eleventh Annual 1993 IEEE, pp. 4-9, 6-8 Apr. 1993.

[6] Chou, R.M.; Saluja, K.K.; Agrawal, V.D., "Power constraint scheduling of tests," VLSI Design, 1994., Proceedings of the Seventh International Conference on, pp. 271-274, 5-8 Jan. 1994.

[7] Bonhomme, Y.; Girard, P.; Guiller, L.; Landrault, C.; Pravossoudovitch, S., "A gated clock scheme for low power scan testing of logic ICs or embedded cores," Test Symposium, 2001. Proceedings. 10th Asian, pp. 253-258, 2001.

[8] Chakravadhanula, K.; Chickermane, V.; Keller, B.; Gallagher, P.; Narang, P., "Capture power reduction using clock gating aware test generation," Test Conference, 2009. ITC 2009. International, pp. 1-9, 1-6 Nov. 2009.

[9] Badereddine, N.; Girard, P.; Pravossoudovitch, S.; Landrault, C.; Virazel, A.; Wunderlich, H.-J., "Structural-Based Power-Aware Assignment of Don't Cares for Peak Power Reduction during Scan Testing," Very Large Scale Integration, 2006 IFIP International Conference on, pp. 403-408, 16-18 Oct. 2006.

[10] Santiago Remersaro; Xijiang Lin; Zhuo Zhang; Sudhakar M. Reddy; Irith Pomeranz; Janusz Rajski, "Preferred Fill: A Scalable Method to Reduce Capture Power for Scan Based Designs," Test Conference, 2006. ITC '06. IEEE International, pp. 1-10, Oct. 2006.

[11] Girard, P.; Guiller, L.; Landrault, C.; Pravossoudovitch, S., "A test vector ordering technique for switching activity reduction during test operation," VLSI, 1999. Proceedings. Ninth Great Lakes Symposium on, pp. 24-27, 4-6 Mar. 1999.

[12] Tseng, W.-D., "Scan chain ordering technique for switching activity reduction during scan test," Computers and Digital Techniques, IEE Proceedings, vol. 152, no. 5, pp. 609-617, 9 Sept. 2005.

[13] Wohl, P. et al., "Minimizing the Impact of Scan Compression," VLSI Test Symposium, 2007. 25th IEEE, pp. 67-74, 6-10 May 2007.

[14] Sankaralingam, R.; Oruganti, R.R.; Touba, N.A., "Static compaction techniques to control scan vector power dissipation," VLSI Test Symposium, 2000. Proceedings. 18th IEEE, pp. 35-40, 2000.
