Power Aware Shift and Capture ATPG methodology for Low Power Designs
Shray Khullar, Swapnil Bahl
Technology Research and Development STMicroelectronics Noida, India
[email protected] , [email protected]
Abstract - Power management has emerged as a major design objective, both in functional and test mode, in most of the application domains that employ digital ICs. This paper presents a low power ATPG methodology for managing power both in shift and capture mode. The technique exploits the embedded clock gates and provides a good tradeoff between pattern count and reduction in switching activity without any significant coverage loss. The methodology also presents a novel method of selective scan chain reordering for scan compressed designs to reduce shift switching activity with minimal design flow constraints.
Keywords – capture power, shift power, scan reordering, low power
I. INTRODUCTION
Power consumption is increasing with SOC complexity and rising design performance. Today it is an issue in both the functional and manufacturing test domains. In many designs, power consumption during test can be significantly higher than during normal functional mode [1,2]. One reason is that test patterns are designed to switch as many nodes as possible, while a typical functional mode activates only a few modules at a time. Another reason is that successive functional input vectors applied to a given circuit during system mode have a significant correlation, while the correlation between consecutive test patterns can be very low [3].
In general, test power problems can be classified into two categories: average power and peak power issues. Increased average power raises the temperature of the circuit-under-test (CUT). This can cause hot spots during a test session, which may lead to permanent damage of the CUT. Other thermal effects, e.g., hot-carrier-induced defects, electromigration, or dielectric breakdown, are accelerated gradually and may affect performance. Peak power is the highest value of power at any given instant and can cause false failures during test if it surpasses the thermal and electrical limits of the chip. All of the above raise circuit reliability concerns, damage parts during test, or prevent good parts from passing test, leading to yield loss [4].
In integrated circuits (ICs), there are different design strategies to reduce power consumption during functional mode. Clock gating, multi-supply voltage (MSV) and power shut-off (PSO) techniques are now widely used for power management. For test mode, a number of viable solutions have recently been proposed to cope with power problems. Test scheduling algorithms described in [5,6] try to determine the blocks of a complex design to be activated in parallel at each stage of the test session in order to reduce the number of concurrently tested modules. The average test power is reduced, but the test time increases. Some recent papers have considered the use of clock gating for capture power reduction [7,8].
To reduce power in shift mode, various X-fill techniques have been proposed in the literature [9-10]. Scan chain and pattern re-ordering techniques to reduce the switching activity are presented in [11,12]. These techniques modify either the order in which test patterns of a given test sequence are applied to the CUT or the order in which the scan flops are chained to form the scan chain.
In this paper we propose a complete low power test vector generation methodology. The rest of the paper is organized as follows. Section II describes the low power ATPG flow. In section III, a capture power reduction technique based on embedded clock gates is discussed. Section IV describes the MT-fill technique and its limitations in scan compressed designs. In section V, a novel method of restructuring scan chains is presented to reduce shift switching activity by reordering a minimal number of scan cells. Results are presented in section VI, and finally, section VII draws the conclusion.
II. LOW POWER ATPG FLOW
The preferred strategy to test the chip is to keep the complete chip in full power ON mode. This helps in optimizing the overall test time and pattern count. Additionally, we limit the switching activity of the design during the pattern generation process such that test power consumption stays within the chip power specification. This strategy is applicable to most designs, but in a few designs it may not work. For such power critical designs, we selectively switch ON a few power domains at a given time and test the complete chip in a sequential manner.
The power reduction in shift mode is based on utilizing the MT-fill technique along with limiting the care bits in a given pattern. These are discussed in sections IV and V. Capture power reduction is based on limiting the number of flops clocked by controlling the embedded clock gates. As one technique affects the other, enabling both reduction techniques simultaneously may affect the overall effectiveness. When both techniques are enabled, we give priority to capture mode power reduction and, for shift mode, limit the care bits per pattern to a maximum of 25%. The overall flow is described in figure 1. The flow can be seen in three parts: (A) Power budget calculation, (B) Low power ATPG pattern generation and (C) Power validation of ATPG patterns.
2011 Asian Test Symposium
1081-7735/11 $26.00 © 2011 IEEE
DOI 10.1109/ATS.2011.65
A. Power budget calculation
Power consumption mainly depends upon frequency, supply voltage and switching activity. Since the capture frequency and voltage cannot be reduced, we rely on reducing the switching activity, i.e. the power budget, to reduce power in test mode. In this paper, power budget is defined as the maximum number of registers allowed to toggle in any given clock cycle. To calculate the power budget for capture mode, a vectorless analysis is performed in a power analysis tool. The design is read into the tool and capture mode constraints are applied. Power consumption is calculated for a range of switching activity values applied on all design nets, from 5% to 100% in steps of 5%. The power values obtained tend to follow a linear relationship with switching activity. By fitting a regression line, the relationship between power consumption and switching activity is translated into an equation of the form
$P = m \times S_{act} + C$ (1)
where
P = power consumption
m = slope of the line
Sact = switching activity
C = constant
Given the chip’s functional power specification, we can calculate the power budget using (1). A 15% margin is also introduced to cater for the deviation between vectorless and vector-based power calculation.
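The budget calculation can be sketched in a few lines of Python. This is a minimal illustration, not the tool flow used in the paper: the function names and the sample power values are hypothetical, and the way the 15% margin is applied (derating the power specification before inverting (1)) is an assumption, since the text does not say where the margin enters.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = m * xs + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def power_budget(p_spec, m, c, margin=0.15):
    """Invert P = m * Sact + C for Sact at a derated power specification.

    Assumption: the margin is applied by reducing the allowed power by
    `margin` before solving for the switching activity.
    """
    return (p_spec * (1.0 - margin) - c) / m


# Hypothetical vectorless results: switching activity 5%..100% in steps of 5,
# with power (mW) following an exactly linear trend for illustration.
activity = list(range(5, 101, 5))
power_mw = [4.0 * s + 20.0 for s in activity]

m, c = fit_line(activity, power_mw)
budget = power_budget(120.0, m, c)  # allowed switching activity (%) for a 120 mW spec
```

The result is a switching activity percentage; multiplying it by the register count would give the budget as a number of registers, which is how the paper defines it.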
Figure 1. Low Capture Power ATPG Flow
B. Low Power ATPG pattern generation
The calculated power budget is given to the ATPG engine to generate low power patterns. ATPG uses this value to limit the switching activity in any given capture clock cycle for each pattern. It filters out any pattern which exceeds the power budget. The ATPG tool also reports the switching activity obtained for capture mode in each pattern. To generate optimal low power ATPG patterns, it is very important to choose the correct power budget. An aggressive budget may impact the coverage and pattern count, while over budgeting may result in false silicon failures due to excessive power consumption.
C. Power validation of ATPG patterns
To validate the absolute power consumption of ATPG vectors, the patterns are simulated using the SDF (Standard Delay Format) file and then the power analysis tool is used to calculate the absolute power. Since this step is very time consuming, it is impractical to perform it on the complete ATPG pattern set. Depending upon the design size, a few tens of ATPG patterns with the highest switching activities are selected and undergo the pattern validation process. Generally, we observe a very good correlation between the vectorless power and the absolute power. An example of power correlation is shown in figure 2.
Figure 2. Vectorless power correlation with absolute power
III. CAPTURE POWER REDUCTION
Current industrial low power ATPG tools utilize clock gating cells to limit the number of flops allowed to toggle during capture mode [8].
Normally, clock gating cells are inserted during the synthesis step to reduce power during functional mode. By inserting clock gating circuitry in the chip’s design, the original clock is subdivided into many gated clocks. A flop can also sit behind more than one level of clock gating. In figure 3, we show an example design with clock CLK driving 100 flip-flops. When clock gate insertion is performed, let us assume 3 clock gating cells are added, each driving 25 flops, so that the remaining 25 flops are driven directly by CLK. The number of flops that can be driven by a single clock gate depends upon the design’s functionality. From figure 3, it can be seen that in order to apply a clock to a flip-flop driven directly by CLK, CG1, CG2 and CG3 can all be turned off, such that only the 25 flip-flops driven directly by CLK receive a clock pulse. Similarly, in order to apply a clock pulse to a flip-flop driven by CG3, CG1 and CG2 can be turned off, such that only the 25 flip-flops driven by CG3 and the 25 flip-flops driven directly by the clock receive a clock pulse.
This way the low power ATPG tool enables only the clock gates required to test the faults targeted in a single pattern and disables the clock gates that do not contribute to fault testing. During pattern generation, the overall switching activity per clock cycle is kept strictly within the specified power budget. The power budget limitation directly impacts the pattern count and test coverage. A very aggressive power budget may result in a coverage drop, as the clock gating cells that must be enabled for fault detection may toggle more flops than the power budget allows. This also depends on the clock gating granularity in the design. Pattern count may also increase, as fewer faults can be targeted in a given pattern due to the restricted toggling of flops.
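The interaction between enabled clock gates and the power budget can be sketched as follows. The flop counts follow the figure 3 example; the budget value and the function names are hypothetical, and the model simply counts clocked flops as an upper bound on toggling flops rather than reproducing what an ATPG tool computes internally.

```python
# Flop counts taken from the figure 3 example: 100 flops in total,
# 25 clocked directly by CLK and 25 behind each of CG1, CG2 and CG3.
DIRECT_FLOPS = 25
GATED_FLOPS = {"CG1": 25, "CG2": 25, "CG3": 25}


def flops_clocked(enabled_gates):
    """Number of flops that receive a capture pulse for a set of enabled gates."""
    return DIRECT_FLOPS + sum(GATED_FLOPS[g] for g in enabled_gates)


def fits_budget(enabled_gates, budget):
    """True if the flops clocked by this pattern stay within the power budget."""
    return flops_clocked(enabled_gates) <= budget


budget = 50  # hypothetical budget: at most 50 flops may be clocked per cycle
only_clk = flops_clocked(set())      # all gates off: only the direct flops
with_cg3 = flops_clocked({"CG3"})    # direct flops plus the CG3 group
two_gates_ok = fits_budget({"CG1", "CG2"}, budget)  # 75 flops exceed the budget
```

With this budget, a fault that needs both CG1 and CG2 enabled in the same capture cycle could not be targeted, which is the coverage risk described above for aggressive budgets and coarse clock gating.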
Figure 3. An example design with clock gaters
IV. SHIFT POWER REDUCTION TECHNIQUE: MT-FILL
One of the most efficient techniques to minimize shift power during test is minimum transition fill, or MT-fill [14]. This is purely a software method applied during pattern generation. In this technique, all the scan flops that are unspecified in a vector are assigned bit values in such a way as to produce minimum transitions during shift mode. The unspecified bits are referred to as don’t care bits. Similarly, bits that are specified during vector generation are termed care bits. By filling the last care bit value into the subsequent don’t care positions, we can reduce shift transitions, i.e. switching activity. Thus, we can control both the average and peak shift power. The greater the percentage of don’t care bits in a vector, the greater the effectiveness of MT-fill. For most designs MT-fill has proved very effective, but with the advent of scan compressed designs it has shown serious limitations.
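A minimal sketch of the fill rule follows. It assumes vectors are written as strings over '0', '1' and 'X', that leading don't care bits take the value of the first care bit, and that an all-X vector is filled with '0'; these conventions and the function names are illustrative choices, not taken from [14]. Counting adjacent differing bits is used here as a simple proxy for shift switching activity.

```python
def mt_fill(vector):
    """Fill each 'X' with the most recent care bit value.

    Leading X positions take the first care bit; an all-X vector becomes
    all '0' (both conventions are assumptions for this sketch).
    """
    care_bits = [b for b in vector if b in "01"]
    if not care_bits:
        return "0" * len(vector)
    last = care_bits[0]
    filled = []
    for bit in vector:
        if bit in "01":
            last = bit
        filled.append(last)
    return "".join(filled)


def count_transitions(bits):
    """Number of adjacent positions holding different values."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


vector = "X1XXX0XX1X"              # hypothetical vector with three care bits
mt_filled = mt_fill(vector)
zero_filled = vector.replace("X", "0")
mt_count = count_transitions(mt_filled)
zero_count = count_transitions(zero_filled)
```

For this vector the MT-filled string has one transition per change in care bit value, while a plain 0-fill adds a transition on both sides of each isolated '1'.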
A. Ineffectiveness of MT-fill in Compression Architecture
Scan compression architectures reduce the effective scan chain length for every pattern and provide the necessary compression in both test data volume and test application time. Let us consider the example of a combinational scan compression architecture shown in figure 4. Using a decompressor block, the architecture is able to feed a greater number of internal scan chains while utilizing few external test inputs. The compressor block at the unload end compacts the unload test data from the greater number of internal scan chains onto a few external test output pins [13].
Figure 4. Combinational scan compression architecture
Since each input is broadcast to multiple chains, the number of care bits pushed into the scan chains increases many times over, leaving very little room for techniques like MT-fill to be effective. Figures 5 and 6 compare the care-bit profiles of normal scan and the scan compression architecture for a 30K scan cell design. It is evident that the average number of care bits in normal scan is less than 1%. The remaining 99% of the bits can take any value, which increases the degrees of freedom available to MT-fill. However, for the scan compressed design, the assigned bits per vector can be as high as 90% by virtue of the scan decompression logic, rendering MT-fill ineffective unless specific measures are taken.
Figure 5. Care-bits distribution for normal/internal scan design
Figure 6. Care-bits distribution for scan compressed design
In the next section, we propose a scan chain restructuring technique to enhance the effectiveness of MT-fill.
[Figures 5 and 6, axis labels: Patterns; Care bits (%).]
[Figure 4, block labels: Scan Input Channels; De-Compressor; Internal Scan Chains; Compressor; Scan Output Channels.]
[Figure 3, labels: enables E1, E2, E3; clock gates CG1, CG2, CG3.]
V. SHIFT SWITCHING REDUCTION BY RESTRUCTURING SCAN CHAINS
Before discussing the restructuring technique, let us lay down some terminology in the context of shift switching activity. Power consumed in shift mode depends directly on the number of toggles occurring in the scan chain. We can estimate the number of scan cell transitions in a given cycle as the product of the total number of cells and their transition probability. This is expressed in (2).
$N_{tr} = S \times \bar{P}_{tr}$ (2)

where S is the total number of scan cells in the design and $\bar{P}_{tr}$ is the average transition probability of the cells. Further, let us assume that the static probabilities of the jth scan cell being loaded with logic ‘1’ and logic ‘0’ in a given vector are $P_{j,1}$ and $P_{j,0}$ respectively. For example, if $P_{j,1} = 0.99$ and there are 1000 vectors in total, then the jth scan cell will be loaded with ‘1’ for 990 vectors and with ‘0’ for the remaining 10 vectors only. The probability of the jth cell in the scan chain making a transition from logic ‘0’ to logic ‘1’ can be defined as

$P_{j,0 \to 1} = P_{j,0} \times P_{j-1,1}$ (3)

Similarly,

$P_{j,1 \to 0} = P_{j,1} \times P_{j-1,0}$ (4)

where cell j-1 is the cell whose value shifts into cell j (the scan input for j = 1). Combining (3) and (4) we get

$P_{tr,j} = P_{j,0 \to 1} + P_{j,1 \to 0}$ (5)

Thus the power can be rewritten in the form

$Power_{shift} \propto N_{tr} = \sum_{j=1}^{S} \left( P_{j,0} P_{j-1,1} + P_{j,1} P_{j-1,0} \right)$ (6)
Equation (6) shows that the static probabilities of the cells and the order in which they appear in a scan chain govern the shift mode power consumption. For example, consider the five scan cells shown in figure 7 and let us assume the vector to be loaded is XX101. ‘X’ signifies a don’t care bit, which can assume any value according to MT-fill. The scan chain marked as unordered, i.e. connected as shown in figure 7, will observe at least two transitions, obtained with the ‘X’s filled as ‘1’. However, if the scan chain is reordered and cells S3 and S4 are swapped, the scan chain will observe only one transition with the ‘X’s filled as ‘0’. Also, if over all the scan vectors the static probabilities for S3 and S5 to be ‘1’ and for S4 to be ‘0’ are very high, then the ordered scan chain is highly likely to produce minimal transitions.
Figure 7. Impact of static probability of scan cells
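The figure 7 example can be checked with a short Python sketch. The first part enumerates every fill of the don't care bits to find the fewest transitions for the unordered and reordered chains. The second part sums adjacent-cell transition probabilities for a chain order, assuming that neighbouring cells are loaded independently; the probability values and function names are hypothetical.

```python
from itertools import product


def transitions(bits):
    """Count adjacent positions in the chain that hold different values."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


def min_transitions(vector):
    """Fewest transitions over every possible fill of the 'X' positions."""
    x_positions = [i for i, b in enumerate(vector) if b == "X"]
    best = None
    for fill in product("01", repeat=len(x_positions)):
        bits = list(vector)
        for pos, value in zip(x_positions, fill):
            bits[pos] = value
        count = transitions(bits)
        best = count if best is None else min(best, count)
    return best


def expected_transitions(p1):
    """Sum over adjacent cells of P(0 next to 1) + P(1 next to 0).

    `p1` lists, in chain order, the static probability of each cell holding '1'.
    Assumes adjacent cells are statistically independent.
    """
    return sum(a * (1 - b) + (1 - a) * b for a, b in zip(p1, p1[1:]))


unordered = min_transitions("XX101")  # cells S1..S5 as connected in figure 7
ordered = min_transitions("XX011")    # same load with S3 and S4 swapped

# Hypothetical static probabilities of holding '1' for S1..S5.
p_unordered = [0.5, 0.5, 0.99, 0.01, 0.99]
p_ordered = [0.5, 0.5, 0.01, 0.99, 0.99]  # S3 and S4 swapped
exp_unordered = expected_transitions(p_unordered)
exp_ordered = expected_transitions(p_ordered)
```

Placing the two cells that are almost always ‘1’ next to each other removes one near-certain transition from the expected count, which is the effect the reordering relies on.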
The proposed scan chain restructuring technique is based on the simple principle of stitching scan cells with identical and high static probabilities in the same chain. The higher the probability of the cells in a chain attaining a given value, say logic ‘0’, the greater the reduction in shift switching activity.
We now explain the principle behind the selection of scan cells having high Pj,1 or Pj,0. As discussed in section III, when low capture power ATPG is enabled, all the flops controlling the enable pins of clock gaters need to be deterministically specified, i.e., they either enable or disable the clock while respecting the power budget. A clock gate is enabled only in the few vectors in which faults belonging to its logic are targeted and is disabled for the rest of the vectors. Hence, the scan cells driving the enable pins of clock gating cells have either a high Pj,1 or a high Pj,0. Figure 8 shows an example of a clock gating structure in a design. The scan flops S1 and S2 drive the enable of clock gate CG during capture mode when SE is low. For all the vectors, except when faults in the logic cone of gClk are targeted, S1 and S2 are loaded with ‘1’ and ‘0’ respectively to disable the clock gating cell CG. S1 therefore has a high Pj,1 and S2 has a high Pj,0.
Figure 8. Control of CGC enable pins
In the proposed technique, selected scan cells with high Pj,1 are stitched in a single chain. Let us call these static-1 cells. Similarly, cells with high Pj,0 are stitched together to form a chain of static-0 cells.
Figure 9 describes the scan re-ordering flow. The flow can be segregated in 3 parts – (A) Identification of static-1 and static-0 cells, (B) Chain stitching in scan compressed designs, and (C) generation of low power vectors.
A. Identification of static-1 and static-0 scan cells
The identification of static-1 and static-0 cells is a two step process. Firstly, all the clock gates present in the path of the test clocks of the design are located. A custom script then forces a value of ‘0’ on the enable pin of every clock gate in the design and justifies it by assigning values to the driving flops. This process identifies not only the scan cells driving each clock gate’s enable pin but also the polarity they need to disable the clock gate. On the basis of this polarity, the cells are segregated into static-1 and static-0 cells.
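The segregation step can be sketched as below. The input format (a mapping from each clock gate to the cell values that justify a ‘0’ on its enable pin) is a hypothetical stand-in for the output of the custom script, and leaving out cells that different gates need at opposite polarities is an assumption of this sketch, not something the paper specifies.

```python
def classify_cells(disable_requirements):
    """Split enable-driving scan cells into static-1 and static-0 groups.

    `disable_requirements` maps each clock gate to {cell: value}, the values
    that justify a '0' on that gate's enable pin (hypothetical input format).
    Cells required at both polarities by different gates are left out here.
    """
    needed = {}
    for cells in disable_requirements.values():
        for cell, value in cells.items():
            needed.setdefault(cell, set()).add(value)
    static1 = sorted(c for c, vals in needed.items() if vals == {1})
    static0 = sorted(c for c, vals in needed.items() if vals == {0})
    return static1, static0


# Figure 8 style example: S1 = 1 and S2 = 0 disable clock gate CG.
# The second gate and cell S7 are invented for illustration.
requirements = {
    "CG": {"S1": 1, "S2": 0},
    "CG_other": {"S7": 0, "S1": 1},
}
static1, static0 = classify_cells(requirements)
```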
B. Chain stitching in scan compression designs
We then stitch static-1 cells and static-0 cells in separate scan segments. The switching activities for these segments are expected to be very low. The remaining scan cells, which are neither static-1 nor static-0, are termed random cells and are stitched in a non-specific scan order. It has been observed for many designs that approximately 5% of the total flops make up the static-1 and static-0 cells. Next, the required de-compressor and compressor structures are connected to the interface of the scan segments. Figure 10 shows an example of scan chain order with scan compression.
[Figure 7, content: scan cells S1 to S5 between Scan In and Scan Out; unordered load X X 1 0 1; ordered load X X 0 1 1.]
[Figure 8, labels: scan flops S1 and S2, SE, clock gate CG with enable EN, Clk, gClk, flops S2, S3 and S4, to other logic.]
In scan compressed designs, it is important to carefully connect the chains with static-0 and static-1 cells to the de-compressor structure, as they might significantly reduce the degrees of freedom for the care bits in a vector. This is because, owing to the structure of the de-compressor block, some scan segments tend to depend on each other while others are independent [13]. Since static-1 and static-0 chains make up a major portion of the total care bits in a vector, they should be kept in segments that are maximally independent of each other for all modes of scan compression. This way ATPG is able to generate the most efficient test vectors.
Figure 9. Flow for low power test
Figure 10. Sample scan reorder in Scan compression design
C. Generation of low power test vectors
Apart from constraining the ATPG to respect the power budget during capture and using MT-fill for don’t care bits, another restriction is imposed during pattern generation. As one can see from figures 5 and 6, the care bit profiles are highly non-uniform. This makes the MT-fill technique less effective for the initial vectors and more effective later on. We constrain the tool to keep the number of care bits within a specified care bit limit. This makes the care bit profile very uniform. Figure 11 shows the care bit profile for a 30K scan cell design with a care bit limit of 25% imposed. From figure 11, it is clear that the don’t care bits for most of the vectors are approximately 75%, and since MT-fill relies completely upon the number of don’t care bits in a vector, the effectiveness of shift switching reduction is uniform too.
Figure 11. Care bits assigned per pattern
VI. RESULTS
The proposed scan restructuring technique is based on the principle that scan cells driving a clock gate’s enable have a high static probability of being either ‘1’ or ‘0’. To confirm this, low power ATPG vectors were generated with a 10% capture budget for a design whose scan chains were reordered using the proposed technique. Figure 12 shows the percentage of patterns in which a static-1 cell is loaded with a ‘1’ and a static-0 cell is loaded with a ‘0’. As shown, a good correlation of 90.2% was observed.
Figure 12. Correlation of static-0 and static-1s in generated patterns
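The per-cell measurement behind figure 12 can be sketched as a simple count over the generated patterns. The pattern data, the cell indices and the function name are all hypothetical; the sketch only illustrates how the percentage for one cell would be obtained from fully specified load values.

```python
def match_percentage(patterns, cell_index, expected_bit):
    """Percentage of patterns in which the cell at cell_index holds expected_bit."""
    hits = sum(1 for p in patterns if p[cell_index] == expected_bit)
    return 100.0 * hits / len(patterns)


# Hypothetical fully specified load values for four scan cells over five patterns.
# Cell 0 is assumed to be a static-1 cell and cell 1 a static-0 cell.
patterns = ["1001", "1010", "1011", "0000", "1010"]
static1_match = match_percentage(patterns, 0, "1")  # cell 0 holds '1' in 4 of 5
static0_match = match_percentage(patterns, 1, "0")  # cell 1 holds '0' in 5 of 5
```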
Table I shows the quality of results measured for 2 industrial designs. Chip A contains 15K scan flops with 10X scan compression and chip B contains 47K scan flops with 20X scan compression. Results are obtained for 3 scenarios: standard ATPG with no capture or shift power reduction method, standard ATPG with MT-fill to reduce shift power, and finally the proposed test generation methodology. As shown in the table, with the proposed technique the average shift switching is reduced further by 52.46% for design A and 64.62% for design B over the existing MT-fill technique without scan restructuring. The peak shift switching also reduces by 20.79% for design A and by 25.22% for design B. The test coverage impact is negligible, and up to 19.2% pattern inflation is seen owing to the restricted power budget.
[Figure 9, flow: Synthesized Design; Identify static-1 and static-0 cells; Scan chain ordering and scan insertion; Connect scan segments to de-compressor and compressor blocks; ATPG with constraints (Capture Budget, MT-Fill, Care Bit Limit); Low Power Patterns.]
[Figure 10, labels: Decompressor; reordered chain with static-1 cells (all 1); reordered chain with static-0 cells (all 0); unordered chains (X); Compressor.]
[Figure 11, axis labels: Patterns; Care bits (%).]
[Figure 12, axis labels: Reordered scan cells; % Correlation.]
TABLE I. QOR FOR STANDARD AND PROPOSED TECHNIQUES
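Chip | Quality Parameter | Standard ATPG | Standard ATPG with MT-FILL | Proposed Flow
A (15K scan cells) | Test Coverage | 86.86% | 86.86% | 85.65%
A (15K scan cells) | Pattern Count | 957 | 971 | 1123
A (15K scan cells) | Avg Capture SA | 33.1% | 32.9% | 7.7%
A (15K scan cells) | Peak Capture SA | 35.7% | 35.4% | 11.0%
A (15K scan cells) | Avg Shift SA | 48.2% | 34.5% | 16.4%
A (15K scan cells) | Peak Shift SA | 55.4% | 45.2% | 35.8%
A (15K scan cells) | Pattern Inflation | - | 1.5% | 17.3%
B (47K scan cells) | Test Coverage | 95.83% | 95.82% | 96.29%
B (47K scan cells) | Pattern Count | 37689 | 38016 | 44931
B (47K scan cells) | Avg Capture SA | 29.6% | 28.9% | 11.4%
B (47K scan cells) | Peak Capture SA | 38.0% | 37.7% | 15.0%
B (47K scan cells) | Avg Shift SA | 45.7% | 34.2% | 12.1%
B (47K scan cells) | Peak Shift SA | 52.3% | 44.8% | 33.5%
B (47K scan cells) | Pattern Inflation | - | 0.9% | 19.2%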
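The shift switching reductions quoted for the proposed flow can be reproduced from the Table I values as relative changes against the MT-fill column. The helper name is hypothetical, and reading the quoted figures as relative reductions is inferred from the numbers.

```python
def relative_reduction(before, after):
    """Percentage reduction of `after` relative to `before`."""
    return 100.0 * (before - after) / before


# Shift switching activity (%) from Table I: (MT-fill, proposed flow).
avg_shift = {"A": (34.5, 16.4), "B": (34.2, 12.1)}
peak_shift = {"A": (45.2, 35.8), "B": (44.8, 33.5)}

avg_reduction = {chip: relative_reduction(*v) for chip, v in avg_shift.items()}
peak_reduction = {chip: relative_reduction(*v) for chip, v in peak_shift.items()}
```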
An analysis to measure the absolute shift and capture power reduction with the proposed technique was performed and the results are tabulated in table II. Chips A and B are implemented in 32nm technology with a shift frequency of 100 MHz. The capture frequencies are 400 MHz and 333 MHz for chips A and B respectively. Table II clearly shows that with no capture power reduction technique, vectors consume 2.3 to 3.3 times the functional power. For shift mode, we obtain a dynamic power reduction of up to 55.5% over standard MT-fill. Even though the absolute shift power for vectors generated with the standard MT-fill technique is not alarming, these vectors may still be at risk of peak power issues owing to high switching activities.
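As a worked check using the Table II values, the capture impact is the ratio of dynamic capture power to functional power, and the shift reduction for the proposed flow is taken relative to the MT-fill value. This reading of the table is inferred from the numbers rather than stated in the text.

```latex
\text{Chip A capture impact: } \frac{394\ \text{mW}}{120\ \text{mW}} \approx 3.3
\quad\text{(standard ATPG)}, \qquad
\frac{112\ \text{mW}}{120\ \text{mW}} \approx 0.93
\quad\text{(proposed flow)}

\text{Shift power reduction over MT-fill: }
\frac{131 - 75}{131} \approx 42.7\% \ \text{(chip A)}, \qquad
\frac{371 - 165}{371} \approx 55.5\% \ \text{(chip B)}
```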
Since the proposed technique reorders less than 5% of the scan cells, the increase in routing congestion is expected to be within acceptable limits. To see the impact on routing congestion, a place and route flow using industrial tools was run on the two chips, A and B, with both the standard and the scan re-ordered netlists. An increase in total wire length of 5.44% and 7.62% is observed for chips A and B respectively.
VII. CONCLUSION
This paper describes the overall low power methodology adopted at ST. It also proposes a new method of shift power reduction that re-orders less than 5% of the flops. By combining the MT-fill technique with scan reordering of the flops that drive the clock gating cells and have a high static probability of ‘1’ or ‘0’, a high reduction in shift mode switching activity is obtained for scan compressed designs. The increase in routing congestion is observed to be too small to impact the overall routability of the design. Results obtained on industrial designs show an average reduction of 58.44% in average switching activity during shift with an average routing overhead of 6.53%.
TABLE II. COMPARISON OF ABSOLUTE POWER CONSUMPTION
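Chip | Quality Parameter | Unit / Metric | Standard ATPG | Standard ATPG with MT-FILL | Proposed Flow
A | Func. Power | 120 mW | - | - | -
A | Dyn. Capture Power | mW | 394 | 374 | 112
A | Dyn. Capture Power | Impact | 3.3X | 3.1X | 0.93X
A | Dyn. Shift Power | mW | 163 | 131 | 75
A | Dyn. Shift Power | Reduction | - | 19.6% | 42.7%
B | Func. Power | 430 mW | - | - | -
B | Dyn. Capture Power | mW | 1033 | 1006 | 418
B | Dyn. Capture Power | Impact | 2.4X | 2.3X | 0.97X
B | Dyn. Shift Power | mW | 457 | 371 | 165
B | Dyn. Shift Power | Reduction | - | 18.8% | 55.5%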
REFERENCES
[1] A. Crouch, "Design-for-Test for Digital IC's and Embedded Core Systems", Prentice Hall, 1999.
[2] M.L. Bushnell and V.D. Agrawal, "Essentials of Electronic Testing", Kluwer Academic Publishers, Boston, 2000.
[3] S. Wang and S.K. Gupta, "DS-LFSR: a new BIST TPG for low heat dissipation," Proceedings of the International Test Conference, pp. 848-857, 1-6 Nov. 1997.
[4] A.H. Ajami, K. Banerjee, A. Mehrotra and M. Pedram, "Analysis of IR-drop scaling with implications for deep submicron P/G network designs," Proceedings of the Fourth International Symposium on Quality Electronic Design, pp. 35-40, 24-26 March 2003.
[5] Y. Zorian, "A distributed BIST control scheme for complex VLSI devices," Digest of Papers, Eleventh Annual IEEE VLSI Test Symposium, pp. 4-9, 6-8 Apr. 1993.
[6] R.M. Chou, K.K. Saluja and V.D. Agrawal, "Power constraint scheduling of tests," Proceedings of the Seventh International Conference on VLSI Design, pp. 271-274, 5-8 Jan. 1994.
[7] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault and S. Pravossoudovitch, "A gated clock scheme for low power scan testing of logic ICs or embedded cores," Proceedings of the 10th Asian Test Symposium, pp. 253-258, 2001.
[8] K. Chakravadhanula, V. Chickermane, B. Keller, P. Gallagher and P. Narang, "Capture power reduction using clock gating aware test generation," International Test Conference (ITC 2009), pp. 1-9, 1-6 Nov. 2009.
[9] N. Badereddine, P. Girard, S. Pravossoudovitch, C. Landrault, A. Virazel and H.-J. Wunderlich, "Structural-Based Power-Aware Assignment of Don't Cares for Peak Power Reduction during Scan Testing," IFIP International Conference on Very Large Scale Integration, pp. 403-408, 16-18 Oct. 2006.
[10] S. Remersaro, X. Lin, Z. Zhang, S.M. Reddy, I. Pomeranz and J. Rajski, "Preferred Fill: A Scalable Method to Reduce Capture Power for Scan Based Designs," IEEE International Test Conference (ITC '06), pp. 1-10, Oct. 2006.
[11] P. Girard, L. Guiller, C. Landrault and S. Pravossoudovitch, "A test vector ordering technique for switching activity reduction during test operation," Proceedings of the Ninth Great Lakes Symposium on VLSI, pp. 24-27, 4-6 Mar. 1999.
[12] W.-D. Tseng, "Scan chain ordering technique for switching activity reduction during scan test," IEE Proceedings - Computers and Digital Techniques, vol. 152, no. 5, pp. 609-617, 9 Sept. 2005.
[13] P. Wohl et al., "Minimizing the Impact of Scan Compression," 25th IEEE VLSI Test Symposium, pp. 67-74, 6-10 May 2007.
[14] R. Sankaralingam, R.R. Oruganti and N.A. Touba, "Static compaction techniques to control scan vector power dissipation," Proceedings of the 18th IEEE VLSI Test Symposium, pp. 35-40, 2000.