
185 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. 11, NO. 2, FEBRUARY 1992

Computer-Aided Modeling and Evaluation of Reconfigurable VLSI Processor Arrays with VHDL

Kuochen Wang and Sy-Yen Kuo, Member, IEEE

Abstract: In this paper, we present an integrated computer-aided design environment, the VAR (VHDL-based Array Reconfiguration) system, for the tasks of design, reconfiguration, simulation, and evaluation in an architecture modeled by VHDL. An easily diagnosable and reconfigurable two-dimensional defect-tolerant PE-switch lattice array is used as an example to illustrate the methodology of VAR. VAR allows designers to study and evaluate fault diagnosis and reconfiguration algorithms by inserting faults, generated from manufacturing yield data, into the array, locating the faulty PE's, and simulating the reconfiguration process. VAR can thus assist designers in evaluating different combinations of fault patterns, fault diagnosis algorithms, reconfiguration algorithms, and reconfigurable architectures through a complete set of figures of merit aimed at architectural improvements. Extensive simulation and evaluation have been performed to demonstrate and support the effectiveness of VAR. The results of this research can bring applications of large-area VLSI or WSI (wafer scale integration) closer to reality and lead to low-cost, high-yield array architectures.

Index Terms: VHDL, modeling, evaluation, yield enhancement, array architecture, reconfiguration.

Manuscript received January 12, 1990; revised November 21, 1990. This paper was recommended by Associate Editor J. Allen. This work was supported by the National Science Foundation under Grant MIP-89-08648.

K. Wang was with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721. He is now with the Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan.

S.-Y. Kuo was with the Department of Electrical and Computer Engineering, University of Arizona, Tucson. He is now with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan.

IEEE Log Number 9103049.

I. INTRODUCTION

The VHDL-based Array Reconfiguration (VAR) system is an integrated high-level CAD environment for the tasks of design, diagnosis, reconfiguration, simulation, and evaluation of an architecture described in a hardware description language, VHDL. VHDL, the IEEE 1076 standard, is the basis of the VAR system. It is used to model reconfigurable array architectures in a modular way, and its support environment is used to simulate and evaluate different design alternatives. One advantage of developing a VHDL-based tool is that other VHDL-based tools become available to VAR and vice versa. VHDL has been adopted as a standard hardware description language by the U.S. Department of Defense and is gaining popularity in industry and academia. We can use manufacturing yield data to generate different fault patterns and check whether a simulated array can withstand the possible defects through fault diagnosis and reconfiguration. These results, along with a quantitative evaluation of the target array, can help designers determine appropriate redundancy deployment and allocation. The modeling, simulation, and evaluation procedures are illustrated with an example two-dimensional reconfigurable systolic array. The transformation of a faulty array into a target array in a VHDL support environment is used to simulate the actual reconfiguration process. The correctness of the reconfiguration process and the functionality of the target array are verified by running a matrix multiplication application. Experimental results are included to demonstrate the evaluation process. The issues evaluated include redundancy overhead, the efficiency of reconfiguration algorithms, the effect of fault distribution, array quality, yield, and reliability.

Although extensive research has been performed on fault diagnosis and reconfiguration for array architectures [1]-[4], integrated architecture-level computer-aided design (CAD) tools that assist the design and evaluation of defect-tolerant array architectures are lacking. Many physical-level VLSI CAD tools, such as layout design automation tools, are available today, but few address architecture-level system design [5], especially for reconfigurable architectures. The design and analysis of reconfigurable VLSI array architectures are increasingly complicated tasks, especially when evaluating the impact of reconfiguration mechanisms on performance, overhead, and yield. An integrated high-level CAD tool is therefore needed to simulate and evaluate various combinations of fault patterns, fault diagnosis algorithms, reconfiguration algorithms, and reconfigurable architectures. Such an environment will enable system designers to reduce design turnaround time, pinpoint possible design problems early in the design phase, and optimize their designs through architectural improvements. It will bring applications of large-area VLSI or WSI (wafer scale integration) closer to reality and help produce low-cost, high-yield array architectures. In [6] the high-level structure of a CAD system for reconfigurable array architectures was outlined, but the authors did not address how modeling, simulation, and evaluation could actually be performed, and no experimental results were presented. The workbench in [7] addressed reconfiguration issues only. It did not consider fault diagnosis and adopted a concurrent process hardware description language (CPHDL) for reconfigurable architectures, although it did provide a translator for converting the generated architectures directly into VHDL descriptions. The CAD systems in [8]-[10] are not specifically intended for defect-tolerant arrays, while the CAD system in [11] targets the conceptual design of VLSI systems. The VAR system presented in this paper can fill this void. Kung [12] presented an array compiler for CAD of array processors that can be divided into three levels: the array level, the processor level, and the realization level. The VAR system focuses on design and analysis issues at the processor level.

0278-0070/92$03.00 © 1992 IEEE

The paper is organized as follows. An overview of the VAR system is presented in Section II. In Section III, a PE-switch lattice model and its implications are presented first, and VHDL is then used to model an example M × N defect-tolerant array. The reconfiguration process and the experimental results are detailed in Section IV. The evaluation process and simulation results are described in Section V, followed by some concluding remarks.

II. OVERVIEW OF THE VAR SYSTEM

This section overviews VAR's organization and problem-solving approach. The system configuration of VAR is shown in Fig. 1. The module for array design with VHDL is described in Section III. The functions of the other major modules are specified in the following subsections.

A. Kernel and VHDL Support Environment (K&VSE)

K&VSE supervises, schedules, and coordinates the tasks in VAR. In Fig. 1, a dashed line between K&VSE and a module indicates an interaction involving commands and responses between these two modules. The Intermetrics Standard VHDL 1076 Support Environment is the basis of VSE.

B. Fault Diagnosis Module (FDM)

In a reconfigurable system, replaceable faulty elements must first be identified before a repair action can be taken. FDM executes a fault diagnosis algorithm specific to an architecture. The fault diagnosis algorithms in [13] are used here, and their trade-offs can be compared via simulation.

C. Reconfiguration Module (RM)

RM coordinates the following three events: execution of a reconfiguration program, execution of a switching mechanism transformation program, and VHDL simulation of target array generation and functionality verification. The reconfiguration algorithms in [14] are employed. Depending on the switching mechanisms, the algorithms reconfigure an array under three constraints: (1) row and column bypass (algorithm BB), (2) row bypass and column rerouting (algorithm BR), and (3) row and column rerouting (algorithm RR). The efficiency of the algorithms and the complexity of the switching mechanism transformation can be observed through simulation.
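To make constraint (1) concrete, the following C sketch (C being the language the paper's own reconfiguration programs are written in) checks row-and-column-bypass reconfigurability by brute force and, on success, emits a U × V logic matrix as defined in Section III. It is not algorithm BB from [14]; the function name `bb_reconfigure`, the per-row bitmask encoding of faults, and the limit M, N ≤ 16 are assumptions made for illustration.

```c
#include <stdint.h>

/* 1-based host-array coordinates, as used in logic and error matrices. */
typedef struct { int r, c; } loc;

/* Number of set bits, without compiler builtins. */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1u; ++n; }
    return n;
}

/* Hypothetical brute-force row-and-column-bypass reconfiguration.
   fault_rows[r] has bit c set when host PE (r + 1, c + 1) is faulty.
   Tries every set of at most M - U deleted rows; the columns that still
   hold a fault must then be deleted too, which is allowed if there are
   at most N - V of them. On success, writes a U x V logic matrix
   (row-major, U * V entries) from the first surviving rows and columns
   and returns 1; returns 0 if no bypass pattern works. */
int bb_reconfigure(int M, int N, const uint32_t *fault_rows,
                   int U, int V, loc *logic) {
    if (M > 16 || N > 16 || U > M || V > N) return 0;
    for (uint32_t del = 0; del < (1u << M); ++del) {
        if (popcount32(del) > M - U) continue;
        uint32_t bad_cols = 0;
        for (int r = 0; r < M; ++r)
            if (!((del >> r) & 1u)) bad_cols |= fault_rows[r];
        if (popcount32(bad_cols) > N - V) continue;
        int i = 0;                                  /* next target row */
        for (int r = 0; r < M && i < U; ++r) {
            if ((del >> r) & 1u) continue;          /* row bypassed */
            int j = 0;
            for (int c = 0; c < N && j < V; ++c) {
                if ((bad_cols >> c) & 1u) continue; /* column bypassed */
                logic[i * V + j].r = r + 1;
                logic[i * V + j].c = c + 1;
                ++j;
            }
            ++i;
        }
        return 1;
    }
    return 0;
}
```

Enumerating row-deletion sets and then deleting every column that still contains a fault is exact for pure bypass, because a faulty PE in a surviving row can only be removed by bypassing its column. The cost grows as 2^M, which is acceptable only for small example arrays.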

D. Evaluation Module (EM)

EM evaluates redundancy overhead, the time complexity of fault diagnosis and reconfiguration algorithms, fault coverage, yield, delay, performance, and cost. The parameters that may affect yield are area overhead, diagnosability, reconfigurability, and the efficiency of the diagnosis and reconfiguration algorithms. The overhead arising from diagnosis and reconfiguration causes longer wire delays and a performance penalty. EM studies the correlation among these parameters. Design variations using different fault diagnosis algorithms, reconfiguration algorithms, and reconfigurable architectures are also evaluated on the basis of these parameters.

E. Optimization Module (OM)

Based on the evaluation results, OM modifies an architecture to improve yield and performance, reduce overhead, and increase diagnosis and reconfiguration efficiency. The optimized architecture must then be evaluated again by EM. This procedure may need to be iterated several times before a cost-effective solution is obtained.

F. Synthesis Module (SM)

The ultimate goal of the VAR system is to generate a complete layout of a reconfigurable array directly from its VHDL description. SM can interface with a VHDL-based synthesis tool to translate the array from a behavioral description to a structural description, and then from the structural description into a complete layout.

III. MODELING DEFECT-TOLERANT ARRAY USING VHDL

A systolic array will be used as an example to illustrate the modeling process for defect-tolerant arrays using VHDL. Descriptions of the array model, system design, PE design, and switch design and a comparison with an existing modeling method are detailed in the following subsections.

A. A PE-Switch Lattice Model and Its Implication

A general PE-switch lattice model is shown in Fig. 2 [13], [15], where a square represents a PE and a horizontal (vertical) line represents a communication link. A row or column consisting of PE's and links is a PE track; a row or column consisting of links only is a link track. A switch may exist at the intersection of a link track and a PE track; such a switch is called a cardinal switch. A switch may also be located at the intersection of a horizontal link track and a vertical link track; such a switch is called a corner switch. Both types of switches have the same implementation. A type-1 reconfigurable array contains only cardinal switches and no switches at the array boundaries. A type-2 reconfigurable array contains both cardinal and corner switches. The type-1 array uses fewer switches than the type-2 array but has less reconfiguration capability. The notation (M, N, Th, Tv, S, U, V) is used to represent and characterize such an array model, where M × N is the size of the host array, Th and Tv are the numbers of horizontal and vertical link tracks, respectively, between two adjacent PE tracks, S indicates a type-1 (S = 1) or a type-2 (S = 2) array, and U × V is the size of the target array. The array I/O ports are assumed to lie at the array boundaries: row 0, row M + 1, column 0, and column N + 1. For a reconfigurable array based on the degradation approach, U = V = 0 is set if there is no restriction on the size of the target array. Some necessary definitions and terms arising from this array model are introduced next.

Given the array model (M, N, Th, Tv, S, U, V), a non-defect-tolerant array is a U × V array with no redundant elements such as extra PE's, switches, or links; i.e., it is the original architecture for a particular application. A host array is an M × N reconfigurable PE-switch lattice array that may contain faulty elements after manufacturing. A logic array is a host array with the faulty switches and links switched out using the method in [13]. Note that a logic array may still contain faulty PE's. A target array is a U × V reconfigurable PE-switch lattice array that contains no faulty elements and is obtained by replacing faulty elements through reconfiguration. Although a non-defect-tolerant array and the corresponding target array are functionally equivalent, the performance of the latter may be worse than that of the former owing to the added switches and longer wires. A logic matrix is a U × V matrix in which the element (r, c) in row i and column j indicates that PE(i, j) of the target array is located in row r and column c of the host array. An augmented logic matrix is a (U + 2) × (V + 2) expanded logic matrix that also includes the locations of the I/O ports. An error matrix is an (F + 1) × 1 matrix, where F is the number of faulty PE's; it contains the size of the host array, and each of its other elements (r, c) represents a faulty PE located in row r and column c of the host array. A plain-switch matrix is a matrix for controlling the states of the switches in the vertical (or horizontal) link tracks of a host array. A mix-switch matrix is a matrix for controlling the states of the switches in the vertical (or horizontal) PE tracks of a host array. A PE matrix is an M × N matrix for controlling the states of the PE's in a host array. A characteristic matrix is a matrix in which each element represents the state of either a PE or a switch in a reconfigurable array. The size of the characteristic matrix is (Th(M - 1) + M) × (Tv(N - 1) + N) for a type-1 array and (Th(M + 1) + M) × (Tv(N + 1) + N) for a type-2 array. A characteristic matrix characterizes the status of a host array before reconfiguration and the status of a target array after reconfiguration. It can be decomposed into a plain-switch matrix, a mix-switch matrix, and a PE matrix. The classification used in the characteristic matrix is shown in Fig. 3, where each leaf node is a possible state of a PE or switch. The classification concept is related to the work by Pradhan [16].

Fig. 2. A PE-switch lattice model.
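The size formulas above can be checked by laying out the characteristic matrix explicitly. The C sketch below assumes, as the formulas imply, that a type-1 matrix begins and ends with a PE track and places Th (or Tv) link tracks between adjacent PE tracks, while a type-2 matrix additionally has Th (Tv) link tracks outside the outermost PE tracks. The struct `array_model` and all helper names are hypothetical.

```c
/* Hypothetical encoding of the array model (M, N, Th, Tv, S, U, V). */
typedef struct {
    int M, N;    /* host array size */
    int Th, Tv;  /* link tracks between adjacent PE tracks */
    int S;       /* 1 = type-1 array, 2 = type-2 array */
    int U, V;    /* target array size */
} array_model;

typedef enum { CELL_PE, CELL_CARDINAL, CELL_CORNER } cell_kind;

int char_rows(const array_model *a) {
    return a->S == 1 ? a->Th * (a->M - 1) + a->M
                     : a->Th * (a->M + 1) + a->M;
}

int char_cols(const array_model *a) {
    return a->S == 1 ? a->Tv * (a->N - 1) + a->N
                     : a->Tv * (a->N + 1) + a->N;
}

/* Is 0-based index i of a matrix row (or column) a PE track?
   Type-1: PE, T links, PE, ..., PE       -> PE track when i % (T+1) == 0.
   Type-2: T links, PE, T links, ..., PE, T links -> when i % (T+1) == T. */
static int on_pe_track(int i, int T, int S) {
    return S == 1 ? i % (T + 1) == 0 : i % (T + 1) == T;
}

/* PE track x PE track -> PE; PE track x link track -> cardinal switch;
   link x link -> corner position (holds a switch only in a type-2 array). */
cell_kind classify_cell(const array_model *a, int r, int c) {
    int pr = on_pe_track(r, a->Th, a->S);
    int pc = on_pe_track(c, a->Tv, a->S);
    if (pr && pc) return CELL_PE;
    if (pr || pc) return CELL_CARDINAL;
    return CELL_CORNER;
}

/* Count the entries of one kind over the whole characteristic matrix. */
int count_cells(const array_model *a, cell_kind k) {
    int n = 0;
    for (int r = 0; r < char_rows(a); ++r)
        for (int c = 0; c < char_cols(a); ++c)
            if (classify_cell(a, r, c) == k) ++n;
    return n;
}
```

For the 6 × 6 type-2 host array with Th = Tv = 1 used in Section IV, the matrix is 13 × 13 and splits into 36 PE entries, 6 · 7 + 7 · 6 = 84 cardinal entries, and 7 · 7 = 49 corner entries, which add up to 169 = 13².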

B. System Design

A reconfigurable array (M, N, Th, Tv, S, U, V) is used to demonstrate the approach of top-down design and bottom-up implementation. Fig. 4 shows a hierarchical VHDL description of the reconfigurable array. Its modular structure eases complexity management, design modification, and implementation changes, since different components can be substituted by rebinding the design through the VHDL configuration declaration [17]. The description of the array is divided into two types of blocks: switch blocks and PE-switch blocks. A link track in the array of Fig. 2 is a switch block, which consists of switches and links, while a PE track is a PE-switch block, which consists of PE's as well as switches and links. In Fig. 4, only the leaf nodes are modeled with VHDL behavioral descriptions; all higher level nodes are modeled with VHDL structural descriptions. The array is described generically. The host and target array dimensions (M × N and U × V), the numbers of horizontal and vertical link tracks (Th, Tv), the array type (S), the clock cycle times of PE's, switches, and control lines (PE_cycle_time, SW_cycle_time, Ctr_cycle_time), and the delays of subcomponents (Multiplier_delay, Adder_delay, etc.) are declared in a VHDL generic statement, which lets designers specify these parameters and thus provides design flexibility.

Fig. 3. Classification used in the characteristic matrix. (Tree diagram; recoverable node labels: PE, Switch; Fault-free, Faulty; Random, Clustered; partial failure, total failure.)

Fig. 4. Hierarchical VHDL description of the reconfigurable array.

To simulate such a reconfigurable array, a VHDL test bench environment is created to facilitate test and simulation. Fig. 5 shows the VHDL test bench environment for reconfigurable arrays. The test bench includes two components: the generator (Systolic_gen) and the reactor (Systolic_array). When the test bench is simulated, Systolic_gen provides stimulus to Systolic_array, and Systolic_array responds to this stimulus and sends results back to Systolic_gen [18]. Using the test bench environment, we can experiment with different design alternatives for Systolic_array and various types of Systolic_gen. For instance, FDM can use this environment to simulate the fault diagnosis process; in this case, Systolic_gen supplies the test patterns, while Systolic_array responds with the test results. In Section IV, RM uses the test bench environment to simulate the generation and functionality verification of the target array. The VHDL description of the test bench environment is as follows. Fig. 6 shows the VHDL entity declaration, with its generic statement, of the test bench environment. The generic statement includes the above parameters with default values, which can also be supplied at simulation time without reanalyzing the VHDL code. A port with multiple sources, such as a switch port, is defined as a resolved signal and has an associated resolution function that determines the value of the resolved signal. Note that the time-related parameters have the data type Positive or Natural, because the VHDL simulator restricts generic parameters to basic data types; these parameters must therefore be converted to the data type Time, in units of ns, during component instantiation. The component Systolic_array is the VHDL description of the example array, which is hierarchical and modular in the manner described by Fig. 4, so various component design options can easily be checked by experiment. The component Systolic_gen is used to 1) generate the PE, switch, and control clocks; 2) read and transmit the following data to Systolic_array: PE and switch control data and input data for multiplication; and 3) receive the multiplication results from Systolic_array. These events are synchronized in Systolic_gen by adapting the synchronization mechanism to host and target arrays of various sizes and to different clock cycle times. Note that all nonbasic data types are defined in the package declaration to facilitate resource sharing and modification. Since this is an architecture-level design system, a single delay time is used to capture all inherent delays inside a subcomponent, such as propagation delay and capacitive charging and discharging delay.

Fig. 5. VHDL test bench environment (generator Systolic_gen and reactor Systolic_array).

```vhdl
entity Systolic_test is
  generic (M, N             : Positive;          -- host array size
           U, V             : Positive;          -- target array size
           Th               : Natural := 1;      -- horizontal link tracks between PE tracks
           Tv               : Natural := 1;      -- vertical link tracks between PE tracks
           S                : Natural := 2;      -- array type (1 or 2)
           PE_cycle_time    : Positive := 200;   -- time values below are in ns
           Ctr_cycle_time   : Positive := 20;
           SW_cycle_time    : Positive := 20;
           Multiplier_delay : Natural := 80;
           Adder_delay      : Natural := 20;
           Demux12_delay    : Natural := 10;
           Mux31_delay      : Natural := 10;
           Mux21_delay      : Natural := 10;
           Data_reg_delay   : Natural := 10;
           Ctr_reg_delay    : Natural := 10;
           NAND3_delay      : Natural := 5;
           SW_comm_delay    : Natural := 10;
           SW_reg_delay     : Natural := 10);
end Systolic_test;
```

Fig. 6. VHDL entity declaration of the test bench environment.

C. PE Design

Two types of PE's (PEa and PEb) are designed to illustrate how different component implementations can be incorporated and evaluated. The structure of PEa is shown in Fig. 7 [13]. The I/O relationships of PEa are Yout ← Yin and Xout ← Xin + W × Yin. It has a demultiplexer (D), a multiplexer (M), three 16-bit data registers (W, R1, and R2), a multiplier (Mpy), an adder (Add), and a 2-bit control register (Cdm). The size of each data register can easily be modified in the VHDL package declaration. The control register is driven by a control clock CTRClk and a control line DMctl. When DMctl = 0, the control registers of the PE's in the same row of the array form a scan path through DMin and DMout, through which the PE control data are scanned in. The demultiplexer and the multiplexer are controlled by the output of the control register. If Cdm(0) = 1, the corresponding PE is bypassed. Each data register has two clock inputs (PEclk0 and PEclk1), selected by the value of Cdm(1). PEclk0 is the normal clock for an active PE, used when Cdm(1) = 0. PEclk1 has a shorter clock cycle and is used when Cdm(1) = 1; it clocks a data register when the corresponding PE is bypassed, in which case the data register functions as a delay register. The other PE design (PEb) has a configuration similar to that of PEa, with the addition of a bypass link on each data path; PEb needs only one PE clock, PEclk, for each data register. Each register in either PE is triggered by the falling edge of a single-phase clock. The single-phase clock scheme is adopted to focus on system design issues. Although MOS circuits are typically driven by a two-phase clock to avoid clock skew problems, the resulting circuits can always be translated directly into a corresponding two-phase MOS implementation [19].
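The data relation of PEa and the effect of the Cdm(0) bypass bit can be sketched in C as below. This is a hypothetical behavioral model at the processor level only: register timing, the two PE clocks, and the 16-bit register width are not modeled, and the type and function names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical behavioral model of one PEa (timing ignored). */
typedef struct {
    int32_t w;      /* contents of the W data register */
    unsigned cdm;   /* 2-bit control register; bit 0 models Cdm(0) = bypass */
} pe_state;

typedef struct { int64_t x; int32_t y; } pe_io;

/* One evaluation of the PE: Yout <- Yin, Xout <- Xin + W * Yin,
   unless the PE is bypassed, in which case X passes through unchanged. */
pe_io pe_step(const pe_state *pe, pe_io in) {
    pe_io out;
    out.y = in.y;
    if (pe->cdm & 1u)
        out.x = in.x;
    else
        out.x = in.x + (int64_t)pe->w * in.y;
    return out;
}

/* Push one X value east along a row of n PEs; y[j] is the Y value
   arriving at PE j from the north in the same step. */
int64_t row_result(const pe_state *row, const int32_t *y, int n, int64_t x_in) {
    pe_io io = { x_in, 0 };
    for (int j = 0; j < n; ++j) {
        io.y = y[j];
        io = pe_step(&row[j], io);
    }
    return io.x;
}
```

For a row whose middle PE is bypassed (W values 2, 5, 3 with Cdm(0) = 0, 1, 0) and Y inputs 1, 10, 4, the row contributes 2 · 1 + 3 · 4 = 14; the bypassed PE's W value is ignored.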

D. Switch Design

The switch structure is shown in Fig. 8 [13]. It has four I/O ports and consists of a switch communication box (SWComm) and a 4-bit switch control register (SWreg). The switch control register has a clock line SWclk, a control line SWctl, and a control register input (output) line CRin (CRout). When SWctl = 0, the switch control registers in the same column of the array form a scan path through CRin and CRout, through which the switch control data are scanned in. The switch communication box has three types of connection patterns with four possible states for each type; that is, there are 12 possible states, which are controlled by the output (SCout) of the switch control register. This switch design is more efficient than the design in [20] in terms of switch reconfiguration time and design complexity.

Fig. 8. Switch structure.
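The scan path formed by the switch control registers of a column behaves like a shift register. The C sketch below assumes one 4-bit register per switch and a shift of one word per switch clock while the switch control line is held at 0, and it uses a hypothetical encoding (code = pattern × 4 + state) for the 12 switch states; the paper does not give the actual encoding, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define SW_STATES 12  /* 3 connection patterns x 4 states per pattern */

/* Hypothetical encoding: code = pattern * 4 + state; codes 12..15 unused. */
int decode_switch(uint8_t code, int *pattern, int *state) {
    if (code >= SW_STATES) return 0;
    *pattern = code / 4;
    *state = code % 4;
    return 1;
}

/* One switch clock with the scan path enabled: each 4-bit register takes
   its predecessor's value, reg[0] takes CRin, and the old contents of the
   last register leave on CRout (returned). */
uint8_t scan_clock(uint8_t *reg, size_t len, uint8_t cr_in) {
    if (len == 0) return cr_in;          /* empty path: input passes through */
    uint8_t cr_out = reg[len - 1];
    for (size_t k = len - 1; k > 0; --k)
        reg[k] = reg[k - 1];
    reg[0] = (uint8_t)(cr_in & 0x0Fu);
    return cr_out;
}

/* Load desired[0..len-1] in len clocks: the word for the far end goes first. */
void scan_load(uint8_t *reg, const uint8_t *desired, size_t len) {
    for (size_t i = 0; i < len; ++i)
        scan_clock(reg, len, desired[len - 1 - i]);
}
```

Loading a column of len switches takes len switch clocks, so the scan-in time grows linearly with the column length; this is consistent with the (2M + 1) switch-clock term in the reconfiguration time reported in Section IV.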

E. Comparison with an Existing Modeling Method

The object-oriented design of reliable/reconfigurable architectures (OODRA) workbench was targeted at the design and analysis of concurrent, message-passing-based, parallel reconfigurable architectures [7]. It used a concurrent process model, encapsulated in CPHDL, to describe application-specific reconfigurable parallel architectures. Although VHDL and CPHDL both have features that can describe concurrency and structural reconfiguration, CPHDL was chosen for OODRA instead of VHDL, because its authors argued that VHDL is too verbose to be easily captured in a simulation model targeted at application-specific parallel architectures [7]. However, they still needed a translator to convert CPHDL descriptions into VHDL descriptions, and the translator design may face the same problem. Based on our experience, and given that VHDL is an IEEE standard, this problem can be overcome. In terms of tool support and interfaces to other tools, VHDL is the better choice for describing reconfigurable architectures. Other existing models of hardware execution and their associated hardware description languages are not targeted toward reconfigurable architectures; they are reviewed in [7].

We now compare the use of CPHDL in OODRA with that of VHDL in VAR for describing reconfigurable architectures. A CPHDL-based reconfigurable architecture description has three basic elements: process abstractions, switch abstractions, and link abstractions [7]. A reconfigurable architecture is composed of multiple processors implementing processes, interconnected by links and reconfigured by switch mechanisms [7]. The VHDL description of a reconfigurable architecture is based on the hierarchical structure in Fig. 4. One extra level of hierarchy, consisting of the switch block and the PE-switch block, is added, so we deal with only one dimension of interconnections instead of the two dimensions in the CPHDL-based architectural description. This makes the architectural description concise and clear. In OODRA, both the host array and the non-defect-tolerant array need to be described in CPHDL. The description of the non-defect-tolerant array is used as a template for mapping the host array with multiple faults into a working system (target array) [7]. The link and channel structures are static; i.e., once instantiated, they cannot be physically rerouted. In VAR, only the host array is described in VHDL. The target array is obtained through actual reconfiguration, using the PE and switch control data to convert the host array into the target array. The interconnection patterns in our approach are dynamic; i.e., they can be rerouted by using different PE and switch control data. Therefore, with the host array described in VHDL, we can reconstruct and simulate various target arrays based on different fault patterns. The above discussion outlines the basic differences between the CPHDL modeling method in OODRA and the VHDL modeling method in VAR, and it demonstrates the effectiveness and flexibility of the VHDL modeling method in VAR.

IV. RECONFIGURATION PROCESS AND SIMULATION

Only host arrays with PE faults are considered in the simulation and evaluation processes, an assumption commonly made in the literature [3], [4]. The effect of faulty switches and links was addressed in [13], where faulty switches and links can be identified. During reconfiguration, each faulty switch or link is lumped into one of the adjacent PE's, and that PE is considered faulty. One study [21] showed that the following yield statistics are typical: 30-65% for PE's, 99% for switches, and 95% for links (or wires). Since the yields of switches and links are high, it is appropriate to illustrate the simulation and evaluation processes with host arrays containing faulty PE's only.
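Fault patterns can be generated from yield figures like these. The C sketch below covers only the random (non-clustered) case of Fig. 3: each PE is assumed to be faulty independently with probability 1 - pe_yield, and the resulting (row, column) locations are the faulty-PE entries of an error matrix (whose remaining element holds the host array size). The xorshift generator, the function name `inject_pe_faults`, and the independence assumption are illustrative choices, not the fault model used in VAR.

```c
#include <stdint.h>

typedef struct { int r, c; } loc;   /* 1-based host coordinates */

/* xorshift32: a tiny deterministic generator so fault patterns are reproducible. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Uniform double in [0, 1) from the top 24 bits. */
static double uniform01(uint32_t *state) {
    return (xorshift32(state) >> 8) * (1.0 / 16777216.0);
}

/* Mark each PE of an M x N host array faulty with probability 1 - pe_yield.
   PE rows/columns run 1..M and 1..N, because rows 0 and M+1 and columns 0
   and N+1 hold the I/O ports. Returns F, or -1 if F would exceed max_faults. */
int inject_pe_faults(int M, int N, double pe_yield, uint32_t seed,
                     loc *faults, int max_faults) {
    uint32_t state = seed ? seed : 1u;   /* xorshift state must be nonzero */
    int F = 0;
    for (int r = 1; r <= M; ++r)
        for (int c = 1; c <= N; ++c)
            if (uniform01(&state) >= pe_yield) {
                if (F == max_faults) return -1;
                faults[F].r = r;
                faults[F].c = c;
                ++F;
            }
    return F;
}
```

For instance, a PE yield of 0.65 (the upper end of the range quoted above) on a 6 × 6 host array gives 36 × 0.35 = 12.6 faulty PE's on average under this independent-fault assumption.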

A. Reconfiguration Process

The reconfiguration process in Fig. 9 includes three stages: 1) execution of the reconfiguration program, 2) execution of the switching mechanism transformation program, and 3) generation of the VHDL target array. The reconfiguration algorithms described in [14] are used to generate a logic matrix from the error matrix of a host array. The transformation program takes the logic matrix as input and transforms it into a characteristic matrix according to the selected switching mechanism. The characteristic matrix, which contains switch settings and PE bypassing information, can then be used as the control input to reconfigure the VHDL-based host array into the target array. The correctness of the reconfiguration process is verified through VHDL simulation. The transformation program consists of two parts: the transformation algorithm and the text-to-binary conversion program. The transformation algorithm generates the characteristic matrix of a target array based on an error matrix and a logic matrix. Three output matrices, the plain-switch matrix, the mix-switch matrix, and the PE matrix, are derived from the characteristic matrix. These output matrices contain the settings for the switches in the vertical link tracks and for the switches in the vertical PE tracks, together with the PE bypassing information. They must be converted into binary before they can be used as input to the VHDL simulation that reconfigures the VHDL-based host array.

Fig. 9. Reconfiguration process in VAR.
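The PE-bypassing part of this transformation can be sketched in C as follows. The sketch assumes that every host PE not named in the logic matrix is bypassed (encoded as 1, in line with Cdm(0) = 1 meaning bypass), rejects logic matrices that use a faulty, out-of-range, or duplicated PE, and renders one PE-matrix row as '0'/'1' text as a stand-in for the text-to-binary step. Switch settings are not derived here, and all names are hypothetical.

```c
typedef struct { int r, c; } loc;   /* 1-based host coordinates */

/* Derive the M x N PE matrix (row-major; 1 = bypass, 0 = active) from a
   U x V logic matrix. Returns 1 on success, 0 if the logic matrix names a
   PE outside the host array, a faulty PE, or the same PE twice (pe is then
   only partially filled). */
int derive_pe_matrix(int M, int N, const loc *logic, int U, int V,
                     const loc *faults, int F, int *pe) {
    for (int k = 0; k < M * N; ++k) pe[k] = 1;          /* start all bypassed */
    for (int k = 0; k < U * V; ++k) {
        int r = logic[k].r, c = logic[k].c;
        if (r < 1 || r > M || c < 1 || c > N) return 0;  /* outside host array */
        for (int f = 0; f < F; ++f)
            if (faults[f].r == r && faults[f].c == c) return 0;  /* faulty PE */
        int *cell = &pe[(r - 1) * N + (c - 1)];
        if (*cell == 0) return 0;                        /* PE used twice */
        *cell = 0;                                       /* keep PE active */
    }
    return 1;
}

/* Text form of one PE-matrix row (0-based row index); out needs N + 1 chars. */
void pe_row_bits(const int *pe, int N, int row, char *out) {
    for (int c = 0; c < N; ++c) out[c] = pe[row * N + c] ? '1' : '0';
    out[N] = '\0';
}
```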

B. Simulation Setup

The reconfiguration program and the transformation program are implemented in C on a Sun 3/260 workstation. The conversion of an example 6 × 6 type-2 host array into a 5 × 4 target array is used to illustrate the reconfiguration process. Fig. 10 shows a graphical representation of the reconfiguration process for the example array, where only active links are retained for clarity. After the first two stages of the reconfiguration process are executed, the Intermetrics Standard VHDL 1076 Support Environment is used to simulate the third stage of the reconfiguration process and the matrix multiplications. To demonstrate how to validate the reconfiguration process and the functionality of the target array, an example matrix multiplication is performed. The default values of the generic parameters (as shown in Fig. 6) are used unless otherwise noted. The VHDL simulation process includes the third stage of the reconfiguration process and a matrix multiplication, and is divided into five parts:

1) transmitting the plain-switch matrix and the mix-switch matrix from the north side of the array, and the PE matrix (which initially contains the control data to bypass all PE's) from the west side, to start the reconfiguration;
2) transmitting the multiplicand matrix [W] from the west side of the array via the X input data paths and storing each element of the matrix in the W data register of the corresponding PE;
3) transmitting the PE matrix to reconfigure the host array into the target array;
4) transmitting zeros from the west side of the array via the X input data paths and the multiplier matrix [Y] from the north side of the array via the Y input data paths, and starting the multiplication; and
5) collecting the product matrix [X] from the east side of the array via the X output data paths.

An input format conversion program is needed to convert the multiplicand and multiplier matrices into the appropriate formats, and an output format conversion program is needed to convert the binary output data into integers arranged as a matrix.

Fig. 10. Graphical representation of the reconfiguration process.
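Step 5 amounts to comparing the collected outputs with a reference product. Assuming that row i of the target array accumulates the sum over j of W(i, j) · yj as each column of [Y] streams south (which follows from Xout ← Xin + W × Yin with zeros entering from the west), the expected result is [X] = [W][Y]. The C sketch below computes that reference and performs the output conversion from a '0'/'1' string, assuming two's-complement encoding; both function names are hypothetical.

```c
#include <string.h>

/* Reference product X = W * Y (row-major): W is U x V, Y is V x K, X is U x K. */
void golden_product(const long long *W, const long long *Y, long long *X,
                    int U, int V, int K) {
    for (int i = 0; i < U; ++i)
        for (int k = 0; k < K; ++k) {
            long long acc = 0;
            for (int j = 0; j < V; ++j)
                acc += W[i * V + j] * Y[j * K + k];
            X[i * K + k] = acc;
        }
}

/* Output conversion: read a '0'/'1' string (MSB first) as a two's-complement
   integer of the string's width. Widths of 1..62 bits are supported;
   other widths return 0. */
long long bits_to_ll(const char *bits) {
    size_t w = strlen(bits);
    if (w == 0 || w > 62) return 0;
    long long v = 0;
    for (size_t i = 0; i < w; ++i)
        v = v * 2 + (bits[i] == '1');
    if (bits[0] == '1')
        v -= 1LL << w;                 /* sign bit set: subtract 2^w */
    return v;
}
```

With this reference, each converted output value can be checked against the corresponding entry of [W][Y]; for example, the 16-bit string 0000000010010000 converts to 144.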

C. Experimental Results

Both the PEa-based array and the PEb-based array are simulated. It is found that the PEa-based target array needs to operate at a PE cycle time of 400 ns, while the PEb-based target array can operate at a PE cycle time of 200 ns. This is because the delay caused by the bypass registers in the PEa-based target array is larger than that of the bypass links in the PEb-based target array. One way to improve the PE cycle time of PEa is to adopt only one PE clock and add delay registers on certain data paths, as suggested in [22]. The simulation results compiled from the output report of the VHDL report generator for the PEb-based array are shown in Table I. Note that at 1262 ns, the values of X_out indicate that the [W] matrix is correctly stored in the W registers of the target array. The output X_out is available at an interval of 200 ns. The simulation results verify the correctness of the reconfiguration process and the expected functionality of the target array. It is observed that it takes 13 switch clock cycles to reconfigure the array, four PE clock cycles to transmit the [W] values, six control clock cycles to transmit the PE matrix, and 13 PE clock cycles to perform the matrix multiplication. These results match the theoretical analysis: max((2M + 1) × switch cycle time, N × control cycle time) to reconfigure an array, V PE clock cycles to transmit the [W] values, N control clock cycles to transmit a PE matrix, and (2U + V - 1) PE clock cycles to perform a matrix multiplication. For the example array, M = N = 6, U = 5, and V = 4, so these expressions give 13 switch clock cycles, four PE clock cycles, six control clock cycles, and 13 PE clock cycles, respectively. The PEb-based array has the same cycle time of 200 ns (a 5-MHz clock) as the Warp processor array, which is a programmable linear systolic array, and the Hadamard transform chip, which contains eight PE's [12]. Notice that the performance measure for the PEb-based array is done at the processor level, while for the Warp processor array and the Hadamard transform chip it is done at the realization level. To measure the efficiency of the VAR environment, the run times of the first two stages of the reconfiguration process and of the VHDL simulation process for the PEa- and PEb-based arrays are shown in Table II. Note that the third stage of the reconfiguration process is part of the VHDL simulation process. It is observed that the total run time is dominated by the reconfiguration time and the transformation time. The reconfiguration time and the transformation time for larger arrays are further discussed in the next section. The VHDL simulation process is fast because the simple designs of the PE and switch control lines reduce the actual reconfiguration time of PE's and switches, and because matrix multiplications are fast for small array sizes.

TABLE I
SIMULATION RESULTS FOR THE PEb-BASED ARRAY

Time (ns)      2   402  1262  1402  2350  2550  2750  2950  3150  3350  3550  3750  3950
X_out row 1    0     0     5     0   144   158   172   186   200     0     0     0     0
X_out row 2    0     0     6     0     0   178   196   214   232   250     0     0     0
X_out row 3    0     0     7     0     0     0   212   234   256   278   300     0     0
X_out row 4    0     0     8     0     0     0     0   246   272   298   324   350     0
X_out row 5    0     0     9     0     0     0     0     0   280   310   340   370   400

TABLE II
RUN TIME FOR THE PEa- AND PEb-BASED ARRAYS

Array              Reconfiguration   Transformation   VHDL Simulation Process
PEa-based array    0.014 s           0.012 s          343.6 µs
PEb-based array    0.014 s           0.012 s          171.7 µs
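The cycle counts in the theoretical analysis depend only on the array dimensions, so they are easy to tabulate. The short Python sketch below is not part of VAR; the functions `predicted_cycles` and `reconfiguration_time_ns` are hypothetical helpers, and the switch and control cycle times passed to the second one are illustrative inputs rather than values reported in this paper.

```python
# Hypothetical helpers (not part of VAR): evaluate the theoretical cycle
# counts for reconfiguring an M x N host array into a U x V target array
# and performing one matrix multiplication on the target array.

def predicted_cycles(m: int, n: int, u: int, v: int) -> dict:
    """Cycle counts of each phase, following the expressions in the text."""
    return {
        "switch_cycles_reconfigure": 2 * m + 1,  # plain/mix-switch matrices
        "pe_cycles_load_w": v,                   # [W] values into W registers
        "control_cycles_pe_matrix": n,           # PE matrix transmission
        "pe_cycles_multiply": 2 * u + v - 1,     # the matrix multiplication
    }


def reconfiguration_time_ns(m: int, n: int, sw_cycle_ns: float, ctl_cycle_ns: float) -> float:
    """max((2M + 1) * switch cycle time, N * control cycle time)."""
    return max((2 * m + 1) * sw_cycle_ns, n * ctl_cycle_ns)


if __name__ == "__main__":
    # 6 x 6 type-2 host array reconfigured into a 5 x 4 target array.
    print(predicted_cycles(6, 6, 5, 4))
    # -> {'switch_cycles_reconfigure': 13, 'pe_cycles_load_w': 4, 'control_cycles_pe_matrix': 6, 'pe_cycles_multiply': 13}
```

Keeping the reconfiguration time as a `max` of the two terms reflects that the switch matrices and the PE matrix travel on separate clocks, so whichever path is slower bounds the reconfiguration phase.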

V. EVALUATION AND DISCUSSION

The work in [23] focused on comparisons of algorithm-based fault-tolerant WSI architectures. Other evaluation approaches using different evaluation criteria can be found in [24] and [25]. In this section, the figures of merit are described first. To evaluate the efficiency of redundancy and reconfiguration algorithms as well as the quality of target arrays, processor arrays of various sizes are simulated and evaluated using these figures of merit. Simulation results for random and clustered faults are discussed in Subsections B and C, respectively.

A. Figures of Merit

The evaluation criteria for a reconfigurable architecture are based on the following figures of merit: survival probability (S_p), locality (L_A), maximal interconnection length (MIL), hardware overhead (O_h), area overhead (O_a), yield (Y_m), reliability (R_A), utilization (U_r), and the run times (T_r, T_t) of the reconfiguration and transformation algorithms, respectively. S_p indicates the reconfiguration capability of a reconfiguration algorithm for different fault distribution patterns. L_A, O_h, and O_a can be indices of the quality of a target array. L_A (≥ 1) indicates the wire length of an interconnection link. Although array throughput is determined by MIL, L_A can tell us the quality of a target array. The larger L_A is, the longer the average wire length of an interconnection link will be. It also means more switches along an interconnection link. As a result, a larger L_A implies a higher failure rate of an interconnection link. MIL (≥ 1) reflects the critical interconnection delay in a target array. A larger MIL implies a longer clock cycle time and lower array throughput. O_h can be an index of hardware complexity: two architectures may have the same area overhead, but one may be more complex than the other in terms of the amount of active circuitry. The value of O_a affects the manufacturing yield, Y_m, significantly. R_A shows the effect of redundancy on array reliability. The mathematical expressions of the figures of merit are derived as follows.

The interconnection length (IL_{i,j,k,l}) between PE (i, j) and PE (k, l) (or I/O port (k, l)) in a target array (M, N, T_h, T_v, S, U, V) is defined in terms of the Manhattan distance, taking into account the numbers of horizontal and vertical link tracks, as follows:

$$IL_{i,j,k,l} = (T_h + 1)\,|r_i - r_k| + (T_v + 1)\,|c_j - c_l| \qquad (1)$$

where (r_i, c_j) is the index in the logic matrix for PE (i, j) in the target array and (r_k, c_l) is the index in the augmented logic matrix for the PE (k, l) or I/O port (k, l) that is adjacent to PE (i, j) in the target array. Note that in most VLSI layouts, all geometries must be Manhattan geometries, which means that their edges are parallel to the x and y axes [26]. The mathematical definitions of the rest of the figures of merit are either self-explanatory or derived from IL and previous studies to fit into the PE-switch lattice model. The locality (L_A) of a target array (M, N, T_h, T_v, S, U, V) is defined as the average interconnection length of the target array:

$$L_A = \frac{\sum_{i,j,k,l} IL_{i,j,k,l}}{I_n} \qquad (2)$$

where I_n is the number of interconnections in a non-defect-tolerant array (U, V, 0, 0, S, U, V), i.e.,

$$I_n = U(V + 1) + V(U + 1). \qquad (3)$$

The maximal interconnection length (MIL) of a target array (M, N, T_h, T_v, S, U, V) is defined as follows:

$$MIL = \max_{i,j,k,l} IL_{i,j,k,l} \qquad (4)$$

where (r_i, c_j) and (r_k, c_l) are the indices of the augmented logic matrix for any two adjacent PE (i, j) and PE (k, l), or for a PE (i, j) and its adjacent I/O port (k, l), in the target array. Note that an interconnection whose length equals the MIL is the critical path of the array. The utilization (U_r) of a reconfiguration algorithm is defined as the fraction of good PE's in a host array (M, N, T_h, T_v, S, 0, 0) that are used in a target array (M, N, T_h, T_v, S, U, V), i.e.,

$$U_r = \frac{UV}{MN - F} \times 100\% \qquad (5)$$

where F is the number of faulty PE's in the host array. The hardware overhead (O_h) of a PE-switch lattice array (M, N, T_h, T_v, S, U, V) is defined as follows:

$$O_h = \frac{S_n + \beta(MN - UV)}{\beta UV} \times 100\% \quad \text{(based on the degradation approach)} \qquad (6)$$

or

$$O_h = \frac{S_n + \beta P}{\beta(MN - P)} \times 100\% \quad \text{(based on the redundancy approach)} \qquad (7)$$

where β is the manufacturing cost ratio of a PE to a switch, S_n is the number of switches, and P = RN + CM - RC is the number of extra PE's, with R (C) being the number of extra PE rows (columns). Equations (6) and (7) are equivalent if UV = MN - P. To obtain the value of O_h, the number of switches used in the array must be determined first. The number of switches in a type-1 reconfigurable array architecture is

$$S_n = T_h(M - 1)N + T_v(N - 1)M \qquad (8)$$

and that in a type-2 reconfigurable array architecture is

$$S_n = T_h(M + 1)N + T_v(N + 1)M + T_h T_v (M + 1)(N + 1). \qquad (9)$$

The area overhead (O_a) of a PE-switch lattice array is derived as follows. Assuming the width of a switch to be wλ, where λ is the length unit, the area (A_o) of a non-defect-tolerant array (U, V, 0, 0, S, U, V) is equal to

$$A_o = UV\delta^2 w^2 \lambda^2 \qquad (10)$$

where δ is the width ratio of a PE to a switch. The area of a PE-switch lattice array (M, N, T_h, T_v, S, U, V) is

$$A_m = (T_h(M - 1) + \delta M)(T_v(N - 1) + \delta N)\, w^2 \lambda^2 \quad \text{(based on the type-1 array)} \qquad (11)$$

or

$$A_m = (T_h(M + 1) + \delta M)(T_v(N + 1) + \delta N)\, w^2 \lambda^2 \quad \text{(based on the type-2 array)}. \qquad (12)$$

Therefore, the area overhead is equal to

$$O_a = \frac{A_m - A_o}{A_o} \times 100\%. \qquad (13)$$

The manufacturing yield (Y_m) of a PE-switch lattice array (M, N, T_h, T_v, S, U, V) based on the negative binomial distribution [27] is

$$Y_m = P(X \le S_{PE}) = \sum_{k=0}^{S_{PE}} \frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)} \, \frac{(A_m D/\alpha)^k}{(1 + A_m D/\alpha)^{\alpha + k}} \, S_p(k) \qquad (14)$$

where S_PE = MN - UV is the number of extra PE's, α is the clustering parameter, D is the defect density, and S_p(k) is the probability that the reconfiguration algorithm can reconfigure a faulty host array into the desired target array given k faulty PE's. The value of S_p(k) = S_k/T can be obtained by Monte Carlo simulation, where S_k is the number of fault patterns with k faulty PE's that can be reconfigured successfully and T is the total number of fault patterns generated. The manufacturing yield (Y_o) of a non-defect-tolerant array (U, V, 0, 0, S, U, V), obtained by setting S_PE = 0 in (14), is

$$Y_o = (1 + A_o D/\alpha)^{-\alpha}. \qquad (15)$$

Assume that the reliability (R_PE(t)) of a PE is exponentially distributed with a failure rate λ_PE, i.e., R_PE(t) = e^{-λ_PE t}. The reliability (R_A(t)) of a PE-switch lattice array (M, N, T_h, T_v, S, U, V) is

$$R_A(t) = \sum_{k=0}^{S_{PE}} \binom{MN}{k} R_{PE}(t)^{MN-k} \left(1 - R_{PE}(t)\right)^k S_p(k) \qquad (16)$$

and the reliability (R_o(t)) of a non-defect-tolerant array (U, V, 0, 0, S, U, V) is

$$R_o(t) = R_{PE}(t)^{UV}. \qquad (17)$$
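Equations (1) through (15) can be checked numerically. The Python sketch below is an illustration, not the VAR implementation, and every function name in it is hypothetical. The caller supplies the adjacent index pairs for IL and the function S_p(k) for (14), because those come from the reconfiguration algorithm and the Monte Carlo runs. With β = δ² and the type-1 switch count (8), it yields 7.00% hardware overhead and 25.93% area overhead for the (21, 20, 1, 1, 1, 20, 20) host array at δ = 10, the values listed in Table III.

```python
import math

def interconnection_length(th, tv, a, b):
    """IL, Eq. (1): weighted Manhattan distance between the logic-matrix
    indices a = (r_i, c_j) and b = (r_k, c_l) of two adjacent nodes."""
    return (th + 1) * abs(a[0] - b[0]) + (tv + 1) * abs(a[1] - b[1])

def locality_and_mil(th, tv, links, u, v):
    """L_A, Eqs. (2)-(3), and MIL, Eq. (4), for the list of adjacent
    (PE, PE) or (PE, I/O port) index pairs of a U x V target array."""
    lengths = [interconnection_length(th, tv, a, b) for a, b in links]
    i_n = u * (v + 1) + v * (u + 1)  # number of interconnections, Eq. (3)
    return sum(lengths) / i_n, max(lengths)

def switch_count(m, n, th, tv, array_type=1):
    """S_n, Eq. (8) for a type-1 array or Eq. (9) for a type-2 array."""
    if array_type == 1:
        return th * (m - 1) * n + tv * (n - 1) * m
    return th * (m + 1) * n + tv * (n + 1) * m + th * tv * (m + 1) * (n + 1)

def hardware_overhead(m, n, th, tv, u, v, beta, array_type=1):
    """O_h in percent, Eq. (6); beta is the PE-to-switch cost ratio."""
    sn = switch_count(m, n, th, tv, array_type)
    return (sn + beta * (m * n - u * v)) / (beta * u * v) * 100.0

def array_area(m, n, th, tv, delta, w, lam, array_type=1):
    """A_m, Eq. (11) or (12). With th = tv = 0 it reduces to A_o, Eq. (10)."""
    gap = -1 if array_type == 1 else 1
    return (th * (m + gap) + delta * m) * (tv * (n + gap) + delta * n) * (w * lam) ** 2

def area_overhead(m, n, th, tv, u, v, delta, array_type=1):
    """O_a in percent, Eq. (13). The factor (w * lambda)^2 cancels out."""
    a_m = array_area(m, n, th, tv, delta, 1.0, 1.0, array_type)
    a_o = array_area(u, v, 0, 0, delta, 1.0, 1.0)
    return (a_m - a_o) / a_o * 100.0

def manufacturing_yield(area, defect_density, alpha, spares, sp):
    """Y_m, Eq. (14): negative binomial probability of k defects weighted by
    sp(k), summed for k = 0..spares. spares = 0 with sp(0) = 1 gives Y_o, Eq. (15)."""
    x = area * defect_density / alpha
    total = 0.0
    for k in range(spares + 1):
        # Gamma(alpha + k) / (k! Gamma(alpha)), evaluated in log space.
        coeff = math.exp(math.lgamma(alpha + k) - math.lgamma(k + 1) - math.lgamma(alpha))
        # (x^k) / (1 + x)^(alpha + k), rearranged to avoid large powers.
        total += coeff * (x / (1.0 + x)) ** k * (1.0 + x) ** (-alpha) * sp(k)
    return total

if __name__ == "__main__":
    # (21, 20, 1, 1, 1, 20, 20) host array with delta = 10 and beta = delta**2.
    print(round(hardware_overhead(21, 20, 1, 1, 20, 20, beta=100), 2))  # 7.0
```

Areas are in whatever units w and λ are given in; for the yield, pass λ in centimeters so that A·D is dimensionless with D in defects per cm².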

B. Simulation with Random Fault Distribution

Three host arrays, (27, 27, 1, 1, 1, 25, 25), (22, 22, 1, 1, 1, 20, 20), and (17, 17, 1, 1, 1, 15, 15), are simulated by using the reconfiguration algorithm RR. Parts (a) and (b) of Fig. 11 show the reconfiguration time and the transformation time with respect to fault size. The reconfiguration time is less than 0.2 s in most cases. The time needed for transformation is also small (≤ 0.2 s) and independent of fault size. The utilization is an important index for a reconfiguration algorithm based on the degradation approach. Fig. 11(c) shows the relationship between utilization and fault size. The utilization for these arrays is at least 80% with fewer than 30 faults, and it decreases as the fault size increases. Higher utilization usually results in higher survival probability. To study the effect of redundancy on the survival probability, the following host arrays are simulated by using algorithm RR: (21, 20, 1, 1, 1, 20, 20), (21, 21, 1, 1, 1, 20, 20), (22, 21, 1, 1, 1, 20, 20), and (22, 22, 1, 1, 1, 20, 20). That is, the number of extra rows/columns is in the range from 1 to 4. As expected, Fig. 11(d) shows that the larger the redundancy, the higher the survival probability. This figure can help designers determine the appropriate redundancy based on the required S_p and the possible fault sizes derived from manufacturing yield data. To determine whether the amount of redundancy is appropriate, several parameters need to be evaluated. Table III shows S_p, L_A, MIL, O_h, O_a, and Y_m for different amounts of redundancy by using algorithm RR. The following values are assumed in calculating overhead and yield: β = δ², w = 48, λ = 1.5 µm for a 3-µm technology, α = 2, and D = 2.5/cm². Note that S_p is the average of the survival probabilities for fault sizes up to 35. The yield improves with the extra rows/columns, at the cost of larger L_A, MIL, O_h, and O_a. The redundancy should be evenly distributed between rows and columns if the target array is a square array. Table III also shows the effect of δ on yield: the yield decreases as δ increases, because arrays with larger PE's occupy a larger wafer area and thus tend to have lower yield.

Fig. 11. Array evaluation: (a) reconfiguration time, (b) transformation time, (c) utilization U_r, and (d) survival probability, each plotted against fault size.

TABLE III
REDUNDANCY EVALUATION UNDER THE RANDOM FAULT DISTRIBUTION

Host Array                  S_p    L_A    MIL    δ    O_h (%)   O_a (%)   Y_m (%)
(20, 20, 0, 0, 1, 20, 20)   0.00   1.00   1.00   10    0.00      0.00      7.15
                                                 20    0.00      0.00      0.77
(21, 20, 1, 1, 1, 20, 20)   0.18   2.18   3.85   10    7.00     25.93     51.66
                                                 20    5.50     15.23     11.19
(21, 21, 1, 1, 1, 20, 20)   0.34   2.21   4.39   10   12.35     32.25     81.50
                                                 20   10.78     21.00     25.13
(22, 21, 1, 1, 1, 20, 20)   0.48   2.28   5.02   10   17.70     38.58     90.82
                                                 20   16.05     26.78     37.05
(22, 22, 1, 1, 1, 20, 20)   0.62   2.33   5.28   10   23.31     45.20     96.41
                                                 20   21.58     32.83     49.82
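The Monte Carlo estimate S_p(k) = S_k/T can be organized as a small harness that is independent of the reconfiguration algorithm being evaluated. In the Python sketch below, all names are hypothetical, and `toy_reconfigurable` is a deliberately simple stand-in criterion (all faults confined to at most the given number of rows, or of columns), not algorithm RR. Averaging S_p(k) over k = 1 up to the maximum fault size is an assumption about how the averaged S_p is formed; a real study would pass the actual reconfiguration routine as `can_reconfigure`.

```python
import random

def random_fault_pattern(m, n, k, rng):
    """k distinct faulty PEs chosen uniformly at random in an m x n host array."""
    cells = [(i, j) for i in range(m) for j in range(n)]
    return set(rng.sample(cells, k))

def toy_reconfigurable(faults, spare_rows, spare_cols):
    """HYPOTHETICAL success test (not RR): all faults lie in at most
    spare_rows distinct rows, or in at most spare_cols distinct columns."""
    rows = {i for i, _ in faults}
    cols = {j for _, j in faults}
    return len(rows) <= spare_rows or len(cols) <= spare_cols

def survival_probability(m, n, k, trials, can_reconfigure, seed=0):
    """Monte Carlo estimate S_p(k) = S_k / T with T = trials fault patterns."""
    rng = random.Random(seed)
    s_k = sum(1 for _ in range(trials)
              if can_reconfigure(random_fault_pattern(m, n, k, rng)))
    return s_k / trials

def average_survival(m, n, max_faults, trials, can_reconfigure, seed=0):
    """Average of S_p(k) for k = 1..max_faults (assumed definition)."""
    values = [survival_probability(m, n, k, trials, can_reconfigure, seed + k)
              for k in range(1, max_faults + 1)]
    return sum(values) / len(values)

if __name__ == "__main__":
    # Hypothetical 22 x 22 host array with two spare rows and two spare columns.
    rule = lambda faults: toy_reconfigurable(faults, 2, 2)
    for k in (1, 2, 3, 5):
        print(k, survival_probability(22, 22, k, 1000, rule))
```

Passing the success test as a function keeps the fault generator and the statistics fixed while different reconfiguration algorithms are swapped in, which is the kind of comparison the figures of merit are meant to support.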

C. Simulation with Clustered Fault Distribution

Manufacturing defects cluster on a wafer [27], [28]. Therefore, in addition to evaluating the effect of random faults, we also study the effect of clustered faults on reconfigurable processor arrays. We apply the method in [3], which is based on [29], to generate clustered faults. The generation of clustered faults is controlled by two parameters, α1 and α2, where α1 is the probability of a PE being faulty at the initial fault generation stage and α2 is the clustering parameter. The clustered fault generator first generates faults with probability α1 for each PE. Based on this fault pattern, the generator converts a nonfaulty PE (i, j) into a faulty PE with probability α1 + α2 · adjacent(i, j), where adjacent(i, j) is the number of faulty PE's that are adjacent to PE (i, j). Wraparound is assumed at the array boundaries to determine the value of adjacent(i, j); that is, a boundary PE at one side of the array is assumed to be adjacent to the boundary PE at the opposite side [30]. Compared with Table III, Table IV shows the figures of merit for redundancy evaluation under the clustered fault distribution (α1 = 0.001) using algorithm RR.

TABLE IV
REDUNDANCY EVALUATION UNDER THE CLUSTERED FAULT DISTRIBUTION

Host Array                  α2      S_p    L_A    MIL    δ    O_h (%)   O_a (%)   Y_m (%)
(21, 20, 1, 1, 1, 20, 20)   0.01    0.13   2.12   3.75   10    7.00     25.93     39.25
                                                         20    5.50     15.23      7.28
(21, 20, 1, 1, 1, 20, 20)   0.001   0.17   2.19   3.83   10    7.00     25.93     49.83
                                                         20    5.50     15.23     10.51
(21, 21, 1, 1, 1, 20, 20)   0.01    0.32   2.15   4.48   10   12.35     32.25     79.88
                                                         20   10.78     21.00     24.01
(21, 21, 1, 1, 1, 20, 20)   0.001   0.34   2.20   4.54   10   12.35     32.25     81.82
                                                         20   10.78     21.00     25.75
(22, 21, 1, 1, 1, 20, 20)   0.01    0.43   2.24   4.91   10   17.70     38.58     86.50
                                                         20   16.05     26.78     32.84
(22, 21, 1, 1, 1, 20, 20)   0.001   0.44   2.25   4.68   10   17.70     38.58     88.39
                                                         20   16.05     26.78     33.81
(22, 22, 1, 1, 1, 20, 20)   0.01    0.60   2.27   5.46   10   23.31     45.20     95.57
                                                         20   21.58     32.83     45.57
(22, 22, 1, 1, 1, 20, 20)   0.001   0.67   2.31   5.18   10   23.31     45.20     97.00
                                                         20   21.58     32.83     50.88

TABLE V
RECONFIGURATION ALGORITHM EVALUATION

Algorithm   Host Array                  Faults        S_p    L_A    MIL    δ    O_h (%)   O_a (%)   Y_m (%)
RR          (22, 22, 1, 1, 1, 20, 20)   random        0.62   2.33   5.28   10   23.31     45.20     96.41
                                                                           20   21.58     32.83     49.82
            (22, 22, 1, 1, 1, 20, 20)   α2 = 0.01     0.60   2.27   5.46   10   23.31     45.20     95.57
                                                                           20   21.58     32.83     45.47
            (22, 22, 1, 1, 1, 20, 20)   α2 = 0.001    0.67   2.31   5.18   10   23.31     45.20     97.00
                                                                           20   21.58     32.83     50.88
BR          (22, 22, 1, 0, 1, 20, 20)   random        0.61   1.72   4.28   10   22.16     32.55     96.71
                                                                           20   21.29     26.78     48.25
            (22, 22, 1, 0, 1, 20, 20)   α2 = 0.01     0.54   1.70   4.47   10   22.16     32.55     94.71
                                                                           20   21.29     26.78     42.39
            (22, 22, 1, 0, 1, 20, 20)   α2 = 0.001    0.60   1.69   4.15   10   22.16     32.55     96.63
                                                                           20   21.29     26.78     47.94
BB          (22, 22, 0, 0, 1, 20, 20)   random        0.19   1.05   1.91   10   21.00     21.00     58.70
                                                                           20   21.00     21.00     10.63
            (22, 22, 0, 0, 1, 20, 20)   α2 = 0.01     0.19   1.05   2.06   10   21.00     21.00     60.49
                                                                           20   21.00     21.00     11.22
            (22, 22, 0, 0, 1, 20, 20)   α2 = 0.001    0.19   1.05   1.83   10   21.00     21.00     59.02
                                                                           20   21.00     21.00     10.66

Tables III and IV show that the higher the redundancy, the larger S_p, L_A, and MIL. The arrays under the clustered fault distribution with α2 = 0.01 have lower yield than the arrays under the random fault distribution. However, increasing the degree of clustering (i.e., decreasing α2 to 0.001) tends to increase S_p and thus enhance array yield. Table IV shows that S_p increases as α2 decreases; that is, once the degree of clustering is increased beyond a certain point, arrays with a clustered fault distribution have higher yield than arrays with a random fault distribution. This is why clustered faults have to be considered, and why the inclusion of clustering in redundancy yield calculations is of considerable importance [28].
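The two-stage clustered fault generator described above can be sketched in a few lines. This Python version is an illustration, not the generator of [3]: it assumes north/south/east/west adjacency, uses the conversion probability α1 + α2 · adjacent(i, j), and counts adjacent(i, j) on the stage-1 pattern ("based on this fault pattern"). The function names are hypothetical, and the wraparound neighbour rule assumes arrays of at least 3 × 3 PEs so that no neighbour is counted twice.

```python
import random

def adjacent_faulty(faults, i, j, m, n):
    """adjacent(i, j): faulty north/south/east/west neighbours of PE (i, j),
    with wraparound at the array boundaries (assumes m, n >= 3)."""
    neighbours = [((i - 1) % m, j), ((i + 1) % m, j),
                  (i, (j - 1) % n), (i, (j + 1) % n)]
    return sum(1 for p in neighbours if p in faults)

def clustered_faults(m, n, alpha1, alpha2, seed=0):
    """Two-stage clustered fault pattern for an m x n array of PEs."""
    rng = random.Random(seed)
    # Stage 1: every PE is faulty independently with probability alpha1.
    initial = {(i, j) for i in range(m) for j in range(n) if rng.random() < alpha1}
    faults = set(initial)
    # Stage 2: a nonfaulty PE becomes faulty with probability
    # alpha1 + alpha2 * adjacent(i, j), counted on the stage-1 pattern.
    for i in range(m):
        for j in range(n):
            if (i, j) not in initial:
                p = alpha1 + alpha2 * adjacent_faulty(initial, i, j, m, n)
                if rng.random() < p:
                    faults.add((i, j))
    return faults

if __name__ == "__main__":
    pattern = clustered_faults(22, 22, alpha1=0.001, alpha2=0.01, seed=1)
    print(len(pattern), sorted(pattern))
```

Counting neighbours on the frozen stage-1 pattern keeps the second stage order-independent; updating the pattern in place would instead let newly converted PEs grow clusters further, which is a different model.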

To illustrate the effect of the numbers of link tracks, and of clustered as well as random faults, on array yield when using the reconfiguration algorithms RR, BR, and BB, the host arrays (22, 22, 1, 1, 1, 20, 20), (22, 22, 1, 0, 1, 20, 20), and (22, 22, 0, 0, 1, 20, 20), respectively, are simulated. Table V shows the simulation results, where α1 is set to 0.001. A more complex switching mechanism results in larger S_p, L_A, and MIL, as demonstrated in Table V. Under the clustered fault distribution, if the degree of clustering is increased, the arrays for algorithms RR and BR have higher yield, but the arrays for algorithm BB have lower yield. Note that for the arrays using algorithm BB, S_p and L_A stay the same even if α2 decreases. Algorithms RR and BR therefore perform better for arrays with clustered faults. Algorithm BB has a smaller overall yield because it has no link tracks to provide row or column rerouting capability. One surprising result is that the arrays for algorithm BR have a yield comparable to that of the arrays for algorithm RR, although algorithm RR is more flexible. Arrays for algorithm RR have a slightly higher S_p, but this is offset by their larger area, owing to more switches than in the arrays for algorithm BR. This shows that more switches are not necessarily better, since the extra switches increase the area without increasing the size of the target array. Therefore, for certain applications, one may prefer the redundancy strategy of algorithm BR, which has smaller L_A, MIL, O_h, and O_a. Fig. 12 illustrates the reliability with respect to time for various reconfigurable arrays as well as for the non-defect-tolerant array under the clustered fault distribution (α1 = α2 = 0.001). The PE failure rate is assumed to be λ_PE = 0.1 failures per unit time. Note that the array (22, 21, 1, 1, 1, 20, 20) has only a slightly higher reliability than the array (21, 21, 1, 1, 1, 20, 20), although the former has an extra row. Without redundancy, however, the reliability of the non-defect-tolerant array falls off rapidly. The above simulation results demonstrate that by evaluating the figures of merit through simulation, designers can choose a better combination of redundancy scheme, switching mechanism, and diagnosis and reconfiguration algorithms to design a reconfigurable array based on actual manufacturing yield data.

Fig. 12. Reliability evaluation (R_A versus time, with time on a log10 scale).
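Equation (16) and the non-defect-tolerant reliability R_o(t) = R_PE(t)^{UV} can be evaluated directly once S_p(k) is known. The Python sketch below is illustrative: the function names are hypothetical, and the perfect-reconfiguration choice S_p(k) = 1 in the example is an upper bound chosen for demonstration, not a simulation result. The failure rate λ_PE = 0.1 failures per unit time matches the assumption stated for Fig. 12; estimated S_p(k) values would be substituted for the lambda in practice.

```python
import math

def pe_reliability(t, failure_rate):
    """R_PE(t) = exp(-lambda_PE * t) for exponentially distributed PE lifetimes."""
    return math.exp(-failure_rate * t)

def array_reliability(t, failure_rate, m, n, u, v, sp):
    """R_A(t), Eq. (16): sum over k = 0..S_PE of the binomial probability of
    exactly k failed PEs among MN, weighted by the survival probability sp(k)."""
    r = pe_reliability(t, failure_rate)
    total_pes = m * n
    spares = total_pes - u * v  # S_PE = MN - UV
    return sum(math.comb(total_pes, k) * r ** (total_pes - k) * (1.0 - r) ** k * sp(k)
               for k in range(spares + 1))

def baseline_reliability(t, failure_rate, u, v):
    """R_o(t) = R_PE(t)^(UV): every PE of a U x V array must survive."""
    return pe_reliability(t, failure_rate) ** (u * v)

if __name__ == "__main__":
    perfect = lambda k: 1.0  # hypothetical upper bound on S_p(k)
    for log_t in (-3, -2, -1, 0):
        t = 10.0 ** log_t
        print(log_t,
              round(array_reliability(t, 0.1, 22, 22, 20, 20, perfect), 4),
              round(baseline_reliability(t, 0.1, 20, 20), 4))
```

The direct sum with `math.comb` is adequate for the array sizes discussed here; for much larger arrays the binomial terms would be better computed in log space to avoid float overflow.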

VI. CONCLUSIONS

The integrated high-level CAD environment VAR for the design, diagnosis, reconfiguration, simulation, and evaluation of defect-tolerant VLSI or WSI array architectures has been presented. We have concentrated on the modeling, simulation, and evaluation processes for a defect-tolerant two-dimensional array in the VAR system. The simulation of the reconfiguration process in VAR is implemented by interfacing the reconfiguration program, the transformation program, and the VHDL description of the array. Extensive simulations have been performed, and the experimental results demonstrate the effectiveness of our approach. The VAR system can help designers evaluate different redundancy strategies, various fault diagnosis and reconfiguration algorithms, the quality of target arrays, yield, and reliability. Future research issues include 1) simulation and evaluation of the fault diagnosis process; 2) optimization in terms of architectural designs, fault diagnosis techniques, and reconfiguration techniques; 3) an interface to a VHDL synthesis system for layout generation; and 4) extension to other reconfigurable parallel architectures.

REFERENCES

[1] N. Jarwala and Y. Zorian, "A methodology for the design of fault-tolerant, testable, wafer-scale processor arrays using boundary scan architecture," in Wafer Scale Integration III, M. Sami and F. Distante, Eds. Amsterdam: Elsevier, 1990, pp. 73-86.
[2] J. H. Kim and S. M. Reddy, "On the design of fault-tolerant two-dimensional systolic arrays for yield enhancement," IEEE Trans. Comput., vol. 38, pp. 515-525, Apr. 1989.
[3] S. Y. Kung, S. N. Jean, and C. W. Chang, "Fault-tolerant array processors using single-track switches," IEEE Trans. Comput., vol. 38, pp. 501-514, Apr. 1989.
[4] F. Lombardi, M. G. Sami, and R. Stefanelli, "Reconfiguration of VLSI arrays by covering," IEEE Trans. Computer-Aided Design, vol. 8, pp. 952-965, Sept. 1989.
[5] R. E. Bryant, "Panel: CAD tool needs for system designers," in Proc. 25th Design Automation Conf., June 1988, p. 472.
[6] F. Distante and V. Piuri, "APES: An integrated system for behavioral design, simulation and evaluation of array processors," in Proc. Int. Conf. Computer Design, Oct. 1988, pp. 568-572.
[7] W. K. Fuchs and S. Y. Kuo, "Spare allocation/reconfiguration for WSI," in Wafer Scale Integration, E. E. Swartzlander, Jr., Ed. Norwell, MA: Kluwer, 1989, pp. 171-185.
[8] D. Notkin et al., "Experiences with Poker," SIGPLAN Notices, vol. 23, pp. 10-20, Sept. 1988.
[9] G. Frank, D. Franke, and W. Ingogly, "An architecture design and assessment system," VLSI Design, pp. 30-50, Aug. 1985.
[10] P. Gachet, P. Quinton, and P. Frison, "DIASTOL: A systolic design tool with multiple hardware design style capabilities," in Wafer Scale Integration, G. Saucier and J. Trilhe, Eds. Amsterdam: Elsevier, 1986, pp. 237-251.
[11] A. M. Dewey and S. W. Director, "Yoda: A framework for the conceptual design of VLSI systems," in Proc. Int. Conf. Computer-Aided Design, Nov. 1989, pp. 380-383.
[12] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[13] S. Y. Kuo and K. C. Wang, "Fault diagnosis in reconfigurable VLSI and WSI processor arrays," J. VLSI Signal Processing, vol. 2, pp. 169-183, 1990.
[14] S. Y. Kuo and I. Y. Chen, "Efficient reconfiguration algorithms for degradable VLSI/WSI arrays," in Proc. Int. Conf. Wafer Scale Integration, Jan. 1991, pp. 120-126.
[15] L. Snyder, "Introduction to the configurable, highly parallel computers," IEEE Computer, vol. 15, pp. 47-56, Jan. 1982.
[16] D. K. Pradhan, "Fault-tolerant multiprocessor and VLSI-based system communication architectures," in Fault-Tolerant Computing: Theory and Techniques, vol. II, D. K. Pradhan, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1986, pp. 547-567.
[17] R. Lipsett, C. F. Schaefer, and C. Ussery, VHDL: Hardware Description and Design. Norwell, MA: Kluwer, 1989.
[18] D. R. Coelho, The VHDL Handbook. Norwell, MA: Kluwer, 1989.
[19] F. J. Hill and G. R. Peterson, Digital Systems: Hardware Organization and Design, 3rd ed. New York: Wiley, 1987.
[20] G. Chevalier and G. Saucier, "A programmable switch matrix for the wafer scale integration of a processor array," in Proc. Workshop Wafer Scale Integration, 1986, pp. 92-100.
[21] K. S. Hedlund and L. Snyder, "Systolic architectures - A wafer scale approach," in Proc. IEEE Int. Conf. Computer Design: VLSI in Computers, Oct. 1984, pp. 604-610.
[22] H. T. Kung and M. S. Lam, "Wafer-scale integration and two-level pipelined implementations of systolic arrays," J. Parallel and Distributed Computing, pp. 32-63, 1984.
[23] V. S. S. Nair and J. A. Abraham, "A model for the analysis, design and comparison of fault-tolerant WSI architectures," in Wafer Scale Integration III, M. Sami and F. Distante, Eds. Amsterdam: Elsevier, 1990, pp. 99-108.
[24] M. Wesseling and U. Ramacher, "Evaluation and comparison of selected WSI reconfiguration architectures in terms of yield and yield per area," in Wafer Scale Integration III, M. Sami and F. Distante, Eds. Amsterdam: Elsevier, 1990, pp. 87-98.
[25] H. S. Bolouri and R. M. Lea, "Evaluation of the cost-effectiveness of fault tolerant strategies for wafer-scale integration," in Wafer Scale Integration II, R. M. Lea, Ed. Amsterdam: Elsevier, 1988, pp. 125-133.
[26] S. M. Rubin, Computer Aids for VLSI Design. Reading, MA: Addison-Wesley, 1987.
[27] C. H. Stapper, F. M. Armstrong, and K. Saji, "Integrated circuit yield statistics," Proc. IEEE, vol. 71, pp. 453-470, Apr. 1983.
[28] I. Koren and C. H. Stapper, "Yield models for defect-tolerant VLSI circuits: A review," in Defect and Fault Tolerance in VLSI Systems, vol. 1, I. Koren, Ed. New York: Plenum, 1989, pp. 1-21.
[29] C. H. Stapper, "Block alignment: A method for increasing the yield of memory chips that are partially good," in Defect and Fault Tolerance in VLSI Systems, vol. 1, I. Koren, Ed. New York: Plenum, 1989, pp. 243-255.
[30] C. H. Stapper, "Simulation of spatial fault distributions for integrated circuit yield estimations," IEEE Trans. Computer-Aided Design, vol. 8, pp. 1314-1318, Dec. 1989.

Kuochen Wang received the B.S. degree in control engineering from National Chiao Tung University, Taiwan, in 1978 and the M.S. and Ph.D. degrees in electrical engineering from the University of Arizona in 1986 and 1991, respectively.

He is currently an Associate Professor in the Department of Computer and Information Science, National Chiao Tung University, Taiwan. From 1980 to 1984, he worked on network management and on the design and implementation of the Toll Trunk Information System at the Directorate General of Telecommunications in Taiwan. He served in the army as a second lieutenant communication platoon leader from 1978 to 1980. His research interests include testing and computer-aided design of VLSI/WSI systems, fault-tolerant computing, parallel processing, and multiprocessor architectures.

Sy-Yen Kuo (S’SS-M’SS) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1979, the M.S. degree in electrical and computer engineering from the University of California at Santa Barbara in 1982, and the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign in 1987.

He is currently an Associate Professor in the Department of Electrical Engineering, National Taiwan University. He was a faculty member in the Department of Electrical and Computer Engineering at the University of Arizona from 1988 to 1991 and was an engineer at Fairchild Semiconductor and Silvar-Lisco from 1982 to 1984. In 1989, he also worked as a summer faculty fellow at the Jet Propulsion Laboratory of the California Institute of Technology. His current research interests include fault-tolerant computing, parallel processing, and computer-aided design and testing of VLSI systems.

Dr. Kuo was the recipient of the Best Paper Award in the simulation and test category at the 1986 IEEE/ACM Design Automation Conference (DAC), a National Science Foundation Research Initiation Award in 1989, and an IEEE/ACM Design Automation Scholarship in 1990 and 1991.
