PARCIS: a robust parallel VLSI circuit simulator

P. Linardis *, I. Vlahavas 1

Department of Informatics, Aristotle University of Thessaloniki, 54006 Thessaloniki, Greece

Received 1 May 1998; received in revised form 2 November 1998

Abstract

The accurate verification of VLSI circuits is essential for their successful and economic production but is an extremely time consuming process for the large circuits of today. This paper describes a robust parallel circuit simulator, PARCIS, designed for a message passing multiprocessing system. It uses a demand driven technique, based on the analysis of hierarchically partitioned circuits. The computation time is reduced by decoupling the circuit equations and distributing the computational load over many processors. On each processor, the circuit blocks, compacted in hierarchical levels, are analyzed asynchronously according to their temporal activity. Currently the PARCIS system is running on a network of transputers. To demonstrate the effectiveness of the proposed simulation program, results are presented for the simulation of typical digital circuits, showing that the execution time decreases at a constant rate as the number of processors (transputers) increases. © 1999 Published by Elsevier Science B.V. All rights reserved.

Keywords: VLSI; CAD; Parallel; Simulator; Circuit

1. Introduction

As the integrated circuit fabrication technology advances, the complexity of VLSI circuits increases. The accurate verification of such circuits is essential for their successful and economic production but is an extremely time consuming procedure.

In the past, a number of simulators have been developed for this purpose. Logic simulators, with their fast responses, are valuable for debugging the higher levels of a design. Still, there are many cases where the information delivered by a circuit level (transient) simulator is indispensable. For example, only transient simulators can predict with accuracy the time domain responses of signals in: (a) critical paths within VLSIs required to meet today's demands for high clock rates, (b) fast bipolar chips used for the manufacture of high speed computers, and (c) modern mixed analog–digital circuits. However, simulation methods as used in SPICE [14] are limited in the size of circuits that can be simulated because of the large computation time they require.

Simulation Practice and Theory 7 (1999) 91–103

* Corresponding author. Fax: +30 31 998419; e-mail: [email protected]
1 E-mail: [email protected]

0928-4869/99/$ – see front matter © 1999 Published by Elsevier Science B.V. All rights reserved.

PII: S0928-4869(98)00020-2

To overcome the problem of computation time a series of improvements have been added to circuit simulators. It is recognized that the elements of practical circuits are not fully interconnected [9] and this led, in the past, to the development of Sparse Matrix Techniques [3,7] with superlinear time complexity and further into Block Diagonal Matrix Techniques [16]. Techniques for speeding up the convergence of the nonlinear circuit equations have been proposed [15] as well as techniques supporting a hierarchical approach to the analysis of the circuit [12,13,5,2,19].

Latency exploitation, i.e. the exploitation of temporal inactivity [11,18,12,13,25,20], was another improvement. The stiffness [3] of the circuit differential equations has the effect that, during intervals of the simulation time, certain parts (blocks) of a large digital circuit remain inactive and need not be recomputed for these time points, thus reducing the computational load on the CPU.

Exploitation of the particular characteristics of special circuits like MOS VLSI has led to further improvements in computation time. The strong diagonal dominance [16] of the matrix equations of MOS circuits allows these simulators to employ faster equation solution methods [10,6,22], but at the cost of limiting either the accuracy of the results or the scope of the circuit categories that can be validly simulated.

The only way to further speed up the simulation process without sacrificing the solution accuracy or the range of candidate circuits is either to use a faster algorithm or to use higher performance computers [17]. This paper describes a robust Parallel Circuit Simulator, PARCIS, designed for a message passing multiprocessing system. It uses a demand driven technique, based on the analysis of hierarchically partitioned circuits. This was made possible by developing a method that decouples the circuit equations, without losing accuracy, into hierarchically partitioned blocks and that further allows the compression of block data. By distributing the computational load and by exploiting the temporal block inactivity (latency) it becomes feasible to get accurate circuit responses in reasonable time. It is noted that PARCIS is:

(a) a robust simulator, in the sense that it uses a direct solution method for the circuit equations so that its speed or accuracy are not hindered by the circuit category or the degree of coupling between sub-circuits, and
(b) a fully electrical simulator, because it can simulate all primitive electrical devices like inductors, capacitors, transistors etc.

The rest of the paper is organized as follows. Section 2 is dedicated to the theoretical aspects of partitioning and computing the circuit equations. In Section 3, the structure of this simulator is explained briefly. In Section 4, implementation issues and performance results obtained by running various circuit examples in a transputer system are presented. Section 5 concludes the paper and outlines the future directions.


2. The PARCIS computation method

VLSI circuits, and large digital systems in general, consist of numerous modules interconnected to perform certain logic functions. In order to simulate a system the behavior of each module must be described at a convenient level of abstraction. In this work, because of the detail required, we concentrate on the lowest level behavior of the system, i.e. the electrical behavior. By treating the modules as electrical objects we try to efficiently compute the waveforms of the voltages v(t) of the corresponding circuit.

In the following we describe our approach for circuit partitioning, the definition and computation of the circuit equations and exploitation of latency.

2.1. Circuit partitioning

One of the usual approaches to fast circuit simulation is to formulate the circuit equations and then solve the system by partitioning them into small blocks [11,13]. In the present work, instead of tearing a given system apart, the synthesis of the system made up from a hierarchy of modules is emphasized. This top down approach is more beneficial than the tearing method because it permits the hierarchical formulation and computation of a circuit, once the rules for interconnecting electrical modules have been established. It also permits the extension of the simulator to include modules with different levels of description (e.g. functional) or to mix different descriptions of a module into the same run.

As a module we consider here an object consisting of a number of components which may be modules themselves, called child modules, and/or primitive elements like transistors, diodes, resistors, inductors, capacitors, etc. This hierarchical modular structure of the system may be represented by a tree (Fig. 1). Each tree node corresponds to a module. The root of the tree is termed module main (level 0 module or $M_0$) and the leaf modules consist solely of primitive elements.

Fig. 1. Hierarchy of modules.


A module $M_k$ communicates with its parent $M_{kp}$ through electrical signals, voltages $v$ and currents $i$, appearing on circuit nodes which are common to both the parent module and the child. These connecting nodes are called boundary (external) nodes of the child. The module may also contain nodes not common with any other module; these are called internal nodes.
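The module hierarchy described above maps naturally onto a tree data structure. The following Python sketch is illustrative only: the class name `Module`, its field names and the tiny two-gate example are assumptions of this sketch, not part of PARCIS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """One node of the module tree (hypothetical layout, for illustration)."""
    name: str
    boundary_nodes: List[str] = field(default_factory=list)  # shared with the parent
    internal_nodes: List[str] = field(default_factory=list)  # private to this module
    primitives: List[str] = field(default_factory=list)      # e.g. transistors, resistors
    children: List["Module"] = field(default_factory=list)   # child modules

    def is_leaf(self) -> bool:
        # Leaf modules consist solely of primitive elements.
        return not self.children


def depth(module: Module) -> int:
    """Number of levels below this module (0 for a leaf)."""
    return 0 if module.is_leaf() else 1 + max(depth(c) for c in module.children)


def count_primitives(module: Module) -> int:
    """Total primitive elements in the subtree rooted at this module."""
    return len(module.primitives) + sum(count_primitives(c) for c in module.children)


def make_gate(index: int) -> Module:
    # A made-up leaf module: three boundary nodes, one internal node.
    return Module(f"gate{index}", ["in1", "in2", "out"], ["n1"], ["Q1", "Q2", "R1"])


# Root module "main" (M0): it has no boundary nodes, only internal ones.
main = Module("main", [], ["a", "b", "y", "z"], ["Vcc"], [make_gate(0), make_gate(1)])
print(depth(main), count_primitives(main))
```

Keeping boundary and internal nodes in separate lists mirrors the partition of the node voltages into external and internal parts used in the module equations.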

2.2. Module equations

In electrical terms a module is a composite nonlinear dynamic multiport element. By definition, it communicates with its parent via the set $n_e$ of boundary (external) nodes and it is subject to no external influences other than the set of currents $I_e$ entering these nodes from the parent module and the set of common node voltages $V_e$. It also contains a set $n_i$ of internal nodes.

The behavior of a module $M_k$ may be described by a set of nonlinear differential equations [11,13]:

$$f_e(v_e, \dot{v}_e, v_i, \dot{v}_i, t) = I_e, \tag{1a}$$
$$f_i(v_e, \dot{v}_e, v_i, \dot{v}_i, t) = 0, \tag{1b}$$

where $v_e$, $v_i$ are the vectors of voltages at the boundary and internal nodes, $\dot{v}_e$, $\dot{v}_i$ their time derivatives, $I_e$ the vector of boundary currents and $t$ is the time.

The Nodal Equation formulation method [3,4] adopted in PARCIS (Eqs. (1a) and (1b)), instead of the more general Modified Nodal Approach [8] or the Tableau Approach [7], does not limit its modeling capabilities. PARCIS may for example simulate inductors, and a model of a pure inductor is already included. It also contains a modeling extension in the form of a hypothetical electrical device called a "gyrator" [4]. This device permits circuit currents to appear in Eqs. (1a) and (1b) disguised as voltage variables, allowing the user to build models of devices which have inductive behavior like transformers and electrical motors. In this sense PARCIS may simulate complete electrical systems, not only devices integrated into a chip.

Integration. The differential equations (1) are discretized in time [3] by applying numerical integration formulas of the form:

$$\dot{x}_{t+1} = a_{-1} x_{t+1} + \sum_{i=0}^{k} a_i x_{t-i} + h \sum_{i=0}^{l} b_i \dot{x}_{t-i}, \tag{2}$$

where $a_i$, $b_i$ are constants and $h$ is the integration step size.

The differential equations of a circuit are notorious for their stiffness, i.e. for the wide spread of their time constants [3]. This characteristic imposes severe problems on the stability of numerical integration methods and restricts the size of the integration step. For this reason Gear's [23] implicit ($a_{-1} \neq 0$) second order formula was selected for this simulator in order to secure conditions of A-stability [3] for the integration process.
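To illustrate why an implicit, A-stable formula matters for stiff equations, the sketch below applies the standard fixed-step second order Gear formula (BDF2), $\dot{x}_{n+1} = (3x_{n+1} - 4x_n + x_{n-1})/(2h)$, to the test equation $\dot{x} = -\lambda x$ and compares it with explicit forward Euler. The test equation, the step sizes, the backward Euler start-up step and the function names are assumptions of this sketch; the step-size control used in PARCIS is not described in the text and is not modeled here.

```python
import math


def bdf2_decay(lam: float, h: float, steps: int, x0: float = 1.0):
    """Integrate x' = -lam * x with fixed-step Gear 2 (BDF2); returns steps + 1 values."""
    # Start-up: one backward Euler step supplies the second history point (an assumption).
    xs = [x0, x0 / (1.0 + h * lam)]
    for _ in range(steps - 1):
        x_n, x_nm1 = xs[-1], xs[-2]
        # BDF2: (3 x_{n+1} - 4 x_n + x_{n-1}) / (2 h) = -lam * x_{n+1}, solved for x_{n+1}
        xs.append((4.0 * x_n - x_nm1) / (3.0 + 2.0 * h * lam))
    return xs


def forward_euler_decay(lam: float, h: float, steps: int, x0: float = 1.0):
    """Explicit Euler on the same problem, for comparison only."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 - h * lam))
    return xs


# Stiff case: the step is 1000 times larger than the time constant 1 / lam.
stiff_bdf2 = bdf2_decay(lam=1e6, h=1e-3, steps=50)
stiff_euler = forward_euler_decay(lam=1e6, h=1e-3, steps=50)
print(abs(stiff_bdf2[-1]), abs(stiff_euler[-1]))

# Non-stiff case: check the accuracy against the exact solution exp(-t) at t = 1.
smooth = bdf2_decay(lam=1.0, h=0.01, steps=100)
print(abs(smooth[-1] - math.exp(-1.0)))
```

With the large step the implicit formula still decays towards zero while forward Euler grows without bound, which is the behavior A-stability is meant to guarantee.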

Linearization. Eqs. (1a) and (1b) of a module $M_k$, after discretization at time $t_n$, which corresponds to the time moment $t+1$ of Eq. (2), become a set of nonlinear algebraic equations of the form:

$$F_e(v_e, v_i) = I_e, \tag{3a}$$
$$F_i(v_e, v_i) = 0. \tag{3b}$$

In Eqs. (3a) and (3b) all time derivatives have been substituted by formulas (2) and the only values left to be computed are those of the vector $v = \mathrm{col}(v_e, v_i)$ at time $t_n$. Here, for simplicity, the time index has been dropped.

Linearization of Eqs. (3a) and (3b) around a point $v^0 = \mathrm{col}(v_e^0, v_i^0)$ gives the matrix equations [11,13]:

$$G_{ee} v_e + G_{ei} v_i = c_e + I_e, \tag{4a}$$
$$G_{ie} v_e + G_{ii} v_i = c_i, \tag{4b}$$

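where $c_e$, $c_i$ are vectors of constants and $G_{ee}$, $G_{ei}$, $G_{ie}$, $G_{ii}$ are the Jacobian matrices:

$$G_{ee} = \frac{\partial F_e}{\partial v_e}, \quad G_{ei} = \frac{\partial F_e}{\partial v_i}, \quad G_{ie} = \frac{\partial F_i}{\partial v_e}, \quad G_{ii} = \frac{\partial F_i}{\partial v_i}.$$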

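Eqs. (4a) and (4b) may be written in more compact form:

$$\begin{bmatrix} G_{ee} & G_{ei} \\ G_{ie} & G_{ii} \end{bmatrix} \begin{bmatrix} v_e \\ v_i \end{bmatrix} = \begin{bmatrix} c_e \\ c_i \end{bmatrix} + \begin{bmatrix} I_e \\ 0 \end{bmatrix}. \tag{5}$$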

Eq. (5) is transformed into the equivalent pair of matrix equations:

$$I_e = [G_{ee} - G_{ei} G_{ii}^{-1} G_{ie}]\, v_e + [G_{ei} G_{ii}^{-1} c_i - c_e], \tag{6a}$$
$$v_i = [G_{ii}^{-1} c_i] - [G_{ii}^{-1} G_{ie}]\, v_e. \tag{6b}$$

This transformation defines a module $\hat{M}_k$ equivalent to the linearized module $M_k$ defined by Eq. (5). Module $\hat{M}_k$ may be considered as a linear multiport element connected to the parent module $M_{kp}$ and defined by the smaller set of Eq. (6a). It is important to note that in the present formulation this connection, as will be explained, is effected by a simple addition of the matrix coefficients of Eq. (6a) to the contents of the corresponding coefficient places of Eq. (5) of the linearized parent module $\hat{M}_{kp}$.
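The step from Eq. (5) to Eqs. (6a) and (6b) is a block elimination of the internal voltages. It can be spelled out as the worked sketch below, which assumes $G_{ii}$ is nonsingular; the two-conductance example at the end is an illustrative assumption, not taken from the paper.

```latex
% Solve the second block row of Eq. (5) for v_i (G_ii assumed nonsingular):
G_{ie} v_e + G_{ii} v_i = c_i
\;\Longrightarrow\;
v_i = G_{ii}^{-1} c_i - G_{ii}^{-1} G_{ie}\, v_e .

% Substitute into the first block row, G_ee v_e + G_ei v_i = c_e + I_e:
I_e = G_{ee} v_e + G_{ei}\bigl(G_{ii}^{-1} c_i - G_{ii}^{-1} G_{ie}\, v_e\bigr) - c_e
    = \bigl[G_{ee} - G_{ei} G_{ii}^{-1} G_{ie}\bigr] v_e
      + \bigl[G_{ei} G_{ii}^{-1} c_i - c_e\bigr].

% Example: two 2 S conductances in series between boundary nodes 1 and 2,
% joined at internal node 3, with no sources (c_e = 0, c_i = 0):
G_{ee} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix},\quad
G_{ei} = \begin{bmatrix} -2 \\ -2 \end{bmatrix},\quad
G_{ie} = \begin{bmatrix} -2 & -2 \end{bmatrix},\quad
G_{ii} = \begin{bmatrix} 4 \end{bmatrix}
\;\Longrightarrow\;
G_{ee} - G_{ei} G_{ii}^{-1} G_{ie}
  = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.
```

The reduced matrix in the example is the nodal description of a single 1 S conductance between the two boundary nodes, as expected for two 2 S conductances in series.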

Let $n_e$ be the set of boundary nodes of $M_k$, which is also a subset of the nodes of $M_{kp}$, and let $g_{mn}$ and $i_m$ be the coefficients of the matrix and constant vector of Eq. (6a), i.e.

$$[g_{mn}] = [G_{ee} - G_{ei} G_{ii}^{-1} G_{ie}], \quad [i_m] = [G_{ei} G_{ii}^{-1} c_i - c_e], \quad m, n \in n_e. \tag{7}$$

The set of the $g_{mn}$ and $i_m$ values for a module or primitive element is called its stamp. Since $g_{mn}$ and $i_m$ refer only to nodes which are common to the parent module $M_{kp}$, Eq. (6a) designates that the currents $I_e$ leaving $M_{kp}$ may be accounted for by the superposition on the common nodes $n_e$ of $M_{kp}$ of: (a) the constant currents $i_m$ and (b) the currents induced by the conductances $g_{mn}$. Let $\aleph_{kp}$ be the mapping of the boundary nodes $n_{ek}$ of a module $M_k$ into the nodes $n_{kp}$ of $M_{kp}$. The superposition is effected by the numerical addition of parameters $g_{mn}$ and $i_m$ of $\hat{M}_k$ into the places $(\aleph_{kp}(m), \aleph_{kp}(n))$ of the matrix and $\aleph_{kp}(m)$ of the vector $c = \mathrm{col}(c_e, c_i)$ of Eq. (5) belonging to $M_{kp}$.

When the contributions from all children $\hat{M}_k$ have been included into the parent module $M_{kp}$, a set of matrix equations of the form (5) is produced for $M_{kp}$. Then Eq. (5) is transformed into the form of Eqs. (6a) and (6b), forming the equations of $\hat{M}_{kp}$.

The above process is repeated for the parent of each module until, finally, the last parent is reached, i.e. the root module $M_0$. Module $M_0$, being the root of the tree, is neither influenced by external currents $I_e$ ($I_e = 0$) nor does it contain any set of boundary nodes $n_e$. Its corresponding form (5) is reduced to

$$G_{ii} v_i = c_i, \tag{8}$$

from which $v_i$ is computed.

Starting from the computed values $v_i$ of module $M_0$, the inverse process of updating the internal values of the modules (backsubstitution) is carried out.

Let $v_{kp}$ be the computed values for module $M_{kp}$ and let $v_{ek}$ be the subset corresponding to the boundary nodes of its child $M_k$. Then the internal parameters $v_{ik}$ of module $M_k$, for which $v_k = \mathrm{col}(v_{ek}, v_{ik})$, are computed from Eq. (6b) corresponding to $M_k$:

$$v_{ik} = [G_{iik}^{-1} c_{ik}] - [G_{iik}^{-1} G_{iek}]\, v_{ek}. \tag{9}$$

The process of backsubstitution is continued until the internal nodes of all leaf modules are evaluated. Note that Eq. (9) may be generalized to include module $M_0$, considering that $v_e = 0$.

The process of linearizing Eqs. (3a) and (3b) and solving the resulting linear system of equations, that is forming Eqs. (4a), (4b), (5), (6a), (6b) and (7)–(9), forms a so-called "Newton" iteration [3]. By repeatedly applying this process to the whole circuit, the system of nonlinear equations (3) is solved.
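As a minimal illustration of this linearize-and-solve loop, the sketch below applies Newton iteration to a single-node circuit: a voltage source in series with a resistor, driving a diode. The circuit, the diode parameters, the starting point and the convergence tolerance are all assumptions chosen for illustration; in PARCIS the linear solve inside each iteration is the hierarchical compaction and backsubstitution described above rather than a scalar division.

```python
import math

VS, R = 5.0, 1000.0        # source voltage (V) and series resistance (ohm): assumed values
IS, VT = 1e-14, 0.02585    # diode saturation current (A) and thermal voltage (V): assumed


def residual(v: float) -> float:
    """Nodal equation F(v) = 0: resistor current plus diode current leaving the node."""
    return (v - VS) / R + IS * (math.exp(v / VT) - 1.0)


def conductance(v: float) -> float:
    """Jacobian dF/dv, i.e. the linearized conductance seen at the node."""
    return 1.0 / R + (IS / VT) * math.exp(v / VT)


def newton(v0: float, tol: float = 1e-12, max_iter: int = 100):
    """Repeat: linearize around v, solve the linear equation, update v."""
    v = v0
    for iteration in range(1, max_iter + 1):
        dv = -residual(v) / conductance(v)   # solve G * dv = -F
        v += dv
        if abs(dv) < tol:
            return v, iteration
    raise RuntimeError("Newton iteration did not converge")


v_diode, n_iter = newton(v0=0.8)
print(f"v = {v_diode:.4f} V after {n_iter} iterations")
```

The starting point is chosen above the solution on purpose: for this convex, increasing residual the iterates then approach the root monotonically, which avoids the large overshoot that a low starting guess would produce.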

2.3. Latency

The modular formulation of the circuit is very suitable for the exploitation of circuit latency. The concept of latency [11–13,18,25,20] may be explained as follows: let the vector $v_k$ represent an instance of the variables designating the response of a set $S$ of certain circuit components and let $v_{k+1}$ represent the next instance of these variables, where "next" is explained later. Then $\|v_k - v_{k+1}\|$, where $\|\cdot\|$ is some vector norm, may be considered as a measure of the activity of this set of components. Whenever $\|v_k - v_{k+1}\| < \varepsilon$, where $\varepsilon$ is a small number, the components in $S$ are considered latent. When $S$ covers all the components of a module then this module is termed latent. In this work, two types of latency are distinguished for a module [13]:

· The "physical", with respect to time, or "horizontal" latency, meaning that the state of this module is not changing for a certain time interval or that the change is so small that it may be ignored.

· The "Newton" or "vertical" latency, meaning that Eqs. (4a) and (4b) of a module, the linearized version of Eqs. (3a) and (3b), do not change inside the loop of Newton iterations as used for solving the nonlinear equations of the whole circuit.

In the case of Newton latency the behavior of a latent module may be sufficiently represented with the linear model of the compact set of Eqs. (6a) and (6b), thus reducing the computational complexity of the circuit. When the nonlinear equations of the whole circuit need to be solved (through Newton iterations) the matrix and constant vector coefficients $g_{mn}$ and $i_m$ (7) of the latent module are not recomputed but the previous set of them is substituted in the corresponding places of the parent module.

Latency, as applied here, is primarily an algorithmic characteristic. However,physical latency implies Newton latency as well. Thus physical latency, an observablecircuit characteristic, is usually expected to lead to algorithmic latency.
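The latency test and the reuse of a latent module's stamp might be sketched as follows. The choice of the infinity norm, the threshold value and the names `is_latent` and `StampCache` are assumptions of this sketch; the text only requires some vector norm and a small threshold. The stamp computation is passed in as a callable so that the sketch stays independent of any particular device model.

```python
from typing import Callable, List, Optional, Sequence


def is_latent(v_prev: Sequence[float], v_next: Sequence[float], eps: float) -> bool:
    """True when ||v_prev - v_next|| < eps, using the infinity norm (an assumption)."""
    return max((abs(a - b) for a, b in zip(v_prev, v_next)), default=0.0) < eps


class StampCache:
    """Recompute a module's stamp only when the module is not latent."""

    def __init__(self, compute_stamp: Callable[[Sequence[float]], object],
                 eps: float = 1e-6):
        self.compute_stamp = compute_stamp
        self.eps = eps
        self.prev_v: Optional[List[float]] = None
        self.stamp = None
        self.recomputations = 0

    def get(self, v: Sequence[float]):
        latent = self.prev_v is not None and is_latent(self.prev_v, v, self.eps)
        if not latent:
            self.stamp = self.compute_stamp(v)   # active module: rebuild the stamp
            self.recomputations += 1
        self.prev_v = list(v)                    # compare consecutive instances
        return self.stamp


# Toy "stamp": just the sum of the node voltages, standing in for Eq. (7).
cache = StampCache(compute_stamp=sum)
for v in ([0.0, 5.0], [0.0, 5.0], [0.0, 5.0 + 1e-9], [2.5, 5.0], [2.5, 5.0]):
    cache.get(v)
print(cache.recomputations)
```

Only the first instance and the one where a node voltage actually moves trigger a recomputation; for the other three the cached stamp is handed back to the parent unchanged.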

3. The PARCIS system

The proposed PARCIS computational model is designed for a multiprocessor system. The PARCIS abstract system comprises two types of Processing Elements (PEs): the master PE (named PE0) and a number of slave PEs (Fig. 2). Emphasis was placed on minimizing the communication delays between the processing elements PE0–PEn. The communication between PEs is asynchronous and the messages are tailored to be as short as possible. Each PE, with a slight modification for PE0 (master), executes the following tasks:

1. Circuit Processing Task: It processes the given circuit modules.
2. Comm. A: It serves the communication from low to high address PEs (message routing, etc.) and also has high priority access to the data lists of the local Circuit Processing Task.
3. Comm. B: It is similar to "Comm. A" but serves the communication in the opposite direction.

Fig. 2. The PARCIS abstract architecture.

The Circuit Processing Task implements the operations of Integration, Linearization and Latency exploitation on the parameters of each circuit module, described in Sections 2.2 and 2.3. Formally the computation of every instant of the circuit may be described recursively by defining the processes Module_Compact and Module_Update:

Module_Compact (module);
{
    Initialize storage space for coefficients of Eq. (5)
    Scan Primitive_Element List:
        { Discretize and linearize element parameters
          Add element stamp into coefficient places of Eq. (5) }
    Scan Child_Module List:
        { IF child_module is NOT latent THEN Module_Compact (child_module)
          Add child_module stamp Eq. (7) into coefficient places of Eq. (5) }
    Transform Eq. (5) into form (6)
}

Module_Update (module);
{
    Evaluate internal nodes from Eq. (9)
    Scan Child_Module List:
        { Module_Update (child_module) }
}

The algorithm for computing the circuit then becomes:

Initialize_Parameters
Repeat
    Repeat
        Module_Compact ("main")    (* Forward Compaction of Circuit *)
        Module_Update ("main")     (* Backward Updating of Modules *)
    Until (Newton_Convergence)
    Next_Time_Step                 (* Calculate next integration step and time *)
Until (end_of_simulation_time)
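For a purely linear resistive circuit the Newton loop and the time-step loop collapse to a single pass, which makes the two recursive processes easy to try out. The Python sketch below follows the structure of Module_Compact and Module_Update under that simplification. The data layout (`Module`, `n_ext`, `n_int`, node maps as index lists), the dense Gauss-Jordan solver and the example circuit are assumptions of this sketch, and discretization, linearization and the latency test are omitted. The constant part of each stamp is carried as the reduced vector $c_e - G_{ei} G_{ii}^{-1} c_i$ (the negative of $i_m$ in Eq. (7)) so that plain addition into the parent's vector $c$ is consistent with the form $Gv = c + I_e$ of Eq. (5); this sign handling is the sketch's own reading of the convention.

```python
from dataclasses import dataclass, field


def solve(A, B):
    """Solve A X = B (A: n x n, B: n x m) by Gauss-Jordan; A assumed nonsingular."""
    n = len(A)
    M = [[float(x) for x in A[r]] + [float(x) for x in B[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))   # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


@dataclass
class Module:
    n_ext: int                                         # boundary nodes: indices 0 .. n_ext-1
    n_int: int                                         # internal nodes: indices n_ext ..
    conductances: list = field(default_factory=list)   # (node_a, node_b or None=ground, S)
    sources: list = field(default_factory=list)        # (node, amps injected into node)
    children: list = field(default_factory=list)       # (child Module, node_map)
    X: list = None                                     # Gii^-1 Gie, kept for backsubstitution
    y: list = None                                     # Gii^-1 ci
    v: list = None                                     # node voltages after module_update


def module_compact(mod):
    """Build Eq. (5) for mod, absorb child stamps, return the reduced (g, c) pair."""
    n, ne, ni = mod.n_ext + mod.n_int, mod.n_ext, mod.n_int
    G = [[0.0] * n for _ in range(n)]
    c = [0.0] * n
    for a, b, g in mod.conductances:                   # primitive element stamps
        G[a][a] += g
        if b is not None:
            G[b][b] += g
            G[a][b] -= g
            G[b][a] -= g
    for node, amps in mod.sources:
        c[node] += amps
    for child, node_map in mod.children:               # child stamps via the node mapping
        g_child, c_child = module_compact(child)
        for m, pm in enumerate(node_map):
            c[pm] += c_child[m]
            for k, pk in enumerate(node_map):
                G[pm][pk] += g_child[m][k]
    Gii = [row[ne:] for row in G[ne:]]
    Gie = [row[:ne] for row in G[ne:]]
    mod.X = solve(Gii, Gie)
    mod.y = [r[0] for r in solve(Gii, [[ci] for ci in c[ne:]])]
    g_red = [[G[m][k] - sum(G[m][ne + j] * mod.X[j][k] for j in range(ni))
              for k in range(ne)] for m in range(ne)]
    c_red = [c[m] - sum(G[m][ne + j] * mod.y[j] for j in range(ni)) for m in range(ne)]
    return g_red, c_red


def module_update(mod, v_ext):
    """Backsubstitution, Eq. (9): internal voltages from the boundary voltages."""
    v_int = [mod.y[j] - sum(mod.X[j][k] * v_ext[k] for k in range(mod.n_ext))
             for j in range(mod.n_int)]
    mod.v = list(v_ext) + v_int
    for child, node_map in mod.children:
        module_update(child, [mod.v[p] for p in node_map])


# Child: two 2 S conductances in series (boundary nodes 0, 1; internal node 2).
child = Module(n_ext=2, n_int=1, conductances=[(0, 2, 2.0), (2, 1, 2.0)])
# Root: 1 A into node 0, child between nodes 0 and 1, 1 S from node 1 to ground.
main = Module(n_ext=0, n_int=2, conductances=[(1, None, 1.0)],
              sources=[(0, 1.0)], children=[(child, [0, 1])])
module_compact(main)
module_update(main, [])
print(main.v, child.v)
```

In the example the child behaves as a single 1 S conductance, so the 1 A source produces 2 V and 1 V at the two nodes of `main` and 1.5 V at the child's internal node.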

The PARCIS software architecture adapts quite naturally to a distributed multiprocessor system like a transputer system or a network of workstations and it may also be implemented in a common shared memory parallel machine. Currently it has been implemented in Pascal for a network of transputers. The communication modules of Fig. 2 were designed to completely bypass the high level communications offered by the operating system. When the inherent communication facilities of a distributed operating system are used, PARCIS may be easily ported to a multiprocessor system by replacing the Comm. A and Comm. B communication tasks by simple read/write instructions.

4. Implementation and performance results

The PARCIS system is currently implemented on a network of transputers, which consists of a front-end processor acting as a host and five T800 type transputers, mapping each PE to a separate transputer module. In this network only one transputer (master), on which PE0 is mapped, communicates directly with the host. In this implementation the transputer links are configured so as to form a linear chain (Fig. 2). This topology was selected due to the limited number of transputer units, while for larger transputer networks other topologies (e.g. tree) may be preferable, depending on the load distribution.

Specific software for the management at the system level and for data exchange has been developed, instead of using existing operating systems, to optimize the communication performance. Internal task communication is optimized too, using a combination of communication pipes and shared buffers regulated by semaphores.

The PARCIS software has been developed in Pascal for transputers (3L Pascal), which supports extensions for handling low level communication between the transputer modules. Information is exchanged in the form of variable length messages where each message packet has the structure:

=Address=Command=MessageLength=Message=

Address and Command are single bytes denoting respectively the destination transputer address and the command to be immediately executed. Message Length denotes the number of bytes of the following "Message" and Message is a train of bytes containing the information to be transmitted. The message routing is done by the Communication modules.
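The packet layout and the routing rule on a linear chain can be sketched as below. Address and Command are single bytes as stated in the text; the width and byte order of the length field (here 2 bytes, big-endian), the omission of the field separators shown in the packet structure, and the function names are assumptions of this sketch, since the text does not specify them.

```python
import struct

# Address (1 byte), Command (1 byte), MessageLength (assumed: 2 bytes, big-endian).
HEADER = struct.Struct(">BBH")


def pack_message(address: int, command: int, message: bytes) -> bytes:
    """Build one variable-length packet: header followed by the message bytes."""
    return HEADER.pack(address, command, len(message)) + message


def unpack_message(buffer: bytes):
    """Split the first packet off a byte buffer: (address, command, message, rest)."""
    address, command, length = HEADER.unpack_from(buffer, 0)
    end = HEADER.size + length
    if len(buffer) < end:
        raise ValueError("truncated packet")
    return address, command, buffer[HEADER.size:end], buffer[end:]


def route(my_address: int, destination: int) -> str:
    """Routing decision for a PE on a linear chain."""
    if destination == my_address:
        return "local"                       # hand over to the Circuit Processing Task
    # Comm. A carries traffic from low to high addresses, Comm. B the opposite way.
    return "comm_a" if destination > my_address else "comm_b"


packet = pack_message(address=3, command=7, message=b"stamp")
print(unpack_message(packet), route(1, 3))
```

Because the length travels in the header, a receiving PE can forward a packet that is not addressed to it without interpreting the message body.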

In order to measure the performance of the proposed system, several example circuits including TTL, ECL and BiCMOS digital circuits and bipolar analog circuits were simulated. In the following we present the CPU timings of two typical digital circuits, using the transistor level descriptions of the gates:

(a) One containing 60 TTL NAND gates where each gate is represented by its full transistor model [24] shown in Fig. 3, the circuit containing in total 480 transistors, and
(b) One containing four J–K Flip-Flops, each FF having the structure shown in Fig. 4 [24] with the gates composed of circuit models similar to that of Fig. 3, the total circuit consisting of 256 transistors.

Table 1 presents the CPU times needed for the simulation of the example consisting of 60 TTL NAND gates (Fig. 3) with respect to the number of PEs as well as the distribution of workload, i.e. the number of NAND circuits allocated to each PE. Case 1 in this table represents the sequential execution, i.e. the whole circuit is loaded onto one PE so that the parallel execution can be compared with the execution on a single CPU. Case 2 refers to a system with two PEs and so on. As shown in Fig. 5, the speedup increases almost linearly as the workload is evenly distributed to more PEs.

Table 2 presents the CPU time and load distribution for the simulation of the example consisting of four J–K Flip-Flops. The distribution of workload is deliberately not balanced, in order to show the communication overheads. That is to say, PE0 is more loaded than the other PEs in cases 2–4. In case 5 PE0 is left only to supervise the other PEs, so the corresponding place in Table 2 is empty. As shown, the CPU time decreases linearly, on average 47 s per J–K Flip-Flop, as the number of participating PEs increases and more J–K Flip-Flops are unloaded from PE0 and distributed to other PEs. This almost linear speedup is due to the small communication overhead that stems from the coarse granularity of the simulation algorithm.

Fig. 3. A TTL NAND gate circuit. Light lines indicate the gate module and its internal hierarchical division, as used in the present example.

Fig. 4. A J–K Flip-Flop circuit.

In the current implementation of the PARCIS system the level of granularity is controlled by the user, who defines the distribution of modules to PEs. Referring to Fig. 1, the interprocessor communication is reduced when the modules allocated to a PE belong to consecutive branch positions.

To compare the efficiency of PARCIS with that of known simulators, the example of a master–slave Flip-Flop (Fig. 6) was run: (a) on a single transputer PE, (b) with a serial version of PARCIS on a SUN SPARCstation 4 system, and (c) with Berkeley SPICE 3 on the same SUN system, under the same simulation conditions. It is noted that each NAND gate of the tested circuit is a module like that of Fig. 3. The CPU times are presented in Table 3 and show that PARCIS, in its current experimental implementation, needs roughly twice the CPU time of SPICE 3 on the same machine (45 s vs. 22 s and 510 s vs. 247 s), i.e. it runs at about 50% of the speed of SPICE 3. The transputer implementation of PARCIS is much slower due to the low hardware performance (CPU clock rate 25 MHz).

Table 1
Distribution of workload for a 60 TTL NAND gate circuit and CPU time

Case   PE 0   PE 1   PE 2   PE 3   PE 4   CPU time (s)
1      60     –      –      –      –      2742
2      30     30     –      –      –      1463
3      20     20     20     –      –      1021
4      15     15     15     15     –      799
5      12     12     12     12     12     674

Fig. 5. Simulation of a 60 TTL NAND gate circuit, speedup vs. number of PEs.

Table 2
Distribution of workload for a four J–K Flip-Flop circuit and CPU time

Case   PE 0   PE 1   PE 2   PE 3   PE 4   CPU time (s)
1      4      –      –      –      –      260
2      3      1      –      –      –      210
3      2      1      1      –      –      160
4      1      1      1      1      –      120
5      –      1      1      1      1      70
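The speedup and parallel efficiency implied by Table 1, and the average CPU time saved per J–K Flip-Flop moved off PE0 in Table 2, can be recomputed directly from the tabulated CPU times. The short Python sketch below does this; the variable names and the definition of efficiency as speedup divided by the number of PEs are the usual conventions, assumed here rather than taken from the paper.

```python
# CPU times in seconds, keyed by case number, copied from Tables 1 and 2.
table1 = {1: 2742, 2: 1463, 3: 1021, 4: 799, 5: 674}   # in Table 1, case n uses n PEs
table2 = {1: 260, 2: 210, 3: 160, 4: 120, 5: 70}

# Speedup relative to the sequential run (case 1) and efficiency per PE.
speedup = {n: table1[1] / t for n, t in table1.items()}
efficiency = {n: s / n for n, s in speedup.items()}

for n in sorted(table1):
    print(f"{n} PE(s): speedup {speedup[n]:.2f}, efficiency {efficiency[n]:.2f}")

# Table 2: each successive case moves one more flip-flop off PE0.
steps = [table2[n] - table2[n + 1] for n in range(1, 5)]
average_saving = sum(steps) / len(steps)
print(f"average saving per flip-flop: {average_saving:.1f} s")
```

With five PEs this gives a speedup of about 4.07 (efficiency about 0.81) for the NAND gate circuit, and an average saving of 47.5 s per flip-flop for the J–K circuit, in line with the figure of about 47 s quoted in the text.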

5. Conclusion and future work

We have presented the PARCIS system, a robust parallel circuit simulation algorithm, based on the analysis of hierarchically partitioned circuits. A circuit system is decomposed into simpler modules, which in turn are decomposed into simpler ones, constituting a hierarchy of modules represented by a tree. The equations describing the circuit are decoupled and distributed to the nodes of a multiprocessor system, considerably reducing the computation time.

The results obtained by the implementation of the system in a network of transputers encourage us to continue extending our system to support multilevel simulation and hybrid circuits (i.e. analog and digital on the same chip), and to incorporate other proposed methods, for example Broyden's method [1], to generate, as suggested in Ref. [2], stamps for models described at the behavioral level.

Current work includes the implementation of the system in a multiprocessor environment. To run a simulation on a given number of PEs the implementation creates an equal number of UNIX processes. Each PE (process) comprises a circuit processing unit and a simple interface to handle the communication with the other PEs. This type of implementation of the PARCIS software architecture is easily adaptable and offers an almost straightforward port to common commercial parallel machines.

References

[1] C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19 (92) (1965) 577–593.

Fig. 6. A master–slave Flip-Flop circuit.

Table 3
Comparison of CPU times for PARCIS and SPICE

Circuit                                Transputer PARCIS (s)   SUN PARCIS (s)   SUN SPICE (s)
Master–slave Flip-Flop                 384                     45               22
Group of 12 master–slave Flip-Flops    4651                    510              247

[2] G. Casinovi, J.M. Yang, Multi-level simulation of large analog systems containing behavioral models, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 13 (1994) 1391–1399.

[3] L.O. Chua, P.M. Lin, Computer Aided Analysis of Electronic Circuits, Prentice-Hall, Englewood Cliffs, NJ, 1975.

[4] C. Desoer, E. Kuh, Basic Circuit Theory, McGraw-Hill, New York, 1969.

[5] W. Engl, R. Laur, H. Dirks, MEDUSA – a simulator for modular circuits, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems CAD-1 (1982) 85–93.

[6] W. Fang, M.E. Mokari, D. Smart, Robust VLSI circuit simulation techniques based on overlapped waveform relaxation, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 14 (1995) 510–518.

[7] G. Hachtel, R. Brayton, F. Gustavson, The sparse tableau approach to network analysis and design, IEEE Trans. Circuit Theory CT-18 (1971) 101–113.

[8] C. Ho, A. Ruehli, P. Brennan, The modified nodal approach to network analysis, IEEE Trans. Circuits and Systems CAS-22 (1975) 504–509.

[9] G.P. Jessel, Network statistics for computer-aided network analysis, IEEE Trans. Circuit Theory CT-20 (1973) 635–640.

[10] E. Lelarasmee, A.E. Ruehli, A. Sangiovanni-Vincentelli, The waveform relaxation method for time-domain analysis of large scale integrated circuits, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems CAD-1 (1982) 131–145.

[11] P. Linardis, K.G. Nichols, E.J. Zaluska, Network partitioning and latency exploitation in time-domain analysis of nonlinear electronic circuits, in: Proc. 1978 IEEE Int. Symposium on Circuits and Systems, New York, USA, 1978, pp. 510–514.

[12] P. Linardis, K.G. Nichols, Partitioning with latency exploitation in the time-domain analysis of large nonlinear electronic circuits, in: Proc. IEE International Conference on Computer Aided Design and Manufacture of Electronic Components, Circuits and Systems (CADMECCS), Brighton, UK, 1979, pp. 105–109.

[13] P. Linardis, Partitioning and latency exploitation in the time-domain analysis of large electronic circuits, Ph.D. Thesis, Dept. of Electronics, Univ. Southampton, UK, 1979.

[14] L.W. Nagel, SPICE 2, a computer program to simulate semiconductor circuits, Tech. Rep. ERL-M520, Electron. Res. Lab., Univ. of California, Berkeley, May 1975.

[15] E. Ngoya, J. Rousset, J. Obregon, Newton–Raphson iteration speed-up algorithm for the solution of nonlinear circuit equations in general-purpose CAD programs, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 16 (6) (1997) 638–644.

[16] J.M. Ortega, W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[17] L. Peterson, S. Mattisson, The design and implementation of a concurrent circuit simulation program for multicomputers, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 12 (1993) 1004–1014.

[18] N.B. Rabbat, H.Y. Hsieh, Concepts of latency in the time domain solution of nonlinear differential equations, in: Proc. 1978 IEEE Int. Symposium on Circuits and Systems, New York, USA, 1978, pp. 813–825.

[19] R. Saleh, B. Antao, J. Singh, Multilevel and mixed-domain simulation of analog circuits and systems, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 15 (1) (1996) 68–82.

[20] R. Saleh, A. Newton, The exploitation of latency and multirate behavior using nonlinear relaxation for circuit simulation, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 8 (1989).

[22] S. Lin, E. Kuh, M. Marek-Sadowska, Stepwise equivalent conductance circuit simulation technique, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 12 (5) (1993) 672–683.

[23] H. Shichman, Integration system of a nonlinear network-analysis program, IEEE Trans. Circuit Theory CT-17 (3) (1970) 378–386.

[24] The TTL Data Book for Design Engineers, Texas Instruments, 1973.

[25] P. Yang, I.N. Hajj, T.N. Trick, SLATE: a circuit simulation program with latency exploitation and node tearing, in: Proc. IEEE Int. Conf. Circuits Computers, 1980, pp. 353–355.
