

Friedrich-Alexander-Universität Erlangen-Nürnberg

MAML - An Architecture Description Language for Modeling and Simulation of Processor Array Architectures

Part I

Alexey Kupriyanov, Frank Hannig, Dmitrij Kissler, Rainer Schaffer*, Jürgen Teich

Department of Computer Science 12, Hardware-Software-Co-Design
University of Erlangen-Nuremberg
Am Weichselgarten 3
D-91058 Erlangen, Germany

Co-Design-Report 03-2006

March 14, 2006

* Dresden University of Technology, Department of Electrical Engineering and Information Technology, Institute of Circuits and Systems


Contents

1 Abstract
2 Introduction
3 Related Work
4 Characterization of Regular Architectures
  4.1 Array-Level Architecture Specification
    4.1.1 The <ProcessorArray> Element
    4.1.2 The <PElements> Element
    4.1.3 The Interconnect Domain <ICDomain> Element
    4.1.4 The <ElementsPolytopeRange> Subelement
    4.1.5 The <ElementAt> Subelement
    4.1.6 The <ElementsDomain> Subelement
    4.1.7 The PE Class Domain <ClassDomain> Element
  4.2 PE-Level Architecture Specification
    4.2.1 The <PEClass> Element
    4.2.2 The <IOPorts> Element
    4.2.3 The <Resources> Element
    4.2.4 The <StorageElements> Element
    4.2.5 The <Resmap> Element
    4.2.6 The <Opnames> Element
    4.2.7 The <Operations> Element
    4.2.8 The <Units> Element
  4.3 Simulation
5 WPPA Design Flow in Scopes of ArchitectureComposer Framework
6 Conclusions and Future Work
A MAML Document Type Definition
B Example: WPPA Description in MAML
References


1 Abstract

In this report, we introduce an architecture description language (ADL) for the systematic characterization, modeling, simulation, and evaluation of massively parallel reconfigurable processor architectures that are designed for special-purpose applications from the domain of embedded systems. Numerous ADLs have been developed to describe different capabilities of architectural modeling and analysis. Unfortunately, there is no ADL so far which could describe massively parallel processor arrays. The ADL proposed in this report is being developed to characterize such architectures. The architectural description of the processor system is given at two abstraction levels of massively parallel reconfigurable processor architectures. Architectural parameters of processor elements are characterized on the (lower) processor element level, and the interaction between processor elements (i.e., interconnect topology, positioning of each PE, etc.) is described on the (higher) processor array level. Key features, grammar, and technical innovations of the proposed ADL are covered in this report.

2 Introduction

Today, the steady technological progress in integration densities and modern nanotechnology will allow implementations of hundreds of 32-bit microprocessors and more on a single die (System-on-a-Chip technology). Furthermore, the functionality of the microprocessors increases continuously, e.g., by parallel processing of data with low accuracy (8 bit or 16 bit) within each microprocessor. Due to these advances, massively parallel data processing has become possible in portable and other embedded systems. These devices have to handle increasingly computationally intensive algorithms such as video processing (H.264) or other digital signal processing tasks (3G), but on the other hand they are subject to strict limitations on their cost and/or power budget. These kinds of applications can only be realized efficiently if design tools are able to identify the inherent parallelism of a given algorithm and to map it onto functionally correct, reliable, and highly optimized systems with respect to cost, performance, and energy/power consumption. However, technical analysts foresee the dilemma of not being able to fully exploit next-generation hardware complexity because of a lack of mapping tools. Hence, parallelization techniques and compilers will be of utmost importance in order to map computationally intensive algorithms efficiently to these processor arrays.

There has always been a need (driven by demands on speed, size, cost, power, etc.) to develop dedicated massively parallel hardware in terms of ASICs (Application-Specific Integrated Circuits). For instance, consider the area of image processing, where a cost-benefit analysis is of crucial importance: on a given input image, sequences of millions of similar operations on adjacent picture elements (pixels) (e.g., 2- or 3-dimensional filter algorithms, edge detection, Hough transformation) have to be computed within fractions of a second. The use of general-purpose parallel computers such as MIMD or SIMD multiprocessor machines is not reasonable because such systems are too large and expensive. Such machines are also of no use in the context of mobile environments, where additional criteria such as energy consumption, weight, and geometrical dimensions exclude solutions with (several) general-purpose processors.

Figure 1: Co-design flow. [Diagram labels: Algorithms; Architectural Research; Retargetable Mapping Methodology; Co-Exploration; Modeling, Simulation, Emulation.]

In order to avoid the huge area and thus cost overheads of general-purpose computers, the architecture of choice is problem- or domain-specific. The main problem is that the development of a new architecture also requires suitable compilers and mapping tools. Parameterizable architecture and compiler co-design will therefore be a key step in the development of such embedded systems in the future (see Figure 1). On the one hand, the main challenge is the extraction of common properties of regular designs, independent of hardware and software. On the other hand, the analysis of program transformations and architecture parameters is of great importance in order to achieve highly efficient and optimized systems. Concepts of retargetable compilers are needed for array-like architectures. One major milestone here is to study and understand the correlation and matching of architectural parameters with the parameters of program transformations as part of such compilers.

3 Related Work

Many architecture description languages have been developed in the field of retargetable compilation. In the following, we list only some of the most significant ADLs. For instance, the hardware description language nML [FVF95] permits concise, hierarchical processor descriptions in a behavioral style. nML is used in the CBC/SIGH/SIM framework [Fau95] and the CHESS system [GVL+96]. The machine description language LISA [PHM00] is the basis for a retargetable compiled simulator approach developed at RWTH Aachen, Germany. The project focuses on fast simulator generation for already existing architectures to be modeled in LISA. Current work in the domain of multi-core system simulation [KPBT06, ACL+06] enables a co-simulation of multiple processor cores with buses and peripheral modules which are described in SystemC. At the ACES laboratory of the University of California, Irvine, the architecture description language EXPRESSION [HGK+99] has been developed. From an EXPRESSION description of an architecture, the retargetable compiler Express and a cycle-accurate simulator can be generated automatically. The Trimaran system [Tri] has been designed to generate efficient VLIW code. It is based on a fixed basic architecture (HPL-PD) that is parameterizable in the number of registers, the number of functional units, and operation latencies. Parameters of the machine are specified in the description language HMDES. MIMOLA [LM98] is an RT-level ADL. It was developed at the University of Kiel, Germany, and originally targeted micro-architecture modeling and design. Some register transfer level (RTL) hardware description languages have also been used for modeling and simulation of processor architectures, e.g., UDL/I [Aka96], which was developed at Kyushu University in Japan. It describes the input to the COACH ASIP design automation system. A target-specific compiler can be generated based on the instruction set extracted from the UDL/I description. An instruction set simulator can also be generated to supplement the cycle-accurate RT-level simulator. ISDL is an instruction set description language. It was developed at MIT and is used by the Aviv compiler [Han99] and the associated assembler. It was also used by the simulator generation system GENSIM [HRD99]. The target architectures for ISDL are VLIW ASIPs. Maril is an ADL used by the retargetable compiler Marion [BHE91]. It contains both instruction set information and coarse-grained structural information. The target architectures for Maril are RISC-style processors only. There is no distinction between instruction and operation in Maril. TDL stands for target description language. It has been developed at Saarland University in Germany. The language is used in a retargetable post-pass assembly-based code optimization system called PROPAN [Kae00]. Another architecture description language is PRMDL [TPE01]. PRMDL stands for Philips Research Machine Description Language. The target architectures for PRMDL are clustered VLIW architectures. Finally, we refer to the Machine Markup Language (MAML), which has been developed in the BUILDABONG project [FTTW01]. MAML is used for the efficient architecture/compiler co-generation of ASIPs and VLIW processor architectures. For a more complete summary of ADLs, we refer to the surveys in [QM02], [THG+99], and [MD05].

All these ADLs have in common that they have been developed for the design of single processor architectures such as ASIPs, which might contain VLIW execution. To the best of our knowledge, however, there exists no ADL which covers the architectural aspects of massively parallel processor arrays. Of course, one could use hardware description languages such as Verilog or VHDL, but these languages are too low-level and offer only insufficient possibilities to describe behavioral aspects.

Figure 2: Example of a WPPA with parameterizable processing elements (WPPEs). A WPPE consists of a processing unit which contains a set of functional units. Some functional units allow sub-words to be computed in parallel in sub-word units. The processing unit is connected to input and output registers. A small data memory exists to temporarily store computational results. An instruction sequencer exists as part of the control path and executes a set of control instructions from a tiny local program memory.

4 Characterization of Regular Architectures

As an example of massively parallel reconfigurable architectures, we introduce a new class of them: weakly-programmable processor arrays (WPPAs). Such architectures consist of an array of processing elements (PEs) that contain sub-word processing units with very little memory and a regular interconnect structure. In order to implement a certain algorithm efficiently, each PE may implement only a certain function range. Also, the instruction set is limited and may be configured at compile time or even dynamically at run time. The PEs are called weakly-programmable because the control overhead of each PE is optimized and kept small. An example of such an architecture is shown in Figure 2. The massive parallelism might be expressed by different types of parallelism: (1) several weakly-programmable processing elements (WPPEs) working in parallel, (2) functional and software pipelining, (3) multiple functional units within one WPPE, and finally (4) sub-word parallelism (SWP) within the WPPEs. WPPAs can be seen as a compromise between programmability and specialization, exploiting architectures that realize the full synergy of programmable processor elements and dedicated processing units.

Since design time and cost are critical aspects during the design of processor architectures, it is important to provide efficient modeling and simulation techniques in order to evaluate architecture prototypes without actually designing them. In the scope of the methodology presented here, we are looking for a flexible reconfigurable architecture in order to find trade-offs between different architectural features for a given set of applications. Therefore, a formal description of architecture properties is of great importance.

Figure 3: Root element <maml>. [Diagram labels: maml with attribute name; ProcessorArray (Array-Level); PEClass ... PEClass (PE-Level).]

In order to allow the specification of massively parallel processor architectures, we use the MAchine Markup Language (MAML) [FTTW01] and provide extensions that are needed for modeling WPPAs. MAML is based on XML notation and is used for describing architecture parameters required by possible mapping methods such as partitioning, scheduling, and functional unit and register allocation. Moreover, the parameters extracted from a MAML architectural description can be used for interactive visualization and simulation of the given processor architecture. In defining MAML, the four main constraints that the XML standard imposes on well-formed XML documents were followed: (i) there is exactly one root element, (ii) every start tag has a matching end tag, (iii) no tag overlaps another tag, and (iv) all elements and attributes must obey the naming constraints.

A MAML document has one root element <maml> with an attribute name, specifying the file name of the architecture.

Example 4.1 Root element of MAML.

<maml name="wppa.maml">
  ...
</maml>
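
The well-formedness constraints listed above are enforced by any conforming XML parser, so they can be checked mechanically. The following is a minimal sketch in Python using the standard-library module xml.etree.ElementTree. The helper name read_maml_name is hypothetical and not part of any MAML tooling, and the sketch checks only well-formedness and the root element, not validity against the MAML DTD.

```python
import xml.etree.ElementTree as ET

def read_maml_name(text):
    """Parse a MAML document and return the name attribute of its root.

    Hypothetical helper for illustration. Raises ET.ParseError if the
    text is not well-formed XML and ValueError if the root is not <maml>.
    """
    root = ET.fromstring(text)
    if root.tag != "maml":
        raise ValueError("root element must be <maml>, got <%s>" % root.tag)
    return root.get("name")

# Root element of Example 4.1 with an empty body.
doc = '<maml name="wppa.maml"></maml>'
print(read_maml_name(doc))
```

A document with overlapping tags or a missing end tag makes ET.fromstring raise ET.ParseError before any MAML-specific processing starts.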

The architectural description of an entire WPPA can be subdivided into two main abstraction levels: the array-level, describing parameters such as the topology of the interconnection, the number and location of processors and I/O ports, etc., and the PE-level, describing the internal structure of each type of WPPE the WPPA may be composed of. The general structure of a MAML specification is shown in Figure 3. The MAML elements and attributes are represented by ellipses and rectangles, respectively. The elements in MAML are strictly ordered from left to right. First, the processor array architecture is described on the array-level, and only then follows the specification of the internal structure of each WPPE type (PE-level). In order to validate MAML code, we use a MAML Document Type Definition (DTD), which is listed in full in Appendix A.

Figure 4: The <ProcessorArray> element.

4.1 Array-Level Architecture Specification

The array-level properties of a WPPA are described in the body of a special MAML element, <ProcessorArray>. This element specifies the parameters of the WPPA as a whole, i.e., the name of the WPPA, the interconnect topology, the number and types of WPPEs, etc. For instance, if the WPPA has a mesh structure of PEs, the size of the WPPA must be given in terms of the number of columns and rows. The interconnect between processor array cells is one of the most important parameters of a WPPA.

4.1.1 The <ProcessorArray> Element

The structure of the <ProcessorArray> element is shown in Figure 4. This element contains the attributes name and version, specifying the name of the processor array architecture and its version, respectively. It also contains a set of subelements:

• <PElements>

• <PEInterconnectWrapper>

• <ICDomain>

• <ClassDomain>

Figure 5: Interconnect domains and class domains representation.

The subelements <ICDomain> and <ClassDomain> may be defined multiple times.

<PElements> (Processor Elements) defines the number of PEs in the whole processor array and gives a name by which they can be referenced. The number of elements is specified as a two-dimensional array with a fixed number of rows (rows attribute) and columns (columns attribute). The number of rows multiplied by the number of columns is not necessarily the total number of processors within the array, since the grid serves only as a basis for placing different types of processors, memories, and I/O elements. Furthermore, a grid point does not necessarily correspond to exactly one element, because the elements may differ in size. Here, the relevant size is the logical size in terms of connectors rather than the physical area.

<PEInterconnectWrapper> specifies an interconnect wrapper (IW), which wraps each processor element of the processor array. All interconnect wrappers are directly connected to each other via their ingoing and outgoing signal ports on each side. An interconnect wrapper also describes the ingoing and outgoing signal ports of the processor element inside it, thus providing the interconnection between the PEs in the whole processor array. A schematic view is shown in Figure 7 (the explanation follows in Section 4.1.3). Although the processor elements placed inside the interconnect wrappers may have different internal architectures, the parameters of the IWs are common to the entire processor array. Therefore, these parameters are specified on the array-level of the MAML description.

<ICDomain> (Interconnect Domain) specifies the set or domain of processor elements with the same interconnect topology. The PEs here are either subsets of the set of PEs defined by the <PElements> element or another domain. Recursive definition is not allowed.


<ClassDomain> specifies the set or domain of processor elements with the same architectural structure (PE class). The PEs here are either subsets of the set of PEs defined by the <PElements> element or another domain. Recursive definition is not allowed.

Figure 5 presents an example of the interconnect and class domain representation of a processor array architecture. Four interconnect domains and three class domains are shown. The interconnect domain d1 contains the PEs of class c1, the interconnect domain d2 contains the PEs of class c2, and the interconnect domain d3 contains the PEs of class c3. Finally, the interconnect domain d4 again contains PEs of class c1.

The interconnect topology for the PEs in the interconnect domain d2 is shown in Figure 10. In the following, the processor array definition for the example in Figure 5 is listed. The complete MAML code for this example is listed in Appendix B.

Example 4.2 Processor array definition.

<ProcessorArray name="PA" version="1.0">
  <PElements name="pe" rows="4" columns="12"/>
  <PEInterconnectWrapper>
    <Channels>
      <Southward index="0" bitwidth="32"/> <!-- data path-->
      <Southward index="1" bitwidth="32"/> <!-- data path-->
      <Southward index="2" bitwidth="1"/> <!-- control path-->
      <Southward index="3" bitwidth="1"/> <!-- control path-->
      <Northward index="0" bitwidth="32"/>
      <Northward index="1" bitwidth="32"/>
      <Northward index="2" bitwidth="1"/>
      <Northward index="3" bitwidth="1"/>
      <Eastward index="0" bitwidth="32"/>
      <Eastward index="1" bitwidth="32"/>
      <Eastward index="2" bitwidth="1"/>
      <Eastward index="3" bitwidth="1"/>
      <Westward index="0" bitwidth="32"/>
      <Westward index="1" bitwidth="32"/>
      <Westward index="2" bitwidth="1"/>
      <Westward index="3" bitwidth="1"/>
    </Channels>
    <PElementPorts>
      <Inputs number="4"/> <!-- max number of PE inputs-->
      <Outputs number="4"/> <!-- max number of PE outputs-->
    </PElementPorts>
  </PEInterconnectWrapper>
  <ICDomain name="d1">
    <Interconnect type="static">
      <!--Side Outputs -->
      <!--NNNNEEEESSSSWWWWPPPP-->
      <AdjacencyMatrix> <!--01230123012301230123-->
        <WInput idx="0" row="00000000000000001000"/>
        <WInput idx="1" row="00000000000000000100"/>
        <WInput idx="2" row="00000000000000000010"/>
        <WInput idx="3" row="00000000000000000001"/>
        <POutput idx="0" row="00000001000000000000"/>
        <POutput idx="1" row="00000010000000000000"/>
        <POutput idx="2" row="00000100000000000000"/>
        <POutput idx="3" row="00001000000000000000"/>
      </AdjacencyMatrix>
    </Interconnect>
    <ElementsPolytopeRange>
      <MatrixA row=" 1 0"/>
      <MatrixA row="-1 0"/>
      <MatrixA row=" 0 1"/>
      <MatrixA row=" 0 -1"/>
      <VectorB value=" 1"/>
      <VectorB value="-2"/>
      <VectorB value=" 1"/>
      <VectorB value="-4"/>
    </ElementsPolytopeRange>
  </ICDomain>
  <ICDomain name="d2">
    <Interconnect type="static">
      <!--Side Outputs -->
      <!--NNNNEEEESSSSWWWWPPPP-->
      <AdjacencyMatrix> <!--01230123012301230123-->
        <NInput idx="0" row="00000000000010001000"/>
        <NInput idx="1" row="00000000000000000100"/>
        <EInput idx="0" row="00000000000000000010"/>
        <EInput idx="1" row="00000000000000000001"/>
        <POutput idx="0" row="00000000000001000000"/>
        <POutput idx="1" row="00000000100000000000"/>
        <POutput idx="2" row="00000000010000000000"/>
        <POutput idx="3" row="00000001000000000000"/>
      </AdjacencyMatrix>
    </Interconnect>
    <ElementsPolytopeRange>
      <MatrixA row=" 1 0"/>
      <MatrixA row="-1 0"/>
      <MatrixA row=" 0 1"/>
      <MatrixA row=" 0 -1"/>
      <VectorB value=" 3"/>
      <VectorB value="-6"/>
      <VectorB value=" 1"/>
      <VectorB value="-4"/>
    </ElementsPolytopeRange>
  </ICDomain>
  <ICDomain name="d3">
    <Interconnect type="static">
      <!--Side Outputs -->
      <!--NNNNEEEESSSSWWWWPPPP-->
      <AdjacencyMatrix> <!--01230123012301230123-->
        <WInput idx="0" row="00000000000000011000"/>
        <WInput idx="1" row="00000000000000001100"/>
        <WInput idx="2" row="00000000000000000110"/>
        <WInput idx="3" row="00000000000000000001"/>
        <POutput idx="0" row="00000001000001000000"/>
        <POutput idx="1" row="00000010000100000000"/>
        <POutput idx="2" row="00000100000010010000"/>
        <POutput idx="3" row="00001000001000000000"/>
      </AdjacencyMatrix>
    </Interconnect>
    <ElementsPolytopeRange>
      <MatrixA row=" 1 0"/>
      <MatrixA row="-1 0"/>
      <MatrixA row=" 0 1"/>
      <MatrixA row=" 0 -1"/>
      <VectorB value=" 7"/>
      <VectorB value="-10"/>
      <VectorB value=" 1"/>
      <VectorB value="-4"/>
    </ElementsPolytopeRange>
  </ICDomain>
  <ICDomain name="d4">
    <Interconnect type="static">
      <!--Side Outputs -->
      <!--NNNNEEEESSSSWWWWPPPP-->
      <AdjacencyMatrix> <!--01230123012301230123-->
        <WInput idx="0" row="00000000000000001000"/>
        <WInput idx="1" row="00000000000000000100"/>
        <WInput idx="2" row="00000000000000000010"/>
        <WInput idx="3" row="00000000000000000001"/>
        <POutput idx="0" row="00000001000000000000"/>
        <POutput idx="1" row="00000010000000000000"/>
        <POutput idx="2" row="00000100000000000000"/>
        <POutput idx="3" row="00001000000000000000"/>
      </AdjacencyMatrix>
    </Interconnect>
    <ElementsPolytopeRange>
      <MatrixA row=" 1 0"/>
      <MatrixA row="-1 0"/>
      <MatrixA row=" 0 1"/>
      <MatrixA row=" 0 -1"/>
      <VectorB value=" 11"/>
      <VectorB value="-12"/>
      <VectorB value=" 1"/>
      <VectorB value="-4"/>
    </ElementsPolytopeRange>
  </ICDomain>
  <ClassDomain name="dc1" peclass="c1">
    <ElementsDomain instance="d1"/>
  </ClassDomain>
  <ClassDomain name="dc2" peclass="c2">
    <ElementsDomain instance="d2"/>
  </ClassDomain>
  <ClassDomain name="dc3" peclass="c3">
    <ElementsDomain instance="d3"/>
  </ClassDomain>
  <ClassDomain name="dc4" peclass="c1">
    <ElementsDomain instance="d4"/>
  </ClassDomain>
</ProcessorArray>
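
To make the <ElementsPolytopeRange> entries of Example 4.2 more concrete, the following Python sketch enumerates the grid points selected by each interconnect domain. It rests on two assumptions that are not stated in this part of the report: the polytope is read as the set of integer points x with A·x ≥ b (the opposite reading, A·x ≤ b, would make every domain empty), and the first coordinate is the one running from 1 to 12, which matches the column count of this example. The function name in_polytope is hypothetical.

```python
def in_polytope(point, matrix_a, vector_b):
    """Return True if A * point >= b holds row by row (assumed convention)."""
    return all(
        sum(a * x for a, x in zip(row, point)) >= b
        for row, b in zip(matrix_a, vector_b)
    )

# MatrixA rows shared by all four domains of Example 4.2.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# VectorB values per interconnect domain, copied from Example 4.2.
B = {
    "d1": (1, -2, 1, -4),
    "d2": (3, -6, 1, -4),
    "d3": (7, -10, 1, -4),
    "d4": (11, -12, 1, -4),
}

# Enumerate the grid; the first coordinate runs over 1..12, the second over 1..4.
grid = [(x1, x2) for x1 in range(1, 13) for x2 in range(1, 5)]
members = {
    name: [p for p in grid if in_polytope(p, A, b)]
    for name, b in B.items()
}

for name, points in members.items():
    print(name, len(points), points[0], points[-1])
```

Under these assumptions the four domains contain 8, 16, 16, and 8 grid points and together cover all 48 positions of the 4 × 12 array without overlap, which agrees with the layout in Figure 5.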

4.1.2 The <PElements> Element

The attributes:

• name

• rows

• columns

<PElements> defines the set of PEs that are used for constructing the whole processor array. The name attribute names the set of PEs. The rows and columns attributes define the 2D size of the array of PEs.

Example 4.3 2D size definition of the processor array.

<PElements name="pe" rows="4" columns="12"/>

The set pe with 48 PEs (4 × 12 array) is defined. Each element can be referred to by the name of the set and its 2D indices. In the example above, the PEs can be referenced as pe[1,1] .. pe[4,12].
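
As a small illustration of this naming scheme, the Python sketch below lists all references for a given set name and array size, using 1-based (row, column) indices as in the example above. The function name pe_references and the row-major ordering are assumptions made for illustration only.

```python
def pe_references(name, rows, columns):
    """List references name[r,c] in row-major order with 1-based indices."""
    return [
        f"{name}[{r},{c}]"
        for r in range(1, rows + 1)
        for c in range(1, columns + 1)
    ]

# Values taken from Example 4.3.
refs = pe_references("pe", 4, 12)
print(len(refs), refs[0], refs[-1])
```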

4.1.3 The Interconnect Domain <ICDomain> Element

The interconnect domain is used to specify the interconnect topology for a set of PEs. There are many different interconnect topologies (e.g., see Figure 6), but selecting one of them is always a trade-off.

The supported topology classes are:

1. Nearest-neighbor topologies such as horizontal or vertical lines, grid (Figure 6(a)), and honeycomb (Figure 6(b,c)),

2. Bus (e.g., PACT array [BEM+03], see Figure 6(d)),

3. Crossbar (Figure 6(e)),

4. Tree (binary, k-ary, fat tree in Figure 6(f)), and

5. Torus.

Figure 6: Examples of interconnect topologies: (a) grid, (b,c) honeycomb, (d) bus, (e) crossbar, and (f) fat tree.

In order to be able to model and specify any possible interconnect topology within MAML, the concept of a PE interconnect wrapper (IW) is introduced (see Figure 7(a)). An interconnect wrapper describes the ingoing and outgoing signal ports of a processor element. Each interconnect wrapper has a constant number of inputs and outputs on each of its sides, which are connected to the inputs and outputs of neighboring IW instances. An interconnect wrapper is represented as a rectangle around a PE and consists of the input and output ports on its northern, eastern, southern, and western sides.

Figure 7: Interconnect wrapper (a) and interconnect adjacency matrix (b).

However, the input ports and the output ports on opposite sides of an IW (e.g., northern inputs and southern outputs) must be equal in number and have equal bitwidths. This condition ensures correct interconnection between neighboring IW instances. The condition can be satisfied completely by introducing directed interconnect channels. Each directed interconnect channel represents a pair of one input port and one output port on opposite sides of the interconnect wrapper with a common bitwidth. The direction of the channel is determined by the position of the output port. For example, if we consider a pair of a northern input and a southern output IW port, then the direction of the corresponding interconnect channel is southward. The interconnection between all interconnect wrappers of the processor array is correct if and only if each IW has equivalent directed interconnect channels.
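
The relation between directed channels and IW ports described above can be sketched in a few lines of Python. The data layout (tuples of direction, index, and bitwidth) and the function name ports_from_channels are assumptions chosen for illustration. The only rule taken from the text is that a channel's direction names the side of its output port and that its input port lies on the opposite side with the same index and bitwidth.

```python
# Side holding the output port of a channel, keyed by channel direction.
OUTPUT_SIDE = {"Southward": "S", "Northward": "N", "Eastward": "E", "Westward": "W"}
# The paired input port sits on the opposite side of the wrapper.
OPPOSITE = {"S": "N", "N": "S", "E": "W", "W": "E"}

def ports_from_channels(channels):
    """Expand (direction, index, bitwidth) channels into IW port tables.

    Returns two dicts mapping (side, index) to bitwidth: inputs, outputs.
    """
    inputs, outputs = {}, {}
    for direction, index, bitwidth in channels:
        out_side = OUTPUT_SIDE[direction]
        outputs[(out_side, index)] = bitwidth
        inputs[(OPPOSITE[out_side], index)] = bitwidth
    return inputs, outputs

# Channel list of Example 4.2: per direction, two 32-bit data channels
# (indices 0 and 1) and two 1-bit control channels (indices 2 and 3).
channels = [
    (direction, index, width)
    for direction in OUTPUT_SIDE
    for index, width in enumerate((32, 32, 1, 1))
]
inputs, outputs = ports_from_channels(channels)
print(inputs[("N", 0)], outputs[("S", 0)])
```

Because every input port is generated together with its opposite output port, the requirement of equal port counts and bitwidths on opposite sides holds by construction, which is the point of specifying channels rather than individual ports.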

The numbers of input ports on each side are denoted by N_in, E_in, S_in, and W_in, respectively. The same holds for the output ports: N_out, E_out, S_out, and W_out. The ports are numbered as shown in Figure 7(a). The two IW ports that belong to a given interconnect channel have the same index, which is equal to the index of that interconnect channel. Interconnect channels are numbered consecutively from left to right and from top to bottom. A PE is placed inside the interconnect wrapper. The input ports P_in are shown on the top edge and the output ports P_out on the bottom edge of the PE.

The configuration of an IW is specified by the so-called Interconnect Adjacency Matrix (IAM) (see Figure 7(b)). By the configuration of an IW, we mean the definition of the possible connections between the ports of an interconnect wrapper and a processor element. Therefore, the individual ports of an IW are considered here rather than the interconnect channels (pairs of ports). The rows of the IAM represent the input ports of an IW, except for the last few rows (their number depends on the number of PE output ports), which represent the output ports of the PE. The columns represent the output ports of an IW, except for the last few columns (their number depends on the number of PE input ports), which represent the input ports of the PE. The matrix contains the values c_ij, which are equal to "1" if a connection between the corresponding input and output ports is possible, and equal to "0" otherwise. The last rows and columns of the IAM thus represent the port mapping between PE and IW ports. Interconnection among PE ports is not allowed; interconnection among IW ports, however, is possible. The positions of the input PE ports are interchanged with the positions of the output PE ports in the IAM. This avoids the configuration of incorrect connections such as a connection between an IW input and a PE output or a connection between a PE input and an IW output.

$$c_{ij} \leftarrow \begin{cases} 1, & \text{if } \exists \text{ a possible connection between input and output ports,} \\ 0, & \text{otherwise.} \end{cases}$$

In the most complex case, the IW is a configurable full crossbar switch matrix. In practice, it is less complex in most cases, since it is a compromise between routing flexibility and cost.
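The layout of the IAM described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `iam_labels` and `full_crossbar_iam` and the label strings are hypothetical and not part of MAML. The sketch builds the IAM of the most complex case, assuming that a full crossbar sets every entry to 1 except the block that would connect PE outputs to PE inputs, which must stay 0 because PE ports may not be interconnected.

```python
SIDES = ("N", "E", "S", "W")


def iam_labels(iw_in, iw_out, pe_in, pe_out):
    """Return (row_labels, column_labels) of an IAM.

    iw_in / iw_out map each side to its number of IW input / output ports;
    pe_in / pe_out are the numbers of PE input / output ports.
    """
    rows = [f"{s}in{k}" for s in SIDES for k in range(iw_in[s])]
    rows += [f"Pout{k}" for k in range(pe_out)]   # last rows: PE outputs
    cols = [f"{s}out{k}" for s in SIDES for k in range(iw_out[s])]
    cols += [f"Pin{k}" for k in range(pe_in)]     # last columns: PE inputs
    return rows, cols


def full_crossbar_iam(iw_in, iw_out, pe_in, pe_out):
    """IAM with every connection enabled except PE output -> PE input."""
    rows, cols = iam_labels(iw_in, iw_out, pe_in, pe_out)
    n_iw_rows = len(rows) - pe_out
    n_iw_cols = len(cols) - pe_in
    return [[0 if (i >= n_iw_rows and j >= n_iw_cols) else 1
             for j in range(len(cols))]
            for i in range(len(rows))]


# Hypothetical wrapper with four ports per side and a PE with four inputs/outputs.
four = dict.fromkeys(SIDES, 4)
iam = full_crossbar_iam(four, four, pe_in=4, pe_out=4)
```

A practical IW would normally enable far fewer entries than this upper bound, trading routing flexibility for cost.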

The IW is defined by the <PEInterconnectWrapper> element. It contains the specifications of the interconnect channels (index and bitwidth) in each direction and the definition of the maximal number of PE input and output ports. An example of an IAM for the interconnect topology in Figure 10 is shown in the code on page 10 (see the interconnect domain d2).

The structure of the <ICDomain> element is depicted in Figure 8.

Figure 8: The <ICDomain> element.

The <ICDomain> element contains the attributes name and selection, and the following subelements:

• <Interconnect>

• <ElementsPolytopeRange>

• <ElementAt>

• <ElementsDomain>

<ICDomain> specifies the domain of processor elements with the same interconnect topology. The processor elements within an interconnect domain are subsets of the PEs defined by the <PElements> element. The processor elements to be included in the interconnect domain are selected by the following elements: <ElementsPolytopeRange>, <ElementAt>, and <ElementsDomain>. These elements define a range of PEs in the shape of a given polytope, a particular PE with given coordinates (row, column), or the subset of PEs specified by the name of another domain, respectively. Recursive definition is not allowed. The name attribute gives the name to the interconnect domain.

The selection attribute specifies how the different subsets of PEs defined by the elements <ElementsPolytopeRange>, <ElementAt>, or <ElementsDomain> are combined into the resulting selection of PEs for the entire interconnect domain. The selection attribute accepts one of the following values: addition (the default value if the selection attribute is not specified), subtraction, or intersection. These values stand for composing the resulting selection by the addition (union), subtraction, or intersection of the PE subsets, respectively. For example, suppose we have a 3 × 3 processor array and want to define a ring-shaped interconnect domain which contains all PEs except the one in the center, as shown in Figure 9(a). In this case, the interconnect domain should be specified with the attribute selection set to subtraction and with two PE subsets. The first subset selects all PEs by a polytope definition <ElementsPolytopeRange> (a detailed explanation of this element follows in Section 4.1.4). The second subset, which contains only the processor element PE[2,2] defined by <ElementAt> (see Section 4.1.5), is subtracted from the first subset. This results in the required ring-shaped interconnect domain. The corresponding MAML code is shown in Figure 9(b).

(a) [Figure: a 3 × 3 array of PEs, PE[1,1] to PE[3,3], each placed in an interconnect wrapper (IW); the domain RingDomain comprises all PEs except PE[2,2].]

(b)

    <ICDomain name="RingDomain" selection="subtraction">
      <Interconnect type="dynamic">
        ...
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row=" 1  0"/>
        <MatrixA row="-1  0"/>
        <MatrixA row=" 0  1"/>
        <MatrixA row=" 0 -1"/>
        <VectorB value=" 1"/>
        <VectorB value="-3"/>
        <VectorB value=" 1"/>
        <VectorB value="-3"/>
      </ElementsPolytopeRange>
      <ElementAt row="2" column="2"/>
    </ICDomain>

Figure 9: Example of an interconnect domain definition using the selection attribute.
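The effect of the selection attribute can be illustrated with plain set operations. The following Python sketch is not part of MAML; the function name `compose` is hypothetical, and it assumes that the subsets are combined from left to right in the order in which they are listed, which the text above only shows for two subsets.

```python
import operator

# Set operation behind each value of the selection attribute.
OPS = {
    "addition": operator.or_,       # union (default)
    "subtraction": operator.sub,    # set difference
    "intersection": operator.and_,
}


def compose(subsets, selection="addition"):
    """Combine PE subsets (sets of (row, column) tuples) into one selection."""
    op = OPS[selection]             # raises KeyError for an unknown value
    result = set(subsets[0])
    for subset in subsets[1:]:
        result = op(result, set(subset))
    return result


# Ring-shaped domain of the 3 x 3 example: all PEs minus the center PE[2,2].
all_pes = {(r, c) for r in range(1, 4) for c in range(1, 4)}
ring = compose([all_pes, {(2, 2)}], selection="subtraction")
```

Here `ring` holds the eight border PEs of the array, matching the ring-shaped domain of the example.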

The <Interconnect> subelement specifies the interconnect network topology. The attribute type defines the type of the interconnect. The value of type is either "static" or "dynamic", specifying the reconfigurability of the interconnect. If the interconnect of a certain PE or of the whole domain is reconfigurable, special interconnect control registers are added to the interconnect wrappers in order to drive the process of interconnect reconfiguration. The details concerning dynamic reconfiguration of massively parallel processor architectures will be covered in Part II of this report in the near future.

The <AdjacencyMatrix> subelement defines the IAM of the interconnect wrapper for the PEs which belong to the current domain. The matrix is described row by row using the elements <NInput>, <EInput>, <SInput>, <WInput>, and <POutput> with the attributes idx and row, which specify the rows of the matrix (rows with all zeros can be skipped). These elements represent the input ports of an IW and the output ports of the PE. The value of idx defines the index of the corresponding interconnect channel which contains the specified IW input port. The numbering order of the interconnect channels is explained on page 16.

Figure 10 shows interconnect wrappers with four input and four output ports on each of their edges. All of them contain PEs with four input and four output ports. Here, the interconnect topology represents the interconnect domain d2 in Figure 5. The code for this interconnect topology is listed below.

Example 4.4 Definition of the interconnect topology.

Figure 10: The interconnect topology of the PEs from interconnect domain d2 in Figure 5.

    <ICDomain name="d2">
      <Interconnect type="static">
                            <!--Side Outputs        -->
                            <!--NNNNEEEESSSSWWWWPPPP-->
        <AdjacencyMatrix>   <!--01230123012301230123-->
          <NInput  idx="0" row="00000000000010001000"/>
          <NInput  idx="1" row="00000000000000000100"/>
          <EInput  idx="0" row="00000000000000000010"/>
          <EInput  idx="1" row="00000000000000000001"/>
          <POutput idx="0" row="00000000000001000000"/>
          <POutput idx="1" row="00000000100000000000"/>
          <POutput idx="2" row="00000000010000000000"/>
          <POutput idx="3" row="00000001000000000000"/>
        </AdjacencyMatrix>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row=" 1  0"/>
        <MatrixA row="-1  0"/>
        <MatrixA row=" 0  1"/>
        <MatrixA row=" 0 -1"/>
        <VectorB value=" 3"/>
        <VectorB value="-6"/>
        <VectorB value=" 1"/>
        <VectorB value="-4"/>
      </ElementsPolytopeRange>
    </ICDomain>
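To make the row strings of Example 4.4 easier to read, the following Python sketch translates a row of the adjacency matrix into the list of columns it connects to. The helper `decode_row` and the column labels (side letter plus index, with `P` for the PE input ports) are hypothetical names chosen for this illustration; the column order N, E, S, W, P follows the comment header of the example.

```python
def decode_row(row, ports_per_side=4, pe_inputs=4):
    """Return the column labels whose entry in an IAM row string is '1'."""
    labels = [f"{side}{k}" for side in "NESW" for k in range(ports_per_side)]
    labels += [f"P{k}" for k in range(pe_inputs)]   # last columns: PE inputs
    if len(row) != len(labels):
        raise ValueError("row length does not match the number of columns")
    return [label for label, bit in zip(labels, row) if bit == "1"]


# Two rows taken from Example 4.4, keyed by (element name, idx).
d2_rows = {
    ("NInput", 0): "00000000000010001000",
    ("POutput", 1): "00000000100000000000",
}
connections = {source: decode_row(row) for source, row in d2_rows.items()}
```

Each entry of `connections` then lists the IW output ports and PE input ports that the given source port may drive.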

4.1.4 The <ElementsPolytopeRange> Subelement

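[Figure: a grid of PEs with axes i (1 to 8) and j (1 to 4) and the lines j = 1, j = 4, and j = i. The PEs of the triangular domain D1 are marked c2, and the PEs of the square domain D2 are marked c3.]

Domain D1:

$$\begin{pmatrix} i \\ j \end{pmatrix} = \left\{ \begin{pmatrix} i \\ j \end{pmatrix} \in \mathbb{Z}^2 \;\middle|\; \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix} \wedge \begin{pmatrix} 1 & 0 \\ 0 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \leq \begin{pmatrix} 4 \\ -1 \\ 0 \end{pmatrix} \right\}$$

Domain D2:

$$\begin{pmatrix} i \\ j \end{pmatrix} = \left\{ \begin{pmatrix} i \\ j \end{pmatrix} \in \mathbb{Z}^2 \;\middle|\; \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 5 \\ 1 \end{pmatrix} \wedge \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \leq \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} \right\}$$

Figure 11: Polytope domains representation.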

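The <ElementsPolytopeRange> subelement is used to define a subset of PEs that are grouped together to form one domain. The set of PEs is defined by the points of an integer lattice as follows:

$$\begin{pmatrix} i \\ j \end{pmatrix} = \left\{ \begin{pmatrix} i \\ j \end{pmatrix} \in \mathbb{Z}^2 \;\middle|\; \begin{pmatrix} i \\ j \end{pmatrix} = L \cdot \begin{pmatrix} x \\ y \end{pmatrix} + m \;\wedge\; A \cdot \begin{pmatrix} x \\ y \end{pmatrix} \leq b \right\}$$

Here, $A \cdot \begin{pmatrix} x \\ y \end{pmatrix} \leq b$ describes a polytope which is affinely transformed by $L \cdot \begin{pmatrix} x \\ y \end{pmatrix} + m$.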

An example of this concept is shown in Figure 11. The processor array contains two domains of processor elements: domain D1 has a triangular shape, and domain D2 is a set of PEs placed in the shape of a square.

The MAML code listed below describes the polytope domain D2 in Figure 11.

Example 4.5 Characterization of Polytope domain.

    <MatrixL row="2 0"/>
    <MatrixL row="0 2"/>
    <VectorM value="5"/>
    <VectorM value="1"/>
    <MatrixA row=" 1  0"/>
    <MatrixA row="-1  0"/>
    <MatrixA row=" 0  1"/>
    <MatrixA row=" 0 -1"/>
    <VectorB value="1"/>
    <VectorB value="0"/>
    <VectorB value="1"/>
    <VectorB value="0"/>

If matrix L and vector m are not specified, they default to the identity matrix and the zero vector, respectively.
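The lattice definition can be checked with a small brute-force enumeration. The Python sketch below is an illustration only: the function name `polytope_domain` and the bounded `search` range are assumptions of this sketch, and the polytope is evaluated as A·(x, y) ≤ b, as in the set definition of this section. The inputs are the matrices and vectors given for the domains D1 and D2 of Figure 11.

```python
from itertools import product


def polytope_domain(A, b, L=((1, 0), (0, 1)), m=(0, 0), search=range(-16, 17)):
    """Enumerate the PEs (i, j) = L*(x, y) + m for integer (x, y) with A*(x, y) <= b.

    `search` bounds the candidate x and y values. It is an assumption of this
    sketch and must be large enough to cover the whole polytope.
    """
    points = set()
    for x, y in product(search, repeat=2):
        if all(ax * x + ay * y <= bk for (ax, ay), bk in zip(A, b)):
            i = L[0][0] * x + L[0][1] * y + m[0]
            j = L[1][0] * x + L[1][1] * y + m[1]
            points.add((i, j))
    return sorted(points)


# Domain D2 (Example 4.5): a 2 x 2 square, scaled by 2 and shifted by (5, 1).
D2 = polytope_domain(A=[(1, 0), (-1, 0), (0, 1), (0, -1)], b=[1, 0, 1, 0],
                     L=[(2, 0), (0, 2)], m=(5, 1))

# Domain D1: a triangle; L and m are omitted, so the defaults apply.
D1 = polytope_domain(A=[(1, 0), (0, -1), (-1, 1)], b=[4, -1, 0])
```

With these inputs, D2 consists of four PEs spaced two positions apart, and D1 is the triangle bounded by x ≤ 4, y ≥ 1, and y ≤ x.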

4.1.5 The <ElementAt> Subelement

The <ElementAt> subelement is used to select a single PE by the index of its row and column in the processor array. <ElementAt> contains two corresponding attributes: row and column.

4.1.6 The <ElementsDomain> Subelement

The <ElementsDomain> subelement selects the subset of PEs already specified in another domain. The instance attribute specifies the name of this domain instance. Recursive definition is not allowed here.
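How a tool might resolve such domain references and reject recursive definitions can be sketched as follows. This Python fragment is a hypothetical illustration, not the behavior of any existing MAML tool: the data layout (a dictionary mapping a domain name to a list of parts) and the function name `resolve` are assumptions, and only the default addition of subsets is modelled.

```python
def resolve(name, domains, _active=()):
    """Return the set of PEs selected by the domain `name`.

    `domains` maps a domain name to a list of parts. Each part is either a set
    of (row, column) tuples or the name of another domain, which stands for an
    ElementsDomain reference.
    """
    if name in _active:
        chain = " -> ".join(_active + (name,))
        raise ValueError("recursive domain definition: " + chain)
    pes = set()
    for part in domains[name]:
        if isinstance(part, str):                      # reference to a domain
            pes |= resolve(part, domains, _active + (name,))
        else:                                          # explicit set of PEs
            pes |= part
    return pes


domains = {
    "a": [{(1, 1)}],
    "b": ["a", {(2, 2)}],   # b reuses the PEs of a
    "c": ["c"],             # illegal: c refers to itself
}
pes_of_b = resolve("b", domains)
```

Resolving "c" raises a `ValueError`, since the definition refers back to itself.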

4.1.7 The PE Class Domain <ClassDomain> Element

<ClassDomain> specifies the set of processor elements with the same architectural structure (PE class). The processor elements within a class domain are subsets of the PEs defined by the <PElements> element. The processor elements to be included in the PE class domain are selected in the same manner as in the definition of the interconnect domain (see Section 4.1.3). The structure of the <ClassDomain> element is depicted in Figure 12.

<ClassDomain> contains the attributes name, peclass, and selection. The name attribute gives a name to the domain. The peclass attribute specifies the class name of the PE architecture for all PEs in this domain. This name must match the name of a PE class specified in the PE-level section of the MAML description. Classes of a PE architecture are described in Section 4.2.

Figure 12: The <ClassDomain> element.

The subelements <ElementsPolytopeRange>, <ElementsDomain>, and <ElementAt> are the same as for the <ICDomain> element (see Section 4.1.3).

4.2 PE-Level Architecture Specification

A schematic view of a possible WPPE is depicted in Figure 2. The processor element consists of three register files: input registers regI, output registers regO, and general purpose registers regGP. As special registers, it also contains a program counter pc and a set of flag registers from the register bank regFlags. The registers of regI are named i plus an index from 0 to 3, as there are four input registers. The registers which belong to regO are referred to as o0 and o1 in the same manner. The registers of regGP are referred to as r0..r15. The flag registers are named f0 and f1. Input and output registers are connected to the input and output ports {ip0, ip1, ip2, ip3} and {op0, op1}, respectively. Reading from and writing to the registers of any register bank takes place through the read and write ports (rPort, wPort).

A WPPE can have one or several functional units (FUs) of the same or different ALU types, which can increase the performance of a WPPE by executing VLIW instructions in parallel. FUs with a wide word width might also be configured to provide sub-word parallelism (SWP). SWP allows the parallel execution of equal instructions with low data word width (e.g., four additions with 16 bit data) on functional units with high data word width (e.g., a 64 bit adder). The execution of complex instructions (e.g., multiply and add, $y = \sum_i a_i \cdot b_i$) with multiple data inputs (e.g., 4 data pairs with 16 bit word width) and a single data output (e.g., 1 data word with 64 bit word width) is also possible. Consequently, the type of SWP (operation and number of sub-words) has to be given in the description of each FU in order to enable the modeling of SWP instructions.

In most cases, the sub-words which are packed together in full-length words have to be rearranged for the next calculation. Therefore, additional instructions, called packing instructions, are needed. The packing instructions can be added to the instruction set of already given FUs, or they can be implemented as the instruction set of a dedicated packing FU. As the sub-words are stored in words of full length and the rearranging is done within FUs, no additional characterization of the registers involved in SWP is needed.
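The idea of sub-word parallelism can be illustrated with a small Python model of four 16 bit additions carried out on 64 bit words. The helper names `pack`, `unpack`, and `swp_add` are hypothetical. The sketch further assumes that sub-word 0 occupies the least significant bits and that each lane wraps around on overflow; neither detail is prescribed by the text above.

```python
def pack(values, width=16):
    """Pack a list of sub-word values into one full-length word."""
    mask = (1 << width) - 1
    word = 0
    for k, value in enumerate(values):
        word |= (value & mask) << (k * width)   # sub-word 0 in the lowest bits
    return word


def unpack(word, subwords=4, width=16):
    """Split a full-length word into its sub-words."""
    mask = (1 << width) - 1
    return [(word >> (k * width)) & mask for k in range(subwords)]


def swp_add(a, b, subwords=4, width=16):
    """Add two packed words lane by lane; no carry crosses a lane boundary."""
    mask = (1 << width) - 1
    result = 0
    for k in range(subwords):
        shift = k * width
        lane = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane & mask) << shift        # wrap around within the lane
    return result


total = unpack(swp_add(pack([1, 2, 3, 0xFFFF]), pack([10, 20, 30, 1])))
```

A plain 64 bit addition would instead let an overflow of one sub-word carry into its neighbor, which is why the lanes are masked individually.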

An instruction decoder decodes the processor instructions stored in the instructionmemory and drives the multiplexors and demultiplexors which select registers andregister banks for the source and target (result) operands of FUs.

In order to provide a complete design flow, from the architecture specification through to compiler generation, the results of compilation must be represented in binary code, so that this binary code can be used as stimuli for the simulation of the WPPE architecture. For this purpose, the MAML description includes a binary image coding of the instructions.

The internal structure of a PE is described in the PE-level architecture specification section of MAML. The architectural properties of PEs are defined by so-called PE-classes. The properties of one class can be instantiated on a single PE as well as on a set of PEs. A PE-class can extend or implement another, previously defined PE-class, which makes the MAML code more compact. PE-classes are defined by the <PEClass> element.

4.2.1 The <PEClass> Element

The <PEClass> element specifies the internal architecture of a PE or of a set of processor elements (PE-class) within a massively parallel processor architecture. It covers the following architectural issues:

• Characterization of I/O ports (bitwidth, input/output/bidirectional, control path or data path, etc.),

• Internal resources (internal read/write ports, FUs, buses, etc.),

• Storage elements (data or control registers, local memories, instruction memory, FIFOs, feed-back FIFOs, register files, etc.),

• Resource mapping (interconnection of the ports with internal elements),

• Instructions (instruction coding, functionality, SWP), and

• Functional units (resource usage, pipeline, etc.).

The structure of the <PEClass> element is shown in Figure 13. MAML allows the specification of multiple PE-classes, where one PE-class can extend or implement another, previously defined PE-class. This feature enables inheritance among PE-classes in the architecture specification.

Figure 13: The <PEClass> element.

The <PEClass> element contains the following attributes:

• name

• implements

The name attribute names the PE-class. The implements attribute gives the name of another PE-class, all of whose subelements and parameters are copied to the current PE-class. Any subelement described again in the body of the current class overwrites the corresponding inherited subelement. The implements attribute can be omitted, which means that the PE-class is constructed from scratch.

Example 4.6 PE-classes inheritance in MAML.

    <PEClass name="def_c">
      <Resources>...
      <StorageElements>...
      <Opnames>...
      <Operations>...
      <Units>...
      <Resmap>...
    </PEClass>
    <PEClass name="def_c2" implements="def_c">
      ...
    </PEClass>
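The copy-and-overwrite semantics of the implements attribute can be modelled in a few lines of Python. The following is a sketch under assumptions: the dictionary layout and the function name `resolve_class` are hypothetical, and the subelement contents are placeholder strings rather than real MAML.

```python
def resolve_class(name, classes):
    """Return the effective subelements of the PE-class `name`.

    `classes` maps a class name to a dict with an optional "implements" key
    and a "body" dict that maps subelement names to their content.
    """
    cls = classes[name]
    parent = cls.get("implements")
    # Start from a copy of everything the implemented class provides ...
    merged = dict(resolve_class(parent, classes)) if parent else {}
    # ... then let the subelements of the current class overwrite it.
    merged.update(cls["body"])
    return merged


classes = {
    "def_c": {"body": {"Resources": "resources of def_c",
                       "Units": "units of def_c"}},
    "def_c2": {"implements": "def_c",
               "body": {"Units": "units of def_c2"}},
}
effective = resolve_class("def_c2", classes)
```

In this model, def_c2 keeps the resources of def_c but replaces its units.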

The <PEClass> element contains the following subelements:

• <IOPorts>

• <Resources>

• <StorageElements>

• <Resmap>

• <Opnames>

• <Operations>

• <Units>

They are described in the following sections in detail.

4.2.2 The <IOPorts> Element

The <IOPorts> section specifies the input and output ports of the processor element. The ports are connected to the inputs and outputs of the interconnect wrapper. The <Port> element specifies a certain I/O port. It contains the following attributes:

• name

• bitwidth

• direction

• type

The attributes define the name, bitwidth, direction (in, out, or inout), and type (data or ctrl) of a PE port, respectively.

Example 4.7 Definition of PE I/O ports.

    <!-- Data ports -->
    <Port name="ip0" bitwidth="32" direction="in" type="data"/>
    <Port name="ip1" bitwidth="32" direction="in" type="data"/>
    <Port name="op0" bitwidth="32" direction="out" type="data"/>
    <Port name="op1" bitwidth="32" direction="out" type="data"/>
    <!-- Control ports -->
    <Port name="ic0" bitwidth="1" direction="in" type="ctrl"/>
    <Port name="ic1" bitwidth="1" direction="in" type="ctrl"/>
    <Port name="oc0" bitwidth="1" direction="out" type="ctrl"/>
    <Port name="oc1" bitwidth="1" direction="out" type="ctrl"/>

Figure 14: The <StorageElements> element.

4.2.3 The <Resources> Element

The communication components (i.e., read/write ports, buses, ...) of the internal architecture of a PE are described by the <Resources> element. Each resource has the following attributes:

• name

• num

The name attribute names the communication resource, and the num attribute sets its quantity. In the case of a bus resource, the num attribute sets the width of the bus.

Example 4.8 Definition of PE internal resources.

    <Resource name="rPort1" num="4"/>
    <Resource name="rPort2" num="4"/>
    <Resource name="rPort3" num="4"/>
    <Resource name="wPort1" num="3"/>
    <Resource name="wPort2" num="3"/>
    <Resource name="BUS1" num="8"/>
    <Resource name="BUS2" num="8"/>

4.2.4 The <StorageElements> Element

The <StorageElements> element specifies the storage components (register files, separate registers, local memory, instruction memory, FIFOs) of the internal architecture of a PE. Its general structure is shown in Figure 14.

Figure 15: A schematic view of a feed-back FIFO.

The <StorageElements> element contains the following subelements:

• <Register> or <RegisterBank>

• <FIFO> or <FIFOBank>

• <FBFIFO> or <FBFIFOBank>

• <SpRegister>

• <PortMapping>

• <LocalMemory>

• <InstructionMemory>

The <Register> element defines a register by its name, bitwidth, and type (data or control), which are set by the attributes name, bitwidth, and type, respectively.

The <RegisterBank> element specifies a register bank (a set of registers) by the attributes name, number (number of registers in the register bank), bitwidth, type, and namespace. The name attribute gives a name to the register bank, whereas the namespace attribute defines a name space (a name without index) for all registers in the register bank.
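The relation between the namespace of a register bank and the names of its registers can be sketched as follows. The helper `register_names` is hypothetical, and the sketch assumes that the index is appended directly to the name space and starts at 0, in line with the register names r0..r15 used in Section 4.2.

```python
def register_names(namespace, number):
    """Names of the registers in a bank: the name space followed by an index."""
    return [f"{namespace}{k}" for k in range(number)]


# A bank with namespace "RD" and sixteen registers.
rdata = register_names("RD", 16)
```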

In order to distinguish between ordinary registers and FIFOs, the elements <FIFO>, <FIFOBank>, <FBFIFO>, and <FBFIFOBank> are defined. Each of them has the same attribute set as the elements <Register> and <RegisterBank>, plus the additional attribute depth, which specifies the depth of the FIFO. The <FBFIFO> and <FBFIFOBank> elements describe feed-back FIFOs, which can be useful for data reuse. A schematic view of a feed-back FIFO is shown in Figure 15.

<SpRegister> provides the specification of the special registers in a certain register or FIFO bank.

The instruction memory is separated from the local memory by the use of the element <InstructionMemory>, with the memory size (number of memory words) and bitwidth as attributes. The local memory is defined by the element <LocalMemory> in the same manner as the instruction memory.

The <PortMapping> element declares the direct connections between the registers or FIFOs and the I/O ports defined in the <IOPorts> element (see Section 4.2.2). Thus, the routing between the internal storage elements of different processor elements through the interconnect wrapper ports is established.

Example 4.9 Characterization of PE storage elements within MAML.

    <StorageElements>
      <!-- Control Path Registers -->
      <RegisterBank name="rCtrl" number="8" bitwidth="1"
                    type="ctrl" namespace="RC"/>
      <FIFOBank name="iFIFOCtrl" number="2" bitwidth="1" depth="4"
                type="ctrl" namespace="IC"/>
      <FBFIFOBank name="FBFIFOCtrl" number="4" bitwidth="1" depth="8"
                  type="ctrl" namespace="FC"/>

      <!-- Data Path Registers -->
      <RegisterBank name="rData" number="16" bitwidth="32"
                    type="data" namespace="RD"/>
      <FIFOBank name="iFIFOData" number="2" bitwidth="32" depth="4"
                type="data" namespace="ID"/>
      <SpRegister registername="OD" bankname="rData"
                  registernumber="14-15"/>
      <FBFIFOBank name="FBFIFOData" number="4" bitwidth="32" depth="8"
                  type="data" namespace="FD"/>

      <LocalMemory name="MEM" bitwidth="32" size="256"/>
      <InstructionMemory bitwidth="128" size="32"/>
      <PortMapping>
        <Bank name="iFIFOData" index="0" port="ip0"/>
        <Bank name="iFIFOData" index="1" port="ip1"/>
        <Bank name="rData" index="14" port="op0"/>
        <Bank name="rData" index="15" port="op1"/>

        <Bank name="iFIFOCtrl" index="0" port="ic0"/>
        <Bank name="iFIFOCtrl" index="1" port="ic1"/>
        <Bank name="rCtrl" index="14" port="oc0"/>
        <Bank name="rCtrl" index="15" port="oc1"/>
      </PortMapping>
    </StorageElements>

4.2.5 The <Resmap> Element

The <Resmap> element describes the resource mapping [Kru02]. This element assigns the read/write ports to the register banks, connects the functional units to the register banks through the buses, and defines the pipeline stages.

The assignment of read/write ports is set by the case_bank_of_get_rport and case_bank_of_get_wport subelements of the corresponding elements <Rport> and <Wport>. The attribute bank defines the name of the register or FIFO bank, and the rport or wport attribute selects the read or write port. The dependencies specified here are extracted when the keyword get_rport or get_wport is entered in the appropriate subelement of the <Operations> element (see Section 4.2.7).

The interconnection of functional units to the register banks or FIFOs through the buses is described in the element <Bus>. The dependencies specified here are extracted when the keyword get_bus is entered in the appropriate subelement of the <Operations> element.

The element <Unit_exe_stages> allocates the pipeline stages. The description of this element is optional. If there is no specification of this element in a MAML description, then the pipeline stages are allocated as shown in the following example.

Example 4.10 Resource mapping in MAML.

    <Resmap>
      <Rport>
        <case_bank_of_get_rport bank="regConst" rport="rPort2" />
        <case_bank_of_get_rport bank="regCtrl" rport="rPort3" />
        <case_bank_of_get_rport bank="regGP1" rport="rPort1" />
        <case_bank_of_get_rport bank="regPC" rport="rPort4" />
      </Rport>
      <Wport>
        <case_bank_of_get_wport bank="regCtrl" wport="wPort2" />
        <case_bank_of_get_wport bank="regGP1" wport="wPort1" />
        <case_bank_of_get_wport bank="regPC" wport="wPort3" />
      </Wport>
      <Bus>
        <case_unit_of name="alu_1">
          <case_bank_of_get_bus bank="regCtrl" bus="BUS1" />
          <case_bank_of_get_bus bank="regGP1" bus="BUS1" />
        </case_unit_of>
        <case_unit_of name="alu_2">
          <case_bank_of_get_bus bank="regCtrl" bus="BUS2" />
          <case_bank_of_get_bus bank="regGP1" bus="BUS2" />
        </case_unit_of>
      </Bus>
      <Unit_exe_stages>
        <Unit_exe_stage name="exe1">
          <case_unit_of_get_exeunit fu="alu_1" exeunit="exe1_alu_1" />
          <case_unit_of_get_exeunit fu="alu_2" exeunit="exe1_alu_2" />
        </Unit_exe_stage>
      </Unit_exe_stages>
    </Resmap>
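One way to read such a resource mapping is as a set of lookup tables. The Python sketch below parses a shortened copy of the listing above with the standard library's `xml.etree.ElementTree` and builds these tables. The function name `build_lookup` and the interpretation of get_rport, get_wport, and get_bus as simple dictionary lookups are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of a resource mapping, used as sample input.
RESMAP = """
<Resmap>
  <Rport>
    <case_bank_of_get_rport bank="regConst" rport="rPort2" />
    <case_bank_of_get_rport bank="regGP1" rport="rPort1" />
  </Rport>
  <Wport>
    <case_bank_of_get_wport bank="regGP1" wport="wPort1" />
  </Wport>
  <Bus>
    <case_unit_of name="alu_1">
      <case_bank_of_get_bus bank="regGP1" bus="BUS1" />
    </case_unit_of>
  </Bus>
</Resmap>
"""


def build_lookup(resmap_xml):
    """Return (rport, wport, bus) lookup tables for a <Resmap> description."""
    root = ET.fromstring(resmap_xml)
    rport = {e.get("bank"): e.get("rport")
             for e in root.iterfind("Rport/case_bank_of_get_rport")}
    wport = {e.get("bank"): e.get("wport")
             for e in root.iterfind("Wport/case_bank_of_get_wport")}
    # A bus depends on both the functional unit and the register bank.
    bus = {(unit.get("name"), e.get("bank")): e.get("bus")
           for unit in root.iterfind("Bus/case_unit_of")
           for e in unit.iterfind("case_bank_of_get_bus")}
    return rport, wport, bus


rport, wport, bus = build_lookup(RESMAP)
```

With these tables, the read port of a bank is `rport[bank]`, and the bus between a functional unit and a bank is `bus[(unit, bank)]`.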

    <Opname name="name of the operation" code="machine-code">
      <functionDescription>
        <inputData namespace="name" number="number">
          <word subwords="number">
            <subword sign="<yes,no>" type="<int,float>"
                     width="Bits" mantissa="Bits"/>
          </word>
          ...
        </inputData>
        <outputData namespace="name" number="number">
          <word subwords="number">
            <subword sign="<yes,no>" type="<int,float>"
                     width="Bits" mantissa="Bits"/>
          </word>
          ...
        </outputData>
        <outputFunction>
          <function pos="pos"> output function in C </function>
          ...
        </outputFunction>
      </functionDescription>
      <functionDescription>
        ...
      </functionDescription>
      ...
    </Opname>

Figure 16: Description of the function(s) of an operation.

4.2.6 The <Opnames> Element

All operations and their binary image coding are listed by the <Opnames> element. The operation name is specified by the attribute name, and the binary image coding of the operation is set by the code attribute.

Example 4.11 All operations of the architecture should be listed.

    <Opnames>
      <Opname name="and" code="00001">
        ...
      </Opname>
      <Opname name="or" code="00010">
        ...
      </Opname>
      <Opname name="shl" code="00011">
        ...
      </Opname>
      ...
    </Opnames>

The functional description of each operation (<Opname>) is also given here, as the sub-element <functionDescription>. A general overview of the functional description is given in Figure 16.

The functional description takes the use of SWP into account. Therefore, the description of the input and output data scheme (<inputData>, <outputData>) is needed in addition to the output function <outputFunction>. The number of input and output data words, the number of sub-words in a data word, and the type of the sub-words are described in the elements <inputData> and <outputData>. The element <outputFunction> describes the output function for each sub-word or for a set of sub-words.

The syntax of the element <Opnames> and its sub-elements is given in the DTD style as follows:

    <!ELEMENT Opnames (Opname+)>
    <!ELEMENT Opname (functionDescription+)>
    <!ATTLIST Opname name CDATA #REQUIRED>
    <!ATTLIST Opname code CDATA #REQUIRED>

    <!ELEMENT functionDescription (inputData, outputData, outputFunction)>

    <!ELEMENT inputData (word*)>
    <!ATTLIST inputData namespace CDATA "op" number CDATA "2">

    <!ELEMENT outputData (word*)>
    <!ATTLIST outputData namespace CDATA "res" number CDATA "1">

    <!ELEMENT word (subword+)>
    <!ATTLIST word subwords CDATA "1">

    <!ELEMENT subword EMPTY>
    <!ATTLIST subword sign (yes|no) "yes">
    <!ATTLIST subword type (int|float) "int">
    <!ATTLIST subword width CDATA #REQUIRED>
    <!ATTLIST subword mantissa CDATA #IMPLIED>

    <!ELEMENT outputFunction (function+)>
    <!ELEMENT function (#PCDATA)>
    <!ATTLIST function sw CDATA "">

• <Opnames>: Description of all operations.

• <Opname>: Description of one operation.

– name: Name of the operation. The name has to be unique.

– code: Machine code for the operation. The code has to be unique.

• <functionDescription>: Description of the instruction functionality. This includes the input and output data scheme (inputData, outputData) and the output function (outputFunction). Depending on the input/output scheme, several descriptions can be given for a SWP instruction.

• <inputData>: Description of one or more input data words. If all input data words are equal, only one word (word) has to be described. Otherwise, each word has to be described separately.

– namespace: Name for input data words

– number: Number of input data words

• <outputData>: Description of one or more output data words. If all output data words are equal, only one word (word) has to be described. Otherwise, each word has to be described separately.

– namespace: Name for output data words

– number: Number of output data words

• <word>: Description of an input or output data word. If all sub-words are equal, only one sub-word (subword) has to be described. Otherwise, each sub-word (subword) has to be described separately.

– subwords: Number of sub-words in the data word

• <subword>: Description of one sub-word.

– sign: Is the sub-word signed or not? Only the values yes and no are allowed.

– type: Data type of the sub-word. Only the types int (integer) and float (floating point) are allowed.

– width: Number of bits for the storage of the sub-word.

– mantissa: Number of bits for the storage of the mantissa of a floating point value.

• <outputFunction>: Description of the output function. This can be different for each sub-word.

• <function>: Description of the output function for one or more sub-words. The function is described in C code. The input data are labeled with opX[sw] and the output data with resY[sw], where op and res are the name spaces for the input and output words, X is the index of the input word, Y is the index of the output word, and sw is the index of the sub-word. The counters for the words and sub-words start with "0".

– sw: Position of the sub-word(s) for which the function is described in this element. The position can be a single position (e.g., 2 or 5) or a range (e.g., 4-7). If this attribute is not given, the function is used for all output sub-words.
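The meaning of the sw attribute can be made concrete with a small parser. The Python sketch below is an illustration only: the function name `parse_sw` is hypothetical, and accepting a comma-separated list of positions and ranges is an assumption, since the description above only shows single positions and single ranges.

```python
def parse_sw(sw, subwords):
    """Expand an sw attribute such as "2" or "4-7" into sub-word indices.

    An empty or missing attribute selects all sub-words of the output word.
    """
    if not sw:
        return list(range(subwords))
    positions = []
    for part in sw.split(","):          # comma separation is an assumption
        part = part.strip()
        if "-" in part:
            first, last = (int(p) for p in part.split("-"))
            positions.extend(range(first, last + 1))   # ranges are inclusive
        else:
            positions.append(int(part))
    return positions


lower_half = parse_sw("0-1", subwords=4)
everything = parse_sw("", subwords=4)
```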

Example 4.12 Description of input and output data schemes.

• 2 input data without sub-words. The words are 64 bit floating point values.

    <inputData number="2">
      <word subwords="1">
        <subword sign="yes" type="float" width="64"/>
      </word>
    </inputData>

• 2 input data with 4 sub-words in each word. The sub-words are 16 bit signed integer words.

    <inputData number="2">
      <word subwords="4">
        <subword sign="yes" type="int" width="16"/>
      </word>
    </inputData>

• 1 output data with 3 sub-words. The first sub-word is a 16 bit signed integer word, the second and third sub-words are 8 bit signed integer words.

    <outputData number="1">
      <word subwords="3">
        <subword sign="yes" type="int" width="16"/>
        <subword sign="yes" type="int" width="8"/>
        <subword sign="yes" type="int" width="8"/>
      </word>
    </outputData>

Example 4.13 Operation output function definition.

• Normal addition of 2 word without SWP

<outputFunction><function> res0 = op0 + op1; </function>

</outputFunction>

• Parallel addition of 4 sub-words (16 bit)

<outputFunction><function> res0 = op0[i] + op1[i]; </function>

</outputFunction>

• Parallel addition with saturation of 8 sub-words (8 bit)

<outputFunction>
  <function>
    res0[sw] = ((temp = op0[sw] + op1[sw]) > 255) ? 255 : ((temp >= 0) ? temp : 0);
  </function>
</outputFunction>

• Pack operation in which the higher two sub-words are moved to the lower two sub-words. The higher two sub-words of the output data word are set to the higher two sub-words of the input data word.

<outputFunction>
  <function sw="0-1"> res0[sw] = op0[sw+2]; </function>
  <function sw="2-3"> res0[sw] = op0[sw]; </function>
</outputFunction>

• Multiplication and addition operation. The 4 sub-words of one input word are multiplied with the 4 sub-words of another input word. The results will be added in two pairs. The output data word has only two sub-words (see Figure 17).

<outputFunction>
  <function>
    res0[sw] = op0[2*sw]*op1[2*sw]+op0[2*sw+1]*op1[2*sw+1];
  </function>
</outputFunction>
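The per-sub-word semantics of the saturating addition and the pack operation from this example can be mimicked in a few lines of Python. This is a sketch under assumptions: words are modeled as plain lists of sub-word values with index 0 as the lowest position, and the function names are hypothetical.

```python
def saturating_add_u8(op0, op1):
    """Add sub-word by sub-word and clamp every result to the range 0..255."""
    return [min(255, max(0, a + b)) for a, b in zip(op0, op1)]


def pack_high_to_low(op0):
    """Pack operation on a word with four sub-words."""
    res0 = [0] * 4
    for sw in (0, 1):   # function with sw="0-1": res0[sw] = op0[sw+2]
        res0[sw] = op0[sw + 2]
    for sw in (2, 3):   # function with sw="2-3": res0[sw] = op0[sw]
        res0[sw] = op0[sw]
    return res0


print(saturating_add_u8([250, 1, 100, 200], [10, 2, 100, 200]))
print(pack_high_to_low([10, 11, 12, 13]))
```

Each <function> element corresponds to one loop over the positions selected by its sw attribute.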


[Figure 17 diagram: res0 = muladd(op0, op1), with the input words op0 = (op0,0, op0,1, op0,2, op0,3) and op1 = (op1,0, op1,1, op1,2, op1,3) and the output word res0 = (res0,0, res0,1), where $res_{0,i} = op_{0,2i} \cdot op_{1,2i} + op_{0,2i+1} \cdot op_{1,2i+1}$ for $i = 0, 1$.]

Figure 17: The parallel multiplication and addition operation on two input words divided into 4 sub-words. The result is one output word divided into 2 sub-words.
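The muladd operation of Figure 17 can be written out as a short Python sketch. Words are again modeled as lists of sub-word values, and the function name is chosen for illustration only.

```python
def muladd(op0, op1):
    """Multiply four sub-word pairs and add the products in two pairs."""
    if len(op0) != 4 or len(op1) != 4:
        raise ValueError("both input words must have 4 sub-words")
    # res0[i] = op0[2i]*op1[2i] + op0[2i+1]*op1[2i+1] for i = 0, 1
    return [op0[2 * i] * op1[2 * i] + op0[2 * i + 1] * op1[2 * i + 1]
            for i in range(2)]


print(muladd([1, 2, 3, 4], [5, 6, 7, 8]))  # [1*5 + 2*6, 3*7 + 4*8]
```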

4.2.7 The <Operations> Element

This element describes the resource usage of each instruction (operation). It defines how many cycles the operation occupies the functional unit, the direction of the operands, and the resource occupation in each cycle.

The attribute is exelength. The subelements are:

• <Opname>

• <Opdirection>

• <Input>

• <Execution>

• <Output>

The operations with the same parameters are grouped in operation sets under the subelement <Operationset>. The general structure of the <Operationset> element is shown in Figure 18.

The attribute exelength of the <Operationset> element specifies the number of cycles required for the execution of the operation. The resource occupation in every cycle is specified cycle by cycle by the elements <Input>, <Execution>, and <Output>. The operations which belong to one operation set are listed in the subelement <Opname> (see Section 4.2.6).

[Figure 18 diagram: an <Operationset> with the attribute exelength, containing the subelements <Opname>, <Opdirection>, <Input>, <Execution>, and <Output>.]

Figure 18: The <Operationset> element.

The subelement <Opdirection> specifies the direction of the operands, which is given by the direction attribute. The available values for this attribute are in, out, or inout.

The subelement <Input> describes the operation fetch phase. The attribute cycle specifies the cycle in which the fetch takes place, source describes which source operands are active, and the name attribute shows which resource is occupied. The value of name can be the direct name of the resource, the string nores (meaning no resource), or one of the keywords get_rport or get_bus. These keywords represent functional interrelationships that are described by the element <Resmap>. For a detailed explanation, see Section 4.2.5.

<Execution> describes the execution phase of the operations. The attribute cycle in the element <exeunit> specifies the cycle in which the resource given by the attribute name is occupied. For each cycle of the execution time (given by the attribute exelength of element <Operationset>), an element <exeunit> must be given. The resource name can be the execution unit name as given in Section 4.2.8 or the string nores (meaning no resource).

The subelement <Output> describes in which registers the results of the operation are stored. It has the same attributes as the <Input> element, with target taking the place of source (see Example 4.14). The only keywords in this case are get_wport and get_bus.

The integer attribute cycle starts at cycle=0. If reading the data requires extra cycles, then the first execution cycle (attribute cycle in element <exeunit>) is greater than zero. Likewise, storing the result at the target can require additional cycles. The following example explains the resource usage of multi-cycle operations.

Example 4.14 Resource usage of multi-cycle operations.

<Operationset exelength="4">
  <Opname name="mul" />
  <Opdirection direction="out" />
  <Opdirection direction="in" />
  <Opdirection direction="in" />
  <Input>
    <input cycle="0" name="get_rport" source="1" />
    <input cycle="0" name="BUS1" source="1" />
    <input cycle="0" name="get_rport" source="2" />
    <input cycle="0" name="BUS1" source="2" />
  </Input>
  <Execution>
    <exeunit cycle="0" name="exe1_alu_1" />
    <exeunit cycle="1" name="exe1_alu_1" />
    <exeunit cycle="2" name="exe2_alu_1" />
  </Execution>
  <Output>
    <output cycle="3" name="wPort1" target="1" />
    <output cycle="3" name="BUS2" target="1" />
  </Output>
</Operationset>

[Figure 19 diagram: resources plotted over time t (cycles 0 to 3): get_rport in cycle 0, exe1_alu_1 in cycles 0 and 1, exe2_alu_1 in cycle 2, wPort1 in cycle 3.]

Figure 19: Resource usage of multi-cycle operations.


Here, an operation set with only one multiplication operation, mul, is described. The operation occupies four cycles in total, cycles 0 to 3 (exelength=4), of which three are execution cycles. The operation mul takes two input operands and assigns the result to one output. Figure 19 shows the resource usage in each cycle of this operation. During the operation fetch phase (cycle 0), the input operands are read from the memory bank through the appropriate read port (see the <Resmap> element) via bus BUS1. The execution phase also starts in this cycle and takes three cycles. In the first two cycles (cycles 0 and 1), the pipeline stage exe1 of functional unit alu_1 is executed. The execution phase finishes in the next cycle (cycle 2) with the pipeline stage exe2 of functional unit alu_1, which takes only one cycle. In the following cycle (cycle 3), the result is written to the memory bank through the write port wPort1 via the bus BUS2.
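The per-cycle resource usage described above can be derived mechanically from the <Operationset> description. The following Python sketch builds such a reservation table with the standard library XML parser. The function name reservation_table and the dictionary layout are assumptions for illustration, and the keyword get_rport is left unresolved because the <Resmap> element is not modeled here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OPERATIONSET = """
<Operationset exelength="4">
  <Opname name="mul" />
  <Opdirection direction="out" />
  <Opdirection direction="in" />
  <Opdirection direction="in" />
  <Input>
    <input cycle="0" name="get_rport" source="1" />
    <input cycle="0" name="BUS1" source="1" />
    <input cycle="0" name="get_rport" source="2" />
    <input cycle="0" name="BUS1" source="2" />
  </Input>
  <Execution>
    <exeunit cycle="0" name="exe1_alu_1" />
    <exeunit cycle="1" name="exe1_alu_1" />
    <exeunit cycle="2" name="exe2_alu_1" />
  </Execution>
  <Output>
    <output cycle="3" name="wPort1" target="1" />
    <output cycle="3" name="BUS2" target="1" />
  </Output>
</Operationset>
"""


def reservation_table(xml_text):
    """Map each cycle to the sorted names of the resources occupied in it."""
    root = ET.fromstring(xml_text)
    table = defaultdict(set)
    for path in ("Input/input", "Execution/exeunit", "Output/output"):
        for element in root.findall(path):
            name = element.get("name")
            if name != "nores":  # "nores" means that no resource is occupied
                table[int(element.get("cycle"))].add(name)
    return {cycle: sorted(names) for cycle, names in sorted(table.items())}


for cycle, names in reservation_table(OPERATIONSET).items():
    print(cycle, names)
```

The resulting table lists the same occupation per cycle as Figure 19, with the buses BUS1 and BUS2 included.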

4.2.8 The <Units> Element

The <Units> element contains the set of <Unit> elements describing the different functional units of the PE. The general structure is shown in Figure 20.

[Figure 20 diagram: <Units> contains several <Unit> elements, each with the attributes name, number, and stages. Each <Unit> contains one or more <Unitopset> elements, which group <Opname> entries together with Read1, Read2, and Write.]

Figure 20: The <Units> element.

The attribute number specifies the number of identical functional units, and the attribute stages specifies the number of execution stages. The attribute name gives the unit a unique name. Each functional unit may contain any number of <Unitopset> elements. A <Unitopset> groups together the operations (<Opname> subelements) which are executable on this functional unit and which have the same source and destination register banks (given by the bankname attribute of the <Read1>, <Read2>, and <Write> subelements). If no register bank is assigned to an operand, the string norb should be given as the attribute value.


Example 4.15 Definition of functional units.

<Unit name="LoadStoreUnit" number="2" stages="3">
  <Unitopset>
    <Opname name="loadw" />
    <Read1 bankname="regGP1" />
    <Read2 bankname="norb" />
    <Write bankname="regGP1" />
  </Unitopset>
  <Unitopset>
    <Opname name="const" />
    <Read1 bankname="norb" />
    <Read2 bankname="norb" />
    <Write bankname="regGP1" />
  </Unitopset>
  <Unitopset>
    <Opname name="storew" />
    <Read1 bankname="regGP1" />
    <Read1 bankname="regConst" />
    <Read2 bankname="regGP1" />
    <Write bankname="norb" />
  </Unitopset>
  <Unitopset>
    <Opname name="move" />
    <Read1 bankname="regGP1" />
    <Read1 bankname="regConst" />
    <Read1 bankname="regCtrl" />
    <Read1 bankname="regPC" />
    <Read2 bankname="norb" />
    <Write bankname="regGP1" />
    <Write bankname="regCtrl" />
    <Write bankname="regPC" />
  </Unitopset>
</Unit>
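The execution units of such a functional unit can be enumerated from its name, number, and stages attributes. The Python sketch below shows one possible naming scheme. The underscore-separated pattern exe<stage>_<unit>_<index> is inferred from names such as exe1_alu_1 in Example 4.14; the per-instance key <unit>_<index> and the function name are assumptions for illustration.

```python
def execution_unit_names(name, number, stages):
    """Return {unit instance: [execution unit names, one per pipeline stage]}."""
    return {
        f"{name}_{unit}": [f"exe{stage}_{name}_{unit}"
                           for stage in range(1, stages + 1)]
        for unit in range(1, number + 1)
    }


# Values taken from <Unit name="LoadStoreUnit" number="2" stages="3">
for unit, exe_units in execution_unit_names("LoadStoreUnit", 2, 3).items():
    print(unit, exe_units)
```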

The execution unit names are generated from the name of a unit, the number of units, and the number of stages. These names are required in the element <Operationset> (see Section 4.2.7) to describe the usage of each pipeline stage during the execution of an operation. The names for our example of the LoadStoreUnit with 2 units and 3 pipeline stages are given in Table 1.

Functional unit     Included execution units
LoadStoreUnit_1     exe1_LoadStoreUnit_1
                    exe2_LoadStoreUnit_1
                    exe3_LoadStoreUnit_1
LoadStoreUnit_2     exe1_LoadStoreUnit_2
                    exe2_LoadStoreUnit_2
                    exe3_LoadStoreUnit_2

Table 1: Notation of the units and their execution units.

4.3 Simulation

In order to evaluate different processor architectures and to find out which of them optimally fulfills given requirements on, for example, speed, resource usage, and/or power consumption, a simulation of the whole parallel processor architecture is needed. The architecture simulation can be done at different levels. On the one hand, the register-transfer (RT) level generally enables very flexible and precise, but sometimes relatively slow, simulations. On the other hand, the instruction-set (IS) level typically allows a very fast simulation that is, however, less precise and flexible. A prominent feature of massively parallel processor architectures is a high number of registers. The high-speed RT-level simulation approach presented in [KHT04b] can be applied here, as it enables high-speed RTL simulation of complex architectures with a large number of registers, such as those found in processor arrays. This simulation methodology provides a direct automatic generation of the simulator from the given RTL netlist. Instruction-set-level (ISL) simulation, however, can typically run at much higher speeds. Therefore, in order to allow trade-offs between the speed of ISL simulation and the precision of RTL simulation, the simulation approach presented in [KHT04b] should be extended to efficient ISL simulation. Details on how to efficiently and accurately simulate complete processor array architectures are not covered here.

5 WPPA Design Flow in the Scope of the ArchitectureComposer Framework

A WPPA architecture does not have to be written manually in MAML. The WPPA-Editor (see Figure 21) is a tool that extends the ArchitectureComposer framework ([TKW00], [FTTW02], [FTTW03]). First, the user creates a library of different parameterizable WPPEs in the WPPE Parameterizer (see Figure 22). Here, the set of parameterizable WPPE architectures is defined by architectural characteristics such as register files, local memory usage, FUs, input/output FIFOs, instruction memory size, etc. Once the library of WPPEs is complete, the user switches to the WPPA Editor, where the processor architecture is specified at the higher level of processor elements, array, and interconnect. The user defines a WPPA in the form of rows and columns, specifies the type of WPPEs from the WPPE library or creates a new WPPE architecture in a PE-Parameterizer window, and defines an interconnect topology; the array of WPPEs is then generated automatically. If more than one type of WPPE is used in a WPPA architecture, the user additionally specifies the rows, columns, or direct positions of the WPPEs of these additional types. Once the WPPA architecture design is finished, the user can enter the VLIW programs for each PE or PE class in a VLIW-Program Editor (see Figure 22), and the complete corresponding synthesizable RTL netlist and the corresponding MAML architectural description are extracted automatically. The generated RTL netlist can be used for high-speed, bit-true, cycle-accurate RASIM simulation or for the generation of corresponding VHDL code with further synthesis.

Figure 21: WPPA-Editor tool.


Figure 22: Specifying the storage elements and VLIW-program of a PE in the PE-Parameterizer window.

6 Conclusions and Future Work

In this technical report, we proposed an architecture description language (ADL) called MAML for the systematic characterization, modeling, simulation, and evaluation of massively parallel reconfigurable processor architectures that are designed for special-purpose applications from the domain of embedded systems. Key features, semantics, and technical innovations of the ADL MAML for regular processor architectures were presented. The usability of the proposed modeling approach was shown on numerous small examples. In the future, the language will be integrated into our framework ArchitectureComposer [FTTW03]. Further extensions of this framework are also planned: a VHDL back-end generator for parallel processor array architectures, and an efficient simulation and visualization environment (an extension of the RASIM simulator [KHT04a]). This will allow rapid prototyping and validation of the mapping and compilation techniques we are currently developing in parallel for such array architectures.

[Figure 23 diagram: the WPP Architecture Manager with the WPPE Library Editor and the WPPA Editor. The WPPE architecture specification covers the number of GP registers, the number of input registers/FIFOs, the number of output registers, flag registers, instruction memory size, number of FUs, instruction coding, and width of input FIFOs. The WPPA architecture specification covers processor array size, interconnect topology, location of I/O ports, external memory size, memory levels, and interconnect reconfiguration parameters. ArchitectureComposer produces an RTL netlist and a MAML description for simulation and synthesis (RASIM, C code, SystemC, Verilog, VHDL, XASM); CompilerComposer with its compiler generator translates the application into binary code.]

Figure 23: Architecture Manager tool and the design flow of the ArchitectureComposer framework.

The following issues are left untouched in this report and will be specified in Part II of this report in the near future:

• Modeling of configuration and dynamic reconfiguration support for massively parallel processor architectures,

• Modeling of I/O ports and methods (streaming ports, addressable external memory), and

• Modeling of global/distributed memory.

Finally, we will present ideas on how to generate simulators of the highest speed that are still as accurate as RTL simulation, in order to validate multi-processor architectures.


A MAML Document Type Definition

<?xml version="1.0" encoding="UTF-8"?>

<!-- DTD for WPPA description with MAML -->
<!-- VERSION_3 -->

<!-- Main element maml -->
<!ELEMENT maml (ProcessorArray, PEClass+)>
<!ATTLIST maml name CDATA #IMPLIED>

<!-- Element ProcessorArray -->
<!ELEMENT ProcessorArray (PElements, PEInterconnectWrapper, ICDomain+, ClassDomain+)>
<!ATTLIST ProcessorArray
    name    CDATA #REQUIRED
    version CDATA #IMPLIED
>

<!-- Element PElements -->
<!ELEMENT PElements EMPTY>
<!ATTLIST PElements
    name    CDATA #REQUIRED
    rows    CDATA #REQUIRED
    columns CDATA #REQUIRED
>

<!-- Element PEInterconnectWrapper -->
<!ELEMENT PEInterconnectWrapper (Channels, PElementPorts)>
<!ELEMENT Channels (Southward, Northward, Eastward, Westward)>
<!ELEMENT Southward EMPTY>
<!ATTLIST Southward index CDATA #REQUIRED bitwidth CDATA #REQUIRED>
<!ELEMENT Northward EMPTY>
<!ATTLIST Northward number CDATA #REQUIRED bitwidth CDATA #REQUIRED>
<!ELEMENT Eastward EMPTY>
<!ATTLIST Eastward number CDATA #REQUIRED bitwidth CDATA #REQUIRED>
<!ELEMENT Westward EMPTY>
<!ATTLIST Westward number CDATA #REQUIRED bitwidth CDATA #REQUIRED>
<!ELEMENT PElementPorts (Inputs, Outputs)>
<!ELEMENT Inputs EMPTY>
<!ATTLIST Inputs number CDATA #REQUIRED>
<!ELEMENT Outputs EMPTY>
<!ATTLIST Outputs number CDATA #REQUIRED>

<!-- Element ICDomain -->
<!ELEMENT ICDomain (Interconnect, ElementsPolytopeRange*, ElementAt*, ElementsDomain*)>
<!ATTLIST ICDomain
    name      CDATA #REQUIRED
    selection (addition|subtraction|intersection) "addition"
>
<!ELEMENT Interconnect (AdjacencyMatrix)>
<!ATTLIST Interconnect type (static|dynamic) "static">
<!ELEMENT AdjacencyMatrix (NInput*, EInput*, SInput*, WInput*, POutput*)>
<!ELEMENT NInput EMPTY>
<!ATTLIST NInput idx CDATA #REQUIRED row CDATA #REQUIRED>
<!ELEMENT EInput EMPTY>
<!ATTLIST EInput idx CDATA #REQUIRED row CDATA #REQUIRED>
<!ELEMENT SInput EMPTY>
<!ATTLIST SInput idx CDATA #REQUIRED row CDATA #REQUIRED>
<!ELEMENT WInput EMPTY>


<!ATTLIST WInput idx CDATA #REQUIRED row CDATA #REQUIRED>
<!ELEMENT POutput EMPTY>
<!ATTLIST POutput idx CDATA #REQUIRED row CDATA #REQUIRED>

<!ELEMENT ElementsPolytopeRange (MatrixL*, VectorM+, MatrixA+, VectorB+)>
<!ELEMENT MatrixL EMPTY>
<!ATTLIST MatrixL row CDATA #REQUIRED>
<!ELEMENT VectorM EMPTY>
<!ATTLIST VectorM value CDATA #REQUIRED>
<!ELEMENT MatrixA EMPTY>
<!ATTLIST MatrixA row CDATA #REQUIRED>
<!ELEMENT VectorB EMPTY>
<!ATTLIST VectorB value CDATA #REQUIRED>

<!ELEMENT ElementAt EMPTY>
<!ATTLIST ElementAt row CDATA #REQUIRED column CDATA #REQUIRED>

<!ELEMENT ElementsDomain EMPTY>
<!ATTLIST ElementsDomain instance CDATA #REQUIRED>

<!-- ClassDomain -->
<!ELEMENT ClassDomain (ElementsPolytopeRange*, ElementAt*, ElementsDomain*)>
<!ATTLIST ClassDomain
    name      CDATA #REQUIRED
    peclass   CDATA #REQUIRED
    selection (addition|subtraction|intersection) "addition"
>

<!-- PEClass Element -->
<!ELEMENT PEClass (IOPorts, Resources, StorageElements, Opnames,
                   Operations, Units, Resmap, Transport?)>
<!ATTLIST PEClass
    name       CDATA #REQUIRED
    implements CDATA #IMPLIED
>

<!-- Element IOPorts -->
<!ELEMENT IOPorts (Port*)>

<!ELEMENT Port EMPTY>
<!ATTLIST Port
    name      CDATA #REQUIRED
    bitwidth  CDATA #REQUIRED
    direction (in|out|inout) #REQUIRED
    type      (data|ctrl) "data"
>

<!-- Element Resource -->
<!ELEMENT Resources (Resource+)>

<!ELEMENT Resource EMPTY>
<!ATTLIST Resource
    name CDATA #REQUIRED
    num  CDATA #REQUIRED
>

<!-- StorageElements -->


<!ELEMENT StorageElements (Register*, RegisterBank*, FIFO*, FIFOBank*, FBFIFO*, FBFIFOBank*,
                           SpRegister*, LocalMemory*, InstructionMemory, PortMapping)>

<!ELEMENT Register EMPTY>
<!ATTLIST Register
    name     CDATA #REQUIRED
    bitwidth CDATA #REQUIRED
    type     (data|ctrl) "data"
>
<!ELEMENT RegisterBank EMPTY>
<!ATTLIST RegisterBank
    name      CDATA #REQUIRED
    number    CDATA #REQUIRED
    bitwidth  CDATA #REQUIRED
    type      (data|ctrl) "data"
    namespace CDATA #REQUIRED
>
<!ELEMENT FIFO EMPTY>
<!ATTLIST FIFO
    name     CDATA #REQUIRED
    bitwidth CDATA #REQUIRED
    depth    CDATA #REQUIRED
    type     (data|ctrl) "data"
>
<!ELEMENT FIFOBank EMPTY>
<!ATTLIST FIFOBank
    name      CDATA #REQUIRED
    number    CDATA #REQUIRED
    bitwidth  CDATA #REQUIRED
    depth     CDATA #REQUIRED
    type      (data|ctrl) "data"
    namespace CDATA #REQUIRED
>

<!ELEMENT FBFIFO EMPTY>
<!ATTLIST FBFIFO
    name     CDATA #REQUIRED
    bitwidth CDATA #REQUIRED
    depth    CDATA #REQUIRED
    type     (data|ctrl) "data"
>
<!ELEMENT FBFIFOBank EMPTY>
<!ATTLIST FBFIFOBank
    name      CDATA #REQUIRED
    number    CDATA #REQUIRED
    bitwidth  CDATA #REQUIRED
    depth     CDATA #REQUIRED
    type      (data|ctrl) "data"
    namespace CDATA #REQUIRED
>

<!ELEMENT SpRegister EMPTY>
<!ATTLIST SpRegister
    registername   CDATA #REQUIRED
    bankname       CDATA #REQUIRED
    registernumber CDATA #REQUIRED
>

<!ELEMENT LocalMemory EMPTY>


<!ATTLIST LocalMemory name CDATA #REQUIRED size CDATA #REQUIRED>

<!ELEMENT InstructionMemory EMPTY>
<!ATTLIST InstructionMemory name CDATA #REQUIRED size CDATA #REQUIRED>

<!ELEMENT PortMapping (Bank*, Element*)>
<!ELEMENT Bank EMPTY>
<!ATTLIST Bank
    name  CDATA #REQUIRED
    index CDATA #REQUIRED
    port  CDATA #REQUIRED
>
<!ELEMENT Element EMPTY>
<!ATTLIST Element
    name CDATA #REQUIRED
    port CDATA #REQUIRED
>

<!-- Opnames: Element Opname:                          -->
<!--   All operations available in the instruction set -->

<!ELEMENT Opnames (Opname+)>
<!ELEMENT Opname (functionDescription+)>
<!ATTLIST Opname name CDATA #REQUIRED>
<!ATTLIST Opname code CDATA #IMPLIED>

<!ELEMENT functionDescription (inputData, outputData, outputFunction)>

<!ELEMENT inputData (word*)>
<!ATTLIST inputData namespace CDATA "op" number CDATA "2">

<!ELEMENT outputData (word*)>
<!ATTLIST outputData namespace CDATA "res" number CDATA "1">

<!ELEMENT word (subword+)>
<!ATTLIST word subwords CDATA "1">

<!ELEMENT subword EMPTY>
<!ATTLIST subword sign (yes|no) "yes">
<!ATTLIST subword type (int|float) "int">
<!ATTLIST subword width CDATA #REQUIRED>
<!ATTLIST subword mantissa CDATA #IMPLIED>

<!ELEMENT outputFunction (function+)>
<!ELEMENT function (#PCDATA)>
<!ATTLIST function pos CDATA "">

<!-- Operations:                                                     -->
<!--   Element Operationset: groups all operations that have         -->
<!--     Opdirection and Resrequire in common                        -->
<!--   Element Opdirection: defines the direction of the operation   -->
<!--   Element Resrequire: specifies the required resources and      -->
<!--     the cycles for one Operationset                             -->
<!--   Element Zyklus: indicates whether a new cycle begins or not   -->
<!--   Element Exeunit: specifies the execution unit or an           -->
<!--     execution stage                                             -->
<!--   Element Input: specifies a bus and read ports, or the         -->
<!--     functions get_bus/get_rport that are defined in Resmap      -->
<!--   Element Output: equivalent to Input                           -->

<!ELEMENT Operations (Operationset+)>

<!ELEMENT Operationset (Opname+, Opdirection*, Input, Execution, Output)>
<!ATTLIST Operationset exelength CDATA #REQUIRED>

<!ELEMENT Opdirection EMPTY>
<!ATTLIST Opdirection direction (in|out|inout) #REQUIRED>

<!ELEMENT Input (input*)>

<!ELEMENT Execution (exeunit+)>

<!ELEMENT Output (output+)>

<!ELEMENT input EMPTY>
<!ATTLIST input
    cycle  CDATA #REQUIRED
    name   CDATA #REQUIRED
    source CDATA #REQUIRED
>
<!-- name: reference to Resources -->

<!ELEMENT exeunit EMPTY>
<!ATTLIST exeunit
    cycle CDATA #REQUIRED
    name  CDATA #REQUIRED
>
<!-- name: reference to Resources or Resmap -->

<!ELEMENT output EMPTY>
<!ATTLIST output
    cycle  CDATA #REQUIRED
    name   CDATA #REQUIRED
    target CDATA #REQUIRED
>
<!-- name: reference to Resources -->

<!-- Units:                                                          -->
<!--   Element Unit: specifies the functional units and the          -->
<!--     operations they can execute                                 -->
<!--   Element Operands: specifies from which register banks the     -->
<!--     operands are read and where the result is written           -->

<!ELEMENT Units (Unit+)>

<!ELEMENT Unit (Unitopset+)>
<!ATTLIST Unit
    name   CDATA #REQUIRED
    number CDATA #REQUIRED
    stages CDATA #REQUIRED
>

<!ELEMENT Unitopset ((Opname+, Read1+, Read2+, Write+)+)>


<!ELEMENT Read1 EMPTY>
<!ATTLIST Read1 bankname CDATA #REQUIRED>
<!-- reference to Registerbank -->

<!ELEMENT Read2 EMPTY>
<!ATTLIST Read2 bankname CDATA #REQUIRED>
<!-- reference to Registerbank -->

<!ELEMENT Write EMPTY>
<!ATTLIST Write bankname CDATA #REQUIRED>
<!-- reference to Registerbank -->

<!-- Resmap:                                                         -->
<!--   Element Rport: assigns the rport depending on the             -->
<!--     addressed register bank                                     -->
<!--   Element Wport: equivalent to Rport                            -->
<!--   Element Bus: assigns the bus depending on the ...             -->

<!ELEMENT Resmap (Rport?, Wport?, Bus?, Unit_exe_stages?)>

<!ELEMENT Rport (case_bank_of_get_rport+)>

<!ELEMENT case_bank_of_get_rport EMPTY>
<!ATTLIST case_bank_of_get_rport
    bank  CDATA #REQUIRED
    rport CDATA #REQUIRED
>
<!-- bank from Registerbank, rport from Resources -->

<!ELEMENT Wport (case_bank_of_get_wport+)>

<!ELEMENT case_bank_of_get_wport EMPTY>
<!ATTLIST case_bank_of_get_wport
    bank  CDATA #REQUIRED
    wport CDATA #REQUIRED
>
<!-- bank from Registerbank, wport from Resources -->

<!ELEMENT Bus (case_unit_of+)>

<!ELEMENT case_unit_of (case_bank_of_get_bus+)>
<!ATTLIST case_unit_of name CDATA #REQUIRED>
<!-- reference to Units -->

<!ELEMENT case_bank_of_get_bus EMPTY>
<!ATTLIST case_bank_of_get_bus
    bank CDATA #REQUIRED
    bus  CDATA #REQUIRED
>
<!-- bank from Registerbank, bus from Resources -->

<!ELEMENT Unit_exe_stages (Unit_exe_stage+)>

<!ELEMENT Unit_exe_stage (case_unit_of_get_exeunit+)>
<!ATTLIST Unit_exe_stage name CDATA #REQUIRED>

<!ELEMENT case_unit_of_get_exeunit EMPTY>
<!ATTLIST case_unit_of_get_exeunit
    fu      CDATA #REQUIRED
    exeunit CDATA #REQUIRED
>
<!-- Unit from Units, exeunit from Resources -->


<!-- Element Transport with transport -->

<!ELEMENT Transport (transport+)>

<!ELEMENT transport EMPTY>
<!ATTLIST transport
    operation CDATA #REQUIRED
    bank1     CDATA #REQUIRED
    bank2     CDATA #REQUIRED
>
<!-- bank1, bank2 from Registerbank -->


B Example: WPPA Description in MAML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE maml SYSTEM "maml_v3.dtd">

<maml name="wppa.maml">

  <ProcessorArray name="PA" version="1.0">
    <PElements name="pe" rows="4" columns="12"/>
    <PEInterconnectWrapper>
      <Channels>
        <Southward index="0" bitwidth="32"/> <!-- data path -->
        <Southward index="1" bitwidth="32"/> <!-- data path -->
        <Southward index="2" bitwidth="1"/>  <!-- control path -->
        <Southward index="3" bitwidth="1"/>  <!-- control path -->
        <Northward index="0" bitwidth="32"/>
        <Northward index="1" bitwidth="32"/>
        <Northward index="2" bitwidth="1"/>
        <Northward index="3" bitwidth="1"/>
        <Eastward index="0" bitwidth="32"/>
        <Eastward index="1" bitwidth="32"/>
        <Eastward index="2" bitwidth="1"/>
        <Eastward index="3" bitwidth="1"/>
        <Westward index="0" bitwidth="32"/>
        <Westward index="1" bitwidth="32"/>
        <Westward index="2" bitwidth="1"/>
        <Westward index="3" bitwidth="1"/>
      </Channels>
      <PElementPorts>
        <Inputs number="4"/>  <!-- max number of PE inputs -->
        <Outputs number="4"/> <!-- max number of PE outputs -->
      </PElementPorts>
    </PEInterconnectWrapper>
    <ICDomain name="d1">

      <Interconnect type="static">
        <!--Side Outputs -->
        <!--NNNNEEEESSSSWWWWPPPP-->
        <AdjacencyMatrix>
          <!--01230123012301230123-->
          <WInput idx="0" row="00000000000000001000"/>
          <WInput idx="1" row="00000000000000000100"/>
          <WInput idx="2" row="00000000000000000010"/>
          <WInput idx="3" row="00000000000000000001"/>
          <POutput idx="0" row="00000001000000000000"/>
          <POutput idx="1" row="00000010000000000000"/>
          <POutput idx="2" row="00000100000000000000"/>
          <POutput idx="3" row="00001000000000000000"/>
        </AdjacencyMatrix>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row=" 1 0"/>
        <MatrixA row="-1 0"/>
        <MatrixA row=" 0 1"/>
        <MatrixA row=" 0 -1"/>
        <VectorB value=" 1"/>
        <VectorB value="-2"/>
        <VectorB value=" 1"/>
        <VectorB value="-4"/>
      </ElementsPolytopeRange>
    </ICDomain>
    <ICDomain name="d2">

      <Interconnect type="static">
        <!--Side Outputs -->
        <!--NNNNEEEESSSSWWWWPPPP-->
        <AdjacencyMatrix>
          <!--01230123012301230123-->
          <NInput idx="0" row="00000000000010001000"/>
          <NInput idx="1" row="00000000000000000100"/>
          <EInput idx="0" row="00000000000000000010"/>
          <EInput idx="1" row="00000000000000000001"/>
          <POutput idx="0" row="00000000000001000000"/>
          <POutput idx="1" row="00000000100000000000"/>
          <POutput idx="2" row="00000000010000000000"/>
          <POutput idx="3" row="00000001000000000000"/>
        </AdjacencyMatrix>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row=" 1 0"/>
        <MatrixA row="-1 0"/>
        <MatrixA row=" 0 1"/>
        <MatrixA row=" 0 -1"/>
        <VectorB value=" 3"/>
        <VectorB value="-6"/>
        <VectorB value=" 1"/>
        <VectorB value="-4"/>
      </ElementsPolytopeRange>
    </ICDomain>
    <ICDomain name="d3">

      <Interconnect type="static">
        <!--Side Outputs -->
        <!--NNNNEEEESSSSWWWWPPPP-->
        <AdjacencyMatrix>
          <!--01230123012301230123-->
          <WInput idx="0" row="00000000000000011000"/>
          <WInput idx="1" row="00000000000000001100"/>
          <WInput idx="2" row="00000000000000000110"/>
          <WInput idx="3" row="00000000000000000001"/>
          <POutput idx="0" row="00000001000001000000"/>
          <POutput idx="1" row="00000010000100000000"/>
          <POutput idx="2" row="00000100000010010000"/>
          <POutput idx="3" row="00001000001000000000"/>
        </AdjacencyMatrix>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row=" 1 0"/>
        <MatrixA row="-1 0"/>
        <MatrixA row=" 0 1"/>
        <MatrixA row=" 0 -1"/>
        <VectorB value=" 7"/>
        <VectorB value="-10"/>
        <VectorB value=" 1"/>
        <VectorB value="-4"/>
      </ElementsPolytopeRange>
    </ICDomain>
    <ICDomain name="d4">

      <Interconnect type="static">
        <!--Side Outputs -->
        <!--NNNNEEEESSSSWWWWPPPP-->
        <AdjacencyMatrix>
          <!--01230123012301230123-->
          <WInput idx="0" row="00000000000000001000"/>
          <WInput idx="1" row="00000000000000000100"/>
          <WInput idx="2" row="00000000000000000010"/>
          <WInput idx="3" row="00000000000000000001"/>
          <POutput idx="0" row="00000001000000000000"/>
          <POutput idx="1" row="00000010000000000000"/>
          <POutput idx="2" row="00000100000000000000"/>
          <POutput idx="3" row="00001000000000000000"/>
        </AdjacencyMatrix>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row=" 1 0"/>
        <MatrixA row="-1 0"/>
        <MatrixA row=" 0 1"/>
        <MatrixA row=" 0 -1"/>
        <VectorB value=" 11"/>
        <VectorB value="-12"/>
        <VectorB value=" 1"/>
        <VectorB value="-4"/>
      </ElementsPolytopeRange>
    </ICDomain>
    <ClassDomain name="dc1" peclass="c1">

      <ElementsDomain instance="d1"/>
    </ClassDomain>
    <ClassDomain name="dc2" peclass="c2">
      <ElementsDomain instance="d2"/>
    </ClassDomain>
    <ClassDomain name="dc3" peclass="c3">
      <ElementsDomain instance="d3"/>
    </ClassDomain>
    <ClassDomain name="dc4" peclass="c1">
      <ElementsDomain instance="d4"/>
    </ClassDomain>
  </ProcessorArray>

  <PEClass name="c1">
    <!-- IOPorts -->
    <IOPorts>
      <!-- Data ports -->
      <Port name="ip0" bitwidth="32" direction="in" type="data"/>
      <Port name="ip1" bitwidth="32" direction="in" type="data"/>
      <Port name="op0" bitwidth="32" direction="out" type="data"/>
      <Port name="op1" bitwidth="32" direction="out" type="data"/>
      <!-- Control ports -->
      <Port name="ic0" bitwidth="1" direction="in" type="ctrl"/>
      <Port name="ic1" bitwidth="1" direction="in" type="ctrl"/>
      <Port name="oc0" bitwidth="1" direction="out" type="ctrl"/>
      <Port name="oc1" bitwidth="1" direction="out" type="ctrl"/>
    </IOPorts>
    <!-- Resources -->
    <Resources>
      <Resource name="ALU" num="1" />
      <Resource name="BUS1" num="32" />
      <Resource name="BUS2" num="32" />
      <Resource name="BUnit" num="1" />
      <Resource name="LoadStoreUnit" num="1" />
      <Resource name="rPort1" num="100" />
      <Resource name="rPort2" num="100" />
      <Resource name="rPort3" num="100" />
      <Resource name="rPort4" num="100" />
      <Resource name="wPort1" num="10" />
      <Resource name="wPort2" num="10" />
      <Resource name="wPort3" num="10" />
    </Resources>
    <!-- StorageElements -->
    <StorageElements>

      <RegisterBank name="regConst" number="2" bitwidth="32" type="data" namespace="c"/>
      <RegisterBank name="regGP1" number="48" bitwidth="32" type="data" namespace="gp"/>
      <RegisterElement name="pc" bitwidth="16" type="ctrl" />
      <SpRegister registername="nullreg" bankname="regConst" registernumber="0" />
      <SpRegister registername="ONE" bankname="regConst" registernumber="1" />
      <SpRegister registername="sp" bankname="regGP1" registernumber="0" />
      <SpRegister registername="iparams" bankname="regGP1" registernumber="1-6" />
      <!-- Control Path Registers -->
      <RegisterBank name="rCtrl" number="8" bitwidth="1" type="ctrl" namespace="RC"/>
      <FIFOBank name="iFIFOCtrl" number="2" bitwidth="1" depth="4" type="ctrl" namespace="IC"/>
      <FBFIFOBank name="FBFIFOCtrl" number="4" bitwidth="1" depth="8" type="ctrl" namespace="FC"/>
      <!-- Data Path Registers -->
      <RegisterBank name="rData" number="16" bitwidth="32" type="data" namespace="RD"/>
      <FIFOBank name="iFIFOData" number="2" bitwidth="32" depth="4" type="data" namespace="ID"/>
      <SpRegister registername="OD" bankname="rData" registernumber="14-15"/>
      <FBFIFOBank name="FBFIFOData" number="4" bitwidth="32" depth="8" type="data" namespace="FD"/>
      <LocalMemory name="MEM" bitwidth="32" size="256"/>
      <InstructionMemory bitwidth="128" size="32"/>
      <PortMapping>
        <Bank name="iFIFOData" index="0" port="ip0"/>
        <Bank name="iFIFOData" index="1" port="ip1"/>
        <Bank name="rData" index="14" port="op0"/>
        <Bank name="rData" index="15" port="op1"/>
        <Bank name="iFIFOCtrl" index="0" port="ic0"/>
        <Bank name="iFIFOCtrl" index="1" port="ic1"/>
        <Bank name="rCtrl" index="14" port="oc0"/>
        <Bank name="rCtrl" index="15" port="oc1"/>
      </PortMapping>
    </StorageElements>
    <!-- Opnames -->
    <Opnames>

      <Opname name="add" code="000">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 + op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="muladd" code="111">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0[i] = op0[2*i]*op1[2*i]+op0[2*i+1]*op1[2*i+1] </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="sub" code="001">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 - op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="and" code="010">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 & op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="or" code="011">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 | op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="shr" code="000">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 >> op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="not" code="000">
        <functionDescription>
          <inputData namespace="op" number="1"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = !op0 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="mul" code="000">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 * op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
    </Opnames>


    <!-- Operations -->
    <Operations>
      <!-- Opset0 -->
      <Operationset exelength="1">
        <Opname name="add" />
        <Opname name="sub" />
        <Opname name="and" />
        <Opname name="or" />
        <Opdirection direction="out" />
        <Opdirection direction="in" />
        <Opdirection direction="in" />
        <Input>
          <input cycle="0" name="get_rport" source="1" />
          <input cycle="0" name="BUS1" source="1" />
          <input cycle="0" name="get_rport" source="2" />
          <input cycle="0" name="BUS1" source="2" />
        </Input>
        <Execution>
          <exeunit cycle="0" name="exe1" />
        </Execution>
        <Output>
          <output cycle="0" name="get_wport" target="1" />
          <output cycle="0" name="BUS2" target="1" />
        </Output>
      </Operationset>
      <!-- Opset1 -->
      <Operationset exelength="1">
        <Opname name="shr" />
        <Opdirection direction="out" />
        <Opdirection direction="in" />
        <Opdirection direction="in" />
        <Input>
          <input cycle="0" name="get_rport" source="1" />
          <input cycle="0" name="BUS1" source="1" />
          <input cycle="0" name="get_rport" source="2" />
          <input cycle="0" name="BUS1" source="2" />
        </Input>
        <Execution>
          <exeunit cycle="0" name="exe1" />
        </Execution>
        <Output>
          <output cycle="0" name="get_wport" target="1" />
          <output cycle="0" name="BUS2" target="1" />
        </Output>
      </Operationset>
      <!-- Opset2 -->
      <Operationset exelength="1">
        <Opname name="not" />
        <Opdirection direction="out" />
        <Opdirection direction="in" />
        <Input>
          <input cycle="0" name="BUS1" source="1" />
          <input cycle="0" name="rPort1" source="1" />
        </Input>
        <Execution>
          <exeunit cycle="0" name="exe1" />
        </Execution>
        <Output>
          <output cycle="0" name="BUS2" target="1" />
          <output cycle="0" name="wPort1" target="1" />
        </Output>
      </Operationset>
      <!-- Opset3 -->
      <Operationset exelength="1">
        <Opname name="mul" />
        <Opdirection direction="out" />
        <Opdirection direction="in" />
        <Opdirection direction="in" />
        <Input>
          <input cycle="0" name="get_rport" source="1" />
          <input cycle="0" name="BUS1" source="1" />
          <input cycle="0" name="get_rport" source="2" />
          <input cycle="0" name="BUS1" source="2" />
        </Input>
        <Execution>
          <exeunit cycle="0" name="exe1" />
        </Execution>
        <Output>
          <output cycle="0" name="wPort1" target="1" />
          <output cycle="0" name="BUS2" target="1" />
        </Output>
      </Operationset>
    </Operations>
    <!-- Units -->
    <Units>

      <!-- Unit: ALU -->
      <Unit name="ALU" number="1" stages="1">
        <Unitopset>
          <Opname name="add" />
          <Opname name="sub" />
          <Opname name="and" />
          <Opname name="or" />
          <Read1 bankname="regGP1" />
          <Read1 bankname="regConst" />
          <Read2 bankname="regGP1" />
          <Read2 bankname="regConst" />
          <Write bankname="regGP1" />
        </Unitopset>
        <Unitopset>
          <Opname name="shl" />
          <Opname name="shr" />
          <Opname name="ashr" />
          <Read1 bankname="regGP1" />
          <Read2 bankname="regGP1" />
          <Read2 bankname="regConst" />
          <Write bankname="regGP1" />
        </Unitopset>
        <Unitopset>
          <Opname name="cmp" />
          <Read1 bankname="regGP1" />
          <Read1 bankname="regConst" />
          <Read2 bankname="regGP1" />
          <Read2 bankname="regConst" />
          <Write bankname="regCtrl" />
        </Unitopset>
        <Unitopset>
          <Opname name="not" />
          <Read1 bankname="regGP1" />
          <Read2 bankname="regGP1" />
          <Write bankname="regGP1" />
        </Unitopset>
        <Unitopset>
          <Opname name="mul" />
          <Read1 bankname="regGP1" />
          <Read1 bankname="regConst" />
          <Read2 bankname="regGP1" />
          <Read2 bankname="regConst" />
          <Write bankname="regGP1" />
        </Unitopset>
      </Unit>
      <!-- Unit: LoadStoreUnit -->
      <Unit name="LoadStoreUnit" number="1" stages="1">

        <Unitopset>
          <Opname name="loadw" />
          <Read1 bankname="regGP1" />
          <Read2 bankname="norb" />
          <Write bankname="regGP1" />
        </Unitopset>
        <Unitopset>
          <Opname name="const" />
          <Read1 bankname="norb" />
          <Read2 bankname="norb" />
          <Write bankname="regGP1" />
        </Unitopset>
        <Unitopset>
          <Opname name="storew" />
          <Read1 bankname="regGP1" />
          <Read1 bankname="regConst" />
          <Read2 bankname="regGP1" />
          <Write bankname="norb" />
        </Unitopset>
        <Unitopset>
          <Opname name="move" />
          <Read1 bankname="regGP1" />
          <Read1 bankname="regConst" />
          <Read1 bankname="regCtrl" />
          <Read1 bankname="regPC" />
          <Read2 bankname="norb" />
          <Write bankname="regGP1" />
          <Write bankname="regCtrl" />
          <Write bankname="regPC" />
        </Unitopset>
      </Unit>
      <!-- Unit: BUnit -->
      <Unit name="BUnit" number="1" stages="1">

        <Unitopset>
          <Opname name="b" />
          <Read1 bankname="norb" />
          <Read2 bankname="norb" />
          <Write bankname="norb" />
        </Unitopset>
        <Unitopset>
          <Opname name="beq" />
          <Opname name="bne" />
          <Opname name="bgt" />
          <Opname name="ble" />
          <Opname name="bcs" />
          <Opname name="bcc" />
          <Read1 bankname="regCtrl" />
          <Read2 bankname="norb" />
          <Write bankname="norb" />
        </Unitopset>
      </Unit>
    </Units>
    <!-- Resmap -->
    <Resmap>
      <Rport>
        <case_bank_of_get_rport bank="regConst" rport="rPort2" />
        <case_bank_of_get_rport bank="regCtrl" rport="rPort3" />
        <case_bank_of_get_rport bank="regGP1" rport="rPort1" />
        <case_bank_of_get_rport bank="regPC" rport="rPort4" />
      </Rport>
      <Wport>
        <case_bank_of_get_wport bank="regCtrl" wport="wPort2" />
        <case_bank_of_get_wport bank="regGP1" wport="wPort1" />
        <case_bank_of_get_wport bank="regPC" wport="wPort3" />
      </Wport>
    </Resmap>
  </PEClass>

  <PEClass name="c2" implements="c1">
    <!-- IOPorts -->
    <IOPorts>
      <!-- Data ports -->
      <Port name="ip0" bitwidth="32" direction="in" type="data"/>
      <Port name="ip1" bitwidth="32" direction="in" type="data"/>
      <Port name="ip2" bitwidth="32" direction="in" type="data"/>
      <Port name="ip3" bitwidth="32" direction="in" type="data"/>
      <Port name="op0" bitwidth="32" direction="out" type="data"/>
      <Port name="op1" bitwidth="32" direction="out" type="data"/>
      <Port name="op2" bitwidth="32" direction="out" type="data"/>
      <!-- Control ports -->
      <Port name="oc0" bitwidth="1" direction="out" type="ctrl"/>
    </IOPorts>
    <!-- StorageElements -->
    <StorageElements>
      <RegisterBank name="regConst" number="2" bitwidth="32" type="data" namespace="c"/>
      <RegisterBank name="regGP1" number="48" bitwidth="32" type="data" namespace="gp"/>
      <RegisterElement name="pc" bitwidth="16" type="ctrl" />
      <SpRegister registername="nullreg" bankname="regConst" registernumber="0" />
      <SpRegister registername="ONE" bankname="regConst" registernumber="1" />
      <SpRegister registername="sp" bankname="regGP1" registernumber="0" />
      <SpRegister registername="iparams" bankname="regGP1" registernumber="1-6" />
      <!-- Control Path Registers -->
      <FBFIFOBank name="FBFIFOCtrl" number="1" bitwidth="1" depth="3" type="ctrl" namespace="FC"/>
      <SpRegister registername="OC" bankname="FBFIFOCtrl" registernumber="0"/>
      <!-- Data Path Registers -->
      <RegisterBank name="rData" number="16" bitwidth="32" type="data" namespace="RD"/>
      <FIFOBank name="iFIFOData" number="4" bitwidth="32" depth="4" type="data" namespace="ID"/>
      <SpRegister registername="OD" bankname="rData" registernumber="13-15"/>
      <FBFIFOBank name="FBFIFOData" number="6" bitwidth="32" depth="8" type="data" namespace="FD"/>
      <LocalMemory name="MEM" bitwidth="32" size="256"/>
      <InstructionMemory bitwidth="128" size="32"/>
      <PortMapping>
        <Bank name="iFIFOData" index="0" port="ip0"/>
        <Bank name="iFIFOData" index="1" port="ip1"/>
        <Bank name="iFIFOData" index="2" port="ip2"/>
        <Bank name="iFIFOData" index="3" port="ip3"/>
        <Bank name="rData" index="13" port="op0"/>
        <Bank name="rData" index="14" port="op1"/>
        <Bank name="rData" index="15" port="op2"/>
        <Bank name="FBFIFOCtrl" index="0" port="oc0"/>
      </PortMapping>
    </StorageElements>
  </PEClass>

  <PEClass name="c3" implements="c2">
    <!-- Opnames -->
    <Opnames>
      <Opname name="add" code="000">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0 = op0 + op1 </function>
          </outputFunction>
        </functionDescription>
      </Opname>
      <Opname name="muladd" code="111">
        <functionDescription>
          <inputData namespace="op" number="2"></inputData>
          <outputData namespace="res" number="1"></outputData>
          <outputFunction>
            <function> res0[i] = op0[2*i]*op1[2*i]+op0[2*i+1]*op1[2*i+1] </function>
          </outputFunction>
        </functionDescription>
      </Opname>
    </Opnames>
  </PEClass>
</maml>
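
The `<ElementsPolytopeRange>` entries in the listing above can be read as systems of linear inequalities over PE coordinates. The following Python sketch is an illustration only and is not part of the MAML tool flow. It assumes that each `MatrixA` row together with the `VectorB` value at the same position encodes one constraint of the form A·x ≥ b over non-negative integer coordinates; this reading is inferred from the values in the listing, since it is the one under which the ranges are non-empty. The helper names `parse_polytope` and `polytope_points`, as well as the search bound `hi`, are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Polytope range of ICDomain "d2", copied from the listing.
FRAGMENT = """
<ElementsPolytopeRange>
  <MatrixA row=" 1 0"/><MatrixA row="-1 0"/>
  <MatrixA row=" 0 1"/><MatrixA row=" 0 -1"/>
  <VectorB value=" 3"/><VectorB value="-6"/>
  <VectorB value=" 1"/><VectorB value="-4"/>
</ElementsPolytopeRange>
"""


def parse_polytope(xml_text):
    """Return (A, b) as lists of integers read from the XML attributes."""
    root = ET.fromstring(xml_text.strip())
    a = [tuple(int(v) for v in e.get("row").split())
         for e in root.findall("MatrixA")]
    b = [int(e.get("value")) for e in root.findall("VectorB")]
    return a, b


def polytope_points(a, b, hi=16):
    """Enumerate integer points x in [0, hi]^n with A x >= b (assumed form)."""
    dim = len(a[0])
    points = []
    for x in product(range(hi + 1), repeat=dim):
        # Every row of A must satisfy its own inequality.
        if all(sum(c * xi for c, xi in zip(row, x)) >= bound
               for row, bound in zip(a, b)):
            points.append(x)
    return points


if __name__ == "__main__":
    a, b = parse_polytope(FRAGMENT)
    pts = polytope_points(a, b)
    print(len(pts), pts[0], pts[-1])
```

Under this assumption, domain d2 covers the coordinates 3 to 6 in the first dimension and 1 to 4 in the second, d3 covers 7 to 10, and d4 covers 11 to 12, so the three domains tile adjacent column ranges of the array.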


References

[ACL+06] F. Angiolini, J. Ceng, R. Leupers, F. Ferrari, C. Ferri, and L. Benini. An Integrated Open Framework for Heterogeneous MPSoC Design Space Exploration. Accepted for publication at Design, Automation & Test in Europe (DATE), Munich, Germany, March 2006.

[Aka96] H. Akaboshi. A Study on Design Support for Computer Architecture Design. PhD thesis, Department of Information Systems, Kyushu University, Japan, January 1996.

[BEM+03] V. Baumgarte, G. Ehlers, Frank May, A. Nückel, Martin Vorbach, and Markus Weinhardt. PACT XPP – A Self-Reconfigurable Data Processing Architecture. The Journal of Supercomputing, 26(2):167–184, 2003.

[BHE91] D.G. Bradlee, R.R. Henry, and S.J. Eggers. The Marion System for Retargetable Instruction Scheduling. In Proc. ACM SIGPLAN91 Conf. on Programming Language Design and Implementation, pages 229–240, Toronto, Canada, June 1991.

[Fau95] A. Fauth. Beyond Tool-Specific Machine Descriptions. In P. Marwedel and G. Goossens, editors, Code Generation for Embedded Processors, pages 138–152. Kluwer Academic Publishers, 1995.

[FTTW01] Dirk Fischer, Jürgen Teich, Michael Thies, and Ralph Weper. Design space characterization for architecture/compiler co-exploration. In ACM SIG Proceedings International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES 2001), pages 108–115, Atlanta, GA, U.S.A., November 2001.

[FTTW02] D. Fischer, J. Teich, M. Thies, and R. Weper. Efficient architecture/compiler co-exploration for ASIPs. In ACM SIG Proceedings International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES 2002), pages 27–34, Grenoble, France, 2002.

[FTTW03] D. Fischer, J. Teich, M. Thies, and R. Weper. BUILDABONG: A Framework for Architecture/Compiler Co-Exploration for ASIPs. Journal for Circuits, Systems, and Computers, Special Issue: Application Specific Hardware Design, pages 353–375, 2003.

[FVF95] A. Fauth, J. Van Praet, and M. Freericks. Describing Instruction Set Processors using nML. In Proceedings on the European Design and Test Conference, Paris, France, pages 503–507, March 1995.


[GVL+96] G. Goossens, J. Van Praet, D. Lanneer, W. Geurts, and F. Thoen. Programmable Chips in Consumer Electronics and Telecommunications. In G. de Micheli and M. Sami, editors, Hardware/Software Co-Design, volume 310 of NATO ASI Series E: Applied Sciences, pages 135–164. Kluwer Academic Publishers, 1996.

[Han99] S.Z. Hanono. Aviv: A Retargetable Code Generator for Embedded Processors. PhD thesis, Massachusetts Inst. of Tech., June 1999.

[HGK+99] Ashok Halambi, Peter Grun, Asheesh Khare, Vijay Ganesh, Nikil Dutt, and Alex Nicolau. EXPRESSION: A Language for Architecture Exploration through Compiler/Simulator Retargetability. In Proceedings Design Automation and Test in Europe (DATE’1999), 1999.

[HRD99] G. Hadjiyiannis, P. Russo, and S. Devadas. A Methodology for Accurate Performance Evaluation in Architecture Exploration. In Proc. 36th Design Automation Conference (DAC99), pages 927–932, New Orleans, LA, June 1999.

[Kae00] D. Kaestner. Retargetable Postpass Optimization by Integer Linear Programming. PhD thesis, Saarland University, Germany, 2000.

[KHT04a] Alexey Kupriyanov, Frank Hannig, and Jürgen Teich. Automatic and Optimized Generation of Compiled High-Speed RTL Simulators. In Proceedings of Workshop on Compilers and Tools for Constrained Embedded Systems (CTCES 2004), Washington, DC, U.S.A., September 2004.

[KHT04b] Alexey Kupriyanov, Frank Hannig, and Jürgen Teich. High-Speed Event-Driven RTL Compiled Simulation. In Proceedings of the 4th International Samos Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS 2004), Island of Samos, Greece, July 2004.

[KPBT06] S. Künzli, F. Poletti, L. Benini, and L. Thiele. Combining Simulation and Formal Methods for System-Level Performance Analysis. In Proc. Design, Automation and Test in Europe (DATE), Munich, Germany, March 2006.

[Krü02] Heiko Krügel. Entwicklung eines Werkzeugs zur automatischen Generierung der Schnittstellendateien zu einem retargierbaren Compiler Generator. Master’s thesis, Universität-GH Paderborn, April 2002.

[LM98] R. Leupers and P. Marwedel. Retargetable Code Generation based on Structural Processor Descriptions. In Proceedings on Design Automation for Embedded Systems, volume 3, pages 1–36, March 1998.


[MD05] P. Mishra and N. Dutt. Architecture Description Languages for Programmable Embedded Systems. In IEE Proceedings on Computers and Digital Techniques, Toronto, Canada, 2005.

[PHM00] S. Pees, A. Hoffmann, and H. Meyr. Retargeting of Compiled Simulators for Digital Signal Processors Using a Machine Description Language. In Proceedings Design Automation and Test in Europe (DATE’2000), Paris, March 2000.

[QM02] W. Qin and S. Malik. Architecture Description Languages for Retargetable Compilation. In The Compiler Design Handbook: Optimizations and Machine Code Generation. CRC Press, 2002.

[THG+99] H. Tomiyama, A. Halambi, P. Grun, N. Dutt, and A. Nicolau. Architecture Description Languages for System-on-Chip Design. In Proc. APCHDL, Fukuoka, Japan, October 1999.

[TKW00] J. Teich, P. Kutter, and R. Weper. Description and simulation of microprocessor instruction sets using ASMs. In International Workshop on Abstract State Machines, Lecture Notes on Computer Science (LNCS), pages 266–286. Springer, 2000.

[TPE01] A.S. Terechko, E.J.D. Pol, and J.T.J. van Eijndhoven. PRMDL: A Machine Description Language for Clustered VLIW Architectures. In Proceedings Design Automation and Test in Europe (DATE’2001), page 821, Munich, Germany, March 2001.

[Tri] Trimaran. http://www.trimaran.org.
