Computer Engineering
Mekelweg 4, 2628 CD Delft
The Netherlands
http://ce.et.tudelft.nl
2005
MSc THESIS
H/W Architecture Design for the Publish and Subscribe Mechanism
Maomei Chen
Faculty of Electrical Engineering, Mathematics and Computer Science
CE-MS-2005-03
Abstract
Distributed systems have become a popular way of implementing many embedded computing applications, and deciding the tradeoffs in system architecture is a challenge for designers of distributed embedded systems. This is in part because communication, the data exchange among system components, may vary as the function of each system component changes during the design process. In addition, the data dependence among system components may cause the whole system to collapse when a single component fails. The question then becomes how to design systems that are both robust and highly dynamic. In this thesis, a hardware architecture called an agent is proposed, aimed at providing flexible, easy and robust construction of distributed embedded systems. The agent serves an embedded board in a distributed system in much the same way as a network interface card serves a computer in a network. A publish and subscribe mechanism is chosen as the communication protocol. A software implementation of the mechanism has already been developed. Applying the software implementation to a distributed embedded system causes the embedded processor to handle heavy communication loads in addition to its own computation tasks, which prevents the embedded processor from making full use of its processing ability. For DSPs, which have special architectures and instructions for digital signal processing, the problem becomes critical. A hardware module that handles communication and leaves the DSPs free to concentrate on digital signal processing seems a promising solution. In addition to the proposed hardware architecture, a functional model in VHDL is built to verify the design. Simulation waveforms show that all functional blocks in the design work as expected. The architecture devised in this thesis is a prototype for the publish and subscribe mechanism, paving the way for future work.
H/W Architecture Design for the Publish and Subscribe Mechanism
THESIS
submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
in
COMPUTER ENGINEERING
by
Maomei Chen
born in Guilin, P.R. China
Computer Engineering
Department of Electrical Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
Laboratory: Computer Engineering
Codenumber: CE-MS-2005-03
Committee Members :
Advisor: Arjan van Genderen, CE, TU Delft
Advisor: Frits van der Wateren, Chess
Chairperson: Stamatis Vassiliadis, CE, TU Delft
Member: Koen Langendoen, ST, TU Delft
Dedicated with love and gratitude to my mother Yuhuan Chen
and the memory of my father Ningsen Yu
Contents
List of Figures
List of Tables
Acknowledgements
1 Introduction
  1.1 A Case Study
  1.2 Research Questions and Methodology
  1.3 Thesis Overview
2 Background
  2.1 The Distributed Embedded System
  2.2 Digital Signal Processing
  2.3 Publish and Subscribe Mechanism
  2.4 Ethernet
  2.5 TCP/IP Protocol
  2.6 Functional Verification of HDL Models
3 Publish and Subscribe Architecture Design
  3.1 Architecture Design
    3.1.1 Original Design
    3.1.2 Redesigned Architecture
  3.2 Conclusions
4 Modeling and Results
  4.1 Functional Block Mapping
    4.1.1 Assumptions
    4.1.2 Data Descriptor Format Design
    4.1.3 Frame Structure Design
    4.1.4 Functional Units Description
  4.2 Functional Model and Testbench Construction
    4.2.1 Function Block Realization
    4.2.2 Testbench Design
    4.2.3 Result
  4.3 Exploration On IP Block Development Method in Xilinx's Embedded System Development Environment
    4.3.1 Embedded System Development Platform
    4.3.2 Implementation of the Transmitting Channel
  4.4 Conclusions
5 Conclusions
  5.1 Summary
  5.2 Future Research Directions
Bibliography
A Publish and Subscribe Process Description in the Original Architecture
B Behavioral Model Testbench Design
C Behavioral Model Simulation Waveforms
D Do File for Running BFM Test Modules in Modelsim Simulator
E The Transmitting Channel Simulation Waveforms
List of Figures
1.1 A basic SONAR system
1.2 An example
1.3 A bus solution
2.1 An embedded system architecture
2.2 Publish and subscribe system architecture
2.3 Comparison between TCP/IP layers and ISO/OSI layers
2.4 TCP/IP frame format
3.1 Agent's architecture design
3.2 Redesigned architecture
4.1 Elements that influence the data descriptor format design
4.2 Mapped architecture
4.3 Agent's functional model architecture
4.4 Test frames design
4.5 The transmitting channel with testbench
A.1 The publish process on a publishing agent
A.2 Agents' reaction to publication announcements
A.3 The subscribe process on a subscribing agent
A.4 Agents' reaction to subscription announcements
A.5 The write process on a writing agent
A.6 Agents' reaction to the write operation
A.7 The read process on a reading agent
B.1 The CSMA/CD state machine adopted in the MAC behavioral model
B.2 Read process
B.3 Write process
B.4 Behavioral model testbench
C.1 Test case 1: one sending node, one receiving node, bus clock 125MHz, transmitting MAC FIFO depth: 12 frames
C.2 Test case 2: one sending node, one receiving node, bus clock 125MHz, transmitting MAC FIFO depth: 4 frames
C.3 Test case 3: four sending nodes, two receiving nodes, bus clock 125MHz, transmitting MAC FIFO depth: 12 frames
C.4 Test case 4: four sending nodes, two receiving nodes, bus clock 1250MHz, transmitting MAC FIFO depth: 12 frames
C.5 Test case 5: ten sending nodes, one receiving node, bus clock 125MHz, transmitting MAC FIFO depth: 12 frames
C.6 Test case 6: ten sending nodes, one receiving node, bus clock 1250MHz, transmitting MAC FIFO depth: 12 frames
E.1 Transmitting channel testbench result (a)
E.2 Transmitting channel testbench result (b)
List of Tables
2.1 The Ethernet frame format
3.1 Possible data descriptor content
3.2 Resemblances between multicast and the redesigned architecture
4.1 Context data descriptor format
4.2 Periodical data descriptor format
4.3 Context data subscription list item
4.4 Periodical data subscription list item
4.5 Frame structure design
4.6 The idle time of the network under the different data generation rates
4.7 Resources of Virtex II Pro 20
Acknowledgements
This thesis work was carried out at Chess embedded Technology BV from November 2004 until June 2005. During this time I was a master's student at the Computer Engineering (CE) Laboratory of Delft University of Technology (TU Delft).
I wish to express my gratitude to Frits van der Wateren, my supervisor at Chess eT, whose brilliant ideas, impressive thoroughness and warm support have brought this work forth. It has been a privilege to work with him. I have deeply appreciated our conversations on the thesis work. I am also grateful to him for giving me the opportunity to complete my master's thesis at Chess.
The unfailing support of my colleagues at Chess eT made it possible for me to complete my thesis study. I would like to thank Ed and Collin, the administrators at Chess, who did their best to help me with computer administration. I also extend my thanks to Robert. I have benefited from his expertise and enjoyed his timely support. I also greatly appreciate the consideration my friendly colleagues showed to a foreign afstudeerder (graduating student).
I am sincerely grateful to my supervisor in the CE group, Arjan van Genderen, for patiently guiding me through the whole process of my thesis work, providing me with software support and devoting his precious time to correcting my thesis.
I am indebted to Weidong and Claudiu, who kindly helped me with LaTeX and thesis writing. Without their help, things would have been much harder.
Special thanks to Prof. Stamatis Vassiliadis, head of our CE group, for his everlasting encouragement and optimism.
I would also like to thank Georgi Gaydadjiev, Sorin Cotofana, Stephan Wong, Koen Langendoen and all the other people who helped make this thesis possible. I was lucky to meet you and to receive your assistance during my master's study.
The support from Yuan Tong has been invaluable. Thank you for your patience.
Maomei Chen
Delft, The Netherlands
June 2005
1 Introduction
We only think when we are confronted with a problem.
- John Dewey
Nowadays, distributed systems have become a popular way of implementing many embedded computing applications [1]. However, designers of distributed embedded systems often run into difficulty deciding the tradeoffs when determining a system architecture or redefining an existing design [2]. Communication, the data exchange among system components, may vary as the function of each system component changes during the design process. In addition, the data dependence among system components may cause the whole system to collapse when a single component fails. The question then becomes how to design systems that are both robust and highly dynamic. In [3], Maarten Boasson proposes an Applications-Agents-Network architecture that eases the development of software for control systems. In [4], he describes the same architecture in the context of distributed embedded systems. The architecture works through a publish and subscribe mechanism. It addresses the two problems above by separating the processing and communication functions in each component of a distributed system. A software implementation has been developed for it [5]. In a design case faced by Chess, the processing units in the components of the embedded system are DSPs. Applying the software implementation of the publish and subscribe mechanism on the DSPs prevents them from making full use of their signal processing ability. Hence the need for a hardware implementation of the same mechanism.
The remainder of this chapter is organized as follows. Section 1.1 presents a real case that exhibits the design problems mentioned above. Section 1.2 poses the research questions and describes the methodology followed in designing the hardware architecture of the publish and subscribe mechanism. Section 1.3 concludes the chapter with an overview of the thesis.
1.1 A Case Study
As shown in Figure 1.1, a basic SONAR receive system consists of a sensor array, a data acquisition unit, a beam forming unit, a detection processing unit, a display processing unit and a display unit. In practice, the processing units are composed of an array of function blocks. A detailed implementation of the system can be seen in Figure 1.2. To achieve flexibility and scalability for the whole system, in this particular design PC (pulse compression), ↓ (down sampling) and beamforming are distributed over different Digital Signal Processors (DSPs). Due to the data processing requirements, each output of the down sampling units has to be processed in each of the beamformers. Realizing this kind of communication with the communication ports available on each DSP is not possible for so many nodes, say 64, because the number of communication ports on a DSP is limited; normally it is no more than 4. However, by means of a shared medium, as shown in Figure 1.3, all the processing nodes can be linked together. In this way, the demand for communication ports on each DSP is also reduced to a number that current DSPs can provide.
Figure 1.1: A basic SONAR system
(Block diagram of the basic SONAR receive system: Sensor Array, converts acoustic signals to electrical signals; Data Acquisition, digitizes data for processing; Beam Forming, forms defined beams from omni element data; Detection Processing, extracts signals from noise/reverberation; Display Processing, associates target tracks and normalizes data; Display.)
Figure 1.2: An example
(Block diagram of parallel channels, each A/D, PC, BF, FFT, Detection. Labels: 40MHz, 16-bit I,Q; 80-point FIR@40MHz; 2.5MHz; m-point FFT every m sweeps; 2.5/m MHz.)
Figure 1.3: A bus solution
(Diagram of the PC and BF nodes linked by a shared medium.)
What does the traffic look like when all nodes send their data over one shared medium? The required bandwidth of the shared medium, which is the data rate at the input of the beamformer, can be estimated roughly by the following equation (see footnote 1):
64 × 2.5 × 10^6 × 2 × 16 = 5.12 × 10^9 b/s ≈ 5 Gb/s    (1.1)
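The estimate in Equation 1.1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the constant and function names are chosen here and do not come from the thesis, and protocol overhead (Ethernet and TCP/IP headers) is ignored.

```python
# Rough aggregate data rate at the beamformer input (Equation 1.1).
# All names below are illustrative assumptions, not taken from the thesis.
NODES = 64               # number of nodes sharing the medium
SAMPLE_RATE_HZ = 2.5e6   # sample rate after down sampling
CHANNELS = 2             # I and Q
BITS_PER_SAMPLE = 16     # ADC resolution


def aggregate_rate_bps(nodes, sample_rate_hz, channels, bits_per_sample):
    """Return the total payload data rate in bits per second."""
    return nodes * sample_rate_hz * channels * bits_per_sample


rate = aggregate_rate_bps(NODES, SAMPLE_RATE_HZ, CHANNELS, BITS_PER_SAMPLE)
print(f"{rate / 1e9:.2f} Gb/s")  # prints "5.12 Gb/s"
```

The exact product is 5.12 Gb/s, which the thesis rounds to 5 Gb/s; either figure fits within the capacity of a 10 Gb/s link before overhead is considered.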
10 Gigabit Ethernet is the highest-speed Ethernet that can be adopted to connect all these devices in a local area network (LAN). It satisfies the bandwidth demand of the above case. Another reason to choose Ethernet for transmitting the data is the wide availability of standard network devices for it.
The next problem is which paradigm the nodes should use to communicate with each other. Chess chose the publish and subscribe mechanism for this situation. It separates the processing and communication tasks, which eases the design of the software on the DSPs; it supports the addition and removal of nodes at runtime, which increases the flexibility of the system architecture; and it makes fault tolerance achievable, which is very important for applications such as the above case.
Although a software implementation has been developed to serve as middleware on each DSP, each DSP also has a very heavy digital processing task, as can be seen from the application. Performance drops considerably if many clock cycles are devoted to communication. Developing special hardware to deal with the communication tasks, so that the processor can run at full speed on digital signal processing, is therefore a good idea. It is also what Chess desires from this research work.
1.2 Research Questions and Methodology
Considering the above case, the following research questions arise:
• How does the publish and subscribe mechanism work?
• How should the architecture design of the publish and subscribe mechanism be started?
• Will the designed architecture function correctly?
• What improvements can be made to the prototype design?
The methodology of the research is thus as follows:
• The publish and subscribe mechanism is examined.
• Based on the concept of the publish and subscribe mechanism, an architecture is devised. The functional model of the architecture is created.
• Testbenches are constructed for the functional model to verify the architecture design.
Footnote 1: The factor 2 in Equation 1.1 represents the two channels, I and Q; 16 represents the resolution of the ADC, which is 16 bits.
1.3 Thesis Overview
This section gives an overview of the remainder of this thesis.
Chapter 2 introduces the background knowledge related to the H/W architecture design of the publish and subscribe mechanism. It begins with an introduction to distributed embedded systems. The basic concepts of digital signal processing and beamforming are presented next, along with the special architectures and techniques adopted by digital signal processors. The publish and subscribe mechanism is then introduced. General topics on Ethernet and TCP/IP are also discussed in this chapter. Finally, the functional verification of HDL models, which is the method employed during the architecture design, is introduced.
In Chapter 3, based on an understanding of the publish and subscribe mechanism, an architecture is devised and explained to demonstrate how it works. The final version for the prototype design is worked out after an analysis of the transmission methods.
Chapter 4 describes the modeling and results of the H/W architecture design of the publish and subscribe mechanism described in the previous chapter. Since this is a prototype design, some assumptions are made regarding practical issues. A functional model is then constructed to verify the architecture, and the main components of the architecture are explained one by one. A testbench is then built to verify the design. The Foreign Language Interface provided by ModelSim is used to create the data source and data sink for testing. The prototype design is to be implemented in the Xilinx embedded system development environment. To explore the IP block development method in such an environment, the transmitting part of the architecture is constructed with Xilinx tools. This part of the design is fully synthesizable. The Bus Functional Model for the IBM PowerPC Processor Local Bus (PLB) is applied to build the testbench. The goal is to investigate the verification method for IP blocks that are attached to the PowerPC's PLB.
Chapter 5 presents the conclusions. First, a summary of the conclusions of this thesis is given. Then, future research directions are presented.
2 Background
Before all else, be armed.
- Niccolo Machiavelli
This chapter reviews the background knowledge relevant to the H/W architecture design for the publish and subscribe mechanism. Section 2.1 provides an introduction to distributed embedded systems. The distributed embedded system, especially one with DSPs (Digital Signal Processors) as its processors, is the target system that this thesis addresses. In Section 2.2, the basic concepts of digital signal processing and beamforming are presented. They are the typical applications running in the target distributed embedded system. The special architectures and techniques of digital signal processors are also presented. Section 2.3 introduces the concept of the publish and subscribe mechanism, in which the hardware architecture design has its roots. General topics on Ethernet and the TCP/IP network architecture are introduced in Section 2.4 and Section 2.5. The nodes in the target system are linked by Ethernet devices, since it is easy to construct a local area network using off-the-shelf products and Ethernet offers sufficient network capacity. The TCP/IP protocol suite is supported by these devices. Thus, each node in the target system should be able to generate and recognize the format of the information conveyed over the network. The idea of functional verification is discussed in Section 2.6. The author adopts this concept for the functional verification of the architecture design.
2.1 The Distributed Embedded System
Two words capture the special character of a distributed embedded system: embedded and distributed. Literally, embedded means integrated. What, then, is integrated in an embedded system? Generally, a processor, memory and I/O ports. It resembles a computer system, which is composed of the same elements: the processor executes software and performs specific predetermined operations; the memory stores software and data; and the I/O ports, for example parallel ports, serial ports and general-purpose I/O ports, input and output information through I/O devices. However, it is more appropriate to call an embedded system a microcomputer, where 'micro' emphasizes the size of the computer. In terms of computing power, the processor can range from a simple AT89C52, which has an 8-bit MCS-51 core, to a Pentium 4. The software, together with the I/O ports and associated interface circuits, gives an embedded computer system its distinctive characteristics [6]. An example is the system in a bread machine that controls the temperature of the container and the actions of the motor to bake bread with different flavors. This system consists of a microcomputer; the control software running on it takes in information from the sensors and the analog-to-digital converter, then makes decisions based on this information to generate proper control signals for the motor and the heater. Applications of embedded systems are not limited to everyday life, like the bread machine control system mentioned above, but range from medical instruments to the control systems in tanks.
Figure 2.1: An embedded system architecture is always composed of a processor, memory and I/O ports. The processor executes software by means of the cooperation of registers, an arithmetic logic unit (ALU), a control unit and a bus interface unit. A bus connects the processor to the following three components: a) read-only memory (ROM), which stores software and fixed constant data, b) random access memory (RAM), which stores temporary information, and c) I/O ports. The special I/O devices (electrical, mechanical, chemical and optical devices), together with special software, provide the necessary functionality of the embedded system [6].
As for the word 'distributed', it normally refers to a system that consists of a group of computers. They are independent of each other but carry out tasks for the user as a whole. This is accomplished by a network and by distribution middleware in each computer that coordinates their activities and shares the resources of the system [7]. A distributed embedded system has embedded systems, instead of typical computers, as its working elements. Being distributed, the embedded system can improve its performance by exploiting the parallelism that exists between the computation tasks.
2.2 Digital Signal Processing
Digital signal processing (DSP) is familiar from its numerous applications in telephony, mobile radio, satellite communications, speech processing, video and image processing, biomedical applications, radar, and sonar.
The objects that digital signal processing methods deal with are digital signals, in contrast to the signals found in nature, which are continuous. A digital signal processing system is typically composed of a) a sample and hold unit that makes the continuous signals from the signal sources discrete in time, b) an analog-to-digital converter that converts the discrete samples, represented as real numbers, into numbers represented by a fixed number of bits, c) a digital processing unit or processor that executes digital signal processing methods such as FIR filters, and d) a digital-to-analog converter that generates the desired control signals to complete the interaction with the environment.
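The first two stages of this chain, sampling and analog-to-digital conversion, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 1 kHz test tone, the 8 kHz sample rate and the function names are hypothetical, and the converter is idealized (no noise, jitter or anti-aliasing filter).

```python
import math


def sample(signal, sample_rate_hz, n_samples):
    """Sample and hold: evaluate a continuous-time signal at discrete instants."""
    return [signal(n / sample_rate_hz) for n in range(n_samples)]


def quantize(x, bits=16, full_scale=1.0):
    """Ideal ADC: map a real value to a signed integer code of the given width."""
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    code = round(x / full_scale * max_code)
    # Inputs beyond full scale are clipped to the end codes.
    return max(min_code, min(max_code, code))


def tone(t):
    """Hypothetical 1 kHz test tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1_000.0 * t)


samples = sample(tone, sample_rate_hz=8_000.0, n_samples=8)
codes = [quantize(s) for s in samples]
```

Each entry of `codes` is a 16-bit signed integer, which is the form in which a processing unit such as a DSP would receive the data.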
A digital signal processor (DSP) is a specialized microprocessor designed specifically for digital signal processing. It can serve as the processor in the embedded system introduced in the previous section. DSPs employ distinctive architectures and techniques, such as:
• Multiply-accumulate (MAC) operations, which are well suited to all kinds of matrix operations;
• Deep pipelining;
• The ability to act as a direct memory access device for the host environment;
• Saturation arithmetic, in which operations that produce overflows saturate at the maximum or minimum value that the register can hold rather than wrapping around;
• Separate program and data memories (Harvard architecture);
• Fixed-point arithmetic: most DSPs are fixed-point, because in real-world signal processing extra precision is often not required and there is a large speed benefit; however, floating-point DSPs are common for scientific and other applications where precision is required;
• Specialized instructions for modulo addressing in ring buffers and a bit-reversed addressing mode for FFT cross-referencing.
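Several of the techniques listed above can be illustrated in software. The following Python sketch shows saturation arithmetic, a FIR filter built from MAC operations on a ring buffer with modulo addressing, and bit-reversed index generation. It only models the behaviour; on a real DSP these are single instructions or addressing modes, and all names here are illustrative assumptions.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturating_add(a, b):
    """Add two 16-bit values, clamping at the limits instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def wrapping_add(a, b):
    """Two's-complement 16-bit addition, shown only for contrast."""
    return (a + b + 32768) % 65536 - 32768


class CircularFir:
    """FIR filter using a ring buffer (modulo addressing) and MAC operations."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        self.history = [0] * len(self.coefficients)
        self.head = 0  # index of the most recent sample

    def step(self, sample):
        n = len(self.history)
        self.head = (self.head + 1) % n  # modulo addressing
        self.history[self.head] = sample
        acc = 0
        for k, c in enumerate(self.coefficients):
            acc += c * self.history[(self.head - k) % n]  # multiply-accumulate
        return acc


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


fir = CircularFir([1, 2, 3])
impulse_response = [fir.step(x) for x in [1, 0, 0, 0]]  # [1, 2, 3, 0]
fft_order = [bit_reverse(i, 3) for i in range(8)]       # [0, 4, 2, 6, 1, 5, 3, 7]
```

For example, `saturating_add(30000, 10000)` yields 32767, whereas the wrapping version yields -25536, which is why saturation is preferred for signal data.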
These special architectures and techniques favor regular operations; unexpected branches during execution lower the performance.
Beamforming is a digital signal processing method. It performs spatial filtering on the samples collected by an array of sensors. Spatial filtering means separating the signal arriving from a particular direction from interference and from signals originating in other directions. The signals are propagating waves such as acoustic waves, ultrasound waves and electromagnetic waves [8].
Like other digital signal processing methods, beamforming requires sensors, such as a set of antennas, as well as sampling units and digitizers. Before the digital signals are processed, they undergo preprocessing. Techniques such as pulse compression and down sampling are adopted in this step. The objective is to place the signals of interest in the optimum part of the spectrum for processing and to minimize their bandwidth while preserving the quality of the processing results [9].
It is natural to think of a beamforming system as being composed of channels for signals from various locations. Each channel consists of the same set of processes described above. These processes are usually realized on digital signal processors. The computation load on these processors is heavy, due to the complex algorithms needed to realize these processes and the continuous processing requirements of the system.
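To make the idea of spatial filtering concrete, the sketch below implements delay-and-sum beamforming, one of the simplest beamforming techniques. It is offered only as an illustration: the thesis does not state which beamforming algorithm the target system uses, and the three-sensor array, the integer sample delays and the function name are assumptions made here.

```python
def delay_and_sum(channels, delays):
    """Align each sensor channel by its integer sample delay, then average."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]


pulse = [0, 0, 1, 0, 0, 0]
# Hypothetical three-sensor array: the wavefront reaches each sensor
# one sample later than the previous one.
channels = [pulse, [0] + pulse[:-1], [0, 0] + pulse[:-2]]

steered = delay_and_sum(channels, delays=[0, 1, 2])    # looks toward the source
unsteered = delay_and_sum(channels, delays=[0, 0, 0])  # looks elsewhere
```

With the delays matched to the arrival direction, the pulses add coherently and `steered` peaks at 1.0, while `unsteered` peaks at only one third of that. Every output sample needs input from every channel, which is consistent with the heavy computation and communication load described above.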
2.3 Publish and Subscribe Mechanism
The original target application for the publish and subscribe mechanism is the control system [3]. The idea originated from the development of modern computer systems: although more complicated control systems can now be implemented, the difficulty of doing so has also increased. The publish and subscribe mechanism, which defines a system architecture for this kind of application, tries to reduce the pain of designing highly advanced control systems while meeting other functional and nonfunctional requirements. These requirements include a certain degree of fault tolerance, distribution over heterogeneous processing elements, and adaptability. In [4], the same idea is described in the context of distributed embedded systems.
The basic architecture of the suggested system consists of three components: applications, agents and the network. They are related to each other as shown in Figure 2.2.
Figure 2.2: Publish and subscribe system architecture
(Diagram: applications A0, A1, ..., AN, each attached to its own agent, Agent0, Agent1, ..., AgentN; all agents connect to the Network.)
Each component is described as follows:
Application: Each application executes part of the function of the whole system. However, all applications are independent of each other, in the sense that they do not communicate directly with each other.
Agent: The communication among different applications is managed by the agents. Each application has one agent attached to it. The division of labor is clear: the application processes data and the agent handles data passing. The detailed function of the agent is as follows.
• When the application needs a certain type of data A, it informs the agent. The agent interprets this request as an interest in instances of data A from now on. It then announces to the whole system that the application attached to it is a subscriber to data A by broadcasting a subscription message to all other nodes.
• When the application produces a particular kind of data B that is needed by some other application, it makes its intention to publish known to the agent. The agent then stores the data for the moment and announces that the application is a publisher of data B by broadcasting this information over the network. The agent stores only the latest instance of data B produced by the application.
• When the agent receives publication announcements from other agents, it records the data type and the publisher's network address.
• When the agent receives subscription announcements from other agents, it records the data types and the subscribers' network addresses.
• The agent serving the application that produces the subscribed data delivers newly produced data to all registered subscriber agents as fast as possible. The communication method can be broadcast, multicast or unicast.
• An agent receiving data first determines whether the data is needed by matching it against its subscriptions. If the data is subscribed to, the agent stores it in its memory. The application then reads the needed data from the agent.
This paradigm helps to isolate all applications from each other. The only important thing is that the necessary data is produced somewhere in the network.
Network: Data from different agents passes through the network to reach other agents. The network is required to support broadcasting, unicasting and multicasting.
The system architecture is data-oriented. Each piece of data that might be useful andsent on the network is labeled. It works much like a letter with not only an addresson it, but also the topic on it. Communication between different agents is built on thesubscription for ”letters” with certain topics. The data sent on the network is detailedup to a particular type needed by applications.
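The agent behaviour listed above can be illustrated with a small software model. The following is only a sketch under stated assumptions: Python is used purely for illustration, the names `Network`, `Agent`, `subscribe`, `publish` and `read` are hypothetical, messages are modelled as tuples, and data is delivered to known subscribers by unicast (one of the three methods the text allows).

```python
from collections import defaultdict


class Network:
    """Toy in-memory network offering broadcast and unicast (assumption)."""

    def __init__(self):
        self.agents = {}

    def attach(self, agent):
        self.agents[agent.address] = agent

    def broadcast(self, sender, message):
        # Deliver to every node except the sender.
        for address, agent in self.agents.items():
            if address != sender:
                agent.receive(message)

    def unicast(self, destination, message):
        self.agents[destination].receive(message)


class Agent:
    def __init__(self, address, network):
        self.address = address
        self.network = network
        self.subscribed = set()              # types wanted by the local application
        self.published = {}                  # type -> latest locally produced instance
        self.received = {}                   # type -> latest subscribed instance
        self.subscribers = defaultdict(set)  # type -> foreign subscriber addresses
        self.publishers = defaultdict(set)   # type -> foreign publisher addresses
        network.attach(self)

    def subscribe(self, data_type):
        self.subscribed.add(data_type)
        self.network.broadcast(self.address, ("SUB", data_type, self.address, None))

    def publish(self, data_type, value):
        if data_type not in self.published:
            # First publication of this type: announce it to all other nodes.
            self.network.broadcast(self.address, ("PUB", data_type, self.address, None))
        self.published[data_type] = value    # only the latest instance is kept
        for destination in sorted(self.subscribers[data_type]):
            self.network.unicast(destination, ("DATA", data_type, self.address, value))

    def receive(self, message):
        kind, data_type, source, value = message
        if kind == "SUB":
            self.subscribers[data_type].add(source)
        elif kind == "PUB":
            self.publishers[data_type].add(source)
        elif kind == "DATA" and data_type in self.subscribed:
            self.received[data_type] = value  # matched: store for the application

    def read(self, data_type):
        return self.received.get(data_type)
```

In this model an application that subscribes after a value was published sees nothing until the next publication, which is consistent with the relaxed consistency requirements discussed later in this section.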
The data model of the publish and subscribe mechanism includes sorts, worlds and categories. Sorts are the basic entities of communication. They make different types of information distinguishable. Different sorts are represented by different values of their key fields. Worlds are a way of constructing sub-shared-dataspaces: the data produced in one world cannot be seen by other worlds. There are three categories of data: periodical data, context data and persistent data. The categories are described as follows:
• Periodical data is the default category and is made available to all consumers of this type as soon as the data is produced. This category of data is volatile. Data samples obtained from data acquisition equipment are normally defined to be of this kind.
• Context data is sent to all currently known consumers, and a copy is kept so that it is ready for new subscribers to data of this category. It changes less frequently than periodical data and is used for starting a process in a predefined state.
• Persistent data is kept in a persistent memory. This category of data can be used for restarting the whole system. Thus the key system information is defined to be of this kind.
The category, world, sort and other necessary information together make up the description of the data. Their combination is called the data descriptor.
The granularity of the data descriptor design is critical. Extreme cases should be avoided in order to take full advantage of the mechanism. For example, assigning a single descriptor to all data produced in one application, or assigning a separate descriptor to every instance of the same kind of data generated in one application, is considered inappropriate.
Because applications in the real-time domain deal mostly with data instances that have a continuous nature, such as the temperature acquired from thermal sensors, the data consistency requirements generally associated with distributed systems are relaxed. For the same reason, it is not harmful for an application to miss one sample occasionally. This also puts limits on the decomposition of algorithms into application processes: the processes running on different nodes should not require many synchronization operations.
2.4 Ethernet
Ethernet is one of the most popular link-layer technologies for connecting computers and networked peripherals to Local Area Networks (LANs). It provides a way for multiple end systems to communicate over a common transmission medium. The Ethernet LAN specification describes a contention-based Media Access Control (MAC) protocol called Carrier Sense Multiple Access with Collision Detection (CSMA/CD). Each attached system waits for the shared medium to be idle before transmitting data (CSMA) and, while transmitting, listens for interference (CD). Interference happens when several systems begin transmitting at the same time. If a collision is detected, the participating systems stop transmitting and back off for a random period of time, which reduces the probability that the collision recurs [10].
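The random back-off mentioned above can be made concrete. The text does not specify how the random period is chosen; the sketch below assumes the truncated binary exponential backoff of IEEE 802.3, in which the waiting range doubles after each collision up to the tenth and the frame is abandoned after 16 attempts. The function name `backoff_slots` is hypothetical.

```python
import random

SLOT_TIME_BITS = 512   # one slot time is 512 bit times on 10/100 Mb/s Ethernet
MAX_ATTEMPTS = 16      # transmission is abandoned after 16 attempts
BACKOFF_LIMIT = 10     # the random range stops growing after 10 collisions


def backoff_slots(collisions, rng=random):
    """Return a random wait, in slot times, after the given number of collisions."""
    if not 1 <= collisions < MAX_ATTEMPTS:
        raise ValueError("collisions must be between 1 and 15")
    exponent = min(collisions, BACKOFF_LIMIT)
    # Uniform choice from 0 .. 2**exponent - 1 slot times.
    return rng.randrange(2 ** exponent)
```

Because two colliding stations draw their waits independently, the chance that they pick the same slot again halves with each successive collision until the limit is reached.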
The Ethernet frame is the unit of transport in an Ethernet network. It consists of several fields. The frame starts with an eight-byte preamble, which synchronizes the receiver and tells the Ethernet Network Interface Card (NIC) to accept incoming data. Next come a six-byte destination address field, a six-byte source address field, a two-byte type field, and a data field that contains a maximum of 1500 bytes. The frame ends with a four-byte frame check sequence field for verifying data integrity. Ethernet frames vary in length from 64 bytes to 1518 bytes, counted from the Destination MAC Address field through the Frame Check Sequence. The Ethernet frame format is used consistently across Ethernet, Fast Ethernet, Gigabit Ethernet and 10 Gigabit Ethernet platforms.
Preamble | Destination Address | Source Address | Frame Type | Frame Data | CRC
64 bits | 48 bits | 48 bits | 16 bits | 46 - 1500 Bytes | 32 bits
Table 2.1: The Ethernet frame format
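As an illustration of the field layout in Table 2.1, the following sketch splits a frame into its fields. It assumes the frame is handed over without the preamble, so that its length falls in the 64 to 1518 byte range quoted above; the function names `parse_frame` and `format_mac` are hypothetical.

```python
import struct

# Destination address, source address and type field, in network byte order.
HEADER = struct.Struct("!6s6sH")
FCS_BYTES = 4
MIN_FRAME, MAX_FRAME = 64, 1518   # counted without the preamble


def format_mac(raw: bytes) -> str:
    """Render a 6-byte address as colon-separated hexadecimal."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_frame(frame: bytes) -> dict:
    """Split a frame (preamble already removed) into the fields of Table 2.1."""
    if not MIN_FRAME <= len(frame) <= MAX_FRAME:
        raise ValueError("frame length must be between 64 and 1518 bytes")
    destination, source, frame_type = HEADER.unpack_from(frame)
    return {
        "destination": format_mac(destination),
        "source": format_mac(source),
        "type": frame_type,
        "data": frame[HEADER.size:-FCS_BYTES],   # 46 to 1500 bytes
        "crc": frame[-FCS_BYTES:],               # not verified in this sketch
    }
```

The header occupies 14 bytes and the frame check sequence 4 bytes, which is why a minimum-length frame of 64 bytes carries exactly 46 bytes of data.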
2.5 TCP/IP Protocol
TCP/IP, or Transmission Control Protocol over Internet Protocol, is a family of protocols for connecting a number of different networks, designed by different vendors, into a network of networks (the “Internet”) [11]. It was designed to be independent of host hardware and operating system, as well as of media and data-link technologies. Compared with the Open Systems Interconnection (OSI) model of the International Organization for Standardization (ISO), TCP/IP only has four layers. The comparison of layers is shown in Figure 2.3. The four layers of TCP/IP are:
[Figure 2.3: Comparison between TCP/IP layers and ISO/OSI layers. TCP/IP stack, top to bottom: Application; TCP, UDP; IP; Data Link; Physical. ISO/OSI stack, top to bottom: Application; Presentation; Session; Transport; Network; Data Link; Physical.]
Physical Layer In this layer, data is represented as 1s and 0s. This layer puts the 0s and 1s on a physical medium, such as a cable or connector.
Data Link Layer Data at this layer is organized into units called frames. Frames have headers, which carry the address and control information, and trailers for error detection.
Network Layer Datagrams are the units of data at this layer. This layer allows data to traverse a single link or several links in an internetwork.
Transport Layer There are two protocols at this layer: TCP and UDP. TCP provides reliable, connection-oriented transfer of a byte stream. UDP, in contrast, provides no mechanisms for error recovery or flow control. It is used by applications that require quick but not necessarily reliable delivery.
Before the data can be transferred, it is packaged as shown in Figure 2.4.
[Figure 2.4: TCP/IP frame format. The application layer data is preceded by a TCP or UDP header (transport layer, TCP/UDP message), then by an IP header (network layer, IP datagram), and finally enclosed by a frame header and frame trailer (data link layer, frame).]
In TCP/IP, three types of addresses are used for transmitting messages between two machines within the same network: the Ethernet address (MAC address), the IP address and the port number. The Ethernet address is a 48-bit address that works at the data link level. It uniquely identifies one network interface. The IP address belongs to the network layer. It contains 32 bits and identifies a host connected to the Internet. The 32 bits are divided into two parts: one part represents the network identifier (netID) and the other represents the host identifier (hostID). Depending on the number of bits used for netID and hostID, IP addresses are categorized into five classes: A, B, C, D and E. Classes A, B and C use 1, 2 and 3 bytes for the netID, respectively. Class D holds multicast addresses and Class E is reserved for future use. A port is an endpoint of a logical connection in TCP/IP and UDP networks. The port number identifies what type of port it is. For example, port 80 is used for HTTP traffic.
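The class of an address follows from its leading bits, which the text above does not spell out. The sketch below assumes the classic classful convention (leading bits 0, 10, 110, 1110 and 1111 for classes A to E) and returns the netID bytes for classes A, B and C; the function names `ipv4_class` and `net_id` are hypothetical.

```python
NET_ID_BYTES = {"A": 1, "B": 2, "C": 3}   # netID length per class, as in the text


def _octets(address: str):
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    return octets


def ipv4_class(address: str) -> str:
    """Classify an address by the leading bits of its first byte."""
    first = _octets(address)[0]
    if first < 128:
        return "A"   # leading bit 0
    if first < 192:
        return "B"   # leading bits 10
    if first < 224:
        return "C"   # leading bits 110
    if first < 240:
        return "D"   # leading bits 1110, multicast
    return "E"       # leading bits 1111, reserved


def net_id(address: str):
    """Return the netID bytes for classes A, B and C, otherwise None."""
    length = NET_ID_BYTES.get(ipv4_class(address))
    return tuple(_octets(address)[:length]) if length else None
```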
2.6 Functional Verification of HDL Models
Functional verification, as introduced in [12], is the process of examining the functional correctness of a hardware design written in a hardware description language, to ensure that the design implements the intended functionality. The challenge of the verification process is to choose the input patterns for the design and to predict the output of a correctly working design. Verification can reveal the presence of errors; what it cannot do is prove their absence. However, the number of remaining errors decreases as the effort put into verification increases.
Nowadays, verification dominates the design effort for multi-million gate ASICs, reusable Intellectual Property (IP) and System-on-a-Chip (SoC) designs. In these fields, design reuse is considered more and more important. Design verification establishes the trustworthiness of a design and thus helps to make the design reusable. Although verification does not itself create profit, it helps to build a correctly functioning design, which benefits the customers.
There are three approaches to verification: black box, white box and grey box. As their names suggest, black box verification deals with a design without knowing its implementation details; white box verification has insight into the design and can control its internal structure and implementation; the grey box method lies somewhere between the two. The advantage of the first method is that it shows whether a design meets the objective of a specification regardless of the implementation. The second approach, in contrast, is tightly integrated with a specific implementation and cannot be reused for other implementations of the design. Detailed knowledge is required to generate critical conditions and observe the results. Once more, it is easy to prove that a design does not do what it is supposed to do, by identifying a discrepancy during simulation, but no one can prove that there is no discrepancy in the design. Still, designs can be made robust by verification.
3 Publish and Subscribe Architecture Design
The great successful men of the world have used their imaginations. They think ahead and create their mental picture, and then go to work materializing that picture in all its details, filling in here, adding a little there, altering this bit and that bit, but steadily building, steadily building.
- Robert Collier
In this chapter, based on the understanding of the publish and subscribe mechanism, an architecture is devised and explained to demonstrate how it works. The final version for the prototype design is worked out after an analysis of the transmission methods.
3.1 Architecture Design
3.1.1 Original Design
The original architecture is obtained from the study of the publish and subscribe mechanism.
As shown in Figure 3.1, this architecture consists of the following functional units:
• Data Descriptor Generation Unit
• Data Descriptor Absorption Unit
• Local Subscription List
• Local Publication List
• Foreign Subscription List
• Foreign Publication List
• Local Subscription / Publication Control Unit
• Foreign Subscription / Publication Control Unit
• LSFP Matching Unit
• LPFS Matching Unit
• Data Storage
• Egress Data Buffer
• Ingress Data Buffer
• Network Interface
• Unit Control
app id | host id | category id | world id | sort id | version id | certain requirements | ...
Table 3.1: Possible data descriptor content
The detailed description of the above functional units is as follows:
Data Descriptor Generation Unit / Data Descriptor Absorption Unit The Data Descriptor Generation Unit generates descriptors for the data required by the process. The descriptor may include the information shown in Table 3.1. This unit ensures that the descriptor is a common signature for the data within the system, so that other nodes in the system use the same descriptor for subscribing to or publishing the same kind of data. The Data Descriptor Absorption Unit removes the descriptor and leaves the pure data for the process. It is the reverse of data descriptor generation and provides a clear interface between the agent and the process.
Local Subscription / Local Publication / Foreign Subscription / Foreign Publication List These lists keep records of the subscription and publication announcements from the current node or from other nodes in the system. Local indicates that the announcements originate from the current process, while foreign indicates that the announcements come from other nodes in the system. Subscription and publication announcements are stored separately.
Local Subscription / Publication Control Unit This unit deals with the announcements from the current process. It manages the Local Subscription and Local Publication Lists. Its basic jobs include recognizing the type of each announcement, adding new subscriptions or publications to the corresponding list, and announcing publications and subscriptions to other nodes in the system.
Foreign Subscription / Publication Control Unit This unit is the counterpart of the above unit for the announcements from other nodes in the system.
LSFP Matching Unit LSFP is the abbreviation for Local Subscription Foreign Publication. It is the engine for accepting subscribed data from the network. The engine works by matching the Local Subscription List against the Foreign Publication List. For incoming data, this unit decides whether the data should be forwarded to the data storage unit.
LPFS Matching Unit LPFS is the abbreviation for Local Publication Foreign Subscription. It is the engine for writing published data to the subscribers of the data in the system. The engine works by matching the Local Publication List against the Foreign Subscription List. For outgoing data, this unit decides whether the data should be forwarded from the data storage unit to the network.
Data Storage This unit stores the data published by the current node and the subscribed data from other nodes.
Egress / Ingress Data Buffer These units buffer the outgoing and incoming data for the agent.
Network Interface The interface enables the agents in the system to connect with each other through the network.
Unit Control This is a virtual unit that gathers information from each of the above functional units, processes the information and reacts. The control unit can be a central one or can be distributed over the units under control.
Figures A.1 through A.7 in Appendix A depict how this architecture realizes publications and subscriptions.
3.1.2 Redesigned Architecture
The redesigned architecture is shown in Figure 3.2. The publish and subscribe mechanism is first described using this functional block diagram, followed by the reasons for the simplified architecture.
After being generated by the process, data goes through the Data Descriptor Generation Unit. This unit adds the descriptor to the data. The data is then fed to the Egress Data Buffer and is published in this way. When a process requires certain data during its operation, it expresses this demand by storing the data descriptor in the attached agent’s Local Subscription List. When the agent gets data from the network, it uses this list as a mask to filter the data. The subscribed data is then stored in the Data Storage Unit. The process reads data from the Data Storage Unit.
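The data path just described can be sketched as a small software model. This is an illustration only, not the VHDL model of the thesis: the class names `BroadcastNetwork` and `SimplifiedAgent`, the method names and the `dropped` counter are hypothetical, and a frame is modelled as a (descriptor, data) pair.

```python
class BroadcastNetwork:
    """Toy network in which every frame reaches all other agents (assumption)."""

    def __init__(self):
        self.agents = []

    def attach(self, agent):
        self.agents.append(agent)

    def broadcast(self, sender, frame):
        for agent in self.agents:
            if agent is not sender:
                agent.ingress(frame)


class SimplifiedAgent:
    def __init__(self, network):
        self.network = network
        self.local_subscription_list = set()
        self.data_storage = {}
        self.dropped = 0          # frames filtered out, kept only for illustration
        network.attach(self)

    def subscribe(self, descriptor):
        # Subscription is purely local: nothing is announced on the network.
        self.local_subscription_list.add(descriptor)

    def publish(self, descriptor, data):
        # Descriptor generation: the descriptor travels as a header of the data.
        self.network.broadcast(self, (descriptor, data))

    def ingress(self, frame):
        descriptor, data = frame
        if descriptor in self.local_subscription_list:
            # Descriptor absorption: only the pure data is stored.
            self.data_storage[descriptor] = data
        else:
            self.dropped += 1     # unwanted data never reaches the process

    def read(self, descriptor):
        return self.data_storage.get(descriptor)
```

Compared with a model that exchanges announcements, the only state an agent keeps here is its own subscription list and the stored data, which mirrors the reduction in functional units discussed in this section.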
Compared with the original design, the redesigned architecture omits the publication announcements and handles subscriptions only locally. The decision comes from an analysis of the communication methods. In the original design, the communication methods include unicast, multicast and broadcast. The publisher needs to know the exact address of the subscriber if unicast is adopted for data transmission. However, if all data transfer is realized by broadcast, then the publication and subscription announcements are not essential.
[Figure 3.1: Agent’s architecture design. Blocks inside the Agent: Local Subscriptions List, Local Publications List, Foreign Subscriptions List, Foreign Publications List, Local Subscription / Publication Control, Foreign Subscription / Publication Control, LSFP Matching, LPFS Matching, Data Storage, Data Descriptor Generation, Data Descriptor Absorption, Egress Data Buffer, Ingress Data Buffer, Network Interface and Unit Control. The Agent sits between the Process and the Network; the links are labeled Announcement/Data.]
[Figure 3.2: Redesigned architecture. Blocks inside the Agent: Local Subscriptions List, Data Storage, Data Descriptor Generation, Data Descriptor Absorption, Egress Data Buffer, Ingress Data Buffer, Network Interface and Unit Control. The Agent sits between the Process and the Network; the link is labeled Subscription/Data.]
Multicast | Simplified publish and subscribe architecture
multicast group | subscribers to a certain sort of data
a node can belong to more than one multicast group | one subscriber can subscribe to more than one data sort
the node that multicasts data does not know the destination node’s identity | publisher does not know the subscribers to the data it produces
the destination node does not know the source node’s identity | subscribers do not know the publisher of the subscribed sort of data
Table 3.2: Resemblances between multicast and the redesigned architecture
Because of the resemblances between multicast and the simplified publish and subscribe architecture, shown in Table 3.2 [13], multicast seems more interesting than broadcast. Moreover, the normal traffic on any network contains a large number of broadcasts. Assuming most nodes are interrupt driven, broadcast can place a large overhead on CPUs, since the CPU needs to decide whether or not to discard the incoming data each time there is an interrupt. Multicast, on the other hand, can drop data at the data link layer when the host has no interest in the data received, and it is this feature that makes multicasting so appealing [14]. On closer analysis, however, the advantage of multicast appears only when certain layers of the network protocols are realized in software. In our design, the hardware deals with incoming unwanted data without interrupting the processor, so the advantage of multicast diminishes. It should also be noted that whenever the published data is needed by one or more subscribers, the traffic on the network caused by broadcast is the same as that caused by multicast or unicast. In other words, using broadcast does not burden the network with more traffic.
At the same time, broadcast simplifies the architecture to a large extent. Since the announcements of publishers and subscribers are left out, the Local Publication List, Foreign Subscription List, Foreign Publication List, Foreign Subscription / Publication Control Unit, LSFP Matching Unit and LPFS Matching Unit all disappear from the original design. Removing announcements from the network also helps to reduce the network traffic. Although broadcast does not guarantee fail-proof data transmission, it is appropriate considering the high reliability of the Ethernet LAN adopted in the target system. Besides, problems caused by unreliable transmission can be remedied at higher layers.
3.2 Conclusions
The final H/W architecture design of the publish and subscribe mechanism devised in this chapter is shown in Figure 3.2. The simplification of the original architecture rests on two points. One is the choice of broadcast as the transmission method, which relies on the highly reliable LAN connection. The other is the hardware processing of Ethernet frames, which makes broadcast cost the processor no more than multicast or unicast. Considering that in the case study the outputs of the down-sampling units need to be fed to every beamforming unit, broadcast also yields highly efficient data transmission. Compared with the original architecture, the final one saves a lot of resources for publication / subscription management and greatly reduces the design complexity. The traffic on the network is also reduced because the announcements disappear. For the time being, it is a good direction for prototyping the publish and subscribe mechanism.
4 Modeling and Results
We shall escape the uphill by never turning back.
- Christina G. Rossetti
This chapter describes the modeling and results of the publish and subscribe architecture described in the previous chapter. Since this is the prototype hardware implementation, some assumptions are made regarding practical issues. A functional model is then constructed to verify the hardware architecture of the publish and subscribe mechanism. The main components of the architecture are explained one by one. Then a testbench is built to do the verification. The Foreign Language Interface provided by Modelsim is adopted to create the published data and record the subscribed data. The prototype design will be implemented in the Xilinx embedded system development environment. In order to explore the IP block development method in such an environment, the transmitting part of the architecture is constructed with Xilinx tools. It differs slightly from the transmitting channel structure used for the verification, because certain logic blocks need to be added to connect the different Xilinx IP blocks and make the design work. This part of the design is fully synthesizable. The Bus Functional Model for the IBM PowerPC Processor Local Bus (PLB) is applied to build the testbench. The goal is to investigate the verification method for IP blocks that are attached to the PowerPC PLB. The research is useful for developing a completely synthesizable agent design in the near future.
4.1 Functional Block Mapping
In order to realize the architecture described in the preceding chapter, the functional blocks need to be mapped onto working functional units. ‘Working’ emphasizes the implementation detail that the functional units should present. For example, the Local Subscription List in the final architecture should be replaced by a memory block and the rule to manage it. Some assumptions are made according to pragmatic circumstances. Then the system architecture with working functional units is derived and explained.
4.1.1 Assumptions
• There is only one application in one node. In other words, one agent serves only one application. Thus agents do not need to provide special ports for different applications.
• A local area network is adopted to build up the network. The LAN is highly reliable and the agent provides ‘best effort’ service.
• There are three categories of data: context data, persistent data and periodical data, as introduced in Section 2.3. For the prototype design, two of them, context data and periodical data, are chosen to be implemented. Extension to persistent data support is discussed in Section 5.2.
• The user has control over all the data: assigning certain nodes to hold the context data, designating data with different descriptors and making sure the descriptors are recognized in the whole system.
• For the first implementation, the Data Descriptor Generation and Data Descriptor Absorption displayed in Figure 3.2 are carried out by the process. The data sent to the agent is of fixed length. Short data is padded with 0s at the end.
4.1.2 Data Descriptor Format Design
Assigning a descriptor to the data is an essential concept in the publish and subscribe mechanism. It allows the nodes in the system to recognize the data they want and the data they produce, and thus makes it possible for each node to subscribe to and publish data. During the subscribe process, data descriptors are stored in the Local Subscription List. During the publish process, data descriptors are packed as headers with the data to be transmitted.
As discussed in Section 2.3, categories, worlds, sorts and key fields can be used to differentiate data. So they are chosen to make up the descriptors. They form the basic index into the Local Subscription List, and they are transferred along with the data being published.
After the data descriptor content is fixed, the length of each component needs to be decided. This decision is very important, because it determines the number of categories, worlds, sorts and key fields that an agent can support. It also relates to other problems, such as the methods for organizing the subscription lists and the types of memory used to store the data.
Before deciding on the lengths of categories, worlds, sorts and key fields, we take a look at the conditions and requirements we have to cope with, as shown in Figure 4.1. We analyze these four factors in turn.
[Figure 4.1: Elements that influence the data descriptor format design. Four factors, labeled A to D in the figure, bear on the open question of the descriptor format: resource provided by the development board; requirements on worlds, sorts and key fields number obtained from reference designs; subscription lists design; data base design.]
Resource provided by the development board
Since the implementation will be tried out on the Xilinx ML10G board, the available resources need to be examined. Table 4.7 and Section 4.3 present summaries of the resources. The figures of most interest are the memory resources. On-chip memory includes 290 Kb of distributed RAM and 1584 Kb of block RAM. Since there is a 184-pin DDR SDRAM socket on the board, DDR SDRAM can be used as the off-chip data storage device.
Requirements on worlds, sorts and key fields number obtained from reference designs
Chess has set the following reference figures for the context data.
• Number of descriptors: 4676
• Data size: 96 bits - 1536 bits
• Total amount of the context data: 1566 Kbits
Also, according to statistics from real-world implementations, the major part of the data (e.g. 90%) tends to be periodical data, while the descriptors for periodical data make up only a small part of the total number of descriptors, say, 10%.
Subscription List Design
The subscription list design also influences the data descriptor format design. The subscription list is crucial in the agent. The application expresses its interest in certain data by adding the data descriptor to the list, and queries the list to check whether the wanted context data is in the data storage device. The agent uses the Local Subscription List as a filter to drop unwanted periodical and context data. The list also serves as a cue for indexing the wanted context data. Since the subscription list is checked by both the agent and the process, matching should be as fast as possible. Two methods to realize the matching are considered.
Content Addressable Memory The difference between a CAM and traditional memory, such as RAM, is that a CAM uses the content instead of addresses to store and retrieve information [15]. This method is attractive for this application, since using the content to address data corresponds to using descriptors as the index for searching data. However, this method costs more resources than the traditional one, so a trade-off needs to be made between the data searching speed and the resources on the chip. It is also worth noting that Xilinx provides CAM IP cores based on different technologies, such as SRL16E-based CAM, Distributed SelectRAM-based CAM and Block SelectRAM+ memory-based CAM, targeting different applications. The width and depth of these CAM IP cores can be found in Table 1 of [16].
Hash Coding This is a method for using traditional memory to realize content-addressable access [17]. Before the data is stored in the memory, it goes through hash coding, which uses all or part of the data to calculate the address at which the data is stored. The same calculation is used when reading data. Searching time is deterministic, which means that it does not increase as the contents of the memory grow. The problem is that more than one data item may be mapped to the same address by the hash calculation. Possible solutions for the colliding data are a) searching serially for the next empty entry; b) using double hashing; c) using another memory for the colliding data.
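Solution a), searching serially for the next empty entry, can be sketched as follows. The hash function (the key modulo the table size) and the names `ProbingTable`, `store` and `load` are assumptions chosen for illustration; entries are never deleted in this sketch.

```python
class ProbingTable:
    """Location-addressable memory with hash coding and serial probing."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size   # each slot holds (key, value) or None

    def _home(self, key):
        # Illustrative hash: the low-order part of the key selects the address.
        return key % self.size

    def store(self, key, value):
        index = self._home(key)
        for _ in range(self.size):
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return index
            index = (index + 1) % self.size   # collision: try the next entry
        raise RuntimeError("table is full")

    def load(self, key):
        index = self._home(key)
        for _ in range(self.size):
            slot = self.slots[index]
            if slot is None:
                return None                   # an empty entry ends the search
            if slot[0] == key:
                return slot[1]
            index = (index + 1) % self.size
        return None
```

The number of probes depends on how many earlier keys collided, which shows why collisions complicate a hardware implementation that needs a fixed access time.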
Context Subscription List Storage Design Because of the number of context data descriptors, a CAM is unlikely to be suitable for storing the context subscription list. It would cost too many resources on the FPGA, leaving no room for other logic. For example, a CAM with a depth of 4096 and a width of 16 costs 4096 RAM16X1, and the access time becomes longer if the CAM is too large. Hash coding might be a solution, but the collision problem causes trouble for a hardware implementation. Can traditional location-addressable memory help to solve the problem? If we treat the context data descriptor as an address, it is quite easy to find a certain context data item by just using its descriptor. For the prototype design, 4096 context data descriptors are supported. Thus the context data subscription list is built as a table of 4096 entries. Each entry consists of two bits: an empty sign and a subscription sign. The empty sign indicates the availability of the data in the context data storage, and the subscription sign shows whether the corresponding context data is subscribed.
Periodical Data Subscription List Storage Design Since the number of periodical data descriptors is much smaller than that of context data descriptors and Xilinx provides CAM IP cores, the periodical data subscription list is implemented with a CAM. For the prototype design, 256 periodical data descriptors are supported. The selected CAM is 24 bits wide and 256 entries deep, which is a typical configuration for a Xilinx CAM.
Data Base Design
Context Data Storage Design Context data should be stored for future use. The organization of the context data memory is straightforward. It is divided into blocks of equal length (192 bytes). Each block can hold the longest context data content and is indexed in the same way as the context data subscription list. There are 4096 entries in total in the context data memory. Since on-chip memory is not sufficient for this design, off-chip DDR SDRAM is adopted as the context data storage. The address of each block can be calculated from its address in the context data subscription list and the length of each block.
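The sizing argument and the address calculation can be checked with a few lines of arithmetic. The sketch below assumes that 1 Kb means 1024 bits and that blocks are laid out contiguously from a base address; the `base` parameter and the function names are hypothetical.

```python
ENTRIES = 4096                # context data descriptors supported by the prototype
BLOCK_BYTES = 192             # 1536 bits, the longest context data item
BLOCK_RAM_KBITS = 1584        # on-chip block RAM quoted for the board
DISTRIBUTED_RAM_KBITS = 290   # on-chip distributed RAM quoted for the board


def storage_kbits():
    """Total size of the context data storage in Kb (1 Kb = 1024 bits assumed)."""
    return ENTRIES * BLOCK_BYTES * 8 // 1024


def block_address(descriptor_index, base=0):
    """Byte address of the block that belongs to a subscription list entry."""
    if not 0 <= descriptor_index < ENTRIES:
        raise ValueError("descriptor index out of range")
    return base + descriptor_index * BLOCK_BYTES
```

Under these assumptions the storage needs 6144 Kb, well above the 1874 Kb of on-chip memory in total, which is consistent with the choice of off-chip DDR SDRAM.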
Periodical Data Storage Design Because periodical data is volatile, as mentioned in Section 2.3, it is stored in FIFOs. After the data is published by a publisher, it is sent to the network. Every agent gets the data, but only the subscribers store the data in their FIFOs. Subsequently, the data is read by the process and disappears from the agent.
Taking all the above factors into account, the data descriptor format for context data is designed as shown in Table 4.1 and the data descriptor format for periodical data as shown in Table 4.2.
2 bits | a bits | b bits | c bits
category id | world id | sort id | key
Table 4.1: Context data descriptor format. a, b and c represent numbers such that a + b + c = 12. The total number of supported context data descriptors is 2^12; the application can decide how many bits are assigned to world id, sort id and key.
2 bits        6 bits     8 bits    8 bits
category id   world id   sort id   key
Table 4.2: Periodical data descriptor format. 2^6 worlds, 2^8 sorts and 2^8 key values can be set, but only 256 combinations of worlds, sorts and key values are supported. The application can decide whether it wants 1 world, 2^8 sorts and one key value per sort, or 2 worlds, 2^4 sorts and 8 key values per sort, or other combinations.
The descriptor format is now fixed. The detailed structures of the context data subscription list and the periodical data subscription list are defined as shown in Table 4.3 and Table 4.4. The context data subscription list is a table with 2^12 entries. The context data descriptor bits form the index into the table; that is, the 12-bit context data descriptor is the address for accessing the context data subscription list. When the application wants to subscribe to context data A, which is represented by 0000 0000 0001 for instance, it sets the subscription sign at address 0000 0000 0001 in the context data subscription list. When the agent receives data with descriptor 0000 0000 0001 and wants to check whether the application has subscribed to it, it checks the subscription bit at address 0000 0000 0001 in the context data subscription list. If the bit is 1, the incoming data is subscribed and should be stored in the context data storage; otherwise, the agent drops the data. After the agent moves the data into the context data storage, it sets the empty sign bit of the corresponding entry in the context data subscription list. The application uses this bit to decide whether the subscribed data is in the context data storage.
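The behaviour of the context data subscription list can be sketched as a directly addressed table. The following Python fragment is only an illustration, not the VHDL of the functional model; the class and method names are assumptions introduced here. It also assumes that a set empty sign bit means "data is present in the context data storage", following the statement that the agent sets this bit after moving the data into the storage.

```python
CONTEXT_ENTRIES = 1 << 12  # 2^12 context data descriptors in the prototype


class ContextSubscriptionList:
    """Direct-addressed table: the 12-bit descriptor is the table index."""

    def __init__(self) -> None:
        self.subscription_sign = [0] * CONTEXT_ENTRIES
        # Assumed polarity: 1 means the data is available in the storage.
        self.empty_sign = [0] * CONTEXT_ENTRIES

    @staticmethod
    def _check(descriptor: int) -> None:
        if not 0 <= descriptor < CONTEXT_ENTRIES:
            raise ValueError("descriptor does not fit in 12 bits")

    def subscribe(self, descriptor: int) -> None:
        """Application side: mark the descriptor as subscribed."""
        self._check(descriptor)
        self.subscription_sign[descriptor] = 1

    def on_receive(self, descriptor: int) -> bool:
        """Agent side: return True if incoming data must be stored."""
        self._check(descriptor)
        if self.subscription_sign[descriptor] != 1:
            return False              # not subscribed: the agent drops the data
        self.empty_sign[descriptor] = 1   # set after the data has been stored
        return True

    def is_available(self, descriptor: int) -> bool:
        """Application side: is the subscribed data in the storage?"""
        self._check(descriptor)
        return self.empty_sign[descriptor] == 1
```

A lookup costs one table access regardless of how many descriptors are subscribed, which is the property that makes location-addressable memory attractive here.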
1-bit        1-bit
empty sign   subscription sign
Table 4.3: Context data subscription list item
6-bit      8-bit     8-bit
world id   sort id   key
Table 4.4: Periodical data subscription list item
The periodical data subscription list is realized by the CAM IP with a width of 24 bits and a depth of 256. The periodical data descriptor is the data content of this CAM. When the application subscribes to periodical data, for example B, which is represented by 11000 00000 00000 0000011, it writes the content 11000 00000 00000 0000011 to the CAM. When the agent wants to know whether the data with descriptor 11000 00000 00000 0000010 is subscribed by the application, it supplies 11000 00000 00000 0000010 as the input. The CAM asserts the match signal if the data descriptor is found in the periodical data subscription list. Otherwise, the agent drops the data and continues analyzing the incoming data.
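The CAM-based lookup can be imitated in software as a small content-addressed store. The sketch below is an assumption-laden illustration in Python, not the Xilinx CAM IP interface: the names `PeriodicalCam` and `pack_descriptor` are hypothetical, the fields are assumed to be packed most significant first in the order of Table 4.2, and the category value used in the example is a placeholder.

```python
class PeriodicalCam:
    """Software stand-in for a 24-bit wide, 256-entry deep CAM."""

    WIDTH_BITS = 24
    DEPTH = 256

    def __init__(self) -> None:
        self._entries: list[int] = []

    def write(self, descriptor: int) -> None:
        """Store a descriptor (the subscribe operation)."""
        if not 0 <= descriptor < (1 << self.WIDTH_BITS):
            raise ValueError("descriptor does not fit in 24 bits")
        if descriptor in self._entries:
            return                      # already subscribed
        if len(self._entries) >= self.DEPTH:
            raise OverflowError("CAM is full (256 entries)")
        self._entries.append(descriptor)

    def match(self, descriptor: int) -> bool:
        """Return True when the descriptor is found (the match signal)."""
        return descriptor in self._entries


def pack_descriptor(category: int, world: int, sort: int, key: int) -> int:
    """Pack 2 + 6 + 8 + 8 bits into one 24-bit word (assumed bit order)."""
    if not (0 <= category < 4 and 0 <= world < 64
            and 0 <= sort < 256 and 0 <= key < 256):
        raise ValueError("field out of range")
    return (category << 22) | (world << 16) | (sort << 8) | key
```

A real CAM compares the input against all entries in parallel; the list search here only reproduces the result, not the timing.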
4.1.3 Frame Structure Design
As introduced in Section 2.4, the Ethernet frame is the unit of transport in an Ethernet network. In order to send data from publishers to subscribers, the data needs to be packed into Ethernet frames. An IP Header and a UDP Header are also added to allow future extension of the design, such as supporting a transport layer transmission check by means of the UDP checksum.
Because Xilinx's MAC IP Core generates the MAC Header's Preamble, Start Frame Delimiter and Frame Check Sequence automatically, the frame content produced by applications only needs to include the rest of the MAC Header, the IP Header, the UDP Header, the data descriptor and the data. How long should the frames be? This depends mainly on the length of the data content. For the prototype design, the supported data content length is 192 bytes, which can hold the longest context data.
The frame content is shown in Table 4.5. The Total Length value in the IP Header and the Length value in the UDP Header are fixed, because the data field always has the same length. The UDP Header's Checksum value is filled with 0, meaning that this field will not be checked. The setting of the other fields is as shown in Table 4.5. The Header Checksum in the IP Header is thus fixed as well, since the IP and UDP headers are identical for each node. Fixed values make it easy for the process to add these headers ahead of the data.
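Since every IP Header field is fixed per node, the Header Checksum can be computed once and reused. The following Python sketch shows the standard IPv4 header checksum (the one's complement of the one's complement sum of the header's 16-bit words) applied to the fixed field values listed in Table 4.5. The function names, the example source address and the example Total Length value are assumptions for illustration only; the text does not state them.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """One's complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def build_ipv4_header(total_length: int, src_ip: bytes) -> bytes:
    """Build the 20-byte header with the fixed values of Table 4.5."""

    def pack(checksum: int) -> bytes:
        return struct.pack(
            "!BBHHHBBH4s4s",
            0x45,                 # Version 4, Internet Header Length 5
            0x00,                 # Type of Service
            total_length,         # fixed, because the data length is fixed
            0x0000,               # Identification
            0x4000,               # Flags and Fragment Offset
            0xFF,                 # Time to Live
            0x11,                 # Protocol (UDP)
            checksum,
            src_ip,               # per-node source address
            b"\xff\xff\xff\xff",  # broadcast destination address
        )

    checksum = ~ones_complement_sum16(pack(0)) & 0xFFFF
    return pack(checksum)


# Hypothetical example values, not taken from the thesis.
EXAMPLE_HEADER = build_ipv4_header(total_length=228, src_ip=bytes([192, 168, 0, 1]))
```

A receiver can validate such a header by summing all ten words including the checksum; the folded result is 0xFFFF for an intact header.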
Bit offset: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
Destination Address (FF FF FF FF FF FF)
Source Address
Length or Type (0800) | Version (4) | Internet Header Length (5) | Type of Service (00)
Total Length | Identification (0)
Flags and Fragment Offset (4000) | Time to Live (FF) | Protocol (11)
Header Checksum | Source IP Address
Destination IP Address (FF FF FF FF)
Source Port (0)
Destination Port (0) | Length
Checksum (0) | Category id | Context Data ID | Padding
Periodical Data ID
Data Length | Data
Data (continued)
Table 4.5: The frame that is used for the publish and subscribe architecture follows the TCP/IP frame structure as shown in Figure 2.4. Because of the current application, most of the TCP/IP frame field values are fixed. Fixed values are given in the brackets after each field name. Data descriptor information is added before the data. Data Length is an optional field indicating the real length of the data field, because data can be padded with 0s to reach the same length.
4.1.4 Functional Units Description
After the above analysis, a detailed design of the agent is worked out and shown in Figure 4.2. Three operations are supported: subscribe, write and read. Each of these operations is explained as follows.
Subscribe A node uses subscribe to express its interest in certain data from other nodes in the system. The subscribed data descriptors are kept inside each node: context data descriptors are stored in the Context Data Subscription List, and periodical data descriptors are stored in the Periodical Data Subscription List. When an application needs certain data to process, it uses subscribe to express the requirement, which specifies the category, world, sort and key fields of the data. Because the agent and the processor are connected by the system bus, each functional block that needs to be accessed by the processor has an entry in the processor's memory map, and the processor uses different addresses to visit different blocks in the agent. The agent deals with subscribe by first splitting the data requirement by category. When the required data is of the context data type, the processor selects the Context Data Subscription List by its address in the memory map, and the descriptor is fed to the Context Data Subscription List. The agent then changes the subscription sign bit from 0 to 1 to indicate that the node has subscribed to the data with this descriptor. When the data is of the periodical data type, the agent writes the data descriptor into the Periodical Data Subscription List, using the list's address in the memory map as the address and the descriptor as the data.
Figure 4.2: Mapped architecture (block diagram with the following labelled blocks: Processor; System Bus; Local Subscription List, consisting of the Context Data Subscription List (RAM) and the Periodical Data Subscription List (CAM); Frame Processor, consisting of the Frame Analysis Logic and the Frame Analysis Buffer (Dual Port Memory); Data Storage, consisting of the Periodical Data Buffer (FIFO) and the Context Data Storage (DDR SDRAM); Ingress Data Buffer (FIFO); Egress Data Buffer (FIFO); MAC; 1000BASE-T PHY; Network)
Write When the processor produces data, it puts the data in a frame as shown in Table 4.5. It then sends the frame to the Egress Data Buffer, from where it goes to the network.
Since the frame is broadcast over the network, each agent receives it. First, the MAC Header is detached by the MAC, which also performs the CRC check. The IP Header and UDP Header are not processed, as discussed. The data with its descriptor is then stored in the Frame Analysis Buffer. The Frame Analysis Logic extracts the data descriptor from a fixed address of the dual port memory. Based on the category id, it then decides which subscription list to check.
If the data belongs to the context data type, the logic uses the remaining index bits to address the subscription sign bit. If that bit indicates that the data is subscribed, the analysis logic moves the data from the Frame Analysis Buffer to the Context Data Storage and sets the empty sign bit in the Context Data Subscription List. Context data with the same descriptor overwrites the existing data. If the subscription sign shows that the data is not subscribed, the frame in the Frame Analysis Buffer is replaced by new data received from the network.
If the data belongs to the periodical data type, the analysis logic feeds the data descriptor to the CAM. The CAM produces a match signal if the data with this descriptor is needed. If this is the case, the Frame Analysis Logic moves the data from the Frame Analysis Buffer to the Periodical Data Buffer. Otherwise the data in the Frame Analysis Buffer is discarded.
Read This operation is issued by the processor to get data that is needed for the current process. The read function is composed of two operations: query and get.
For the context data type, the agent needs to consult the Context Data Subscription List. There are three situations to consider:
1. if there is no match → send the no match information back to the processor and let it decide what to do next;
2. if there is a match but it is empty → send the empty information back to the processor and let it decide what to do next;
3. if there is a match and it is not empty → get the address of the data and read from the Context Data Storage.
Because the processor and the network may access the same memory blocks, such as the Context Data Subscription List and the Context Data Storage, arbiters need to be placed in front of the data and address buses of these memories. An arbiter assigns the priority for the read and write operations from the Processor and the Frame Analysis Logic.
For periodical data, a query is unnecessary; the processor just reads from the Periodical Data Buffer. However, subscribed periodical data with different descriptors may arrive in an order that differs from the one expected by the process. The process has to deal with this problem itself.
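The three outcomes of a context data read (query followed by get) can be sketched as below. This is a hedged Python illustration, not the bus protocol of the agent: the names `QueryResult` and `read_context` are hypothetical, the two sign bits are modelled as plain lists, the storage is a flat `bytearray`, and a set empty sign bit is again assumed to mean that data is present.

```python
from enum import Enum

BLOCK_BYTES = 192
CONTEXT_ENTRIES = 4096


class QueryResult(Enum):
    NO_MATCH = "no match"   # descriptor is not subscribed
    EMPTY = "empty"         # subscribed, but no data has arrived yet
    DATA = "data"           # subscribed and data is available


def read_context(descriptor, subscription_sign, empty_sign, storage):
    """Query the subscription list, then get the block if it is available.

    Returns a (QueryResult, bytes or None) pair; in the first two cases the
    processor decides what to do next.
    """
    if not subscription_sign[descriptor]:
        return QueryResult.NO_MATCH, None
    if not empty_sign[descriptor]:      # assumed polarity: 0 means no data yet
        return QueryResult.EMPTY, None
    start = descriptor * BLOCK_BYTES    # block address from the list index
    return QueryResult.DATA, bytes(storage[start:start + BLOCK_BYTES])
```

Arbitration between the processor and the Frame Analysis Logic is deliberately left out of this sketch.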
4.2 Functional Model and Testbench Construction
The design devised in the above sections needs to be verified before it is implemented on an FPGA. As introduced in Section 2.6, verification is very important for IP block design. A functional model is built for this reason. The model makes it possible to ignore the implementation details for the moment and to concentrate on building a working model of the design. The behavior of the model can also be studied.
4.2.1 Function Block Realization
Each agent model is mainly composed of three parts: the transmitting channel, the receiving channel and the MAC, as shown in Figure 4.3.
Figure 4.3: Agent's functional model architecture (block diagram with the following labels: mac_beha, Valid signal retrieval, Good/bad frame generator, Tx/Rx Controller, rx_mac_fifo, receive_control, Address generator, write, p_fifo, c_ddr, cam, c_dpm, f_dpm, tx_mac_fifo, control, System bus, and the operations read and subscribe)
• Transmitting Channel
A transmitting MAC FIFO is the central part of the channel. The FIFO status partially controls the frame transmission. If the FIFO is not full, frames can be written to it. If the FIFO is not empty and the current node is in the transmit state¹, frames can be read out of the FIFO and sent to the network.
• Receiving Channel
Compared to the transmitting channel, the receiving channel is more complex. It contains a receiving MAC FIFO, the frame analysis logic and memories for storing the subscription lists and the received subscribed data. The receiving MAC FIFO stores good frames and drops bad frames² by resetting the head of the FIFO after obtaining a bad frame signal. Similar to the role of the transmitting MAC FIFO's status, the receiving MAC FIFO's status controls the data flow from the network to the processor. Its full signal prevents the network from writing to it, and its empty signal tells the frame analysis logic that it has to wait for new frames to arrive.
¹ The transmit state is explained in the MAC Behavioral Model part.
² Good frames are frames that arrive without being damaged; bad frames occur when there are collisions during the transmission of the frame.
The not-empty signal of the receiving MAC FIFO and the complete signal of the frame analysis logic initiate each round of frame analysis. First, the category ID is obtained from a fixed location of the frame, which is stored in a frame analysis buffer. If the category ID does not belong to either defined category (periodical or context), the frame analysis stops here. Otherwise, the analysis continues. Based on the category ID, the rest of the data descriptor is extracted from the frame and checked against the corresponding subscription list. Flags are set according to the result of the query. If the flags show that there is no match in the subscription list, the analysis ends. Otherwise, the data in the frame analysis buffer is moved to the appropriate storage memory: the periodical data FIFO for periodical data, and the context data DDR SDRAM for context data. Once the move is finished, the current analysis is finished. In the functional model, all memory blocks are abstracted as arrays in VHDL. The CAM and the FIFO have additional functions that imitate their special characteristics.
• MAC Behavioral Model
In order to construct a testbench for the functional model, a MAC (Media Access Control) model is included in each node. This MAC model provides an 8-bit port for communicating with other MACs by means of an 8-bit bus.
If no transmission is taking place at a given moment, a node can transmit frames. If two nodes attempt to transmit simultaneously, a collision occurs. This collision is detected by all participating nodes. After a random time interval, a node that collided attempts to transmit again. If another collision occurs, each transmitting node waits for another random time interval. The rule is similar to that of CSMA/CD as introduced in Section 2.4. Unlike the IEEE 802.3 CSMA/CD standard, however, the random waiting time is not increased step by step after successive collisions.
The transmit/receive controller of this MAC functional model is realized by a state machine. The three states of the state machine, as shown in Figure B.1, are the sense state, the wait state and the transmit state.
Besides the transmit/receive controller described above, the MAC functional model consists of a valid signal retrieving block and a good/bad frame signal generator. A bad frame signal is generated for received frames whose transmission was disturbed by a collision. A good frame signal is produced for frames that did not collide. The valid signal retrieval block signals the beginning and the end of a frame and provides the way to feed data to the receiving MAC FIFO.
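The frame analysis round of the receiving channel described above can be summarized in a few lines of code. The sketch below is a simplified Python stand-in for the VHDL functional model; the class `AgentModel`, the result strings and, in particular, the two category encodings are assumptions introduced here, since the text does not give the category id values.

```python
from collections import deque

CATEGORY_CONTEXT = 0b01      # hypothetical encoding
CATEGORY_PERIODICAL = 0b11   # hypothetical encoding


class AgentModel:
    """Receiving-side behaviour: check the subscription, then store or drop."""

    def __init__(self) -> None:
        self.context_subscribed = set()   # stands in for the subscription sign bits
        self.context_present = set()      # stands in for the empty sign bits
        self.context_storage = {}         # stands in for the DDR SDRAM
        self.periodical_cam = set()       # stands in for the CAM
        self.periodical_fifo = deque()    # stands in for the periodical data FIFO

    def analyse(self, category: int, descriptor: int, data: bytes) -> str:
        if category == CATEGORY_CONTEXT:
            if descriptor not in self.context_subscribed:
                return "dropped: context data not subscribed"
            # Data with the same descriptor overwrites the existing data.
            self.context_storage[descriptor] = data
            self.context_present.add(descriptor)
            return "stored: context data"
        if category == CATEGORY_PERIODICAL:
            if descriptor not in self.periodical_cam:
                return "dropped: periodical data not subscribed"
            self.periodical_fifo.append((descriptor, data))
            return "stored: periodical data"
        # Neither defined category: the analysis stops here.
        return "dropped: unknown category"
```

Each call corresponds to one analysis round; the five possible return paths match the five kinds of frame a node can receive.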
4.2.2 Testbench Design
The testbench constructed for the functional model is displayed in Figure B.4. There are 6 nodes in total. Four of them mainly write data to the network, and two of them mainly read from the network and do the analysis. In other words, the testbench is composed of four publishers and two subscribers.
In order to test the agent's functionality, an interactive model needs to be built. It should be able to perform the subscribe, write and read operations described in Section 4.1.4, representing an application's behavior as shown in Figure 2.2. The model should also translate these operations into the correct address bus and data bus signals that are supplied to the agent. The Foreign Language Interface (FLI) is adopted to construct such a component for the testbench.
FLI functions are C programming language functions that offer procedural access to information within Model Technology's HDL simulator, vsim [18]. Applications written by users can make use of these routines to traverse the structure of an HDL design and to obtain information from, and set the values of, VHDL objects in the design. In this way, a simulation can be controlled by the user.
The flowchart for the read process is shown in Figure B.2 and the one for the write process is shown in Figure B.3. For a node that mainly reads from the network, the C model first subscribes to particular types of periodical data and context data. Then it checks whether the periodical data FIFO is empty. When the FIFO is not empty, the C model sends out the read enable signal and puts the periodical data FIFO address on the address bus. It then gets the data from the data bus. If the periodical data FIFO is empty, it goes on to check whether certain context data is subscribed and exists in the context data memory; if so, it reads the data from the memory, otherwise it goes back to checking the periodical data FIFO. For a node that mainly writes data to the network, the program first checks whether the transmitting MAC FIFO is full. If this is the case, it keeps checking. Otherwise, it reads the frame data from a text file and writes it to the transmitting MAC FIFO.
The frame instances that are fed to the write process are designed in the following way. The main function of the agent is to analyze frames and to route them according to the data descriptor. If the frame belongs to either data category, the agent goes on to check whether the sort is subscribed by examining the subscription list; otherwise, it loads the next frame. If the data sort is one of the subscribed sorts, the agent transfers the data to the destination memory. The test frames should be able to reveal whether the agent acts correctly under these different situations. As displayed in Figure 4.4, the different cases for a single frame are:
• It is subscribed periodical data;
• It is periodical data, but is not subscribed;
• It is subscribed context data;
• It is context data, but is not subscribed;
• It is neither periodical data nor context data.
Therefore, these frames should all be included in the test frames. The testing order is also important. As introduced in Section 2.6, this verification method is of the white box kind. During the frame analysis, different signs are set and reset according to the subscription situation. In order to check whether these signs are set and reset correctly, the order of the frames should be considered. The number of ways of obtaining an ordered subset of 2 elements from a set of 5 elements is 20. Cases in which two subscribed periodical data frames or two subscribed context data frames arrive in a row should also be considered. In this situation, the data sorts may be the same or different. That is, even when all frames carry subscribed periodical data, the test frames should cover both the case in which two subscribed periodical data frames of the same sort are received in succession and the case in which two subscribed periodical data frames of different sorts are received in succession. Taking all the above situations into consideration, 27 frames are designed and are read by the process described in Figure B.3 in the representative order during the simulation.
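The count of 20 ordered pairs quoted above is simply the number of 2-permutations of 5 elements, 5 × 4 = 20. A short Python check is given below; the case names are labels chosen here for the five frame kinds listed in the text, and the sketch only enumerates the orderings, it does not reproduce the actual 27-frame test sequence.

```python
from itertools import permutations

# The five kinds of frame a subscriber can receive (labels are illustrative).
CASES = [
    "subscribed periodical",
    "unsubscribed periodical",
    "subscribed context",
    "unsubscribed context",
    "unknown category",
]

# Every ordered pair of two different cases: frame kind X followed by kind Y.
ORDERED_PAIRS = list(permutations(CASES, 2))

# Back-to-back frames of the same subscribed kind, with the same or a
# different data sort, are the additional situations named in the text.
REPEATED_CASES = [
    (kind, sort_relation)
    for kind in ("subscribed periodical", "subscribed context")
    for sort_relation in ("same sort", "different sort")
]
```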
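Figure 4.4: Test frames design (diagram from a start point S to an end point E through the frame kinds: periodical data; context data; data that does not belong to one of the above categories; each of the first two is further split into subscribed data and not subscribed data)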
A testbench with only two nodes, one mainly publishing data and the other subscribing to data, is also built for the verification, since this is the situation in which there is no collision on the network. Experiments are also done with 10 publishers and 1 subscriber. Since each node consumes a large amount of memory during simulation, and the amount of memory on the simulation machine is limited, a testbench with more than 11 nodes has not been constructed.
4.2.3 Result
The agent works well with the frame series in the testbench. This is investigated by tracking the signals in the simulator after the simulation. After receiving a frame from the network, the agent can be seen to store the frame in the receiving MAC FIFO. When the frame analysis logic finishes one round of analysis and the receiving MAC FIFO is not empty, the frame in the tail position of the receiving MAC FIFO is observed to be moved to the frame analysis dual port memory. The flags described in Section 4.2.1 are set correctly depending on the data descriptor. The contents of the context data memory and the periodical data memory show that the subscribed data has been written to the correct memory. The output files, one recording the subscribed context data and the other recording the subscribed periodical data, also demonstrate the correct functioning of the agent.
When there are only two nodes in the system, one sending data and the other receiving data, there is no collision on the network. This is the ideal situation. In this situation, the waveform in Figure C.1 shows that if the processor generates data faster than the network capacity on average, the transmitting MAC FIFO will eventually fill up, no matter how deep it is. Several other settings of the FIFO depth and the network capacity show the same behavior, as can be seen in Figure C.2 to Figure C.6. Since increasing the depth of the transmitting MAC FIFO does not solve the problem, real applications should keep to a data generation rate that is supported by the system. In Section 5.2, the author describes a way to obtain the highest supported data generation rate of the system. A theoretical value can be calculated in the following way.
\[
\text{data generation rate of the process} = \frac{\text{maximum throughput of the network}}{\text{number of transmitting nodes in the system}} \tag{4.1}
\]
The maximum normalized throughput of the Ethernet LAN standard is approximately 1/(1 + 6.44a) [19], where a = t_prop · R / L.³ Assuming v = 3 × 10^8 m/s, R = 1 Gbps and L = 256 bytes, and varying d between 1 meter and 100 meters, we can calculate the requirement on the data generation rate of the process for the case in Section 1.1. Given Equation 4.1 and the above assumptions, the requirement ranges from 15.5 Mbps down to 7.6 Mbps. This also reflects the outgoing data throughput of the agent.
If the network throughput is higher than the incoming data throughput of an agent, the receiving MAC FIFO will fill up, too. This helps in setting the requirement on the incoming data throughput of the agent for the real hardware implementation. Using the same assumptions as above, the requirement on the incoming data throughput of the agent ranges from 990 Mbps down to 488 Mbps; in other words, the agent should process the incoming data at a speed of 488 Mbps to 990 Mbps, depending on the size of the network.
The idle time of the network is observed in the testbench with 10 transmitting nodes and 1 receiving node. For 10 transmitting nodes, according to the above estimation, the theoretical highest data generation rate ranges from 99 Mbps for a 1-meter LAN to 48 Mbps for a 100-meter LAN. Supported data generation rates have been supplied to the agent in the testbench. The following table shows the idle time of the network in relation to the different data rates.
³ t_prop is the propagation delay, t_prop = d/v, where d is the distance between two nodes in the network and v is the speed of light in the medium. R is the bit rate of the network. L is the average packet size transmitted on the network.
Data generation rate       75 Mbps   50 Mbps   35 Mbps   25 Mbps
Idle time of the network   30.88%    55.11%    66.37%    75.53%
Table 4.6: The idle time of the network under the different data generation rates
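The theoretical figures quoted in this section can be reproduced from the throughput formula 1/(1 + 6.44a) and Equation 4.1. The Python sketch below uses the stated assumptions (v = 3 × 10^8 m/s, R = 1 Gbps, L = 256 bytes, i.e. 2048 bits); the function names are introduced here for illustration only.

```python
def normalized_throughput(distance_m: float,
                          bit_rate_bps: float = 1e9,
                          packet_bytes: int = 256,
                          v_mps: float = 3e8) -> float:
    """Maximum normalized Ethernet throughput, 1 / (1 + 6.44 a)."""
    t_prop = distance_m / v_mps                        # propagation delay d / v
    a = t_prop * bit_rate_bps / (packet_bytes * 8)     # a = t_prop * R / L
    return 1.0 / (1.0 + 6.44 * a)


def per_node_rate_mbps(distance_m: float, nodes: int,
                       bit_rate_bps: float = 1e9) -> float:
    """Equation 4.1: network throughput shared by the transmitting nodes."""
    network_bps = normalized_throughput(distance_m, bit_rate_bps) * bit_rate_bps
    return network_bps / nodes / 1e6


# Incoming throughput an agent must sustain, in Mbps, for 1 m and 100 m.
INCOMING_1M = normalized_throughput(1) * 1000
INCOMING_100M = normalized_throughput(100) * 1000
```

With these assumptions the network throughput comes out at roughly 990 Mbps for d = 1 m and 488 Mbps for d = 100 m, and `per_node_rate_mbps(d, 10)` gives roughly 99 Mbps and 48.8 Mbps for the ten-publisher testbench. Dividing by 64 nodes instead gives about 15.5 Mbps and 7.6 Mbps, which agrees with the range quoted for the Section 1.1 case; the node count of 64 is inferred from those numbers and is not stated in this section.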
4.3 Exploration On IP Block Development Method in Xilinx's Embedded System Development Environment
Since the prototype design will be realized on Xilinx's embedded system development platform, it is necessary to examine the IP block development method in this environment. The difference compared to other IP development methods is that many tools integrated in Xilinx's development platform can be applied to assist the design of complex IP cores. In this research, the transmitting channel of the agent is constructed. It is used to explore how to:
1. add new peripherals to the IBM Processor Local Bus of the PowerPC integrated in Xilinx's FPGA
2. generate memory blocks and other IP cores with Xilinx's Core Generator
3. use the transmitting and receiving MAC FIFOs and the MAC core
4. verify the design by means of the IBM Bus Functional Model
The transmitting channel is also the starting point for the development of the whole synthesizable design. In other words, it is the transmitting channel that will be used in the final synthesizable realization.
4.3.1 Embedded System Development Platform
Hardware The design will be realized on a Xilinx ML10G development board. The resources of this board are as follows [20]:
• XC2VP20 device soldered to the board. The resources of the XC2VP20 are shown in Table 4.7.
• 50-Ω transmission lines for all clock and data lines
• Dell Laptop power supply cable
• 10/100/1000Base-T Ethernet PHY
• On-board user interface with LCD screen
• RS232 serial ports
• DDR-SDRAM DIMM socket for expandable memory capabilities
• 68-pin connector for interfacing to PC
• System ACE Compact Flash capabilities for loading configurations
• Xenpack and HMZD connectors for XAUI applications
Resource                                            XC2VP20
RocketIO Transceiver Blocks                         8
PowerPC Processor Blocks                            2
Logic Cells                                         20,880
CLB Slices (1 CLB = 4 slices = max 128 bits)        9,280
CLB Max Distr RAM (Kb)                              290
Block SelectRAM+ Max Block RAM (Kb)                 1,584
18X18 Bit Multiplier Blocks                         88
DCMs                                                8
Maximum User I/O Pads                               564
Table 4.7: Resources of Virtex II Pro 20
Software The available software for developing the system is as follows:
• Xilinx Platform Studio
an integrated environment for generating the software and hardware specification flows for an Embedded Processor system. It includes many tools, such as Library Generator, GNU Compiler Tools, Platform Generator, Simulation Model Generator, Makefile, System ACE and so on.
• Third party tools
– Mentor Graphics HDL Designer
– Model Technology ModelSim
4.3.2 Implementation of the Transmitting Channel
4.3.2.1 Block Description
The prototype hardware implementation of the publish and subscribe mechanism is going to be delivered as an IP core that connects to a processor's system bus. Since the hardware development platform is Xilinx's Virtex II Pro development board, the interface is chosen to be the PowerPC Processor Local Bus (PLB) interface. This interface can be generated by Xilinx Platform Studio (XPS). As can be seen from [21], the PLB bus interface provides a wide range of options for users' applications. Choices can be made during the generation of the PLB peripherals.
The transmitting channel's structure is slightly different from that in Figure 4.3, because the different IP blocks need certain signals in order to work, and certain logic blocks are built to provide them. As shown in Figure 4.5, a frame dual port memory is connected to the bus. It serves as temporary storage for the current frame. The address of this dual port memory has to be assigned during the generation of the peripheral. The dual port memory is generated by Xilinx Core Generator. One port of the dual port memory is connected to the PLB IP interface; the other port is connected to the Xilinx transmitting MAC FIFO. The client side interface of the Xilinx transmitting MAC FIFO IP core is the Xilinx Local Link Interface [22], which needs two signals indicating the start and the end of a frame. They are derived from the data valid signal produced by the control logic block. The control logic block also realizes the handshaking between the agent's transmitting channel and the processor. When the processor finishes sending a frame, it sets a sign bit that triggers moving the data to the transmitting MAC FIFO. When the transfer finishes, the sign bit is reset by the transmitting channel. The transmitting MAC FIFO has a standard interface to connect to the Xilinx MAC IP core. The same holds for the receiving MAC FIFO. The transmitting channel together with the testbench is shown in Figure 4.5.
Figure 4.5: The transmitting channel with testbench (block diagram with the following labels: frame.bfl, PLB IPIF, Frame dual port memory, Control, To Local Link Interface, MAC tx FIFO, MAC, MAC rx FIFO, Local Link, and a receiving part marked "to be finished")
4.3.2.2 Testbench Construction
Generally, there are two approaches to verifying the correct functionality of a hardware component that has a bus interface. One is to create a test bench. The other is to generate a larger system with other working components that produce or respond to bus transactions. Creating a test bench is not an easy job: defining the connections and test vectors for all different combinations of bus transactions costs a lot of design effort. Creating a larger system that includes the design under test requires, on the one hand, describing the connections made to the device under test and, on the other hand, programming other components in the system to interact with the device under test. This always involves writing some code, compiling it, and storing it in some memory so that the components can read it and generate the right bus transactions [23].
Bus Functional Simulation simplifies the verification of hardware components that attach to a bus. It provides the ability to generate bus stimulus without going through the approaches described above.
A Bus Functional Model (BFM), a Bus Functional Language and a Bus Functional Compiler are provided by Xilinx's Bus Functional Simulation. The BFM can be generated in Xilinx Platform Studio (XPS) when the component that attaches to a system bus is generated by the tools in XPS. Code needs to be written in the Bus Functional Language in a .bfl file. The Bus Functional Model then uses the compiled code to generate the bus stimulus specified in it during the simulation. This makes testing the component quite easy. The procedure is as follows:
1. Compile the simulation HDL files
2. Load the system into the simulator
3. Initialize the BFM
4. Create a waveform for examining the interesting signals
5. Provide the clock and reset stimulus to the system
6. Run the simulation
The above procedure can be executed in the ModelSim simulator as shown in the .do file in Appendix D.
The simulation result is shown in Appendix E. The result shows that the content of the compiled .bfl file is written correctly to the dual port memory and that the frame is moved to the transmitting MAC FIFO after the processor sets the start sign. Furthermore, after incorporating a connection unit that has a good/bad frame generator, the output of the receiving MAC FIFO is as expected.
4.4 Conclusions
Based on the assumptions in Section 4.1.1, the final architecture is devised in more detail in this chapter. The data storage devices are chosen to be DDR SDRAM for the context data and a FIFO for the periodical data. The local subscription lists are going to be realized by a CAM for the periodical data type and a look-up table for the context data type. The data descriptor format has been designed. The structure of the frames that are transmitted and received by the agent is devised to include the Ethernet Header, IP Header, UDP Header, the data descriptor and the data. A functional model of the detailed architecture design has been developed for verification. The model behaves as an agent, as described in Section 4.1.4. The testbench has been constructed to verify the correctness of the functional model. The Foreign Language Interface is adopted to create a stimulus module that connects to the agent. The module is written to simulate the read, write and subscribe operations of the process. It also generates bus transactions for the agent. The module in the transmitting node reads the designed frame sequences from text files. In the receiving node, the module reads subscribed data from the memory and stores it in a text file. The module also performs the subscription by reading the subscribed data descriptors from a text file. Testbenches have been built with several agents and their stimulus modules. The simulation results show that the agent can send out data and store the subscribed data in the correct data storage based on the analysis of the data descriptor.
In this chapter, the IP block development method in the Xilinx development environment has also been explored by building the synthesizable transmitting channel of the agent, since the prototype architecture of the publish and subscribe mechanism will first be realized as an IP core on the Xilinx embedded system development platform. The BFM module was investigated as a means of verifying the functionality of an IP block attached to the PowerPC PLB bus, and it was tried on the transmitting channel testbench. The result shows that the BFM generates correct bus signals for the agent, as specified by the compiled .bfl file. This work supports the future prototype implementation and verification.
Chapter 5. Conclusions
As we gain knowledge, we do not become more certain, we become certain of more.
- Ayn Rand
In this thesis, the publish and subscribe mechanism has been studied. The H/W architecture of the mechanism was devised in Chapter 3, and an HDL functional model has been built to verify it. Testbench results show that the agent stores subscribed data in the intended data storage and drops unwanted data, as expected from the architecture design.
Section 5.1 summarizes the thesis work, and Section 5.2 presents future research directions.
5.1 Summary
The background knowledge related to the H/W architecture design of the publish and subscribe mechanism has been introduced, and the mechanism itself has been studied and explained.
Based on this understanding of the publish and subscribe mechanism, an architecture has been devised and its operation explained. After an analysis of the transmission methods, the final version of the prototype architecture was worked out. Broadcast is chosen for data transmission. Publication announcements are eliminated, and subscription announcements are recorded within the agent. These choices simplify the original design and decrease the network traffic.
Some assumptions are made on the basis of practical considerations. Subject to these assumptions, the final architecture is devised in more detail. DDR SDRAM is chosen as the storage device for the context data and a FIFO for the periodical data. The local subscription lists will be realized by a CAM for the periodical data type and a look-up table for the context data type. The data descriptor format has been designed. The frames transmitted and received by the agent consist of an Ethernet header, an IP header, a UDP header, the data descriptor and the data. A functional model of the detailed architecture has been developed for verification, and a testbench has been constructed to verify the correctness of the functional model. The Foreign Language Interface is used to create a module connected to the agent. The module simulates the read, write and subscribe operations of the process and translates these operations into the proper bus signals for the agent. In the transmitting node, the module reads the designed frame sequences from text files. In the receiving node, the module reads subscribed data from the memory and stores it in a text file. The module also performs the subscription by reading the subscribed data descriptors from a text file. The simulation results show that the agent can send out data and can store subscribed data in the correct data storage based on its analysis of the data descriptor. The IP block development method in the Xilinx development environment has also been explored by building the synthesizable transmitting channel of the agent, since the prototype architecture of the publish and subscribe mechanism will first be realized as an IP core on the Xilinx embedded system development platform. The BFM module was investigated as a means of verifying the functionality of an IP block attached to the PowerPC PLB bus. It was studied and tried on the transmitting channel testbench, which is connected to the PowerPC PLB bus. The result shows that the BFM generates correct bus signals for the agent, as specified by the compiled .bfl file. This work supports the future prototype implementation.
5.2 Future Research Directions
Future research directions are discussed from two viewpoints: the architecture of the agent and its implementation.
• Architecture
– Supporting persistent data. Another category of data, persistent data, should be considered in the agent design, because this category is intended to be the basic element for restarting a collapsed node. Unlike the other two types of data, persistent data is stored in persistent memory, as introduced in Section 2.3. A persistent memory controller should therefore be added to write data from the network to the memory and to read data out to the processor. The method can be the same as for context data management; the difference lies in how the memory is controlled. Alternatively, a processor embedded in the FPGA, such as the PowerPC in the Virtex-II Pro, can be used to control the persistent memory.
– Supporting variable length frame. Variable-length frame support relates to the following three aspects of the architecture design.
∗ Variable length in Egress Data Buffer
∗ Variable length in Ingress Data Buffer
∗ Variable length in Data Storage
For the first two aspects, the MAC FIFO IP cores already support variable-length frames. The status of the FIFOs needs to be used to control the data flow from the processor to the transmitting MAC FIFO, and a dedicated logic block needs to generate correct start and end signals for each variable-length frame. For the periodical data base, each frame written to it could be followed by a delimiter, so that different frames are separated by recognizable boundaries. Alternatively, using the length information in the frame structure, the processor could first read the header of the data and then decide how many bytes to read out of the Periodical Data FIFO. For the Context Data Storage, dividing the whole memory space into equal-length blocks is an easy way to manage the memory, but it wastes memory. A more efficient memory management scheme should be worked out in a future design.
– Supporting more applications attached to one agent. UDP header information will be used to separate the data of different applications. A processing element for UDP header analysis should be added.
– Totally splitting the communication task. In the current design, the processor still needs to take part in the communication process, for example by generating the frame header. In the future, adding a header generation block is expected to let the processor concentrate on its processing tasks.
– Adding drivers for this IP core. Users could then easily attach the core to the system bus and call the driver program to perform the subscribe, write and read operations.
– Adding a time stamp to the context data to prevent the same data from being read several times. This information can be added to the Context Data Subscription List.
– Changing the interface to adapt to DSPs. Since the final design will work with DSPs, future work includes changing the interface from the PowerPC PLB bus to the DSP interface. DMA controllers in the DSPs could be used to move data between the Context Data Storage, the Periodical Data FIFO and the processor's memory. In this way, data from publishers can be written directly into the memory of the subscribers' processors.
• Implementation
– Since the development environment has been set up, future work should include the complete realization of the agent design on the development board.
– A synthesizable HDL design will make it possible to measure the performance of the agent reliably with the FPGA vendors' tools. The speed at which an agent can process data could be obtained from timing simulation. After the design is implemented on FPGAs, several nodes could be connected by Ethernet cables to construct a real testbench for examining the relation between data throughput and data transfer latency: different data throughputs are applied and the data transfer latency is measured. The highest supported data throughput could be obtained in this way, and a reference data throughput could be given for the design case mentioned in Section 1.1.
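One of the options listed above for variable-length frames in the periodical data FIFO is to read the length information first and then read that many bytes. The sketch below illustrates this idea in software. It is a hedged illustration only: the class and method names are hypothetical, and the 16-bit big-endian length field is an assumption, not the frame format defined in this thesis.

```python
import struct
from collections import deque


class LengthPrefixedFifo:
    """Byte FIFO that stores each variable-length frame behind a length field."""

    HEADER = struct.Struct("!H")  # assumed: 16-bit big-endian length field

    def __init__(self):
        self._bytes = deque()

    def write_frame(self, payload: bytes) -> None:
        if len(payload) > 0xFFFF:
            raise ValueError("payload too long for a 16-bit length field")
        self._bytes.extend(self.HEADER.pack(len(payload)))
        self._bytes.extend(payload)

    def _take(self, n: int) -> bytes:
        # Pop exactly n bytes from the front of the FIFO.
        return bytes(self._bytes.popleft() for _ in range(n))

    def read_frame(self) -> bytes:
        """Read the length field first, then exactly that many payload bytes."""
        if not self._bytes:
            raise IndexError("FIFO is empty")
        (length,) = self.HEADER.unpack(self._take(self.HEADER.size))
        return self._take(length)

    def empty(self) -> bool:
        return not self._bytes
```

Compared with a delimiter, a length field needs no escaping of payload bytes that happen to equal the delimiter, at the cost of a fixed upper bound on the frame size.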
Bibliography
[1] S. Srinivasan and N. K. Jha, “Hardware-software Co-Synthesis of Fault-TolerantReal-Time Distributed Embedded Systems,” in Proc. European Design AutomationConference with EURO-VHDL (EURO-DAC’95), Brighton, Great Britain, 1995,pp. 334–339.
[2] R. B. Ortega and G. Borriello, “Communication Synthesis for Distributed Embed-ded Systems,” in IEEE/ACM International Conference on Computer-Aided Design(ICCAD’98), San Jose, United States, 1998, pp. 437–444.
[3] M.Boasson, “Control System Software,” IEEE Trans. on Automatic Control, vol. 38,pp. 1094–1107, 1993.
[4] M. Boasson, “Subscription as a Model for the Architecture of Embedded Systems,”in Second IEEE International Conference on Engineering of Complex ComputerSystems, Montreal, Canada, 1996, pp. 130–133.
[5] “Software Architecture for Large Embedded Systems,”http://homepages.cwi.nl/ marcello/SAPapers/.
[6] V. Oklobdzija and K. Nowka, The Computer Engineering Handbook. Taylor &Francis, 1997.
[7] “Distributed System Principles,” www.cs.ucl.ac.uk/staff/W.Emmerich.
[8] B. D. V. Veen and K. Buckley, “Beamforming: A Versatile Approach to SpatialFiltering,” IEEE ASSP Magazine.
[9] “Digital Down Converter Demo/Framework for HERON modules with FPGA,”http://www.hunteng.co.uk/.
[10] J. P. Sterbenz and J. D. Touch, High-speed Networking: A Systematic Approach toHigh-bandwidth Lowlatency Communication. John Wiley & Sons, Inc., 2001.
[11] S. Feit, TCP/IP : Arcitecture, Protocols, and Implementation With IPv6 and IPSecurity. McGraw-Hill Professional, 1999.
[12] J. Bergeron, Writing Testbenches : Functional Verification of HDL Models. KluwerAcademic, 2000.
[13] M. C. Julia Hunter and P. Chernett, “The use of multicast in subscription architec-ture,” in IEEEInformaiton Technology Conference, Syracuse, United States, 1998,pp. 49–52.
[14] P. Miller, TCP/IP explained. Digital, 1997.
[15] L. Chisvin and R. J. Duckworth, “Content-addressable and associative memory :Alternatives to the ubiquitous ram,” IEEE Computer Magazine, pp. 51–64, 1989.
45
46 BIBLIOGRAPHY
[16] Xilinx, “An Overview of Multiple CAM Designs in Virtex Family Devices,” 1999.
[17] R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd. edition.McGraw-Hill Higher Education, 2003.
[18] ModelSim, “Foreign Language Interface,” 2003.
[19] A. Leon-Garcia and I. Widjaja, Communication Networks. McGraw-Hill, 2000.
[20] Xilinx, “RocketPHY Development Kit User Guide,” 2003.
[21] ——, “PLB IPIF Product Specification,” 2004.
[22] ——, “Parameterizable LocalLink FIFO,” 2004.
[23] ——, “BFM Simulation in Platform Studio,” 2004.
Appendix A. Publish and Subscribe Process Description in the Original Architecture
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; the Process produces data; the Data Descriptor Generation Unit generates a descriptor for the data; the Local Subscription/Publication Control Unit puts the descriptor in the Local Publication List, and the data is stored in the Data Storage Unit; the Local Subscription/Publication Control Unit announces the publication through the Egress Data Buffer, the Network Interface and the Network; End.]
Figure A.1: The publish process on a publishing agent
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; announcements arriving from the Network through the Ingress Data Buffer are recognized; the Foreign Subscription/Publication Control Unit judges whether the announcement is a publication or a subscription (in this case, a publication message); the Foreign Subscription/Publication Control Unit puts the published data descriptor in the Foreign Publication List; End.]
Figure A.2: Agents’ reaction to publication announcements
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; the Process requests data; the Data Descriptor Generation Unit generates a descriptor for the data; the Local Subscription/Publication Control Unit puts the descriptor in the Local Subscription List; the Local Subscription/Publication Control Unit announces the subscription through the Egress Data Buffer, the Network Interface and the Network; End.]
Figure A.3: The subscribe process on a subscribing agent
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; announcements arriving from the Network through the Ingress Data Buffer are recognized; the Foreign Subscription/Publication Control Unit judges whether the announcement is a publication or a subscription (in this case, a subscription message); the Foreign Subscription/Publication Control Unit puts the subscribed data descriptor in the Foreign Subscription List; End.]
Figure A.4: Agents’ reaction to subscription announcements
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; the LPFS Matching Unit checks the Local Publication List and the Foreign Subscription List; decision "Find a pair?" with branches Yes and No; the LPFS Matching Unit moves the publication data that has subscribers from the data storage to the Egress Data Buffer and the Network; End.]
Figure A.5: The write process on a writing agent
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; published data arriving from the Network through the Ingress Data Buffer is recognized; the LSFP Matching Unit checks the Local Subscription List and the Foreign Publication List; decision "Find a pair?"; the data is put in the Data Storage Unit; End.]
Figure A.6: Agents’ reaction to the write operation
[Panel (a), block diagram of the agent: Process; Data Storage; Local Subscriptions, Local Publications, Foreign Subscriptions and Foreign Publications Lists; Local and Foreign Subscription/Publication Control; LSFP Matching and LPFS Matching; Data Descriptor Generation and Absorption; Egress and Ingress Data Buffers; Unit Control; Network Interface; Network carrying announcements and data.]
[Panel (b), flow chart: Begin; the Process generates a data query for the Data Storage Unit; decision "No matching" with branches Yes and No; the data is read from the Data Storage Unit; End.]
Figure A.7: The read process on a reading agent
Appendix B. Behavioral Model Testbench Design
[State diagram with three states: wait state, sense state and transmit state. Transition conditions shown: "The transmitting MAC FIFO is not empty and the network is idle"; "The transmitting MAC FIFO is empty or the network is busy"; "Wait state counter counts to 0"; "Wait counter does not reach 0"; "Collision detected"; "No collision detected and the current frame transmission is not over"; "The current frame transmission is over".]
Figure B.1: The CSMA/CD state machine adopted in the MAC behavioral model
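The three-state machine of Figure B.1 can be approximated in software as a pure next-state function. The sketch below is not the VHDL of the MAC behavioral model. The state names follow the `wait_s`, `sense_s` and `transmit_s` values that appear in the Appendix C waveforms; the function name, its parameters and, importantly, the assignment of each transition condition to a source and destination state are assumptions based on ordinary CSMA/CD behavior.

```python
WAIT, SENSE, TRANSMIT = "wait_s", "sense_s", "transmit_s"


def next_state(state, *, fifo_empty, network_idle, wait_count,
               collision, frame_done):
    """Return the next state of the three-state machine (assumed arcs)."""
    if state == SENSE:
        # Start sending only when a frame is queued and the medium is free.
        if not fifo_empty and network_idle:
            return TRANSMIT
        return SENSE
    if state == TRANSMIT:
        if collision:
            return WAIT       # back off after a collision
        if frame_done:
            return SENSE      # frame sent, look for the next one
        return TRANSMIT       # keep sending the current frame
    if state == WAIT:
        # Leave the back-off state once the wait counter has reached 0.
        return SENSE if wait_count == 0 else WAIT
    raise ValueError(f"unknown state: {state}")
```

Keeping the transition logic free of side effects makes it easy to check each arc in isolation before wiring it to counters and FIFO status signals.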
[Flow chart. Steps: begin; subscribe periodical data (read the periodical data subscription file and supply the subscribed data descriptor to the data bus and the Periodical Data Subscription List address to the address bus); subscribe context data (read the context data subscription file and supply the subscribed data descriptor to the data bus and the Context Data Subscription List address to the address bus); Check Periodical Data FIFO; Read Periodical Data FIFO; Check subscribed context data; Read subscribed context data. Transition labels: "Periodical Data FIFO not ready"; "Periodical Data FIFO ready"; "Subscribed context data not ready"; "Subscribed context data ready"; "Write to periodical data record txt file done"; "Write to context data record txt file done".]
Figure B.2: Read process
[Flow chart. Steps: begin; check MAC FIFO full status, with branches Y and N; publish data (read data from the publish data txt file and supply the data to the data bus and the MAC FIFO address to the address bus).]
Figure B.3: Write process
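The write process of Figure B.3 amounts to checking the FIFO full status before each publish. A minimal software sketch of that loop is given below. It is an illustration under assumptions, not the FLI stimulus module: the function name and its parameters are hypothetical, a Python deque stands in for the transmitting MAC FIFO, and instead of polling forever while the FIFO is full the sketch returns the frames that could not yet be written.

```python
from collections import deque


def write_process(frames, mac_fifo, depth):
    """Push frames into mac_fifo until it is full; return the frames left over."""
    pending = deque(frames)
    while pending and len(mac_fifo) < depth:  # "check MAC FIFO full status"
        mac_fifo.append(pending.popleft())    # "publish data"
    return list(pending)
```

Calling the function again with the returned frames, after the FIFO has been drained, continues the publish sequence in order.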
[Block diagram. Sending nodes A1, A2, A3 and A4, each with input files frame_0.txt – frame_26.txt and a driver (c_driver_a1, c_driver_a2, c_driver_a3, c_driver_a4). Receiving nodes B1 and B2, each with a driver (c_driver_b1, c_driver_b2) and the files b1_sub_c.txt, b1_sub_p.txt, b1_rec_p.txt, b1_rec_c.txt for Node B1 and b2_sub_c.txt, b2_sub_p.txt, b2_rec_p.txt, b2_rec_c.txt for Node B2.]
Figure B.4: Behavioral model testbench
Appendix C. Behavioral Model Simulation Waveforms
[Waveform plot. Time axis: 0 to 80 us. Signals: sim:/w1r1/bus2ip_clk, sim:/w1r1/bus2ip_reset, sim:/w1r1/gtx_clk, sim:/w1r1/eport, sim:/w1r1/sending_node1/current_state (transmit_s), sim:/w1r1/sending_node1/tx_mac_fifo_full, sim:/w1r1/sending_node1/tx_mac_fifo_empty, sim:/w1r1/receiving_node1/mac_rec_fifo, sim:/w1r1/receiving_node1/mac_fifo_full, sim:/w1r1/sending_node1/tx_macfifo_cnt, sim:/w1r1/receiving_node1/frame_no. Entity: w1r1, Architecture: struct, Date: Thu Jun 02 10:21:43 MEST 2005.]
Figure C.1: Test case 1: one sending node, one receiving node, bus clock 125 MHz, transmitting MAC FIFO depth: 12 frames
[Waveform plot. Time axis: 0 to 80 us. Signals: sim:/w1r1/bus2ip_clk, sim:/w1r1/bus2ip_reset, sim:/w1r1/gtx_clk, sim:/w1r1/eport, sim:/w1r1/sending_node1/current_state (transmit_s), sim:/w1r1/sending_node1/tx_mac_fifo_full, sim:/w1r1/sending_node1/tx_mac_fifo_empty, sim:/w1r1/receiving_node1/mac_rec_fifo, sim:/w1r1/receiving_node1/mac_fifo_full, sim:/w1r1/sending_node1/tx_macfifo_cnt, sim:/w1r1/receiving_node1/frame_no. Entity: w1r1, Architecture: struct, Date: Thu Jun 02 09:51:28 MEST 2005.]
Figure C.2: Test case 2: one sending node, one receiving node, bus clock 125 MHz, transmitting MAC FIFO depth: 4 frames
[Waveform plot. Time axis: 0 to 400 us. Signals: /w4r2/gtx_clk, /w4r2/eport; for each of sending_node1 to sending_node4: current_state (alternating between wait_s and transmit_s), tx_mac_fifo_full and tx_macfifo_cnt; for receiving_node1: current_state (sense_s), good_frame, bad_frame, mac_fifo_full and frame_no. Entity: w4r2, Architecture: struct, Date: Mon Jun 06 09:18:18 MEST 2005.]
Figure C.3: Test case 3: four sending nodes, two receiving nodes, bus clock 125 MHz, transmitting MAC FIFO depth: 12 frames
[Waveform plot. Time axis: 0 to 400 us. Signals: /w4r2/gtx_clk, /w4r2/eport; for each of sending_node1 to sending_node4: current_state, tx_mac_fifo_full and tx_macfifo_cnt; for receiving_node1: current_state (sense_s), good_frame, bad_frame, mac_fifo_full and frame_no. Entity: w4r2, Architecture: struct, Date: Mon Jun 06 10:11:45 MEST 2005.]
Figure C.4: Test case 4: four sending nodes, two receiving nodes, bus clock 1250 MHz, transmitting MAC FIFO depth: 12 frames
[Waveform plot. Time axis: 0 to 80 us. Signals: sim:/w10r1/gtx_clk, sim:/w10r1/eport; for each of sending_node1 to sending_node10: current_state (values wait_s, sense_s and transmit_s) and tx_macfifo_cnt; sim:/w10r1/receiving_node1/frame_no. Entity: w10r1, Architecture: struct, Date: Thu Jun 02 10:05:53 MEST 2005.]
Figure C.5: Test case 5: ten sending nodes, one receiving node, bus clock 125 MHz, transmitting MAC FIFO depth: 12 frames
[Waveform plot. Time axis: 0 to 80 us. Signals recovered from the extracted text: /w10r1/gtx_clk, /w10r1/eport and the current_state of /w10r1/sending_node1 to sending_node7, with wait_s as the only state value visible in this portion.]
sw
ait_
sw
ait_
sw
ait_s
wa
it_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
s
/w1
0r1
/se
nd
ing
_n
od
e8
/cu
rre
nt_
sta
tew
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_s
wait_s
wait_s
wa
it_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_s
wait_s
wa
it_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
s
/w1
0r1
/se
nd
ing
_n
od
e9
/cu
rre
nt_
sta
tew
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_s
wait_s
/w1
0r1
/se
nd
ing
_n
od
e1
0/c
urr
en
t_sta
tew
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_s
wait_s
wa
it_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
sw
ait_
s
/w1
0r1
/se
nd
ing
_n
od
e1
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e2
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e3
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e4
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e5
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e6
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e7
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e8
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e9
/tx_
ma
cfifo
_cn
t
/w1
0r1
/se
nd
ing
_n
od
e1
0/t
x_
ma
cfifo
_cn
t
/w1
0r1
/re
ce
ivin
g_
no
de
1/f
ram
e_
no
En
tity
:w1
0r1
A
rch
ite
ctu
re:s
tru
ct
Da
te:
Mo
n J
un
06
10
:42
:30
ME
ST
20
05
R
ow
: 1
Pa
ge
: 1
Figure C.6: Test case 6: ten sending nodes, one receiving node, bus clock 1250 MHz, transmitting MAC FIFO depth: 12 frames
Appendix D: Do File for Running BFM Test Modules in ModelSim Simulator

# Compile BFM test modules
do bfm_system.do
# Load BFM test platform
vsim bfm_system
# Load Wave window
do ../../scripts/wave4.do
# Load BFL
# do ../../scripts/sample.do
# do ../../scripts/samplet1.do
do ../../scripts/samplet2.do
# do ../../scripts/samplet4.do
# Start system clock and reset system
force -freeze sim:/bfm_system/sys_clk 1 0, 0 10 ns -r 20 ns
force -freeze sim:/bfm_system/clk 1 0, 0 20 ns -r 40 ns
force -freeze sim:/bfm_system/sys_reset 1
force -freeze sim:/bfm_system/sys_reset 0 100 ns, 1 200 ns
# Run test time
run 100 us
# Release ModelSim simulation license
# quit -sim
# Close previous dataset if it exists
# if {[dataset info exists bfm_test]} {dataset close bfm_test}
# Open and view waveform
# dataset open vsim.wlf bfm_test
# do ../../scripts/wave2.do
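As a rough illustration of the stimulus that the force commands in this do file describe, the following Python sketch models a forced signal as a list of (value, time) steps with an optional repeat period. It is not part of the thesis scripts: the helper names forced_value and frequency_mhz are hypothetical, and treating the two sys_reset commands as a single schedule (1 from the start, 0 at 100 ns, 1 again at 200 ns) is an assumption about how the simulator combines them.

```python
# Hypothetical helpers (not from the thesis): a tiny model of the stimulus
# written as "force <signal> v0 t0, v1 t1 -r <period>" in the do file.


def forced_value(steps, t_ns, repeat_ns=None):
    """Return the forced value at time t_ns (in ns).

    steps     -- list of (value, start_time_ns) pairs, sorted by start time
    repeat_ns -- repeat period in ns (the -r argument), or None for no repeat
    """
    if repeat_ns is not None:
        t_ns = t_ns % repeat_ns  # fold the time into one period
    value = None  # nothing forced before the first step
    for v, start in steps:
        if t_ns >= start:
            value = v
    return value


def frequency_mhz(repeat_ns):
    """Clock frequency in MHz for a waveform that repeats every repeat_ns."""
    return 1000.0 / repeat_ns


# sys_clk: 1 at 0 ns, 0 at 10 ns, repeating every 20 ns
SYS_CLK_STEPS, SYS_CLK_PERIOD = [(1, 0), (0, 10)], 20
# clk: 1 at 0 ns, 0 at 20 ns, repeating every 40 ns
CLK_STEPS, CLK_PERIOD = [(1, 0), (0, 20)], 40
# sys_reset (assumed combined schedule): 1 at start, 0 at 100 ns, 1 at 200 ns
SYS_RESET_STEPS = [(1, 0), (0, 100), (1, 200)]

if __name__ == "__main__":
    print("sys_clk frequency (MHz):", frequency_mhz(SYS_CLK_PERIOD))
    print("clk frequency (MHz):", frequency_mhz(CLK_PERIOD))
    for t in (50, 150, 250):
        print("sys_reset at", t, "ns:", forced_value(SYS_RESET_STEPS, t))
```

Under these assumptions, sys_clk has a 20 ns period and clk a 40 ns period, and sys_reset is low only between 100 ns and 200 ns, which reads as an active-low reset pulse applied shortly after the simulation starts.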
Appendix E: The Transmitting Channel Simulation Waveforms
[Simulation waveform plot. Signals shown: /bfm_system/sys_reset_ibuf, and under /bfm_system/trans_ipif_tb_1/trans_ipif_tb_1/uut/user_logic_i/: clk, bus2ip_clk, bus2ip_reset, bus2ip_addr, bus2ip_data, bus2ip_be, bus2ip_rnw, bus2ip_ce, bus2ip_rdce, bus2ip_wrce, bus2ip_rdreq, bus2ip_wrreq, ip2bus_data, ip2bus_wrack, rst, start, done, sm_clk, sm_rst, sm_cnt, dpm_a, dpm_do, stat_reg_en, fifo_din. Time axis in us (ticks at 40 and 80). Entity: bfm_system, Architecture: structure, Date: Mon May 30 17:01:44 MEST 2005, Row: 1, Page: 1.]
Figure E.1: Transmitting channel testbench result (a)
[Simulation waveform plot. Signals shown, all under /bfm_system/trans_ipif_tb_1/trans_ipif_tb_1/uut/user_logic_i/: fifo_en, tx_data, tx_data_valid, tx_ack, tx_underrun, t_data_in, t_sof_in_n, t_eof_in_n, t_src_rdy_in_n, t_dst_rdy_out_n, t_fifo_status_out_uc, fifo_en_pipe, rx_bad_frame, rx_data, rx_data_valid, rx_good_frame, r_dst_rdy_in_n, r_data_out, r_eof_out_n, r_fifo_status_out_uc, rem_out_uc, r_sof_out_n, r_src_rdy_out_n, llto_data_out, llto_data_v. Time axis in us (ticks at 40 and 80). Entity: bfm_system, Architecture: structure, Date: Mon May 30 17:01:44 MEST 2005, Row: 1, Page: 2.]
Figure E.2: Transmitting channel testbench result (b)
Curriculum Vitae
Maomei Chen was born in Guilin, P.R. China. In 1997, she finished her high school studies at Beijing No. 4 Middle School and began her five-year undergraduate studies at Beijing University of Technology, majoring in Electrical Engineering. After graduation, she worked as an FAE for Lattice Semiconductor at Beijing CHICOM.
In 2003, she began her Master of Science studies in the Computer Engineering Group of the Electrical Engineering Department at Delft University of Technology (TU Delft), Delft, The Netherlands. In November 2004, she started her M.Sc. thesis at Chess, Haarlem, The Netherlands, under the supervision of Prof. Arjan van Genderen from the CE group, TU Delft, and Frits van der Wateren from Chess. Her thesis topic is "H/W Architecture Design for the Publish and Subscribe Mechanism". Her research interests include embedded systems design, systems and networks on chip, advanced computer architectures, computer arithmetic, hardware-software co-design, and vector and parallel processors.