
Proteo: A New Approach to Network-on-Chip

AATRAY KUMAR SINGH

ELECTRONICS (VLSI & EMBEDDED SYSTEM), MIT ACADEMY OF ENGINEERING

PUNE 412105

Email: [email protected]

Abstract—The purpose of this paper is to present the basic ideas behind the development of our Network-on-Chip (NoC) architecture, called Proteo. System designers are moving to higher abstraction levels, and the use of reusable Intellectual Property (IP) blocks is increasing. Communication between the IP blocks is therefore of increasing importance and must also be designed to be reliable and fast. One proposed solution to this problem is to use NoC architectures, which are built up from reusable interconnect IP blocks. In this paper an interface router IP for the Proteo network is introduced and implemented. The network implements packet switching in a hierarchical topology. A NoC needs a considerable amount of resources, which can be shared with other system-level tasks such as power saving and fault tolerance mechanisms.

I. INTRODUCTION

As feature dimensions scale down into the deep submicron regime (below 0.25 µm), integration density is no longer limited by individual feature sizes, e.g. those of the circuit metallization layers, but by electrical phenomena such as capacitive and inductive crosstalk between the interconnect lines. These effects will have a great impact on maximum operating frequency and power consumption. In this environment, communication within logic blocks will still be synchronous, but between them it will become asynchronous in order to solve the problems of clock skew and delay. This is the Globally Asynchronous Locally Synchronous (GALS) paradigm.

In the System-on-Chip (SoC) designs of the future there will be hundreds of functional IP blocks and a large amount of embedded Dynamic Random Access Memory (DRAM) on a single chip. Communication requirements in such systems are very demanding, because each of those IPs can communicate in the Gbit/s range. Because of these increased requirements, traditional bus-based solutions are no longer adequate, and new kinds of communication architectures must be developed. One proposal for solving the communication problem in SoC designs is to use NoCs, which are built up from reusable IP blocks. These networks are scalable because as many IP blocks as needed can be connected to the network without dramatic problems in wiring delays, capacitance, clocking, etc.

The global interconnects need to be treated as IP blocks in the same way as processor cores or embedded memories. New flexible and configurable communication channel architectures need to be identified. These communication channels will not form dedicated buses as currently implemented on-chip and on PCBs, due to noise, scalability and speed constraints. Thus, the overall communication scheme will resemble computer networking more than traditional bus-based design.

The paper is organized as follows. Section 2 provides the reader with a practical understanding of the Proteo network. Section 3 is dedicated to the architecture of the Proteo NoC. Section 4 explains the interface router in more detail. Section 5 describes the protocols of the Proteo NoC, and Section 6 discusses the performance of the router based on the synthesis results. Finally, in Section 7, conclusions are drawn and the present limitations of this router and future research are discussed.

II. PROTEO NETWORK ON CHIP

NoCs are constructed from several basic building blocks, such as routers or switches, bridges and links. The routers route packets from one place to another, and the IP blocks are connected to them through a fixed interface. The bridges connect several sub-networks together, and the links connect all the building blocks together. The characteristics of the network depend strongly on how these basic blocks are implemented. This paper presents a flexible, layered interface router IP implementation of the network router for the Proteo NoC.

The Proteo network is an on-chip packet-switching network developed at TUT to solve the communication problems of future SoCs. Proteo can be seen as a set of interconnection IPs that connect functional IPs together. The network is constructed from two kinds of blocks: the interface router IP and the bridge IP. The interface router IPs connect functional IPs to the network, and the bridge IP connects several sub-networks together.

An example network is shown in Fig. 1. In this example there are three sub-networks connected together with two bridges. Nodes with two input-output link pairs are used in sub-network 1, whose topology is a bi-directional ring. The topology of the other sub-networks can be selected freely; possible topologies include different kinds of trees and meshes. There can also be further sub-networks behind these sub-networks. Proteo does not restrict the complexity of the network topology.

The Proteo NoC supports several different communication protocols, network topologies and packet formats. It is described on two different abstraction levels. This paper concentrates on the low-level model of the interface router IP. Low-level models are used for logical implementations, as well as to estimate physical properties of the Proteo network such as area, latency and delays. There are also high-level models of the Proteo building blocks, used for performance estimation and simulation of larger networks. A more detailed description of the Proteo architecture, its protocol, packet formats and different kinds of building blocks can be found in the references.

Fig. 1: Example Network.

III. THE PROTEO ARCHITECTURE

The architecture of our system is discussed at a more physical level.

A. Overview

The basic hardware elements in our network are hosts, nodes and links. Every host is connected to the network using a dedicated node as a wrapper. Our nodes present an interface compliant with the VSIA standard, which specifies three downward-compatible Virtual Component Interfaces (VCI): peripheral (PVCI), basic (BVCI) and advanced (AVCI). Links and nodes are available as part of a library. They include parameters to customize their number of channels and dimensions, their interface options, and their supported data sizes and protocol features, based on requirements of functionality, throughput and Quality of Service (QoS).

Our target domain is that of heterogeneous systems, with many different types of IPs working together on the same chip. The system is divided into clusters using a hierarchical network. This network comprises multiple subnets with different performance, topologies, packet formats, etc. The subnets are typically point-to-point structures, so each link can be effectively tuned to its individual traffic requirements.

B. Topologies

Currently, the topology being explored is a hierarchical network built from a system-wide bidirectional ring and several subnets with star (or bus) topology (Fig. 2). The use of regular topologies allows easy routing and direct replication of blocks throughout the system. Whether a host is connected to the main ring or to a subnet depends on the information available at the host interface level: stars are formed with BVCI elements, while blocks implementing the AVCI interface can be attached directly to the ring. An interesting property of stars is that they allow the presence of several BVCI initiators in the same cluster, which is not supported directly by the standard.

Fig. 2: Example topology.

In the star topology we define two types of node: the satellite, which wraps the information presented at its interface into packets, and the hub, which keeps track of the pending transactions and routes the packets from node to node, while connecting the star to the rest of the network. The ring nodes are essentially homogeneous, implementing different features depending only on the needs of the hosts attached to them.
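The customization parameters and the attachment rule described above can be sketched as a small configuration record. This is only an illustration: the class and field names (`NodeConfig`, `num_links`, `channels_per_link`, `data_width_bits`) and the default values are assumptions made for this example and are not taken from the Proteo library.

```python
from dataclasses import dataclass
from enum import Enum


class VciLevel(Enum):
    """The three VCI flavours named in the text, ordered by capability."""
    PVCI = 1  # peripheral
    BVCI = 2  # basic
    AVCI = 3  # advanced


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical parameter set for one library node (names are assumptions)."""
    interface: VciLevel
    num_links: int = 1           # input/output link pairs ("dimensions")
    channels_per_link: int = 1
    data_width_bits: int = 32

    def __post_init__(self):
        if self.num_links < 1 or self.channels_per_link < 1:
            raise ValueError("a node needs at least one link and one channel")
        if self.data_width_bits <= 0:
            raise ValueError("data width must be positive")

    def attaches_to(self) -> str:
        """AVCI hosts may sit directly on the ring; simpler hosts go into a star."""
        return "ring" if self.interface is VciLevel.AVCI else "star"


if __name__ == "__main__":
    dsp = NodeConfig(VciLevel.AVCI, num_links=2)
    uart = NodeConfig(VciLevel.BVCI)
    print(dsp.attaches_to(), uart.attaches_to())
```

Keeping the interface level in the configuration makes the placement decision a property of the host rather than of the network, which matches the statement that attachment depends on the information available at the host interface.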

C. Hardware Elements

The architecture of a typical node is inspired by the SCI standard, which is a standard interface developed for multiprocessor systems. SCI implements a rich set of mechanisms covering most of the needs of high-performance systems. Our node architecture extends the basic SCI architecture to allow a configurable number of dimensions and channels (Figs. 3 and 4).

Fig. 3: Basic node architecture.

We have chosen a highly modular structure that makes configuration and tuning easy. Links provide a high-level interface, so they are effectively treated as modular elements and tuned independently. The high-level model of the node must be easily modifiable and extensible, so that we can use it to compare the behavior of different design choices. It must also be relatively lightweight, so that large networks can be simulated. As we develop synthesizable versions of the different blocks, we should be able to back-annotate the information gathered from the physical implementation into the high-level model. Models at other levels of abstraction, for example synthesizable blocks, can easily be co-simulated with it. The model could also be used for verification of the final design.

Fig. 4: Extended node architecture with three I/O links and two channels in the first link.

IV. STRUCTURE OF INTERFACE ROUTER IP

The basic structure of the interface router IP consists of one incoming link and one outgoing link. This approach allows only one-directional communication in a simple ring topology. If the designer wishes to use bi-directional communication or more complex topologies, the basic structure is duplicated and the interface router IP is constructed from several layers. This layered approach allows us to build networks with different kinds of topologies. The structure of the interface router IP is presented in Fig. 5.

The interface standard used defines two different kinds of actors in communication, called initiator and target. The initiators can generate requests to the target, and the target can only respond to these requests. Because of this definition we must also design two different kinds of interface router IPs. The basic structures of both routers are similar to each other.

Because Proteo is a reusable and flexible communication network, there has to be a well-defined interface for connecting different kinds of functional IP blocks to it. The interface used between interconnection IPs and functional IPs is the Virtual Component Interface (VCI), which is defined by the VSI Alliance.

Fig. 5: Structure of the router IP.

The VSI Alliance defines three different versions of the VCI: the Peripheral Virtual Component Interface (PVCI), the Basic Virtual Component Interface (BVCI) and the Advanced Virtual Component Interface (AVCI). Currently the Proteo network supports the BVCI and AVCI standards. The implementation in this paper uses BVCI. The interface generates Proteo packets from VCI standard signals when the functional IP sends data into the network, and at the other end it extracts those signals from the packets.

There are several FIFOs in each interface router IP, called the Output, Input and ByPass FIFOs. All the FIFOs are generic register banks used to store packets. The Output FIFO stores packets travelling from the functional IP block to the network. Packets that have been sent remain in the Output FIFO until the interface deletes them; before deletion, a packet can be sent again if the interface is requested to do so. The Input FIFO stores packets travelling from the input link to the interface, and the ByPass FIFO stores packets that are bypassing the interface router. The Input and ByPass FIFOs are simple FIFO buffers without any re-send capability. The Input FIFO differs slightly from the ByPass FIFO, because the Overflow Checker can delete the start of a packet from it if the entire packet does not fit.

The Multiplexer and De-Multiplexer blocks handle traffic between the interface block and the different layers. These blocks are left out when there is only one layer in the interface router IP. The Multiplexer block directs packets from the Input FIFOs to the Interface. It checks the status signals from the Input FIFOs, and if there is a packet in one of them it signals this to the Interface. After the Interface has read the entire packet from the FIFO, the Multiplexer starts checking the FIFOs again. The De-Multiplexer block reads packets from the Output FIFO, detects their destination address and, according to a routing table, routes them to the correct Distributor block.

The Greeting block receives packets from the previous node. It first detects the packet's destination address and then compares it to the address of its own IP block. If the addresses are equal, it writes the packet to the Input FIFO through the Overflow Checker; otherwise it writes the packet to the ByPass FIFO.

The Overflow Checker block receives packets from the Greeting block and checks their contents. It also checks that the entire packet fits into the Input FIFO. If the Input FIFO becomes full in the middle of a packet, the Overflow Checker controls the Input FIFO so that the start of the packet is deleted. When the Overflow Checker deletes a packet from the Input FIFO, it also generates a Re-Send packet addressed to the sender IP block. The Re-Send packet takes care of the re-transmission of the original packet.

The Distributor block transmits packets from the FIFOs to the output link. The Distributor waits until there is a packet in one of the FIFOs and then transmits one packet from that FIFO. The priority of the FIFOs can be changed very easily, but by default the highest priority is given to the ByPass FIFO, the next to the Overflow Checker and the lowest to the Request FIFO. This default priority ensures that network traffic passing through the interface router is not delayed. More complex arbitration schemes can also be implemented.
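The Greeting, Overflow Checker and Distributor behavior described above can be sketched as a small behavioral model. This is a simplified illustration under stated assumptions, not the VHDL implementation: the class name `InterfaceRouter`, the dictionary packet layout and the word-count capacity are all invented for this example; the overflow check is done on the whole packet up front rather than mid-packet; and the locally generated packets in `output_fifo` are assumed to play the role of the lowest-priority "Request FIFO".

```python
from collections import deque


class InterfaceRouter:
    """Toy single-layer model; all names here are illustrative assumptions."""

    def __init__(self, address, input_capacity_words):
        self.address = address
        self.input_capacity_words = input_capacity_words
        self.input_fifo = deque()    # packets for the local functional IP
        self.bypass_fifo = deque()   # packets passing through this node
        self.resend_queue = deque()  # Re-Send packets from the Overflow Checker
        self.output_fifo = deque()   # packets from the local functional IP

    def _input_words_used(self):
        return sum(len(p["payload"]) for p in self.input_fifo)

    def greet(self, packet):
        """Greeting block: compare the destination with this node's address."""
        if packet["dst"] != self.address:
            self.bypass_fifo.append(packet)
            return "bypass"
        # Overflow Checker: accept the packet only if all of it fits.
        needed = self._input_words_used() + len(packet["payload"])
        if needed <= self.input_capacity_words:
            self.input_fifo.append(packet)
            return "input"
        # Packet dropped: ask the sender to transmit it again.
        self.resend_queue.append(
            {"src": self.address, "dst": packet["src"],
             "kind": "resend", "payload": []}
        )
        return "resend"

    def distribute(self):
        """Distributor: default priority ByPass > Overflow Checker > local."""
        for queue in (self.bypass_fifo, self.resend_queue, self.output_fifo):
            if queue:
                return queue.popleft()
        return None  # nothing to send this cycle


if __name__ == "__main__":
    router = InterfaceRouter(address=3, input_capacity_words=4)
    router.greet({"src": 1, "dst": 5, "kind": "data", "payload": [0, 1]})
    router.greet({"src": 1, "dst": 3, "kind": "data", "payload": [0, 1, 2]})
    router.greet({"src": 2, "dst": 3, "kind": "data", "payload": [0, 1]})
    print(router.distribute())
```

Scanning the queues in a fixed order is the simplest form of the default priority; a different arbitration scheme would only need to change the order or the selection rule inside `distribute`.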

V. PROTOCOLS

If the communication needs are characterized correctly, we can enable or disable protocol features at each node. Only a basic packet format has to be defined and kept throughout the network (Fig. 6).

Fig. 6: Generic packet format.

In the stars, the transactions can be split or not, depending on the interfaces involved (PVCI or BVCI). The requester places its request at the node interface, and the node converts it to packet format. The star hub takes this packet and delivers it to the target node, while logging the start of the transaction in an internal table. Given that these basic interfaces do not support out-of-order responses, the next packet presented by the target node is sent to the first pending requester in the list for that node. In this way we allow the use of PVCI and BVCI in multi-requester environments. When the target of the transaction is not in the star, the star hub adds the extra information needed to the packet and forwards it to the ring.

In the ring, transactions are split and out-of-order responses are allowed. The AVCI interface presented by the blocks attached to the ring provides more information about the transaction, such as node, thread and packet identifiers. The nodes can be made quite simple and still form complex networks, because their functionality is restricted to the lower levels of the protocol stack.
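The bookkeeping done by the star hub can be illustrated with a short sketch: one queue of pending requesters per target node, served in arrival order because the basic interfaces return responses in order. The class and method names (`StarHub`, `route_request`, `route_response`) are hypothetical, and the sketch assumes that requests leaving the star are simply handed to the ring without being logged, a detail the text does not specify.

```python
from collections import defaultdict, deque


class StarHub:
    """Toy model of the hub's transaction table (names are assumptions)."""

    def __init__(self, local_nodes):
        self.local_nodes = set(local_nodes)
        # target node -> requesters waiting for a response, oldest first
        self.pending = defaultdict(deque)

    def route_request(self, requester, target):
        """Log the start of a transaction and pick the next hop."""
        if target not in self.local_nodes:
            # Target is outside the star: forward towards the ring.
            return ("ring", target)
        self.pending[target].append(requester)
        return ("star", target)

    def route_response(self, target):
        """Deliver the next response from `target` to its oldest requester."""
        if not self.pending[target]:
            raise LookupError(f"no pending transaction for node {target!r}")
        return self.pending[target].popleft()


if __name__ == "__main__":
    hub = StarHub(["cpu0", "cpu1", "mem"])
    hub.route_request("cpu0", "mem")
    hub.route_request("cpu1", "mem")
    print(hub.route_response("mem"), hub.route_response("mem"))
```

Because each target has its own first-in, first-out list, two initiators can have outstanding requests to the same target without either interface having to carry a transaction identifier.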

VI. PERFORMANCE

There are several ways to estimate the performance of the network. Several factors affect the total performance figures, such as the maximum clock frequency and the latency through the synchronous parts. The maximum allowed clock speed is estimated from the synthesis results. In this technology the maximum achievable clock frequency is 1 GHz. The latency of the interface router IP can vary considerably depending on the current status of the network. The latency through the router can be quite small if there is no packet in the Output FIFO. On the other hand, if the Distributor is in the middle of sending a packet from the Output FIFO, a bypassing packet must wait in the ByPass FIFO until that packet has been sent. The minimum latency of a bypassing packet is four clock cycles, while the maximum latency can be calculated as in (1).

$$I_{\max} = 4 + \frac{PL}{WW} \qquad (1)$$

In (1), PL is the maximum packet length in bits and WW is the word width in bits. The latency figures from the input link to the functional IP block and from the IP block to the output link are very similar to each other. The typical latency in these cases is similar to the maximum latency of a bypassing packet. The total performance of the network can also be estimated with different kinds of test cases, in which the network handles the traffic generated by a test program. Test cases of this kind can be found in the references.
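As a worked example of (1), the following sketch computes the worst-case bypass latency in clock cycles and converts it to time at a given clock frequency. The function names and the numbers used (a 256-bit packet and a 32-bit word) are assumptions chosen for illustration, not figures from the synthesis results; the ceiling division is also an assumption for packet lengths that are not a multiple of the word width.

```python
def max_bypass_latency_cycles(packet_length_bits, word_width_bits):
    """Worst-case bypass latency from (1): 4 + PL / WW clock cycles."""
    if packet_length_bits <= 0 or word_width_bits <= 0:
        raise ValueError("PL and WW must be positive")
    # Ceiling division: a partly filled last word still costs one cycle
    # (an assumption; (1) itself is stated for PL / WW).
    words = -(-packet_length_bits // word_width_bits)
    return 4 + words


def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz


if __name__ == "__main__":
    cycles = max_bypass_latency_cycles(256, 32)  # hypothetical PL and WW
    print(cycles, "cycles =", cycles_to_ns(cycles, 1e9), "ns at 1 GHz")
```

With these hypothetical values the packet occupies 8 words, so the bound is 4 + 8 = 12 cycles, against the minimum of 4 cycles for an unobstructed bypass.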

VII. CONCLUSIONS AND FUTURE RESEARCH

The future of highly integrated systems points towards a network-on-chip solution to the problems of interconnection, productivity and heterogeneity. We are trying to extend our NoC proposal to the fields of testing, fault tolerance and low-power techniques.

A synthesizable interface router IP block for the Proteo NoC was presented. The interface router IP is constructed from several layers and can be used to implement packet-switching NoC architectures with different kinds of topologies. The implementation of the interface router does not restrict the complexity of the network topology. The interface router IP uses the VCI interface standard to connect functional IP blocks to the Proteo network. It was designed in VHDL and synthesized with Synopsys tools using a 0.18 µm standard cell technology. The achieved area and performance figures were also presented. The area figures show that on-chip communication can be handled by the Proteo network with a tolerable area penalty.

Future plans include demonstrating the feasibility of complex hierarchical networks using our approach, obtaining estimates of network and protocol performance by means of simulations, finishing the implementation of the basic set of building blocks, and gathering low-level statistics.

REFERENCES

[1] D. Siguenza-Tortosa and J. Nurmi, "Proteo: A New Approach to Network-on-Chip," IEEE, Oct. 2002.

[2] A. Kolodny, "Networks on Chips (NoC): Keeping up with Rent's Rule and Moore's Law," IEEE, Mar. 2007.

[3] K. Lee, S.-J. Lee, and H.-J. Yoo, "Low-Power Network-on-Chip for High-Performance SoC Design," IEEE, 2006.

[4] L. Benini and G. De Micheli, "Networks on Chip: A New SoC Paradigm," IEEE Computer, vol. 35, no. 1, pp. 70-78, Jan. 2002.

[5] R. Vancayseele, B. Al Farisi, W. Heirman, K. Bruneel, and D. Stroobandt, "RecoNoC: A Reconfigurable Network-on-Chip," IEEE, 2010.

[6] P. Guerrier and A. Greiner, "A Generic Architecture for On-Chip Packet-Switched Interconnections," in Proc. Design, Automation and Test in Europe (DATE), 2000, pp. 250-256.

[7] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Öberg, M. Millberg, and D. Lindqvist, "Network on a Chip: An Architecture for Billion Transistor Era," in Proc. 18th IEEE NorChip Conference, Nov. 2000.

[8] Sonics micronetwork: Technical overview. http://www.sonicsinc.com/Pages/Networks.html.

[9] SystemC website. http://www.systemc.org/.