Networks-on-Chip

Networks-on-Chip. Ben Abdallah Abderazek, The University of Aizu, Graduate School of Computer Science and Eng., Adaptive Systems Laboratory, E-mail: [email protected]. 03/01/2010.

  • Networks-on-Chip
    Ben Abdallah Abderazek, The University of Aizu, Graduate School of Computer Science and Eng., Adaptive Systems Laboratory, E-mail: [email protected]
    Hong Kong University of Science and Technology, March 2010 (03/01/2010)

    Adaptive Systems Laboratory, Univ. of Aizu

  • Part II
    NoC topologies
    NoC switching strategies
    Routing algorithms
    Flow control schemes
    Clocking schemes
    QoS
    Basic building blocks
    Status and open problems

  • NoC Routing Algorithms
    Responsible for correctly and efficiently routing packets or circuits from source to destination.
    They must prevent deadlock, livelock, and starvation.
    Choice of a routing algorithm depends on:
      Minimizing the power required for routing
      Minimizing logic and routing tables

  • Routing Algorithm Classifications
    Three different criteria:
      Where the routing decisions are taken: source routing or distributed routing
      How a path is defined: static (deterministic) or adaptive
      The path length: minimal or nonminimal
    Routing schemes: static, dynamic, distributed, source, minimal, and nonminimal routing
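
    As an illustration of a deterministic, distributed, minimal scheme, the sketch below implements dimension-order (XY) routing on a 2D mesh. XY routing is not named in the slides; it is used here only as a common textbook example, and the function name `xy_route` and the (x, y) coordinate convention are assumptions.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst on a 2D mesh.

    Deterministic: the path depends only on the source and destination.
    Minimal: the hop count equals the Manhattan distance.
    """
    x, y = src
    path = [(x, y)]
    # Route along the X dimension first.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then route along the Y dimension.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

    Each router only needs its own coordinates and the destination to choose the next hop, which is why such schemes need very little logic and no routing table.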

  • NoC Routing Table
    The routing table determines, for each PE, the route by which it sends packets to other PEs.
    The routing table directly influences traffic in the NoC.
    Two methods can be distinguished:
      Static routing
      Dynamic (adaptive) routing

  • Static Routing
    The routing table is constant.
    The route is embedded in the packet header, and the routers simply forward the packet in the direction indicated by the header.
    The routers are passive in their addressing of packets (simple routers).
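
    The idea of a route embedded in the packet header can be sketched as follows. This is a minimal illustration, not the format of any particular NoC: the direction letters, the `OFFSETS` table, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical mesh directions mapped to coordinate offsets.
OFFSETS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def forward(position, header):
    """One passive router step: consume the next direction from the header."""
    direction = header.popleft()
    dx, dy = OFFSETS[direction]
    return (position[0] + dx, position[1] + dy)


def deliver(source, route):
    """Follow a route that the source wrote into the packet header."""
    header = deque(route)
    position = source
    while header:
        position = forward(position, header)
    return position


print(deliver((0, 0), ["E", "E", "N"]))  # (2, 1)
```

    The routers make no decision of their own; the cost is that the header grows with the length of the path.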

  • Dynamic Routing
    The routing table can change dynamically during operation.
    Logically, a route is changed when it becomes slow due to other traffic.
    Packets may arrive out of order.
    Usually requires more virtual channels.
    Two systems can be identified in this method:
      Route-altering decisions are made in the routers (smart routers).
      Route-altering decisions are made in a dedicated central unit that receives traffic information from all the routers and can decide to change the routing table.

  • Dynamic Routing
    More resources are needed to monitor the state of the network.
    [Figure: adaptive routing method, showing the packet sequences 321 and 213]

  • Routing Algorithm Requirements
    The routing algorithm must ensure freedom from deadlocks, e.g. a cyclic dependency.
    [Figure: cyclic dependency]
    The routing algorithm must ensure freedom from livelocks and starvation.
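
    A cyclic dependency of the kind mentioned above can be detected by looking for a cycle in a graph of "waits for" relations. The sketch below is a generic depth-first search; the dictionary representation and the channel names `c0` to `c3` are assumptions made for illustration.

```python
def has_cycle(waits_for):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in waits_for}

    def visit(node):
        color[node] = GRAY
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: the dependency chain closes on itself
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(waits_for))


# Four packets, each holding one channel and waiting for the next one.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
print(has_cycle(ring))  # True
```

    Removing any one edge of the ring breaks the cycle, which is the intuition behind routing restrictions that forbid certain turns.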

  • Flow control schemes: STALL/GO
    Low-overhead scheme.
    Requires only two control wires:
      One going forward and signaling data availability.
      The other going backward and signaling either that the buffers are full (STALL) or that the buffers are free (GO).

  • Flow control schemes: T-Error
    More aggressive scheme that can detect faults by making use of a second, delayed clock at every buffer stage.
    The delayed clock re-samples the input data to detect any inconsistencies, then emits a VALID control signal.
    A resynchronization stage is added between the end of the link and the receiving switch.

  • Flow control schemes: ACK/NACK
    When flits are sent on a link, the sender keeps a local copy in a buffer.
    When the sender receives an ACK, it deletes the copy of the flit from its local buffer.
    When the sender receives a NACK, it rewinds its output queue and starts resending flits, starting from the corrupted one.
    Implemented either end-to-end or switch-to-switch.
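
    The sender-side bookkeeping of ACK/NACK can be sketched as below. This is a simplified model under stated assumptions: ACKs and NACKs always refer to the oldest outstanding flit, and the class and method names are hypothetical.

```python
from collections import deque


class AckNackSender:
    """Sender side of a link: keeps a copy of each flit until it is ACKed."""

    def __init__(self, flits):
        self.pending = deque(flits)  # flits not yet sent
        self.unacked = deque()       # local copies of sent flits, oldest first

    def send(self):
        """Send the next flit and keep a local copy of it."""
        flit = self.pending.popleft()
        self.unacked.append(flit)
        return flit

    def on_ack(self):
        """The oldest outstanding flit arrived intact: delete its copy."""
        self.unacked.popleft()

    def on_nack(self):
        """The oldest outstanding flit was corrupted: rewind the output queue
        so that it and every flit sent after it are sent again."""
        self.pending.extendleft(reversed(self.unacked))
        self.unacked.clear()


sender = AckNackSender(["f0", "f1", "f2"])
for _ in range(3):
    sender.send()
sender.on_ack()       # f0 was received correctly
sender.on_nack()      # f1 was corrupted
print(sender.send())  # f1
```

    The local copies are what make retransmission possible, and they are also the buffering cost of the scheme.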

  • Clocking schemes
    Fully synchronous
      A single global clock is distributed to synchronize the entire chip.
      Hard to achieve in practice, due to process variations and clock skew.
    Mesochronous
      Local clocks are derived from a global clock.
      Not sensitive to clock skew.
    Plesiochronous
      Clock signals are produced locally.
    Asynchronous
      Clocks do not have to be present at all.

  • Clocking schemes: Mesochronous
    Local clocks are derived from a global clock.
    Not sensitive to clock skew.
    [Figure: PE, NI, SW, and CMU blocks with SYNC units]

  • Quality of Service (QoS)
    QoS refers to the level of commitment for packet delivery.
    Three basic categories:
      Best effort (BE)
        Only correctness and completion of communication are guaranteed.
        Usually packet switched.
      Guaranteed service (GS)
        Makes a tangible guarantee on performance, in addition to the basic guarantees of correctness and completion of communication.
        Usually (virtual) circuit switched.
      Differentiated service
        Prioritizes communication according to different categories.
        NoC switches employ priority-based scheduling and allocation policies.

  • Basic NoC Building Blocks
    [Figure: basic NoC building blocks]

  • Basic NoC Building Blocks: Packet format
    [Figure: Message, Packet and Flit Formats]

  • Basic NoC Building Blocks: NoC Queuing Schemes
    [Figure: HOL blocking problem in input queuing, with four input queues and outputs 0 to 3]
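
    Head-of-line blocking in input queuing can be illustrated with a one-cycle toy model. In the sketch below each input queue is a list of destination output ports; `fifo_cycle` lets only the head of each queue compete, while `voq_cycle` lets each input pick any queued packet whose output is free, in the manner of virtual output queues. The greedy, fixed-order matching and both function names are assumptions, not a real switch scheduler.

```python
def fifo_cycle(queues):
    """One switch cycle with a single FIFO per input: only heads may compete."""
    busy_outputs = set()
    sent = []
    for port, queue in enumerate(queues):
        if queue and queue[0] not in busy_outputs:
            busy_outputs.add(queue[0])
            sent.append((port, queue.pop(0)))
    return sent


def voq_cycle(queues):
    """One switch cycle where each input may send any queued packet whose
    output is still free (greedy matching in input-port order)."""
    busy_outputs = set()
    sent = []
    for port, queue in enumerate(queues):
        for i, out in enumerate(queue):
            if out not in busy_outputs:
                busy_outputs.add(out)
                sent.append((port, queue.pop(i)))
                break
    return sent


# Both heads want output 2; input 1 also holds a packet for the idle output 3.
print(fifo_cycle([[2, 1], [2, 3]]))  # [(0, 2)]
print(voq_cycle([[2, 1], [2, 3]]))   # [(0, 2), (1, 3)]
```

    With a single FIFO, the packet for output 3 is stuck behind a head that lost arbitration; with per-output queues it can leave in the same cycle.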

  • Basic NoC Building Blocks: Flow Control Schemes
    Stop & Go Flow Control
    [Figure: transmitter and receiver linked by packet and stop/enable signals; the receiver buffer has a Go threshold and a Stop threshold, crossed as the buffer is occupied and released]
    Minimum Buffer Size = Flit Size x (Roverhead + Soverhead + 2 x Link delay)
      Roverhead: the time required to issue the stop signal at the receiving router
      Soverhead: the time required to stop sending flits once the stop signal is received
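
    A worked instance of the minimum buffer size formula, as a small sketch. It assumes that the overheads and the link delay are expressed in clock cycles and that one flit is transferred per cycle; the function name and the example numbers are illustrative, not taken from the slides.

```python
def min_buffer_bits(flit_bits, r_overhead, s_overhead, link_delay):
    """Minimum Stop & Go buffer size in bits.

    flit_bits   -- flit size in bits
    r_overhead  -- cycles needed by the receiver to issue the stop signal
    s_overhead  -- cycles needed by the sender to stop after seeing it
    link_delay  -- one-way link delay in cycles (counted twice: stop signal
                   going back, plus flits already in flight going forward)
    """
    return flit_bits * (r_overhead + s_overhead + 2 * link_delay)


print(min_buffer_bits(32, 1, 1, 2))  # 192
```

    With 32-bit flits this corresponds to room for six flits above the stop threshold.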

  • Basic NoC Building Blocks: Flow Control Schemes
    Credit Based (CB) Flow Control
    [Figure: transmitter and receiver linked by packet and stop/enable signals; the credit is decremented when a flit is transferred and incremented when a buffer is released]
    CB makes the best use of channel buffers.
    Can be implemented regardless of the link length and of the sender and receiver overhead.
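
    The credit counter behaviour can be sketched as follows. This is a simplified model in which credits return to the sender instantly (no link delay is modeled); the class `CreditLink` and its methods are hypothetical names.

```python
class CreditLink:
    """Sender-side credit counter plus the receiver buffer it tracks."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # sender's view of free receiver slots
        self.buffer = []             # flits waiting in the receiver buffer

    def try_send(self, flit):
        """Transfer a flit if a credit is available; decrement the credit."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def release(self):
        """The receiver frees a buffer slot; the credit is incremented."""
        flit = self.buffer.pop(0)
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
print(link.try_send("a"), link.try_send("b"), link.try_send("c"))  # True True False
link.release()
print(link.try_send("c"))  # True
```

    Because the sender never transmits without a credit, the receiver buffer cannot overflow, whatever its size.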

  • Basic NoC Building Blocks: Queue and Buffer Design
    The effective bandwidth of the data link layer is influenced by the traffic pattern and the queue size.
    Queue buffers consume most of the area and power among all NoC building blocks.

  • Basic NoC Building Blocks: Network Interface
    Different interfaces must be connected to the network.
    The network uses a specific protocol, and all traffic on the network has to comply with the format of this protocol.

  • Basic NoC Building Blocks: Network Interface
    To allow different resources to connect to the network, the network interface can be divided into:
      A resource-independent part (Network Interface)
      A resource-dependent part (Resource Network Interface)

  • Basic NoC Building Blocks: Bidirectional link
    2 x 32-bit data links
    Asynchronous, credit-based flow control
    Easy floorplan routing and timing in DSM process
    [Figure: link fields Tag and Data]

  • Basic NoC Building Blocks: Router Design

  • Basic NoC Building Blocks: Router design with cross point

  • Basic NoC Building Blocks: Scheduler Design
    [Figure: round-robin algorithm circuits with 2 priority levels]
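
    The round-robin policy named on this slide can be sketched in software as below. This is a behavioural model of a plain round-robin arbiter only; it does not model the priority levels or the circuit structure of the original figure, and the class name is an assumption.

```python
class RoundRobinArbiter:
    """Grant one requester per call, rotating priority after each grant."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so that port 0 has the highest priority at first

    def grant(self, requests):
        """requests: list of n booleans. Returns the granted port or None."""
        for offset in range(1, self.n + 1):
            port = (self.last + offset) % self.n
            if requests[port]:
                self.last = port  # the winner becomes the lowest priority
                return port
        return None


arbiter = RoundRobinArbiter(4)
requests = [True, False, True, True]
print([arbiter.grant(requests) for _ in range(4)])  # [0, 2, 3, 0]
```

    Rotating the starting point after every grant is what prevents a single busy port from starving the others.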

  • Basic NoC Building Blocks: Phit Size Determination
    The phit size is the bit width of a link and determines the switch area.
    phit size = packet size / SERR
    Operation freq = SERR x fNORM
    [Figure: PE, NI, SER/DES, and switch with buffers]
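
    The two relations on this slide can be evaluated with a few lines of code. The sketch reads SERR as the serialization ratio of the SER/DES stage and fNORM as the clock frequency without serialization; both readings, the function name, and the example numbers are assumptions.

```python
def serialized_link(packet_bits, serr, f_norm_hz):
    """Return (phit size in bits, link operating frequency in Hz)."""
    if packet_bits % serr != 0:
        raise ValueError("packet size must be a multiple of SERR")
    phit_bits = packet_bits // serr      # phit size = packet size / SERR
    operation_freq = serr * f_norm_hz    # operation freq = SERR x fNORM
    return phit_bits, operation_freq


print(serialized_link(128, 4, 200_000_000))  # (32, 800000000)
```

    A higher serialization ratio narrows the link and shrinks the switch, at the price of a proportionally faster link clock.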

  • NoC Examples: Æthereal
    Developed by Philips
    Synchronous indirect network
    WH switching
    Contention-free source routing based on TDM
    GT as well as BE QoS
    GT slots can be allocated statically at the initialization phase, or dynamically at runtime
    BE traffic makes use of non-reserved slots and any unused reserved slots
      Also used to program the GT slots of the routers
    Link-to-link credit-based flow control scheme between BE buffers to avoid loss of flits due to buffer overflow

  • NoC Examples: More
    HERMES: developed at the Faculdade de Informática, PUCRS, Brazil
    MANGO
    Nostrum: developed at KTH in Stockholm
    Octagon: developed by STMicroelectronics
    QNoC: developed at Technion in Israel
    Xpipes: developed by the Univ. of Bologna and Stanford University
    OASIS: developed by the Adaptive Systems Lab, UoA, Japan (our group)

  • Status and Open Problems
    Power
      Complex NI and switching/routing logic blocks are power hungry
    Latency
      Additional delay to packetize/de-packetize data at NIs
      Flow/congestion control and fault tolerance protocol overheads
      Delays at the numerous switching stages encountered by packets
      Even circuit switching has overhead (e.g. SOCBUS)
    Lack of tools and benchmarks
    Simulation speed
      GHz clock frequencies, large network complexity, and a greater number of PEs slow down simulation

  • Trends
    Move towards hybrid interconnection fabrics
      NoC-bus based
      Custom, heterogeneous topologies
    New interconnect paradigms
      Optical
      Wireless
      Carbon nanotube

  • NoC research community
    Academia and industry
      VLSI / CAD people
      Computer system architects
      Interconnect experts
      Asynchronous circuit experts
      Networking/Telecomm experts

  • Research Topics
    Speed enhancement
    New router architectures (e.g. faster arbitration)
    Use of different asynchronous protocols
    Support for links of varying capacity
    Low-cost serialization/de-serialization
    Standardization of the network interface
    Packet construction/destruction
    Testing/verification of A-NoC

  • Summary
    NoC is a scalable platform for billion-transistor chips
    Several driving forces behind it
    Many open research questions
    May change the way we structure and model VLSI systems

  • Networks-on-Chip
    Ben Abdallah Abderazek, The University of Aizu, Graduate School of Computer Science and Eng., Adaptive Systems Laboratory, E-mail: [email protected]

    Deadlock: A packet does not reach its destination, because it is blocked at some intermediate resource.

    Livelock: A packet does not reach its destination, because it enters a cyclic path.

    Starvation: A packet does not reach its destination, because some resource does not grant access (while it grants access to other packets).

    Routing algorithms can be classified according to three different criteria: (i) where the routing decisions are taken; (ii) how a path is defined; and (iii) the path length.

    According to where routing decisions are taken, routing can be classified as source or distributed. In source routing, the whole path is decided at the source switch, while in distributed routing each switch receives a packet and decides the direction in which to send it. In source routing, the header of the packet has to carry all the routing information, increasing the packet size [9]. In distributed routing, the path can be chosen as a function of the instantaneous traffic conditions of the network. Distributed routing can also take faulty paths into account, resulting in fault-tolerant algorithms.

    Depending on how a path is defined, routing can be classified as deterministic or adaptive. In deterministic routing, the path is completely specified by the relative position of the source and target addresses. In adaptive routing, the path is a function of the instantaneous network traffic [4]. Adaptive routing increases the number of possible paths a packet can use to reach its destination. However, deadlock and livelock situations can happen in fully adaptive algorithms [8], which limits their usage.

    Regarding the path length criterion, routing can be minimal or nonminimal [8][9]. Minimal routing algorithms guarantee shortest paths between source and target addresses. In nonminimal routing, the packet can follow any available path between source and target. Nonminimal routing offers great flexibility in terms of possible paths, but can lead to livelock situations and increase the latency to deliver the packet.

    Since packets may arrive out of order in adaptive routing, a large buffering space is needed to reorder them. This, together with the protocol overhead, leads to prohibitive cost overhead, extra delay, and jitter.

    First, an appropriate topology should be selected. For a regular topology, we can have a mesh, torus, tree, or star; alternatively, an optimization can be carried out to build an application-specific topology without a regular pattern.

    Second, protocols including the packet format, end-to-end services, and flow control should be defined and implemented in the NI module.

    Third, the packet switching scheme should be chosen. There are many methods, such as SAF, WH, and CT switching.

    SAF: The entire data of a packet arriving on the incoming link is stored in the buffer for switching and forwarding. A buffer with a large capacity is required: the buffer size should be at least the size of a packet.

    VCT: Virtual-cut-through (VCT) switching is similar to SAF switching, but it does not wait for a packet to be received in its entirety before making routing decisions. Transfer latency can be reduced by interpreting the header as soon as it is available, without waiting for the data payload to be received after the header. The packet is forwarded to the next router only when there is available buffer space for the entire packet, otherwise the packet is buffered at the local node. The buffer requirement and the cost for stalling are the same as SAF switching.

    WH: an incoming packet is forwarded right after the packet header is identified and the complete packet follows the header without any discontinuity. The path that the packet follows through the switch is blocked against access by other packets (reserved).

    In terms of implementation, SAF switching has the lowest associated complexity because each router node only needs sufficient logic to support local interpretation of the packet header, but SAF incurs the highest transfer latency. In contrast, both VCT and WH switching have lower transfer latencies than SAF switching, but have the logic overhead required to manage the remaining packet/flits that must follow the header. SAF and VCT switching have the drawback of a higher buffer requirement over WH switching, thus they may not be suited for applications where memory is expensive. The cost of stalling is the highest for WH switching because a packet blocks several nodes and links, while stalling only affects one router node for both VCT and SAF switching.
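
    The latency difference described above can be made concrete with a simplified zero-load model. The formulas below are not taken from these notes; they are a common textbook approximation that assumes one flit crosses a link per cycle, a fixed per-hop routing delay, and no contention. The function names are hypothetical.

```python
def saf_latency(hops, packet_flits, routing_delay):
    """Store-and-forward: every hop waits for the whole packet."""
    return hops * (routing_delay + packet_flits)


def cut_through_latency(hops, packet_flits, routing_delay):
    """VCT or wormhole without contention: only the header pays the per-hop
    cost, and the remaining flits follow it in a pipeline."""
    return hops * (routing_delay + 1) + (packet_flits - 1)


# 4 hops, 8-flit packet, 1 cycle of routing delay per hop.
print(saf_latency(4, 8, 1))          # 36
print(cut_through_latency(4, 8, 1))  # 15
```

    In this model the SAF latency grows with the product of hop count and packet length, while the cut-through latency grows with their sum, which matches the qualitative comparison in the text.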

    Input Queuing: Every incoming link has a single input queue, so N queues are necessary for an NxN switch. Problem: input queuing suffers from the head-of-line (HoL) blocking problem. HoL blocking arises when packets arriving at different input ports are destined for the same output port: the packet that loses arbitration waits at the head of its queue and blocks the packets behind it, even those bound for idle outputs.

    Overcoming HoL: One way to overcome this drawback is to use virtual output queues (VOQ), an input queuing strategy in which each input port maintains a separate queue for each output port. It has been shown that VOQ can achieve 100% throughput with an effective scheduling algorithm. This scheduling algorithm should be able to provide a high-speed mapping of packets from inputs to outputs on a cycle-to-cycle basis.

    Output Queuing

    The last issue in the building block figure is flow control, or congestion control. There are several solutions that prevent packets from causing output and buffer overflow.

    Packet discarding: Once the buffer overflows, subsequently arriving packets are simply dropped.

    Credit based: