Fault‐Tolerant and Secure Architectures for On‐Chip Networks With Emerging Interconnect Technologies

Mohsin Y Ahmed, Conlan Wesson

Overview

• NoC: the future generation of many-core processors on a single chip.
• In current multicore processors, the cores communicate over a shared bus.
• Only one core can send a message at a time.
• The number of cores is therefore limited.

Overview (Contd.)

• NoC allows for more cores, i.e., it ensures scalability.
• Multiple cores can send messages simultaneously.
• Somewhat similar to a computer network.

“Route Packets, Not Wires” - William J. Dally, Stanford University, NVIDIA

Interconnect Technology

• The shared-medium arbitrated bus is the most frequently used on-chip interconnect architecture.
• All communication devices share the same transmission medium.
• Advantages:
  o Simple topology
  o Low area cost
  o Extensibility

Interconnect Technology (Contd.)

• Disadvantages:
  o High intrinsic parasitic resistance and capacitance.
  o The delay in bit transfer increases with the number of processing elements and eventually exceeds the targeted clock period.
  o This limits system scalability.

Novel NoC Architectures

• A network-on-chip (NoC) resembles the interconnect architecture of high-performance parallel computing systems.
• The functional IP blocks communicate with each other with the help of intelligent switches.
• NoC allows the decoupling of the processing elements (i.e., the IPs) from the communication fabric (i.e., the network).
• Employs explicit parallelism, exhibits modularity to minimize the use of global wires, and utilizes locality for power minimization.

SPIN

• SPIN: Scalable, Programmable, Integrated Network.
• Uses a fat-tree architecture.
• Every node has four children, and the parent is replicated four times at any level.

BFT

• BFT: Butterfly Fat-Tree.
• The IPs are placed at the leaves and the switches at the vertices.
• At each subsequent level, the number of required switches reduces by a factor of 2.

CLICHE

• CLICHE: Chip-Level Integration of Communicating Heterogeneous Elements.
• Consists of an m x n mesh of switches interconnecting computational resources (IPs).
• Every switch, except those at the edges, is connected to four neighboring switches and one IP block.
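
As a minimal sketch of the mesh connectivity described above, the following Python function lists the switches adjacent to a given switch in an m x n mesh. The function name `mesh_neighbors` and the (row, col) coordinate convention are assumptions made for illustration; they do not come from the slides.

```python
def mesh_neighbors(row, col, m, n):
    """Return the switches adjacent to (row, col) in an m x n mesh.

    Interior switches have four neighbors; edge and corner switches
    have fewer because a plain mesh has no wrap-around links.
    (Coordinate convention is an illustrative assumption.)
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < m and 0 <= c < n]


if __name__ == "__main__":
    print(mesh_neighbors(1, 1, 4, 4))  # interior switch
    print(mesh_neighbors(0, 0, 4, 4))  # corner switch
```

An interior switch gets four neighbors, while a corner switch gets only two, which matches the "except those at the edges" qualification.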

2D Torus

• Basically the same as a regular mesh.
• The only difference is that the switches at the edges are connected to the switches at the opposite edge through wrap-around channels.
• Long end-around connections can yield excessive delays.
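
The wrap-around channels can be sketched with modulo arithmetic. The function name `torus_neighbors` and the (row, col) convention are hypothetical, and the sketch assumes m and n are at least 3 so that the four neighbors are distinct.

```python
def torus_neighbors(row, col, m, n):
    """Return the four neighbors of (row, col) in an m x n 2D torus.

    The modulo operation connects each edge switch to the switch at the
    opposite edge, which is the wrap-around channel.
    (Assumes m >= 3 and n >= 3; names are illustrative.)
    """
    return [
        ((row - 1) % m, col),
        ((row + 1) % m, col),
        (row, (col - 1) % n),
        (row, (col + 1) % n),
    ]


if __name__ == "__main__":
    print(torus_neighbors(0, 0, 4, 4))
```

Unlike in the mesh, every switch here has exactly four neighbors, including the switches on the edges.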

Folded Torus

• The long end-around delay can be avoided by folding the torus.
• This yields a more suitable VLSI implementation.

Octagon

• Communication between any pair of nodes takes at most two hops within the basic octagonal unit.
• Each functional IP has a dedicated switch.
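
The two-hop bound can be checked with a short breadth-first search. The slides do not spell out the link pattern, so this sketch assumes the usual Octagon wiring: eight nodes, each linked to its two ring neighbors and to the node directly across. All function names are illustrative.

```python
from collections import deque


def octagon_links():
    """Adjacency of one octagon unit (assumed wiring: ring plus cross links)."""
    return {i: {(i + 1) % 8, (i - 1) % 8, (i + 4) % 8} for i in range(8)}


def hop_counts(adj, src):
    """Breadth-first search: minimum hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest minimum hop count over all source/destination pairs."""
    return max(max(hop_counts(adj, s).values()) for s in adj)


if __name__ == "__main__":
    print(diameter(octagon_links()))
```

Under the assumed wiring, the diameter comes out as 2, in agreement with the bound stated above.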

SWITCHING METHODOLOGIES

• Switching techniques determine:
  o When and how internal switches connect their inputs to outputs
  o The time at which message components may be transferred along these paths
• Different types of switching techniques:
  o Circuit switching
  o Packet switching
  o Wormhole switching

Circuit Switching

• A physical path from source to destination is reserved prior to the transmission of the data.
• The path is held until all the data has been transmitted.
• Network bandwidth is reserved for the entire duration of the data transfer.
• Valuable resources are also tied up for the duration of the transmission.
• Setting up an end-to-end path may cause unnecessary delays.

Packet Switching

• Data is divided into fixed-length blocks called packets.
• Whenever the source has a packet to be sent, it transmits the data.
• Conventional packet switching needs to store entire packets in a switch, which makes the buffer requirement high.
• In an NoC environment, the requirement is that switches should not consume a large fraction of silicon area compared to the IP blocks.

Wormhole Switching

• Packets are divided into fixed-length flow control units (flits).
• The input and output buffers are expected to store only a few flits.
• The buffer space requirement in the switches is small, i.e., the switches are small and compact.
• The first flit of a packet, i.e., the header flit, contains routing information.
• Header flit decoding enables the switches to establish the path, and subsequent flits simply follow this path in a pipelined fashion.
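
The division of a packet into flits might be sketched as follows. The dictionary layout, the `packetize` name, and the "header"/"body"/"tail" labels are assumptions for illustration; padding of the last flit to the fixed length is omitted for brevity.

```python
def packetize(dest, payload, flit_size):
    """Split a payload into a header flit followed by data flits.

    Only the header flit carries routing information (the destination);
    the last data flit is marked as the tail. The flit layout is an
    illustrative assumption, not the format used in the slides.
    """
    if flit_size <= 0:
        raise ValueError("flit_size must be positive")
    flits = [{"type": "header", "dest": dest}]
    chunks = [payload[i:i + flit_size] for i in range(0, len(payload), flit_size)]
    for index, chunk in enumerate(chunks):
        kind = "tail" if index == len(chunks) - 1 else "body"
        flits.append({"type": kind, "data": chunk})
    return flits


if __name__ == "__main__":
    for flit in packetize(5, b"abcdefgh", 3):
        print(flit)
```

Because only the header names the destination, a switch that has decoded it can forward every following flit of the same packet without further routing decisions.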

Wormhole Switching (Contd.)

• Each incoming data flit of a message packet is simply forwarded along the same output channel as the preceding data flit.
• No packet reordering is required at destinations.
• Drawbacks:
  o Transmission of distinct messages cannot be interleaved or multiplexed.
  o Messages must cross the channel in their entirety before the channel can be used by another message.
  o Channel utilization decreases if a flit from a given packet is blocked in a buffer.

Wormhole Switching (Contd.)

• By introducing virtual channels in the input and output ports, channel utilization can be increased considerably.
• If a flit belonging to a particular packet is blocked in one of the virtual channels, then flits of other packets can use the other virtual channel buffers.
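
A toy model of this idea: one physical channel serves several virtual channel buffers and skips any that are blocked. The `next_flit` function and the fixed-priority scan are simplifying assumptions; a real switch would more likely use round-robin or a similar fair arbiter.

```python
from collections import deque


def next_flit(virtual_channels, blocked):
    """Pick the next flit to send over the shared physical channel.

    virtual_channels: list of deques, one buffer per virtual channel.
    blocked: set of virtual-channel indices whose packet cannot advance.
    Returns (vc_index, flit), or None if nothing can be sent.
    (Fixed-priority scan; an illustrative simplification.)
    """
    for vc_id, buffer in enumerate(virtual_channels):
        if buffer and vc_id not in blocked:
            return vc_id, buffer.popleft()
    return None


if __name__ == "__main__":
    vcs = [deque(["A1", "A2"]), deque(["B1"])]
    # Packet A is blocked downstream, so packet B still uses the channel.
    print(next_flit(vcs, blocked={0}))
```

With a single buffer per port, the blocked packet A would have stalled the channel; here packet B proceeds through the second virtual channel.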

NoC PERFORMANCE METRICS

• It is desirable that an NoC interconnect architecture exhibits high throughput, low latency, energy efficiency, and low area overhead.
• In today’s power-constrained environments, it is increasingly critical to identify the most energy-efficient architectures and to quantify the energy-performance trade-offs.

Message Throughput

• Message throughput is measured as the fraction of the maximum load that the network is capable of physically handling.
• A throughput of 1 corresponds to all end nodes receiving one flit every cycle.
• Measured in flits/cycle/IP.
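
A minimal sketch of how the flits/cycle/IP figure could be computed from simulation counters. The function and parameter names are hypothetical, and the formula is simply the one implied by the unit and by the definition of throughput 1 above.

```python
def throughput(flits_received, cycles, num_ips):
    """Throughput in flits/cycle/IP.

    flits_received: total flits delivered to all end nodes.
    cycles: length of the measurement window in clock cycles.
    num_ips: number of IP blocks (end nodes).
    (Parameter names are illustrative assumptions.)
    """
    if cycles <= 0 or num_ips <= 0:
        raise ValueError("cycles and num_ips must be positive")
    return flits_received / (cycles * num_ips)


if __name__ == "__main__":
    # 16 IPs each receiving one flit per cycle for 1000 cycles.
    print(throughput(16000, 1000, 16))
```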

Transport Latency

• Defined as the time (in clock cycles) that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node.
• Depending on the source/destination pair and the routing algorithm, each message may have a different latency.
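
Since each message may have a different latency, a simulation would typically report an average. The sketch below assumes each message is recorded as a hypothetical pair (header injection cycle, tail reception cycle); the record format and function name are not from the slides.

```python
def average_latency(messages):
    """Mean transport latency in clock cycles.

    messages: iterable of (header_injection_cycle, tail_reception_cycle)
    pairs, one per delivered message (an assumed record format).
    """
    latencies = [tail - head for head, tail in messages]
    if not latencies:
        raise ValueError("no messages recorded")
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    print(average_latency([(0, 10), (5, 25)]))
```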

Experimental Results

Wireless NoC

• Replacement of some long wired lines by RF wireless links.
• On-chip carbon nanotube (CNT) antennas.
• Long-range wireless links, short wire-line links.

Wireless NoC Architecture

• The WiNoC architecture is based on the “small-world” property.
• Networks with the small-world property have a very small average path length.
• A small-world topology can be constructed from a locally connected network by rewiring connections randomly to any other node, which creates shortcuts in the network.
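
The rewiring construction can be sketched in the style of the Watts-Strogatz model. Starting from a ring lattice and the rewiring probability are assumptions chosen for illustration; the slides do not say which locally connected network or probability was used.

```python
import random


def ring_lattice(n, k):
    """Locally connected ring: each node links to k nearest nodes on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, probability, rng):
    """Move one end of each link, with the given probability, to a random node.

    Rewired links act as long-range shortcuts. The link count is preserved;
    connectivity is not guaranteed (a simplification of this sketch).
    """
    nodes = list(adj)
    for u in nodes:
        for v in sorted(adj[u]):
            if u < v and rng.random() < probability:
                choices = [w for w in nodes if w != u and w not in adj[u]]
                if choices:
                    w = rng.choice(choices)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


if __name__ == "__main__":
    net = rewire(ring_lattice(16, 2), 0.2, random.Random(1))
    print({node: sorted(neighbors) for node, neighbors in net.items()})
```

Passing an explicit `random.Random` instance keeps the experiment reproducible, which helps when comparing topologies.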

Scale-Free Networks

• Most nodes have low degree.
• A few nodes have very high degree.

Wireless NoC Architecture (Contd.)

• The whole system is divided into multiple small clusters of neighboring cores called “subnets”.
• The cores in a subnet are connected to a centrally located hub through direct links.
• The hubs from all subnets are connected in a second-level network.
• Due to the limitations of wireless links, only a few wireless links are used, distributed between hubs separated by relatively long distances.

WiNoC Experimental Results

NoC Security

• A single chip is likely to embed cores and other devices from different manufacturers.
• This makes the chip vulnerable to hardware Trojans.
• Malicious Trojans try to bypass or disable the security fence of a system.
• A Trojan can continuously broadcast garbage data, leak confidential information by radio emission, route flits in wrong directions, or even tamper with the flits.
• As soon as a hardware Trojan is detected in a system, it may need to be removed immediately, with minimum effect on the system.

Fault-Tolerant NoC Architecture

• We performed a study to find an NoC architecture that shows maximum fault tolerance in case of a node deletion.
• The study was performed on both mesh and small-world topologies.
• For the small-world topology, we devised an algorithm for finding an attack-tolerant architecture by iteratively reorganizing the initial topology.

Routing Algorithm

• Dijkstra’s shortest-path routing is adopted for routing in the SW NoC.
• This graph search algorithm solves the single-source shortest-path problem for a graph with nonnegative edge costs, producing a shortest-path tree.
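
For reference, a compact heap-based version of Dijkstra’s algorithm is sketched below. The graph representation (a dictionary of neighbor-to-cost dictionaries) and the returned `parent` map encoding the shortest-path tree are choices made for this example, not details taken from the slides.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths for nonnegative edge costs.

    graph: {node: {neighbor: cost}} (assumed representation; nodes must be
    mutually comparable so that heap ties can be broken).
    Returns (dist, parent); parent encodes the shortest-path tree.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


if __name__ == "__main__":
    g = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2}, "C": {"A": 4, "B": 2}}
    print(dijkstra(g, "A"))
```

A routing table for a source core can be read off the `parent` map by walking back from each destination to the source.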

Optimal Fault-Tolerant Architecture

• The attack-tolerant architecture is achieved by applying an algorithm based on simulated annealing.
• Specific cores in the small-world topology are attacked, i.e., they are isolated from all their neighbors so that they can neither send nor receive flits.
• The topology is reorganized iteratively, by rewiring one of its existing links at a time, until the throughput converges.
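
Attacking a core, in the sense used above, amounts to removing all of its links. A minimal sketch, assuming the topology is stored as a dictionary mapping each core to a set of neighbors (the `isolate` name and this representation are hypothetical):

```python
def isolate(adj, attacked):
    """Return a copy of the topology with the attacked cores isolated.

    adj: {core: set of neighboring cores} (assumed representation).
    attacked: iterable of cores to cut off from all their neighbors.
    The attacked cores stay in the result but have no links left.
    """
    attacked = set(attacked)
    return {
        node: (set() if node in attacked else neighbors - attacked)
        for node, neighbors in adj.items()
    }


if __name__ == "__main__":
    topology = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    print(isolate(topology, [1]))
```

Returning a copy leaves the original topology untouched, so several attack scenarios can be evaluated against the same network.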

Simulated Annealing Metrics

• $M = \sum_{(i, j)} d(i, j) / (N(N-1))$, where i and j are NoC cores, d(i, j) is their shortest-path distance according to Dijkstra’s algorithm, and N is the total number of cores in the system.
• $\rho = dM / dL$, where L is the number of levels of neighbors up to which a core is attacked.
• The objective is to minimize ρ to find an optimal solution.
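
The metric M can be sketched as below. Two assumptions are made: every link has unit cost, so Dijkstra’s algorithm reduces to a breadth-first search, and the sum runs over all ordered pairs of distinct cores, which is what the N(N-1) denominator suggests. The derivative ρ is not computed here, since the slides do not specify how it is evaluated.

```python
from collections import deque


def hop_distances(adj, src):
    """Shortest hop counts from src (equals Dijkstra when all costs are 1)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """M: sum of d(i, j) over ordered pairs i != j, divided by N(N-1).

    Raises ValueError for a disconnected graph, where some d(i, j) is
    undefined (how the original study treats this case is not stated).
    """
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two cores")
    total = 0
    for src in adj:
        dist = hop_distances(adj, src)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    return total / (n * (n - 1))


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(average_distance(path))
```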

Simulated Annealing Algorithm

• Initial network setup: Current Network = Initial Network.
• Compute the metric ρ for the current network.
• Generate a new network configuration by randomly picking and rewiring 1 link, and compute the new metric ρ’ (Dijkstra routing algorithm).
• ρ’ < ρ?
  o Yes: Current Network = New Network.
  o No: generate a uniform random number r in [0, 1] and test itr * e^(ρ - ρ’) > r. If yes, Current Network = New Network; if no, the current network is kept.
• Reached convergence?
  o Yes: the current network is the optimal network configuration.
  o No: generate another new network configuration and repeat.
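
The loop in the flowchart can be sketched in Python as follows. Several parts are stand-ins rather than the authors’ method: the acceptance test uses the standard Metropolis rule exp((ρ - ρ’)/T) with geometric cooling instead of the slide’s exact expression, a fixed iteration count replaces the convergence test, the best network seen is kept, and the average hop distance is used as a placeholder cost because the evaluation of ρ is not specified. All names are hypothetical.

```python
import math
import random
from collections import deque


def average_distance(adj):
    """Placeholder cost: mean hop count over ordered pairs (inf if disconnected)."""
    n = len(adj)
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return math.inf
        total += sum(dist.values())
    return total / (n * (n - 1))


def rewire_one_link(adj, rng):
    """Return a copy of adj with one randomly picked link moved to a new endpoint."""
    new = {u: set(vs) for u, vs in adj.items()}
    links = sorted((u, v) for u in new for v in new[u] if u < v)
    if not links:
        return new
    u, v = rng.choice(links)
    choices = [w for w in new if w != u and w not in new[u]]
    if choices:
        w = rng.choice(choices)
        new[u].discard(v)
        new[v].discard(u)
        new[u].add(w)
        new[w].add(u)
    return new


def anneal(adj, cost, rng, iterations=500, temperature=1.0, cooling=0.99):
    """Simulated annealing over topologies; returns (best network, best cost)."""
    current, rho = adj, cost(adj)
    best, best_rho = current, rho
    for _ in range(iterations):
        candidate = rewire_one_link(current, rng)
        rho_new = cost(candidate)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if rho_new < rho or rng.random() < math.exp((rho - rho_new) / temperature):
            current, rho = candidate, rho_new
            if rho < best_rho:
                best, best_rho = current, rho
        temperature *= cooling
    return best, best_rho


if __name__ == "__main__":
    ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
    network, value = anneal(ring, average_distance, random.Random(0))
    print(value)
```

Occasionally accepting a worse configuration is what lets the search escape local minima; as the temperature falls, such moves become increasingly unlikely.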

Simulation Results

Questions?

THANK YOU
