

Design of Low Power & Reliable Networks on Chip Through Joint Crosstalk Avoidance and Multiple Error Correction Coding

Amlan Ganguly & Partha Pratim Pande & Benjamin Belzer & Cristian Grecu

Received: 27 November 2006 / Accepted: 4 August 2007
© Springer Science + Business Media, LLC 2007

Abstract Network on Chip (NoC) is an enabling methodology for integrating a very high number of intellectual property (IP) blocks in a single System on Chip (SoC). A major challenge that NoC design is expected to face is the intrinsic unreliability of the interconnect infrastructure under technology limitations. Research must address the combination of new device-level defects or error-prone technologies within systems that must deliver high levels of reliability and dependability while satisfying other hard constraints such as low energy consumption. By incorporating novel error correcting codes it is possible to protect the NoC communication fabric against transient errors and at the same time lower the energy dissipation. We propose a novel, simple coding scheme called Crosstalk Avoiding Double Error Correction Code (CADEC). Detailed analysis followed by simulations with three commonly used NoC architectures shows that CADEC provides significant energy savings compared to previously proposed crosstalk avoiding single error correcting codes and error-detection/retransmission schemes.

Keywords Network on Chip · Crosstalk avoidance · Multiple error correction · Low power · Transient errors · Joint codes

1 Introduction and Motivation

Current commercial designs integrate from 10 to 100 embedded functional and storage blocks in a single system-on-chip (SoC), and the number is likely to increase significantly in the near future [2, 13]. Network on chip (NoC) is viewed as a revolutionary methodology to achieve such a high degree of integration in a single SoC. According to the International Technology Roadmap for Semiconductors (ITRS) [10], signal integrity is expected to be an increasingly critical challenge in designing SoCs. The widespread adoption of the NoC paradigm will be possible if it addresses system level signal integrity and reliability issues in addition to easing the design process, and meeting all other constraints and objectives. With shrinking feature size, one of the major factors affecting signal integrity is transient errors, arising due to temporary conditions of the SoC and environmental factors. Among the transient failure mechanisms are crosstalk, electromagnetic interference, alpha particle hits, cosmic radiation, etc. [7, 15]. These failures can alter the behavior of the NoC fabrics and degrade the signal integrity. Providing resilience against such failures is critical for the operation of NoC-based chips. There are many ways to achieve signal integrity. Among different practical methods, use of new materials for device and interconnect, and tight control of device layouts may be adopted in the NoC domain. Here we propose to tackle this problem at the design stage. Instead of depending on post-design methods, we propose to incorporate corrective intelligence in the NoC design flow. This will help to reduce the number of post-design iterations. The corrective intelligence can be incorporated into the NoC data stream by adding error control codes to decrease vulnerability to transient errors. The basic operations of NoC infrastructures are governed by on-chip packet-switched networks. As NoCs are built on packet-switching, it is easy to modify the data packets by adding extra bits of coded information in space and time to protect against transient malfunctions.

J Electron Test
DOI 10.1007/s10836-007-5035-1

Responsible Editor: N. A. Touba

A. Ganguly (*) · P. P. Pande · B. Belzer
School of Electrical Engineering & Computer Science, Washington State University, PO BOX 642752, Pullman, WA, USA
e-mail: [email protected]

P. P. Pande
e-mail: [email protected]

B. Belzer
e-mail: [email protected]

C. Grecu
SoC Research Lab, University of British Columbia, 2332 Main Mall, Vancouver, BC V6T 1Z4, Canada
e-mail: [email protected]

In the face of increased gate counts, designers are compelled to reduce the power supply voltage to keep energy dissipation to a tolerable limit, thus reducing noise margins [7]. The interconnects become more closely packed and this increases mutual crosstalk effects. Faster switching can also cause ground bounce. The switching current can cause the already low supply voltage to instantaneously go even lower, thus causing timing violations. All these factors can cause transient errors in the ultra deep submicron (UDSM) era [7]. Crosstalk is a prominent source of transient malfunction in NoC interconnects. Crosstalk avoidance coding (CAC) schemes are effective ways of reducing the worst-case switching capacitance of a wire by ensuring that a transition from one codeword to another does not cause adjacent wires to switch in opposite directions. Though CACs are effective in reducing mutual inter-wire coupling capacitance, they do not protect against any other transient errors. To make the system robust, in addition to CAC we need to incorporate forward error-correction coding (FEC) into the NoC data stream. Among different FECs, single error correcting codes (SECs) are the simplest to implement. But aggressive supply-voltage scaling and increases in deep sub-micron noise in future-generation NoCs will prevent SECs from satisfying reliability requirements. Hence, we investigate performance of joint CAC and multiple error correcting codes (MECs) in NoC fabrics. The main contributions of this work are the design of a novel but simple joint CAC/MEC mechanism, and the establishment of a performance benchmark for this scheme with respect to other existing coding methods.

2 Related Work

In recent years, there has been an evolving effort in developing on-chip networks to integrate an increasingly large number of functional cores in a single die [2, 13]. But even before the advent of the NoC paradigm, different research groups investigated various coding schemes to enhance the reliability of bus-based systems. In [32] the authors proposed to employ data encoding to eliminate crosstalk delay within a bus. They presented a detailed analysis of the self-shielding codes and established fundamental theoretical limits on the performance of codes with and without memory. In [25], the authors provided a comprehensive study of the usefulness of error correcting codes to reduce the crosstalk-induced bus delay (CIBD), and proved that Dual Rail codes perform better than Hamming codes. The authors of [25] used single error correcting codes (SECs) to minimize crosstalk. These codes are not as efficient as CACs in handling crosstalk related issues. In addition, different low-power coding (LPC) techniques have been proposed to reduce power consumption of on-chip buses [31]. But these LPCs aim at reducing only the self-transition in a wire. According to [11], the principal limitation of the applicability of the LPCs is that, due to higher power dissipation in the codec blocks, these codes are energy efficient only if the length of the wire segment exceeds a certain limit. In [30] the authors presented a unified framework for applying coding for systems on chips (SoCs), but targeted principally at bus-based systems.

In [4, 5], performance of single error correcting and multiple error detecting Hamming codes and cyclic codes in an AMBA bus-based system has been discussed. The energy efficiency and the area overhead of the codecs have also been discussed. These papers conclude that error detection followed by retransmission is more energy efficient than forward error correction (FEC) schemes. Error resiliency in NoC fabrics and the trade-offs involved in various error recovery schemes are discussed in [16]. In this work, the authors investigated performances of simple error detection codes like parity or cyclic redundancy check codes and single error-correcting, multiple error-detecting Hamming codes in NoC fabrics. The basic principle of this work is similar to that of [5]: the receiver corrects only a single bit error in a flow-control-unit (flit), but for more than one error, it requests retransmission from the sender. As mentioned in the concluding remarks of [5], in the ultra deep submicron (UDSM) domain communication energy will overcome computation energy. Retransmission will give rise to multiple communications over the same link and hence ultimately will not be very energy efficient. Moreover, retransmission will introduce significant communication latency. In systems dominated by retransmission, some additional error correction mechanisms for the control signals also need to be incorporated. Moreover, these codes do not have any crosstalk avoidance characteristics, which are absolutely necessary in the deep submicron (DSM) technology nodes. The role of the communication infrastructure of NoCs in energy dissipation is discussed in [18]. Different strategies for power management for NoCs, such as power-aware on-off networks [28], and dynamic voltage scaling [27] have been addressed previously. Application of CAC and joint CAC/SEC in NoCs is also discussed in [17, 19, 20]. In this work we propose a novel joint CAC/MEC scheme and compare and contrast its performance in NoC architectures with other existing coding schemes.

3 Data Coding in NoC Links

The common characteristic of NoC architectures is that the functional IP blocks communicate with each other via intelligent switches. The data communication between IPs in a NoC takes place in the form of packets routed through a wormhole switching mechanism. The packets are broken down into fixed length flow control units or flits. The switch blocks need to store only a few flits [6, 18]. The header flits carry the relevant routing information. Consequently header decoding enables the establishment of a path that the subsequent payload flits simply follow in a pipelined fashion. The transmitted flits are encoded to guard against possible transient errors.

The incorporation of CACs reduces the mutual switching capacitance of the inter-switch wire segments. Though this helps in reducing the energy dissipation in communication, the energy reduction is only linear with the capacitance decrease. On the other hand, incorporation of error correction codes makes the system more robust, so that the voltage level driving the system can be reduced without compromising bit error rates. This makes joint crosstalk-avoidance and error correction codes more suitable for lowering the energy dissipation of on-chip communication infrastructures.

There are a few joint crosstalk avoidance and single error correction codes (CAC/SEC) proposed by different research groups. Among these joint codes, the Dual Rail (DR) Code [23, 24] or Duplicate Add Parity (DAP) [30], Boundary Shift Code (BSC) [22] and Modified Dual Rail Code (MDR) [26] reduce the switching capacitance associated with crosstalk from $(1+4\lambda)C_L$ to $(1+2\lambda)C_L$ [29], where $\lambda$ is the ratio of the coupling capacitance to the bulk capacitance and $C_L$ is the load capacitance, including the self-capacitance of the wire.

However, due to intensive integration and device shrinkage in the UDSM era, single error correction will not be sufficient to protect against different transient malfunctions. Hence there is a need for multiple error correction schemes. We propose a novel, simple joint crosstalk avoidance and double error correction scheme called crosstalk avoiding double error correction code (CADEC). We investigate the performance of CADEC in comparison with the various existing joint CAC/SEC schemes in different NoC architectures. One point worth noting here is that, according to [4, 5], error detection followed by retransmission is a more energy efficient scheme than error correction. To establish the performance benchmark for the CADEC scheme, we compare its performance with error detection (ED) codes also. All the error correcting schemes considered here are enhanced with a retransmission mechanism which is activated when the number of errors in the flit exceeds its correction capability. With an increase in the correction capability of a code, the probability of retransmission will reduce significantly. For the sake of fair comparison, the same retransmission mechanism is considered across all codes, namely switch-to-switch flit level retransmission (s2sf) [16]. As suggested in [16], an end-to-end retransmission mechanism can also be adopted. But as our emphasis is on characterizing the coding schemes, it is sufficient to assume a single retransmission method.

Below we explain the basic principles of all the coding schemes considered in this work.

3.1 DAP and MDR Schemes

The Duplicate Add Parity (DAP) scheme achieves joint crosstalk avoidance and single error correction capability by duplicating each bit of the n-bit flit and placing the copies adjacent to each other to avoid crosstalk, and by also computing a parity bit from the initial bits to enable single error correction. Thus, the encoded flit becomes 2n+1 bits wide [23, 30]. Modified Dual Rail (MDR) code is a simple modification of the DR/DAP scheme, where a second copy of the parity bit is transmitted to guard against crosstalk on the parity bit itself [26]. Thus, the MDR encoded flit is 2n+2 bits wide. The encoder and decoder of the DAP scheme are shown in Fig. 1. The MDR scheme is very similar and therefore is not shown separately.
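The DAP encoding and decoding described above can be illustrated with a minimal software sketch. It is not the gate-level codec of Fig. 1; the function names `dap_encode` and `dap_decode` and the bit ordering (duplicates interleaved, parity last) are assumptions made for illustration.

```python
def dap_encode(flit):
    """Encode a list of n bits into 2n+1 bits: each bit twice, then one parity bit."""
    coded = []
    parity = 0
    for b in flit:
        coded += [b, b]      # the two copies of a bit sit on adjacent wires
        parity ^= b          # parity is computed over the original bits
    return coded + [parity]


def dap_decode(word):
    """Recover the n-bit flit from a (2n+1)-bit word with at most one bit in error."""
    copy_a = word[0:-1:2]    # first copy of every bit
    copy_b = word[1:-1:2]    # second copy of every bit
    sent_parity = word[-1]
    regenerated = 0
    for b in copy_a:
        regenerated ^= b
    # A mismatch means copy A or the parity bit is wrong, so under the
    # single-error assumption copy B is clean.
    return copy_a if regenerated == sent_parity else copy_b


flit = [1, 0, 1, 1]
word = dap_encode(flit)      # 9-bit DAP word for a 4-bit flit
```

Flipping any one of the 2n+1 bits of `word` and calling `dap_decode` returns the original flit, which is the single error correction property stated in the text.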

3.2 BSC Scheme

Boundary Shift Coding (BSC) is achieved by avoiding a shared boundary between two successive code words [22]. The scheme duplicates each bit and computes an overall parity, and then each alternate code word is given a cyclic shift in a way that appends the parity bit to either the right or the left of the flit after duplication. The decoding mechanism is the same as in DAP, after carefully extracting the parity bit from the flit depending on whether it is the rightmost or the leftmost bit of the flit. The encoder and decoder for the BSC scheme are shown in Fig. 2.
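A small sketch of the alternating parity placement may help. It assumes, purely for illustration, that even-numbered flits carry the parity bit on the right and odd-numbered flits carry it on the left (a one-position cyclic shift); the helper names and the `index` argument are hypothetical and are not taken from [22].

```python
def bsc_encode(flit, index):
    """Duplicate every bit and place the parity right (even index) or left (odd index)."""
    dup = [b for bit in flit for b in (bit, bit)]
    parity = sum(flit) % 2
    return dup + [parity] if index % 2 == 0 else [parity] + dup


def bsc_decode(word, index):
    """Extract the parity bit from the correct end, then decode as in DAP."""
    if index % 2 == 0:
        dup, parity = word[:-1], word[-1]
    else:
        dup, parity = word[1:], word[0]
    copy_a, copy_b = dup[0::2], dup[1::2]
    # Same selection rule as DAP: trust copy A only if its parity matches.
    return copy_a if sum(copy_a) % 2 == parity else copy_b
```

The decoder needs to know whether the current flit is shifted, which is why the flit index is passed to both functions in this sketch.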

3.3 Error Detection Code—ED

This scheme implements a Hamming code for error detection and retransmits if the scheme detects that the flit is in error [4]. As an example, the (38, 32) shortened Hamming code implemented for a 32 bit wide flit can reliably detect up to two errors in the flit. The ED scheme only detects the errors; on detection of any error pattern, it sends an automatic repeat request (ARQ) signal for retransmission of the flit. The encoder is essentially only a (38, 32) Hamming encoding block. The decoder is also a standard syndrome decoder for the Hamming encoded flit. Evidently, this scheme does not have any crosstalk avoidance properties.
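The detect-and-retransmit behaviour can be sketched in a few lines. The parity-check columns below are one possible choice for a (38, 32) shortened Hamming code and are an assumption; the H-matrix actually used in [4] is not given here, and the names `ed_encode` and `needs_retransmission` are hypothetical.

```python
R, K = 6, 32                       # check bits and data bits of the (38, 32) code
# Assumed parity-check columns: data bits take the first K labels in 1..63 that
# are not powers of two, and the R check bits take the powers of two.
COLS = [c for c in range(1, 2 ** R) if c & (c - 1)][:K] + [1 << j for j in range(R)]


def syndrome(word):
    """XOR of the column labels of all positions that hold a 1."""
    s = 0
    for bit, col in zip(word, COLS):
        if bit:
            s ^= col
    return s


def ed_encode(flit):
    """Append R check bits so that the syndrome of the 38-bit word is zero."""
    s = syndrome(list(flit) + [0] * R)
    return list(flit) + [(s >> j) & 1 for j in range(R)]


def needs_retransmission(word):
    """ED receiver: any non-zero syndrome triggers an ARQ."""
    return syndrome(word) != 0
```

Because all 38 column labels are distinct and non-zero, every single or double bit error yields a non-zero syndrome, which matches the detection capability stated above.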

3.4 Crosstalk Avoiding Double Error Correction Scheme

Compared with Hamming codes, standard double error correction codes like BCH codes are computationally complex, and therefore are not very efficient from the perspective of energy reduction and area cost. In this work, we design a novel scheme which is capable of joint crosstalk avoidance and double error correction. We call this scheme CADEC coding. The encoder and decoder for the CADEC scheme are described in the following subsections.

Encoder The encoder is a simple combination of Hamming coding followed by DAP or BSC encoding to provide protection against crosstalk. As shown in Fig. 4a, the incoming 32-bit flit is first encoded using a standard (38, 32) shortened Hamming code, then each bit of the 38-bit Hamming codeword is duplicated, and an overall parity calculated from one Hamming copy is appended. The (38, 32) Hamming code has a minimum Hamming distance of 3 between code words. On duplication this becomes 6 and after adding the extra parity bit this distance becomes 7. A Hamming distance of 7 enables triple error correction, but at a somewhat higher complexity cost than the double-error correcting schemes considered here. Consequently, as a first step we considered only the double error correction capability. The extra parity bit, which is a part of DAP or BSC schemes, is added to make the decoding process very energy efficient as explained below.

Fig. 1 a DAP encoder; b DAP decoder

Fig. 2 a BSC encoder; b BSC decoder

Decoder The decoding procedure for the CADEC encoded flit can be explained with the help of the flow diagram shown in Fig. 3. The decoding algorithm consists of the following simple steps:

1. The parity bits of the individual Hamming copies are calculated and compared with the sent parity.

2. If these two parities obtained in step 1 differ, then the copy whose parity matches the transmitted parity is selected as the output copy of the first stage.

3. If the two parities are equal, then any one copy is sent forward for syndrome detection.

4. If the syndrome obtained for this copy is zero then this copy is selected as the output of the first stage. Otherwise, the alternate copy is selected.

5. The output of the first stage is sent for (38, 32) single error correcting Hamming decoding, finally producing the decoded CADEC output.

The circuit implementing the decoder is schematically shown in Fig. 4(b).
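The five decoding steps, together with the encoder described earlier, can be sketched in software as follows. This is an illustrative model, not the circuit of Fig. 4: the parity-check columns of the (38, 32) code, the interleaved bit ordering, and all function names are assumptions.

```python
R, K = 6, 32                       # check bits and data bits of the (38, 32) code
# Assumed parity-check columns (the paper's H-matrix is not given): data bits use
# the first K non-power-of-two labels in 1..63, check bits use the powers of two.
COLS = [c for c in range(1, 2 ** R) if c & (c - 1)][:K] + [1 << j for j in range(R)]


def syndrome(word):
    s = 0
    for bit, col in zip(word, COLS):
        if bit:
            s ^= col
    return s


def hamming_encode(data):
    s = syndrome(list(data) + [0] * R)
    return list(data) + [(s >> j) & 1 for j in range(R)]


def hamming_decode(word):
    word = list(word)
    s = syndrome(word)
    if s in COLS:                  # single-bit error: flip the matching position
        word[COLS.index(s)] ^= 1
    return word[:K]


def parity(bits):
    return sum(bits) % 2


def cadec_encode(flit):
    code = hamming_encode(flit)                    # 32 -> 38 bits
    dup = [b for bit in code for b in (bit, bit)]  # duplicate onto adjacent wires
    return dup + [parity(code)]                    # 2 * 38 + 1 = 77 bits


def cadec_decode(word):
    copy1, copy2, p0 = word[0:-1:2], word[1:-1:2], word[-1]
    p1, p2 = parity(copy1), parity(copy2)          # step 1
    if p1 != p2:                                   # step 2: parity picks the copy
        chosen = copy1 if p1 == p0 else copy2
    else:                                          # steps 3 and 4: one syndrome check
        chosen = copy1 if syndrome(copy1) == 0 else copy2
    return hamming_decode(chosen)                  # step 5
```

In this model every pattern of one or two bit errors in the 77-bit word decodes to the original flit, and the syndrome of the first stage is computed only when the two regenerated parities agree.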

The use of the DAP or BSC parity bit effectively makes the decoder more energy efficient, compared to a scheme without the parity bit, which always requires a syndrome to be computed on both copies.

When the parity bits generated from individual Hamming copies fail to match, the syndrome computing block need not be used at all, thus on average making the overall decoding process more energy efficient. This situation arises when there is a single error in either one of the two Hamming copies, which, generally, will be the more probable case. We note that the circuit diagram of Fig. 4 and the flowchart of Fig. 3 show only the logic for double error correction. To simultaneously detect triple or quadruple errors, one additional syndrome computation step must be performed on the copy selected for the final stage; if that copy has a non-zero syndrome, then there are three or more errors in the codeword, and an ARQ request to retransmit the flit should be sent.

Fig. 3 Decoding algorithm for the CADEC scheme

Fig. 4 a CADEC encoder. b CADEC decoder (77-bit input; parities p1 and p2 of the two 38-bit copies compared with the sent parity p0; syndrome detection; (38, 32) Hamming decode; 32-bit output)


4 Probability of Undetected Error

In the DSM NoC paradigm, reliability and energy dissipation cannot be decoupled. Enhancing reliability by performing coding invariably increases the energy overhead due to the codec blocks and redundant wires. But due to increased reliability, the voltage level driving the interconnect wires can be reduced without increasing the probability of residual word error, as the reduction in noise margin can be compensated by the increased error resilience [5, 30]. Considerable energy savings can be achieved by reducing the voltage level on the interconnects, since the energy dissipation depends on the voltage squared.

To quantify these gains, consider a Gaussian distributed noise voltage $V_N$ with variance $\sigma_N^2$ which models the cumulative effect of all the transient DSM noise sources as mentioned before. This gives the probability of bit error, $\varepsilon$, also called the bit error rate (BER), as

$$\varepsilon = Q\left(\frac{V_{dd}}{2\sigma_N}\right), \tag{1}$$

where the Q-function is given by

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2}\, dy. \tag{2}$$

The word error probability is a function of the channel BER $\varepsilon$. If $P_{unc}(\varepsilon)$ is the probability of word error in the uncoded case and $P_{ecc}(\varepsilon)$ is the residual probability of word error with error control coding, then it is desirable that $P_{ecc}(\varepsilon) \le P_{unc}(\varepsilon)$. Using Eq. 1, we can reduce the supply voltage in presence of coding to $\hat{V}_{dd}$, given by

$$\hat{V}_{dd} = V_{dd}\,\frac{Q^{-1}(\hat{\varepsilon})}{Q^{-1}(\varepsilon)}. \tag{3}$$

In (3), $V_{dd}$ is the nominal supply voltage in the absence of any coding, and $\hat{\varepsilon}$ is the bit error rate on the coded link at which $P_{ecc}(\hat{\varepsilon}) = P_{unc}(\varepsilon)$. To compute $\hat{V}_{dd}$ for various schemes we find the residual word error probability for each of the schemes investigated in this paper.
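The step from Eq. 1 to Eq. 3 can be written out explicitly. The short derivation below is a sketch under two assumptions made here for illustration: the noise standard deviation $\sigma_N$ does not change when the supply is scaled, and $\hat{\varepsilon}$ denotes the bit error rate that the coded link is allowed to operate at.

```latex
\begin{align*}
\frac{V_{dd}}{2\sigma_N} &= Q^{-1}(\varepsilon)
  && \text{(uncoded link, inverting Eq. 1)} \\
\frac{\hat{V}_{dd}}{2\sigma_N} &= Q^{-1}(\hat{\varepsilon})
  && \text{(coded link at the relaxed bit error rate } \hat{\varepsilon}\text{)} \\
\frac{\hat{V}_{dd}}{V_{dd}} &= \frac{Q^{-1}(\hat{\varepsilon})}{Q^{-1}(\varepsilon)}
  && \text{(ratio of the two lines; } \sigma_N \text{ cancels)}
\end{align*}
```

Since $Q^{-1}$ is a decreasing function, any $\hat{\varepsilon} > \varepsilon$ gives $\hat{V}_{dd} < V_{dd}$, which is the source of the voltage reduction.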

4.1 Probability of Undetected Error for ED

As pointed out in [12], any (n, k) linear code can detect $2^n - 2^k$ error patterns of length n. The probability of undetected error for any (n, k) linear code can be computed from the weight distribution polynomial of the code, A(z), given by

$$A(z) = A_0 + A_1 z + \ldots + A_n z^n, \tag{4}$$

where $A_i$ is the number of codewords with weight (i.e., the number of 1s in the codeword) equal to i. The dual of the linear code also has an associated weight distribution, B(z), given by

$$B(z) = B_0 + B_1 z + \ldots + B_n z^n. \tag{5}$$

The weight distribution of the original code and its dual code are related by [12, 14]

$$A(z) = 2^{-(n-k)} (1+z)^n B\left(\frac{1-z}{1+z}\right). \tag{6}$$

The probability of undetected word error $P_{ED}(\varepsilon)$ for an error detection scheme using a linear code with dual weight distribution B(z) is [12]

$$P_{ED}(\varepsilon) = 2^{-(n-k)} B(1-2\varepsilon) - (1-\varepsilon)^n, \tag{7}$$

where $B(1-2\varepsilon)$ is given by

$$B(1-2\varepsilon) = \sum_{i=0}^{n} B_i (1-2\varepsilon)^i. \tag{8}$$

The ED scheme proposed in [4] uses the (38, 32) shortened Hamming code for error detection, so the coefficients $B_i$ in Eq. 8 are obtained by using the H-matrix of that code. Using Eq. 7, the probability of undetected error for the ED code, for small values of BER $\varepsilon$, turns out to be

$$P_{ED}(\varepsilon) = (n-k)\,\varepsilon^2, \tag{9}$$

where n = 38 and k = 32 for the (38, 32) shortened Hamming code.

4.2 Probability of Undetected Error for DAP, BSC and MDR

The DAP coding scheme can correct all single error patterns and some multiple errors, which are taken into account while calculating the probability of undetected error. Let the original uncoded flit consist of k bits (we assume k = 32 here). This makes the length of the DAP encoded flit (2k+1) bits. Correct decoding can happen under two circumstances as discussed below:

1. The parity bit is error-free and one copy of the flit has no errors. The other copy in this case can have any number of erroneous bits. However, the parity has to be regenerated at the decoder only from the copy that is error-free. So, this possibility is not interchangeable between the two copies.

2. The other possibility for correct decoding is when the parity bit is in error. Then, if the (k+1) bits consisting of the copy from which the parity is regenerated and the sent parity have an odd number of errors, then the regenerated parity will not match the sent parity and the other copy which is error-free will be selected.

These two cases jointly give the set of cases where correct detection is possible, whose complement is the set of undetected errors. The probability of the set of undetected errors as computed in [30] is given by

$$P_{DAP}(\varepsilon) = 1 - \sum_{i=0}^{k} \binom{k}{i} \varepsilon^{i} (1-\varepsilon)^{2k+1-i} - \sum_{i=0}^{k/2} \binom{k+1}{2i+1} \varepsilon^{2i+1} (1-\varepsilon)^{2k-2i}. \tag{10}$$

This can be simplified to the following expression for small values of $\varepsilon$:

$$P_{DAP}(\varepsilon) = \frac{3k(k+1)}{2}\,\varepsilon^2. \tag{11}$$

BSC and MDR, which perform the decoding following an essentially similar principle, have the same probability of word error as DAP.
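As a numerical sanity check, the exact DAP word error probability built from the two correct-decoding cases listed above can be compared with the small-BER approximation $3k(k+1)\varepsilon^2/2$. The sketch below uses exact rational arithmetic to avoid cancellation; the function names are hypothetical, and the exact expression is written directly from the two cases rather than taken from [30].

```python
from fractions import Fraction
from math import comb


def p_dap_exact(eps, k=32):
    """1 minus the probability of the two correct-decoding cases for DAP."""
    eps = Fraction(eps)
    q = 1 - eps
    # Case 1: parity and one copy error-free, i errors in the other copy.
    case1 = sum(comb(k, i) * eps**i * q**(2 * k + 1 - i) for i in range(k + 1))
    # Case 2: other copy error-free, odd number of errors in the k+1 bits made
    # up of the parity-generating copy and the sent parity.
    case2 = sum(comb(k + 1, 2 * i + 1) * eps**(2 * i + 1) * q**(2 * k - 2 * i)
                for i in range(k // 2 + 1))
    return 1 - case1 - case2


def p_dap_approx(eps, k=32):
    """Small-BER approximation 3k(k+1)/2 * eps^2."""
    return Fraction(3 * k * (k + 1), 2) * Fraction(eps) ** 2


eps = Fraction(1, 10**6)
ratio = float(p_dap_exact(eps) / p_dap_approx(eps))   # close to 1 for small eps
```

For k = 32 the two expressions agree closely at this BER, and the agreement tightens as the BER decreases.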

4.3 Probability of Undetected Error for CADEC

The probability of correct decoding can be found by considering each of the cases where the decoder can correctly decode flits despite errors. The cases where the decoder can correctly decode words with more than two errors also need to be considered. The complement of the set of correctly decoded words constitutes the set of undetected errors. This probability is given by $P_{CADEC}(\varepsilon)$. So, we have the relation:

$$P_{CADEC}(\varepsilon) = 1 - P_{correct}. \tag{12}$$

In the following derivation, the width of the original flit is denoted by k, where k is 32, which is first Hamming coded to 38 bits, denoted by n. Each bit of the n-bit Hamming codeword is duplicated and an overall parity bit is appended. All possibilities of correct decoding are broadly divided into three categories:

1. Error-free transmitted parity bit: One of the copies has no error while the other has anywhere from zero to all bits in error. This can be correctly decoded similarly as in the DAP scheme which is integrated into the novel CADEC scheme.

2. Single bit error in each copy: There is a single error in both copies, irrespective of the parity bit being in error or not.

3. Erroneous transmitted parity bit: There are multiple cases under this scenario:

   - no errors in either copy
   - up to one error in one copy and an even number of errors in the other, starting from 2 to n errors
   - a single error in one copy and an odd number of errors in the other.

The complete probability of correct decoding, $P_{correct}$, is given by the sum of the probabilities corresponding to the above mutually exclusive cases. In the limit of small channel BER $\varepsilon$, this can be expressed as

$$P_{correct} = 1 - n^2 (n-4)\,\varepsilon^3. \tag{13}$$

From Eqs. 12 and 13, the word error probability is

$$P_{CADEC}(\varepsilon) = n^2 (n-4)\,\varepsilon^3. \tag{14}$$

Using Eq. 3, along with Eqs. 9, 11, and 14 for the undetected word error probabilities for the different coding schemes, the tolerable voltage swing reduction can be computed against varying values of BER $\varepsilon$. The plot of voltage swing versus BER is shown in Fig. 5. The nominal voltage at the 130 nm technology node is assumed to be Vdd = 1.2 V.
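The computation described here can be sketched numerically. The code below is an illustration under stated assumptions, not a reproduction of Fig. 5: the uncoded word error probability is approximated as k times the BER, the relaxed BER of each scheme is obtained by equating its residual word error probability (Eqs. 9, 11 and 14) to that uncoded value, and the function names are hypothetical.

```python
from statistics import NormalDist


def q_inv(p):
    """Inverse of the Gaussian tail function Q: returns x such that Q(x) = p."""
    return -NormalDist().inv_cdf(p)


def relaxed_ber(scheme, eps, k=32, n=38):
    """BER at which the coded word error probability equals the uncoded one."""
    p_unc = k * eps                                   # assumption: P_unc ~ k * eps
    if scheme == "ED":
        return (p_unc / (n - k)) ** 0.5               # invert (n - k) * eps^2
    if scheme == "DAP":
        return (2 * p_unc / (3 * k * (k + 1))) ** 0.5  # invert 3k(k+1)/2 * eps^2
    if scheme == "CADEC":
        return (p_unc / (n * n * (n - 4))) ** (1 / 3)  # invert n^2 (n-4) * eps^3
    raise ValueError(scheme)


def scaled_vdd(scheme, eps, vdd=1.2):
    """Reduced supply voltage from Eq. 3 for the given scheme and channel BER."""
    return vdd * q_inv(relaxed_ber(scheme, eps)) / q_inv(eps)


swings = {s: scaled_vdd(s, 1e-20) for s in ("ED", "DAP", "CADEC")}
```

Under these assumptions every scheme tolerates a swing below the nominal 1.2 V, and CADEC, whose residual error probability falls off as the cube of the BER, tolerates the lowest swing.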

As can be seen from Fig. 5, the voltage swing is lower than the nominal voltage for all the coding schemes. The CADEC scheme provides maximum voltage reduction as it can correct and also detect more errors than the others. For the purpose of simulations the voltage swings for different coding schemes corresponding to the channel BER of $10^{-20}$ [30] are used later in the paper.

Fig. 5 Variation of achievable voltage swing with bit error rate for different coding schemes (curves: DAP, CADEC, ED; horizontal axis: $\varepsilon$ from $10^{-20}$ to $10^{-5}$; vertical axis: Vdd (V) from 0.4 to 1.2)


5 Energy Dissipation in a NoC Interconnect

When flits travel between switches on the interconnection network, both the inter-switch wires and the logic gates in the switches toggle, resulting in energy dissipation. Furthermore, flits will need to traverse multiple hops to reach their destinations. To determine the energy efficiency of our proposed scheme, we need to determine the energy dissipated in each interconnect and switch hop. The energy per flit per hop is given by the sum

$$E_{hop} = E_{switch} + E_{interconnect}. \tag{15}$$

The energy dissipated in transporting a flit through h hops can be calculated as

$$E_{flit} = \sum_{j=1}^{h} E_{hop,j}. \tag{16}$$

In the presence of coding, the energy dissipated in each hop will be given by

$$E_{hop,j} = E_{switch,j} + E_{codec,j} + E_{interconnect,j}. \tag{17}$$

In order to quantify the energy dissipation profile for a NoC interconnect architecture, we determined the energy dissipated in each switch, $E_{switch}$, and each codec, $E_{codec}$, by running Synopsys™ Prime Power on the gate-level netlist of the switch and the codec blocks. Our energy estimation methodology involved feeding a large set of data patterns to the switch and codec blocks. Through functional simulation using Synopsys Prime Power, the average values for the activity factors were determined. To determine the interconnect energy $E_{interconnect}$, the capacitance of each interconnect stage was calculated taking into account the specific layout of each topology [8, 9]. In presence of coding, however, the effective coupling component of this capacitance is reduced [17]. Thus, in addition to the voltage reduction due to increased reliability, the effective wire capacitance is also reduced. These two factors together contribute to reducing the energy dissipation of the inter-switch wire segments.
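The per-hop accounting of Eqs. 15 to 17 amounts to a simple sum over the hops of a path. The sketch below shows this bookkeeping; the `Hop` class, its field names, and the numeric values are hypothetical placeholders, not measured results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """Energy contributions of one hop, in arbitrary but consistent units."""
    e_switch: float
    e_interconnect: float
    e_codec: float = 0.0          # zero for an uncoded link (Eq. 15)

    def energy(self):
        # Eq. 17; reduces to Eq. 15 when e_codec is zero.
        return self.e_switch + self.e_codec + self.e_interconnect


def flit_energy(hops):
    """Eq. 16: total energy to move one flit across all hops of its path."""
    return sum(hop.energy() for hop in hops)


# Placeholder numbers for a two-hop path with a codec at each hop.
path = [Hop(e_switch=2.0, e_interconnect=1.0, e_codec=0.5),
        Hop(e_switch=2.0, e_interconnect=1.5, e_codec=0.5)]
total = flit_energy(path)
```

Coding adds the codec term at every hop, so it saves energy only if the reduction in the interconnect term (lower swing and lower effective capacitance) outweighs that overhead.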

Messages can be injected by each IP into the networkfollowing different stochastic distributions. In our experi-ments the traffic injected by the functional IP blocksfollowed Poisson and self-similar distributions [18]. In thepast, a Poisson distributed injection rate was frequentlyused when characterizing performance of multiprocessorplatforms [21]. However the self-similar distribution wasfound to be a better match to real world SoC scenarios [1].

As will be demonstrated later, the energy savings trend dueto coding does not depend strongly on any particularinjection pattern.

6 Expected Energy Dissipation in Presence of Errors

The schemes investigated here implement corrective intelligence either in the form of joint crosstalk avoidance and forward error correction or error detection followed by retransmission. In the error detection (ED) scheme, whenever an error is detected, the receiving switch asks for retransmission from the previous one. In contrast, the joint crosstalk avoidance and single/multiple error correcting codes ask for retransmission only when the number of errors in a flit exceeds their correction capability. An interesting study is to compare the expected energy dissipation per bit for each of the schemes, given that there is an error in the flit when it is transmitted for the first time. In the following derivations the coded flit length m is assumed to be 38 for the ED scheme, 65 for DAP, BSC and MDR, and 77 for CADEC.

The retransmission mechanism used for each of the schemes to avoid data loss is a switch-to-switch, flit-level retransmission. If the number of errors in a flit is more than the correction capability of the coding scheme, then an automatic repeat request (ARQ) is sent and the erroneous flit is retransmitted. The ED scheme sends an ARQ in the presence of even a single error. This necessitates adequate buffering at the switches for the flits already transmitted. So, there is an additional energy expenditure associated with the retransmission buffers [16]. The energy dissipation associated with the ARQ signal needs to be considered as well.

6.1 Error Detect and Retransmit Scheme-ED

The probability of the flit having an error in the first transmission is given by the following equation, in which the last step assumes small BER ε:

$$P_{error} = P(\geq 1) = 1 - (1 - \varepsilon)^{m} \approx m\varepsilon. \tag{18}$$

Let the event that there is an error in the first transmission be B, and the event that the ith transmission is the first error-free transmission after i−1 erroneous transmissions be A. Then the conditional probability for event A given event B has occurred can be computed as

$$P(A/B) = \frac{P(A \cap B)}{P(B)}. \tag{19}$$

As A is the event that the first (i−1) transmissions have errors and B is the event that the very first transmission has at least a single error, we can observe that event A implies event B. Thus we may say that A∩B equals only A. Now, the probability of i repeated transmissions is given by the probability of i−1 transmissions with at least one error and the ith transmission without any error, which is $[P(\geq 1)]^{i-1}(1 - P(\geq 1))$. So, the conditional probability of i repeated transmissions given an error in the first transmission is

$$P_i = P(A/B) = \frac{[P(\geq 1)]^{i-1}\,(1 - P(\geq 1))}{P_{error}}. \tag{20}$$

Hence the expected energy dissipation is given by the following infinite sum, which accounts for all possible transmissions:

$$Ex(E_{bit,ED} \mid Error) = \sum_{i=2}^{\infty} P_i \cdot i \cdot E_{ED}. \tag{21}$$

In Eq. 21, the number of transmissions i starts from 2, as that is the least number of transmissions needed if the first transmission always has an error, and $E_{ED} = E_{bit,ED} + E_{bit,buf} + E_{ARQ}$. Here, Ebit,ED is the energy dissipated per bit in an inter-switch link in the case of the sole ED scheme. The energy factor also includes the energy per bit for the buffer storage, Ebit,buf, and the energy dissipation for the ARQ bit, EARQ. Thus i·EED is the energy dissipated per bit in i repeated transmissions for the ED scheme. This gives the expected energy dissipation for the ED scheme as

$$Ex(E_{bit,ED} \mid Error) = \frac{(2 - m\varepsilon)}{(1 - m\varepsilon)}\,E_{ED}, \tag{22}$$

where m is the total number of bits in the coded flit. For small ε the above equation simplifies to

$$Ex(E_{bit,ED} \mid Error) \approx [2 + m\varepsilon]\,E_{ED}. \tag{23}$$

It is evident from Eq. 23 that the expected value of energy dissipation in the ED scheme is more than twice that of a single transmission in the presence of errors.
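As a numerical sanity check on Eqs. 20 to 23, the following Python sketch sums the series of Eq. 21 directly and compares it with the closed form. It is only an illustration: the function names are hypothetical, the result is expressed as a multiple of E_ED, and the flit error probability is evaluated exactly as 1 − (1 − ε)^m rather than through the approximation mε.

```python
import math


def p_flit_error(m: int, eps: float) -> float:
    """P(>=1) = 1 - (1 - eps)^m from Eq. 18, evaluated stably for tiny eps."""
    return -math.expm1(m * math.log1p(-eps))


def ed_multiplier_sum(m: int, eps: float, terms: int = 200) -> float:
    """Truncated sum of Eq. 21 in units of E_ED."""
    p = p_flit_error(m, eps)
    # From Eq. 20 with P_error = p: P_i = p^(i-1) * (1 - p) / p = p^(i-2) * (1 - p)
    return sum(i * p ** (i - 2) * (1.0 - p) for i in range(2, terms + 2))


def ed_multiplier_closed(m: int, eps: float) -> float:
    """Closed form of Eq. 22 with p = P(>=1) used in place of m*eps."""
    p = p_flit_error(m, eps)
    return (2.0 - p) / (1.0 - p)


m_ed = 38  # coded flit length assumed for the ED scheme in this section
for eps in (1e-3, 1e-6):
    print(eps, ed_multiplier_sum(m_ed, eps), ed_multiplier_closed(m_ed, eps), 2 + m_ed * eps)
```

For small ε the three printed values should agree closely, which is the content of the simplification from Eq. 22 to Eq. 23.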

6.2 DAP, BSC and MDR Coding Schemes

If the DAP, BSC and MDR schemes were enhanced using a retransmission mechanism, then we would expect the energy dissipation to depend on the retransmission probability. The difference between the joint CAC/SEC schemes and the ED scheme is that the joint code will send an ARQ only when there is more than one error in the flit. For any single error the schemes will correct the flit on the fly. Once again, let A be the event that there are i−1 transmissions with more than one error, which necessitated retransmission i−1 times, while the last (ith) transmission has one or fewer errors. Also, let B be the event that the first transmission had at least one error. As in the case of the ED scheme, we are interested in determining P(A/B). As before, A∩B = A, as A is a subset of B. The conditional probability of having i>1 repeated transmissions, given an error in the first transmission, follows from (20) and is given by

$$P_i = \frac{P(\geq 2)^{i-1}\,P(<2)}{P_{error}}. \tag{24}$$

Here Perror is obtained from (18), P(<2) is the probability of having fewer than two erroneous bits in the flit, and P(≥2) is the probability of having two or more errors in the flit, which is given by 1−P(<2). Now, for i=1, the flit had exactly one error and hence was correctable; this probability is given by

$$P_{i=1} = \frac{P(1)}{P_{error}} = \frac{m\varepsilon(1 - \varepsilon)^{m-1}}{m\varepsilon} \approx 1 - (m - 1)\varepsilon. \tag{25}$$

The expected value of the energy dissipation is given by the following sum, similar to (21):

$$Ex(E_{bit,DAP} \mid Error) = \sum_{i=1}^{\infty} P_i \cdot i \cdot E_{DAP}, \tag{26}$$

where $E_{DAP} = E_{bit,DAP} + E_{bit,buf} + E_{ARQ}$ and Ebit,DAP is the energy dissipated per bit in an inter-switch link in the case of the DAP scheme. As before, the retransmission buffer energy and the ARQ energy are also included. Equation 26 can be simplified for small values of ε as

$$Ex(E_{bit,DAP} \mid Error) \approx \left[1 + \frac{m(m-1)^{2}}{4}\,\varepsilon^{3}\right] E_{DAP}. \tag{27}$$

Equation 27 also gives the expected value of energy dissipation per bit (given an initial error) for BSC and MDR, since they have the same error correction capability as DAP. From Eq. 27, the expected energy dissipation per bit when an error has occurred is less in the case of DAP, BSC or MDR codes than in the case of ED, as they send an ARQ only when there is more than a single error, which happens less often than a single error occurring in the transmitted flit.

6.3 CADEC Scheme

In CADEC, the expected energy per bit will be less than in ED, as CADEC retransmits only when there are three or more errors, compared to ED, which retransmits even when there is a single error.

For CADEC, the event A will be i−1 transmissions with more than two errors and the last transmission with two or fewer errors. The event B, as before, will be the case when the first transmission is in error. Following similar arguments as in the case of ED and DAP, A∩B=A. Hence, the conditional probability of i repeated transmissions, where i>1, given that the first transmission has an error, is given by

$$P_i = \frac{P(\geq 3)^{i-1}\,P(<3)}{P_{error}}, \tag{28}$$

where P(<3) is the probability of having 0, 1 or 2 errors in the flit and P(≥3) is the probability of having more than two errors and equals 1−P(<3).

However, if the first transmission has two or fewer errors then there will be no retransmissions, and this event has the probability

$$P_{i=1} = \frac{P(1)}{P_{error}} + \frac{P(2)}{P_{error}} \approx 1 - \frac{(m-1)}{2}\,\varepsilon, \tag{29}$$

where m is the number of bits in the coded flit.

Similar to the other schemes, the expected value of the energy dissipation in this case is given by

$$Ex(E_{bit,CADEC} \mid Error) = \sum_{i=1}^{\infty} P_i \cdot i \cdot E_{CADEC}, \tag{30}$$

where $E_{CADEC} = E_{bit,CADEC} + E_{bit,buf} + E_{ARQ}$ and Ebit,CADEC is the energy dissipation per bit for the CADEC scheme. The energy dissipation per bit for the retransmission buffer, Ebit,buf, and that for the ARQ bit, EARQ, are also considered in Eq. 30.

The final expected value of the energy dissipation, given there is an error in the flit in the presence of CADEC coding, simplifies (in the limit of small ε) to

$$Ex(E_{bit,CADEC} \mid Error) \approx \left[1 - \frac{(m-1)}{2}\,\varepsilon\right] E_{bit,CADEC}. \tag{31}$$

From the above analysis, it is evident that in the event of an error the ED scheme on average dissipates about twice as much energy per bit as the CADEC scheme, ignoring the ε term, which is much less than unity.
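The comparison between the three retransmission models can be reproduced numerically with a short Python sketch. It assumes independent bit errors with probability ε and a code that corrects up to t errors per flit (t = 0 for ED, 1 for DAP/BSC/MDR, 2 for CADEC); the function names are hypothetical. The sketch evaluates the conditional expectation directly from the binomial model instead of using the small-ε simplifications, so it should be expected to agree with Eqs. 23, 27 and 31 only to leading order, and the ε-dependent correction terms may differ.

```python
import math


def p_exactly(m: int, k: int, eps: float) -> float:
    """Probability of exactly k bit errors in an m-bit flit."""
    return math.comb(m, k) * eps ** k * math.exp((m - k) * math.log1p(-eps))


def p_at_least(m: int, k_min: int, eps: float) -> float:
    """Probability of k_min or more bit errors, summed term by term."""
    return sum(p_exactly(m, k, eps) for k in range(k_min, m + 1))


def expected_transmissions(m: int, eps: float, t: int) -> float:
    """Expected number of transmissions given an error in the first one."""
    p_err = p_at_least(m, 1, eps)   # P_error
    q = p_at_least(m, t + 1, eps)   # probability that a transmission needs an ARQ
    # i = 1: the first transmission has between 1 and t errors and is corrected
    first_ok = sum(p_exactly(m, k, eps) for k in range(1, t + 1)) / p_err
    # i >= 2: sum of i * q^(i-1) * (1 - q) / P_error in closed form
    retransmit = (2.0 * q - q * q) / ((1.0 - q) * p_err)
    return first_ok + retransmit


# Coded flit lengths and correction capabilities as stated in Section 6
schemes = {"ED": (38, 0), "DAP": (65, 1), "CADEC": (77, 2)}
for name, (m, t) in schemes.items():
    print(name, expected_transmissions(m, 1e-6, t))
```

Multiplying each result by the corresponding per-transmission energy (EED, EDAP or ECADEC) gives the expected energy per bit; those energy values are not supplied here.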

An important point worth mentioning here is that Ebit,DAP>Ebit,CADEC and Ebit,ED>Ebit,CADEC. This is because the voltage reduction owing to enhancement in reliability is greater for the CADEC scheme than for the other two, as seen in Fig. 5. The effective switching capacitance of adjacent wires in the presence of crosstalk avoidance coding in CADEC is less than that for the ED scheme, which does not guard against crosstalk. Though the coupling capacitances in DAP, BSC and MDR are the same as in CADEC, they need a higher voltage level due to their lower error correction capability. As these two factors, namely voltage swing and switching capacitance, are the primary contributing factors towards energy dissipation, the energy expenditure per bit per hop is much less for CADEC compared to the other schemes.

Fig. 6 NoC architectures: a MESH, b FOLDED-TORUS, c BUTTERFLY–FAT–TREE (BFT). The legend distinguishes functional IP blocks from switches.

Fig. 7 Average energy dissipation per simulation cycle for all the schemes for MESH-based NoC at a λ=1 and b λ=4 with Poisson injection process. Both panels plot energy dissipation per cycle (pJ) against injection load (0.1 to 1) for the Uncoded, ED, DAP, BSC and CADEC schemes.


7 Experimental Results and Analysis

In order to quantify the effectiveness of the proposed CADEC coding scheme on the energy dissipation characteristics of NoC communication infrastructures, we considered a system consisting of 64 IP blocks and mapped them onto MESH, Folded Torus and Butterfly–Fat–Tree (BFT) based NoC architectures as shown in Fig. 6. We assume the NoC to be spread over a die size of 20 mm×20 mm. We compared the performance of the CADEC scheme with the already proposed joint CAC/SEC schemes like DAP, BSC and MDR. We have already shown that the energy dissipation characteristics of DAP and MDR are very similar [17]. Hence, we consider only DAP along with BSC in our comparative analysis. Additionally, performance of the CADEC scheme is also compared with the ED-retransmission mechanism. In the simulations, messages were injected with Poisson and self-similar distributions [18]. The routing mechanism used in the simulations depends on the particular network architecture adopted. For the Mesh and Folded Torus architectures e-cube (dimension order) routing [6] was used, whereas for the BFT architecture the LCA (Least Common Ancestor) routing methodology was adopted [18]. The energy dissipations are plotted for each of the three NoC architectures mentioned above. The energy dissipation profiles give the energy dissipated by all messages in the NoC per simulation cycle.

The packet length was assumed to be 16 flits. Simulations were performed assuming 130 nm technology node parameters. The channel BER is assumed to be 10^−20 [30] in these simulations.

The energy dissipation of each inter-switch wire segment is a function of λ, the ratio of the coupling capacitance to the bulk capacitance. For a given interconnect geometry, the values of λ depend on the metal coverage in upper and lower metal layers. At the 130 nm technology node, the two extreme values of λ are 0.95 and 4.6, respectively [30].

All the schemes have a different number of bits in the encoded flit. A fair comparison in terms of energy savings demands that the redundant wires also be taken into account while comparing the energy dissipation profiles. The metric used for comparison thus takes into account the savings in energy due to the reduced crosstalk, the reduced voltage level on the wires, the extra redundant wires, the additional energy dissipated by the codecs, and the energy dissipated by the retransmission buffers and ARQ signals for the schemes. An uncoded 32-bit wide flit is considered as the standard for comparison.
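To make the redundancy cost concrete, the short Python sketch below computes the number of link wires of each scheme relative to the uncoded 32-bit flit, using the coded flit lengths quoted in Section 6. The function names are illustrative assumptions, and the sketch covers only the wire count, not the voltage or capacitance effects that enter the full energy metric.

```python
UNCODED_FLIT_BITS = 32  # baseline flit width used for comparison

# Coded flit lengths m as stated in Section 6
CODED_FLIT_BITS = {"ED": 38, "DAP": 65, "BSC": 65, "MDR": 65, "CADEC": 77}


def redundant_wires(coded_bits: int, data_bits: int = UNCODED_FLIT_BITS) -> int:
    """Extra wires a scheme adds on top of the uncoded flit."""
    return coded_bits - data_bits


def wire_overhead(coded_bits: int, data_bits: int = UNCODED_FLIT_BITS) -> float:
    """Link width of a scheme relative to the uncoded flit."""
    return coded_bits / data_bits


for scheme, m in CODED_FLIT_BITS.items():
    print(scheme, redundant_wires(m), wire_overhead(m))
```

The ratios show why an energy comparison that ignored the redundant wires would favor the joint codes unfairly: they drive roughly twice as many wires as the uncoded link.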

Figures 7a and b show the energy dissipation profile for all the coding schemes (ED, DAP, BSC and CADEC) for λ=1 and λ=4 respectively, in a Mesh-based NoC architecture.

Figures 8a and b show the energy dissipation profile with λ=1 and λ=4 respectively for a Folded-Torus based NoC fabric. Figures 9a and b show the energy dissipation profile for a Butterfly–Fat–Tree architecture for the same two extreme cases of λ. In all these experiments the injection process followed the Poisson distribution. The energy expenditure per cycle is least in the case of the CADEC scheme, as it can reduce the voltage swing more than any of the other schemes due to its double error correcting capability, as discussed in Section 4. In addition to this, the joint CAC and FEC codes (DAP, BSC and CADEC) also reduce the mutual switching capacitances on the inter-switch wire segments, which is another contributing factor in lowering the energy dissipation. The reduction in effective switching capacitance happens only when crosstalk is avoided, and therefore not in the ED scheme, which does not address crosstalk. Thus the maximum energy dissipation corresponds to the ED scheme.

Fig. 8 Average energy dissipation per simulation cycle for all the schemes for FOLDED TORUS-based NoC at a λ=1 and b λ=4 with Poisson injection process. Both panels plot energy dissipation per cycle (pJ) against injection load (0.1 to 1) for the Uncoded, ED, DAP, BSC and CADEC schemes.

Fig. 9 Average energy dissipation per simulation cycle for all the schemes for BFT-based NoC at a λ=1 and b λ=4 with Poisson injection process. Both panels plot energy dissipation per cycle (pJ) against injection load (0.1 to 1) for the Uncoded, ED, DAP, BSC and CADEC schemes.

Figures 10a and b show the energy dissipation profile for a Mesh-based NoC with a self-similar traffic injection process.

It can be inferred from Figs. 7, 8 and 9 that the reduction in energy dissipation arising from coding follows the same trend irrespective of the specific NoC, though the absolute value varies from one topology to another. From Fig. 10, it is evident that the energy savings arising from the coding process under self-similar traffic injection are not very different from those with the Poisson distribution.

8 Timing Characteristics

The exchange of data among the constituent blocks in a SoC is becoming an increasingly difficult task because of growing system size and nonscalable global wire delay. To cope with these issues, designers must divide the end-to-end communication medium into multiple pipelined stages, with the delay in each stage comparable to the clock-cycle budget. In NoC architectures, the inter-switch wire segments, along with the switch blocks, constitute a highly pipelined communication medium characterized by link pipelining, deeply pipelined switches, and latency-insensitive component design [3]. The switches generally consist of multiple pipelined stages as shown in Fig. 11. The number of intra-switch pipelined stages can vary with the design style and the features incorporated within the switch blocks. The encoders and decoders are part of the intra-switch pipelined stages. In accordance with ITRS [10], a generally accepted rule of thumb is that the clock cycle of high performance SoCs will saturate at a value in the range of 15 FO4 (Fan-out of 4) delay units. If the delay of the encoder and the decoder and that of the inter-switch wire segments can be constrained within this clock cycle limit, then the pipelined communication infrastructure will be maintained.

8.1 Inter-Switch Wire Delay

Due to crosstalk with adjacent wires, the delay of data propagation through an interconnect increases. This Crosstalk Induced Bus Delay (CIBD) is a function of the worst case crosstalk capacitance between the adjacent wires, and it depends on the correlation between transmitted signals. More correlated signals incur less propagation delay compared to completely uncorrelated signals. For an uncoded interconnect the data patterns are generally uncorrelated, and consequently it is possible to have the worst case switching scenario, where a data pattern can have a 101 to 010 transition or vice versa. Due to opposite transitions in the neighbors on both sides of the victim wire, the effective coupling capacitance to each neighbor doubles, and hence the capacitance seen by the victim becomes (1+4λ)CL [25], where CL is the load capacitance of the wire including self-capacitance and λ is the ratio of the coupling capacitance to the bulk capacitance as discussed in Section 7. The CIBD for such a situation becomes (1+4λ)C0, where C0 is the delay of a single individual wire without any coupling.

When error control coding is employed, the correlation between the transmitted data depends on the particular error control code used. For the ED scheme, which is implemented using a Hamming code, there are no inherent crosstalk avoidance characteristics and hence in general the coded data is uncorrelated. Consequently the worst case transition of two neighbors transitioning in opposite directions cannot be avoided, and hence the CIBD for the ED scheme is (1+4λ)C0.

Fig. 10 Average energy dissipation per simulation cycle for all the schemes for MESH-based NoC at a λ=1 and b λ=4 with self-similar injection process. Both panels plot energy dissipation per cycle (pJ) against injection load (0.1 to 1) for the Uncoded, ED, DAP, BSC and CADEC schemes.

Fig. 11 Pipeline data transfer in a NoC considering channel coding. The diagram shows intra-switch pipelined stages, each followed by an encoder, an inter-switch link and a decoder.

For the DAP, BSC, MDR and CADEC schemes the individual bits are all duplicated, and hence a 101 or 010 pattern can never occur in any code word. This enhances the correlation between transmitted signals. As a result the worst case coupling in the case of such coding schemes reduces to (1+2λ)CL. This happens because, when neighboring bits switch in the opposite direction, the duplication mechanism forces adjacent pairs of bits on either side of the transition to switch in the same direction. For example, 0011→1100 is the worst transition possible since the bits on the two edges are copies. The worst case CIBD thus becomes (1+2λ)C0. Table 1 shows the delays incurred by the flits while traversing the inter-switch wire segments. It should be noted that for MESH and Folded Torus architectures all the inter-switch wire lengths are the same and hence their delays are equal. On the contrary, in the BFT architecture the wire lengths vary with the level of the tree. As a result the wire delays also vary with the level. For a 64-IP system the BFT-based NoC will have 3 (log_4 64) levels. As shown in Table 1, for all the schemes the inter-switch delays are within the one clock cycle limit of 15 FO4. Consequently the flits can be transmitted between neighboring switches within a single clock cycle, maintaining the pipelined communication architecture of the NoC. As the transmitted signals for DAP, BSC, MDR and CADEC are more correlated than those for the Uncoded and ED schemes, they incur less delay in inter-switch wire traversal. Another point worth noting is that, as DAP, BSC, MDR and CADEC reduce the wire capacitance by the same amount, they incur identical inter-switch delays.
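The reduction of the worst case from (1+4λ) to (1+2λ) can be checked by brute force on a small bus. The Python sketch below assumes a common first-order coupling model in which a switching victim sees a load of (1 + kλ)CL, where each neighbor contributes 0 to k if it switches in the same direction, 1 if it is idle and 2 if it switches in the opposite direction; a missing neighbor at the bus edge is treated as idle. Only the bit duplication is modeled, not the parity or Hamming bits of the actual codes, and the function names are hypothetical.

```python
from itertools import product


def coupling_factor(before, after):
    """Largest k over all wires for one bus transition, load = (1 + k*lam) * C_L."""
    delta = [b - a for a, b in zip(before, after)]  # +1 rising, -1 falling, 0 idle
    worst = 0
    for i, d in enumerate(delta):
        if d == 0:
            continue  # an idle wire is not a delayed victim
        left = delta[i - 1] if i > 0 else 0
        right = delta[i + 1] if i + 1 < len(delta) else 0
        # each neighbor adds d*d - d*n: 0 (same direction), 1 (idle), 2 (opposite)
        worst = max(worst, (d * d - d * left) + (d * d - d * right))
    return worst


def duplicate(bits):
    """Send every data bit on two adjacent wires."""
    return tuple(b for bit in bits for b in (bit, bit))


def worst_case(words):
    words = list(words)
    return max(coupling_factor(u, v) for u in words for v in words)


data_words = list(product((0, 1), repeat=3))
print(worst_case(data_words))                          # uncoded: 4, i.e. (1 + 4*lam)
print(worst_case(duplicate(w) for w in data_words))    # duplicated: 2, i.e. (1 + 2*lam)
```

With duplication every wire has its own copy as one neighbor, so at most one neighbor can switch against it, which is why the exhaustive search never exceeds k = 2.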

8.2 Codec Delay

Through RTL design and synthesis using Synopsys synthesis tools, we obtain the delays along the critical paths of each encoder and decoder for all the coding schemes. The delay values corresponding to all the coding schemes are shown in Table 2. It is evident that all the coding schemes achieve the target delay values within the limit of one clock cycle. As all the individual encoding and decoding operations can be performed within one clock cycle, the overall system latency will be the same for all the cases. Thus none of the coding schemes has any advantage over the others in terms of system latency. But the message latency in the presence of coding will be higher than in the uncoded situation. Fig. 12 shows the effect of the coding schemes on message latency for the MESH network, with Poisson injection process.
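The single-cycle claim amounts to checking every pipeline stage against the 15 FO4 budget. The following Python sketch does this with the delay values transcribed from Tables 1 and 2; the dictionary layout and function names are illustrative assumptions, and each encoder, decoder and wire segment is treated as its own pipeline stage.

```python
CLOCK_BUDGET_FO4 = 15.0  # clock-cycle limit quoted from ITRS [10]

# (encoder, decoder) critical-path delays in FO4, transcribed from Table 2
CODEC_DELAY_FO4 = {
    "ED": (8.2, 10.4),
    "DAP": (5.8, 9.5),
    "BSC": (6.0, 10.0),
    "MDR": (5.8, 9.5),
    "CADEC": (10.5, 10.9),
}

# Inter-switch wire delays in FO4, transcribed from Table 1
WIRE_DELAY_FO4 = {
    "Uncoded/ED": {"MESH": 0.5, "Folded Torus": 2.1, "BFT_3_2": 6.5, "BFT_2_1": 1.6},
    "DAP/BSC/MDR/CADEC": {"MESH": 0.3, "Folded Torus": 1.1, "BFT_3_2": 3.5, "BFT_2_1": 0.9},
}


def worst_stage_delay() -> float:
    """Largest single-stage delay (FO4) over all codecs and wire segments."""
    codec_worst = max(max(pair) for pair in CODEC_DELAY_FO4.values())
    wire_worst = max(max(row.values()) for row in WIRE_DELAY_FO4.values())
    return max(codec_worst, wire_worst)


def slack(delay_fo4: float, budget: float = CLOCK_BUDGET_FO4) -> float:
    """Remaining timing margin of a stage; positive means it fits in one cycle."""
    return budget - delay_fo4


print(worst_stage_delay(), slack(worst_stage_delay()))
```

Under these numbers the slowest stage is the CADEC decoder, and it still leaves a positive margin against the 15 FO4 budget.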

Fig. 12 Variation of average message latency with injection load. The plot shows average message latency (cycles) against injection load (0.1 to 1) for the uncoded and coded cases.

Table 2 Critical path delay for each coding scheme

Coding scheme    Encoder delay (FO4)    Decoder delay (FO4)
ED               8.2                    10.4
DAP              5.8                    9.5
BSC              6.0                    10.0
MDR              5.8                    9.5
CADEC            10.5                   10.9

Table 3 Area overhead of the coding schemes

Coding scheme    Area (2-input NAND gates)
ED               816
DAP              678
BSC              842
MDR              684
CADEC            1357

Table 1 Inter-switch wire delay

Coding scheme        Architecture    Delay (FO4)
Uncoded/ED           MESH            0.5
Uncoded/ED           Folded Torus    2.1
Uncoded/ED           BFT_3_2 (a)     6.5
Uncoded/ED           BFT_2_1 (b)     1.6
DAP/BSC/MDR/CADEC    MESH            0.3
DAP/BSC/MDR/CADEC    Folded Torus    1.1
DAP/BSC/MDR/CADEC    BFT_3_2 (a)     3.5
DAP/BSC/MDR/CADEC    BFT_2_1 (b)     0.9

(a) Inter-switch wire spanning levels 2 and 3
(b) Inter-switch wire spanning levels 1 and 2


For all the other NoCs considered here the coding schemes will have a similar effect on system latency. Here, the average latency per simulation cycle due to all the messages present in the NoC is considered. As can be seen, the overall message latency in the presence of coding increases, but considering the gain in energy savings, the latency overhead is modest.

9 Area Overhead

For the sake of complete comparison, we also report the silicon area required by the codec blocks for each of the coding schemes. Through RTL design and synthesis in Synopsys™ Design Analyzer, the silicon area consumed by each codec was obtained as shown in Table 3. The area figures are expressed in units of a minimum sized 2-input NAND gate.

The switches along with the Network Interface (NI) consist of approximately 30K minimum sized NAND gates. Consequently, the area overhead due to each of the coding schemes is not significant. This overhead is a small price to pay for the enhanced reliability and high gains in energy savings from incorporating the coding schemes. Additional area overhead arises from the retransmission buffers. Following [16], for full throughput operation these buffers account for around 1200 two-input NAND gates per switch port. This additional area overhead can be avoided by adopting a coding scheme with higher error correction capability. As shown in (14), the word error probability, and hence the probability of retransmission, is proportional to ε^3 for the CADEC scheme. Assuming a typical bit error rate, ε, of 10^−20 [30], the probability of retransmission is extremely low. Consequently, even without provision of retransmission the probability of data loss will be negligible. This suggests that higher order error correcting codes will be more area efficient than retransmission-based mechanisms.
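The relative size of these overheads can be worked out with a few lines of Python. The sketch uses the gate counts from Table 3 and the approximate figures quoted above; the function names and the port count in the usage line are assumptions for illustration, and the codec area is compared once against the switch-plus-NI total, as the text does.

```python
SWITCH_PLUS_NI_GATES = 30_000   # approximate switch + NI size in NAND gates
BUFFER_GATES_PER_PORT = 1_200   # retransmission buffer estimate following [16]

# Codec areas in minimum sized 2-input NAND gates, transcribed from Table 3
CODEC_AREA_GATES = {"ED": 816, "DAP": 678, "BSC": 842, "MDR": 684, "CADEC": 1357}


def codec_overhead(scheme: str) -> float:
    """Codec area as a fraction of the switch plus network interface."""
    return CODEC_AREA_GATES[scheme] / SWITCH_PLUS_NI_GATES


def retransmission_buffer_gates(ports: int) -> int:
    """Total retransmission buffer area for a switch with the given port count."""
    return BUFFER_GATES_PER_PORT * ports


for scheme in CODEC_AREA_GATES:
    print(scheme, round(100 * codec_overhead(scheme), 2), "%")

# Hypothetical five-port switch, used only to show the scale of the buffer cost.
print(retransmission_buffer_gates(5))
```

Even for the largest codec the overhead stays at a few percent of the switch area, while the retransmission buffers of a multi-port switch can exceed the area of any single codec in Table 3.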

10 Conclusion

The communication requirements of large multiprocessor SoCs (MP-SoCs) can be conveniently met by the Network on Chip (NoC) paradigm. By incorporating joint crosstalk avoidance and double error correction coding, it is possible to simultaneously enhance the reliability of the NoCs and lower the energy dissipation, despite the associated redundant wires and codec logic requirements. As verified through detailed analysis and simulations, the proposed CADEC scheme lowers the energy dissipation compared to all other existing schemes studied here. The energy savings arise from two factors, namely, the possibility of lowered voltage swing, and the reduction of mutual switching capacitance of the inter-switch wire segments. From the analysis carried out in this work, it can also be concluded that coding schemes with higher order correction capability outperform sole retransmission-based mechanisms in terms of energy and area overhead.

References

1. Avresky DR, Shubranov V, Horst R, Mehra P (1999) Performance Evaluation of the ServerNet® SAN under Self-Similar Traffic. Proceedings of 13th International and 10th Symposium on Parallel and Distributed Processing 143–147, April 12–16th

2. Benini L, De Micheli G (2002) Networks on Chips: A New SoC Paradigm. IEEE Computer 70–78, Jan

3. Benini L, Bertozzi D (2004) Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip. IEEE Circuits Syst Mag 4(2):18–31, Apr–June

4. Bertozzi D, Benini L, De Micheli G (2002) Low power error resilient encoding for on-chip data buses. Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE) 102–109, 4–8 March

5. Bertozzi D, Benini L, De Micheli G (2005) Error Control Schemes for On-Chip Communication Links: The Energy-Reliability Tradeoff. IEEE Trans Comput-Aided Des Integr Circuits Syst 24(6):818–831, June

6. Duato J, Yalamanchili S, Ni L (2002) Interconnection Networks – An Engineering Approach, Morgan Kaufmann

7. Dupont E, Nicolaidis M, Rohr P (2002) Embedded Robustness IPs for Transient-Error-Free ICs. IEEE Des Test Comput 19(3):54–68, May–June

8. Grecu C, Pande PP, Ivanov A, Saleh R (2004) A Scalable Communication-Centric SoC Interconnect Architecture. Proceedings of IEEE International Symposium on Quality Electronic Design, ISQED 343–348

9. Grecu C, Pande PP, Ivanov A, Saleh R (2005) Timing Analysis of Network on Chip Architectures for MP-SoC Platforms. Microelectron J Elsevier 36(9):833–845

10. ITRS (2005) Documents, http://www.itrs.net/Links/2005ITRS/Home2005.htm

11. Kretzschmar C, Nieuwland AK, Muller D (2004) Why Transition Coding for Power Minimization of on-Chip Buses does not work. Proceedings of the Design, Automation and Test in Europe Conference and Exhibition 512–517, 16–20 Feb

12. Lin S, Costello DJ (1983) Error Control Coding: Fundamentals and Applications, Prentice-Hall

13. Magarshack P, Paulin PG (2003) System-on-Chip beyond the Nanometer Wall. Proceedings of 40th Design Automation Conf. (DAC 03), ACM Press, pp 419–424

14. McWilliams FJ (1963) A Theorem on the Distribution of Weights in a Systematic Code. Bell Syst Tech Jour 42:79–94

15. Mitra S, Seifert N, Zhang M, Shi Q, Kim KS (2005) Robust System Design with Built-In Soft Error Resilience. IEEE Computer 38(2):43–52, Feb

16. Murali S, De Micheli G, Benini L, Theocharides T, Vijaykrishnan N, Irwin M (2005) Analysis of Error Recovery Schemes for Networks on Chips. IEEE Des Test Comput 22(5):434–442

17. Pande PP, Ganguly A, Feero B, Belzer B, Grecu C (2006) Design of Low power & Reliable Networks on Chip through Joint Crosstalk Avoidance and Forward Error Correction Coding. Proceedings of 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT 06), 4th–6th October

18. Pande PP, Grecu C, Jones M, Ivanov A, Saleh R (2005) Performance Evaluation and Design Trade-offs for Network on Chip Interconnect Architectures. IEEE Trans Comput 54(8):1025–1040, August


19. Pande PP, Zhu H, Ganguly A, Grecu C (2006) Crosstalk-aware Energy Reduction in NoC Communication Fabrics. Proceedings of IEEE International SOC Conference, SOCC 2006 225–228, 24th–27th September

20. Pande PP, Zhu H, Ganguly A, Grecu C (2006) Energy Reduction through Crosstalk Avoidance Coding in NoC Paradigm. Proceedings of 9th Euromicro Conference on Digital System Design, DSD 2006, 30th August-1st

21. Park K, Willinger W (2000) Self-similar Network Traffic and Performance Evaluation, John Wiley & Sons

22. Patel KN, Markov IL (2003) Error-Correction and Crosstalk Avoidance in DSM Busses. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Special Issue for System Level Interconnect Prediction (SLIP) 1–5

23. Rossi D et al (2002) Coding scheme for low energy consumption fault-tolerant bus. Proceedings of 8th IEEE International On-Line Testing Workshop 8–12

24. Rossi D et al (2003) Power Consumption of Fault Tolerant Codes: the Active Elements. Proceedings of 9th IEEE International On-Line Testing Symposium 61–67

25. Rossi D, Metra C, Nieuwland AK, Katoch A (2005) Exploiting ECC Redundancy to Minimize Crosstalk Impact. IEEE Des Test Comput 22(1):59–70, Jan

26. Rossi D, Metra C, Nieuwland AK, Katoch A (2005) New ECC for Crosstalk Effect Minimization. IEEE Des Test Comput 22(4):340–348, July–Aug

27. Shang L, Peh LS, Jha NK (2003) Dynamic voltage scaling with links for power optimization of interconnection networks. Proceedings of the 9th International Symposium on High Performance Computer Architecture (HPCA-9) 91–102, 8–12 Feb

28. Soteriou V, Peh LS (2004) Design-space exploration of power-aware on/off interconnection networks. Proceedings of IEEE International Conference on Computer Design (ICCD) 510–517, 11–13 Oct

29. Sotiriadis PP, Chandrakasan AP (2002) A bus energy model for deep submicron technology. IEEE Trans Very Large Scale Integr (VLSI) Syst 10(3):341–350, June

30. Sridhara SR, Shanbhag NR (2005) Coding for System-on-Chip Networks: A Unified Framework. IEEE Trans Very Large Scale Integr (TVLSI) Syst 13(6):655–667, June

31. Stan MR, Burleson WP (1997) Low-power encodings for global communication in CMOS VLSI. IEEE Trans Very Large Scale Integr (TVLSI) Syst 5(4):444–455, Dec

32. Victor B, Keutzer K (2001) Bus Encoding to Prevent Crosstalk Delay. Proceedings of IEEE International Conference on Computer Aided Design (ICCAD) 57–63, 4–8 Nov

Amlan Ganguly is a doctoral candidate in the School of Electrical Engineering and Computer Science at the Washington State University, USA. He received his B.Tech (Hons.) degree in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, India. After a brief stint of five months in Intel India Development Centre, Bangalore, India he joined the graduate program at WSU. His research interests include design of fault tolerant interconnection infrastructures for Multi-Processor SoC platforms and novel architectures for On-Chip networks. Amlan is a student member of the IEEE.

Partha Pratim Pande is an Assistant Professor at the School of Electrical Engineering & Computer Science, Washington State University, Pullman, Washington. He received his Ph.D. in Electrical and Computer Engineering from the University of British Columbia in 2005 and an MS in Computer Science from the National University of Singapore in 2002. His primary research interests lie in the area of design and test of Networks on Chip, fault tolerance and reliability of Multi-processor SoC (MP-SoC) platforms. Partha is serving in the program committees of different international conferences, like IOLTS, ATS, MWSACS and DELTA. He is a member of IEEE.

Benjamin Belzer (S’93, M’96) received the B.A. degree in Physics from the University of California at San Diego in 1982, and the Ph.D. degree in Electrical Engineering from the University of California at Los Angeles in 1996. From 1981 to 1991, he worked as a software engineer for Beckman Instruments, Hughes Aircraft, Northrop Corporation, and Source Scientific in Southern California, and Develco, Inc., in Northern California. Since 1996, he has been on the faculty of the School of Electrical Engineering and Computer Science at Washington State University, Pullman, WA, where he is currently an Associate Professor. His research interests include coding for networks on chip, iterative detection and equalization, coded modulation for wireless communications, combined source and channel coding, and image and video communication systems.

Cristian Grecu is a doctoral candidate in the Department of Electrical and Computer Engineering, University of British Columbia, Canada. His research interests focus on design, test and fault tolerance of large SoCs, with particular emphasis on their data communication infrastructures. He received his B.S. and M.Eng. degrees in Electrical Engineering from the Technical University of Iasi, Romania, and the M.A.Sc. degree from the University of British Columbia. Cristian is a student member of the IEEE.
