Interconnection technologies
Ron Ho, VLSI Research Group, Sun Microsystems Laboratories
Sun Labs, Sun Microsystems, Inc. R. Ho - 2006


Page 1: Sun Microsystems Laboratories

Sun Labs, Sun Microsystems, Inc. R. Ho - 2006

Interconnection technologies
Ron Ho
VLSI Research Group
Sun Microsystems Laboratories

Page 2: Sun Microsystems Laboratories

Acknowledgements
• Many contributors to the work described here
  > Robert Drost, David Hopkins, Alex Chow, Tarik Ono, Jo Ebergen
  > … and the entire Sun Labs VLSI Research Group
  > Danny Cohen

Page 3: Sun Microsystems Laboratories

Multicore chips today
• Niagara: 8 cores
  > 32 threads (4 threads/core), 1 shared FP
  > 90nm CMOS, 1.2GHz, 63W, 380mm²
  > High-volume production now
• Niagara2: 8 (improved) cores
  > 64 threads (8 threads/core), 1 FP per core
  > 65nm CMOS, 342mm²
  > Shipping systems in 2H07
Interconnect networks: an 8x9 xbar

Page 4: Sun Microsystems Laboratories

Multicore chips tomorrow?
• How do you scale up to tens or hundreds of cores?
  > …Assuming you want to (more total power for high performance)
  > …And on a single chip: avoid the costs of chip-to-chip IO
    > Bumps are big (180μm), links are power-hungry (20pJ/bit = 20mW/Gbps)
• You need to do some combination of:
  > Making the cores very small
    > But you lose functionality: what kind of programming models do you want?
  > Making the die very big
    > But you lose the yield/cost battle very quickly

[Figure: cost per square mm (axis 5 to 35) versus die size in square mm (axis 100 to 600). Source: 2006 IDF]
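The 20pJ/bit = 20mW/Gbps equivalence quoted for chip-to-chip links is a unit conversion, and a short sketch makes it explicit. The function name and the 1 Tbps aggregate bandwidth below are assumptions chosen only for illustration; they do not come from the slides.

```python
def link_power_mw(energy_pj_per_bit: float, bandwidth_gbps: float) -> float:
    """Link power in mW for a given energy per bit and data rate.

    1 pJ/bit * 1 Gbit/s = 1e-12 J * 1e9 1/s = 1e-3 W = 1 mW,
    so the two quantities multiply directly.
    """
    return energy_pj_per_bit * bandwidth_gbps


# The slide's figure: 20 pJ/bit costs 20 mW for every Gbps of traffic.
print(link_power_mw(20.0, 1.0), "mW per Gbps")

# Hypothetical aggregate bandwidth of 1 Tbps (1000 Gbps), for scale only.
print(link_power_mw(20.0, 1000.0) / 1000.0, "W at 1 Tbps")
```

Under that hypothetical load the links alone would dissipate about 20 W, which is the kind of cost that motivates keeping traffic on-chip.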

Page 5: Sun Microsystems Laboratories

A different direction…
• We can rethink the “single-chip” requirement…
  > If we reduce (eliminate) the costs of chip-to-chip communication
    > Power, area (bandwidth density), latency, known-good-die
• Break a multi-core chip into a many-chip-system
  > Smaller chips lead to higher yields and lower cost
  > Different chips lead to system adaptability and reconfigurability
  > Aggregate systems of chips effectively break the reticle limit
• Enable a broad set of interconnection explorations

Page 6: Sun Microsystems Laboratories

Proximity I/O
• Pioneered by Ivan Sutherland at Sun Labs
• The big idea:
[Figure: Chip1, Chip2, and Chip3 with paired Transmit and Receive pads. Callout: Not a soldered connection! Source: Drost, Sun, CICC 2003]

Page 7: Sun Microsystems Laboratories

Proximity I/O benefits
• Bandwidth density 60x-100x greater than balls
• Much lower power
  > No ESD required
  > Use wide parallel links instead of narrow SerDes links
  > Very small Tx and Rx circuits
[Figure: area bump bonds (labels 200μm and 180μm) compared with Proximity I/O pads (label 20μm). Source: Drost, Sun, CICC 2003]

Page 8: Sun Microsystems Laboratories

Proximity I/O challenges
• Misalignment is the proverbial monkey’s wrench
  > Initial imperfect chip placement
  > Dynamic chip movement from vibration or thermal expansion
• A robust solution is (at least) two-pronged
  > Combination of specialized packaging and custom electronics
  > Here, discuss some of the electronic/circuit strategy
[Figure: Chip1 and Chip2 with misalignment axes labeled X, Y, Z, θ, Φ, Ψ]

Page 9: Sun Microsystems Laboratories

Fixing misalignment
• Detect misalignment using Vernier-like arrays
  > Measure capacitance between chips to sub-fF resolution
• Correct in-plane misalignment using data steering
Source: Hopkins, Sun, ISSCC 2007
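The Vernier idea can be illustrated with a one-dimensional toy model. This is a sketch of the general Vernier principle, not the circuit from the cited paper: two pad arrays have slightly different pitches, so the pad pair with the largest coupling reveals the offset between the chips. The pad count, pitch difference, pad width, and the use of geometric overlap as a stand-in for measured capacitance are all assumptions.

```python
def overlap(width_um: float, offset_um: float) -> float:
    """Overlap length of two equal-width pads whose centers differ by offset_um."""
    return max(0.0, width_um - abs(offset_um))


def estimate_misalignment(x_um: float, n_pads: int = 21,
                          delta_um: float = 1.0, width_um: float = 10.0) -> float:
    """Estimate in-plane misalignment x_um with a 1-D Vernier pair.

    Hypothetical geometry: pad i on one chip sits at i*pitch, pad i on the
    other chip sits at i*(pitch + delta_um) + x_um, so the center-to-center
    offset of pair i is i*delta_um + x_um. The pair with the largest overlap
    (a proxy for the largest measured capacitance) has the offset closest to
    zero, which gives x_um to a resolution of delta_um.
    """
    half = n_pads // 2
    readings = {i: overlap(width_um, i * delta_um + x_um)
                for i in range(-half, half + 1)}
    best = max(readings, key=readings.get)
    return -best * delta_um


print(estimate_misalignment(3.0))
print(estimate_misalignment(-4.2))
```

The estimate is quantized to the pitch difference, so a finer Vernier step or interpolation between neighboring readings would be needed for sub-step resolution.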

Page 10: Sun Microsystems Laboratories

Dealing with noise
• Receivers are differential sense-amplifiers
  > “Butterfly” scheme rejects noise (receivers are offset-limited)
  > We employ a clock, which uses unclocked receivers with larger pads
Source: Hopkins, Sun, ISSCC 2007

Page 11: Sun Microsystems Laboratories

11Sun Labs, Sun Microsystems, Inc.R. Ho - 2006

Silicon measurements• 144b Proximity I/O datapath, 180nm TSMC chip

> 1.8Gbps per pin for 260Gbps in 0.5mm2

> Measured BER<10-15

> 3pJ/bit at a 1.8V power supply> > 0.7UI timing margin, > 200mV voltage margin at speed> Measured sensitivity to chip separation

Source: Hopkins, Sun, ISSCC 2007
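The headline numbers for the 144b datapath can be cross-checked with a few lines of arithmetic. The inputs are the slide's figures; the derived bandwidth density and total link power are computed here and are not quoted on the slide.

```python
pins = 144                  # datapath width in bits
rate_gbps_per_pin = 1.8     # per-pin data rate
area_mm2 = 0.5              # area quoted for the datapath
energy_pj_per_bit = 3.0     # measured energy at a 1.8V supply

# Aggregate bandwidth: 144 * 1.8 is about 259 Gbps, quoted as 260 Gbps.
aggregate_gbps = pins * rate_gbps_per_pin

# Derived values (not on the slide).
density_gbps_per_mm2 = aggregate_gbps / area_mm2
power_mw = energy_pj_per_bit * aggregate_gbps   # pJ/bit * Gbps = mW

print(round(aggregate_gbps, 1), "Gbps aggregate")
print(round(density_gbps_per_mm2, 1), "Gbps per mm^2")
print(round(power_mw, 1), "mW at full rate")
```

At 3pJ/bit the whole datapath would draw under a watt at full rate, compared with roughly 5W for the same bandwidth at the 20pJ/bit cited earlier for conventional links.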

Page 12: Sun Microsystems Laboratories

Proximity I/O as an enabler
• Low-cost chip-to-chip communication
  > Off-chip I/O looks like an extension of on-chip wires
  > Many chips look like a (big) single chip
  > What kinds of interconnect networks should we consider?
[Figure: Proximity I/O at the overlap]

Page 13: Sun Microsystems Laboratories

Proximity I/O-based grids
• Another, perhaps more realistic grid
  > All big (high-power) chips are face-up on a cold plate
  > Face-down chips merely transmit data
• Natural extension to various network topologies

Page 14: Sun Microsystems Laboratories

A red flag?
• Such a system will have lots of VLSI wires
  > The entire interconnection network consists of on-chip wires
• Latency and bandwidth characteristics well-known
  > …And so are the energy costs: E=C*V*ΔV
    > Cap is about 0.45pF/mm (incl. repeaters), and not really scaling
    > Voltage is about 1V, and not really scaling
  > Therefore, energy is about 0.45pJ/bit/mm of linear length
  > So 100 64b buses at 4GHz, 10% activity over 20mm: 24W!
• Need an efficient wiring system
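The power estimate for the wiring network follows directly from E=C*V*ΔV. The sketch below reproduces the arithmetic from the slide's inputs. The function and parameter names are illustrative, and "10% activity" is read here as 10% of clock cycles producing a charged transition on each wire, which is an assumption about the author's intent.

```python
def wire_power_w(n_buses: int, bits_per_bus: int, freq_hz: float,
                 activity: float, length_mm: float,
                 cap_pf_per_mm: float = 0.45, vdd: float = 1.0,
                 swing: float = 1.0) -> float:
    """Dynamic power of full-swing on-chip wires, using E = C * V * dV."""
    # Energy per transition per mm: 0.45 pF * 1 V * 1 V = 0.45 pJ.
    energy_j_per_mm = cap_pf_per_mm * 1e-12 * vdd * swing
    wires = n_buses * bits_per_bus
    transitions_per_s = freq_hz * activity
    return wires * transitions_per_s * energy_j_per_mm * length_mm


# The slide's example: 100 64b buses at 4GHz, 10% activity, 20mm long.
power = wire_power_w(100, 64, 4e9, 0.10, 20.0)
print(round(power, 2), "W")
```

With these inputs the result is about 23 W, in line with the roughly 24W on the slide. Because power scales with the swing term, reducing the voltage swing on the wire is the most direct lever.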

Page 15: Sun Microsystems Laboratories

Efficient capacitively-driven wires
• Reduced power: low voltage swing on wires
  > Swing is Vdd/(n+1) without requiring a second power supply
• Reduced power: small load seen by driver
  > Allows use of a 1/n-th sized driver, saving power and area
• Reduced latency: capacitor pre-emphasizes edges
  > Also, charge distribution effectively cuts wire delay in half
[Figure: wire (Rwire, Cwire) with load Cload, driven through coupling capacitor Cc]
Cc = (Cwire+Cload)/n, where n = 10 - 20
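The Vdd/(n+1) swing and the small driver load both follow from treating Cc and the wire as a capacitive divider. The sketch below works that out under a lumped-capacitance assumption that ignores Rwire. The 1 pF wire capacitance is a hypothetical value, and the n = 35 case is included only because it yields a 50mV swing from a 1.8V supply.

```python
def capacitive_drive(vdd: float, c_wire: float, c_load: float, n: float):
    """Swing on the wire and load seen by the driver for Cc = (Cwire+Cload)/n.

    Lumped model (assumption): wire resistance is ignored, so Cc and
    (Cwire + Cload) form a simple capacitive divider.
    """
    c_total = c_wire + c_load
    c_c = c_total / n
    # Divider: swing = Vdd * Cc / (Cc + Ctotal), which simplifies to Vdd/(n+1).
    swing = vdd * c_c / (c_c + c_total)
    # Driver sees Cc in series with Ctotal, which simplifies to Ctotal/(n+1).
    c_seen = c_c * c_total / (c_c + c_total)
    return swing, c_seen


for n in (10, 20, 35):
    swing, c_seen = capacitive_drive(vdd=1.8, c_wire=1.0e-12, c_load=0.0, n=n)
    print(n, round(swing * 1e3, 1), "mV", round(c_seen * 1e15, 1), "fF")
```

The driver load shrinks by the same factor as the swing, which is why the driver can be roughly 1/n-th the size of one that drives the wire directly.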

Page 16: Sun Microsystems Laboratories

Pre-emphasis extends bandwidth
• The inline capacitor acts as a high-frequency short
  > Example of a 14mm minimum width wire (simulated)
  > Bandwidth RC-limited to 50MHz
  > Capacitor (of 1/20th total capacitance) extends it to 180MHz

[Figure: normalized voltage at end of wire (dB, 0 to -60) versus frequency (1e+06 to 1e+10), comparing full-swing voltage with a 50mV swing through Cc. Annotation: -3dB bandwidth improves by 3.3x. Source: Ho, Sun, ISSCC 2007]

Page 17: Sun Microsystems Laboratories

Making a better repeater
• Analysis of energy-efficiency shows benefits
  > Compare against repeaters for a 10mm wire in a 180nm process
[Figure: energy in pJ (0 to 12) versus performance in 1/ns (0.8 to 1.8). Curves: one repeater stage, two repeaters, three repeaters (repeater curves are from varying driver sizes), and a capacitively-driven wire with coupling capacitance varying from Cwire/18 to Cwire/40. Annotations: 30% faster, 10X lower power. Source: Ho, Sun, ISSCC 2007]

Page 18: Sun Microsystems Laboratories

Building pitchfork capacitors
• Exploit a “problem” of wires: sidewall capacitance
  > Can use more tines or multiple wire layers
• Coupling caps ideal for summing junctions
  > Use them to build a cheap FIR filter
[Figure: driver coupled to a long wire; FIR schematic with coupling capacitors Cc1 and Cc2, a z^-1 delay, and Cpar, Cwire, Cload. Source: Ho, Sun, ISSCC 2007]
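A coupling capacitor contributes charge to the wire in proportion to its size, so two capacitors driven by the current and the delayed bit sum into a 2-tap FIR filter. The sketch below is a lumped charge-sharing model with hypothetical capacitor values. Driving the second tap with the opposite sign, so that it subtracts the previous bit, is an assumption about how the pre-emphasis is wired and is not stated on the slide.

```python
def fir_weights(cc1: float, cc2: float, c_par: float,
                c_wire: float, c_load: float):
    """Tap weights set by capacitor ratios at the summing node.

    Charge sharing: a step dV on a driver behind Cc_i moves the node by
    dV * Cc_i / Ctotal, where Ctotal is everything attached to the node.
    """
    c_total = cc1 + cc2 + c_par + c_wire + c_load
    return cc1 / c_total, cc2 / c_total


def two_tap_output(bits, vdd: float, w1: float, w2: float):
    """Node voltage per bit time: y[k] = Vdd * (w1*x[k] - w2*x[k-1])."""
    out, prev = [], 0
    for b in bits:
        out.append(vdd * (w1 * b - w2 * prev))
        prev = b  # the z^-1 delay element
    return out


# Hypothetical capacitor values in arbitrary units; only ratios matter.
w1, w2 = fir_weights(cc1=2.0, cc2=1.0, c_par=1.0, c_wire=15.0, c_load=1.0)
print(w1, w2)
print(two_tap_output([0, 0, 0, 1, 0, 0, 0, 0], vdd=1.8, w1=w1, w2=w2))
```

In this model the isolated 1 produces a positive pulse followed by a smaller negative one, the usual shape of a pre-emphasis tap that cancels the trailing tail of the wire's response.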

Page 19: Sun Microsystems Laboratories

Some costs
• Differential wires cost area, power; twisted for noise
• Sense-amps have offset and biasing requirements
  > Biasing can use refresh, with (n+1) channels for an n-bit bus
• Verification requires some care
  > Correctness comes from being near, not from being connected!
[Figure: twisted differential wires; the order A, B, A_b, B_b changes to A, B_b, A_b, B]

Page 20: Sun Microsystems Laboratories

Silicon measurements
• Multiple datapaths on a 180nm TSMC 1.8V chip
  > Measured 10x less energy at a 50mV swing (max. savings 18x)
  > Measured 4x bandwidth extension from pre-emphasis
  > Bit error rates < 10^-11 (limited by test time)
  > 50% UI eye opening on each experiment
• 14mm, unrepeated, min. width
  > RC-limited to 55MHz
  > Capacitor pushes it to 200MHz
  > Sub-optimal 2-tap FIR (wrong delay)

[Figure: voltage (axis 1.7 to 1.9) versus time (-20 to 20 ns) for a 00010000 sequence. Traces: after capacitor, end of wire, end of wire using FIR filter. Source: Ho, Sun, ISSCC 2007]

Page 21: Sun Microsystems Laboratories

Look at new system architectures
• A pair of complementary enabling technologies
  > Multi-chip grids connected using Proximity I/O and efficient wires
  > Chip-to-chip latency, bandwidth, power equal to on-chip wires
  > Long on-chip wires can be lower power and higher performance
• Multi-chip grids look like a big “virtual” chip
  > With (re)configurability, cost, and “reticle-breaking” benefits
• A question: how to best interconnect these chips?
  > Or: how to best arrange these chips for an interconnect network?

Page 22: Sun Microsystems Laboratories

Interconnection technologies
Ron Ho, [email protected]