Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
High-Radix Routers 1ISCA’05
Microarchitecture of a High-Radix Router
John Kim, William J. Dally, Brian Towles1, and Amit K. Gupta
Concurrent VLSI Architecture 1D.E. Shaw Stanford University Research & Development
High-Radix Routers 2ISCA’05
Interconnection Network
����������������
�����������������
��������������
��������
�������� ��� � ��
�����!
�����" ���������� � ���
#����$%�
High-Radix Routers 3ISCA’05
Bandwidth Trend
0.1
1
10
100
1000
10000
1985 1990 1995 2000 2005 2010
year
ban
dwid
th p
er r
out
er n
od
e (G
b/s
)
Torus Routing ChipIntel iPSC/2J-MachineCM-5Intel Paragon XPCray T3DMIT AlewifeIBM VulcanCray T3ESGI Origin 2000AlphaServer GS320IBM SP Switch2Quadrics QsNetCray X1Velio 3003IBM HPSSGI Altix 3000
Bandwidth growth result of
1.
2. Increase in number of signals
Increase in signaling rate
High-Radix Routers 4ISCA’05
Bandwidth Trend
0.1
1
10
100
1000
10000
1985 1990 1995 2000 2005 2010
year
ban
dwid
th p
er r
out
er n
od
e (G
b/s
)
Torus Routing ChipIntel iPSC/2J-MachineCM-5Intel Paragon XPCray T3DMIT AlewifeIBM VulcanCray T3ESGI Origin 2000AlphaServer GS320IBM SP Switch2Quadrics QsNetCray X1Velio 3003IBM HPSSGI Altix 3000
Approximately an order of magnitude
increase in bandwidth every 5
years
How to effectively utilize the increased bandwidth?
High-Radix Routers 5ISCA’05
Current Interconnection Networks
• Earlier works [Dally ’90, Agarwal ’91] suggested “lower-radix” networks– Examples: Torus Routing Chip, Cray T3E, SGI Altix 3000
• Current Routers– Myrinet : radix-32– Quadrics : radix-8– IBM SP2 Switch : radix-8
• Technology trends– Increasing Bandwidth– Optical signaling
High-Radix Routers can take better advantage of these technology trends.
High-Radix Routers 6ISCA’05
Outline
� Technology Trends for High- Radix Router� Motivation for High- Radix Router� High- Radix Router Architectures
� Baseline Architecture� Fully Buffered Crossbar� Hierarchical Crossbar
� Conclusion & Future Work
High-Radix Routers 7ISCA’05
High-Radix Router
Router
Router
High-Radix Routers 8ISCA’05
High-Radix Router
Router
Router
Low-radix (small number of fat ports) High-radix (large number of skinny ports)
RouterRouter
High-Radix Routers 9ISCA’05
Latency in a Network
Latency = Header Latency + Serialization Latency
= H tr + L / b
= 2trlogkN + 2kL / B
= H tr + L / b
where k = radixB = total BandwidthN = # of nodesL = message size
High-Radix Routers 10ISCA’05
Latency vs. Radix
0
50
100
150
200
250
300
0 50 100 150 200 250radix
late
ncy
(nse
c)
2003 technology 2010 technology
�������������� �
�����������������
Serialization latency increasesHeader latency decreases
High-Radix Routers 11ISCA’05
Determining Optimal Radix
Latency = Header Latency + Serialization Latency
= H tr + L / b= 2trlogkN + 2kL / B
Optimal radix� k log2 k = (B tr log N) / L
= Aspect Ratio
where k = radixB = total BandwidthN = # of nodesL = message size
High-Radix Routers 12ISCA’05
Higher Aspect Ratio, Higher Optimal Radix
1996
2003
2010
1991
1
10
100
1000
10 100 1000 10000
Aspect Ratio
Opt
imal
Rad
ix (k
)
High-Radix Routers 13ISCA’05
Higher Radix, Lower Cost
0
1
2
3
4
5
6
7
8
0 50 100 150 200 250radix
cost
( #
of 1
000
chan
nels
)
2003 technology 2010 technology
High-Radix Routers 14ISCA’05
Outline
� Technology Trend for High- Radix Router� Motivation for High- Radix Router� High- Radix Router Architectures
� Baseline Architecture� Fully Buffered Crossbar� Hierarchical Crossbar
� Conclusion & Future Work
High-Radix Routers 15ISCA’05
Virtual Channel Router Architecture
High-Radix Routers 16ISCA’05
High-Radix Switch Architectures (I)
output 1
output 2
output k
input 1
input k
���
input 2
� � �
(a) Baseline design
High-Radix Routers 17ISCA’05
Simulation Methodology
• Open-loop simulation done with steady-state measurement• Output switch arbitration required two cycles• Wire delay included in the pipeline• Uniform random traffic pattern• Each packet was assumed to be a single flit• Radix=64 router evaluated with 4 VCs per input
• Metrics – latency & throughput• Cost – area
High-Radix Routers 18ISCA’05
Baseline Performance Evaluation
0
10
20
30
40
50
0 0.2 0.4 0.6 0.8 1
offered load
late
ncy
(cyc
les)
low-radix
High-Radix Routers 19ISCA’05
Baseline Performance Evaluation
0
10
20
30
40
50
0 0.2 0.4 0.6 0.8 1
offered load
late
ncy
(cyc
les)
low-radix
baseline(high-radix)
Low radix
better
High-Radix Routers 20ISCA’05
High-Radix Switch Architectures (II)
output 1
output 2
output k
input 1
input k
���
input 2
� � �
(a) Baseline design (b) Fully buffered crossbar
output 1
output 2
output k
input 1
input k
���
input 2
� � � output 1
output 2
output k
input 1
input k
���
input 2
� � �
Fully buffered crossbar with shared buffers
High-Radix Routers 21ISCA’05
Fully Buffered Crossbar Provides High Performance but �.
0
10
20
30
40
50
0 0.2 0.4 0.6 0.8 1offered load
late
ncy
(cyc
les)
low-radixbaselinefully-buffered
High-Radix Routers 22ISCA’05
� Becomes Costly to build
0
50
100
150
200
0 50 100 150 200 250radix
area
(m
m2)
wire area buffer area
Buffer area dominates at radix 50
Other issues:
1.
2.
Credit Information
Adaptive Routing
High-Radix Routers 23ISCA’05
(b) Fully buffered crossbar
output 1
output 2
output k
input 1
input k
���
input 2
� � �
High-Radix Switch Architectures (III)
output 1
output 2
output k
input 1
input k
���
input 2
� � �
(a) Baseline design
subswitch
(c) Hierarchical crossbar
High-Radix Routers 24ISCA’05
Hierarchical Crossbar Performance on Uniform Random Traffic
0
10
20
30
40
50
0 0.2 0.4 0.6 0.8 1offered load
late
ncy
(cyc
les)
baselinesubswitch 32subswitch 16subswitch 8subswitch 4fully-buffered
subswitch
High-Radix Routers 25ISCA’05
Worst-case Traffic Pattern
0
10
20
30
40
50
0 0.2 0.4 0.6 0.8 1offered load
late
ncy
(cyc
les)
baselinesubswitch 32subswitch 16subswitch 8subswitch 4fully-buffered
subswitch
High-Radix Routers 26ISCA’05
Worst Case Performance Comparison
0
10
20
30
40
50
0 0.2 0.4 0.6 0.8 1offered load
late
ncy
(cyc
les)
baselinesubswitch 32subswitch 16subswitch 8subswitch 4fully-buffered
High-Radix Routers 27ISCA’05
4k Node Network Performance
0
20
40
60
80
100
0 0.2 0.4 0.6 0.8 1offered load
late
ncy
(cyc
les)
low-radix router high-radix router
High-Radix leads to lower zero–load latency
High-Radix Routers 28ISCA’05
High-Radix Router Microarchitecture
� In building high- radix router, there are scaling issues as the number of ports increase
� Baseline design provides poor performance� Fully buffered crossbar leads to high performance but
a costly design� Hierarchical crossbar provides a feasible design with
minimal loss in performance
High-Radix Routers 29ISCA’05
Conclusion
� Performance of digital systems is often limited by the interconnection network
� Increasing on- chip bandwidth can be more efficiently used by a high-radix routers.
� High- radix lead to lower cost and lower latency networks
� A high- radix router requires a different architecture� A hierarchical organization makes high- radix routers
feasible
High-Radix Routers 30ISCA’05
Future Work
• Optimal topology for high- radix network– Clos topology suited for high- radix routers
• Routing - How to adaptively route in a high- radix router?
• Flow- control with high- radix router
• Simulating a larger network
• Microarchitecture - Alternative to crossbars : multistage switch organizations, network- on- chip (torus, mesh, etc.)
High-Radix Routers 31ISCA’05
Thank you
Questions?