BEE3 Update
Chuck Thacker
Technical Fellow
Microsoft Research
11 January, 2007
Outline
• What is BEE3?
• BEE2-BEE3 Differences
• Project participants
• Engineering plan, schedule
What is BEE3?
• Follow-on to BEE2 (BWRC, 2004)
• Board with several highly connected FPGAs
• Vehicle for computer architecture research
  – Microsoft’s primary interest
• Potential platform for high-performance DSP applications
  – Astronomers, and perhaps others
• Allows large-scale architectural experiments
  – Although perhaps not as large as originally hoped
  – And certainly not at the speed of a real implementation
• Can scale smoothly from a single board to 64 boards (256 FPGAs)
BEE2
BEE2 – BEE3 Differences
• 4 Xilinx Virtex 5 vs 5 Virtex 2 Pro FPGAs
  – We use XC5VLX110T-ff1136
  – V2Pro is now obsolete (130nm)
  – V5 is a major improvement (65nm)
    • 6-input LUT (64 bit DP RAM)
    • Better Block RAMs
    • Improved interconnect
    • Better signal integrity
• 8 Infiniband/CX4 channels vs 18
• 4 x8 PCI Express Low Profile slots
BEE2 – BEE3 Differences (2)
• 2 Banks DDR2 x 2 vs 4 Banks DDR2 x 1
  – Same capacity (64 GB likely)
  – Lower bandwidth
  – Mandated by fewer signal pins on V5
• 4 10/100/1000 Ethernet channels
• No SATA
  – BEE2 SATA didn’t work anyway; iSCSI instead (?)
• No PowerPCs
  – The Virtex 5 version with PowerPCs has not yet been released by Xilinx
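The "64 GB likely" figure and the 256-FPGA scaling limit can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes 4 GB DIMMs (the size labelled in the system layout diagrams) and reads "2 Banks DDR2 x 2" as two DDR2 channels per FPGA with two DIMMs each. Both readings are assumptions, not stated specifications, and the function names are hypothetical.

```python
# Assumptions (not from a spec): 4 GB per DIMM, and "2 Banks DDR2 x 2"
# interpreted as 2 DDR2 channels per FPGA with 2 DIMMs on each channel.
FPGAS_PER_BOARD = 4
CHANNELS_PER_FPGA = 2
DIMMS_PER_CHANNEL = 2
GB_PER_DIMM = 4
MAX_BOARDS = 64  # largest configuration mentioned in the talk


def board_capacity_gb():
    """Total DRAM on one main board, in GB."""
    dimms = FPGAS_PER_BOARD * CHANNELS_PER_FPGA * DIMMS_PER_CHANNEL
    return dimms * GB_PER_DIMM


def system_totals(boards=MAX_BOARDS):
    """Return (FPGA count, DRAM in GB) for a system of `boards` boards."""
    return boards * FPGAS_PER_BOARD, boards * board_capacity_gb()


if __name__ == "__main__":
    print("GB per board:", board_capacity_gb())
    print("(FPGAs, GB) at full scale:", system_totals())
```

Under these assumptions one board carries 16 DIMMs, which matches the 64 GB estimate, and 64 boards give the 256 FPGAs quoted earlier.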
BEE2 – BEE3 Differences (3)
• Divided the system into two boards, Main and Control
  – Main board has FPGAs, all high speed logic
  – Control board handles downloading, monitoring
  – Simplifies main board engineering; control board can be designed in parallel
• Smaller main board
  – 168 vs 374 in²
  – Fewer layers for lower cost
• Much more “PC-like”
• Can use PC power supplies, peripherals
• Several layouts are being considered
  – All fit in 2U enclosure
  – Much more attention is being given to thermal design
  – Must pass UL, FCC
BEE3 Main Board
[Block diagram: four user FPGAs (User1 to User4, each a 5VLXT).]
• Each FPGA: DDR2 DIMM0/DIMM1 and DDR2 DIMM2/DIMM3 (each pair labelled 133)
• Each FPGA: one QSH-DP-040 connector (40x2), one PCI-E 8X, two CX4
• Four links labelled 108
Bandwidths (per-FPGA)
• Memory
  – 400 MT/s * 8 B/T * 2 channels: 6.4 GB/s
• Ring
  – 400 MT/s * 12 B/T: 4.8 GB/s
• QSH
  – 400 MT/s * 10 B/T: 4 GB/s
• Ethernet
  – 125 MB/s
• CX4
  – 1.25 GB/s * 2 directions * 2 channels: 5 GB/s
• PCI Express
  – Same as CX4
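The per-FPGA figures above are all products of transfer rate, transfer width, and channel count. A minimal sketch that reproduces them follows; the helper name `gb_per_s` and the dictionary keys are illustrative, and 1 GB/s is taken as 1000 MB/s.

```python
def gb_per_s(mega_transfers_per_s, bytes_per_transfer, channels=1):
    """Peak bandwidth in GB/s from MT/s, bytes per transfer, and channels."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000.0


# Peak per-FPGA bandwidths in GB/s, using the numbers from the slide.
PER_FPGA_GBPS = {
    "memory": gb_per_s(400, 8, channels=2),  # two DDR2 channels
    "ring": gb_per_s(400, 12),
    "qsh": gb_per_s(400, 10),
    "ethernet": 0.125,                       # 125 MB/s
    "cx4": 1.25 * 2 * 2,                     # 2 directions * 2 channels
}
PER_FPGA_GBPS["pci_express"] = PER_FPGA_GBPS["cx4"]  # "Same as CX4"

if __name__ == "__main__":
    for name, value in PER_FPGA_GBPS.items():
        print(f"{name:12s} {value:6.3f} GB/s")
```

These are peak rates; sustained throughput would be lower once protocol and refresh overheads are included.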
BEE3 Clocking, JTAG
[Block diagram: clock and configuration distribution to the four user FPGAs (User1 to User4, 5VLXT).]
• Clocks: 200MHz and 333MHz (Clock Buf 1:4), 156.25MHz and 125MHz (Clock Buf 1:8), 100MHz, 125MHz, two SMA inputs
• Each FPGA: two GTP 8x blocks, Gclk, DDR2
• Configuration: SelectMAP{Data[15:0], Cclk, RDWR_B, Busy, Prog_B, Init_B, Done} + JTAG{TMS, TCK}, TDI, TDO, CS_B[3:0], Sel0,En0 to Sel3,En3
• 64pin 0.1" Header Connector: 5Vsb x4, GND x20, PS_ON#, PWR_OK
• PCI-Express 8x Slot#1 to Slot#4
BEE3 Control Board
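[Block diagram: control board built around a Spartan3 FT256.]
• Attached devices: PROM, DRAM (32), FLASH (16), USB Ctrl (USB/H, USB/D), Ethernet PHY (RJ45), JTAG, 50MHz clock
• User I/O: LED x4, PushBtn x4, GPIO x40, 16x4 Char LCD
• 64pin 0.1" Header Connector: 5Vsb x4, GND x20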
BEE3 System (v1)
[Layout diagram: system layout, version 1.]
• Four FPGAs (5VLXT FF1136, footprint also labelled FF1738), each with four 4 GB DDR2-667 DRAM DIMMs and one QSH-DP-040 connector
• Two Fujitsu 2x2 CX4 connectors, four PCI-Express 8x slots, GE Switch, RJ45, two SMA
• Power: Power Supply, ATX PWR, 12V ATX PWR; regulators for 1.0V, 1.8V, 2.5V
• Control Board attached through the 64pin 0.1" Header Connector
• Notes on the drawing: “Available for Fans”, “Main cable harness exits here”
BEE3 System (v2)
[Layout diagram: system layout, version 2.]
• Four FPGAs (5VLXT FF1136), each with four 4 GB DDR2-667 DRAM DIMMs and one QSH-DP-040 connector
• Two Fujitsu 2x2 CX4 connectors, four PCI-Express 8x slots, one PCIe 1x, GE Switch
• Power: 24 pin ATX PWR, 12V PWR; regulators for 1.0V, 1.8V, 2.5V
• Control Board and I/O Panel; 64-pin 0.1" Header Connector
BEE3 Main Board (v3)
[Layout diagram: main board layout, version 3.]
• Four FPGAs (5VLXT FF1136), each with four 4 GB DDR2-667 DRAM DIMMs and one QSH-DP-040 connector
• Two Fujitsu 2x2 CX4 connectors, four PCI-Express 8x slots, two RJ45
• Power: 24 pin ATX PWR, 12V PWR; regulators for 1.0V, 1.8V, 2.5V
• 64-pin 0.1" Header Connector
Remaining Issues
• Precise EATX compatibility, or not?
  – Affects layout complexity, thermal design
• Power supply sizing
  – We don’t want to leave the overclockers in the lurch
• Standard power supplies (?)
  – “2U” supplies aren’t as efficient, have fewer vendors
  – Prefer Intel/Google “12V only” supplies (minimum loading issue), if available in time and at reasonable cost
• PCI Express is nonstandard
  – Xilinx hard macro is “device only”, not host
  – Need an intrepid graduate student
  – Can still use it for additional Infiniband/CX4 channels
Project Participants and Roles
• Microsoft Research (Silicon Valley)
  – Funds and manages system engineering
• Celestica (Ottawa and elsewhere)
  – Does main board engineering, produces final systems
  – Microsoft has a very deep relationship with Celestica
• Function Engineering (Palo Alto)
  – Does thermal and mechanical engineering
• Xilinx (San Jose)
  – Provides FPGAs for academic machines
  – Provides FPGA application expertise
• RAMP Group (BWRC)
  – Control board, basic software
• RAMP Community
  – Uses the systems for research
Why is Microsoft interested?
• We believe the overall RAMP effort will have significant impact, and want to support it in the most effective way we can.
  – Simply paying for grad students seems suboptimal
• We observe that universities aren’t very good at this sort of system engineering and production.
  – Grad students are great for many things, but board layout and similar work isn’t among them.
  – Requires deep understanding of tools and production processes. Pros have this.
  – We can open doors that academia can’t.
  – We have experience in managing this sort of program.
• We want the systems themselves
  – As infrastructure for our new effort in computer architecture (yes, this is a recruiting pitch).
• We also want systems to be available to other industrial users
  – This might be more difficult if the systems came from academia.
  – But we don’t want to be in the hardware business.
Plan, schedule
• Generate design spec: 6 weeks
  – Scope layout problems and layer count
• Layout and signal integrity: 12 weeks
  – Parts procurement proceeds in parallel
  – Will probably do 4-5 prototypes
• Board fab, test and assembly: 3 weeks
• Design verification testing: 5 weeks
  – This happens at Microsoft or BWRC
• Production can start in Summer ’07
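As a rough sanity check on the summer target, the four phases can be added up. The sketch below assumes the phases run strictly back to back and that work starts on the date of this talk (11 January 2007). Neither assumption is stated in the plan, so the dates are indicative only.

```python
from datetime import date, timedelta

# Phase durations in weeks, taken from the plan above.
PHASES = [
    ("Generate design spec", 6),
    ("Layout and signal integrity", 12),
    ("Board fab, test and assembly", 3),
    ("Design verification testing", 5),
]


def schedule(start):
    """Return (phase, start, end) tuples, assuming strictly sequential phases."""
    rows = []
    for name, weeks in PHASES:
        end = start + timedelta(weeks=weeks)
        rows.append((name, start, end))
        start = end
    return rows


if __name__ == "__main__":
    # Assumed start date: the date of the talk, not a committed kickoff.
    for name, begin, end in schedule(date(2007, 1, 11)):
        print(f"{name:30s} {begin} -> {end}")
```

The phases total 26 weeks, so with the assumed start date verification would finish in mid-July 2007, which is consistent with production starting that summer.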
Discussion?