Lecture 13: Reconfigurable Computing Applications October 10, 2013
ECE 636
Reconfigurable Computing
Lecture 11
Reconfigurable Computing Applications
Hardware assisted Simulated Annealing
° Use FPGA to perform FPGA placement
° Take advantage of parallelism and specialization
° Some limitations
• Global view of cost
• Convergence
• Scalability
° Lots of benefits
• Massive parallelism
• Self-contained reconfigurable system
Courtesy: Wrighton/DeHon
Systolic Architectures
[Figure: a memory feeding compute units through a bottleneck, contrasted with a systolic arrangement in which each compute unit is paired with its own memory]
Strategy
° Reformulate simulated annealing allowing only local swaps
° Consider all swaps in parallel
° Maintain information in "systolic cells"
• Represent current placement spatially
• Construct hardware to operate on entire placement at once
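The strategy above can be sketched in software. The following is a minimal sketch, not the Wrighton/DeHon hardware: it restricts simulated annealing to swaps between neighboring grid sites and groups the swaps into phases of disjoint pairs, which is what lets hardware consider them in parallel. The half-perimeter cost, the geometric cooling rule, and every name and parameter (`anneal_local`, `t0`, `alpha`) are assumptions made for illustration. Unlike the systolic array, this version evaluates the pairs one after another against an exact global cost.

```python
import math
import random

def wirelength(pos, nets):
    """Linear wirelength, taken here as the sum of net bounding-box half-perimeters."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def phase_pairs(width, height, step):
    """Disjoint neighbor pairs for one phase: horizontal/vertical, even/odd offset."""
    offset = (step // 2) % 2
    if step % 2 == 0:  # horizontal neighbors
        return [((x, y), (x + 1, y))
                for y in range(height) for x in range(offset, width - 1, 2)]
    return [((x, y), (x, y + 1))  # vertical neighbors
            for y in range(offset, height - 1, 2) for x in range(width)]

def anneal_local(width, height, nets, steps=400, t0=2.0, alpha=0.98, seed=0):
    """Anneal a width x height placement using only swaps of adjacent sites."""
    rng = random.Random(seed)
    cells = list(range(width * height))
    rng.shuffle(cells)
    # grid[y][x] is the cell id at a site; pos maps a cell id back to (x, y).
    grid = [[cells[y * width + x] for x in range(width)] for y in range(height)]
    pos = {grid[y][x]: (x, y) for y in range(height) for x in range(width)}
    cost = start_cost = wirelength(pos, nets)
    temp = t0
    for step in range(steps):
        for (ax, ay), (bx, by) in phase_pairs(width, height, step):
            a, b = grid[ay][ax], grid[by][bx]
            pos[a], pos[b] = (bx, by), (ax, ay)  # tentative swap
            delta = wirelength(pos, nets) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                grid[ay][ax], grid[by][bx] = b, a  # accept
                cost += delta
            else:
                pos[a], pos[b] = (ax, ay), (bx, by)  # reject and undo
        temp *= alpha  # one cooling step per phase
    return pos, start_cost, cost

nets = [[0, 1, 2], [3, 4], [5, 6, 7, 8], [0, 8]]
placement, before, after = anneal_local(3, 3, nets)
print(before, after)
```

Because the pairs within a phase never share a site, each accept/reject decision touches different cells, so a hardware version can make all of them in the same cycle using only locally available cost information.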
Local Swaps
Local Communication
Local Swaps
Massively Parallel Operation
Individual Swap Element
[Figure: block diagram of one swap element. State: myID, myX, myY, counter. Registers holding (id, x, y): Fanout0, Fanout1, Fanout2 ... FanoutN; Fanin0, Fanin1, Fanin2 ... FaninN; PosChain. Ports: position chain in/out, left data in/out, right data in/out, up data in/out, down data in/out. Other blocks: randomness, arithmetic unit.]
Linear Wirelength Improvement
[Plot: Apex4 benchmark. Metric (0 to 160000) and P (0 to 1) plotted against clock cycles (0 to 700000).]
Choosing 400 Cooling Steps
[Plot: normalized linear wirelength metric (0.8 to 2) against swapsPerInterval (1, then .02√N to .20√N in steps of .02√N), one curve per netlist: alu4.net, apex2.net, apex4.net, bigkey.net, clma.net, des.net, diffeq.net, dsip.net, elliptic.net, ex1010.net, ex5p.net, frisc.net, misex3.net, pdc.net, s298.net, s38417.net, s38584.1.net, seq.net, spla.net, tseng.net]
VPR Comparison Methodology
[Flowchart: netlists from the FPGA Place and Route Challenge feed two flows. Flow 1: systolic placement algorithm, placed design, vpr router, routed design. Flow 2: vpr -fast placer (with configuration options), placed design, vpr router, routed design. Statistics (channel utilization, critical path delay) are recorded for the routed designs.]
Speedups
° VPR on 2.2 GHz Xeon Workstation
° 500x for ex5p
• 18% channel growth
° 1200x for spla
• 41% channel growth
° More opportunity for speedups with better cooling schedules
° Better quality with better cost functions
° Feasible on a Virtex2000E part
Networking Application: Reconfigurable Firewall
° Networking hardware well suited for reconfigurable hardware
• Target signatures change often
• Massive quantities of stream-based data
• Repetitive operations
° Connecting up to a realistic networking environment is hard
• Washington University experimental setup one of the best
• Shows importance of both memory and processing capability
° Numerous experiments performed over the past five years
Courtesy: Lockwood
Network Routing
• FPGAs popular in network hardware
• New protocols implemented directly in silicon
• Easy to upgrade in the field
• Washington University Gigabit Switch (WUGS)
- Switch provides up to 160 Gbps of bandwidth.
FPGA-based Router
• FPX module contains two FPGAs
• NID – network interface device
- Performs data queuing
• RAD – reprogrammable application device
- Specialized control sequences
Reconfigurable Data Queuing
• Data may be congested.
• FPGA can be programmed for virtual channels.
Hardware Setup
• Stacked boards part of system
• Scalable to multiple boards
• Allows for cooling, power.
IP Lookup Function
• RAD can be used to evaluate packet headers.
• Headers evaluated in groups of four bits
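One way to picture evaluating a header four bits at a time is a 16-way trie, where each step consumes one nibble of the destination address. The sketch below is an illustration under assumptions, not the RAD's actual lookup circuit: the `NibbleTrie` class is hypothetical, and prefixes whose length is not a multiple of four are handled by expanding them across several slots of one node (controlled prefix expansion), which is a choice made here for simplicity.

```python
import ipaddress

class Node:
    """One trie level: 16 slots, one per value of a 4-bit group."""
    def __init__(self):
        self.child = [None] * 16
        self.hop = [None] * 16
        self.plen = [-1] * 16  # length of the prefix that owns each slot

class NibbleTrie:
    def __init__(self):
        self.root = Node()
        self.default = None  # next hop for the /0 route, if any

    def insert(self, cidr, hop):
        net = ipaddress.ip_network(cidr)
        addr, length = int(net.network_address), net.prefixlen
        if length == 0:
            self.default = hop
            return
        node = self.root
        full = (length - 1) // 4  # whole nibbles walked before the last node
        for level in range(full):
            nib = (addr >> (28 - 4 * level)) & 0xF
            if node.child[nib] is None:
                node.child[nib] = Node()
            node = node.child[nib]
        rem = length - 4 * full  # 1 to 4 prefix bits land in the last node
        base = (addr >> (28 - 4 * full)) & 0xF
        # Expand the prefix over every slot that shares its leading rem bits.
        for slot in range(base, base + (1 << (4 - rem))):
            if length >= node.plen[slot]:
                node.hop[slot], node.plen[slot] = hop, length

    def lookup(self, ip):
        addr = int(ipaddress.ip_address(ip))
        node, best = self.root, self.default
        for level in range(8):  # 32 bits = 8 groups of four
            nib = (addr >> (28 - 4 * level)) & 0xF
            if node.plen[nib] >= 0:
                best = node.hop[nib]  # deeper levels hold longer prefixes
            node = node.child[nib]
            if node is None:
                break
        return best

table = NibbleTrie()
table.insert("0.0.0.0/0", "default")
table.insert("128.252.0.0/16", "port A")
table.insert("128.252.5.0/24", "port B")
print(table.lookup("128.252.5.5"))  # port B
print(table.lookup("128.252.9.9"))  # port A
```

A 32-bit address needs at most eight steps, one per nibble, and the last match seen on the way down is the longest matching prefix.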
FPX Hardware Platform
[Figure: FPX block diagram and FPX photo. Labeled blocks: PROM, Program Cache, Flow Buffer, Route Filter, Extensible Modules, Layered Protocol Wrappers, Switch, SDRAM, SRAM, Config, NID (FPGA), Memory, Network Interface, RAD (FPGA).]
FPX Hardware in WUGS-20 Switch
System-On-Chip Firewall
[Figure: system-on-chip firewall in a Xilinx XCV2000E FPGA. Data input from a Gigabit Ethernet or SONET line card; data output to the switch, Gigabit Ethernet, or SONET line card. On-chip blocks: Layered Protocol Wrappers, Payload Scanner, TCAM Filter, Flow Buffer, Queue Manager, Free List Manager, Packet Scheduler, Extensible Module(s). Interfaces to off-chip memories: SRAM 1 Controller, SDRAM 1 Controller, SDRAM 2 Controller. Labeled signals: Payload Match Bits, Flow ID.]
Content Matching Module
[Figure: wrapper_module.vhd containing regex_app (given). Control inputs: clk, reset_l, enable_l. From the protocol wrappers: dataen_appl_in, d_appl_in (32 bits), sof_appl_in, eof_appl_in, sod_appl_in, tca_appl_in. To the existing MP1 circuit: dataen_out_appl, d_out_appl (32 bits), sof_out_appl, eof_out_appl, sod_out_appl, tca_out_appl. Other signals: ready_l, and Matched (8 bits) to the extended bits of the CAM.]
Packet matching w/ Content Addressable Memory
° Sample Packet:
- Source Address = 128.252.5.5 (dotted.decimal)
- Destination Address = 141.142.2.2 (dotted.decimal)
- Source Port = 4096 (decimal)
- Destination Port = 50 (decimal)
- Protocol = TCP (6)
- Payload = “Consolidate your loans. CALL NOW”
– Payload Lists = { General SPAM (0), Save Money SPAM (1) }
– Content Vector = “00000011” (binary) = x”03” (hex)
[Figure: CAM word for the sample packet, bits 111 down to 0, all values shown in hex: Content (bits 111 to 104) = 03 | Src IP = 80FC0505 | Dest IP = 8D8E0202 | Src Port = 1000 | Dest Port = 0050 | Proto = 06]
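The content vector for the sample packet can be reproduced with a small sketch: one regular expression per payload list, and one bit set for each list that matches. The two patterns below are hypothetical stand-ins chosen only so that the sample payload hits both lists; the slides do not give the actual expressions used by the payload scanner.

```python
import re

# Bit i of the content vector corresponds to payload list i.
# Both patterns are illustrative assumptions, not the real lists.
PAYLOAD_LISTS = [
    re.compile(r"call now", re.IGNORECASE),           # list 0: general SPAM
    re.compile(r"consolidate your loans|save money",  # list 1: save money SPAM
               re.IGNORECASE),
]

def content_vector(payload):
    """Return an 8-bit vector with bit i set when payload list i matches."""
    vec = 0
    for bit, pattern in enumerate(PAYLOAD_LISTS):
        if pattern.search(payload):
            vec |= 1 << bit
    return vec & 0xFF

vec = content_vector("Consolidate your loans. CALL NOW")
print(format(vec, "08b"), format(vec, "02X"))  # 00000011 03
```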
Sample Filter
- Source Address = 128.252.0.0 / 16
- Destination Address = 141.142.0.0 / 16
- Source Port = Don’t Care
- Destination Port = 50
- Protocol = TCP (6)
- Payload includes general SPAM (List 0)
All values in hex. Mask: 1 = care, 0 = don't care.

Field       IP Packet   Value      Mask
Content     03          01         01
Src IP      80FC0505    80FC0000   FFFF0000
Dest IP     8D8E0202    8D8E0000   FFFF0000
Src Port    1000        0000       0000
Dest Port   0050        0050       FFFF
Proto       06          06         FF

DROP the packet: it matches the filter
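The match above reduces to one masked comparison over the whole word: the packet matches when it agrees with the filter value on every bit where the mask is 1. The sketch below packs the hex field values from the slide into a single 112-bit integer. The helper names (`pack`, `matches`) are hypothetical, and the field order below the content byte is an assumption; only the content position (bits 111 to 104) is taken from the slide.

```python
def pack(content, src_ip, dst_ip, src_port, dst_port, proto):
    """Pack the fields into one 112-bit word, content byte on top (assumed order)."""
    return ((content << 104) | (src_ip << 72) | (dst_ip << 40)
            | (src_port << 24) | (dst_port << 8) | proto)

def matches(word, value, mask):
    """Ternary match: compare only the bits where the mask is 1 (care)."""
    return (word & mask) == (value & mask)

# Hex field values as shown on the slide.
packet = pack(0x03, 0x80FC0505, 0x8D8E0202, 0x1000, 0x0050, 0x06)
value = pack(0x01, 0x80FC0000, 0x8D8E0000, 0x0000, 0x0050, 0x06)
mask = pack(0x01, 0xFFFF0000, 0xFFFF0000, 0x0000, 0xFFFF, 0xFF)

action = "DROP" if matches(packet, value, mask) else "FORWARD"
print(action)  # DROP
```

The content mask of 01 cares only about list 0, so the packet's content vector of 03 still matches, and the all-zero source port mask makes that field a don't care.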
Packet Classifier with FlowID
[Figure: packet classifier. The CAM input word is formed from bits in the IP header (source address, destination address, source port, dest. port, protocol) and the payload match bits. The CAM table holds entries 1 to N, each with a CAM MASK, a CAM VALUE, and a Flow ID; labeled widths are 112 bits and 16 bits. Mask matchers and value comparators feed a priority encoder, which selects the resulting flow identifier from the flow list.]
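The classifier structure can be sketched as a table of (mask, value, flow ID) entries, a match line per entry, and a priority encoder that picks one hit. This is a minimal software sketch with assumptions: the names `CamEntry` and `classify` are hypothetical, the lowest-numbered matching entry is assumed to have priority, and 16-bit toy words stand in for the 112-bit CAM word to keep the example short.

```python
from typing import NamedTuple, Optional, Sequence

class CamEntry(NamedTuple):
    mask: int     # 1 = care, 0 = don't care
    value: int
    flow_id: int

def classify(word: int, table: Sequence[CamEntry]) -> Optional[int]:
    """Return the flow ID of the highest-priority matching entry, or None."""
    # Mask matchers and value comparators: one match line per entry.
    # Hardware evaluates these lines at once; here it is a list.
    hits = [(word & e.mask) == (e.value & e.mask) for e in table]
    # Priority encoder: the lowest-index hit wins (assumed ordering).
    for index, hit in enumerate(hits):
        if hit:
            return table[index].flow_id
    return None

table = [
    CamEntry(mask=0xFF00, value=0x1200, flow_id=7),  # most specific first
    CamEntry(mask=0xF000, value=0x1000, flow_id=9),
    CamEntry(mask=0x0000, value=0x0000, flow_id=1),  # catch-all entry
]
print(classify(0x1234, table))  # 7
print(classify(0x1F00, table))  # 9
print(classify(0xABCD, table))  # 1
```

Ordering the table from most to least specific lets overlapping filters coexist, with the encoder resolving ties.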
Other Modules Implemented
° IPv4 CAM Filter
• 104 Bit header matching
° Fast IP Lookup (FIPL)
• Longest Prefix Match
• MAE-West at 10M pkts/second
° Packet Content Scanner
• Reg. Expression Search
° Data Queueing
• Per-flow queue in SDRAM
° IPv6 Tunneling Module
• Tunnels IPv6 over IPv4
° Statistics Module
• Event counter
° Traffic Generator
• Per-flow mixing
° Video Recoder
• Motion JPEG
° Embedded Processor
• KCPSM