1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames)
Reconfigurable Architectures
• Forces that drive a Reconfigurable Architecture
  – Price
    • Mass production: 100K to millions of units
    • Experimental: 1 to 10’s of units
  – Granularity of reconfiguration
    • Fine grain
    • Coarse grain
  – Degree of system integration/coupling
    • Tightly coupled
    • Loosely coupled
All three are a function of the application that will run on the architecture.
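Example Points in (Price, Granularity, Coupling) Space
[Figure: example systems placed on three axes: Price ($100’s to $1M’s), Granularity (coarse to fine), and Coupling (loose to tight). Labeled examples: an Intel/AMD processor (Int and float units); a processor pipeline (Decode, Exec, Store) with an RFU; and a PC connected over Ethernet to an ML507 board.]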
What’s the point of a Reconfigurable Architecture?
• Performance metrics
  – Computational
    • Throughput
    • Latency
  – Power
    • Total power dissipation
    • Thermal
  – Reliability
    • Recovery from faults
Increase application performance!
Typical Approach for Increasing Performance
• Implement the application/algorithm in software
  – It is often easier to write an application in software
• Profile the application (e.g. gprof)
  – Determine where the application is spending its time
• Identify kernels of interest
  – e.g. the application spends 90% of its time in function matrix_multiply()
• Design custom hardware/instructions to accelerate the kernel(s)
  – Analyze the kernel to determine how to extract fine/coarse grain parallelism (does any parallelism even exist?)
Amdahl’s Law!
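Amdahl’s Law bounds the payoff of this flow. If a fraction f of the run time is spent in the kernel and custom hardware speeds that kernel up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch, using f = 0.9 from the matrix_multiply() example; the kernel speedups and the function names are hypothetical:

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of run time is accelerated by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


def amdahl_limit(f: float) -> float:
    """Upper bound on overall speedup as the kernel speedup s grows without bound."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    f = 0.9  # fraction of time in matrix_multiply(), from the profiling example
    for s in (2, 10, 100):  # hypothetical kernel speedups from custom hardware
        print(f"kernel {s}x -> overall {amdahl_speedup(f, s):.2f}x")
    print(f"limit: {amdahl_limit(f):.1f}x")
```

With f = 0.9 this prints overall speedups of 1.82x, 5.26x, and 9.17x and a limit of 10.0x: once the kernel is fast, the remaining 10% of software time dominates, which is why profiling comes first.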
Granularity
Granularity: Coarse Grain
• rDPA: reconfigurable Data Path Array
  – Function units with programmable interconnects
[Figure: example rDPA, a 3x3 array of ALUs joined by programmable interconnect]
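The coarse-grain idea can be sketched as a tiny software model: each function unit is configured with a word-level operation, and the programmable interconnect is modeled as the choice of which values feed each unit. The op set, the 16-bit width, the feed-forward evaluation order, and the names AluConfig and run_rdpa are hypothetical, chosen only for illustration:

```python
import operator
from dataclasses import dataclass
from typing import Dict

# Hypothetical operation set for each function unit (a real rDPA defines its own).
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
}


@dataclass(frozen=True)
class AluConfig:
    op: str     # operation this ALU is configured to perform
    src_a: str  # interconnect choice: a primary input or an earlier ALU's output
    src_b: str


def run_rdpa(config: Dict[str, AluConfig], inputs: Dict[str, int], width: int = 16) -> Dict[str, int]:
    """Evaluate a feed-forward array of configured ALUs.

    ALUs are evaluated in the dict's insertion order, so each source must name a
    primary input or an ALU defined earlier in `config`.
    """
    mask = (1 << width) - 1
    values = dict(inputs)
    for name, alu in config.items():
        a, b = values[alu.src_a], values[alu.src_b]  # routed through the interconnect
        values[name] = OPS[alu.op](a, b) & mask      # fixed-width function unit
    return values


if __name__ == "__main__":
    # Configure three ALUs to compute (x + y) * (x - y).
    cfg = {
        "alu0": AluConfig("add", "x", "y"),
        "alu1": AluConfig("sub", "x", "y"),
        "alu2": AluConfig("mul", "alu0", "alu1"),
    }
    print(run_rdpa(cfg, {"x": 7, "y": 3})["alu2"])  # 10 * 4 = 40
```

In this model, reconfiguring the array means changing each unit’s op and source fields rather than rewriting bit-level logic, which is the sense in which coarse-grain fabrics need far less configuration for word-level datapaths.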
Granularity: Fine Grain
• FPGA: Field Programmable Gate Array
  – Sea of general-purpose logic gates
[Figure: 4x4 array of CLBs (Configurable Logic Blocks)]
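The logic inside each CLB is typically built from look-up tables (LUTs): a k-input LUT stores 2^k configuration bits and returns the bit addressed by its inputs, so it can implement any k-input Boolean function. A minimal sketch, assuming a single-output LUT and ignoring the flip-flops and carry logic a real CLB also contains; make_lut and eval_lut are hypothetical names:

```python
from typing import Callable, List


def make_lut(k: int, func: Callable[..., int]) -> List[int]:
    """Build the 2**k configuration bits of a single-output k-input LUT."""
    table = []
    for index in range(1 << k):
        bits = [(index >> i) & 1 for i in range(k)]  # input i is bit i of the address
        table.append(func(*bits) & 1)
    return table


def eval_lut(table: List[int], *inputs: int) -> int:
    """Evaluate a LUT: the input bits form an address into the stored table."""
    index = 0
    for i, bit in enumerate(inputs):
        index |= (bit & 1) << i
    return table[index]


if __name__ == "__main__":
    xor2 = make_lut(2, lambda a, b: a ^ b)
    maj3 = make_lut(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
    print(xor2)                     # [0, 1, 1, 0]
    print(eval_lut(maj3, 1, 0, 1))  # 1
```

Configuring the fabric amounts to filling these tables (plus the routing between CLBs); the function itself never appears as gates.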
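Granularity: Trade-offs
• Trade-offs associated with LUT size
• Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits)
[Figure: two 1024-bit configuration areas side by side, one built from 2-LUTs and one holding a single 10-LUT, with a "Microprocessor" label]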
[Figure build: 10-LUT blocks drawn as small ALUs with inputs A, B, and op (labeled bus widths 4, 3, and 3) and a 3-bit output; the build repeats the block three times]
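Counting configuration bits makes the LUT-size trade-off concrete. Reading the figure’s labels as 3-bit operands A and B and a 4-bit op (3 + 3 + 4 = 10 inputs, exactly a 10-LUT’s worth), the sketch below builds the 1024-entry table of the 10-LUT that produces each result bit. The opcode assignment, the address layout, and all names are hypothetical:

```python
from typing import List

WIDTH = 3    # assumed operand/result width (A and B are 3 bits)
OP_BITS = 4  # assumed opcode width, so 3 + 3 + 4 = 10 LUT inputs
MASK = (1 << WIDTH) - 1

# Hypothetical opcode assignment; the remaining codes return 0.
ALU_OPS = {
    0: lambda a, b: a + b,
    1: lambda a, b: a - b,
    2: lambda a, b: a & b,
    3: lambda a, b: a | b,
    4: lambda a, b: a ^ b,
}


def alu(op: int, a: int, b: int) -> int:
    """Reference behavior of the tiny ALU, truncated to WIDTH bits."""
    return ALU_OPS.get(op, lambda x, y: 0)(a, b) & MASK


def lut_table_for_bit(bit: int) -> List[int]:
    """1024-entry table of the 10-LUT producing one result bit.

    Assumed address layout: bits 0-2 = A, bits 3-5 = B, bits 6-9 = op.
    """
    table = []
    for addr in range(1 << (2 * WIDTH + OP_BITS)):
        a = addr & MASK
        b = (addr >> WIDTH) & MASK
        op = addr >> (2 * WIDTH)
        table.append((alu(op, a, b) >> bit) & 1)
    return table


def lut_alu(tables: List[List[int]], op: int, a: int, b: int) -> int:
    """Evaluate the ALU purely by lookup: one single-output 10-LUT per result bit."""
    addr = a | (b << WIDTH) | (op << (2 * WIDTH))
    return sum(tables[bit][addr] << bit for bit in range(WIDTH))


tables = [lut_table_for_bit(bit) for bit in range(WIDTH)]  # three 10-LUTs
config_bits = sum(len(t) for t in tables)                  # 3 * 1024 = 3072
```

Each 10-LUT sees all ten inputs but drives only one result bit, so the 3-bit result costs 3 x 1024 = 3072 configuration bits. The same 1024 bits spent on 2-LUTs would buy 256 of them (1024 / 4).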
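Granularity: Trade-offs
• Trade-offs associated with LUT size: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits)
• Bit logic and constants, e.g. (A and "1100") or (B or "1000")
[Figure: 4-bit inputs A and B feeding AND and OR stages, with constant bits 1 and 0]
• With 10-LUTs it is much worse: each 10-LUT has only one output, so every result bit consumes a whole 1024-bit LUT even though it depends on just two inputs (A_i and B_i); the figure marks the much smaller area that was required using 2-LUTs.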
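Treating each result bit separately makes the area argument concrete. A minimal sketch, assuming 4-bit A and B as in the figure: because the constants are fixed at configuration time, every result bit depends on at most A_i and B_i, and some bits fold to a wire or a constant. "Unfolded" assumes one LUT per result bit; the helper names are hypothetical:

```python
A_CONST = 0b1100  # constant in (A and "1100")
B_CONST = 0b1000  # constant in (B or "1000")
WIDTH = 4         # A and B are 4-bit values, as in the figure


def classify_bit(i: int) -> str:
    """Classify result bit i of (A and A_CONST) or (B or B_CONST) after constant folding."""
    c = (A_CONST >> i) & 1
    d = (B_CONST >> i) & 1
    # Truth table over (a_i, b_i) in the order (0,0), (0,1), (1,0), (1,1).
    table = [(a & c) | (b | d) for a in (0, 1) for b in (0, 1)]
    if len(set(table)) == 1:
        return f"constant {table[0]}"
    if table == [0, 0, 1, 1]:
        return "wire A"
    if table == [0, 1, 0, 1]:
        return "wire B"
    return "2-input LUT"


def config_bits(k: int, luts: int) -> int:
    """Configuration bits used by `luts` single-output k-input LUTs."""
    return luts * (1 << k)


kinds = [classify_bit(i) for i in range(WIDTH)]  # index 0 is the least significant bit
needed = kinds.count("2-input LUT")

if __name__ == "__main__":
    print(kinds)  # ['wire B', 'wire B', '2-input LUT', 'constant 1']
    for k in (2, 10):
        print(f"{k}-LUTs: {WIDTH} unfolded -> {config_bits(k, WIDTH)} bits, "
              f"{needed} folded -> {config_bits(k, needed)} bits")
```

Only bit 2 (A_2 or B_2) needs real logic: a 2-LUT spends 4 bits on it, while a 10-LUT spends 1024 bits and leaves eight inputs unused. Without folding, the 4-bit result costs 16 bits in 2-LUTs versus 4096 bits in 10-LUTs.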
Granularity: Example Architectures
• Fine grain: GARP
• Coarse grain: PipeRench
Granularity: GARP
[Figure: Garp chip block diagram: CPU and RFU (Reconfigurable Functional Unit), I-cache, D-cache, and config cache, connected to memory]
[Figure detail: the RFU is an array of N rows; each row has one RFU control block (1) and execution PEs, Processing Elements (16, 2-bit)]
• Example computations in one cycle: A<<10 | (b&c) and (A - 2*b + c)
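The "(16, 2-bit)" label suggests each row of execution PEs spans a 32-bit word as 16 slices of 2 bits. Under that assumption, the sketch below computes the first example, A<<10 | (b&c), one 2-bit PE at a time: the AND and OR are local to each PE, while the shift by 10 becomes pure routing (each PE reads the A slice held five PEs lower in significance). This is a software model for intuition, not Garp’s actual configuration format; the function names are hypothetical:

```python
WORD_BITS = 32
PE_BITS = 2                     # assumption: each execution PE handles a 2-bit slice
NUM_PES = WORD_BITS // PE_BITS  # 16 PEs span one 32-bit word
SLICE_MASK = (1 << PE_BITS) - 1


def slice_of(x: int, pe: int) -> int:
    """Bits handled by PE number `pe` (PE 0 holds the least significant 2 bits)."""
    return (x >> (pe * PE_BITS)) & SLICE_MASK


def row_shift_or_and(a: int, b: int, c: int, shift: int = 10) -> int:
    """Compute ((a << shift) | (b & c)) mod 2**32 one 2-bit PE at a time.

    The shift needs no logic: since `shift` is a multiple of PE_BITS, PE i
    simply routes in the slice of `a` held by PE i - shift // PE_BITS.
    """
    if shift % PE_BITS:
        raise ValueError("this sketch only handles PE-aligned shifts")
    offset = shift // PE_BITS
    result = 0
    for pe in range(NUM_PES):
        a_slice = slice_of(a, pe - offset) if pe >= offset else 0  # routing only
        out = a_slice | (slice_of(b, pe) & slice_of(c, pe))        # local 2-bit logic
        result |= out << (pe * PE_BITS)
    return result
```

The second example, (A - 2*b + c), is different in kind: its subtraction and addition need carries to ripple between PEs, which this per-PE model does not capture.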
Granularity: GARP
• Impact of configuration size
  – 1 GHz bus frequency
  – 128-bit memory bus
  – 512 Kbits of configuration data
• On an RFU context switch, how long does it take to load a new full configuration?
  – About 4 microseconds: 512 Kbits / 128 bits per bus cycle = 4096 bus cycles at 1 GHz (taking 1 Kbit = 1024 bits)
• An estimate of the time for the CPU to perform a context switch is ~5 microseconds
• Reloading the RFU adds ~4 microseconds to those ~5: roughly a 2x increase in context-switch latency!
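The arithmetic can be checked in a few lines, assuming one 128-bit transfer per bus cycle, no other overhead, and 512 Kbits = 512 x 1024 bits (with 512,000 bits the load time is 4.0 microseconds). The function name is hypothetical:

```python
def config_load_time_us(config_bits: int, bus_width_bits: int, bus_freq_hz: float) -> float:
    """Time to stream a full configuration over the memory bus, in microseconds.

    Assumes one bus-width transfer per bus cycle and no other overhead.
    """
    cycles = -(-config_bits // bus_width_bits)  # ceiling division: a partial word still costs a cycle
    return cycles / bus_freq_hz * 1e6


load_us = config_load_time_us(512 * 1024, 128, 1e9)  # 4096 bus cycles
cpu_switch_us = 5.0                                   # estimate of a CPU context switch
ratio = (cpu_switch_us + load_us) / cpu_switch_us     # context-switch latency with RFU reload

if __name__ == "__main__":
    print(f"load {load_us:.3f} us, context switch {cpu_switch_us + load_us:.3f} us ({ratio:.2f}x)")
```

This gives about 4.1 microseconds of reload and a ratio of about 1.8x, consistent with the rough ~2x above.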
Reference: “The Garp Architecture and C Compiler”, http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf
Granularity: PipeRench
• Coarse granularity
• Much higher-level programming
• Reference papers
  – PipeRench: A Coprocessor for Streaming Multimedia Acceleration (ISCA 1999): http://www.cs.cmu.edu/~mihaib/research/isca99.pdf
  – PipeRench Implementation of the Instruction Path Coprocessor (Micro 2000): http://class.ee.iastate.edu/cpre583/papers/piperench_Micro_2000.pdf
Granularity: PipeRench
[Figure: rows of PEs, each PE an 8-bit ALU with a register file; rows are connected through interconnect, and a global bus runs alongside the rows]
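Granularity: PipeRench
[Figure: a 5-stage virtual pipeline (stages 0 to 4) over cycles 1 to 6, and the same pipeline mapped onto three physical PE rows (stripes 0 to 2). Virtual stage held by each physical stripe:]
Cycle      1  2  3  4  5  6
Stripe 0   0  0  0  3  3  3
Stripe 1   -  1  1  1  4  4
Stripe 2   -  -  2  2  2  0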
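The stripe mapping in this figure can be reproduced with a small model of pipeline virtualization, assuming (as the animation suggests) that exactly one physical stripe is reconfigured per cycle, in round-robin order, with the next virtual stage. The sketch only tracks which stage is resident; it does not model data flow or the fact that a stripe being configured is not yet computing. The function name is hypothetical:

```python
from typing import List, Optional


def stripe_schedule(virtual_stages: int, physical_stripes: int, cycles: int) -> List[List[Optional[int]]]:
    """Virtual stage resident in each physical stripe, cycle by cycle.

    Assumed policy: in cycle t (counting from 0), stripe t % physical_stripes is
    reconfigured with virtual stage t % virtual_stages; the other stripes keep
    whatever stage they already hold (None = not yet configured).
    """
    stripes: List[Optional[int]] = [None] * physical_stripes
    history = []
    for t in range(cycles):
        stripes[t % physical_stripes] = t % virtual_stages  # one stripe reconfigured per cycle
        history.append(list(stripes))                        # snapshot of this cycle
    return history


if __name__ == "__main__":
    for cycle, row in enumerate(stripe_schedule(5, 3, 6), start=1):
        print(f"cycle {cycle}:", ["-" if s is None else s for s in row])
```

For the 5-stage, 3-stripe case the rows come out as [0, -, -], [0, 1, -], [0, 1, 2], [3, 1, 2], [3, 4, 2], [3, 4, 0], matching the mapping in the figure: a pipeline deeper than the fabric still runs, at the cost of reconfiguring one stripe every cycle.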
Degree of Integration/Coupling
• Independent reconfigurable coprocessor
  – The reconfigurable fabric does not communicate directly with the CPU
• Processor + Reconfigurable Processing Fabric (RPF)
  – Loosely coupled on the same chip
  – Tightly coupled on the same chip
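One way to see why coupling matters is to charge each kernel invocation a communication overhead that grows as the fabric moves farther from the CPU. A minimal sketch with made-up cycle counts (not measurements of any real system); the function name and all numbers are hypothetical:

```python
def offload_speedup(t_sw: float, t_hw: float, t_comm: float) -> float:
    """Per-call speedup of running a kernel on the fabric instead of the CPU.

    t_sw:   kernel time in software on the CPU
    t_hw:   kernel compute time on the reconfigurable fabric
    t_comm: per-call overhead to move operands/results and start the fabric
    """
    return t_sw / (t_hw + t_comm)


if __name__ == "__main__":
    T_SW, T_HW = 400.0, 40.0  # hypothetical cycle counts for one kernel call
    for coupling, t_comm in [
        ("tightly coupled, same chip", 2.0),      # hypothetical overheads
        ("loosely coupled, same chip", 100.0),
        ("independent coprocessor", 2000.0),
    ]:
        s = offload_speedup(T_SW, T_HW, t_comm)
        verdict = "offload" if s > 1.0 else "keep in software"
        print(f"{coupling:28s} {s:5.2f}x -> {verdict}")
```

With these numbers the same 10x faster kernel yields about 9.5x, 2.9x, and 0.2x per call: looser coupling pays off only when each invocation does enough work to amortize the transfer.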
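Degree of Integration/Coupling
[Figure: a typical computer system: CPU with a Fetch, Decode, Execute, Memory, Write Back pipeline, ALU and FPU, L1 and L2 caches, memory controller and main memory, DMA controller, and an I/O controller with USB, PCI, PCI-Express, and SATA connections to devices such as a hard drive and a NIC]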
[Figure builds on the same system: an RPF (Reconfigurable Processing Fabric) is added; then a configuration interface (Config I/F) for the RPF; then the RPF with its own I/O connection; finally an RFU placed alongside the ALU and FPU inside the CPU]
Next Class
• Reconfiguration Management
  – Chapter 4
Questions/Comments/Concerns
• Write down
  – The main point of the lecture
  – One thing that’s still not quite clear, OR
  – If everything is clear, an example of how to apply something from the lecture
Lecture notes
Granularity: PipeRench
• Scheduling virtual stages onto physical stages
• Partial/dynamic reconfiguration (each cycle)
Granularity: GARP
• Impact of configuration size on performance
  – Context switching
• Garp features
  – Dynamically reconfigurable
  – Stores multiple configurations (4) in an on-chip cache
  – One configuration active at a time
• Example application mapping to GARP (loop)
  – Amdahl’s Law
• The Garp Architecture and C Compiler: http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf
Overview
• Dimensions
  – Price
  – Granularity
  – Coupling
  – To optimize application performance (compute (throughput, latency), power, reliability)
• RPF to efficiently implement VICs
  – The main picture the authors want to convey
• What’s the point of having a reconfigurable architecture?
  – Example (increase application performance)
    • App -> SW/CPU
    • Profile
    • Identify kernels of intense compute
    • Design custom hardware/instruction (Amdahl’s Law)
  – Intel FPL paper: a great example, reading for Friday
Reconfigurable Architectures
• RPF -> VIC (short slide)