Hardware/Software Co-Design
SRC-7 Programming Basics and Pipelining
Miaoqing Huang
University of Arkansas
Fall 2011

Outline
- Basics of SRC-7 Programming
- Pipelining

Framework of Program Running on SRC-7

[Figure: main.c (software) and the MAP files map1.mc, map2.mc, ..., mapn.mc (hardware), which use macros m, n, p, q, x, y]

- The hardware part of an application may be distributed across multiple bitstreams
- Each bitstream is specified by a MAP function
- A MAP function is written in a high-level language, i.e., MAP C
- Complicated operations can be implemented using hardware modules
- Multiple modules can be instantiated in a single MAP file
- Data access to memory is generally implemented in MAP C

Basic Flow of MAP Function

[Figure: MAP processor block diagram. Controller (Altera Stratix II EP2S130); User Logic 1 and User Logic 2 (Altera Stratix II EP2S180 each); 16 banks of on-board memory (64 MB); two global common memory banks (1 GB each). Link labels: 7.2 GB/s, 7.2 GB/s, 4.8 GB/s, 256b, 4.2 GB/s, 4.2 GB/s, 19.2 GB/s, 12.8 GB/s]

- Each MAP function is defined in a MAP C file
- All the code in a MAP C file is converted into a hardware description language
- Complicated data structures and programming models, such as recursive calls, are not supported
- There is no operating system or run-time support on the MAP processor
- Users need to handle data communication, data access, and data operations explicitly
- Basic flow: move data onto OBM → process data → move results back to main memory
- Small pieces of data can be stored on the FPGA using Block RAM
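
The three-step basic flow can be emulated in ordinary C to make the data movement visible. This is only a sketch under stated assumptions: plain arrays stand in for OBM banks, memcpy stands in for the DMA transfers, and the names map_flow_sketch, obm_in, obm_out, and OBM_WORDS are hypothetical and are not part of MAP C or libmap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBM_WORDS 1024  /* hypothetical bank capacity, chosen only for this sketch */

/* Ordinary C arrays standing in for two on-board memory banks (assumption). */
static long long obm_in[OBM_WORDS];
static long long obm_out[OBM_WORDS];

/* Emulates one call of a MAP-style function: copy in, compute, copy out. */
void map_flow_sketch(int n, const long long src[], long long res[])
{
    assert(n >= 0 && n <= OBM_WORDS);

    /* Step 1: move data from main memory onto OBM (memcpy stands in for DMA). */
    memcpy(obm_in, src, (size_t)n * sizeof(long long));

    /* Step 2: process the data held in OBM; 2*x + 1 is an arbitrary example. */
    for (int i = 0; i < n; i++) {
        obm_out[i] = obm_in[i] * 2 + 1;
    }

    /* Step 3: move the results back to main memory. */
    memcpy(res, obm_out, (size_t)n * sizeof(long long));
}
```

The point of the sketch is that both transfers are written out by the programmer; nothing moves between memories implicitly.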

Where are the data?

[Figure: SRC-7 system diagram. Microprocessor (μP) nodes, each with memory and a SNAP interface, on PCI-X; Gig Ethernet, etc.; chaining GPIO; disk, storage area network, local area network, wide area network; MAP processor with controller (Altera Stratix II EP2S130), User Logic 1 and 2 (Altera Stratix II EP2S180), 16 banks of on-board memory (64 MB), and two global common memory banks (1 GB each)]

- Data can be stored in main memory (i.e., host memory), global common memory, and on-board memory (OBM)
  - The memory systems are separate
  - Data transfer between memories is explicit
- Global common memory is accessible to both the microprocessor and the FPGA
- Data transfer into and out of the OBM has to be explicitly initiated by user logic
- On-board memory is the main place for user logic to store data
  - Implemented using SRAM
  - Supports pipelined data access, with some limitations

More on MAP function

    #include <libmap.h>

    void poly (int n, long long dt_source[], long long dt_res[], int mapno)
    {
        ......
    }

- The return type of a MAP function has to be void
- Use square brackets [] to declare an array of data to be transferred
- The size of the data to be transferred is specified explicitly by the user
- Pointers are still allowed in a MAP function
  - Pointer arithmetic is NOT allowed
  - Scalar values can be returned using pointers

    void poly (long long dt_source[], long long *tproc, int mapno)
    {
        ......
        *tproc = x - y;
    }
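
The interface rules above can be illustrated with a small function in plain C. This is a sketch only: it does not include libmap.h, it runs on the host rather than on a MAP processor, and the name sum_sketch and its summing body are hypothetical. It keeps to the listed rules: void return type, arrays declared with [], an explicit element count, array indexing instead of pointer arithmetic, and a scalar result returned through a pointer.

```c
#include <assert.h>

/* MAP-style signature written in plain C (illustrative; not compiled for a MAP). */
void sum_sketch(int n, long long dt_source[], long long *total, int mapno)
{
    long long acc = 0;

    (void)mapno;        /* MAP number is unused in this host-only sketch */
    assert(n >= 0);     /* the caller states the data size explicitly */

    for (int i = 0; i < n; i++) {
        acc += dt_source[i];    /* indexing only, no pointer arithmetic */
    }

    *total = acc;       /* scalar result goes back through the pointer */
}
```

A caller would pass the address of a scalar, for example sum_sketch(3, data, &total, 0), and read total after the call.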

Pipelining
- A pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one
- Each element carries out one part of a larger, complicated operation
- Pipelining is the most common technique in hardware design for achieving high performance

Why do we need pipelining?
- Improve the throughput
- Mechanic shop vs. car assembly line
  - Mechanic shop
    - The mechanic needs to do everything
    - It takes hours to fix just one car
    - Sometimes it takes days!!!
  - Car assembly line
    - Many workers work together
    - Each worker just puts one or more components into the car
    - One assembly line can produce hundreds or thousands of cars per day
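
The throughput argument can be put into numbers with a small cycle count. The sketch assumes an idealized pipeline in which every stage takes exactly one cycle and nothing stalls; the function names are hypothetical. Without pipelining each item occupies the unit for all k steps (the mechanic shop); with pipelining the first result appears after k cycles and one more result follows every cycle (the assembly line).

```c
#include <assert.h>

/* Cycles to process n items when each item needs all k steps before the next starts. */
long cycles_unpipelined(long n, long k)
{
    assert(n >= 0 && k >= 1);
    return n * k;
}

/* Cycles with an ideal k-stage pipeline: k cycles to fill, then one result per cycle. */
long cycles_pipelined(long n, long k)
{
    assert(n >= 0 && k >= 1);
    return (n == 0) ? 0 : k + (n - 1);
}
```

For 1000 items and 5 stages this gives 5000 cycles without pipelining and 1004 with it, so the speedup approaches the number of stages as n grows.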

Classic Five-Stage RISC Pipeline
- Five stages
  1. Instruction fetch: a 32-bit instruction is fetched from the cache
  2. Decode: figure out what the instruction does
  3. Execute: carry out the instruction
  4. Memory access: access memory if necessary
     - Always check the cache first if there is one
  5. Writeback: write the result into the register file
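
How the five stages overlap can be sketched with a function that reports which stage an instruction occupies in a given cycle. It assumes an ideal pipeline: one instruction issued per cycle, one cycle per stage, and no stalls or hazards. The abbreviations IF, ID, EX, MEM, WB and the name stage_at are choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stage abbreviations in pipeline order: fetch, decode, execute, memory, writeback. */
static const char *const STAGE_NAMES[5] = {"IF", "ID", "EX", "MEM", "WB"};

/* Stage occupied by instruction `instr` (0-based issue order) during `cycle`
   (0-based), or NULL if the instruction is not in the pipeline in that cycle. */
const char *stage_at(int instr, int cycle)
{
    assert(instr >= 0 && cycle >= 0);

    int s = cycle - instr;  /* instruction i enters the first stage in cycle i */
    return (s >= 0 && s < 5) ? STAGE_NAMES[s] : NULL;
}
```

In cycle 4, for example, instruction 0 is in writeback while instruction 4 is being fetched, so all five stages are busy at once.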

Superpipeline in Modern Microprocessor
- The instruction pipeline on the Pentium 4 consists of 20 stages
  - 20 instructions can be executed simultaneously!!!
  - The latency of each stage is very short
  - The processor can run at a very high frequency, e.g., 3∼4 GHz
- So, we should be happy. But we are not. Why?
  - Each instruction performs a very basic operation
    - E.g., addition, multiplication, bit shift
  - A complicated operation may take thousands of instructions
    - E.g., DES encryption, image processing operations
- Use hardware to design a very long pipeline that can accommodate one complicated operation

Solve the Data Dependence in Pipeline
- Use shift registers to save the unused inputs
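
The slide's figure is not recoverable here, so the following is only one possible illustration of the shift-register idea, under assumptions: each output needs the current input and the two previous ones, and only one value is read from memory per loop iteration. The earlier inputs are kept in two scalar variables, r1 and r2, which are shifted every iteration. The function name sliding_sum3 and the choice of a three-input sum are hypothetical.

```c
#include <assert.h>

/* out[i] = in[i] + in[i-1] + in[i-2]; inputs before the start count as 0. */
void sliding_sum3(int n, const long long in[], long long out[])
{
    long long r1 = 0;   /* input from one iteration ago */
    long long r2 = 0;   /* input from two iterations ago */

    assert(n >= 0);

    for (int i = 0; i < n; i++) {
        long long x = in[i];    /* the only memory read in this iteration */

        out[i] = x + r1 + r2;

        /* Shift: saved inputs move one position down the register chain. */
        r2 = r1;
        r1 = x;
    }
}
```

Because every iteration reads one new value and reuses saved ones, the loop body has the same shape in each cycle, which is what a pipelined implementation needs.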