Hardware/Software Co-Design
SRC-7 Programming Basics and Pipelining
Miaoqing Huang, University of Arkansas
Fall 2011


Outline

Basics of SRC-7 Programming

Pipelining


Framework of Program Running on SRC-7

[Figure: main.c (software part); MAP files map1.mc, map2.mc, …, mapn.mc (hardware part), which use the hardware macros macrom, macron, macrop, macroq, macrox, macroy]

• The hardware part of an application may be distributed into multiple bitstreams
• Each bitstream is specified by a MAP function
  - A MAP function is written in a high-level language, i.e., MAP C
• Complicated operations can be implemented using hardware modules
  - Multiple modules can be instantiated in a single MAP file
  - Data access to memory is generally implemented in MAP C


Basic Flow of MAP Function

[Figure: SRC-7 MAP processor. Components: a controller FPGA (Altera Stratix II EP2S130); two user-logic FPGAs, User Logic 1 and User Logic 2 (each an Altera Stratix II EP2S180); 16 banks of on-board memory (64 MB); two 1 GB global common memories. Link bandwidths shown: 19.2 GB/s, 12.8 GB/s, 7.2 GB/s (×2), 4.8 GB/s (256 bits), and 4.2 GB/s (×2).]

• Each MAP function is defined in a MAP C file
  - All the code in a MAP C file is converted into a hardware description language
  - Complicated data structures and programming models, such as recursive calls, are not supported
  - There is no operating system or run-time support on the MAP processor
• Users need to handle data communication, data access, and data operations explicitly
• Basic flow: move data onto the on-board memory (OBM) → process the data → move the results back to main memory
• Small pieces of data can be stored on the FPGA using Block RAM


Where are the data?

[Figure: SRC-7 system overview. Two microprocessor (μP) nodes, each with memory, a SNAP interface, and PCI-X; connections to Gig Ethernet etc., chaining GPIO, disk, a storage area network, a local area network, and a wide area network; the MAP processor with a controller FPGA (Altera Stratix II EP2S130), two user-logic FPGAs (Altera Stratix II EP2S180), 16 banks of on-board memory (64 MB), and two 1 GB global common memories. Link bandwidths shown: 19.2 GB/s, 12.8 GB/s, 7.2 GB/s, 4.8 GB/s (256 bits), and 4.2 GB/s.]

• Data can be stored in main memory (i.e., host memory), global common memory, and on-board memory (OBM)
• The memory systems are separate
  - Data transfer between memories is explicit
• Global common memory is accessible to both the microprocessor and the FPGA
• Data transfer into and out of the OBM has to be explicitly initiated by user logic
• On-board memory is the major venue for user logic to store data
  - Implemented using SRAM
  - Supports pipelined data access, with some limitations


More on MAP Function

    #include <libmap.h>

    void poly(int n, long long dt_source[], long long dt_res[], int mapno) {
        /* ... */
    }

• The return type of a MAP function has to be void
• Use square brackets [] to declare an array of data to be transferred
  - The size of the data to be transferred is specified explicitly by the user
• Pointers are still allowed in the MAP function
  - Pointer arithmetic is NOT allowed
  - Scalar variables can be returned using pointers

    void poly(long long dt_source[], long long *tproc, int mapno) {
        /* ... */
        *tproc = x - y;  /* return a scalar result through the pointer */
    }


Outline

Basics of SRC-7 Programming

Pipelining


Pipelining

• A pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one
• Each element carries out one part of a whole complicated operation
• Pipelining is the most common technique in hardware design for achieving high performance


Why do we need pipelining?

• To improve the throughput
• Mechanic shop vs. car assembly line
• Mechanic shop
  - The mechanic needs to do everything
  - It takes hours to fix just one car
  - Sometimes it takes days!!!
• Car assembly line
  - Many workers work together
  - Each worker just puts one or more components into the car
  - One assembly line can produce hundreds or thousands of cars per day


Classic Five Stage RISC Pipeline

• Five stages
  1. Instruction fetch: a 32-bit instruction is fetched from the cache
  2. Decode: figure out the function of the instruction
  3. Execute: carry out the instruction
  4. Memory access: access memory if necessary
     - Always check the cache first if there is one
  5. Writeback: write the result into the register file


Superpipeline in Modern Microprocessors

• The instruction pipeline on the Pentium 4 consists of 20 stages
  - 20 instructions can be in flight simultaneously, one per stage!!!
  - The latency of each stage is very short
• The processor can run at a very high frequency, e.g., 3∼4 GHz
• So, we should be happy. But we are not. Why?
  - Each instruction performs a very basic operation, e.g., addition, multiplication, or bit shift
  - A complicated operation, such as DES encryption or an image processing operation, may take thousands of instructions
• Solution: use hardware to design a very long pipeline that can accommodate one complicated operation


Solving Data Dependence in a Pipeline

• Use shift registers to save the inputs that are not used in the current stage until a later stage needs them
