
Octavian Cret, Kalman Pusztai, Cristian Vancea, Balint Szente

Technical University of Cluj-Napoca, Romania

CREC: A Novel Reconfigurable Computing Design Methodology


Introduction

CREC: a low-cost, general-purpose reconfigurable computer;

Dynamically generated architecture;

Built in a Hardware/Software CoDesign manner;

Based on FPGA devices, the VHDL language and a high-level language (Java);

No need for integration in a dedicated VLSI chip.

CREC’s Main Features

Reconfigurable RISC computer;

Parallel computer: each register has an associated Execution Unit (EU);

All the EUs have an identical structure, and each one is able to execute any kind of instruction from the CREC Instruction Set;

Having a greater number of EUs has the advantage of introducing Instruction Level Parallelism.

CREC Design Flow

[Figure: flow diagram of the Integrated CREC Development System. Stages, in order:]

Application source code (written in CREC Assembly Language)

Parallel Compiler (determination of the number of slices and instruction scheduling)

VHDL source code Generator (written in Java)

VHDL file Compilation

FPGA Configuration Process

Application Execution

The Parallel Compiler (I.)

Parses the CREC-RISC source code;

Makes the key decisions about the execution system that will be generated;

Divides a program written in a sequential manner into portions of code to be executed at the same time;

Determines the minimal number of program slices;

Determines which instructions will be executed in parallel in each slice.

The Parallel Compiler (II.)

Uses a set of rules;

An example: each slice can contain at most one Load, Store or Jump instruction;

Reads the application source code (in CREC assembly language) and generates a file in a specific format, giving a description of the tailored CREC;

The resulting CREC architecture contains only the hardware needed to execute the subset of instructions used in the program.
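The idea of keeping only the hardware that a program actually needs can be sketched as follows. This is a minimal illustration, not the compiler's real output format: the mapping table `SUBUNIT_OF` and the function `required_subunits` are hypothetical names, and the grouping of mnemonics into subunits is assumed from the subunit labels used for the Operating Unit (ADD/SUB, AND/OR/XOR, SHL/ROL/NEG/INC/DEC, SHR/ROR/NOT).

```python
# Hypothetical table: which Operating Unit subunit each mnemonic needs.
# The grouping is an assumption based on the subunit labels in the slides.
SUBUNIT_OF = {
    "ADD": "arithmetic", "SUB": "arithmetic",
    "AND": "logic", "OR": "logic", "XOR": "logic",
    "SHL": "shift_left", "ROL": "shift_left", "NEG": "shift_left",
    "INC": "shift_left", "DEC": "shift_left",
    "SHR": "shift_right", "ROR": "shift_right", "NOT": "shift_right",
}


def required_subunits(mnemonics):
    """Return the set of subunits needed by the given mnemonics.

    Mnemonics that are not in the table (for example MOV or JNZ) are
    treated as needing no Operating Unit subunit in this sketch.
    """
    return {SUBUNIT_OF[m] for m in mnemonics if m in SUBUNIT_OF}


used = ["MOV", "MOV", "MOV", "ADD", "DEC", "JNZ", "MOV", "STORE"]
print(sorted(required_subunits(used)))  # ['arithmetic', 'shift_left']
```

In this sketch a program that never shifts right or uses a logic operation would lead to an Execution Unit generated without those two subunits.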

Results of the Parallel Compiler

The size of the various functional parts;

The subset of instructions involved;

The number of execution units (N);

The sequence of instructions making up the program;

The resulting CREC architecture contains only the hardware needed to execute the subset of instructions used in the program.

Slices

The instructions assigned to the EUs for execution at the same moment make up a program slice;

The whole program is divided into slices;

The slice’s size depends on the number of execution units designed for program execution.

Program Example

Classical, non-optimal multiplication of two integers without overflow check, using three EUs.

Program sequence and instruction scheduling:

[1] MOV R1,2
[2] MOV R2,3
[3] MOV R3,3
[4] ADD R1,R2
[5] DEC R3
[6] JNZ R3,[4]
[7] MOV STORB,R1
[8] STORE [200]
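One way to picture how such a listing could be packed into slices is the greedy sketch below. It is not the scheduling algorithm of the CREC Parallel Compiler: only the rule "at most one Load, Store or Jump per slice" and the limit of one instruction per EU come from the slides. The remaining rules (an instruction may not share a slice with an earlier instruction that writes a register it reads or writes, a jump closes its slice, and a jump target opens a new slice) are assumptions made for illustration, as are the names `Instr` and `schedule`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MEM_OR_JUMP = {"LOAD", "STORE", "JMP", "JNZ"}  # at most one of these per slice
JUMPS = {"JMP", "JNZ"}


@dataclass(frozen=True)
class Instr:
    label: int                     # line number used in the listing
    op: str                        # mnemonic
    dest: Optional[str] = None     # register (or buffer) written, if any
    srcs: Tuple[str, ...] = ()     # registers (or buffers) read
    target: Optional[int] = None   # label jumped to, for jumps


def schedule(program, n_eus):
    """Greedily pack instructions, in program order, into slices."""
    targets = {i.target for i in program if i.target is not None}
    slices = [[]]
    written = set()  # names written so far in the current slice
    for ins in program:
        cur = slices[-1]
        touched = ins.srcs + ((ins.dest,) if ins.dest else ())
        must_split = (
            len(cur) >= n_eus                                  # one instruction per EU
            or ins.label in targets                            # assumed: target opens a slice
            or (ins.op in MEM_OR_JUMP
                and any(p.op in MEM_OR_JUMP for p in cur))     # rule from the slides
            or any(p.op in JUMPS for p in cur)                 # assumed: jump closes a slice
            or any(name in written for name in touched)        # assumed: dependency
        )
        if cur and must_split:
            slices.append([])
            written = set()
        slices[-1].append(ins)
        if ins.dest:
            written.add(ins.dest)
    return [[i.label for i in s] for s in slices]


program = [
    Instr(1, "MOV", "R1"),
    Instr(2, "MOV", "R2"),
    Instr(3, "MOV", "R3"),
    Instr(4, "ADD", "R1", ("R1", "R2")),
    Instr(5, "DEC", "R3", ("R3",)),
    Instr(6, "JNZ", None, ("R3",), target=4),
    Instr(7, "MOV", "STORB", ("R1",)),
    Instr(8, "STORE", None, ("STORB",)),
]
print(schedule(program, n_eus=3))  # [[1, 2, 3], [4, 5], [6], [7], [8]]
```

Under these assumed rules the eight instructions fit into five slices with three EUs, and into eight slices with a single EU.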

VHDL Source Code Generator

The VHDL files contain already written source code, in which the main architecture’s parameters are given as generics and constants;

The following components can be tailored:

The number of EUs;

The register’s width in all the EUs;

The size of the Instructions Memory and Operands Memory for each EU;

The size of the Data Stack and Slice Stack Memory;

The slice-mapping block, containing instructions.
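A generator of this kind can be pictured as a small program that writes the tailorable parameters into a VHDL package which the pre-written source then uses. The sketch below is in Python rather than the Java used by the actual generator, and every identifier in it (`crec_config`, `N_EUS`, `REG_WIDTH` and so on) is a hypothetical name chosen for illustration, not taken from the CREC sources.

```python
def vhdl_constants_package(n_eus, reg_width, instr_mem_depth,
                           operand_mem_depth, stack_depth):
    """Return the text of a VHDL package holding the tailored parameters.

    All package and constant names are hypothetical.
    """
    params = {
        "N_EUS": n_eus,
        "REG_WIDTH": reg_width,
        "INSTR_MEM_DEPTH": instr_mem_depth,
        "OPERAND_MEM_DEPTH": operand_mem_depth,
        "STACK_DEPTH": stack_depth,
    }
    for name, value in params.items():
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    lines = ["package crec_config is"]
    lines += [f"  constant {name} : positive := {value};"
              for name, value in params.items()]
    lines.append("end package crec_config;")
    return "\n".join(lines)


print(vhdl_constants_package(n_eus=4, reg_width=16, instr_mem_depth=256,
                             operand_mem_depth=256, stack_depth=64))
```

Keeping the parameters in one generated file means the hand-written VHDL does not change between applications; only the constants do.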

CREC General Architecture

[Figure: block diagram. Execution Units EU1, EU2, …, EUN, each with its own Instructions Memory and Operand Memory (each memory with an Addr input), together with the Slice Memory, Slice Counter, Slice Stack Memory, Data Stack Memory, Load Buffer, Store Buffer and Data Memory.]

The Hardware Architecture

The N Execution Units;

Instruction Memories;

Data Stack Memory (for Push and Pop);

Slice Stack Memory (for Call and Return);

A Slice Program Counter;

A Slice-mapping Memory;

Store Buffer and Load Buffer;

Data Memory (external or internal);

Operand Memories.

The Instruction Set

Relatively large instruction set, with more instructions than typical microcontrollers have;

Every instruction operates only on unsigned integers;

Each EU is potentially able to execute any kind of instruction from the CREC Instruction Set.

Data Manipulation Instructions

Addition with or without Carry;

Subtraction with or without Borrow, and compare;

Logical functions: And, Or, Xor, Not and Bit Test;

Shift arithmetic and logic to left/right;

Rotate and rotate through Carry to left/right;

Increment/Decrement and 2’s Complement.
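Addition with Carry and subtraction with Borrow on unsigned words can be illustrated with a short sketch. The flag behaviour is an assumption, not taken from the slides: the Carry Flag is set on carry out of an addition or on borrow in a subtraction, and the Zero Flag is set when the truncated result is zero. The function names `add` and `sub` are hypothetical.

```python
def add(a, b, width, carry_in=0):
    """Unsigned addition on `width` bits; returns (result, CF, ZF)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    result = total & mask
    return result, total >> width, int(result == 0)


def sub(a, b, width, borrow_in=0):
    """Unsigned subtraction on `width` bits; returns (result, CF, ZF).

    CF is used as the borrow flag here (an assumption).
    """
    mask = (1 << width) - 1
    diff = (a & mask) - (b & mask) - borrow_in
    result = diff & mask
    return result, int(diff < 0), int(result == 0)


print(add(200, 100, 8))  # (44, 1, 0): 300 does not fit in 8 bits
print(add(255, 1, 8))    # (0, 1, 1)
print(sub(3, 5, 8))      # (254, 1, 0): borrow, result wraps around
```

Chaining the returned carry into `carry_in` of the next call is how a wider addition would be built from narrower words.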

Instruction Format and Example

“G” defines the Instruction Group (Data Manipulation);

“Code” is the operation code (e.g. Add, Sub);

“Type” specifies the operation type (e.g. with/without Carry);

“Load” contains the load signals for the register and for the Carry and Zero flags;

“D” is the Register/Data selection for the second operand.

Program Control Instructions

Slice counter manipulation: Jump, Call and Return;

Data movement: Move;

Stack manipulation: Push and Pop;

Input from and Output to port: In and Out;

Load from and Store to external memory;

For greater flexibility, every instruction also exists in a conditional form: C (Carry), Z (Zero), E (Equal), A (Above), AE (Above or Equal), B (Below), BE (Below or Equal), each of which can also be negated.
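The conditions listed above can all be derived from the two flags, Carry (CF) and Zero (ZF). The sketch below assumes the flags were set by an unsigned compare that uses CF as the borrow flag, which is the usual convention on other processors but is not confirmed by the slides. The function name `condition_holds` is hypothetical, and the "N" prefix for negation is assumed by analogy with the JNZ mnemonic.

```python
def condition_holds(cond, cf, zf):
    """Evaluate a condition code against the Carry and Zero flags.

    Assumes the flags come from an unsigned compare a - b in which
    CF = 1 means a borrow occurred (a < b) and ZF = 1 means a == b.
    A leading "N" negates the condition (e.g. "NZ").
    """
    negate = cond.startswith("N")
    base = cond[1:] if negate else cond
    table = {
        "C": cf,
        "Z": zf,
        "E": zf,                   # equal
        "B": cf,                   # below
        "BE": cf or zf,            # below or equal
        "AE": not cf,              # above or equal
        "A": not cf and not zf,    # above
    }
    if base not in table:
        raise ValueError(f"unknown condition: {cond!r}")
    value = bool(table[base])
    return not value if negate else value


# Flags after comparing 3 with 5 under the assumed convention: CF = 1, ZF = 0.
print(condition_holds("B", cf=1, zf=0))   # True
print(condition_holds("A", cf=1, zf=0))   # False
print(condition_holds("NZ", cf=1, zf=0))  # True
```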

Instruction Format and Example

“G” defines the Instruction Group (Program Control);

“Code” is the operation code (e.g. Jump, Call);

The “Conditions” field contains the code for validating the execution of a given instruction;

“R” is the load signal for the Register (e.g. Move);

“D” is the Register/Data selection for the second operand.

The Execution Unit

Decoding Unit – decodes the instruction code;

Control Unit – generates the control signals for the Program Control Instruction group;

Multiplexer Unit – the second operand of the binary instructions is multiplexed by this unit;

Operating Unit – performs the data manipulation operations;

Accumulator Unit – stores the instruction result;

Flag Unit – contains the two flag bits: the Carry Flag (CF) and the Zero Flag (ZF).

[Figure: Execution Unit block diagram. Decoding Unit: Instruction Decoder (input: Instruction Code) and Condition Generator, connected to the Condition Bus. Control Unit: Control Signal Generator, with outputs including JMP, CALL, RET, PUSH, POP, LOAD, STORE, MOV, OUT and REG/DATA. Multiplexer Unit: Register MUX, Data MUX and Reg/Data MUX, selecting among the Immediate Operand, Load Buffer, Stack, Input Port and registers R1, R2, …, RN. Operating Unit: Arithmetic Unit (ADD/SUB), Logic Unit (AND/OR/XOR), Shift Left Unit (SHL/ROL/NEG/INC/DEC), Shift Right Unit (SHR/ROR/NOT) and Carry Generator. Accumulator Register and Flag Unit (ZF, CF).]

The Optimized Operating Unit

Symmetrical organization: on the right side are the binary instruction blocks, and on the left side are the unary operation blocks (performing operations only on the accumulator);

The blocks use only one level of FPGA slices;

All four subunits use the same number of slices;

Takes advantage of the Fast Carry Lines;

The size of the Operating Unit grows linearly with the word length.

Virtex Optimized Arithmetic Unit

The basic 2-bit ADD/SUB cell using the Fast Carry Lines consumes only one Xilinx VirtexE slice.

Arithmetic and Logic Opcodes

[Table: Opcodes of the arithmetic unit]

[Table: Opcodes of the logic unit]

Where L is the “Not Load” and S is the “Subtract” signal.

Virtex Optimized Shift Left Unit

The basic 2-bit SHL/ROL/NEG/INC/DEC cell using the Fast Carry Lines consumes only one slice.

Virtex Optimized Shift Right Unit

The basic 2-bit SHR/ROR/NOT cell using the Fast Carry Lines consumes only one Xilinx VirtexE slice.

Shift Left and Right Opcodes

[Table: Opcodes of the shift left unit]

Where S is the “Shift” and D is the “Decrement” signal.

[Table: Opcodes of the shift right unit]

Where S is the “Shift” and N is the “Not” signal.

Shift and Rotate Operations

SHL – Shift Left;

SAL – Shift Arithmetic Left;

ROL – Rotate Left;

RCL – Rotate through Carry Left.

SHR – Shift Right;

SAR – Shift Arithmetic Right;

ROR – Rotate Right;

RCR – Rotate through Carry Right.
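The eight operations differ only in which bit is shifted in and which bit ends up in the Carry. The sketch below models single-bit versions of them with the semantics conventionally attached to these mnemonics; that the CREC instructions behave exactly this way is an assumption, and the function name `shift_rotate` is hypothetical.

```python
def shift_rotate(op, x, width, carry_in=0):
    """Apply a one-bit shift or rotate to a `width`-bit word.

    Returns (result, carry_out). Semantics follow the conventional
    meaning of the mnemonics and are assumed, not taken from the slides.
    """
    mask = (1 << width) - 1
    x &= mask
    msb = x >> (width - 1)  # bit that leaves on a left shift
    lsb = x & 1             # bit that leaves on a right shift
    if op in ("SHL", "SAL"):
        return (x << 1) & mask, msb
    if op == "ROL":
        return ((x << 1) | msb) & mask, msb
    if op == "RCL":
        return ((x << 1) | carry_in) & mask, msb
    if op == "SHR":
        return x >> 1, lsb
    if op == "SAR":
        return (x >> 1) | (msb << (width - 1)), lsb  # top bit is kept
    if op == "ROR":
        return (x >> 1) | (lsb << (width - 1)), lsb
    if op == "RCR":
        return (x >> 1) | (carry_in << (width - 1)), lsb
    raise ValueError(f"unknown operation: {op!r}")


word = 0b10010110
print(shift_rotate("SHL", word, 8))              # (44, 1)
print(shift_rotate("ROL", word, 8))              # (45, 1)
print(shift_rotate("SAR", word, 8))              # (203, 0)
print(shift_rotate("RCR", word, 8, carry_in=1))  # (203, 0)
```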

Execution Unit Resources

A complete Execution Unit (with all the subunits generated) with an 8-bit wide accumulator consumes 20 CLBs, which is approximately 0.6% of a Xilinx Virtex600E FPGA chip;

An Execution Unit with a 16-bit wide register consumes 35 CLBs, which is approximately 1% of the available CLBs.
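The two percentages can be checked with a line of arithmetic. The total of 48 × 72 = 3456 CLBs used below is an assumption based on the CLB array size listed for the XCV600E in the Virtex-E data sheet; it does not appear in the slides. The names `XCV600E_CLBS` and `clb_share` are hypothetical.

```python
# Assumption: the XCV600E has a 48 x 72 CLB array (Virtex-E data sheet).
XCV600E_CLBS = 48 * 72


def clb_share(clbs_used, total=XCV600E_CLBS):
    """Return the percentage of the device's CLBs that are used."""
    return 100.0 * clbs_used / total


for width, clbs in [(8, 20), (16, 35)]:
    print(f"{width}-bit EU: {clbs} CLBs = {clb_share(clbs):.2f}% of the device")
# 8-bit EU: 20 CLBs = 0.58% of the device
# 16-bit EU: 35 CLBs = 1.01% of the device
```

Under that assumption both results agree with the rounded figures of about 0.6% and 1%.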

Experimental Results

The Parallel Compiler is functional;

The Execution Units are optimized for the Xilinx VirtexE device;

The Slice Memory and Stack Memory are under test;

A CREC architecture having 4 EUs with 4-bit wide registers occupies 4% of the CLBs and 5% of the BlockRAMs in the Virtex600E device;

A CREC architecture having 4 EUs with 16-bit wide registers occupies 18% of the CLBs and 20% of the BlockRAMs in the Virtex600E device;

The operating clock frequency is 100 MHz.

Performance Evaluation

The performance indexes show how many times faster a given algorithm executes on an optimised CREC system than with the classical execution flow.

Conclusions and Further Work

Create the possibility of writing high-level programs for CREC;

Extend the functionality of the Parallel Compiler, then create a C or PASCAL compiler for CREC applications;

Develop several variants of CREC architectures;

Hardware distributed computing, using FPGA configuration over the Internet.