Hardware Compilation
Gordon J. Pace
December 2007
Introduction
Circuits can be used to implement algorithms …
And are obviously as expressive as software …
But would you want to design a circuit for a task by hand?
How can high level languages be automatically compiled to hardware?
But …
Running compiled code as software involves simply putting the program in memory …
But running code compiled to hardware requires building the circuit. Or does it?
One can use programmable logic devices (such as FPGAs, field-programmable gate arrays).
Programmable Logic Devices
Not as fast as ASIC (application-specific integrated circuit), and not as big, and not as cheap…
But you can download any circuit onto such a board!
Consists of a large array of gates which can be interconnected as desired.
May include more complex units on-board than simple logic gates.
This lecture
I will introduce the concepts behind hardware compilation by starting with a simple imperative parallel language and slowly extending it.
I will describe hardware using Lava – a hardware description language embedded in Haskell.
Which will enable me to show you the code of an actual hardware compiler.
But before we start …
A short reminder about circuit components and a Lava primer
Hardware Components
We will be using the following basic components in our circuits:
and, or, not, delay
Building a Multiplexer
[Figure: multiplexer with inputs sel, i0, i1 and output o.]
o = if sel then i1 else i0
mux (sel, (i0, i1)) = or2 (m0, m1)
  where
    m0 = and2 (i0, inv sel)
    m1 = and2 (i1, sel)
Building a Register
o = if write then i else keep the previous value
register (write, i) = o
  where
    o   = mux (write, (old, i))
    old = delay low o
[Figure: register with inputs write and i and output o.]
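The multiplexer and register above can be tried out without Lava. The sketch below is only a stand-in: it assumes a signal can be modelled as a lazy Haskell list with one Boolean per clock tick, and it re-implements low, inv, and2, or2 and delay on that model (these are not Lava's own definitions). The bodies of mux and register are taken from the slides.

```haskell
-- Assumption: a signal is a lazy list, one Bool per clock tick (not Lava's Signal).
type Signal = [Bool]

low :: Signal
low = repeat False

inv :: Signal -> Signal
inv = map not

and2, or2 :: (Signal, Signal) -> Signal
and2 (a, b) = zipWith (&&) a b
or2  (a, b) = zipWith (||) a b

-- delay ini s: output the first value of ini, then s shifted by one tick.
delay :: Signal -> Signal -> Signal
delay ini s = take 1 ini ++ s

-- o = if sel then i1 else i0
mux :: (Signal, (Signal, Signal)) -> Signal
mux (sel, (i0, i1)) = or2 (m0, m1)
  where
    m0 = and2 (i0, inv sel)
    m1 = and2 (i1, sel)

-- o = if write then i else keep the previous value (initially low)
register :: (Signal, Signal) -> Signal
register (write, i) = o
  where
    o   = mux (write, (old, i))
    old = delay low o
```

The feedback from o to old is well defined in this model because delay produces its first value before looking at its input, which mirrors why a delay is needed on the feedback wire of the real circuit.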
Back to Hardware Compilation
How can one compile a high-level program into a hardware circuit?
main (input n, pow: int16, output result: int16) {
  local e: int16;
  result := 1;
  e := pow;
  while (e > 0) {
    result := result * n;
    e := e - 1;
  }
}
[Figure: circuit block with inputs n and pow and output result.]
But how should it work?
Option 1:
Execute the algorithm at every clock tick
What about non-terminating programs?
Behaviour is identical in all clock ticks
Algorithms can only describe combinational circuits
What about long combinational logic?
How long should a clock tick take?
But how should it work?
Option 2:
Program runs over several clock ticks
When will the result be available?
How will we know that the result is ready?
Should the control over clock ticks be in the hands of the programmer or the compiler?
Mini-Flash
An imperative parallel language with one output variable
Mini-Flash Syntax
program ::= Skip
          | Delay
          | Emit
          | program ; program
          | if signal then program else program
          | while signal do program
          | program || program
Informal semantics of Mini-Flash
Skip terminates immediately and does nothing.
Delay takes one clock tick to terminate.
; is sequential composition.
|| is fork-join parallel composition: execute in parallel, and terminate once both sides have terminated.
Conditionals and loops work as usual.
What about Emit?
The program has only one output wire, which is always low unless an Emit instruction has been executed in that clock tick.
The Emit instruction terminates immediately.
Macros
We will be defining macros to avoid giving long, difficult to understand programs as examples, and to avoid having to deal with function definitions.
emit1 ≡ Emit; Delay
Example: An Oscillator
oscillator ≡ forever { Delay; emit1 }
where forever is defined as:
forever P ≡ while high do P
Example: Reading Input
Copy the input to the output:
copy in ≡ forever { if in then emit1 else Delay }
Copy input to output but avoiding two sequential high values:
safecopy in ≡ forever { if in then { emit1; Delay } else Delay }
Example: Waiting for a Signal
Block the program until a signal becomes high:
wait in ≡ while (inv in) do Delay
A final example: A detonator
Two persons get a controller each. Each controller has two buttons, which must be pressed in sequence (possibly with a pause in between) to enable. Both controllers must be triggered to enable the detonator.
detonator ((a, b), (c, d)) ≡ { { wait a; wait b } || { wait c; wait d } }; Emit
[Figure: two controllers, with buttons a, b and c, d.]
Compiling Mini-Flash
From programs to circuits.
The Problem
Unlike execution of software on traditional processors, hardware is inherently parallel.
How do we perform sequential composition?
How do we know a circuit has finished its computation?
The Shape of Circuits to Come
Solution: The circuits will have extra input start to start their execution and an extra output finish, to mark their termination:
[Figure: circuit block with inputs start and inputs, and outputs finish and emit.]
Compiling Skip
[Figure: circuit for Skip; wires: start, finish, emit, low.]
Compiling Delay
[Figure: circuit for Delay; wires: start, finish, emit, low.]
Compiling Emit
[Figure: circuit for Emit; wires: start, finish, emit.]
Compiling P;Q
[Figure: circuit for P;Q built from the circuits for P and Q; wires: start, finish, emit.]
Compiling Conditionals
[Figure: circuit for the conditional built from the circuits for P and Q; wires: start, finish, emit, cond.]
Compiling Parallel Composition
[Figure: circuit for parallel composition built from the circuits for P and Q and a synchroniser; wires: start, finish, emit.]
The Synchroniser
Outputs high when both inputs have become high …
Can be implemented using the other language constructs (1):
synchronise (f1, f2) ≡
  forever {
    wait (or2 (f1, f2));
    if (and2 (f1, f2)) then Skip
    else { Delay; wait (or2 (f1, f2)) };
    emit1
  }
(1) Well, this synchroniser almost always works …
Compiling Loops
[Figure: circuit for the loop built from the circuit for P; wires: start, finish, emit, cond.]
Beware!
Compiling:
while high do Skip
[Figure: the loop circuit with cond = high and body Skip; wires: start, finish, emit, high, low.]
Combinational loop!
To avoid combinational loops, ensure that bodies of while loops always take time to execute.
Writing a Mini-Flash Compiler
From programs to circuits in one page of code!
Mini-Flash Syntax
data MiniFlash
  = Skip
  | Delay
  | Emit
  | MiniFlash :>: MiniFlash
  | IfThenElse Signal (MiniFlash, MiniFlash)
  | While Signal MiniFlash
  | MiniFlash :|: MiniFlash
Mini-Flash in Lava
The type of circuits produced:
type MiniFlashCircuit = Signal -> (Signal, Signal)
The compiler:
compile :: MiniFlash -> MiniFlashCircuit
Compiling Skip
[Figure: circuit for Skip; wires: start, finish, emit, low.]
compile Skip start = (finish, emit)
  where
    finish = start
    emit   = low
Compiling Delay
compile Delay start = (finish, emit)
  where
    finish = delay low start
    emit   = low
[Figure: circuit for Delay; wires: start, finish, emit, low.]
Compiling Emit
[Figure: circuit for Emit; wires: start, finish, emit.]
compile Emit start = (finish, emit)
  where
    finish = start
    emit   = start
Compiling P;Q
compile (p :>: q) start = (finish, emit)
  where
    (middle, emit1) = compile p start
    (finish, emit2) = compile q middle
    emit            = or2 (emit1, emit2)
[Figure: circuit for P;Q; wires: start, finish, emit.]
Compiling Conditional
compile (IfThenElse c (p, q)) start = (finish, emit)
  where
    (finish1, emit1) = compile p (and2 (start, c))
    (finish2, emit2) = compile q (and2 (start, inv c))
    emit             = or2 (emit1, emit2)
    finish           = or2 (finish1, finish2)
[Figure: circuit for the conditional; wires: start, finish, emit.]
Compiling Parallel Composition
compile (p :|: q) start = (finish, emit)
  where
    (finish1, emit1) = compile p start
    (finish2, emit2) = compile q start
    emit             = or2 (emit1, emit2)
    finish           = synchroniser (finish1, finish2)
[Figure: circuit for parallel composition with a synchroniser; wires: start, finish, emit.]
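The slides call a circuit named synchroniser here but only define it as a Mini-Flash macro, so its gate-level form is not shown. The following is one hypothetical way such a circuit might be wired, not the author's: it remembers in a one-bit latch per side whether that side has already finished, outputs high in the tick where both have finished, and then clears the latches. Signals are again modelled as lazy lists of Booleans, an assumption made only so that the sketch runs without Lava.

```haskell
-- Assumption: a signal is a lazy list, one Bool per clock tick (not Lava's Signal).
type Signal = [Bool]

low :: Signal
low = repeat False

inv :: Signal -> Signal
inv = map not

and2, or2 :: (Signal, Signal) -> Signal
and2 (a, b) = zipWith (&&) a b
or2  (a, b) = zipWith (||) a b

delay :: Signal -> Signal -> Signal
delay ini s = take 1 ini ++ s

-- Hypothetical synchroniser: high in the tick where both sides have finished.
synchroniser :: (Signal, Signal) -> Signal
synchroniser (f1, f2) = both
  where
    done1 = or2 (seen1, f1)                   -- side 1 finished now or earlier
    done2 = or2 (seen2, f2)                   -- side 2 finished now or earlier
    both  = and2 (done1, done2)
    seen1 = delay low (and2 (done1, inv both)) -- latch, cleared once both are done
    seen2 = delay low (and2 (done2, inv both))
```

Every feedback path goes through a delay, so this wiring introduces no combinational loop of its own.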
Compiling Loops
compile (While c p) start = (finish, emit)
  where
    start1          = or2 (start, finish1)
    (finish1, emit) = compile p (and2 (c, start1))
    finish          = and2 (inv c, start1)
[Figure: circuit for the loop; wires: start, finish, emit.]
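The compiler clauses above can be run as an ordinary Haskell program if Lava's signals are replaced by a simple model. The sketch below assumes a signal is a lazy list with one Boolean per clock tick and supplies its own low, high, inv, and2, or2 and delay for that model. Parallel composition is left out because the slides do not give the synchroniser circuit. The macros emit1, forever, wait and oscillator are written as plain Haskell definitions, and the fixity declaration for :>: is an assumption.

```haskell
-- Assumption: a signal is a lazy list, one Bool per clock tick (not Lava's Signal).
type Signal = [Bool]

low, high :: Signal
low  = repeat False
high = repeat True

inv :: Signal -> Signal
inv = map not

and2, or2 :: (Signal, Signal) -> Signal
and2 (a, b) = zipWith (&&) a b
or2  (a, b) = zipWith (||) a b

delay :: Signal -> Signal -> Signal
delay ini s = take 1 ini ++ s

infixr 5 :>:   -- assumed fixity, so that a :>: b :>: c needs no brackets

data MiniFlash
  = Skip | Delay | Emit
  | MiniFlash :>: MiniFlash
  | IfThenElse Signal (MiniFlash, MiniFlash)
  | While Signal MiniFlash

-- start -> (finish, emit)
type MiniFlashCircuit = Signal -> (Signal, Signal)

compile :: MiniFlash -> MiniFlashCircuit
compile Skip  start = (start, low)
compile Delay start = (delay low start, low)
compile Emit  start = (start, start)
compile (p :>: q) start = (finish, or2 (e1, e2))
  where
    (middle, e1) = compile p start
    (finish, e2) = compile q middle
compile (IfThenElse c (p, q)) start = (or2 (f1, f2), or2 (e1, e2))
  where
    (f1, e1) = compile p (and2 (start, c))
    (f2, e2) = compile q (and2 (start, inv c))
compile (While c p) start = (finish, emit)
  where
    start1          = or2 (start, finish1)
    (finish1, emit) = compile p (and2 (c, start1))
    finish          = and2 (inv c, start1)

-- Macros from the earlier slides, as Haskell definitions.
emit1 :: MiniFlash
emit1 = Emit :>: Delay

forever :: MiniFlash -> MiniFlash
forever = While high

wait :: Signal -> MiniFlash
wait s = While (inv s) Delay

oscillator :: MiniFlash
oscillator = forever (Delay :>: emit1)
```

Feeding a one-tick start pulse (True : repeat False) to compile oscillator and taking a prefix of the emit signal gives the output wire tick by tick. In this model a loop whose body takes no time, such as While high Skip, makes the lists depend on themselves and evaluation does not terminate, which is the list-level counterpart of the combinational loop warned about earlier.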
Sanity Check for Loop Body
takesTime Skip = False
takesTime Emit = False
takesTime Delay = True
takesTime (IfThenElse _ (p, q)) = takesTime p && takesTime q
takesTime (p :>: q) = takesTime p || takesTime q
takesTime (p :|: q) = takesTime p || takesTime q
takesTime (While _ _) = False
Warning! This may give false negatives.
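To see the check in action, the following self-contained sketch repeats takesTime and adds a helper, safeLoops, which is not in the slides: it is a hypothetical traversal that applies takesTime to the body of every loop in a program. Signal is modelled as a list of Booleans only so that the data type can be written down without Lava.

```haskell
-- Assumption: Signal is a placeholder model; the check never inspects conditions.
type Signal = [Bool]

data MiniFlash
  = Skip | Delay | Emit
  | MiniFlash :>: MiniFlash
  | IfThenElse Signal (MiniFlash, MiniFlash)
  | While Signal MiniFlash
  | MiniFlash :|: MiniFlash

-- True only if the program is certain to take at least one clock tick.
takesTime :: MiniFlash -> Bool
takesTime Skip                  = False
takesTime Emit                  = False
takesTime Delay                 = True
takesTime (IfThenElse _ (p, q)) = takesTime p && takesTime q
takesTime (p :>: q)             = takesTime p || takesTime q
takesTime (p :|: q)             = takesTime p || takesTime q
takesTime (While _ _)           = False

-- Hypothetical helper: every loop body, at any depth, passes takesTime.
safeLoops :: MiniFlash -> Bool
safeLoops Skip                  = True
safeLoops Delay                 = True
safeLoops Emit                  = True
safeLoops (p :>: q)             = safeLoops p && safeLoops q
safeLoops (p :|: q)             = safeLoops p && safeLoops q
safeLoops (IfThenElse _ (p, q)) = safeLoops p && safeLoops q
safeLoops (While _ p)           = takesTime p && safeLoops p
```

An example of a false negative: for any signal c, the program IfThenElse c (Delay, Skip) :>: IfThenElse c (Skip, Delay) always takes a clock tick, because whichever way c goes in the starting tick exactly one of the two conditionals delays, yet takesTime rejects it since neither conditional on its own is guaranteed to take time.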
Extending the Language
Assignments
Extending the language to still have one output variable, but set using assignments rather than Emit signals.
Changes the shape of the circuits produced:
[Figure: circuit block with inputs start and inputs, and outputs finish, value and assign.]
assign: high when the output is being assigned a value.
value: contains the value which is being assigned.
Assignments
Combining the output of two circuits now requires more logic:
[Figure: logic combining (value1, assign1) and (value2, assign2) into (value, assign).]
What happens if two parallel blocks assign to the variable at the same time?
Assignments
At the top level, we can extract the actual value of the output:
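[Figure: the circuit block (inputs start and inputs; outputs finish, value, assign) with value and assign feeding a register reg, whose output is the actual value.]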
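The slides show the combining logic and the top-level register only as pictures, so the following is a guess at their behaviour rather than the author's circuit. It assumes each circuit reports a pair (assign, value) per clock tick, with values modelled as Int. The function merge is a hypothetical combiner that gives priority to the first circuit when both assign in the same tick (one possible answer to the question raised earlier), and actual models the register, with an assumed initial value of 0.

```haskell
-- Assumption: per clock tick, a circuit reports (assign, value).
type Out = [(Bool, Int)]

-- Hypothetical combiner: assign if either side assigns;
-- when both assign in the same tick, the first side wins.
merge :: Out -> Out -> Out
merge = zipWith pick
  where
    pick (a1, v1) (a2, v2) = (a1 || a2, if a1 then v1 else v2)

-- Top-level register: the value assigned in this tick if there is one,
-- otherwise the value held from the previous tick (assumed to start at 0).
actual :: Out -> [Int]
actual = tail . scanl step 0
  where
    step old (a, v) = if a then v else old
```

As with the register built earlier from a multiplexer and a delay, the new value is visible in the same tick in which it is assigned.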
Reading the Output Variable
We simply add another input wire (which could be bundled with the inputs we already have) and connect it at the top level:
[Figure: as before, with the actual value output of reg connected back as an extra input of the circuit block.]
Multiple Output Variables
Can be done by replicating the output wires:
[Figure: circuit block with inputs start and inputs, and outputs finish, emit1, emit2, …, emitn.]
Multiple Output Variables
And replicating the logic used to combine the outputs.
[Figure: outputs emitP1, emitP2, …, emitPn of P combined with emitQ1, emitQ2, …, emitQn of Q to give emit1, emit2, …, emitn.]
Note that the number of outputs has to be statically determined at compile-time!
Other Possible Extensions
Channel communication.
Common blocks of code (reusing the same hardware is not always possible).
Adding in-built functions (e.g. arithmetic).
Dynamic variable allocation (well, sort of).
Conclusions
Hardware Compilation
So algorithms can be compiled directly into circuits.
The extra start and finish wires play the role that the program counter plays in software.
After all, hardware is not so different from software.
Other Issues Arising
Compact placement of gates is an NP-hard problem. How do we decide placement at compile time?
Some compilation/synthesis techniques compile algorithms without explicit timing.
The compilation presented here is very naïve. How can one optimise (number of gates, power consumption, combinational depth, etc)?
A single algorithm can now be realised in a combination of software and hardware. How can one partition code for software-hardware codesign effectively?
Extra slides
Example: Waiting for an edge
Detect a rising edge:
rising in ≡ forever { wait (inv in); wait in; emit1 }
A falling edge:
falling in ≡ rising (inv in)
Any edge:
edge in ≡ rising in || falling in