AND/OR gates, Boolean algebra - Massey Universitymjjohnso/notes/59233/all.doc · Web viewPhase 1: Analyse the problem in terms of its inputs Generate an architecture Phase 2: Design

Combinatorial DevicesA digital circuit can be in one of two states 0 or 1.

0-1 volts -> logic 02-5 volts -> logic 1

A gate is a simple electronic device that can be used to compute various combinations of logic states.

A transistor can be thought of as a simple switch either closed or open.

Gates are made from transistors:

Invertor or NOT gate

NAND gate

NOR gate

159.233 Lecture 1 - 1

Logic symbols used for these combinations are:

A X0 11 0

A

B

X

NAND

A B X0 0 10 1 11 0 11 1 0

A

B

X

NOR

A B X0 0 10 1 01 0 01 1 0

159.233 Lecture 1 - 2

A

B

X

AND

A B X0 0 00 1 01 0 01 1 1

A

B

X

OR

A B X0 0 00 1 11 0 11 1 1

A NAND or a NOR gate can be built from just two transistors

AND or OR require three, so most logic is built from NAND’s and NOR’s

All logic circuits can be built just from NAND gates (or NOR)

eg an invertor:

A

1

X

NAND

159.233 Lecture 1 - 3

Boolean AlgebraAnother way to describe circuits is by using Boolean Algebra.

Variables (normally capital letters) can be either 0 or 1 and can be combined by :

AB -> A and B (sometimes written AB)

A + B -> A or B

-> not A

From an arbitrary truth table, we can generate a Boolean expression:

A B C0 0 10 1 01 0 11 1 1

Look at each row that produces a 1 in the C column. Form the expression using AND and NOT that generates a 1. OR all the rows that produce a 1 together:

C = + A + AB

To implement the function as a circuit:

159.233 Lecture 1 - 4

Use standard ANDs and ORs to start with. Use more than two inputs if necessary.

Convert to NANDs and NORs when finished.

Inverting circles can be added to/removed from either end of a line.

3 input devices can be formed from 2 input devices with the output combined with the third input

is the same as:

The following equivalence’s also allow us to simplify our circuits:

which are a result of De Morgan’s Law = +

Cache memories

Large memories DRAM (dynamic ram) are slow ~100ns

compared to our pipeline time of ~10ns

159.233 Lecture 1 - 5

Small memories SRAM (static ram) are fast ~10ns

but are ~10 times as expensive.

Memory is built using a hierarchy,

disk being the slowest and cheapest

then DRAM

then SRAM the fastest and most expensive.

A library is a good example of a memory hierarchy.

It is very important to have access to all books in a library, but at any one time you will only need a few books on one particular subject, and of those books maybe a few pages in each.

If you leave the books open, on your desk, at these particular pages, you have very fast access to the information.

If you are prepared to turn the pages you get slower access to other information in the book.

If you are prepared to go to the library you get even slower access to lots more information.

159.233 Lecture 1 - 6

The same principle applies to memory.

The Principle of Locality states that programs access a relatively small portion of their address space at any instant of time.

Temporal locality - if an item is referenced, it will tend to be referenced again soon

Spatial locality - if an item is referenced, items whose addresses are close by will tend to be referenced soon.

A Hit is when the processor requests data that is in the faster memory (cache).

Hit Rate is the percentage of accesses that are found in the cache.

Hit Time is the time taken to access the cache.

A Miss is when the processor requests data that is not in the cache.

Miss Rate is (1 - Hit Rate).

Miss Penalty is the time taken to move data from memory to cache.

The Block Size is the amount of data that is transferred from memory to cache when a miss occurs.

Typical values are:

Block size 4 to 128 bytesHit time 1 cycleMiss Penalty 8 cyclesMiss Rate 10%Cache Size 256KB

159.233 Lecture 1 - 7

When the processor wants to access data x5:

It first decides whether it is already in the cache.

If it isn’t it retrieves x5 from memory and inserts it in the cache.

X4X1

X3

X2

X4X1

X3

X2X5

cache before

cache after

How does the processor know if the data is in the cache?

How does the processor know its position in the cache?

The cache is ordered so that there is a relation between memory addresses and cache structure.

159.233 Lecture 1 - 8

Direct mapping

The position or index of a data item in the cache is determined directly from its memory address.

Consider a cache of 8 entries each entry 32 bits wide.

Direct mapping will place data from memory into locations in the cache at positions corresponding to the low-order 3 bits of their memory address:

location data cache index000000 23 000000001 44 001000010 13 010000011 48 011000100 17 100000101 56 101000110 44 110000111 22 111001000 39 000..010101 29 101..111111 33 111

the 3 low order bits give the position in the cache.

When the processor is first switched on, the cache will not contain valid data. So a valid bit is associated with each cache location, and is set when data is written.

159.233 Lecture 1 - 9

Lots of memory locations are mapped to the same cache location.

We also need to know which location is currently in the cache.

We do this with a cache tags.

The cache tag is the remaining bits of the address:

cache cache cache correspondingtag index data memory000 000 23 000000100 001 28 100001011 010 34 011010000 011 48 000011000 100 17 000100010 101 29 010101100 110 92 100110111 111 33 111111

To discover whether a data reference hits or misses the cache:

1 find the cache index of the address

2. match the cache tag with the cache tag part of the address

100110cache index is 110 -> cache tag is 100cache tag part of 100110 is 100100 == 100 -> hit

010011cache index is 011 -> cache tag is 000cache tag part of 010011 is 010000 != 010 -> miss

159.233 Lecture 1 - 10

To increase the size of the cache we can either increase the number of positions, or increase the size of the data stored at those positions

Example of a cache with 32 entries each 32 bytes of data:

cache tagcache index

byte select

31 9 4 0

Valid cache tagcache data

0131

2 110237

36

0

1

2

31

102371

== ?

Increasing the number of bytes for each cache entry involves a penalty when there is a miss. More bytes need to be copied from memory into the cache. Spatial locality means that this is not necessarily a bad thing!

Using the cache index to find the correct data line before comparing the cache tag involves two levels of computation.

159.233 Lecture 1 - 11

A fully associative cache combines the cache index with the cache tag.

cache tagbyte select

31 4 0

Valid cache tagcache data

0131

1482933

364829331 == ?

7562 == ?

33561 == ?

11335 == ?1

1

1

To find out whether the data is in the cache:

compare the cache tag part of the memory address with all the cache tags in the cache.

This matching is done in parallel in a fully associative cache.

159.233 Lecture 1 - 12

Cache replacement:

In a directly mapped cache:

If there is a cache miss then the cache is updated and the new entry replaces the old.

In a fully associative cache:

If there is a cache miss then the new entry can be placed anywhere.

Where do we put it?

Using the Least Recently Used algorithm - hardware keeps track of which cache entry has not been accessed for the longest time.

This can be tricky to implement!

159.233 Lecture 1 - 13

A simple pseudo LRU algorithm is:

Keep an index.If the cache at that value is accessed add one to the index.If a replacement is needed choose the cache entry at the index.

Cache writes:

Reading cache is straight forward. Writing to cache is not.

How do we keep cache data and memory data the same?

Write Back policy:

Keep an extra bit in each cache entry to mark whether the entry has been written to.

When the cache entry is replaced, write it back to memory if it has changed.

Write Through policy:

Always write to memory as well as to cache.Use a write buffer between memory to speed up memory access.

Full Adder

A full adder adds two binary numbers (A,B) together and includes provision for a carry in bit (Cin) and a carry out bit (Cout).

The truth table for a full adder is:

A B Cin Cout Sum0 0 0 0 0

159.233 Lecture 1 - 14

0 0 1 0 10 1 0 0 10 1 1 1 01 0 0 0 11 0 1 1 01 1 0 1 01 1 1 1 1

->Sum = A BCin + AB Cin + A B Cin + ABCinCout = ABCin + A BCin + AB Cin + ABCin

159.233 Lecture 1 - 15

DecoderA decoder accepts a binary encoded number as input and puts a logic 1 on the corresponding output line.

For 2 inputs -> 4 output lines3 inputs -> 8 output lines

eg for 3 inputs with the signal 101 on them:the 5’th output line will be 1, the rest 0

DecoderInputs

Outputs

I1 I0 A B C D0 0 1 0 0 0 A = I1 I00 1 0 1 0 0 B = I1 I01 0 0 0 1 0 C = I1 I01 1 0 0 0 1 D = I1 I0

AI1

B

C

D

I0

I0

I0 I0I1 I1

I1

159.233 Lecture 1 - 16

Decoder with output enable

DecoderInputs Outputs

OE

All outputs will be 0 unless OE (Output Enable) is Low

OE I1 I0 A B C D0 0 0 1 0 0 0 A = I1 I0 OE0 0 1 0 1 0 0 B = I1 I0 OE0 1 0 0 0 1 0 C = I1 I0 OE0 1 1 0 0 0 1 D = I1 I0 OE1 0 0 0 0 0 01 0 1 0 0 0 01 1 0 0 0 0 01 1 1 0 0 0 0

Design a simple decoder for 4 bit numbers....

159.233 Lecture 1 - 17

Multiplexor

Chooses one input from many and copies it to the output.

Inputs

Control inputs address

Output

OE

Common configurations are:

4 inputs - 2 address lines8 inputs - 3 address lines

The Output Enable line is the same as before

The truth table becomes unwieldy with so many variables, and is mostly empty, so we cut out all the uninteresting cases:

OE C1 C0 Q0 0 0 I00 0 1 I10 1 0 I20 1 1 I3

Q = OE ( C1 C0I0 + C1C0I1 + C1 C0I2 + C1C0I3)

159.233 Lecture 1 - 18

C0C1 C1 C0

I0

I1

I2

I3

OE

159.233 Lecture 1 - 19

Demultiplexor

1 input is copied to one of several outputs

Common configurations are:

1 input - 2 address lines - 4 outputs1 input - 3 address lines - 8 outputs

Input

Control inputs address

Outputs

Consider the decoder. It is equivalent, if we replace the input with OE

Glitches

So far we have assumed that all logic devices are infinitely fast.

Outputs immediately reflect inputs.

In practice this is not the case. There is a short delay between an input logic level changing and the corresponding output change.

Gate delays for TTL are typically 5 nanoseconds.

20 cm of wire will also delay a signal by 1 nanosecond.

159.233 Lecture 1 - 20

A1

A

A + A = TRUE

However consider what happens when the signal A goes from 1 to 0

A

A

A + A

This spurious 0 is called a glitch.

These glitches may or may not have disastrous effects.

It is possible to design circuits that have no glitches. This requires extra hardware and considerable effort.

Another approach is to design circuits that are insensitive to glitches. Wait a fixed time after a change - during which the glitches will go away.

This idea is the basis for clocked circuits that we will come to later.

159.233 Lecture 1 - 21

Memory

Feedback

If we loop the output of a circuit back into its input, we have a condition called feedback, and some interesting results are obtained.

This circuit will oscillate rapidly

Will eventually settle on one input 0 and the other 1 (either way)

159.233 Lecture 1 - 22

RS Flip-flopS

R

X

Q

Consider what happens when S and R are both 0

If Q is 0:the top NOR will have 0 0 as inputs and will output 1the bottom NOR will have 1 0 as inputs and will output 0

=> the state is stable

if Q is 1:the top NOR will have 0 1 as inputs and will output 0the bottom NOR will have 0 0 as inputs and will output 1

=> the state is stable

If S is switched to a 1the top NOR will have 1 X as inputs and will output 0the bottom NOR will have 1 0 as inputs and will output 1

=> Q is forced to be a 1 (S stands for Set)

If R is switched to a 1the bottom NOR will have 1 X as inputs and will output 0

=> Q is forced to be a 0 (R stands for Reset)

159.233 Lecture 1 - 23

RS Flip-flops are useful for switch debouncing.

Vcc

Switches make and break several times while in close contact.

S

R

X

Q

Vcc

Q will be Set to 1 if switch makes contact with S

Q will stay at 1 despite make/breaks on S until switch touches R

159.233 Lecture 1 - 24

Circuits can use a clock to provide timing pulses. Logic levels can be read on the clock pulse, which is allowed to go high only when glitches are unlikely.

The RS Flip-flop can be clocked.

S

R

X

Q

Clock

The output of the AND gates will only reflect S and R when the Clock is 1

When the Clock is 0 the outputs of the AND gates are always 0

The X output of a RS Flip-flop is normally Q

One problem with the RS Flip-flop is what happens when R and S are 1.

In this state the only stable state is both Q and Q 0 !

If both R and S return to 0 simultaneously Q jumps to 1 or 0 at random

otherwise Q will reflect which one returned to 0 last.

159.233 Lecture 1 - 25

Clocked D Flip-flopThe D flip-flop removes this ambiguity

DQ

Q

Clock

The Clock can now be considered a Strobe. A single clock pulse.

When the strobe is high, the value of D will be transferred to Q

Q will remain at that value until the strobe returns to a high state.

The D Flip-flop acts as a simple 1 bit memory.

As described, this type of flip-flop is level triggered. It depends on the strobe or clock being in a high state.

In practice this is not so desirable. It is better to transfer the value of D to the flip-flop when the strobe changes from a 0 to a 1

This is called edge triggered.

Edge triggered flip-flops are far more common.

They can be either 0 ->1 rising edge, or 1->0 falling edge

D

Clock

Q

159.233 Lecture 1 - 26

Clocked J-K Flip-flop

Rather than removing the ambiguity of the RS flip-flop as in the D flip-flop, the J-K flip-flop makes use of it.

R Q

S Q

J

K

J is equivalent to s. When J = 1 => Q becomes 1K is equivalnet to R. When K = 1 => Q becomes 0

When J=K=1 => Q becomes Q

The J-K flip-flop toggles the output when J=K=1.

Given a regular clock and J=K=1 the Q output will be a clock with half the frequency.

159.233 Lecture 1 - 27

Register

We construct a register by connecting together a group of D flip-flops

D Q D Q D Q D Q

Data in

Q0Q1Q2Q3

All the flip-flops load simultaneously from the data in lines

159.233 Lecture 1 - 28

BusA bus is a common set of wires connecting multiple devices together.

Buses are used to connect the data and address lines between computers, memory and i/o devices.

One problem with normal logic components is that their outputs are always active - either a 1 or a 0 (5V or Gnd).

If we connect the output of two gates together, where one output is a 0 and the other is a 1, then the result will be somewhere in between

If we want to use a bus architecture we must make sure that only one device at a time outputs a logic signal of 0 or 1

All other devices must go into a high impedance state

ie they look like they are not connected at all.

Tri-state devices have this capability.

A tri-state device is like a normal gate but with the added capability of going into this off-line mode.

In tri-state devices the OE output-enable line drives the outputs into this third state rather than into the 0 logic state as previously.

Not all devices are tri-state. To connect these devices to a bus we need to put a tri-state driver chip in between the device and the bus.

A tri-state driver just sends its input to its output, but also has an OE input to allow the output to go ‘not connected’.

OE

RAM - Random Access Memory

Static memory is an array of D flip-flops.

159.233 Lecture 1 - 29

The access time is the same for all bits in the array.

Most memory chips are 1 bit wide, a byte is 8 chips side-by-side.

1st chip

2nd chip

3rd chip

etc..

(Some memory chips are several bits wide - normally a power of 2)

The desired bit is selected by the use of address lines.

159.233 Lecture 1 - 30

A typical memory device - 1 bit wide

CS R/W

Data

Address

The CS Chip Select line acts as an Enable line to the chip(when several banks of chips are being used)and also to tri-state the Data line depending on how it is being accessed

The R/ W determines whether the memory is written to or read from.

When memory is being read the Data line acts as an output

159.233 Lecture 1 - 31

Write access to memoryWe need to: select 1 flip-flop

perform the write on a clock pulse

the pulse must arrive after the chip is selected and a write operation specified.

Use a decoder and the address lines to select a particular flip-flop

AND this signal with the CS, R/ W to act as a clock to strobe the data in.

D

D

D

DDecoder

R/W CS

A0 A1

Data in

159.233 Lecture 1 - 32

Read access from memory

D

D

D

DDecoder

R/W CS

A0 A1

Data in

Q

Q

Q

Q

Data out

A0A1

mux

Place the correct bit on the output line by using a multiplexorUse the CS, R/ W to enable a tri-state buffer.

159.233 Lecture 1 - 33

1Mb MemoriesWe can make the above memory larger by making it Byte wide rather than bit wide.

This just means we have to copy 8 times the flip-flop column and its associated data in and data out lines.

The lines from the decoder and CS/RW logic go to each of the flip-flop columns.

For large Bit wide memories, we soon run into large numbers of pins to connect to the chip.

1Mbit needs:20 address lines1 data line1 CS line1 R/W line2 power/ground

The decoder within the chip will need 1 million output lines!

To get round these problems, we arrange the memory as a 2 dimensional grid, 1024 x 1024 elements.

We use 2 decoders and send the address in 2 chunks - the row and column addresses.

1Mbit now needs:10 address lines1 data line1 CAS/RAS strobe (column/row strobe)1 CS line1 R/W line2 power/ground

The 2 decoders now have only 1024 lines each as outputs.

Shift Registers

159.233 Lecture 1 - 34

Contain several flip-flops in a row. One bit is input at one end on each clock pulse. Each other bit moves one place to the right (or left). The outputs of each flip-flop are available simultaneously.

D Q D Q D Q D Q

Q0Q1Q2Q3

Data

Clock

We can use shift registers for serial to parallel conversion. Input 8 bits on 8 pulses, then read data simultaneously.

Division by 2

We can use the above principle to implement a divide by 2 circuit.

In the parallel load register (lecture 4) we can use an extra control line

shift/ load

to determine whether the input to a flip-flop comes from the data lines or from the previous flip-flop (the left-most flip-flop gets a 0 in shift mode)

(use the control line as the address of a 2 input multiplexor, the output is connected to the input of the flip-flop)

Try this as an exercise.

(design your own 2 input mux using AND/OR gates)

159.233 Lecture 1 - 35

Multiplication by 2

Multiplication can be done by shifting to the left.

Higher powers of two can be done by shifting several times.

An extra flip-flop can be added to the ends of the chain.

In the case of division by 2, this flip-flop will contain the remainder.

In the case of multiplication the flip-flop will flag whether an overflow has occurred.

The 74LS75B is a parallel load/shift right register 4 bits wide, which (with external assistance) can do shift left as well.

159.233 Lecture 1 - 36

CountersAdd one to the current number.

0 0 00 0 10 1 00 1 11 0 01 0 11 1 01 1 1

For the low order bit:

just swap between 0 and 1 - ie it toggles

For higher order bits:the n’th bit toggles when the n-1’th bit changes from 1 to 0

Use a J-K flip-flop (falling edge triggered) as a toggle.

CLK

Q

Use the Qn-1 output as a clock to the Qn flip-flop

CLK

5v

J Q

Q0

K

J Q

Q1

K

J Q

Q2

K

J Q

Q3

K

159.233 Lecture 1 - 37

CLK

Q0

Q1

Q2This is an ideal case - unfortunately the output of a flip-flop, like any gate, is slightly delayed with respect to a changing input.

So the toggling of Q1 will be delayed by a few nanoseconds.

Q2 will only toggle on a change in Q1 so it will be delayed by twice the amount.

This effect will keep on compounding.

These glitches can have quite a disturbing effect.

Consider the case when the counter is at 0111

At the next pulse we would expect the counter to show 1000

However because of the delays the following patterns will all occur

0110 Q0 changes but Q1 hasn’t caught up yet

0100 Q1 has now changed but Q2 is still to flip over

0000 Q2 now has reacted but Q3 is still to change

1000 eventually we get the right answer.

Circuits can be designed so that this ripple effect can be safely ignored, However it is possible to design a counter that changes all the gates simultaneously.

159.233 Lecture 1 - 38

Synchronous Counters

We cannot use the preceding bit as a pulse.

The same pulse must be used for each flip-flop.

How do we know when to toggle a flip-flop?

The synchronous counter uses the following fact:

a bit will toggle if all the low-order bits in the previous state are 1

J Q Q Q Q

5v

Clock

K

J J J

K K K

Flip-flops toggle at the same time - but only if J-K’s are high.

J-K’s are formed by ANDing the output of the previous bits.

Because the new output comes after the toggle by a few nanoseconds

The new state is not involved in any of the inputs

The AND gates are sampling the previous state.

We are deliberately using to our advantage the gate delay time.

159.233 Lecture 1 - 39

Synchronous up-down counter

As an exercise create a synchronous down counter.

(remember that the J-K flip-flop also has a Q output)

Using a control line

up/ down

we can also create a synchronous up-down counter try it!

Algorithmic State Machines (ASMs)

The functionality of complex machines is divided into two parts:

An architecture - details of the components that performactions, and their interconnections

A controller - instructs the architecture to perform particularactions, and when to do them.

Architectures can be of many different types:

Digital eg a CPU

Mechanical eg a lift

Electrical eg a set of traffic lights

Hydraulic etc

For a computer processor:159.233 Lecture 1 - 40

Controller processor

memory

commands

status

Machine instructions are stored in the memory

The controller executes a loop performing these actions:

tell the processor to fetch an instruction from memory

tell the processor to perform the actions requested by the instruction

159.233 Lecture 1 - 41

status lines contain:

The instruction

Other status information eg

Z flag

NEG flag

OVFL flag

which will give the state of the accumulator after the instruction

Using these status flags the controller can follow different paths through the program

command lines allow the controller to perform specific actions

The processor architecture will contain specific logic elements eg

registersmultiplexorsadders

Command lines will control these devices:

Load a registerOutput enable a registerSelect a multiplexors inputRead/write enable memory

159.233 Lecture 1 - 42

The controller is implemented as a state machine

Z

Q

T

F

Each state starts in a square box and lasts until you reach the next square box.

159.233 Lecture 1 - 43

State 0

State 1

State 2

Z

Q

T

F

Everything within a state happens simultaneously

States can be numbered arbitrarily but it makes sense to try and keep to some pattern and to label the most common state 0

Command lines (outputs), listed in the square boxes, are true (1) when the controller is in that state false (0) otherwise.

Status lines (inputs) are tested in diagonal boxes. The two paths leading away from the test will be labelled with either T or F.

Outputs listed in rounded boxes are only asserted if that path is taken.

The state machine goes from one state to another on the tick of a clock.

159.233 Lecture 1 - 44

How to implement an ASM:

1. Use a register to store the present state number

2. Use some combinatorial logic to calculate what the next stateshould be.

3. On the tick of the clock transfer that new state into the register

4. repeat indefinitely

The register stores the number of the current state.

How many bits does it take to store a state?

If there are 2 states we only need 1 bit wide register

0 -> 01 -> 1

if there are 4 states we will need a 2 bit wide register

0 -> 001 -> 012 -> 103 -> 11

if there are 8 states we will need a 3 bit wide register

0 -> 0001 -> 001

etc

If we have seven states we will have to use 3 bits - one of the states will never occur.

159.233 Lecture 1 - 45

clock

status

inputs

outputs

next state

current state

Register

combinatorial logic for next state

We have a practical problem when we first start the machine:

Which state will it start in?

To make sure we don’t start in a state that could lead to self-destruction, we make sure we always start in state 0

This is achieved by holding a Reset line high when power is first applied, removing the Reset only after a little time has elapsed

Controlling the states in an ASM

The current state is stored in a register (a set of D flip-flops)

The next state is the result of combining the previous state with the inputs lines and status lines of the ASM.

We can draw up a table of current state / inputs / status / next state

159.233 Lecture 1 - 46

State 0

State 1

State 2

Z

Q

T

F

159.233 Lecture 1 - 47

Current state Next state ConditionAB AB

0 00 1 01 Z2 10

1 01 0 00 -2 10 0 00 -

In state 0 the B flip-flop -> Zthe A flip-flop ->

In state 1 the B flip-flop -> 0the A flip-flop -> 0

In state 2 the B flip-flop -> 0the A flip-flop -> 0

We could use a combination of logic gates to generate our new state - but this gets very messy for large numbers of states / inputs.

A very neat way of solving the problem is to use table look-up

Given our present state look it up in a table and read off the new state.

A multiplexor acts as a table look up.

159.233 Lecture 1 - 48

Current state

1 0 1 0

T F T F

0

1

2

3

0

1

2

3

0 0

We will need to include the Z input for the 0 state:

Current state

0 0

T F

0

1

2

3

0 or 1

0 0

T F

0

1

2

3

1 or 0

Z Z Z Z

Our complete look up table looks like this:

159.233 Lecture 1 - 49

Current state

A

T F

0

1

2

3

New A

A B

T F

0

1

2

3

New B

Z Z Z Z

B

Notice that although state 3 doesn’t exist we make it go to state 0.

Inputs will in general be some combination of logic symbols.

159.233 Lecture 1 - 50

Output signals

There are two standard ways of generating output signals from an ASM.

One way or the other will normally involve less chips.

Use one mux chip for each output.

Feed the current state into the mux. For each of the possible states use the correct inputs to give the desired output signal.

A

T F

0

1

2

3

Q

Z Z

B

Use a decoder chip on the current state.

Decoder

A

B

0

1

2

3

Z Q

A more complicated ASM159.233 Lecture 1 - 51

X

T

F

00

01

1011

T

T

F

F

F

T

T

F

Y

W

X

Z


0 00 1 01 Y2 103 11 X

1 01 0 002 10 W

2 10 0 00 T3 11 0 00 Z

2 10 ZX3 11

159.233 Lecture 1 - 52

In state 0 the B flip-flop -> Y + X -> X + Ythe A flip-flop -> + X -> X +

In state 1 the B flip-flop -> Fthe A flip-flop -> W

In state 2 the B flip-flop -> Fthe A flip-flop -> F

In state 3 the B flip-flop ->the A flip-flop -> ZX + -> X +

T FY Z

T FY Z

X

Current stateA

0

1

2

3

New A

B

A B

0

1

2

3

New B

Current state

XW

159.233 Lecture 1 - 53

A Multiplication Circuit

Phase 1:

Analyse the problem in terms of its inputs

Generate an architecture

Phase 2:

Design a controller (ASM)

Think in terms of normal long multiplication - but in binary not decimal

0111 multiplicand x 1010 multiplier ---- 0000 0111partial products 0000 0111 ------- 01000110 product

This way is not satisfactory:

We have to remember all the partial products before we can sum them to get the final product -> we would need lots of registers

Instead we can keep a running total in a single register.

If the multiplicand and multiplier are each n bits wide, the running total needs to be 2n bits wide.

159.233 Lecture 1 - 54

Multiplier

1. Start the running total with the multiplier in the low order bits.

2. Increment a loop counter.

3. Check the state of the lowest bit.

4. If it is a 1 add the multiplicand to the high order bits of the total.

5. Shift the running total 1 bit to the right.

6. Repeat from 2 until the loop counter gets to n.

The multiplier has now disappeared from the total.

The total itself has the first terms in the addition shifted down to the low order end

ie

Counter Low Add ShiftBit

1 00001010 -> 0 -> -> 00000101

2 00000101 -> 1 -> 00000101 -> 001110100111000001110101

3 00111010 -> 0 -> -> 00011101

4 00011101 -> 1 -> 00011101 -> 010001100111000010001101

01000110 -> answer

159.233 Lecture 1 - 55

159.233 Lecture 1 - 56


0 00 0 001 01 START

1 01 2 10 LOBIT3 11

2 10 3 11 -3 11 0 00 EQZ

1 01

159.233 Lecture 1 - 57

START T FLOBIT EQZ

A

0

1

2

3

New A

B

A

0

1

2

3

New B

B A

B 2bit register

EQZ

Decoder

A

B

0

1

2

3

CLRCNT

LDMPLR RSTASTART

INCRLDPRODSHIFTDONE

EQZ

Designing a Computer

von Neumann model

general purpose architecture

programmed by stored instructions

159.233 Lecture 1 - 58

linearly addressed memory

program and data stored in memory

results generated into an accumulator by an ALU

The Pico-computer architecture

Program counter

Accumulator

Arithmetic Logic Unit

A Buffer Register

Input Port

const mux

Memory

Memory Address Register

EQZ

1 0

Bus

Tristate buffer

Tristate buffer

Tristate buffer

Tristate buffer

OEPC

INCPCLDPC

OECONST

CO/C1

OEACC

LDACC

OEPORT

sub

OEMEMWEMEM

LDMARLDABR xor invertor

zero detect

Program Counter PC keeps track of current instruction

Arithmetic/logic ALU performs data manipulations - add/sub

Accumulator Acc temporary store for results of ALU

Input port IO interfaces to outside world

Const mux easy access to the constants 0 and 1159.233 Lecture 1 - 59

Buffer register ABR buffer for the ALU

EQZ TRUE if accumulator is zero

Memory address MAR select a particular address in memoryregister

Memory 256 8 bit bytes

Bus 8 bits wide

159.233 Lecture 1 - 60

Design of the ALU

Accepts two numbers from the Bus

Calculates sum or difference - specified by the controller

Stores result in accumulator

Addition:

Acc

Adder

ABR

LDACC

LDABR

Use an 8 bit adder

Put one operand on the Bus, assert LDABR to store it in ABR

Put other operand on the Bus

Load result of adder in accumulator by asserting LDACC

159.233 Lecture 1 - 61

Subtraction:

a - b

using 2’s complement notation =>

a + (2’s complement)b

2’s complement b = complement b + 1

Use an Exclusive OR gate as a programmable complementor:

s b EOR0 0 00 1 11 0 11 1 0

When s = 0 the output is b

When s = 1 the output is b

so for a b that is 4 bits wide:

complement enable

b b or b

159.233 Lecture 1 - 62

To subtract use the subtract control line as a complement enableand also as a Carry in to the adder => complement + 1

Acc

Adder

ABR

LDACC

LDABR EOR array

C

sub

We now need to add a zero detect circuit to the Accumulator, and a tri-state gate to attach the accumulator to the Bus.

zero detect:

Accumulator

EQZ

159.233 Lecture 1 - 63

The complete ALU with signals that the controller needs

Acc

Adder

ABR

LDACC

LDABR EOR array

C

sub

zero detectEQZ

OEACCTri-state buffer

The Controller

The controller is an ASM that implements the instruction set of the computer

When the power is first turned on, and after each instruction has completed the controller:

uses the value in the PC to address memory

the value at that memory location is the next instruction

issues a sequence of signals to the architecture correspondingto the instruction

159.233 Lecture 1 - 64

The input signals to the controller are:

The current instruction

EQZ

The output signals generated by the controller are:

LDABR load a buffer registersub subtract instead of addLDACC load the accumulatorOEACC output enable the accumulatorOECONST output enable the constantC0/C1 select between constant 0 or 1OEPC output enable the Program CounterRESET reset line to gatesLDPC load the PCINCPC increment the PCWEMEM write enable memoryOEPORT output enable the I/O portOEMEM output enable memoryLDMAR load memory address registerDISABLE disables the clock (user re-enables it)

159.233 Lecture 1 - 65

Instruction set

The pico-computer only has 8 instructions:

0 LDA operand Load value at location operand into Acc

1 ADD operand Add value at location operand to Acc

2 SUB operand Subtract value at location operand from Acc

3 STA operand Store value in Acc at location operand

4 JPZ operand If value in Acc is 0 jump to operandotherwise proceed to next instruction

5 IN Input a number from I/O port

6 CLR Load accumulator with 0

7 HLT Halt the clock

So a set of instructions to input three numbers and add them together is:

5 input number to acc

3 100 store acc to location 100


3 101 store acc to location 101


1 101 add value from location 101

1 100 add value from location 100

7 Halt

159.233 Lecture 1 - 66

Instructions in detail

For all these instructions we may assume that the PC has already been incremented

0 LDA operand

Algorithm:

1 Output enable const mux (0) onto bus

Load bus (0) into ABR

2 Load MAR with location of address of operand

3 Load MAR with address of operand

4 Output enable memory

Load Acc with sum of ABR (0) and bus

OECONST LDABR

OEPC LDMAR INCPC

OEMEM LDMAR

OEMEM LDACC

LDA

159.233 Lecture 1 - 67

1 ADD operandAlgorithm:

1 Output enable accumulator onto bus

Load bus (accumulator) into ABR



4 Output enable memory

Load Acc with sum of ABR (accumulator) andbus

OEACC LDABR

OEPC LDMAR INCPC

OEMEM LDMAR

OEMEM LDACC

ADD

2 SUB operandAlgorithm:


Load bus (accumulator) into ABR



4 Output enable memory onto bus

Load Acc with sum of ABR (accumulator) andbus

Subtract line enable

OEACC LDABR

OEPC LDMAR INCPC

OEMEM LDMAR

OEMEM LDACC SUB

SUB

159.233 Lecture 1 - 68

3 STA operandAlgorithm:




Write-enable memory

OEPC LDMAR INCPC

OEMEM LDMAR

OEACC WEMEM

STA

4 JPZ operandAlgorithm:

1 Load PC into MARincrement PC in case jump doesn’t occur

2 Output enable memory (operand) onto bus

If EQZ (accumulator contents) enable LDPCto load operand into PC

OEPC LDMAR INCPC

OEMEM

JPZ

T

FEQZ

LDPC

159.233 Lecture 1 - 69

5 INAlgorithm:



2 Halt the clock

Output enable the I/O port onto the bus

Load the accumulator with the sum of ABR (0)and the value from the I/O port

won’t take effect until user starts clock

OECONST LDABR

OEPORT LDACC DISABLE

IN

6 CLRAlgorithm:




Load the accumulator with the sum of ABR (0)and the value on the bus (0)

OECONST LDABR

OECONST LDACC

CLR

7 HALTAlgorithm:1 Disable the clock HALT

DISABLE

Simple ASM for the Pico-computer159.233 Lecture 1 - 70

OECONST LDABR

OEPC LDMAR INCPC

OEMEM LDMAR

OEMEM LDACC

LDAOEACC LDABR

OEPC LDMAR INCPC

OEMEM LDMAR

OEMEM LDACC

ADDOEACC LDABR

OEPC LDMAR INCPC

OEMEM LDMAR

OEMEM LDACC SUB

SUB

OEPC LDMAR INCPC

OEMEM LDMAR

OEACC WEMEM

STA

F

OEPC LDMAR INCPC

OEMEM

JPZ

TEQZ

LDPC

OECONST LDABR

OEPORT LDACC DISABLE

INOECONST LDABR

OECONST LDACC

CLR HALT

DISABLE

INSTR

OEPC LDMAR INCPC

OEMEM LDIR

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22 23

159.233 Lecture 1 - 71

This ASM is straight forward, but it contains 24 states!

There are a lot of duplicates!

We can simplify the number of states

We also have a problem...

How do we get our program into the memory in the first place?

We need some extra hardware.

A toggle switch connected to the control line PLOAD (program load)

3 further switches:

Load address

Increment addr

Load memory

(together with our I/O port to enter the actual codes)

8 state ASM incorporating program load commands:

159.233 Lecture 1 - 72

DISABLEOECONST LDACC

SUB

OEMEM LDACC

OEPORT LDACC DISABLE OEACC

WEMEM LDPC

EQZ

LDMAR

JPZ

INSTR

OECONST OECONST OEACC

OEPC LDMAR INCPC

OEMEM

LDABR

INSTR

OEMEM LDIR

PLOADDISABLEPLOAD

ACTION

OEPORT LDMAR

INCMAR

OEPORT WEMEM

?

OEPC LDMAR INCPC

RESET

none

load addr

inc addr

load memory

hlt sta+jpz

add+subin+clrlda

0

1

2

3

4

5

6

7

n y

y

n

sta

pload.in

lda+add

subclrpload.hlt

y

n

lda+add+ sub+in+clr

159.233 Lecture 1 - 73

ASM transition table

Current state Next state ConditionABC ABC

0 000 1 001 -

1 001 2 0107 111 pload

2 010 3 011 lda+add+sub+in+clr6 110 hlt4 100 sta+jpz

3 011 4 100 lda+add+sub6 110 in+clr

4 100 5 101 -

5 101 6 1101 001 jpz

6 110 1 001 -

7 111 0 0007 111 pload

159.233 Lecture 1 - 74

State 0A = falseB = falseC = true

State 1A = ploadB = trueC = pload

State 2A = hlt + sta + jpzB = hlt + lda + add + sub + in + clrC = lda + add + sub + in + clr

State 3A = trueB = in + clrC = false

State 4A = trueB = falseC = true

State 5A =B =C = jpz

State 6A = falseB = falseC = true

State 7A = ploadB = ploadC = pload

159.233 Lecture 1 - 75

T F

AB 3 bit

register

0

1

2

3 New A4

5

6

7

ABC

C

0

1

2

3 New B4

5

6

7

0

1

2

3 New C4

5

6

7

lda

add

sub

sta

jpz

inclr

hlt

pload

decoder

inst register

data bus

ABC

ABC

159.233 Lecture 1 - 76

Microprogramming

The Pico-computer uses hardware to implement its controller.

Combinations of status lines are used to create the next state of the ASM, and multiplexors choose between them given the current state.

The mux’s are performing a ‘table-lookup’- a job that could just as easily be performed by a ROM.

- The Microprogrammed approach.

Benefits:

If the instruction set needs to be altered, or corrections made, it is a lot easier to change a ROM, than to change or add to the combinatorial logic of the hardwired approach.

A processor can be made to look like, ‘emulate’, another processor, by changing the micro-code, an instruction set of another processor can be implemented (as long as the architecture is compatible).

Extra instructions can be implemented depending on the type of program the processor is executing.

If the program is heavily maths based, we could implement one instruction that performs a Fast Fourier Transform on a set of numbers.

If its text based, we could implement a ‘strcpy’ instruction that performs a block move of data.

In both these instances we are only invoking 1 instruction instead of many. So the total cycle time will be reduced because of the smaller number of instruction fetches. (Data fetches and ALU operations will be the same)

159.233 Lecture 1 - 77

Micro-program controller

There is no longer an instr register. The µ-controller loads the opcode from the Bus when necessary into its own program counter.

The µ-program counter is reset to zero on power up. µ-addr - initially points to address 0 of µ-ROM.

µ-ROM contains a sequence of data bytes.These data bytes appear as output and control lines:

processor control linescondition selectnext µ-addropcode/(next µ-addr)

159.233 Lecture 1 - 78

The µ-PC is fairly complex:

The data stored in the counter can be either:

incremented if LD/(INC) is Falseor:

loaded from in if LD/(INC) is True

If LD is True, then the PC is either:

next µ-addr coming from ROMor:

the next opcode coming from the Bus

Depending on whether opcode/(next µ addr) is 1 or 0

The data stored in the counter changes on a clock tick.

out addresses ROM.

159.233 Lecture 1 - 79

Inside the ROM:

O D E I I C O L L L O W S O L N O E D D D E E A E D C N A A A S M M M B P P P S C C B U A E E L µ-addr cond opcode next C C C T C C R B R M M E sel /addr addr 0 1 . 8 . . . . . . . . . . . .goto fetch 1 1 . 10 . . . . . . . . . . . .goto LDA 2 1 . 14 . . . . . . . . . . . .goto ADD 3 1 . 18 . . . . . . . . . . . .goto SUB 4 1 . 22 . . . . . . . . . . . .goto STA 5 1 . 25 . . . . . . . . . . . .goto JPZ 6 1 . 29 . . . . . . . . . . . .goto CLR 7 1 . 31 . . . . . . . . . . . .goto HLTfetch 8 0 . 1 . . . . . . . 1 . . . mar=pc 9 1 1 . . 1 . . . . . . 1 . . pc++; goto Buslda 10 0 . . . . 1 . . 1 . . . . . abr=0 11 0 . 1 . 1 . . . . . 1 . . . mar=pc; pc++ 12 0 . . . . . . . . . 1 1 . . mar=m[mar] 13 1 . 8 . . . . . 1 . . . 1 . . a=abr+m[mar]; goto fetch

159.233 Lecture 1 - 80

O D E I I C O L L L O W S O L N O E D D D E E A E D C N A A A S M M M B P P P S C C B U A E E L µ-addr cond opcode next C C C T C C R B R M M E sel /addr addradd 14 0 . . . . . 1 . 1 . . . . . abr=a 15 0 . 1 . 1 . . . . . 1 . . . mar=pc; pc++ 16 0 . . . . . . . . . 1 1 . . mar=m[mar] 17 1 . 8 . . . . . 1 . . . 1 . . a=abr+m[mar]; goto fetchsub 18 0 . . . . . 1 . 1 . . . . . abr=a 19 0 . 1 . 1 . . . . . 1 . . . mar=pc; pc++ 20 0 . . . . . . . . . 1 1 . . mar=m[mar] 21 1 . 8 . . . . . 1 . 1 . 1 . . a=abr-m[mar]; goto fetchsta 22 0 . 1 . 1 . . . . . 1 . . . mar=pc; pc++ 23 0 . . . . . . . . . 1 1 . . mar=m[mar] 24 1 . 8 . . . . 1 . . . . . 1 . m[mar]=a;goto fetchjpz 25 0 . 1 . 1 . . . . . 1 . . . mar=pc; pc++ 26 2 . 28 . . . . . . . . . . . .if EQZ goto 28 27 1 . 8 . . . . . . . . . . . .goto fetch 28 1 . 8 . 1 . . . . . . . 1 . . pc=m[mar];goto fetchclr 29 0 . . . . 1 . . 1 . . . . . abr=0 30 1 . 8 . . . 1 . 1 . . . . . . a=abr+0;goto fetchhlt 31 1 . 8 . . . . . . . . . . . 1 disable;goto fetch

159.233 Lecture 1 - 81

Example - Instruction: LDA 100 at location 0.

µ-addr Condition opcode/ Next µSelect (next addr) address

0 1 0 8CS = 1 => LD is True; µ-PC will load data op/n = 0 => Data will come from Next µ addrNext µaddr = 8 => The µ-PC will be 8 next clock tick

8 0 0 0LDMAR;OEPC => mar = pc (=0)CS = 0 => µ-PC will inc to 9 on next clock tick

9 1 1 0INCPC;OEMEM => pc =1; Bus = mem[0]; (Bus=LDA = 1)CS = 1 => LD is True; µ-PC will load dataop/n = 1 => Data will come from Bus/opcodeBus = 1 => µ-PC will be 1 on next clock tick

1 1 0 10CS = 1 => LD is True; µ-PC will load dataop/n = 0 => Data will come from Next µ addrNext µ addr = 10 => The µ-PC will be 10 next clock tick

10 0 0 0LDABR;OECONST => Bus = 0; abr = Bus; (abr=0)CS = 0 => µ-PC will inc to 11 on next clock tick

11 0 0 0OEPC;INCPC;LDMAR => mar = pc (mar = 1); pc++; (pc = 2)CS = 0 => µ-PC will inc to 12 on next clock tick

12 0 0 0OEMEM;LDMAR => mar = m[mar] (mar = m[1] = 100)CS = 0 => µ-PC will inc to 13 on next clock tick

13 1 0 8OEMEM;LDACC => acc = abr + m[mar] (acc = 0 + m[100])CS = 1 => LD is True; µ-PC will load dataop/n = 0 => Data will come from Next µ addrNext µ addr = 8 => The µ-PC will be 8 next clock tick

159.233 Lecture 1 - 82

This type of ROM organisation is called Horizontal Microprogramming.

In the pico-computer there are 32 µ-instructions and each instruction contains 12 control lines.

These numbers are manageable. But a processor with 200 control lines and 4096 µ-instructions would require ~1Mbit of ROM.

This ROM has to be very fast and so is expensive.

One way of reducing the size of ROM is to consider how many output bits are active at any one time.

In the pico-computer there are 12bits of output so in theory there could be 212 different combinations.

In practice the ROM consists of common combinations of outputs, indeed there are only ~13 different outputs.

We can reduce the size of the ROM by encoding these 13 different outputs as a 4bit number, and decoding this number to generate the correct outputs

This is called Vertical Microprogramming - 200 control lines may be encoded using only 6 bits:

µ-ROM

decoding circuit

control lines

126

200

6

4096

159.233 Lecture 1 - 83

Vertical Microprogramming

This considerably reduces the amount of ROM needed, but at a cost - the hardware for the decoding circuit is fixed and so we lose some of the benefits of being able to program at this level.

One solution to this is to replace the decoding circuit by yet another ROM - nano-ROM.

Use table lookup to generate the control lines

µ-ROM

nano-ROM 200bit x 64

control lines

126

200

6

4096

We again have complete control over this programming level.

We have saved on ROM – together both ROMS need only ~35Kbits.

µ-ROM and nano-ROM can be replaced without having to do any hardware alterations.

One drawback is that ROM now needs to be twice as fast - as we are now doing two table lookups instead of one.

Construction of an ALU.

The picocomputer has a very primitive ALU. It consisted of just an Adder with the ability to subtract using 2’s compliment arithmetic. There was no Logic in it at all!

159.233 Lecture 1 - 84

The symbol for an ALU is normally written:

Function4-bit ALU

A B

C

Carry out

Where the Function input selects the desired combination of the two inputs.

If the Function input consists of 2 control lines, 4 functions are possible:eg:

00 Add C = A + B01 Subtract C = A - B10 And C = A and B11 Or C = A or B

If we increase the number to 3 control lines then further functions (8 in total) are possible - eg Xor

159.233 Lecture 1 - 85

ALU’s are constructed by creating each of the functions separately, and then using a multiplexor to select the desired output. The Function input is used for the input address lines of the mux.

add sub

and

or

sub

function

A

B

C

This ALU can be constructed using a 74153 4-input mux, a 7483 4-bit adder, and and or gates. The subtract function needs a little work. It has to generate the inverse of B, and the carry-in signal to the adder, as well as being used by the address lines of the mux.

f0

f1

sub carry in

B0

159.233 Lecture 1 - 86

Adders

There is one small problem with designing an n-bit adder. It is normal to design a 1-bit adder and then use n of them to handle n-bits - a ripple-carry adder

+a

b

carry in

carry out

sum

The problem is to do with the carry. Every time a signal has to go through a gate, it incurs a small delay. This gate delay is ~5 ns. The adder itself can be made from 2 levels of gates. The first level is an and array, this is then fed into the second level which is an or array.

To add 2 bits together takes 2 gate delays.

If we chain several 1-bit adders together to make an n-bit adder, we have to wait a considerable time for the carry out to propagate through. The carry out takes 2-gate delays for each single adder, and so will take 2*n gate delays to appear at the final bit.

For a 32-bit adder this will increase the time from 10ns to 320ns - 32 times slower!

159.233 Lecture 1 - 87

It is possible to speed up this process. Indeed, it is always possible to design a circuit that has only two levels of logic. But for a 32-bit adder, this is extremely hardware intensive.

Cin1 (the carry in of bit 1 ) =Cout0 (the carry out of bit 0)

Cout0 = (B0.Cin0) + (A0.Cin0) + (A0.B0)

=>

Cin1 = (B0.Cin0) + (A0.Cin0) + (A0.B0)

Cin2 (the carry in of bit 2) =Cout1 (the carry out of bit 1)

Cout1 = (B1.Cin1) + (A1.Cin1) + (A1.B1)

= (B1.B0.Cin0) + (B1.A0.Cin0) + (B1.A0.B0)+ (A1.B0.Cin0) + (A1.A0.Cin0) + (A1.A0.B0)+ (A1.B1)

This complexity can be simplified somewhat, by substituting

gi = ai.bi

which is true if bit i of the adder generates a carry out independent of carry in.

andpi = ai+bi

which is true if bit i of the adder propagates a carry in to carry out.

c1 = g0 + (p0.c0)c2 = g1 + (p1.g0) + (p1.p0.c0)c3 = g2 + (p2.g1) + (p2.p1.g0) + (p2.p1.p0.c0)c4 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0) + (p3.p2.p1.p0.c0)

159.233 Lecture 1 - 88

A carry is generated if carry in for some earlier bit is a one, and all intermediate bits propagate a carry. This is called a carry-lookahead adder

Now we have groups of 4-bits with the carry being handled as quickly as the addition.

To extend this scheme to more than 4-bits we can either:

block them together as before and let the carry ripple through

a partial carry-lookahead adder

or

use a higher level carry lookahead on each block of 4-bits

P0 = p3.p2.p1.p0

G0 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0)

C1 = G0 + (P0.c0)

and similarly for C2, C3 and C4

159.233 Lecture 1 - 89

Multipliers - Booth’s Algorithm

Booth observed that if we allow the ’adder’ to subtract as well as add, we can multiply numbers differently. When he invented this approach, shifting right was quicker than adding. If the multiplier contains sequences of 1’s or 0’s then we can reduce the number of additions.

Instead of just looking at the LOBIT of the shift register, we look at the LOBIT and the 'previous LOBIT'

The new algorithm is now:

Depending on the LOBIT and previous LOBIT do one of the following:

00: Middle of a string of 0’s - do nothing

01: End of a string of 1’s - add the multiplicand to the high orderresult

10: Beginning of a string of 1’s - sub multiplicand

11: Middle of a string of 1’s - do nothing

After this perform a one bit arithmetic right shift, and repeat n times.

An ASR shifts 1 bit to the right, but the high order bit inserted is the same as the previous high order bit

eg 0111 => 00111100 => 1110

The very first previous LOBIT is 0.

159.233 Lecture 1 - 90

Example of 2 * 6

Multiplicand Step Shift Regs0010 Initial values 0000 0110 0

1 0010 00: do nothing 0000 0110 00010 ASR 0000 0011 0

2 0010 10: sub mcand 1110 0011 00010 ASR 1111 0001 1

3 0010 11: do nothing 1111 0001 10010 ASR 1111 1000 1

4 0010 01: add mcand 0001 1000 10010 ASR 0000 1100 0

One side-effect of Booth’s algorithm is that it handles negative as well as positive numbers.

Using the previous algorithm, it is necessary to remember the initial signs, convert both numbers to be positive, and negate the final result if the two initial signs were different.

Instruction set design

The Pico-computer has 7 instructions, LDA IN ADD etc, they are of variable size – the IN instruction is just a single byte, but LDA takes 2 bytes. They also take a variable number of clock cycles IN takes 2 less cycles than LDA. They are also the idea of one person – others might have different ideas of what constitutes a good instruction set.

The controller specifies how the instructions are implemented by the hardware and in general they follow 5 basic steps:

Fetch: an instruction is brought into the control unit from memory.

Decode: The control unit decodes the instruction to determine the sequence of events necessary to execute it.

Memory: Any data necessary for the instruction are fetched from memory

159.233 Lecture 1 - 91

Execute: The operation is performed

Write: Results of the operation may be written to memory

Each machine code, expressed in Assembler language contains:

OpCode the operation to be performed andOperands the rest of the instructionAddresses some of the operands may be addresses of data

EgADD AX,[102]

The OpCode is ADD, there are two operands AX and [102], and one address [102]

Assembler language syntax is specific to the processor, and varies widely.Types of instructions

Five address machinesMachines with only one instruction type:

If we need to put all the information necessary into a single instruction, then it must contain the following:

OpCode Dest Addr, Src1 Addr, Src2 Addr, Cond Code, True Addr,False Addr

5 addresses -> 5 address machinee.g.

SUB [0],[1],[2],NZ,4,5

Which subtracts the value stored at location 2 from that at 1 and stores the result in 0. If the result is not zero, then the next instruction is at location 4 otherwise the next instruction is at 5.

159.233 Lecture 1 - 92

Some very early computers used this format, but it is not used today. It has the disadvantage that all instructions are very big. Addresses are normally 32 bits long, so the instruction will be ~160 bits long.

We can make the instructions shorter if we allow ourselves two types of instructions and 2 further registers:

159.233 Lecture 1 - 93

Three address machinesMachines with two instruction types.

Implement a Program Counter. If we order our program sequentially, then the False Addr can be set to default to the next instruction (PC + 1).

Implement a Flags Register. Condition Codes set by one instruction are remembered and can be tested by another.

The two instruction types are:

OpCode Dest Addr, Src1 Addr, Src2 Addr :Arithmetic/Logic instrs

OpCode Cond Code, True Addr :Control instructions

The longest instruction now has 3 addressese.g.

SUB [0],[1],[2]JNZ 4

159.233 Lecture 1 - 94

Two address machinesThree address machines still have a long instruction length (3 * 32 = 96), we can cut out the Dest Addr by assuming it is the same as one of the Src Addr.

This is often the case, but if it isn’t we need to invent an extra operation MOV.

We still have two types of instructions:OpCode Dest/Src1 Addr, Src2 Addr :Arithmetic/Logic instrs

:Memory/Memory move


The longest instruction now has 2 addresses -> 2 address machinee.g.

MOV [0],[1]SUB [0],[2]JNZ 4

Although this particular program is still the same length as the other two, in general programs will be shorter as several ALU instructions performed in sequence will only be ~64 bits. Shorter programs tend to be slower, but this is not always the case.

159.233 Lecture 1 - 95

One address machinesTwo address machines can be made even shorter, by implementing special temporary locations within the processor – Registers.

If there is only one register, its called the Accumulator. Several registers are called either by their special function or by a more general R1, R2 etc. There will only be a few registers ~32 so they can be encoded using only a few bits, rather than the 32 needed for memory.

We now have four types of instructions:OpCode Dest/Src1 Addr, Src2 Reg :Memory/Register moves


OpCode Dest/Src1 Reg, Src2 Addr :Register/Memory moves

OpCode Dest/Src1 Reg, Src2 Addr :Register/Memory ALU

The longest instruction now has 1 address -> 1 address machinee.g.

MOV A,[1]SUB A,[2]MOV [0],AJNZ 4

159.233 Lecture 1 - 96

Zero address machinesWe can’t have instructions that do not access memory – but we can restrict the ALU operations to be Register/Register.

We now have four types of instructions:

OpCode Dest/Src1 Addr, Src2 Reg :Memory/Register moves


OpCode Dest/Src1 Reg, Src2 Addr :Register/Memory moves

OpCode Dest/Src1 Reg, Src2 Reg :Register/Register ALU

e.g.

MOV A,[1]MOV B,[2]SUB A,BMOV [0],AJNZ 4

Often these machines are referred to as Load/Store. Only Registers are involved with the ALU. Some processors allow 3 registers to be specified for ALU operations, the MIPs processor that we will be looking at is one of them.

159.233 Lecture 1 - 97

Stack machine:

One, final, zero address instruction set is for a stack based machine


PUSH Src Addr :Stack operation

POP Src Addr :Stack operation

e.g.

PUSH [1]PUSH [2]SUBPOP [0]JNZ 4

There are a few stack machines around today – e.g. the Inmos Transputer.

Addressing modes

The Pico-computer has one addressing mode: Direct.

lda 100

jnz 4

Each instruction contains an address.

Either the address is used directly for the jnz instruction, or the contents of the address are transferred between memory and the accumulator.

There are several different assembler syntaxes for addresses

The pico-computer uses the address (a number) on its own. Another way is to enclose the number in square brackets.

159.233 Lecture 1 - 98

We need to consider what addressing modes we would like the computer to implement.

Direct Addressing:

add a,[100]

The contents of location 100 are added to the Accumulator

byte x; /*x happens to be at addr 100*/a += x;

159.233 Lecture 1 - 99

Immediate Addressing:

add a,100

The value 100 is part of the instruction.100 is added to the accumulator.

a += 100;

Register Addressing:

add a,b

The operands of the instruction refer to specific registers.The contents of register B are added to the accumulator

a += b;

Register Indirect Addressing:

add a,[b]

The operands of the instruction refer to specific registers

Register b contains an address.The contents of this address are added to the accumulator.

int x = 2;b = &x;a += *b;

159.233 Lecture 1 - 100

Register Indirect with Displacement Addressing:

add a,[b+10]

The operands of the instruction refer to specific registers

Register b contains a number.The desired memory address is 10 plus the contents of b.The contents of this address are added to the accumulator.

byte x[20];/*x is at addr 10*/b = 5;a += x[b];

Indexed Addressing:

add a,[b+c]

Add the contents of b and c, use the address so formed.The contents of this address are added to the accumulator.

byte array[10];b = array;c = 5;a += b[c];

159.233 Lecture 1 - 101

Indexed with Displacement Addressing:

add a,[b+c+10]

Add the contents of b, c and 10, use the address so formed.The contents of this address are added to the accumulator.

struct {byte k;byte j;

} s[5]; /*address of s is 10*/

/*b is set to i*sizeof(s)*//*c is set to offset of j within s*/a += s[i].j;

Memory Indirect Addressing:

add a,@[100]

The contents of address 100 is another address.The contents of this address are added to the accumulator.

byte x,*p; /*address of p is 100*/p = &x;a += *p;

159.233 Lecture 1 - 102

Register Indirect Addressing with auto-incr/decrement:

add a,[b]+add a,-[b]

The contents of b is an address.The contents of this address are added to the accumulator.

Afterwards b is incremented.

This addressing mode can be used in the absence of any explicit stack instructions.

byte p[10];b = p;a += *b++;

159.233 Lecture 1 - 103

Instruction Size.

Instructions can be

either variable in size – e.g. the Pico-computer or Pentium,

or of fixed length – e.g. the MIPS processor or PowerPC.

Variable size instructions -> a program will take up less space.

Fixed sized instructions -> the processor can be made simpler - freeing up space for more registers.

All instructions on the MIPs processor are 32 bits.

159.233 Lecture 1 - 104

MIPS.

RISC machines trade simplicity of processor v Registers.

The MIPs processor has fixed size instructions - 32 bits.

There are 32 registers but -

register $0 is reserved for the value 0 register $31 is the PC.

The MIPS architecture supports 4 addressing modes:

Register Addressing:

Register Indirect with Displacement Addressing:

Immediate Addressing:

PC Relative

159.233 Lecture 1 - 105

R type instructions:

ALU operations have three operands – all of them registers

Opcode rs rt rd shamt sub-fct6 5 5 5 5 6 bits

add $1,$2,$3 ;$1 = $2 + $3

shamt is the shift amount

I type instructions:

Load/Store (the only instructions to access memory) and some branches

Opcode rs rt displ6bits 5 5 16 bits

lw $1,100($2) ;$1 = mem[$2+100]

J type instructions:

Jump and procedure call.

Opcode displ6bits 26 bits

j 10000 ;goto 10000

159.233 Lecture 1 - 106

add $1,$2,$3 ;$1 = $2 + $3;

sub $1,$2,$3 ;$1 = $2 - $3;

and $1,$2,$3 ;$1 = $2 & $3;

or $1,$2,$3 ;$1 = $2 | $3;

slt $1,$2,$3 ;$1 = ($2 < $3);

jr $31 ;goto $31;(return;)

lw $1,100($2) ;$1 = mem[$2 + 100];

sw $1,100($2) ;mem[$2 + 100] = $1;

beq $1,$2,100 ;if($1==$2) goto PC+100

j 10000 ;goto 10000

jal 10000 ;$31=PC+4;goto 10000;;(function call;)

The MIPs Datapath

Unlike the Pico-computer the MIPs computer has several Buses connecting it’s components together.

We’ll see later that this enables us to ‘Pipe-line’ instructions.

The processor is built from several sub-sections.

The first section is the Program counter and the Instruction memory.

In this version of the MIPs processor, we keep instruction memory and data memory separate.

159.233 Lecture 1 - 107

ADD

PC

addr

instr

4

Instr mem

The Program Counter is used as the address line to the instruction memory.

As each instruction is 32bits wide, each time we retrieve an instruction we have to add 4 to PC to point to the next instruction. This will be modified later to cope with jumps.

159.233 Lecture 1 - 108

R-type instructions:

These instructions only involve Registers and the ALU - remember that the MIPS chip has a load/store architecture. Data stored in memory must first of all be transferred to a register before performing any operation on it.

ALU

Registers

read reg1read reg2

write reg

write data

read data1

read data2

instr

5 32

32

32

5

5

The instruction produces the numbers of each of the three registers involved:

add $4,$10,$14

will add register 14 to register 10 and put the result into register 4

The 32 bits making up this instruction has the form

Opcode Rs Rt Rd Shamt Subfct

each of the register fields are 5 bits.

The register file allows 3 registers to be selected.

The two read registers, Rs and Rt are selected by the instruction and the data stored in them appears as read data1 and read data2

The write register Rd gets its data from the output of the ALU.

The ALU is controlled by a combination of Opcode/Shamt/Subfct159.233 Lecture 1 - 109

I-type instructions

These instructions load/store data from memory to register

lw $1,100($2)

loads into register 1, the data stored at 100 + contents of register 2

We need to be able to add the offset stored in the instruction to a register.

The offset is only 16 bits, so has to be extended to 32 bits for the ALU.

If the offset is negative we need to preserve the sign bit (ASR).

sign extend

Data mem

addr

write data

read dataALU

Registers

read reg1read reg2

write reg

write data

read data1

read data2

16 32

instr

159.233 Lecture 1 - 110

Branch instructions

These also have a 16bit immediate data in the instruction, which needs to be added to the program count if the branch is taken

As the PC is always a multiple of 4, the immediate value has also to be shifted left by 4.

ADDshift left2

sign extend

ALU

Registers

read reg1read reg2

write reg

write data

read data1

read data2

16 32

instr

branch control logic

branch address

PC + 4

The ALU produces an output that will control whether the conditional branch is taken or not.

The branch address is formed from (PC + 4) + the displacement

These separate parts can now be combined into one complete picture for the MIPs processor architecture.

159.233 Lecture 1 - 111

Instr mem

mux

Data mem

read data

mux

ADDshift left2

sign extend

addr

write data

mux

ALU

Registers

read reg1read reg2

write reg

write data

read data1

read data2

16 32

ADD

PC

addr

instr

4

The multiplexors have been included so that we can control which data is to be forwarded.

The MUX sending data to the PC will be controlled by the branch control logic

The MUX before the adder will be controlled by whether the instruction is a load/store or not.

The MUX after the Data mem will control whether either the output from the ALU is written back to the register bank (an R-type instruction), or whether data from memory is written to a register (a load instruction)

Pipelining

159.233 Lecture 1 - 112

Pipelining is an implementation technique where multiple instructions have overlapped execution.

In a car assembly plant, the work done to produce a new car is broken down into several small jobs. As a car progresses along the assembly line each of these jobs is performed in sequence.

The total time taken to produce a car is still the same, but several cars can be assembled simultaneously.

A new car appears at a rate determined by how long each of the small jobs take.

The MIPs processor can be thought of as having several steps:

lw $1,100($0) ;load the word at 100 into Reg1

is made up of five separate steps

Instr fetch Reg ALU Data

fetch Reg

10ns 5ns 10ns 10ns 5ns

IF ID EX MEM WB

1. IF: Instruction fetch from Instr Memory

2. ID: Instruction Decode and Register fetch (Reg 0 = 0)

3. EX: Execution add the value (100) to register output (0)

4. MEM: Memory access - read location 100

5. WB: Write Back - write data into register 1

Total time is 40ns

159.233 Lecture 1 - 113

If we perform several loads in succession:

lw $1, 100($0)lw $2, 200($0)lw $3, 300($0)

The total time is 120ns

IF ID EX MEM WB




IF ID EX MEM WB

IF ID EX MEM WB

40ns 40ns 40ns

We can reduce the total time if we overlap these steps :

Each step is independent of the others, but because some are performed faster than others we have to insert some delays.

10ns

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

10ns 10ns 10ns 10ns 10ns 10ns

After we perform the ID and WB steps we must wait 5ns -each step now takes the same amount of time - 10ns.

159.233 Lecture 1 - 114

Each instruction now takes 50ns. - 10ns longer than before

All three instructions take 70ns. - slightly better than 120ns

But it only takes 10ns between instructions.

We have effectively increased our throughput by a factor of 4

1000 instructions now take 10,040ns instead of 40,000ns

We must be careful!

10ns

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

10ns 10ns 10ns 10ns 10ns 10ns

In cycle 3, the ALU is adding reg 0 to 100 for the first instruction

At the same time the immediate value 100 is changing to 200 as the second instruction is being decoded.

Each pipeline stage needs to know the data for the instruction that it is currently working on.

Before each of the stages, we need to include a register to remember details of the current instruction for that stage..

We pass the contents of this register to the next stage at beginning of the next cycle.

159.233 Lecture 1 - 115

mux

Data mem

mux

ADDshift left2

sign ext

mux

ALURegs file

ADD

PC

Inst mem

IF/ID ID/EX EX/MEM MEM/WB

Hazards

Data hazards:

Consider the following sequence of instructions:

sub $2,$1,$3and $12,$2,$5

159.233 Lecture 1 - 116

sub reads regs1 and 3 in cycle 2 and passes them to the ALU.It is not until the cycle 5 that it writes the answer to the Reg File

and reads regs 2 and 5 in cycle 3 and passes them to the ALU.

Reg 2 has not yet been updated!

When and reads the Reg File it gets the wrong value for Reg 2.

This is called a data hazard.

Data hazards occur if an instruction reads a Register that a previous instruction overwrites in a future cycle.

We must eliminate data hazards or pipelining produces incorrect results.

There are three ways to remove data hazards:

159.233 Lecture 1 - 117

The first way is to give the responsibility to software.

We must delay the and until after reg2 has been written.

IM Reg DM Reg

subread $1,$3

write $2

IM Reg

andread $2,$5

IM Reg DM Reg

nop

IM Reg DM

nop

IM Reg

nop

sub $2,$1,$3nopnopnopand $12,$2,$5

This puts the responsibility of removing hazards onto the compiler writer, and involves no extra hardware.

159.233 Lecture 1 - 118

The second way is to implement hardware to detect hazards, if they exist, and to insert a delay in the circuit - a stall.

IM Reg DM Reg

subread $1,$3

write $2

IM

and

Reg

read $2,$5

stall stall stall

IF/ID ID/EX EX/MEM MEM/WB

IF/ID IF/ID

After we detect a hazard we stall the instruction memory so that no new instructions are read until after the stall is resolved. Detecting the hazard condition involves checking the write register in each of the pipeline registers:

ID/EXEX/MEMMEM/WB

If the write register is equal to either of the two read registers in the IF/ID register, then a stall occurs.

159.233 Lecture 1 - 119

The third way is to notice that although reg 2 does not contain valid information at the time that the next instruction wants to read it, the correct information is available!

It just isn’t in the right place.

The EX/MEM register contains the output from the ALU.

The result that is destined for reg 2 is part of the EX/MEM register.

We can insert the output of the EX/MEM register into the input of the ALU - data forwarding.This again involves extra hardware.It removes all of the stalls and nops of the two previous solutions.

159.233 Lecture 1 - 120

Unfortunately it does not work in all cases:

lw $2,100($1)and $12,$2,$5

In this case the load instruction doesn’t get the data from memory until the DM cycle finishes

The data is not available until after it is required.

We have to insert a stall.

The original MIPs chip relied on the compiler writer to insert nops - trading the hardware space needed to control stalls.

These days processors must run as fast as possible. The extra hardware for forwarding/stalling is included on the chip.

159.233 Lecture 1 - 121

Clever Compilers!

If the compiler re-orders instructions, hardware stalls for load instructions can be eliminated:

lw $2,(100)$10lw $3,(200)$11 ;|add $4,$2,$3 ;|one stall added herelw $5,(300)$12

Two values are copied from memory to regs 2 and 3, and are then added together.

A further value is then copied from memory to reg 5

The second lw into reg 3, involves a stall.

If the compiler reorders the instructions:

lw $2,(100)$10lw $3,(200)$11lw $5,(300)$12add $4,$2,$3 ;$3 is available!

The result is the same - but now the add instruction is able to get the value of reg 3 from the forwarding unit -

The load of reg 5 has inserted on extra stage into the pipeline.

159.233 Lecture 1 - 122

no stalls!

159.233 Lecture 1 - 123

Branch Hazards

As soon as we branch to a new instruction, all the instructions that are in the pipeline behind the branch become invalid!

lw $5,(400)$14beq $3,$2,100add $7,$8,$9

..+100 sub $7,$8,$9

Either the add or sub instruction after the beq will be executed depending on the contents of regs 2 and 3.

We can include extra hardware to calculate the branch offset in the decode cycle. Data forwarding then makes it possible to do the branch just one cycle later - insert a nop.

159.233 Lecture 1 - 124

A clever compiler can eliminate the effect of the delay by inserting an instruction after the branch!

This can be the previous instruction!(If it is not involved in the branch.)

lw $5,(400)$14beq $3,$2,100

can be re-ordered:

beq $3,$2,100lw $5,(400)$14

Power Control

The PCON register determines the idle modes of the 2051.

putting a 1 in either of the 2 low order bits will cause the 2051 to go into idle or power-down mode.

Idle mode (01h) will allow interrupts, keep the timers going and preserve RAM.

159.233 Lecture 1 - 125

An interrupt or reset will revert to normal operation

Power-down mode will preserve RAM but needs a reset to resume.

(resets will alter the contents of SFR).

Power-down uses considerable less power than idle mode, which again uses less power than normal operations.

159.233 Lecture 1 - 126

Interrupts

Interrupts can be generated by the timers, the serial port or by an external event connected via a parallel input.

The action taken by the processor is:

finish the currently active instruction.

generate an LCALL to an interrupt service routine

the ISR will perform an RETI when it finishes. This informs the CPU that the interrupt has been handled

The LCALL branches to specific locations in memory:

0003h Ext Int 0000bh Timer 0 interrupt0013h Ext Int 1001bh Timer 1 interrupt0023h Serial Port

Either the service routine can fit in 8 bytes (possible) or it will JMP to a longer routine.Interrupts can be controlled by using the IE register

EA - - ES ET1 EX1 ET0 EX0

EA is a global interrupt enable bit. Interrupts will not happen unless this is set.

ET1/0 are the timer interrupts

ES is the serial interrupt.

The serial interrupt can happen either on RI or TI being set. The ISR will have to read these bits to decide which requires servicing. These bits should be cleared by software.

159.233 Lecture 1 - 127

It is good programming practice to put an reti instruction at each of the LCALL locations even if no interrupts are expected! If one does happen the program will jump into 'no-where'.

Interrupts can be given a priority level.

Those of higher priority can interrupt a running lower priority ISR.

The 2051 has only a two level priority scheme. Each interrupt can be assigned 0 (low priority) or 1 (high priority).

Interrupts at the same priority level cannot interrupt each other.

Levels are set in the IP register:

- - - PS PT1 PX1 PT0 PX0

Which interrupt happens is indicated by the location involved in the LCALL

Assembler programming

8031/51 architecture (specifically the 89C2051)

Is ideal for assembler programming!

2KBytes of program memory (read only)

128 Bytes of data

Programs must be small and efficient.

Compilers do exist for - Pascal, C and Modula 2.

159.233 Lecture 1 - 128

and most development work will be done in a high-level language, if at all possible.

Concentrate on assembler level programming.

An assembler programmer needs to know details of the underlying hardware, as well as be familiar with the machine instruction set.

159.233 Lecture 1 - 129

Assembler statements are ‘English’ versions of machine instructions.

Each assembler statement generates 1 machine code instruction. Whereas a C statement may well generate several machine instructions.

An assembler programmer is in complete control of the hardware.

Details that are generated automatically by C have to be done explicitly in assembler code.(eg passing arguments to functions)

It is important to have a good knowledge of the hardware - the 2051 includes several peripheral devices (timers, output ports etc) as well as the CPU organisation

159.233 Lecture 1 - 130

2051 assembler programming involves:

Write the program using an editor on the development (or host) machine.

Assemble the program.

Simulate the programs execution on a simulator or hardware emulator

Remove any mistakes

Program the chip using a FLASH programmer

Test the chip in a working environment

we will develop programs using a PC as the host environment. The following programs are available from the network:

JFE the gcc editor (but any text editor will do)

ASM51 a cross-compiler for 8031/8051 code

SIM51 a simulator for the 2051 controller

159.233 Lecture 1 - 131

The 2051 architecture:

Programs are stored in a separate memory area to where data is stored.

(This is different to many processors, where code and data are freely intermixed)

2051 code must reside in the PROM (Programmable Read Only Memory).

Constant data is also stored in the PROM.

Variable data is stored in Internal RAM.

(The 8031/8051 architecture does have facilities to allow external program and data storage to be added to the controller chip - but these are not available in the 2051 variant)

159.233 Lecture 1 - 132

Program code space is 2KBytes

The Program Counter is a 16bit register(only 11 bits are used!)

0000h0001h0002h

07ffh

reset for PC

8-bits

Variable data is stored in Internal RAM

00h01h02h

7fh8-bits

159.233 Lecture 1 - 133

General purpose data registers are part of RAM

Machine instructions that access registers are 1 byte shorter than an equivalent instruction that accesses RAM directly.

There are 4 banks of 8 registers R0 - R7.

Only one bank can be active at any time.

Bank 0 is located at 00h - 07h

Bank 1 is the next 8 bytes 08h - 0fh

Bank 2 is at 10h - 17h

Bank 3 is at 18h - 1fh

Register banks are used when we have more than one activity being performed at a time.

It is a simple matter to switch to the relevant set of registers, when we need to swap from one task to the other.

159.233 Lecture 1 - 134

Other registers, eg the accumulator and stack pointer, are stored in an extension to RAM

Locations 080h - 0ffh are called SFR space.

Special Function Registers

Most of this RAM is not there! Only locations for specific registers are implemented in hardware.

SFR space varies considerably for different versions of the 8031/8051 controllers but there is a set of registers common to them all.

The 2051 implements this base set plus one other.

SFR space can be addressed either by actual locations or by standard names

The Program Status Word is either

PSW or 0d0h

159.233 Lecture 1 - 135

Most SFRs are concerned with specific i/o functions, we will deal with them later

For the time being we need to know about the following:

081h SP the stack pointer0d0h PSW the program status word0e0h ACC the accumulator0f0h B the B register

SP is used for subroutine calls and storing items on a stack

PSW contains flags to reflect the outcome of the previous instruction:

Carry flag, Auxiliary Carry flag, OVerflow flag

and also the current Register Bank pointer

ACC is used in most arithmetic and logic instructions - but is not normally accessed directly through SFR space

B is used in multiplication and division

Assembler codetest.asm

$MOD2051csegorg 00h

mov a,#34 ;put 34 in accclr a ;clear the accend

asm51 test

Notes:

159.233 Lecture 1 - 136

1. There are three types of instructions:assembler controls

$MOD2051 tells the assembler to read predefined register names from the file mod2051

assembler directives or pseudo-operationscseg, org these do not generate machine code

real instructionsmov, clr these do generate machine code

2. The assembler produces test.lst:

0000 7422 mov a,#340002 e4 clr a

The numbers in the 1st column are the address where the machine code (2nd column) is stored.

and test.hex, which contains machine code in hexadecimal format. It is read by the simulator.

3. The first instruction generates 2 bytes of code. The second only one.

4. We can include comments in our code by inserting a semicolon.All text after a ; to the end of the line is ignored.

5. We format assembler programs so that they line up on tab stops.

6. We can use either upper or lower case

7. The program does not stop cleanly. Location 3 can contain anything and will be executed after the clr instruction

159.233 Lecture 1 - 137

8. There are 256 instructions!

The first byte is different for different instructions

There may be 1 or 2 more bytes, depending on the instruction.

9. The second instruction could have been

mov a,#00

This takes the same time as clr (1 cycle)

but is a 2 byte instruction!

10. Assembler programs should be as concise (small) and as fast as possible.

Sometimes these requirements conflict. We may need to trade space for speed or vice-versa.

AssemblerUse .asm as an extension for assembler source

In a command prompt

C:> cd \159233C:159233> asm51 filename

Errors will be reported in the file filename.lst

Simulator

C:159233> sim51 filename

┌─── Data ───────────────────────────┬──── Code ───────────────────┬──── SFR ──┐│00: 00 00 00 00 00 00 00 00 ........│000: 7422 MOV A,#22 │ ACC 00 ││08: 00 00 00 00 00 00 00 00 ........│002: E4 CLR A │ B 00 ││10: 00 00 00 00 00 00 00 00 ........│003: FF MOV R7,A │ SP 07 │

159.233 Lecture 1 - 138

│18: 00 00 00 00 00 00 00 00 ........│004: FF MOV R7,A │ PSW 00 ││20: 00 00 00 00 00 00 00 00 ........│005: FF MOV R7,A │ IP 00 ││28: 00 00 00 00 00 00 00 00 ........│006: FF MOV R7,A │ IE 00 ││30: 00 00 00 00 00 00 00 00 ........│007: FF MOV R7,A │ P1 FF in ││38: 00 00 00 00 00 00 00 00 ........│008: FF MOV R7,A │ FF out││40: 00 00 00 00 00 00 00 00 ........│009: FF MOV R7,A │ P3 FF in ││48: 00 00 00 00 00 00 00 00 ........│00A: FF MOV R7,A │ FF out││50: 00 00 00 00 00 00 00 00 ........│00B: FF MOV R7,A │SCON 00 ││58: 00 00 00 00 00 00 00 00 ........│00C: FF MOV R7,A │SBUF 00 in ││60: 00 00 00 00 00 00 00 00 ........│00D: FF MOV R7,A │ 00 out││68: 00 00 00 00 00 00 00 00 ........│00E: FF MOV R7,A │TCON 00 ││70: 00 00 00 00 00 00 00 00 ........│00F: FF MOV R7,A │TMOD 00 ││78: 00 00 00 00 00 00 00 00 ........│010: FF MOV R7,A │PCON 00 │├────────────────────────────┬───────┼──── Flags ──────────────────┤ T0 0000 ││ LCD │ │RegBank 0 C N AC N OV N │ T1 0000 ││Display │ │Cycles 0 PC 0000 │DPTR 0000 │└────────────────────────────┴───────┴─────────────────────────────┴───────────┘ F1-RUN F2-STEP F3-STEPOVER F4-BREAK F5-RESET Tab->next window F10-EXIT v 1.2

159.233 Lecture 1 - 139

Constants

Numbers:

Can be in decimal 34hexadecimal 22hbinary 00100010b

(use a leading zero for 0aah etc)

Characters:

Can be in quotes ‘A’‘ABCD’

numeric 41hmixture ‘Hello’,0dh,0ah

Expressions:

constants can be formed using expressions:2+21 SHR 8

operators are:

+ - * / modshr shlnot and or xor

Assembler directives:

Segment directives:csegdsegbseg

selects the current segment.

cseg for code and constantsdseg for data.

cseg is active by default159.233 Lecture 1 - 140

Location counter:org number

sets the value of the current segments location. If cseg is active, then code will be placed at address number after the org directive

Location counter for each segment defaults to 0

End directiveend

signals the end of the source file.

Equate directiveTEN equ 10

Whenever the symbol TEN occurs in the text it will be replaced by 10

159.233 Lecture 1 - 141

Data directiveX data 34hPSW data 0d0h

single byte data held in RAM or SFR can be given a name

X is the byte at location 34h in RAMPSW is the name of the SFR register at 0d0h

Bit directiveb bit 7CF bit 0d7h

single bit data held in RAM or SFR can be given a name

b is the bit at location 7 in bit addressable RAMCF is the name of the bit at 0d7h - the carry flag

Labelsl1: mov a,#34

Symbols must

only contain letters, numbers _ and ? start with a letter or _? be unique within the first 32 characters not be a reserved word

All symbols are translated to uppercase by the assembler.Labels are symbols with a trailing colon. They represent the location of the instruction or assembler directive.

159.233 Lecture 1 - 142

DB directivecopyright_msg:

db ‘(c) Copyright, 1996’

allocates space for initialised data in the cseg a byte at a time. Can only be used when cseg is active

DS directivebuffer: ds 16

reserves so many bytes (16) of data in RAM for variable data. The dseg must be active.

DBIT directiveio_map: dbit 12

reserves so many bits (12) of data in bit addressable RAM for variable data. The bseg must be active.

The file mod2051 contains many of these directives.

Addressing ModesThere are a variety of ways to address data.

Register Directmov a,R3

There are 8 registers R0 - R7. Register direct will access the contents of the specified register. The contents of R3 are placed into the accumulator

Register Indirectmov a,@R0

Only R0 and R1 may be used.

The register now contains an address.

159.233 Lecture 1 - 143

The contents of that address are placed in the accumulator.

The address must point to RAM ie 0 - 7fh

Immediate constantsmov a,#34

The value of the constant can be found as part of the instruction.

Directmov a,34

The contents of address 34 are placed in the accumulator.

The address can be either RAM or SFR

159.233 Lecture 1 - 144

Relative addressingsjmp start

The instruction contains a signed 8 bit offset that is relative to the first byte of the following instruction,

Absolute addressingajmp start

The instruction contains an 11 bit destination address to anywhere in the 2KByte program space

Implied addressing

Certain instructions imply a particular address - such as the accumulator.

The Program Status Word PSW

The PSW contains status bits that reflect the current state of the CPU

7 6 5 4 3 2 1 0CY AC F0 RS1 RS0 OV F1 P

The CY or carry flag is set by certain operations to indicate a carry out of bit 7.

The AC or auxiliary carry flag is set to indicate a carry out of bit 3.

F0 and F1 are general purpose user definable flags

RS1 and RS0 define the currently selected register bank

P or parity flag is the parity of the accumulator - even parity.

Instructions

159.233 Lecture 1 - 145

Arithmetic instructions:Addressing modes

Dir Ind Reg Immadd a,<byte> X X X Xaddc a,<byte> X X X Xsubb a,<byte> X X X Xinc a A onlyinc <byte> X X Xdec a A onlydec <byte> X X Xmul ab A and B onlydiv ab A and B onlyda a A only

mul multiplies A and Bthe high order byte of the 16 bit result is in Bthe low order byte in A

div divides A by Bthe answer is placed in Athe remainder is in B

da is decimal adjust accumulator, and follows any add or addc instruction working with BCD arithmetic

159.233 Lecture 1 - 146

Logical instructions:Addressing modes

Dir Ind Reg Immanl a,<byte> X X X Xanl <byte>,a Xanl <byte>,#data X

same for orl and xrlclr a A onlycpl a A onlyrl a A onlyrlc a A only

same for rrswap a A only

anl ANDorl ORxrl XORclr clearcpl NOT

rl rotate left 1 bitrlc rotate left through carry

rr rotate right

swap low nibble (4 bits) of A with high nibble

159.233 Lecture 1 - 147

Data transfer:Addressing modes

Dir Ind Reg Immmov a,<src> X X X Xmov <dst>,a X X Xmov <dst>,<src> X X X Xpush <src> Xpop <dst> Xxch a,<byte> X X Xxchd a,@Ri X

mov instructions are the same as high level language assignment statements.

push has the same effect as:inc spmov “@sp”,<src>

pop has the same effect as:mov <dst>,“@sp”dec sp

xch exchanges the contents of A with <byte>

xchd exchanges the low nibble of A with the low nibble of the address pointed to by R0 or R1

159.233 Lecture 1 - 148

The 2051 is capable of addressing single bits.

In RAM these bits are equivalent to the byte addresses 20h to 2fh.These bits are numbered 0 - 7fh

Some SFR registers can be bit addressed.These bits have symbols associated with them and are normally accessed via the symbol. Otherwise they are numbered 80 - 0ffh

Boolean Instructions:anl c,bit ;c = c and bitanl c,/bit ;c = c and not bit

same for orlmov c,bit ;c = bitmov bit,c ;bit = cclr c ;c = 0clr bit ;bit = 0setb c ;c = 1setb bit ;bit = 1cpl c ;c = not ccpl bit ;bit = not bitjc rel ;jump if c == 1jnc rel ;jump if c != 1jb bit,rel ;jump if bit == 1jnb bit,rel ;jump if bit != 1jbc bit,rel ;jump if bit == 1

;then clear bit

159.233 Lecture 1 - 149

Program Branching:jmp addr ;or sjmp/ajmp/ljmpcall addr ;or acall/lcallret ;return from subroutinenop ; no operation

jz rel ;jump if A == 0jnz rel ;jump if A != 0

djnz <byte>,rel ;decrement, jmp if !=0

cjne a,<byte>,rel ;jump if A != <byte>cjne <byte>,#data,rel ;jump if <byte> != #data

jmp or call may be used as generic instructions.

The assembler will choose which specific instruction to use. Sometimes it does not choose the best! Forward jmp's will be converted to ljmp's.

relative addresses are -128 to 127 bytes from the PC.

PC will already be pointing to the next instruction

The $ sign is interpreted by the assembler as current location counter

sjmp $

is an infinite loop.

The relative offset is not 0 though!

It is 0feh

Assembler Programs

Write an assembler program to add the two numbers stored at locations 14h and 15h in RAM and place the answer at location 16h

n3 = n1 + n2;

159.233 Lecture 1 - 150

$mod2051

n1 data 14hn2 data 15hn3 data 16h

org 0mov a,n1add a,n2mov n3,a

end

Write an assembler program to subtract the two numbers stored at locations 14h and 15h in RAM and place the answer at location 16h

n3 = n1 - n2;

$mod2051

n1 data 14hn2 data 15hn3 data 16h

org 0mov a,n1clr csubb a,n2mov n3,a

end

159.233 Lecture 1 - 151

Write an assembler program to convert the character stored at location 65h to lower case

c1 |= 0x20;

$mod2051

c1 data 65h

org 0mov a,c1orl a,#20hmov c1,a

end

Write an assembler program to convert a string at location 45h to lower case.

i = 0;while (s[i]) {

s[i] |= 0x20;i++;

}

$mod2051dsegorg 45h

s: ds 32

csegorg 0mov r0,#smov a,@r0

l1: jz l2orl a,#20hmov @r0,ainc r0mov a,@r0jmp l1

159.233 Lecture 1 - 152

l2: nopend

159.233 Lecture 1 - 153

Write a 2051 assembler program.

When the program starts, R0 points to a string of characters terminated with a 00h byte.

When the program ends, R2 contains the number of consecutive double characters within the string.

Eg

chocolate has no double letters -> R2 = 0sweet has one -> R2 = 1address has two -> R2 = 2

i = 0; doubles = 0; while (buffer[i] != 0) { if ((buffer[i]-buffer[i+1])==0) { doubles++; } i++; }

or (using pointers):

char *r0, *r1; char a; int r2;

r2 = 0; r0 = buffer; r1 = buffer; r1++; a = (*r0) while ((*r0) != 0) { a = a – (*r1); if (a == 0) { r2++; } r0++; r1++;

159.233 Lecture 1 - 154

a = (*r0); }

159.233 Lecture 1 - 155

$mod2051dsegorg 40h

buffer: ds 64;

cseg;use R0 to point to first character ;and R1 to point to next one along ;R2 is where we store the answer;

org 0;

mov R0,#buffer ;init R0 and R1mov R1,#bufferinc R1mov R2,#00 ;set counter to 0mov a,@R0 ;look at a char

l0:jz stop ;stop when its NULclr csubb a,@R1 ;sub next charjnz l1inc R2 ;add to counter if zero

l1:inc R0 ;move to the next pairinc R1mov a,@R0 ;look at next charsjmp l0 ;loop around the string

stop:sjmp stopend

Subroutine/Function calls

The 2051 supports function calls:acall fct

transfers control to the code labelled fctfct:

mov @r0,20hinc r0dec a

159.233 Lecture 1 - 156

jnz fctret

acall pushes the current PC on to the stack.

The SP register points to the stack - defaults to 7

The push, increments SP before transferring PC

ret pops back the previous PC from the stack

The pop, decrements SP after transferring PC

The assembler programmer has to decide, when a subroutine is called, if registers are to be saved or not.

A subroutine may, when it is invoked, always save the registers it uses on the stack.Programs that call these routines can be sure that there are no side effects - no registers are altered.

This may be inefficient!

The subroutine may alter R5 and so push and pop it to preserve its contents.

However the calling program may not use R5 at all - in which case the subroutine has pushed and popped needlessly.

159.233 Lecture 1 - 157

Another scheme is to have the calling program save registers before the acalland recover them afterwards.

push 5acall subpop 5

This suffers from the drawback that the main routine may save registers that the subroutine doesn't use.

A final scheme is for the subroutine to document which registers it uses and for the calling program to save only the ones that it wants.

Status flags are typically always altered by the actions of the subroutine.

Parameter passing:

1. In Registers.

Used by most assembly programs.

Has a problem with large numbers of parameters - may run out of registers.

Complex types - eg strings have to have their address passed.

2. In a parameter block.

Pass a pointer to a block of memory in a register. The subroutine reads the parameters from the block of memory.

Each routine has to have its own block of memory.

159.233 Lecture 1 - 158

3. Immediately after the acall.

Parameters can be retrieved by finding out the return address (on the stack) and reading the parameters from that area of memory.

Can't do this if the program is in PROM!

The return address on the stack has to be fiddled before the ret will work. Otherwise the program will return and try and execute the first parameter.

4. On the stack

Most High Level Languages use this method

The calling program pushes parameters onto the stack. The subroutine accesses them via the stack pointer.

One problem is - who pop’s them at the end?

In Pascal it is the responsibility of the subroutine to tidy up the stack before returning.

In C it is the responsibility of the calling routine.

Pascal and C also have different conventions about the order of the parameters.

In Pascal arguments are pushed from left to right.

C pushes them in the reverse order

C is designed for a variable number of arguments.

159.233 Lecture 1 - 159

ARGUMENT 4

ARGUMENT 3

ARGUMENT 2

ARGUMENT 1

address

return

4a

49

48

47

46

45

44

SP

Input / Output

The 2051 has:2 parallel ports - port 1 and port 32 timers - timer 0 and timer 11 serial port.

(Output pins are shared between uses,not all of the I/O capability will be available).

Timers:The timers can be used as timers or counters.(use either the system clock, or an external pulse).

Timer 1 can be used as the bit rate clock for the serial port.

The timers have several modes – we will look at

Mode 1: 16 bit up counter - each cycle will cause the 16 bit pair TH0/TL0 to increment. When the count gets to 0ffffh it wraps to 0000h and sets an overflow flag.

Mode 2:159.233 Lecture 1 - 160

8 bit up counter with automatic reload.TL0 counts up.When the count gets to 0ffh it sets an overflow flag and reloads TL0 with the value in TH0.

This mode is used for the serial bit rate.

159.233 Lecture 1 - 161

The TMOD register controls the mode setting for the two timers:

0 0 M1 M0 0 0 M1 M0timer 1 timer 0

The TCON register controls the running of the two timers:

TF1 TR1 TF0 TR0 X X X X

TF1 indicates an overflow has occurredTR1 starts the counter running

The system clock runs at typically 11.0592 Mhz1 cycle is 1/12 of the clock.The serial clock runs at 32 times the bit rate.At 9600 bps we auto reload after each count of 3. -> TH1 has the value 0fdh

159.233 Lecture 1 - 162

Serial I/O

Has 4 modes - look only at mode 1 (8bit UART).

SBUF is really two registers. One for input and one for output. Serial I/O works in full-duplex, so the two registers will have different values.

Serial I/O is controlled by the SCON register

SM0 SM1 0 1 0 0 TI RI

SM0 SM1 should be 0 1 for 8 bit UART

The TI bit indicates that the previous output character has been transmitted. It should be cleared by software.

The RI bit indicates that a character has been received. It should be cleared by software.

In the simulator (if the timer and serial i/o systems have been configured) - the serial output is displayed at the bottom of the screen. Keys typed on the keyboard appear as received characters in SBUF

159.233 Lecture 1 - 163

Parallel I/O

Parallel ports are for general I/O. They are all bi-directional - but if we want to input data we must ensure that we write 1’s to the pins. (they are not full-duplex!)

The simulator is configured so that the parallel ports drive an LCD display.

P1 holds the character to be displayed.

P3.5 acts as a strobe signal. The LCD display will read the value from P1 when P3.5 changes from a 1 to a 0.

The real LCD is a little more complicated.

It must first be intialised.

After sending a character, either the CPU must delay for a small time before sending another, or must wait for a signal from the LCD.

The LCD display also has a command mode (clearing the screen, moving the cursor etc) and a data mode (ASCII characters).

159.233 Lecture 1 - 164

Documents

AND/OR gates, Boolean algebra - Massey Universitymjjohnso/notes/59233/all.doc · Web viewPhase 1: Analyse the problem in terms of its inputs Generate an architecture Phase 2: Design