Embedded Systems - Comparative Anatomy and Physiology


Florida Institute of Technology

College of Computer and Electrical Engineering

ECE4551 Computer Architecture

Fall 2009

Semester Project Paper:

The Comparative Anatomy and Physiology of Embedded Systems.

By David Wurmfeld

ECE4551 Page 1 of 31 03/12/2009 03:10:58



Table of Contents

Florida Institute of Technology

Abstract

Fundamental Anatomy of an Embedded Processor

    The programmable FSM: The idea that became a computer:

    ROM and RAM: memory structures making up the skeleton of an embedded processor

    The Guts of the Processor: the Program Counter, ALU and Control Unit

    Putting it all together, the big picture:

Fundamental Physiology of an Embedded Processor

    The computer program: a recipe for functionality

    The Program Counter: more than meets the eye

    The Control Unit: Conductor, logic wizard and traffic Cop:

    110001110011: I dig Computer Baby Talk!

    Making Bits Work:

    Instruction Encoding; how many bits do we need?

    Looking Forward: Tools for us Humans:

    Temporary storage: Data Memory

    The Arithmetic Logic Unit: Workhorse of the Embedded Processor

    Memory Organization and Program Memory:

    Memory Organization and Data Memory:

    The other fiddly bits: Input Output and additional processor functionality

Summary

Embedded Processor questions to test for understanding:

Answers


Abstract

An embedded system is a difficult animal to describe. In the general interpretation of the term, an Embedded System refers to a dedicated computer used to accomplish a pre-defined task. The term embedded usually relates to the encapsulated or contained nature of the device. In the modern vernacular, however, its meaning is becoming less sharply defined. The current idea of an embedded system is expanding to mean any computer system dedicated to a specific purpose. The computer that is the autopilot on a commercial airliner is considered an Embedded System, as is the Windows XP powered console for a medical MRI imaging system.

Today, the domain of an embedded system is almost limitless, ranging from a full-blown LINUX system deployed on a single VirtexIV FPGA [i] chip with a PowerPC microprocessor core and custom integrated peripherals to a 4-bit data security chip glued onto the front of a smart bank card. Indeed, today's embedded systems may not have any pins to speak of; they may be pre-compiled cores or software templates of hardware architectures designed to implement and complement common computer resources. These cores are purely software in nature, describing hardware architecture using a Hardware Description Language or HDL, and only take on a physical manifestation when implemented within a particular ASIC or FPGA scheme. These cores are often referred to as Intellectual Property or IP. The domain of the typical embedded system, however, is dominated by single-chip microcontrollers with fewer than a dozen Input-Output (I/O) pins.

To understand the scope of embedded systems, then, it becomes necessary to understand the resources available (chip/core architecture) as well as the tool chain used to exploit those resources. This paper will endeavour to describe, using a top-down approach, the animal that is the Embedded System: its comparative Anatomy and Physiology, or how the architecture and behaviours differ between three different real architectures, selected by their market share.

[Pie charts, legend: 32-bit, 16-bit, 8-bit. 2006 segments: 3.42 (31%), 3.9 (36%), 3.675 (33%). 2007 segments: 3.8 (30%), 3.9 (31%), 4.9 (39%).]

Microcontroller Market (in Billions of US Dollars) 2006 vs. 2007 [1]

According to WSTS [2], the lowly 8-bit microcontroller dominated the microcomputer chip market with sales topping 250 million units per month in 2000, followed by the 4-bit microcontroller at 100 million per month. The so-called hot processors, 16-bit and 32-bit, barely pull in 50 million units per month.

[1] Microcontroller market trend report by Emitt Solutions Inc., www.emittsolutions.com

[2] ExtremeTech online, http://www.extremetech.com/article2/0,2845,1156706,00.asp, Embedded Processors Part one, September 2009, quoting WSTS, World Semiconductor Trade Statistics, http://www.wsts.org

This paper will confine the domain to 8-bit and 16-bit microcontrollers, concentrating on how they compare with one another, from the 30,000-foot view down to the register level [3]. In addition to the anatomy or architecture of the embedded systems, the physiology or behaviour, from high-level constructs down to bits in silicon, will be outlined and compared. To synthesize the disparate facts and processes into meaningful information, the embedded systems outlined will be compared using a simple performance metric, with a Gedanken experiment [4] to explore the performance of three hypothetical embedded systems.

Fundamental Anatomy of an Embedded Processor

The heart of any embedded system is the computer core driving it. It manages the data flow throughout the system, on chip [5] and off chip. This construct constitutes a revolution in logic design. Historically, digital logic has been combined to form meaningful representations of the world. For example, an alarm system could be modelled by representing the doors and windows to be monitored as elements in the design, using registers and states to describe the behaviour when a door or window is opened at the wrong time. These so-called Finite State Machines (FSM) were used to create the original embedded systems, with dedicated chips implementing logic functionality (NAND, NOR), all interconnected pin to pin to accommodate the data flowing into and out of the machine. This dedicated functionality proved cumbersome for anything but complex control systems, custom designed for a single, specific task. Because such systems were difficult to design, produce and maintain, a different solution was needed.

The programmable FSM: The idea that became a computer:

Early in the history of electronic devices, the idea of a re-configurable system that uses the same hardware to accomplish many different tasks was developed. This so-called compute-or idea first took shape in 1936 as the Z1 computer designed by Konrad Zuse [6]. Doctor Zuse is credited with designing and building the first re-configurable or programmable machine to truly solve floating-point problems using binary representation [7]. With these initial architectures and the technological revolution of the transistor, the physical manifestation of a computer shrank from room-sized behemoths to refrigerator-sized boxes and on down to a single board composed of only a few chips.

Like its predecessors, the computer needed all the elements of a traditional Finite State Machine with a new twist: the ability to change states by following bit patterns found in configurable structures, structures not hard-wired into the design. This created the possibility of a new state machine paradigm, one of an Infinite State Machine.

[3] For the purposes of this paper, we are taking on faith that the silicon topologies and processes used to implement registers work and are well described in other tomes.

[4] Thought Experiment: http://en.wikipedia.org/wiki/Thought_experiment

[5] For the initial part of this discussion, we will refer to a chip as the fundamental embedded system building block. Later we will expand that definition to include the concept of a microcontroller core.

[6] About.com: Inventors - http://inventors.about.com/library/blcoindex.htm

[7] Technical Institute of Berlin; http://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.html

ROM and RAM: memory structures making up the skeleton of an embedded processor.

This new topology could execute a pre-configured, programmed sequence of states based on the contents of two new hardware constructs: read-write volatile data memory and read-mostly program memory. The term volatile data memory refers to memory that will not retain its contents after the power is removed. The term read-mostly memory refers to memory that can be pre-configured or programmed with a sequence of bits (bits that, as we will later see, are the patterns that represent computer instructions) that is persistent or non-volatile and will be available after the power has been removed. For the sake of brevity and tradition, non-volatile memory is referred to as ROM [8], whereas volatile memory is referred to as RAM [9].

The Guts of the Processor: the Program Counter, ALU and Control Unit.

Keeping with the comparative anatomy theme, every computer is built using these two memory structures in one form or another. They are used to provide long- and short-term storage for data and instructions. The third element necessary to the operation of the computer is the control unit. It is a multi-function module that controls and synchronizes the flow of data from the outside to the inside, and between memory elements and the outside world.

Like a policeman directing traffic, the control unit directs when and where data will move. It also records state information for use by other operations. The key to the control module is the program counter or PC. It is a special-purpose register [10] that holds the memory address of the next instruction to be executed. The width in bits of the PC determines the maximum number of instructions that can be addressed by the computer.

With a PC that is $n$ bits wide, the number of instructions that can be addressed is $2^n$ locations. The control module fetches the instruction from ROM at the address pointed to by the PC. It then translates the instruction into a sequence of control signals that route the data from and to the appropriate location.
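As a quick check of the relationship between PC width and addressable instructions, the following Python sketch computes the number of locations for a few widths. The function name and the widths chosen are illustrative assumptions, not values taken from any particular processor.

```python
def addressable_locations(pc_width_bits):
    """Return how many distinct addresses an n-bit program counter can hold (2**n)."""
    if pc_width_bits < 0:
        raise ValueError("PC width must be non-negative")
    return 2 ** pc_width_bits


# Example widths only; real parts document their own PC width.
for width in (8, 11, 16):
    print(f"{width}-bit PC -> {addressable_locations(width)} locations")
```

Each additional PC bit doubles the amount of program memory the processor can reach.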

[8] ROM is the acronym for Read Only Memory, meaning not writable but readable. In practice, these memories are writable at least once, to configure the memory. Typically, they are implemented using FLASH technology, allowing multiple write cycles using proper programming equipment.

[9] RAM is the acronym for Random Access Memory, which is a misnomer as all addressable memory is random access by definition. It traditionally refers to memory that loses all data when the power is off, and is typically of a static or dynamic nature.

[10] A register is a fixed-width, volatile memory element, used to store intermediate information. This information may be the next address to execute, or the flag bits used to configure the built-in A/D converter.

The next module important to the operation of the computer is known as the Arithmetic-Logic Unit or ALU. It is responsible for performing various arithmetic operations on the data, like addition or subtraction, as well as various logic operations like AND, OR, and NOT. In more sophisticated microcontrollers, the ALU may also include a perfect exchange register, where word-oriented operations can be carried out, like swap bytes (swapping the upper and lower bytes of the word) or swap nibbles; these are single-cycle operations [11] that make short work of bit-intensive operations. The ALU can also test data for various states for the control unit to record. All ALU operations are directed by the control unit, which is in turn directed by the instructions found in the program memory.
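The ALU behaviour described above can be sketched in Python as a function that takes an operation name and operands and returns a result plus status flags. This is a minimal model assuming an 8-bit data width; the operation names, the flag names and the helper functions are hypothetical and do not correspond to any specific microcontroller.

```python
def swap_nibbles(byte):
    """Exchange the upper and lower 4 bits of an 8-bit value."""
    byte &= 0xFF
    return ((byte << 4) | (byte >> 4)) & 0xFF


def swap_bytes(word):
    """Exchange the upper and lower bytes of a 16-bit value."""
    word &= 0xFFFF
    return ((word << 8) | (word >> 8)) & 0xFFFF


def alu(operation, a, b=0):
    """Toy 8-bit ALU: returns (result, flags) for a hypothetical operation name."""
    a &= 0xFF
    b &= 0xFF
    if operation == "ADD":
        raw = a + b
    elif operation == "SUB":
        raw = a - b
    elif operation == "AND":
        raw = a & b
    elif operation == "OR":
        raw = a | b
    elif operation == "NOT":
        raw = a ^ 0xFF
    elif operation == "SWAP_NIBBLES":
        raw = swap_nibbles(a)
    else:
        raise ValueError(f"unknown operation: {operation}")
    result = raw & 0xFF
    # Status the control unit could record: zero result, carry out or borrow.
    flags = {"zero": result == 0, "carry": raw > 0xFF or raw < 0}
    return result, flags


print(alu("ADD", 0xF0, 0x20))
print(hex(swap_bytes(0x1234)))
```

In hardware the operation is selected by control signals rather than a string, but the shape is the same: data and a selector go in, a result and status bits come out.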

The following structures are the building blocks of all embedded computer systems:

- Program memory: read mostly; stores instructions and constant data (data that does not change over time). Non-volatile; data is retained after the power is turned off. Typically it is organized as an addressable matrix of $n \times m$ bytes, where $n$ is the memory address width and represents $2^n$ locations, and $m$ represents 1 or 2 bytes of memory width.

- Data memory: read/write; stores the results of instructions and state interactions. Volatile; loses all data when power is turned off. Typically it is organized as an addressable matrix of $n \times m$ bytes, where $n$ is the memory address width and represents $2^n$ locations, and $m$ represents 1 or 2 bytes of memory width.

- Program counter: a special-purpose volatile memory element (usually a dedicated register) that holds the address of where the processor is in its instruction sequence, usually the address of the next instruction to be fetched [12] and executed. The width of the PC determines the maximum number of instructions the computer can address.

- Control unit: a dedicated Finite State Machine (FSM) that takes as its inputs the instruction from the program memory, translating the bit pattern into actions manifested as synchronized control signals and states to the other modules in the computer.

- Arithmetic-Logic Unit: a dedicated FSM that takes as inputs control signals and chunks of data [13] (usually whole bytes or words [14]) and gives as outputs the results of the operation, in similar chunks of data.

[11] It will be seen that embedded processors are theme oriented, for example as a motor controller, a communications controller or a sensor controller, and the architecture correspondingly includes special functionality (read: dedicated registers/operations) that makes those features efficient compared to doing it manually in software.

[12] The term fetch, normally associated with a ball and the family dog, is a good analogy in this context as the verb describing the action of retrieving the instruction from program memory. It involves looking it up, getting it physically and bringing it back.

[13] Historically, the term data word is the fundamental width of the registers and program memory native to the microprocessor. In the domain of processors we are outlining, it is dependent on the architecture. Wikipedia: http://en.wikipedia.org/wiki/Word_%28computing%29

[14]

The anatomy of any embedded processor is made up of these five modules. It is the way in which these elements are arranged that ultimately determines the behaviour, or how a specific controller actually executes a program of instructions. The next section explores the fundamental physiology, or how it actually works, of an embedded processor.

Putting it all together, the big picture:

All embedded processors we encounter have various combinations of RAM, ROM and register resources. Typically, for the domain of embedded processors we are exploring, this memory will reside physically on the silicon from which the chip [15] is constructed.

Figure 1: Photomicrograph of physical microcontroller elements on a silicon die. [16]

Other types of embedded processors have enough pins to support accessing memory off chip. Starting out, we will address those topologies that have memory built into the chip. The following figures illustrate three different embedded processors: the ARM-7 core, and the Atmel 89C2051 and Microchip PIC18F1330 8-bit microcontrollers. To begin, let's look at the anatomy of these processors.

In Figure 4, the PIC18 processor, it is easy to identify the modules we have described so far: ALU, Program Counter, Control Unit (Instruction Decode and Control), Program Memory and Data Memory. You will also notice there are many other modules in the processor we haven't discussed yet, though you may be able to guess at their functionality. Suffice it to say they all support the movement and modification of data, which ultimately is the only purpose of any processor, embedded or not.

[15] The word chip is loosely used to mean those devices built from chips of silicon wafer, mounted onto a leaded carrier, providing the pins that allow connection to the circuit.

[16] Image copyright 2009 Micro Control Journal. All rights reserved. (http://www.mcjournal.com/articles/arc105/arc105.htm)

The other processor illustrations are not so straightforward to interpret. In Figure 2, it is clear what the data and program memory are, and the ALU and program counter as well, but where is the Control Unit? It is there, just split into several blocks, each illustrating a function the control unit must accomplish, like PC incrementer, program address register and stack pointer. As long as you understand these functions are common to all embedded processors, it is not difficult to interpret the block diagram of any microcontroller.

Figure 2: Atmel 89C2051 Architecture [17]

The diagram in Figure 3 stretches the simple block diagram concept, but with some digging it is possible to catch the islands of functionality [18]. The CPU diagram lays out a stylized arrangement of registers, implying interconnection and the existence of a control unit connecting them all. As this is a model of the CPU core, the memory is not illustrated. The memory is laid out a little differently, in that the program and data memory share the same address space, but the program counter and flag register are clearly there along with many other, yet to be understood, specialized registers. We will get to those later; for now the important idea to grasp is that although it may seem like these three processors are dramatically different, they actually differ only by the specifics of how the various modules interact, and not so much by the modules they have.

[17] Image 2009, Atmel, http://www.atmel.com/dyn/products/product_card.asp?part_id=1938

[18] Islands of Functionality refers to an isolated group of registers, FSMs and other structures that perform a single job, like the ALU, a timer, or an A/D controller module.

Figure 3: The ARM-7 core architecture


Fundamental Physiology of an Embedded Processor

Now that we have outlined the structure of the building blocks of an embedded processor (memory, ALU, Control Unit, PC) and briefly illustrated some real processors, it is time to describe their inner workings, i.e. how they behave with one another. Once the general ideas of structure (anatomy) and function (physiology) are understood, we can proceed to take a comparative look at how examples of actual embedded processors work when performing similar tasks.

The computer program: a recipe for functionality.

Although trivial in concept, it bears repeating. A computer follows a list of instructions, starting at the beginning and executing each instruction until the end of the recipe or program. From our previous glimpse into the anatomy of an embedded processor, we know the program memory (ROM) stores the program instructions in an addressable matrix of bytes or words. The Program Counter (PC) has the vague job of knowing which instruction to execute. It is the job of the control unit to know where to start the program (the beginning), to fetch, decode and execute the instruction and, as long as it isn't the last instruction (the end of the program), to advance to the next instruction, execute it, and so on.
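The recipe idea can be sketched in a few lines of Python. This is a toy model rather than any real processor: the instruction names LOAD, ADD and HALT, the single accumulator register, and the run function are all assumptions made for illustration.

```python
def run(program):
    """Follow a list of (operation, operand) pairs from the first entry onward.

    LOAD, ADD and HALT are hypothetical instructions invented for this sketch.
    """
    pc = 0           # program counter: index of the next instruction
    accumulator = 0  # a single 8-bit working register
    while pc < len(program):
        operation, operand = program[pc]  # fetch
        pc += 1                           # advance to the next instruction
        if operation == "LOAD":           # decode and execute
            accumulator = operand & 0xFF
        elif operation == "ADD":
            accumulator = (accumulator + operand) & 0xFF
        elif operation == "HALT":         # the end of the recipe
            break
        else:
            raise ValueError(f"unknown operation: {operation}")
    return accumulator


recipe = [("LOAD", 1), ("ADD", 2), ("HALT", 0), ("ADD", 100)]
print(run(recipe))
```

The final ADD in the example recipe is never reached, because HALT ends the loop first.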

The Program Counter: more than meets the eye.

In Figure 5 we see our first look at a generic computer, complete with program memory, control unit and program counter. The program counter is connected to the program memory via the program memory address bus. The output of the program memory goes directly into the control unit. The program counter is also connected to the control unit. At any one moment in time, any computer is in the middle of a finite set of cycles, performing mundane tasks like:

- Calculate the address of the next instruction

- Load the address of the next instruction in the program counter

- Enable the program memory to use the address the PC is presenting

- Get the value of the memory location (Fetch the instruction)

For the time being we will focus on these simple but vital tasks. Somehow, the control

unit is smart enough to know what the next address is to fetch the next instruction.

The key to understanding how something works is to walk a mile in its shoes, as it were: to follow it step by step as it does its job. Let's consider a simple scenario with two questions. What actually happens when an embedded processor [20] is powered up, and how does the control unit orchestrate these events?

As we are dealing with events that take place in time, it is traditional to illustrate event relationships that happen in time, or in synchrony, with a waveform chart. A well-organized chart can illustrate in a single picture what would take pages of text to describe.

[20] From this point forward, the terms processor, embedded processor and computer will all refer to the same thing.

The following is a generic power-up sequence that could apply to almost any embedded processor. It is organized as several rows, each representing a particular signal as it changes in the time domain. The signal may represent an actual voltage or a logical state, for example 0V to 3.3V, or asserted or not asserted [21]. The rows of the waveform are linked in time, that is, they all start at the same time, and important events are usually labelled. In this example, the first row represents the logic condition of power being applied (the lower line illustrates the zero or off condition, and the upper line represents the high or on condition) rather than its actual voltage value(s). Important here is the idea that not all signals are valid at all times.

The first few hundred microseconds of life: the power up timing waveform.

[Timing waveform. Signal rows: POWER, RESET, CLOCK, ADDRESS, INST DATA. Labelled events: power stable, Reset, Start, clock stable, FETCH, Fetch 1st Instruction.]

Power on sequence:

1. Power is applied to the chip (the beginning of time as the chip sees it).

2. The reset signal is asserted, holding the chip in a reset state.

3. In the reset state, nothing happens within the computer, but the computer cycle clocks start oscillating and everything is poised, just waiting for the reset to be released. This is one of the most important times for a computer; without it, the control unit and program counter would be in unknown states [22], and could cause the computer to go haywire, not knowing what state it is in or where to go next. Putting the computer in the reset condition gives the processor the time it needs to turn on and set up the physical transistors that make up the hardware to known conditions, subsequently initializing the control unit to a known state.

4. Some time after the power is applied and stable (that is, within the operating range of the processor) and the system clock(s) have started and are stable, the reset signal is released, and the control unit starts from its initial state. All this happens in a short period of time to us (2 to 5 milliseconds), a lifetime to a processor that can execute half a million instructions in a second.

5. The control unit loads a pre-defined address (processor dependent) into the program counter and fetches the first instruction from program memory. The word fetch [12] is often used to describe this control unit cycle, and can be summarized with the following steps:

a. Start the fetch cycle.

b. Using a FSM, assemble the pieces that will make up the next instruction address. In the case of the first instruction location, no calculation is necessary; it is a fixed location, so just use that address.

c. On the appropriate system clock edge [23], logic OR the address pieces [24] together and load the assembled new address into the program counter.

d. Enable the new instruction address from the program counter onto the program memory address bus.

e. Synchronized with the appropriate clock, and after there has been enough time for the address to be stable and valid (the last thing you want is the address changing while you are trying to read an instruction from program memory), enable the program memory output onto the instruction bus.

f. Synchronized with the appropriate system clock, read the instruction into a holding register within the control unit.

g. Set the next instruction address increment value to be added to the current instruction address when the next fetch cycle starts.

h. Fetch cycle complete; the control unit now has fetched the instruction from the program memory.

6. The control unit is ready to decode the instruction fetched, execute it and start the fetch cycle all over again.

[21] It is more accurate to use the term asserted/not asserted to indicate the value of a particular state. 1 or 0, or true or false, can all imply an implementation of a state. A logic 0 may be represented by anything less than 0.9 VDC in a 3.3V system, and represents the asserted or enabled state of a processor reset signal, which would be logically true for its value.

[22] By unknown state, consider what is physically happening in time when power is first applied to a transistor circuit. This all happens on the time scale of pico- and nanoseconds, but when dozens of transistors are linked together, it can take hundreds or thousands of nanoseconds to settle down into a known state.

[23] For this overview, we are playing fast and loose with the necessity of system synchronism. Assume on faith that every processor cycle that is executed is done in time and in sync with a clock, or clocks, or portions of a clock to ensure the data is taken or arrives where it belongs when it is valid to do so.

[24] The pieces referred to will be described in detail later; suffice it for now that the pieces may be an offset from the current location and the previous location, along with any increment pending.
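The power-on sequence and fetch cycle above can be modelled with a small Python class. This is only a sketch under stated assumptions: the class name, the reset address of 0x0000 and the increment of 1 are hypothetical (the real start address is processor dependent), clock edges are not modelled, and a simple addition stands in for the address assembly of step c.

```python
RESET_VECTOR = 0x0000  # assumed start address; processor dependent in practice


class ToyProcessor:
    """Toy model of reset release followed by instruction fetch cycles."""

    def __init__(self, program_memory):
        self.rom = list(program_memory)
        self.in_reset = True            # step 2: held in reset at power-up
        self.pc = None                  # unknown until the first fetch
        self.instruction_register = None
        self.pending_increment = 0

    def release_reset(self, power_stable, clock_stable):
        """Step 4: leave reset only when power and clock are both stable."""
        if power_stable and clock_stable:
            self.in_reset = False
            self.pc = None
            self.pending_increment = 0
        return not self.in_reset

    def fetch(self):
        """Step 5: one fetch cycle; returns the instruction word read."""
        if self.in_reset:
            raise RuntimeError("processor is held in reset")
        # b, c: assemble the next address and load it into the PC
        if self.pc is None:
            self.pc = RESET_VECTOR
        else:
            self.pc = self.pc + self.pending_increment
        # d: present the PC on the program memory address bus
        address_bus = self.pc
        # e, f: read program memory into the holding register
        self.instruction_register = self.rom[address_bus]
        # g: set the increment used by the next fetch cycle
        self.pending_increment = 1
        return self.instruction_register


cpu = ToyProcessor([0b11000100010100, 0b00000000000000])
cpu.release_reset(power_stable=True, clock_stable=True)
print(format(cpu.fetch(), "014b"))
```

Calling fetch before release_reset succeeds raises an error, mirroring the point that nothing may happen inside the computer until reset is released.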


[Block diagram. Blocks: Data Memory, Program Memory, PC, ALU, I/O Port, CU, Buses.]

Figure 5: Control Interconnections

As you can see, there is more to the program counter than meets the eye; it is the signpost the entire computer uses to keep track of where it is in the instruction sequence. It is a lot more than just a simple placeholder; it is an integral player in the instruction fetch cycle as well as an essential element for proper program execution.

The Control Unit: Conductor, logic wizard and traffic Cop:

Up to this point we have glossed over many inner workings of the computer, limiting our focus to just how the computer knows what instruction to do next. Do not lose sight of the big picture:

- There is a program stored in instruction memory (ROM).

- This ROM holds the instruction sequence that is the program to be executed, whatever that means.

- The control unit is stuffing the program counter with addresses, fetching instructions and doing something with them.

The next part of our discussion of embedded processor physiology is how the control

unit knows what to do with the instruction it fetched from the program memory.

The Control Unit is the very heart of any embedded processor. Ultimately it is responsible for knowing what instruction to fetch next, how to fetch it, how to set up for the next instruction, and how to decode and execute the instruction just fetched. It is a relatively complex FSM designed specifically to control the inner workings of the computer according to basic cycle specifications like the fetch cycle mentioned previously, or in real time by decoding the cycle information contained within the instruction.

110001110011: I dig Computer Baby Talk!

So far, this tutorial has been pretty fast with the information; it is kind of like trying to drink water from a fire hose: possible, but a lot will spill out! So far, what do we know about the inner workings of a computer?

- The computer works by executing instructions in sequence.

- The program memory stores the instructions to be executed.

- The control unit, in conjunction with the program counter, can fetch instructions from program memory, decode and execute them, whatever that means.

So far so good. Before we continue, let's do a little binary arithmetic review. Recall that the number of combinations a particular binary number can take is equal to $2^n$, where $n$ is the number of bits in the binary number. For example, a 4-digit binary number has $2^4$ or 16 possible combinations. An 8-bit byte has $2^8$ or 256 possible combinations. This organization is used extensively in computers to allow us to select one from many, or address one memory location from the tens of thousands of memory locations available to us. Like the ubiquitous Apartment Number analogy, for every memory location there is a unique address, just as there is a unique physical address or number for every apartment.

Making Bits Work:

Remember the alarm example, where the finite state machine modelled the windows and doors of the house as bit locations to encode the physical world into a digital representation of that world? The dedicated finite state machine circuits decode the binary bit locations to determine what door or window was opened.
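To make the alarm example concrete, here is a short Python sketch that decodes a bit pattern into the doors and windows it represents. The sensor names and their bit positions are invented for illustration; the paper does not specify them.

```python
# Hypothetical mapping: bit 0 is the first name, bit 1 the second, and so on.
SENSORS = ["front door", "back door", "kitchen window", "bedroom window"]


def opened_sensors(state_bits):
    """Return the names whose bit is set (1 = opened) in the state word."""
    return [name for bit, name in enumerate(SENSORS) if state_bits & (1 << bit)]


print(opened_sensors(0b0101))
```

A dedicated state machine does the same job in hardware: each bit location is wired to one door or window, and the decode logic picks out which bits are set.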

This is what is being done with the so-called instructions: bit patterns are being used to represent places and actions we wish the computer to access or execute. Now we can say it: when a computer executes an instruction, it means that particular instruction has a physical meaning associated with its unique bit pattern. That meaning is used to enable the sequence of events that is required to execute the meaning or command.

Instruction Encoding; how many bits do we need?

This is exactly how the control unit knows what to do with the instruction it fetched previously. Each instruction contains an encoded portion indicating what to do, who to do it to and what to do it with. This is a lot to ask a few bits to do. As we will see later when we compare processors, some only have enough bits to encode 64 instructions, or 6 bits (what to do), and 8 bits of location or actual data information (what/who to do it with/to). That adds up to an instruction word that is 14 bits wide, implying the program memory had better be at least 14 bits wide.

Time for a real example; let's say our embedded processor has an instruction called Add. Its function is to cause the contents of some register (let's call it a) to be added to the fixed value 0x14 and the result stored back in register a, wherever that is. The operation code (opcode[25]) for this instruction could be 110001 in binary, and the fixed (immediate) value might be 0x14[26], or 00010100. The entire instruction would be the concatenation (joining together) of the opcode and the immediate data, for a complete instruction word of 11000100010100. The control unit is expert at this kind of binary computer baby talk and decodes the first six digits as the action to add the number represented by the last 8 digits to some register called a, wherever that is. Unfortunately, for most of us humans, it is tedious to impossible to manage lists of binary strings representing instructions and data. Some sort of help was needed for our simple minds to handle binary instructions.

[25] See definition: http://en.wikipedia.org/wiki/Opcode
[26] The traditional prefix for a hexadecimal number is the two-character pair 0x. Each hexadecimal digit is four bits wide, thus having 16 values, from 0 (0000) to F (1111). See http://en.wikipedia.org/wiki/Hexadecimal
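To make the concatenation concrete, here is a minimal sketch in C of packing and unpacking the 14-bit Add instruction word from the example. The function names and the OPCODE_ADD constant are illustrative assumptions, not part of any real processor's instruction set; only the 6-bit opcode 110001 and the 8-bit immediate value 0x14 come from the example itself.

```c
#include <stdint.h>

/* Hypothetical 14-bit instruction layout taken from the Add example:
   bits 13..8 hold the 6-bit opcode, bits 7..0 hold the 8-bit immediate. */
#define OPCODE_BITS    6u
#define IMMEDIATE_BITS 8u
#define OPCODE_ADD     0x31u /* 110001 in binary, the example opcode */

/* Join opcode and immediate data into one instruction word. */
uint16_t encode_instruction(uint8_t opcode, uint8_t immediate)
{
    uint16_t op = (uint16_t)(opcode & ((1u << OPCODE_BITS) - 1u));
    return (uint16_t)((op << IMMEDIATE_BITS) | immediate);
}

/* Recover the "what to do" field, as the control unit would. */
uint8_t decode_opcode(uint16_t word)
{
    return (uint8_t)((word >> IMMEDIATE_BITS) & ((1u << OPCODE_BITS) - 1u));
}

/* Recover the "what to do it with" field. */
uint8_t decode_immediate(uint16_t word)
{
    return (uint8_t)(word & ((1u << IMMEDIATE_BITS) - 1u));
}
```

Calling encode_instruction(OPCODE_ADD, 0x14) gives 0x3114, which is 11000100010100 when written as 14 binary digits.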

Looking Forward: Tools for us Humans:

The time has come to peek ahead and introduce the concept of mnemonics, the mapping of a human-understandable memory aid onto literal computer constructs[27]. The trick is to come up with a pseudo language of sorts, with descriptive verbs and nouns representing the operations we would like to perform. Which of the following equivalent statements is easier to understand?

11000100010100 - or - add a,#0x14

The first string[28] of binary digits is traditionally called machine language (computer baby talk) and the second statement is called assembly language, a pseudo-English patois of suggestive verbs and nouns loosely cobbled together to garner meaning.

Here is the beginning of what is called the Tool Chain, a very important concept in understanding how computers work. There are tools (actually applications that run on a separate development computer system) that help us translate language a human can understand into machine language a computer can execute: the actual, physical binary pattern stored in program memory.

In this simple example, we would create a program using a stand-alone text editor or an editor within an IDE[29], containing among other things the add statement above, and use that human-readable text file as the input to an application called an assembler. The assembler translates the assembly language source code into the appropriate bit pattern. To complete the chain, that bit pattern is then combined with other bit patterns to form an executable bit image. This bit image is then programmed or burned[30] into the computer program memory ROM. We will be discussing the tool chain concept in more detail later. At this point in our tutorial, accept it on faith that there is indeed a way that humans can create programs that are ultimately physically manifested as bit patterns or instructions inside the processor, ready to execute when the power is turned on.

[27] Derived from Wikipedia definition: http://en.wikipedia.org/wiki/Mnemonic
[28] Be careful, this is not a binary number; it is a composite representation of opcode and data.
[29] Integrated Development Environment, a computer application that streamlines the creation of computer programs by integrating the editor, compiler, assembler and linker into a single user interface.
[30] Burning a ROM is a throwback to when physical metal fuses integral to the memory were burned away using a high current pulse, permanently setting the state for that memory location. The specific mechanisms of memory are beyond the scope of this paper; see http://www.howstuffworks.com/rom.htm/printable for more details.

Figure 6 Simple Tool Chain, Assembly to Bits in Computer
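As a rough illustration of the assembler step in the tool chain, the sketch below translates the single statement add a,#0x14 into its 14-bit instruction word. It is a toy: it assumes the hypothetical opcode 110001 from the Add example and recognizes only that one instruction form, whereas a real assembler handles a full instruction set, labels and much more. The function name assemble_line is an assumption made for this example.

```c
#include <stdint.h>
#include <stdio.h>

#define OPCODE_ADD 0x31u /* 110001, the opcode assumed in the Add example */

/* Translate one line of the form "add a,#0x14" into a 14-bit word.
   Returns 0 on success, -1 if the line is not recognized. */
int assemble_line(const char *line, uint16_t *word)
{
    unsigned int immediate;
    char reg;

    /* %x accepts an optional 0x prefix on the hexadecimal value. */
    if (sscanf(line, "add %c,#%x", &reg, &immediate) != 2)
        return -1;
    /* Only register a and an 8-bit immediate fit our toy encoding. */
    if (reg != 'a' || immediate > 0xFFu)
        return -1;

    *word = (uint16_t)((OPCODE_ADD << 8) | immediate);
    return 0;
}
```

Feeding the text "add a,#0x14" through assemble_line stores 0x3114 in the output word, the same bit pattern as the machine language string shown earlier.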

Temporary storage: Data Memory.

So far we have explored how the control unit fetches an instruction from program memory and decodes it to perform some task. What is missing is where the control unit stores temporary data. (Remember register a?) Not everything is known when the program is compiled into machine language. For example, if the embedded system were a thermostat or an alarm controller, where does the computer store the current temperature? Where does it store which doors are closed? This is the job of Data Memory, a read/write volatile memory that the control unit can use to save intermediate results, or just about anything that can and does change with time. In the strictest sense, the program counter is an example of this type of memory; when the power goes away, the data is lost. Other modules rely on this kind of memory: the ALU uses temporary storage as scratch pad memory to hold intermediate results, and the control unit relies on temporary data storage to keep track of the current state of the computer. Data memory is an integral part of the computer architecture, and it is necessary to understand how it interacts with the other modules we have seen so far.

[Figure: generic computer block diagram showing Data Memory, Program Memory, PC, ALU, I/O Port and CU connected by Buses]

As you can see from the illustration above, most computer architectures have at least some sort of program memory, data memory, program counter and control unit. We are almost ready to start looking at particular embedded processors. First, though, it is necessary to understand the relationship between the program memory, data memory and the dedicated volatile memory elements, or registers.


Before continuing, let's review the concept of a register. It is nothing more or less than an ordered set of bits, not unlike a memory location, that can hold a bit pattern. A register may be anywhere from 1 to 256 bits wide. It might hold a single bit from the overflow of a binary addition, or the four-bit value that points to a portion of memory called a file. Whatever it holds, and however wide it is, it is a volatile memory element, usually controlled (set, reset, read) by the control unit.

Some embedded processors, for example, have a special dedicated register for everything. This is where the uniqueness of a processor manifests itself: how the functions and data are organized physically on the processor. Registers may be general-purpose scratch pads that hold any value (say, the intermediate result of a logic operation) or special function registers that hold a binary value corresponding to the artefacts of the last instruction[31].

As embedded processors contain more and more functionality (timers, serial ports, A/D converters), it is necessary to have volatile memory elements to keep track of all their settings and status. In some processors, there are over two dozen separate special function registers just for this purpose. Here is where the similarities end and individual architectures begin to diverge from the generic model. How does the computer organize the needs of program memory, data memory and special function registers? Keep that question in mind as we continue our exploration with the computer's last generic element, the ALU.

The Arithmetic Logic Unit: Workhorse of the Embedded Processor.

Up to this point we have hinted that there is a module, managed by the control unit, that is used to do math and logic operations. Indeed, inside every embedded processor is an ALU that can take two or more operands, perform an operation on them (addition, subtraction, multiplication, AND/OR/NOT), store the result in data memory or a purpose-built register, and update the artefacts associated with that operation (negative, zero, overflow, underflow, divide by zero).

Figure 7 Simple ALU
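The idea of an operation plus its artefacts can be sketched in C. The following is a minimal model of an 8-bit add that also reports a few of the flags mentioned in the text. The structure and function names are assumptions made for illustration, and real ALUs differ in which flags they provide and how they name them.

```c
#include <stdint.h>

/* Result of one ALU operation together with its artefacts (flags). */
typedef struct {
    uint8_t result;
    int carry;    /* sum did not fit in 8 bits (unsigned overflow) */
    int zero;     /* result is exactly zero */
    int negative; /* bit 7 of the result is set */
    int overflow; /* signed two's complement overflow */
} alu_result;

alu_result alu_add8(uint8_t a, uint8_t b)
{
    alu_result r;
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b);

    r.result = (uint8_t)wide;
    r.carry = wide > 0xFFu;
    r.zero = r.result == 0;
    r.negative = (r.result & 0x80u) != 0;
    /* Signed overflow: both operands share a sign bit and the result's
       sign bit differs from it. */
    r.overflow = ((~(a ^ b)) & (a ^ r.result) & 0x80u) != 0;
    return r;
}
```

For instance, adding 0x7F and 0x01 produces 0x80 with the negative and overflow flags set, while adding 0xFF and 0x01 produces 0x00 with the carry and zero flags set.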

[31] An artefact is a processor state that may change when an operation is performed. For example, if the result of a math operation is negative, a bit or flag could be set as an artefact of that operation. There are various flags or artefacts that are updated after each instruction by the control unit.


The ALU itself is a special-purpose finite state machine designed to take control signals and operands and perform the function called out by the control unit. Its features include the operations it can perform, the size of the operands it can handle and the size of the results it can produce.

Consider the addition of two 8-bit numbers. The sum could be larger than an 8-bit number can hold, so the ALU must be able to accommodate that possibility. The ALU must also be able to provide some sort of floating point functionality (or at least mechanisms to support such operations), usually implemented partly in hardware and partly in custom math libraries for that processor.[32] The full complement of math operations takes up a lot of processor real estate, and compromises need to be made to get the maximum functionality in the minimum space with the best performance possible. It is possible to multiply or divide any two numbers using successive additions (or subtractions), but that would take a long time. Time, then, is what we give up to save the real estate[33] needed for a hardware multiplier integral to the ALU.
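The time versus real estate trade-off can be seen in a small C sketch. The first function multiplies by successive addition, as described above. The second uses shift-and-add, a common technique that is not taken from this paper and is shown here only for comparison; both function names are assumptions.

```c
#include <stdint.h>

/* Multiply by successive addition: performs b additions, so the run
   time grows with the value of b (up to 255 additions here). */
uint16_t multiply_by_addition(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    for (uint8_t i = 0; i < b; i++)
        product = (uint16_t)(product + a);
    return product;
}

/* Shift-and-add: examines one bit of b per step, so it needs at most
   8 steps for 8-bit operands regardless of their values. */
uint16_t multiply_shift_add(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;

    while (b != 0) {
        if (b & 1u)
            product = (uint16_t)(product + addend);
        addend = (uint16_t)(addend << 1);
        b >>= 1;
    }
    return product;
}
```

Both routines return the same product; the difference is how many steps they take to get there, which is the kind of cost a hardware multiplier removes at the price of silicon area.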

Memory Organization and Program Memory:

However you slice it[34], every embedded controller operates on two types of memory: the memory that stores instructions (program memory) and the memory that temporarily holds values (data memory). In the previous real chip examples, we can see the PIC18 and AT89 both have separate, distinct memory areas, with separate data and address lines, as opposed to the R8C architecture, where the program and data memory is logically one monolithic block, with one common address and one common data bus. It is not the intention of this paper to compare and contrast the ramifications of this level of architectural choice[35]; we will however explore what the choices are and how they are used.

Recall that it is the control unit that calculates the address for the next instruction to fetch. In the case of the PIC18, this is an address that can accommodate a maximum of 8192 memory locations. As a review, how many bits are needed (the minimum number) to uniquely address every location in the program memory?[36] As it turns out, the internal modules in the PIC18 family of processors are almost identical, and while the PIC18F1330 may have a program memory depth of 8192 locations, each 16 bits wide, the flagship of the PIC18 family, the PIC18F8722, has up to 128 kBytes (64k words) of program memory.

[32] It is mathematically possible to do any math operation using just two bits and a lot of RAM; it would just take a lot of instructions to orchestrate even a simple 16-bit integer addition. On the other hand, you could dedicate three separate registers, two 16-bit and one 32-bit, to hold the operands and the sum respectively. Controller architecture is a balance between what space you physically have and what operations can be done in software.
[33] It physically takes space in silicon to do anything, whether storing a bit or making a control unit. Each processor designer is faced with the problem of trying to find space for everything the marketing people want in the new version. Compromises are made in performance or size (and power) when design decisions (architectures) are made.
[34] Off-hand homage to the so-called bit-slicers of old.
[35] As is often the case, the choice of one particular architecture over another has religious implications, with each ideology having its priests, each believing in their brand of the truth. More often than not, the choice of processor comes down to cost, or number of pins, or "what chip did we use last?" or "how much do the tools cost?" and not some idealized architecture philosophy.

This common control structure is no accident. It ensures that instructions written for the least capable member of the family will work on the most capable member. In fact, the address latch for the program memory is a full 20 bits wide, allowing up to 2^20 or about one million memory locations (can we expect future versions of the PIC18 family with more program memory?). For our PIC18, however, we have more than enough address bits to accommodate the 8192 locations (16 kBytes) of instructions and constants[37].
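The address-width arithmetic used here is easy to check in C. This is a small sketch with assumed function names; it simply counts how many bits are needed to give every location a unique address, and how many locations a given number of address bits can reach.

```c
#include <stdint.h>

/* Smallest n such that 2^n >= locations (for locations >= 1). */
unsigned int address_bits_needed(uint32_t locations)
{
    unsigned int bits = 0;
    while (((uint64_t)1 << bits) < locations)
        bits++;
    return bits;
}

/* Number of distinct addresses an n-bit address can select (n <= 32). */
uint64_t addressable_locations(unsigned int bits)
{
    return (uint64_t)1 << bits;
}
```

With these, address_bits_needed(8192) returns 13, and addressable_locations(20) returns 1048576, the "about one million" locations a 20-bit address latch can reach.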

As we mentioned earlier, the program counter is much more than a simple register; in this example it is almost a mini ALU in the operations it can perform to assemble the correct address for the next instruction. Keeping with the program memory theme, look at the Atmel 89C2051 memory. Although it is difficult to read from the simple block diagram, the literature specifies the program memory to be one byte wide, with 16 address bits for a maximum of 65536 locations (64k, where 1k = 1024 locations). Our variant, the 89C2051, has 2k of program memory, or 2048 locations, each one byte wide.

Consider this simple fact for a moment. Each program memory access of a PIC18F1330 processor returns 16 bits. Each program memory access of the 89C2051 returns half as much data. If both processors are running at the same speed, which one moves more data per unit time? We can't answer that right now, but keep it in mind when we compare performance between our three embedded processors.

To complete the program memory tour, the ARM-7 processor addresses 32-bit wide program memory, unlike the PIC18, whose program memory is 16 bits wide. These are very similar processors, but with different approaches to how the program instructions are addressed.

[36] 2^n = 8192, so n = log2(8192) = 13 bits.

[37] We include "and constants" on purpose when describing program memory, as it is the ideal place to store values that are known when the code is assembled and would overwhelm the limited data memory space. This convenience, however, comes at a cost, as we will soon see.


Figure 8: Microchip PIC18F8722

Memory Organization and Data Memory:

Previously we explored the anatomy of controllers, describing the blocks or modules that make up the controller. These blocks are indeed islands of functionality, for the most part independent of each other. That means that while the control unit is fetching the next instruction, the ALU can be logically ORing two values and sending the result into a register, and the timer module (described later) can be counting down; all at the same time, all on the same chip of silicon.

This ability begs to be used in a parallel fashion, and not simply in the serial "follow the recipe" concept of a computer program. The tricky part is, how does the computer keep track of all these independent operations? More on that later.

Going back to our real life examples, consider the monolithic memory architecture of the ARM-7. To fetch an instruction, the control unit updates the program counter, then places that address on the address bus. Some time later it reads the instruction from the ROM, then executes it. If there are data values to be stored, it places the destination address on the same memory bus, and some time later writes the data into the RAM.

In this simple example, there is some dead time: the time between subsequent operations that could have been used doing things in parallel.

Figure 9 "Princeton" (Von Neumann's) Architecture

Figure 10 "Harvard" Architecture

Look at Figure 4, the PIC18 architecture: the program memory has its own address and data busses, distinct and separate from the data memory. This means the control unit can be building the address of the next instruction to be fetched while it is decoding the current instruction, and at the same time writing the results of the last ALU operation into the data memory. This level of parallelism is common to architectures that have separate program and data memory spaces. The figures above illustrate the basic memory architecture differences. The main idea to keep in mind is that in the Princeton architecture, there is one Program/Data memory address and data bus, tying the program and data memory logically together. In the Harvard architecture (hmmm, I wonder where they were invented) the program and data memory are separate and distinct, each with its own address and data busses.
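A deliberately idealized C sketch can put numbers on the difference. It assumes every instruction fetch and every data access takes exactly one bus cycle, and that separate buses overlap perfectly; real processors have stalls and pipelines that this model ignores. The function names are assumptions for illustration only.

```c
#include <stdint.h>

/* Princeton (shared bus): every instruction fetch and every data access
   needs its own bus cycle, so the two counts simply add up. */
uint32_t bus_cycles_princeton(uint32_t fetches, uint32_t data_accesses)
{
    return fetches + data_accesses;
}

/* Harvard (separate buses): a fetch and a data access can happen in the
   same cycle, so the busier bus sets the total in this ideal model. */
uint32_t bus_cycles_harvard(uint32_t fetches, uint32_t data_accesses)
{
    return fetches > data_accesses ? fetches : data_accesses;
}
```

Under these assumptions, a program with 100 instruction fetches and 40 data accesses occupies the shared bus for 140 cycles, but needs only 100 cycles when the two memories have their own buses.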

Is one architecture better than another? It is a difficult question to answer without knowing the actual application. In a simple case, when the computer is reading a thermostat, checking it against a pre-set temperature and deciding if the heat should be turned on, it doesn't take many cycles to accomplish, and a monolithic memory may be just fine. If more performance were required, a faster processor could be used. If, however, many functions were being handled at the same time, like reading the buttons on a USB Nintendo controller, processing 8 buttons as well as the angles of the left/right joysticks, and sending the information back to the console in time to keep from being sliced in half by the Orc you are battling, it may be better to have a microcontroller that can do as much as possible at the same time.

The organization or architecture of the computer includes how the various volatile memory elements (registers) are organized and controlled. Some architectures use individual, separate registers for everything; see Figure 11. An alternative to having separate physical registers is the model used by the R8C and PIC18: the registers for the whole computer are contained in data memory as a set of registers, addressed like any other memory element and often organized as files or blocks of memory.

This significantly reduces the complexity of the control unit while maintaining the flexibility to add functionality. For example, consider two processors from the same family: the PIC18F8722 and the PIC18F1330. Using the same register file architecture, it is possible to accommodate the five timers and 12 A/D modules of the PIC18F8722 using the same control unit that the PIC18F1330 uses to maintain 2 timers and 5 A/D modules.
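The register file idea can be sketched in C as a single array standing in for data memory, with special function registers living at ordinary addresses inside it. The addresses and register names below are hypothetical, chosen only for illustration; real parts publish their own register maps in the datasheet.

```c
#include <stdint.h>

#define DATA_MEMORY_SIZE 256u

/* Hypothetical register addresses, not taken from any datasheet. */
#define REG_TIMER0_COUNT   0xF0u
#define REG_TIMER0_CONTROL 0xF1u
#define REG_STATUS         0xF2u

/* One block of data memory holds both scratch values and registers. */
static uint8_t data_memory[DATA_MEMORY_SIZE];

/* Special function registers are written exactly like any other data
   memory location: same address space, same access routine. */
void write_data(uint8_t address, uint8_t value)
{
    data_memory[address] = value;
}

uint8_t read_data(uint8_t address)
{
    return data_memory[address];
}
```

Adding another timer to a bigger family member then means reserving a few more addresses, not redesigning how the control unit reaches its registers.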


Figure 11 PPC 405 Core Programmers model copyright 2006, IBM

The other fiddly bits: Input Output and additional processor functionality.

To round out the inner workings of the data memory, it is important to consider how our computer actually moves data into and out of the processor chip. In most architectures, an I/O pin is a simple RAM element, usually organized as a register of individually accessible bits on the data memory bus. To perform an output operation, the control unit asserts the direction control so the pin will be electrically an output pin. Figure 12 illustrates one bit of the I/O port for the PIC18. The simple port picture illustrated in Figure 4 glosses over the real work the control unit does in orchestrating an I/O operation. This is further complicated when the I/O pin can be an analogue input pin as well as a digital I/O pin.
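The two-step nature of an output operation (set the direction, then drive the value) can be modelled in a few lines of C. This is a software simulation with hypothetical field and function names, not the register interface of any particular chip.

```c
#include <stdint.h>

/* A simulated 8-bit port; pin numbers run from 0 to 7. */
typedef struct {
    uint8_t direction; /* bit = 1: pin is an output, 0: pin is an input */
    uint8_t output;    /* value driven on pins configured as outputs */
} io_port;

/* Assert the direction control so the pin becomes an output. */
void pin_make_output(io_port *port, unsigned int pin)
{
    port->direction |= (uint8_t)(1u << pin);
}

/* Drive a pin high or low. Returns -1 and changes nothing if the pin is
   still configured as an input. */
int pin_write(io_port *port, unsigned int pin, int level)
{
    uint8_t mask = (uint8_t)(1u << pin);

    if ((port->direction & mask) == 0)
        return -1;
    if (level)
        port->output |= mask;
    else
        port->output &= (uint8_t)~mask;
    return 0;
}
```

Writing to pin 3 before calling pin_make_output is rejected in this model; after the direction bit is set, the same write sets bit 3 of the output register.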


Figure 12 Generic PIC18 I/O pin

Figure 13 Digital & Analogue I/O pin

In addition to input and output functionality, most microcontrollers these days have some sort of built-in timer capability. These timer modules operate independently once set and started, and provide a much-needed way to count events or time in an embedded application. Again, there is a trade-off between chip real estate and software overhead. Any timer function can be implemented in software using loops and tests, at the cost of having to execute in linear time with the program. No matter how fast a program is, it can only be doing one thing at a time. Implementing a timer in hardware, however, relieves the program of the burden of maintaining a count.


Real world timers do more than just count; here the ARM-7 has two 32-bit timers[38], with selectable input count frequencies (derived from the master clock) as well as input from I/O pins (to count events happening on a pin). It is a simple matter to preload a 16-bit timer[39] and count from 0x0000 to 0xffff. If the master 32 MHz clock were used as the count input, the timer would roll over or overflow[40] every 2 or so milliseconds (65536 counts at 32 MHz take about 2.05 ms). If the fc32 divider is used, it would overflow roughly every 65 milliseconds.
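The overflow arithmetic can be written as a short C helper. The function name and parameters are assumptions for this sketch; it models a free-running timer that counts from zero through its maximum value and ignores any preload.

```c
#include <stdint.h>

/* Time, in microseconds, for a timer of the given width to count from
   zero through its maximum value and roll over. The prescale argument
   is the clock divider (1 = undivided). Intended for widths up to 32. */
double timer_overflow_us(unsigned int width_bits, double clock_hz,
                         unsigned int prescale)
{
    double counts = (double)((uint64_t)1 << width_bits);
    return counts * (double)prescale / clock_hz * 1e6;
}
```

For a 16-bit timer on a 32 MHz clock this gives about 2048 microseconds with no divider and about 65536 microseconds with a divide-by-32 prescaler.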

Timers, I/O ports, A/D modules, indeed most if not all computer special function modules need registers to hold configuration parameters (in the case of the timer, it may be the flag bit that controls whether the timer restarts after an overflow). As we saw in the PIC18, these registers are part of the computer data memory area. In the case of the PPC405, they are individual registers, peppered all over the die. Figure 15 illustrates what is arguably the quintessence of microcontroller technology to date, the Philips LPC2114.

Using what we have learned, let's examine this animal closely. The key to evaluating a microcontroller is to answer the fundamental structure (anatomy) questions first and then, if the structure is appropriate to the task, take a closer look at the functionality (physiology) of the beast.

First the bones: how are the Program Counter, Control Unit and ALU arranged? How does the chip get data from and put data to the outside world? What other goodies are available (timers, USARTs, A/D)?

At first blush, the block diagram of the LPC2114 seems to be missing many essential elements. They are there; it just takes a little digging. This chip represents a new trend in embedded processors: a common core of functionality, surrounded by the I/O that makes that particular processor special. Here the core is illustrated in the block named ARM7TDMI. The actual guts of the core are illustrated in Figure 14. The Program Counter is implemented by two elements, the Address Incrementer and the Address Register, which makes sense considering how the program counter normally functions. The other basic processor elements are present: the ALU, the Control Unit, and a data register. Some interesting additions are a 32 x 8 hardware multiplier and a barrel shifter interfaced with the ALU. These could help floating point operations by speeding up common math tasks in hardware. (Remember the real-estate versus performance trade-off discussion? Clearly, this chip is built with the building blocks of speed.) Another interesting element is the 31 x 32 bit register bank, a good place to hold intermediate results or condition flags. At this point in the investigation it is not clear, but it should not come as a surprise, when looking at the physiology of this beast, to discover there are such register locations in the bank.

[38] Philips LPC2124 Datasheet.
[39] Timers are characterized by how many bits are used to count with.
[40] A timer typically starts at some preloaded value and counts up, one count at a time, until the maximum value is reached. Then, depending on the mode used, an overflow flag is set, the initial value is reloaded and the cycle starts all over again.


This cursory look at the ARM-7 core should convince you that it has the infrastructure to be a hot, 32-bit processor. Now, let's have a look at the I/O.

Figure 14 ARM-7 Core

Looking at Figure 15, we can now see where the program and data memory live: it is a monolithic block, with a single address and data bus. The I/O, however, has its own address/data bus, an interesting combination that should make the I/O modules somewhat independent of the ARM-7 core.

In this initial look, it is clear this chip has been designed to accommodate almost any I/O scenario you would encounter with an embedded processor.


Figure 15 ARM-7 (Philips LPC2114) Block Diagram

Summary.

In this paper we have explored how any embedded processor is made up of the same elements, and how each individual element has basically the same behaviour. These elements are mixed and matched to create the animal known as an embedded processor and, along with the software applications in the tool chain, make up a development environment the designer can use to solve real world problems using embedded processors.

The heart of the embedded processor is the concept of an infinite state machine, that is, dedicated hardware that can reconfigure its functionality by following a set of instructions. This innovation has enabled designers to move away from manipulating bits directly in hardware to looking at problems from a modular perspective.

As important as the hardware advances is the tool chain used to design the control software, the ultimate arbiter of any embedded system.


Embedded Processor questions to test for understanding:

1. What is the difference between a Finite State Machine and an embedded processor doing the same task?

2. How do you physically get the program into the embedded processor?

3. What is the first thing the embedded processor does when the power is turned on?

4. What is the difference between an 8-bit processor and a 32-bit processor?

5. What are the steps needed to translate the following text into machine code?

    int a = 5;
    int b = 3;
    if(a > b)
        a = a+b;
    else
        a = b;


Answers

1. The FSM has one task it is designed to perform; the embedded processor can be programmed to do an infinite number of tasks.

2. The embedded processor is connected to a chip programmer, either in the circuit (so-called in-circuit programming) or as individual chips. The machine code file is used by the programmer to change the instruction memory on the chip.

3. At the beginning of time, all embedded processors are held in a reset state, allowing all the internal circuitry a chance to start in a known state.

4. Generally, the difference is in the internal representation of data. In an 8-bit machine, data is moved one byte at a time. In a 32-bit machine, data is moved (in the same time frame) 4 bytes at a time.

5. The text is input to a compiler that translates the human-readable code (in this case C). The output of the compiler is assembly code, the processor-specific human-readable version. Finally, that assembly code is fed into an assembler/linker tool that produces machine-readable code, specific to that individual processor.


i Fundamentals of FPGAs: http://www.techonline.com/learning/course/210605004
