Upload
dwurmfeld
View
215
Download
0
Embed Size (px)
Citation preview
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
1/31
Florida Institute of Technology
College of Computer and Electrical Engineering
ECE4551 Computer Architecture
Fall 2009
Semester Project Paper:
The Comparative Anatomy and Physiology of Embedded Systems.
By David Wurmfeld
ECE4551 Page 1 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
2/31
Table of Contents
Florida Institute of Technology ....................................................................................1
Abstract ............................................................................................................................ 3Fundamental Anatomy of an Embedded Processor ......................................................... 4
The programmable FSM: The idea that became a computer: .................................. 4
ROM and RAM: memory structures making up the skeleton of an embeddedprocessor................................................................................................................... 5
The Guts of the Processor: the Program Counter, ALU and Control Unit..............5
Putting it all together, the big picture: ......................................................................7Fundamental Physiology of an Embedded Processor ....................................................11
The computer program: a recipe for functionality................................................. 11
The Program Counter: more than meets the eye.................................................... 11
The Control Unit: Conductor, logic wizard and traffic Cop: .................................14110001110011: I dig Computer Baby Talk! .......................................................14
Making Bits Work: ................................................................................................. 15
Instruction Encoding; how many bits do we need? ................................................15
Looking Forward: Tools for us Humans: ............................................................... 16Temporary storage: Data Memory......................................................................... 17
The Arithmetic Logic Unit: Workhorse of the Embedded Processor....................18Memory Organization and Program Memory: ....................................................... 19
Memory Organization and Data Memory: ............................................................. 21
................................................................................................................................ 24
The other fiddly bits: Input Output and additional processor functionality........24Summary.................................................................................................................... 28
Embedded Processor questions to test for understanding: ......................................... 29
Answers ...................................................................................................................... 30
ECE4551 Page 2 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
3/31
AbstractAn embedded system is a difficult animal to describe. In the general interpretation of
the term, an Embedded System refers to a dedicated computer used to accomplish apre-defined task. The term embedded usually relates to the encapsulated or contained
nature of the device. In the modern vernacular however its meaning is becoming lesssharply defined. The current idea of an embedded system is expanding to mean any
computer system dedicated to a specific purpose. The computer that is the autopilot ona commercial airliner is considered an Embedded System, as is the Windows XP
powered console for a medical MRI imaging system.
Today, the domain of an embedded system is almost limitless, ranging from a full
blown LINUX system deployed on a single VirtexIV FPGAi chip with a PowerPC
microprocessor core and custom integrated peripherals to a 4-bit data security chipglued onto the front of a smart bank card. Indeed, todays embedded systems may not
have any pins to speak of, they may be pre-compiled cores or software templates
of hardware architectures designed to implement and complement common computerresources. These cores are purely software in nature, describing hardware architectureusing a Hardware Description Language or HDL, and only take on a physical
manifestation when implemented within a particular ASIC or FPGA scheme. These
cores are often referred to as Intellectual Property or IP. The domain of the typicalembedded system however is dominated by single chip microcontrollers with fewer
than a dozen Input-Output (I/O) pins.
To understand the scope of embedded systems then it becomes necessary to understand
the resources available (chip/core architecture) as well as the tool chain used to exploit
those resources. This paper will endeavour to describe, using a top-down approach, the
animal that is the Embedded System; its comparative Anatomy and Physiology orhow the architecture and behaviours differ between three different real architectures,
selected by their market share.2006
3.42, 31%
3.9, 36%
3.675, 33%
32-bit 16-bit 8-bit
2007
3.8, 30%
3.9, 31%
4.9, 39%
32- bit 16- bit 8- bit
Microcontroller Market (in Billions of US Dollars) 2006 vs. 20071
According to WSTS2, the lowly 8-bit microcontroller dominated the microcomputer
chip market with monthly sales toping 250 million units per month in 2000, followed
1www.emittsolutions.comMicrocontroller market trend report by Emitt Solutions Inc.2 ExtremeTech online,http://www.extremetech.com/article2/0,2845,1156706,00.asp, Embedded
Processors Part one, September 2009, Quoting WSTS, World Semiconductor Trade Statistics,
http://www.wsts.org
ECE4551 Page 3 of 31 03/12/2009 03:10:58
http://www.emittsolutions.com/http://www.emittsolutions.com/http://www.extremetech.com/article2/0,2845,1156706,00.asphttp://www.extremetech.com/article2/0,2845,1156706,00.asphttp://www.emittsolutions.com/http://www.extremetech.com/article2/0,2845,1156706,00.asp8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
4/31
by the 4-bit microcontroller at 100 million per month. The so-called hot processors,
16 and 32 bit barely pull in 50 million units per month.
This paper will confine the domain to 8-bit and 16-bit microcontrollers, concentrating
on how they compare with one another, from the 30,000-foot view down to the register
level3
. In addition to the anatomy or architecture of the embedded systems, thephysiology or behaviour, from high level constructs down to bits in silicon will be
outlined and compared. To synthesize the disparate facts and processes into meaningful
information, the embedded systems outlined will be compared from a simpleperformance metric, using a Gedanken experiment4 to explore the performance of
three hypothetical embedded systems.
Fundamental Anatomy of an Embedded ProcessorThe heart of any embedded system is the computer core driving it. It manages the dataflow throughout the system, on chip5 and off chip. This construct constitutes a
revolution in logic design. Historically, digital logic has been combined to form
meaningful representations of the world, for example, an alarm system could bemodelled by representing the doors and windows to be monitored as elements in thedesign, using registers and states to describe the behaviour when a door or window is
opened at the wrong time. These so called Finite State Machines (FSM) were used to
create the original embedded systems, with dedicated chips implementing logicfunctionality (NAND, NOR) all interconnected pin to pin to accommodate the data
flowing into and out of the machine. This dedicated functionality proved cumbersome
for anything but complex control systems, custom designed for a single, specific task.Difficult to design, produce and maintain, a different solution was needed.
The programmable FSM: The idea that became a computer:
Early in the history of electronic devices the idea of a re-configurable system to use thesame hardware to accomplish many different tasks was developed. This so-called
compute-or idea first took shape in 1936 as the Z1 computer designed by KonradZuse6. The first re-configurable or programmable machine, Doctor Zuse is credited
with designing and building the first machine to truly solve floating-point problems
using binary representation7. With these initial architectures and the technologicalrevolution of the transistor, the physical manifestation of a computer shrunk from room
sized behemoths to refrigerator sized boxes down to a single board comprised of only a
few chips.
Like its predecessors, the computer needed all the elements of a traditional Finite State
Machine with a new twist; the ability to change states by following bit patterns; bitpatterns found in configurable structures, structures not hard wired into the design.
3 For the purposes of this paper, we are taking as faith the silicon topologies and processes used to
implement registers work and are well described in other tomes.4 Thought Experiment: http://en.wikipedia.org/wiki/Thought_experiment5 For the initial part of this discussion, we will refer to a chip as the fundamental embedded system
building block. Later we will expand that definition to include the concept of a microcontroller core.6 About.com: Inventors - http://inventors.about.com/library/blcoindex.htm7 Technical Institute of Berlin;http://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.html
ECE4551 Page 4 of 31 03/12/2009 03:10:58
http://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.htmlhttp://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.htmlhttp://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.html8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
5/31
This created the possibility of a new state machine paradigm, one of an InfiniteStateMachine.
ROM and RAM: memory structures making up the skeleton of an
embedded processor.This new topology could execute a pre-configured programmed sequence of
states based on the contents of two new hardware constructs: read-write volatile datamemory and read-mostly program memory. The term volatile data memory refers to
memory that will not retain its contents after the power is removed. The term Readmostly memory refers to memory that can be pre-configured or programmed with asequence of bits (bits, that we will later see are the patterns that represent computer
instructions) that is persistent or non-volatile and will be available after the power has
been removed. For the sake of brevity and tradition, non-volatile memory is referred toas ROM8, whereas volatile memory is referred to as RAM9.
The Guts of the Processor: the Program Counter, ALU and ControlUnit.
Keeping with the comparative anatomy theme, every computer is built using these two
memory structures in one form or another. They are used to provide long and short-term storage for data and instructions. The third element necessary to the operation of
the computer is the control unit. It is a multi-function module that controls and
synchronizes the flow of data from the outside to the inside, and between memoryelements and the outside world.
Like a policeman directing traffic, the control unit directs when and where data willmove. It also records state information for use by other operations. The key to the
control module is the program counter or PC. It is a special purpose register10 that holdsthe memory address of the next instruction to be executed. The width in bits of the PCcorresponds to the maximum number of instructions that can be addressed by the
computer.
The number of instructions that can be addressed is n2 locations. The control module
fetches the instruction from ROM, pointed to by the PC. It then translates the
instruction into a sequence of control signals that route the data from and to theappropriate location.
8 ROM is the acronym for Read Only Memory, meaning not writable but readable. In practice, these
memories are writable at least once, to configure the memory. Typically, they are implemented usingFLASH technology, allowing multiple write cycles using proper programming equipment.9 RAM is the acronym for Random Access Memory, which is a misnomer as all addressable memory is
random access by definition. It traditionally refers to memory that looses all data when the power is off,
and is typically of a static or dynamic nature.10 A register is a fixed width, volatile memory element, used to store intermediate information. This
information may be the next address to execute, or the flag bits used to configure the built in A/D
converter.
ECE4551 Page 5 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
6/31
The next module important to the operation of the computer is known as the
Arithmetic-Logic Unit or ALU. It is responsible for performing various arithmetic
operations on the data, like addition or subtraction as well as various logic operationslike AND, OR, and NOT. In more sophisticated microcontrollers, the ALU may also
include a perfect exchange register, where word-oriented operations can be carried
out, like swap bytes, swapping the upper and lower bytes of the word, or swapnibbles; single cycle operations11 that make short work of bit intensive operations. The
ALU can also test data for various states for the control unit to record. All ALU
operations are directed by the control unit, which is in turn directed by the instructionsfound in the program memory.
The following structures are the building blocks of all embedded computer systems:
Program memory read mostly, stores instructions and constant data (data
that does not change over time). Non-Volatile, data is retained after the power
is turned off. Typically it is organized as an addressable matrix of x bytes,
where is the memory address width and represents 2
locations, and represents 1 or 2 bytes of memory width.
Data memory read/write; stores the results of instructions and stateinteractions. Volatile, looses all data when power is turned off. Typically it is
organized as an addressable matrix of x bytes, where is the memory
address width and represents 2 locations, and represents 1 or 2 bytes ofmemory width.
Program counter a special purpose volatile memory element (usually a
dedicated register) that holds the address of where the processor is in itsinstruction sequence, usually the address of the next instruction to be
fetched12 executed. The width of the PC corresponds to the maximum number
of instructions the computer can hold. Control unit A dedicated Finite State Machine (FSM) that takes as its inputs
the instruction from the program memory, translating the bit pattern into actions
manifested as synchronized control signals and states to the other modules in
the computer.
Arithmetic-Logic-Unit A dedicated FSM that takes as inputs control signals
and chunks of data13 (usually whole bytes or words14) and gives as outputs the
results of the operation, in similar chunks of data.
11 It will be seen that embedded processors are theme oriented, that is as a motor controller, or
communications controller or a sensor controller, and the architecture correspondingly includes specialfunctionality (read dedicated registers/operations) that make those features efficient compared to doing
it manually in software.12 The term fetch normally associated with a ball and the family dog is a good analogy in this context
as the verb describing the action of retrieving the instruction from program memory. It involves looking
it up, getting it physically and bringing it back.13 Historically, the term data word is the fundamental width of the registers and program memory
native to the microprocessor. In the domain of processors we are outlining, it dependent on the
architecture. Wikkipedia: http://en.wikipedia.org/wiki/Word_%28computing%2914
ECE4551 Page 6 of 31 03/12/2009 03:10:58
http://en.wikipedia.org/wiki/Word_(computing)http://en.wikipedia.org/wiki/Word_(computing)8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
7/31
The anatomy of any embedded processor is made up of these five modules. It is the
way in which these elements are arranged that ultimately describe the behaviour or
how a specific controller actually executes a program of instructions. The next sectionexplores the fundamental physiology or how it actually works, of an embedded
processor.
Putting it all together, the big picture:
All embedded processors we encounter have various combinations of RAM, ROM and
register resources. Typically, for the domain of embedded processors we are exploring,
this memory will reside physically on the silicon from which the chip15 is
constructed.
Figure 1 photomicrograph of physical microcontroller elements on silicon die.16
Other types of embedded processors have enough pins to support accessing memoryoff chip. Starting out, we will address those topologies that have memory built into the
chip. The following figures illustrate three different embedded processors, the ARM-7,
the Atmel 89C2051 and the Microchip PIC18F1330 8-bit microcontrollers. Starting out
lets look at the anatomy of these processors.
In Figure 4, the PIC18 processor, it is easy to identify the modules we have described
so far, ALU, Program Counter, Control Unit Instruction Decode and Control,Program Memory and Data Memory. You will also notice there are many other
modules in the processor we havent discussed yet but may be able to guess at the
15 The word chip is loosely used to mean those devices built from chips of silicon wafer, mounted
onto a leaded carrier, providing the pins that allow connection to the circuit.16 Image copyright 2009 Micro Control Journal. All rights reserved.
(http://www.mcjournal.com/articles/arc105/arc105.htm)
ECE4551 Page 7 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
8/31
functionality. Suffice it to say they all support the movement and modification of data,
which ultimately is the only purpose of any processor, embedded or not.
The other processor illustrations are not so straightforward to interpret. In Figure 2, it isclear what the data and program memory is, the ALU and program counter as well but
where is the Control Unit? It is there, just split into several blocks, each illustrating a
function the control unit must accomplish, like PC incrementer, program addressregister and stack pointer. As long as you understand these functions are common
to all embedded processors, it is not difficult to interpret the block diagram of any
microcontroller.
Figure 2: Atmel 89C2051 Architecture17
The diagram in Figure 3 stretch the simple block diagram concept, but with some
digging it is possible to catch the islands of functionality18. In the CPU diagram, itlays out a stylized arrangement of registers, implying interconnection and the existence
of a control unit connecting them all. As this is a model of the CPU core, the memory
is not illustrated. The memory is laid out a little differently, both the program and datamemory share the same address, but the program counter and flag register are clearly
there along with many other, yet to be understood specialized registers. We will get to
those later; for now the important idea here to grasp is although it may seem like these
17 Image 2009, Atmel, http://www.atmel.com/dyn/products/product_card.asp?part_id=193818 Islands of Functionality refer to isolated group of registers, FSM and other structures that perform a
single job, like the ALU, a timer, or an A/D controller module.
ECE4551 Page 8 of 31 03/12/2009 03:10:58
http://www.atmel.com/dyn/products/product_card.asp?part_id=1938http://www.atmel.com/dyn/products/product_card.asp?part_id=19388/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
9/31
three processors are dramatically different, they actually differ only by the specifics of
how the various modules interact, and not so much by the modules they have.
Figure 3: The ARM-7 core architecture
ECE4551 Page 9 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
10/31
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
11/31
Fundamental Physiology of an Embedded Processor Now that we have outlined the structure of the building blocks of an embeddedprocessor, (memory, ALU, Control Unit, PC) and briefly illustrated some real
processors, it is time to describe their inner workings, i.e. how they behave with one
another. Once the general ideas of structure (anatomy) and function (physiology) are
understood we can proceed to take a comparative look at how examples of actualembedded processors work performing similar tasks.
The computer program: a recipe for functionality.
Although trivial in concept, it bears repeating. A computer follows a list of
instructions, starting at the beginning and executing each instruction until theend of the recipe or program. From our previous glimpse into the anatomy of an
embedded processor, we know the program memory (ROM) stores the program
instructions in an addressable matrix of bytes or words. The Program Counter (PC) hasthe vague job of knowing which instruction to execute. It is the job of the control
unit to know where to start the program (the beginning), fetch, decode and execute the
instruction and as long as it isnt the last instruction (the end of the program), advanceto the next instruction, execute it and so on
The Program Counter: more than meets the eye.
In Figure 5 we see our first look at a generic computer, complete with program
memory, control unit and program counter. The program counter is connected to theprogram memory via the program memory address bus. The output of the program
memory goes directly into the control unit. The program counter is also connected to
the control unit. At any one moment in time, any computer is in the middle of a finiteset of cycles, performing mundane tasks like:
Calculate the address of the next instruction
Load the address of the next instruction in the program counter
Enable the program memory to use the address the PC is presenting
Get the value of the memory location (Fetch the instruction)
For the time being we will focus on these simple but vital tasks. Somehow, the control
unit is smart enough to know what the next address is to fetch the next instruction.
The key to understanding how something works is to walk a mile in its shoes as it
were, to follow it step by step as it does its job. Lets consider a simple scenario withtwo questions. What actually happens when an embedded processor20 is powered upand how does the control unit orchestrate these events?
As we are dealing with events that take place in time, it is traditional to illustrate theseevent relationships that happen in time or in synchrony with a waveform chart. A
20 From this point forward, the term processor or embedded processor or computer will all refer to
the same thing.
ECE4551 Page 11 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
12/31
well-organized chart can illustrate in a single picture what would take pages of text to
describe.
The following is a generic power up sequence that could apply to almost any embedded
processor. It is organized as several rows, each representing a particular signal as it
changes in the time domain. The signal may represent an actual voltage, or a logicalstate, for example 0V to 3.3V, or asserted or not asserted21. The row images of the
waveform are linked in time, that is they all start at the same time, and important events
are usually labelled. In this example, the first row represents a logic condition of powerbeing applied, (the lower line illustrates the zero or off condition, and the upper line
represents the high or on condition) rather than its actual voltage value(s). Important
here is the idea that not all signals are valid at all times.
The first few hundred microseconds of life: the power up timingwaveform.
POWER power stableRESET Reset StartCLOCK clock stableADDRESS INST DATA FETCH Cont
Fetch 1st Instructionf
Power on sequence:
1. Power is applied to the chip (The beginning of time as the chip sees it)
2. The reset signal is asserted, holding the chip in a reset state.
3. In the reset state, nothing happens within the computer, but the computer cycleclocks start oscillating and everything is poised, just waiting for the reset to be
released. This is one of the most important times for a computer, without it, the
control unit and program counter would be in unknown states22, and could cause thecomputer to go haywire, not knowing what state it is in, or where to go next.
21 It is more accurate to use the term asserted/not asserted to indicate the value of a particular state. 1or 0, or true or false can all imply an implementation of a state. A logic 0 may be represented by
anything less than 0.9 VDC in a 3.3V system, and represents the assertedorenabledstate of a processor
reset signal, which would be logically true for its value.22 By unknown state, consider what is physically happening in time when power is first applied to a
transistor circuit. This all happens on the time scale of pico and nanoseconds, but when dozens of
transistors are linked together, it can take hundreds or thousands of nanoseconds to settle down into a
known state
ECE4551 Page 12 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
13/31
Putting the computer in the reset condition gives the processor the time it needs to
turn on and set up the physical transistors that make up the hardware to known
conditions, subsequently initializing the control unit to a known state.4. Some time after the power is applied and stable (that is, within the operating range
of the processor), the system clock(s) have started and are stable, the reset signal is
released, and the control unit starts from its initial state. All this happens in a shortperiod of time to us (2 5 milliseconds), a lifetime to a processor that can execute a
half a million instructions in a second.
5. The control unit loads a pre-defined address (processor dependent) into theprogram counter andfetches the first instruction from program memory. The word
fetch12 is often used to describe this control unit cycle, and can be summarized
with the following steps:
a. Start the fetch cycle:b. Using a FSM, assemble the pieces that will make up the next instruction
address. In the case of the first instruction location, no calculation is
necessary, it is a fixed location, just use that address.
c. On the appropriate system clock edge
23
, logic OR the address pieces
24
together and load the assembled new address into the program counter.
d. Enable the new instruction address from the program counter onto theprogram memory address bus.
e. Synchronized with the appropriate clock, and after there has been enough
time for the address to be stable and valid, (the last thing you want is the
address changing while you are trying to read an instruction from programmemory) enable the program memory output onto the instruction bus.
f. Synchronized with the appropriate system clock, read the instruction into a
holding register within the control unit.g. Set the increment next instruction address increment value to be added to
the current instruction address when the next fetch cycle starts.
h. Fetch cycle complete, the control unit now has fetched the instruction fromthe program memory.
6. The control unit is ready to decode the instruction fetched, execute it and start the
fetch cycle all over again.
23 For this overview, we are playing fast and loose with the necessity of system synchronism. Assume on
faith that every processor cycle that is executed is done in time and in sync with a clock, or clocks, or
portions of a clock to insure the data is taken or arrives where it belongs when it is valid to do so.24 The pieces referred to will be described in detail later, suffice it for now the pieces may be an offset
from the current location and the previous location, along with any increment pending.
ECE4551 Page 13 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
14/31
DataMemoryProgram
MemoryPC
ALU
I/O
Port
CU
Buses
Figure 5 Control Interconnections
As you can see, there is more to the program counter than meets the eye; it is thesignpost the entire computer uses to keep track of where it is in the instructionsequence. It is a lot more than just a simple placeholder; it is an integral player in the
instruction fetch cycle as well as an essential element for proper program execution.
The Control Unit: Conductor, logic wizard and traffic Cop:
Up to this point we have glossed over many inner workings of the computer, limiting
our focus on just how does the computer know what instruction to do next. Do notloose sight of the big picture:
There is a program stored in instruction memory (ROM)
This ROM holds the instruction sequence that is the program to be executed,
whatever that means. The control unit is stuffing the program counter with addresses, fetching
instructions and doing something with them.
The next part of our discussion of embedded processor physiology is how the control
unit knows what to do with the instruction it fetched from the program memory.
The Control Unit is the very heart of any embedded processor. Ultimately it is
responsible for knowing what instruction to fetch next, how to fetch it, set up for the
next instruction, decode and execute the instruction just fetched. It is a relativelycomplex FSM designed specifically to control the inner workings of the computer
according to basic cycle specifications like the fetch cycle mentioned previously, or inreal time by decoding the cycle information contained with the instruction.
110001110011: I dig Computer Baby Talk!
So far, this tutorial has been pretty fast with the information; it is kind of like trying to
drink water from a fire hose, possible, but a lot will spill out! So far, what do we know
about the inner workings of a computer?
ECE4551 Page 14 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
15/31
The computer works by executing instructions in sequence.
The program memory stores the instructions to be executed.
The control unit, in conjunction with the program counter can fetch instructions
from program memory, decode and execute them, whatever that means.
So far so good, before we continue, lets do a little Boolean algebra review. Recall that
the number of permutations a particular binary number has is equal to 2n where n is the
number of bits in the binary number. For example, if we had a 4 digit binary number ithas 24 or 16 possible combinations. An 8-bit byte has 28 or 256 possible combinations.
This organization is used extensively in computers to allow us to select one from many,
or address one memory location from the tens of thousands of memory locationsavailable to us. Like the ubiquitous Apartment Number analogy, for every memory
location, there is a unique address, just as there is a unique physical address or number
for every apartment.
Making Bits Work:Remember the alarm example, where the finite state machine modelled the windowsand doors of the house as bit locations to encode the physical world into a digital
representation of that world? The dedicated finite state machine circuits decode the
binary bit locations to determine what door or window was opened.
This is what is being done with the so-called instructions, bit patterns are being used
to represent places and actions we wish the computer to access or execute. Now we cansay it; when a computer executes an instruction, it means that particular instruction
has a physical meaning associated to its unique bit pattern. That meaning is used to
enable the sequence of events that is required to execute the meaning or command.
Instruction Encoding; how many bits do we need?
This is exactly how the control unit knows what to do with the instruction it fetchedpreviously. Each instruction contains an encoded portion indicating what to do, who to
do it to and what to do it with. This is a lot to ask a few bits to do. As we will see later
when we compare processors, some only have enough bits to encode 64 instructions or6 bits (what to do) and 8 bits of location or actual data information (what/who to do it
with/to). That adds up to an instruction word that is 14 bits wide, implying the program
data memory better be at least 14 bits wide.
Time for a real example; lets say our embedded processor has an instruction called
Add. Its function is to cause the contents of some register (lets call it a) to beadded with the fixed value 0x14 and the results stored back in registera, wherever
that is. The operation code (opcode25) for this instruction could be 110001 in
binary, and the fixed (immediate) value might be 0x1426, 00010100. The entire
25 See definition:http://en.wikipedia.org/wiki/Opcode26 The traditional prefix for a hexadecimal number is the two character pair 0x. Each hexadecimal digit
is four bits wide, thus having 16 values, from 0 (0000) to F (1111).
http://en.wikipedia.org/wiki/Hexadecimal
ECE4551 Page 15 of 31 03/12/2009 03:10:58
http://en.wikipedia.org/wiki/Opcodehttp://en.wikipedia.org/wiki/Opcodehttp://en.wikipedia.org/wiki/Hexadecimalhttp://en.wikipedia.org/wiki/Opcodehttp://en.wikipedia.org/wiki/Hexadecimal8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
16/31
instruction data would be the concatenation (joining together) of the opcode and the
immediate data, for a complete instruction word of: 11000100010100. The
control unit is expert at this kind of binary computer baby talk and decodes the first
six digits as the action to add the number represented by the last 8 digits to someregister called a, wherever that is. Unfortunately, for most of us humans, it is tedious
to impossible to manage lists of binary strings representing instructions and data. Somesort of help was needed for our simple minds to handle binary instructions.
Looking Forward: Tools for us Humans:
The time has come to peek ahead and introduce the concept of mnemonics, the
mapping of a human understandable memory aid onto literal computer constructs27.The trick is to come up with a pseudo language of sorts, with descriptive verbs and
nouns representing the operations we would like to perform. Which of the following
identical statements is easier to understand?
11000100010100 - or - add a,#0x14
The first string28 of binary digits is traditionally called machine language (computer
baby talk) and the second statement is called assembly language, a pseudo Englishpatois of suggestive verbs and nouns loosely cobbled together to garner meaning.
Here is the beginning of what is called the Tool Chain, a very important concept inunderstanding how computers work. There are tools (actually applications that run on a
separate development computer system) that help us translate language a human can
understand into machine language a computer can execute; the actual, physical binarypattern stored in program memory.
In this simple example, we would create a program using a stand alone text editor or aneditor within an IDE29 containing among other things the add statement above anduse that human readable text file as the input to an application called an assembler.
The assembler interprets the assembly language source code into the appropriate bitpattern. To complete the chain, that bit pattern is then combined with other bit patterns
to form an executable bit image. This bit image is then programmed or burned30
into the computer program memory ROM. We will be discussing the tool chainconcept in more detail later. At this point in our tutorial accept it on faith that there is
indeed a way that humans can create programs that ultimately physically are
manifested as bit patterns or instructions inside the processor, ready to execute when
the power is turned on.
27 Derived from Wikipedia definition: http://en.wikipedia.org/wiki/Mnemonic28 Be careful, this is not a binary number; it is a composite representation of opcode and data.29 Integrated Development Environment, a computer application that streamlines the creation of
computer programs by integrating the editor, compiler, assembler and linker into a single user interface.30 Burning a ROM is a throw back to when physical metal fuses integral to the memory were burned
away using a high current pulse, permanently setting the state for that memory location. The specific
mechanisms for memory is beyond the scope of this paper; See
http://www.howstuffworks.com/rom.htm/printable for more details.
ECE4551 Page 16 of 31 03/12/2009 03:10:58
http://en.wikipedia.org/wiki/Mnemonichttp://www.howstuffworks.com/rom.htm/printablehttp://en.wikipedia.org/wiki/Mnemonichttp://www.howstuffworks.com/rom.htm/printable8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
17/31
Figure 6 Simple Tool Chain, Assembly to Bits in Computer
Temporary storage: Data Memory.
So far we have explored how the control unit fetches an instruction from program
memory and decodes it to perform some task. What is missing is, where does the
control unit store temporary data? (Remember register a?) Not everything is known
when the program is compiled into machine language, for example, if the embeddedsystem was a thermostat, or an alarm controller, where does the computer store the
current temperature? Where does the computer store what doors are closed? This is the
job for Data Memory, a read/write volatile memory that the control unit can use to save
intermediate results, or just about anything that can and does change with time. In thestrictest sense, the program counter is an example of this type of memory; when the
power goes away, the data is lost. Other modules rely on this kind of memory; the
ALU uses temporary storage as scratch pad memory to hold intermediate results. Thecontrol unit relies on temporary data storage to keep track of the current state of the
computer. Data memory is an integral part of the computer architecture, and it is
necessary to understand how it interacts with the other modules we have seen so far.
Data
MemoryProgramMemory PC
ALU
I/OPort
CU
Buses
As you can see from the illustration above, most computer architectures have at leastsome sort of program memory, data memory, program counter and control unit. We are
almost ready to start looking at particular embedded processors. It is first necessary to
understand the relationship between the program memory, data memory and dedicated
volatile memory elements or registers.
ECE4551 Page 17 of 31 03/12/2009 03:10:58
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
18/31
Before continuing, lets review the concept of a register. It is nothing more or less
than an ordered set of bit(s), not unlike a memory location, that can hold a bit
pattern. A register may be 1 or 256 bits wide. It might hold a single bit from theoverflow of a binary addition, or the four bit value that points to a portion of memory
called a file. Whatever it holds, and however wide it is, it is a volatile memory
element, usually controlled (set, reset, read) by the control unit.
Some embedded processors for example, have a special dedicated register for
everything. This is where the uniqueness of a processor manifests itself; how thefunctions and data are organized physically on the processor. Registers may be general-
purpose scratch pad to hold any value (say the intermediate result of a logic operation)
or a special function register to hold a binary value that corresponds to the artefacts of
the last instruction31.
As embedded processors contain more and more functionality (timers, serial ports, A/D
converters) it is necessary to have volatile memory elements to keep track of all their
settings and status. In some processors, there are over two-dozen separate specialfunction registers just for this purpose. Here is where the similarities end and
individual architectures begin to diverge from the generic model. How does thecomputer organize the needs of program memory, data memory and special function
registers? Keep that question close to mind as we continue our exploration of the
computers last generic element, the ALU.
The Arithmetic Logic Unit: Workhorse of the Embedded Processor.
Up to this point we have hinted that there is a module that is used to do math and logic
operations managed by the control unit. Indeed, inside every embedded processor is an
ALU that can take two or more operands, perform an operation on them (addition,subtraction, multiplication, AND/OR/NOT) storing the result in data memory or a
purpose built register, updating the artefacts associated with that operation; negative,
zero, overflow, underflow, divide by zero)
Figure 7 Simple ALU
31 An artefact is a processor state that may change when an operation is performed. For example, if the
result of a math operation is negative, a bit or flag could be set as an artefact of that operation. There
are various flags or artefacts that are updated after each instruction by the control unit.
ECE4551 Page 18 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
19/31
The ALU itself is a special purpose finite state machine designed to take control signals
and operands and perform the function called out by the control unit. Its featuresinclude the operations it can perform, the size of the operations it can handle and the
size of the results it can produce.
Consider the addition of two 8-bit numbers. The sum could be larger than an 8-bit
number can handle so the ALU must be able to accommodate that possibility. The
ALU must also be able to provide some sort of floating point functionality (or at leastmechanisms to support such operations), usually incorporated as partially hardware,
partially custom math libraries for that processor.32 The full complement of math
operations take up a lot of processor real estate and compromises need to be made to
get the maximum functionality in the minimum space with the best performance possible. It is possible to multiply or divide any two numbers using successive
additions (or subtractions) but that would take a long time. Time then would be the
compromise over the real estate33 needed to have a hardware multiplier integral to the
ALU.
Memory Organization and Program Memory:
However youslice it34, every embedded controller operates on two types of memory;
the memory that stores instructions (program memory) and the memory thattemporarily hold values (data memory). In the previous real chip examples, we can see
the PIC18 and AT89 both have separate, distinct memory areas, with separate data and
address lines as opposed to the R8C architecture, where the program and data memoryis logically one monolithic block, with one common address and one common data bus.
It is not the intention of this paper to compare and contrast the ramifications of this
level of architectural choices35
; we will however explore what they are and how theyare used.
Recall that it is the control unit that calculates the address for the next instruction tofetch. In the case of the PIC18, this is an address that can accommodate a maximum of
8192 memory locations. As a review, how many bits are needed (the minimum
32 It is mathematically possible to do any math operation just using two bits and a lot of RAM, it would
just tale a lot of instructions to orchestrate even a simple 16 bit integer addition. On the other hand, you
could dedicate three separate registers, two 16 bit and one 32 bit register to hold the operands and the
sum respectfully. Controller architecture is a balance of what space you physically have and what
operations can be done in software.33 It physically takes space in silicon to do anything, store a bit or make a control unit. Each processor
designer is faced with the problem of trying to find space for everything the marketing people want inthe new version. Compromises are made in performance or size (and power) when design decisions
(architectures) are made.34 Off-hand homage to the so-called bit-slicers of old.35 As is often the case, the choice of one particular architecture over another has religious implications,
with each ideology having its priests, each believing in their brand of the truth. More often than not,
the choice of processor is cost, or number of pins, or what chip did we use last? or how much do the
tools cost? and not some idealized architecture philosophy.
ECE4551 Page 19 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
20/31
number) to uniquely address every location in the program memory?36 As it turns out,
the internal modules in the PIC18 family of processors are almost identical, and while
the PIC18F1330 may have a program memory depth of 8192 locations, each 16 bitswide, the flagship of the PIC18 family, the PIC18F8722 has up to 128 kBytes (64k
words) of program memory.
This common control structure is not by accident. It is to insure instructions written for
the least capable member of the family will work on the most capable member. In fact,
the address latch for the program memory is a full 20-bits wide, allowing up to 220 orone million memory locations (can we expect future versions of the PIC18 family with
more program memory?). For our PIC18 however, we have more than enough address
bits to accommodate the 8192 locations (16 kBytes) of instructions and constants 37.
As we mentioned earlier, the Program counter is much more than a simple register, in
this example it is almost a mini ALU in the operations it can perform to assemble the
correct address for the next instruction. Keeping with the program memory theme, look
at the Atmel 89C2051 memory. Although difficult to read from the simple blockdiagram, the literature specifies the program memory to be byte wide, with 16 address
bits for a maximum of 65532 locations, (64 k, 1k = 1024 locations) With our variant,the 89C2015 has 2k of program memory or 2048 locations, each one byte wide.
Consider this simple fact for a moment. Each program memory access of a
PIC18F1330 processor returns 16 bits. Each program memory access of the 89C2015returns half as much data. If both processors are running at the same speed, which one
moves more data per unit time? We cant answer that right now, but keep it in mind
when we compare performance between our three embedded processors.
To complete the program memory tour, the ARM-7 processor addresses 32-bit wide
program memory, unlike the PIC18, which uses a 16-bit address. Very similarprocessors, but different approaches to how the program instructions are addressed.
36 2n = 8192, log2(8192) = n = 13 bits
37 We include and constants on purpose when describing program memory as it is the ideal place to
store values that are known when the code is assembled, and would overwhelm the limited data memory
space. This convenience however comes at a cost, as we will soon see.
ECE4551 Page 20 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
21/31
Figure 8: Microchip PIC18F8722
Memory Organization and Data Memory:
Previously we explored the anatomy of controllers, describing the blocks or modules
that made up the controller. These blocks are indeed islands of functionality, for themost part independent of each other. That means while the control unit is fetching
the next instruction, the ALU can be logically ORing two values, sending the results
into a register, and the timer module (described later) can be counting down; all at the
same time, all on the same chip of silicon.
This ability begs to be used in a parallel fashion, and not simply in the serial follow
the recipe concept of a computer program. The tricky part is, how does the computerkeep track of all these independent operations? More on that later.
Going back to our real life examples, consider the monolithic memory architecture ofthe ARM-7. To fetch an instruction, the control unit updates the program counter, then
ECE4551 Page 21 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
22/31
places that address on the address bus. Some time later it reads the instruction from the
ROM then executes it. If there are data values to be stored, it places the destination
address on the same memory bus, and some time later writes it into the RAM.
In this simple example, there is some dead time, that time between subsequent
operations that could have been used doing things in parallel.
Figure 9 "Princeton" (Von Neumann's) Architecture
Figure 10 "Harvard" Architecture
Look at the Figure 4, on the PIC18 architecture; the program memory has its own
address and data busses, distinct and separate from the data memory. This means thecontrol unit can be building the next instruction to be fetched while it is decoding thecurrent instruction, and at the same time writing the results of the last ALU operation
into the data memory. The level of parallelism is common to architectures that have
separate program and data memory spaces. The figures above illustrate the basicmemory architecture differences, the main idea to keep in mind is in the Princeton
architecture, there is one Program/Data memory address and data bus, tying the
program and data memory logically together. In the Harvard architecture (hmmm, I
ECE4551 Page 22 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
23/31
wonder where they were invented) the program and data memory are separate and
distinct, each with its own address and data busses.
Is one architecture better than another? It is a difficult question to answer without
knowing the actual application. In a simple case when the computer is reading a
thermostat, checking it against a pre-set temperature and deciding if the heat should beturned on, it doesnt take many cycles to accomplish, and a monolithic memory may be
just fine. If more performance were required, a faster processor could be used. If
however, many functions were being handled at the same time, like reading the buttonson a USB Nintendo controller, processing 8 buttons as well as the angles of the
left/right joysticks and sending the information back to the Play station in time to keep
from being sliced in half by the Org you are battling, it may be better to have a
microcontroller that can do as much as possible at the same time.
The organization or architecture of the computer includes how the various volatile
memory elements: registers: are organized and controlled. Some architectures use
individual, separate registers for everything: see Figure 11. An alternative to havingseparate physical registers is the model used by the RC8 and PIC18, the registers for
the whole computer are contained in data memory as a set of registers, addressed likeany other memory element and often organized as files or blocks of memory.
This significantly reduces the complexity of the control unit while maintaining the
flexibility of added functionality. For example, consider two processors from the samefamily; the PIC18F8722 and the PIC18F1330. Using the same register file,
architecture, it is possible to accommodate the five timers and 12 A/D modules using
the same control unit that the PIC18F1330 uses to maintain 2 timers and 5 A/Dmodules.
ECE4551 Page 23 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
24/31
Figure 11 PPC 405 Core Programmers model copyright 2006, IBM
The other fiddly bits: Input Output and additionalprocessor functionality.
To round out the inner workings of the data memory, it is important to consider how
our computer actually moves data into and out of the processor chip. In most
architecture, an I/O pin is a simple RAM element, usually organized as a register ofindividually accessible bits that is on the data memory bus. To perform an output
operation, the control unit asserts the direction control so the pin will be electrically an
output pin. Figure 12 illustrates one bit of the I/O port for the PIC18. The simple portpicture illustrated in Figure 4 glosses over the real work the control unit does in
orchestrating an I/O operation. This is further complicated when the I/O pin can be an
analogue input pin as well as a digital I/O pin.
ECE4551 Page 24 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
25/31
Figure 12 Generic PIC18 I/O pin Figure 13 Digital & Analogue I/O pin
In addition to input and output functionality, most microcontrollers these days havesome sort of built in timer capability. These timer modules operate independently once
set and started, and provide a much-needed function to count events or time to be used
in an embedded application. Again, there is a trade off between chip real-estate andsoftware overhead. Any timer function can be implemented in software using loops and
tests, at the cost of having to execute in linear time with the program. No matter howfast a program is it can only be doing one thing at a time. Implementing a timer in
hardware however, relieves the burden of maintaining a count.
ECE4551 Page 25 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
26/31
Real world timers do more than just count; here the ARM-7 has two, 32-bit timers38,
with selectable input count frequencies (multiples of the master clock) as well as input
from I/O pins (to count events happening on a pin). It is a simple matter to preload a16-bit timer39 and count from 0x0000 to 0xffff. If the master 32 MHz clock was used as
the count input then the timer would roll over or overflow 40 every 2 or so milliseconds.
If the fc32 divider is used, then it would overflow every 64 milliseconds.
Timers, I/O ports, A/D modules, indeed most if not all computer special function
modules need registers to hold configuration parameters (in the case of the timer it maybe the flag bit that controls if the timer restarts after an overflow). As we saw in the
PIC18, these registers are part of the computer data memory area. In the case of the
PPC405, they are individual registers, peppered all over the die. Figure 15 illustrates
what is arguably the quintessence of microcontroller technology to date, the PhilipsLPC2114.
Using what we have learned, lets examine this animal closely. The key to evaluating
a microcontroller is in answering fundamental structure (anatomy) questions first, andthen if the structure is appropriate to the task, then take a closer look at the
functionality (physiology) of the beast.
First the bones; how is the Program counter, Control Unit and ALU arranged? How
does the chip get/put data to the outside world? What other goodies are available
(timers, USARTS, A/D)?
At first blush, the block diagram of the LPC2114 seems to be missing many essential
elements. They are there; it just takes a little digging. This chip represents a new trendin embedded processors, that is a common core of functionality, surrounded by the I/O
that makes that particular processor special. Here the core is illustrated in the block
named ARM7-DTMI. The actual guts of the core is illustrated in Figure 14. TheProgram Counter is implemented by two elements, the Address Incrementer and the
Address Register, which makes sense considering how the program counter normally
functions. The other basic processor elements are present, The ALU, the Control Unit,and a data register. Some interesting additions are a 32 x 8 hardware multiplier, and a
barrel shifter interfaced with the ALU. These could help floating point operations by
speeding up in hardware common math tasks. (Remember the real-estate
performance trade-off discussion? Clearly, this chip is built with the building blocks ofspeed.) Another interesting element is the 31 x 32 bit register bank, a good place to
hold intermediate results, or condition flags; at this point in the investigation it is not
clear, but should not come as a surprise when looking at the physiology of this beastto discover there are such register locations in the bank.
38 Philips LPC2124 Datasheet.39 Timers are characterized by how many bits are used to count with.40 A timer typically starts at some preloaded value and count up, one bit at a time until the maximum
value is reached. Then, depending on the mode used, a overflow flag is set, the initial value reloaded
and the cycle started all over again.
ECE4551 Page 26 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
27/31
This cursory look at the ARM-7 core should convince you that it has the infrastructure
to be a hot, 32 bit processor. Now, lets have a look at the I/O.
Figure 14 ARM-7 Core
Looking at Figure 15, we can now see where the program and data memory live, it is a
monolithic block, with a single address and data bus. The I/O however, has their own
address/data bus, an interesting combination, and should make the I/O modulessomewhat independent of the ARM-7core.
In this initial look, it is clear this chip has been designed to accommodate almost any
I/O scenario you would encounter with an embedded processor.
ECE4551 Page 27 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
28/31
Figure 15 ARM-7 (Philips LPC2114) Block Diagram
Summary.In this paper we have explored how any embedded processor is made up of the sameelements, and each individual element has basically the same behaviour. These
elements are mixed and matched to create the animal known as an embedded
processor, and along with the software applications in the tool chain, make up a
development environment the designer can use to solve real world problems usingembedded processors.
The heart of the embedded processor is the concept of an infinite state machine, thatI, dedicated hardware that can reconfigure functionality by following a set of
instructions. This innovation has enabled designers to move away from the directhardware manipulating bits to looking at problems from a modular perspective.
As important as the hardware advances is the tool chain used to design the control
software, the ultimate arbiter of any embedded system.
ECE4551 Page 28 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
29/31
Embedded Processor questions to test forunderstanding:
1. What is the difference between a Finite State Machine and an embedded
processor doing the same task?
2. How do you physically get the program into the embedded processor?3. What is the first thing the embedded processor does when the power is turned
on?
4. What is the difference between an 8-bit processor and a 32-bit processor?5. What are the steps needed to translate the following text into machine code?
int a = 5;
int b = 3;
if(a > b)
a = a+b;
else
a = b;
ECE4551 Page 29 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
30/31
Answers
1. The FSM has one task it is designed to perform; the embedded processor can be
programmed to do an infinite number of tasks.
2. The embedded processor is connected to a chip programmer, either in the
circuit (so called in circuit programming) or as individual chips. The machinecode file is used by the programmer to change the instruction memory on the
chip.3. At the beginning of time, all embedded processors are held in a reset state,
allowing all the internal circuitry a chance to start in a known state.
4. Generally, the difference is in the internal representation of data. In an 8-bit
machine, data is moved one byte at a time. In a 32-bot machine, data is moved(in the same time frame) 4 bytes at a time.
5. The text is input to a compiler that translates the Human readable code (in this
case C) The result of the compiler is assembly code, the processor specifichuman readable version. Finally, that assembly code is fed into an assembler
linker tool that produces machine readable code, specific to that individualprocessor.
ECE4551 Page 30 of 31 03/12/2009 03:10:59
8/14/2019 Embedded Systems - Comparative Anatomy and Phisiology
31/31
i Fundamentals of FPGAs: http://www.techonline.com/learning/course/210605004