Embedded Sys Unit 4

8/13/2019 Embedded Sys Unit 4


UNIT IV

HARDWARE ACCELERATORS & NETWORKS

Accelerators, Accelerated system design, Distributed Embedded Architecture, Networks for Embedded Systems, Network-based design, Internet-enabled Systems.

CPUs AND ACCELERATORS

One important category of PE (processing element) for embedded multiprocessors is the accelerator. An accelerator is attached to CPU buses to quickly execute certain key functions. Accelerators can provide large performance increases for applications with computational kernels that spend a great deal of time in a small section of code. Accelerators can also provide critical speedups for low-latency I/O functions.

The design of accelerated systems is one example of hardware/software co-design: the simultaneous design of hardware and software to meet system objectives. Thus far, we have taken the computing platform as a given; by adding accelerators, we can customize the embedded platform to better meet our application's demands.

As illustrated in Figure 7.3, a CPU accelerator is attached to the CPU bus. The CPU is often called the host. The CPU talks to the accelerator through data and control registers in the accelerator. These registers allow the CPU to monitor the accelerator's operation and to give the accelerator commands.

The CPU and accelerator may also communicate via shared memory. If the accelerator needs to operate on a large volume of data, it is usually more efficient to leave the data in memory and have the accelerator read and write memory directly rather than to have the CPU shuttle data from memory to accelerator registers and back.
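The host/accelerator interaction described above can be pictured with a small software model. This is only a sketch under stated assumptions: the class `Accelerator`, its register names (`src`, `length`, `dst`, `status`), and the summing "kernel" are hypothetical and stand in for whatever a real device would provide.

```python
class Accelerator:
    """Toy model: data registers, a status register, and direct access to shared memory."""

    IDLE, DONE = 0, 1

    def __init__(self, memory):
        self.memory = memory          # shared memory, modeled as a plain list of words
        self.status = self.IDLE       # status register the host can read
        self.src = 0                  # data register: start address of the input block
        self.length = 0               # data register: number of words to process
        self.dst = 0                  # data register: address for the result

    def start(self):
        """Models a write to the command register: sum the block and store the result."""
        block = self.memory[self.src:self.src + self.length]
        self.memory[self.dst] = sum(block)   # accelerator writes memory directly
        self.status = self.DONE


def host_sum(memory, src, length, dst):
    """Host-side driver: program the registers, start the accelerator, wait for DONE."""
    acc = Accelerator(memory)
    acc.src, acc.length, acc.dst = src, length, dst   # host writes the data registers
    acc.start()                                       # host issues the command
    # Host polls the status register. In this toy model start() completes
    # synchronously, so the loop exits at once.
    while acc.status != Accelerator.DONE:
        pass
    return memory[dst]
```

Note that the host never copies the data block itself; it only passes addresses through the registers, which is the point of the shared-memory arrangement.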



Both CPUs and accelerators perform computations required by the specification; at some level we do not care whether the work is done on a programmable CPU or on a hardwired unit.

The first task in designing an accelerator is determining that our system actually needs one. We have to make sure that the function we want to accelerate will run more quickly on our accelerator than it will by executing as software on a CPU. If our system CPU is a small microcontroller, the race may be easily won, but competing against a high-performance CPU is a challenge. We also have to make sure that the accelerated function will speed up the system. If some other operation is in fact the bottleneck, or if moving data into and out of the accelerator is too slow, then adding the accelerator may not be a net gain.
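The net-gain question can be made concrete with a little arithmetic. The sketch below assumes a simple model that is not taken from the text: the accelerated function costs the time to move data in, compute, and move data out, and the rest of the application is unchanged. The function and parameter names are illustrative only.

```python
def accelerated_time(t_in, t_accel, t_out):
    """Time for one accelerated call: data in, computation, data out."""
    return t_in + t_accel + t_out


def system_speedup(t_total, t_kernel_cpu, t_in, t_accel, t_out):
    """Overall speedup when the kernel moves from the CPU to the accelerator.

    t_total      -- execution time of the whole application on the CPU alone
    t_kernel_cpu -- part of t_total spent in the function being accelerated
    """
    t_new = t_total - t_kernel_cpu + accelerated_time(t_in, t_accel, t_out)
    return t_total / t_new
```

For example, `system_speedup(100, 80, 5, 10, 5)` gives 2.5, while `system_speedup(100, 10, 8, 1, 8)` is below 1: the kernel was never the bottleneck and the data transfers cost more than the computation saved.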

Once we have analyzed the system, we need to design the accelerator itself. In order to have identified our need for an accelerator, we must have a good understanding of the algorithm to be accelerated, which is often in the form of a high-level language program. We must translate the algorithm description into a hardware design, a considerable task in itself. We must also design the interface between the accelerator core and the CPU bus. The interface includes more than bus handshaking logic. For example, we have to determine how the application software on the CPU will communicate with the accelerator and provide the required registers; we may have to implement shared memory synchronization operations; and we may have to add address generation logic to read and write large amounts of data from system memory.



Finally, we will have to design the CPU-side interface to the accelerator. The application software will have to talk to the accelerator, providing it data and telling it what to do. We have to somehow synchronize the operation of the accelerator with the rest of the application so that the accelerator knows when it has the required data and the CPU knows when it has received the desired results.

ACCELERATED SYSTEM DESIGN: VIDEO ACCELERATOR

In this section we use a video accelerator as an example of an accelerated embedded system. Digital video is still a computationally intensive task, so it is well suited to acceleration. Motion estimation engines are used in real-time search engines; we may want to have one attached to our personal computer to experiment with video processing techniques.

ALGORITHM AND REQUIREMENTS

We could build an accelerator for any number of digital video algorithms. We will choose block motion estimation as our example here because it is very computation and memory intensive, yet relatively easy to explain.

Block motion estimation is used in digital video compression algorithms so that one frame in the video can be described in terms of the differences between it and another frame. Because objects in the frame often move relatively little, describing one frame in terms of another greatly reduces the number of bits required to describe the video.
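To make the idea concrete, here is a minimal software sketch of block motion estimation. It assumes an exhaustive (full) search and the sum of absolute differences (SAD) as the matching criterion; neither choice is stated in the text above, and the function names are hypothetical. The motion vector is expressed as the offset of the best match from the top-left corner of the search area.

```python
def sad(search_area, macroblock, top, left):
    """Sum of absolute differences between the macroblock and the
    same-sized window of the search area whose corner is (top, left)."""
    n = len(macroblock)
    return sum(
        abs(search_area[top + i][left + j] - macroblock[i][j])
        for i in range(n)
        for j in range(n)
    )


def block_motion_estimate(macroblock, search_area):
    """Try every position of the macroblock inside the search area and
    return ((dx, dy), cost) for the position with the lowest SAD."""
    n = len(macroblock)
    rows, cols = len(search_area), len(search_area[0])
    best_cost, best_vector = None, None
    for dy in range(rows - n + 1):
        for dx in range(cols - n + 1):
            cost = sad(search_area, macroblock, dy, dx)
            if best_cost is None or cost < best_cost:   # first minimum wins ties
                best_cost, best_vector = cost, (dx, dy)
    return best_vector, best_cost
```

The two nested search loops wrapped around the two nested pixel loops are what make the algorithm both computation and memory intensive, and every candidate position is independent of the others, which is why it maps well onto hardware.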



REQUIREMENTS FOR THE SYSTEM

SPECIFICATION

The specification for the system is relatively straightforward because the algorithm is simple. Figure 7.28 defines some classes that describe basic data types in the system: the motion vector, the macro block, and the search area. These definitions are straightforward. Because the behavior is simple, we need to define only two classes to describe it: the accelerator itself and the PC. These classes are shown in Figure 7.29. The PC makes its memory accessible to the accelerator. The accelerator provides a behavior compute-mv() that performs the block motion estimation algorithm. Figure 7.30 shows a sequence diagram that describes the operation of compute-mv(). After initiating the behavior, the accelerator reads the search area and macro block from the PC; after computing the motion vector, it returns it to the PC.

ARCHITECTURE

The accelerator will be implemented in an FPGA on a card connected to a PC's PCI slot. Such accelerators can be purchased or they can be designed from scratch. If you design such a card from scratch, you have to decide early on whether the card will be used only for this video accelerator or if it should be made general enough to support other applications as well.



SYSTEM TESTING

Testing video algorithms requires a large amount of data. JPEG

images and put out pixels in the format required by your accelerator. With a little more cleverness, the resulting motion vector can be written back onto the image for a visual confirmation of the result. If you want to be adventurous and try motion estimation on video, open source MPEG encoders and decoders are also available.

DISTRIBUTED EMBEDDED ARCHITECTURE

A distributed embedded system can be organized in many different ways, but its basic units are the PE and the network, as illustrated in Figure 8.1. A PE may be an instruction set processor such as a DSP, CPU, or microcontroller, as well as a nonprogrammable unit such as the ASICs used to implement PE 4. An I/O device such as PE 1 (which we call here a sensor or actuator, depending on whether it provides input or output) may also be a PE, so long as it can speak the network protocol to communicate with other PEs. The network in this case is a bus, but other network topologies are also possible. It is also possible that the system can use more than one network, such as when relatively independent functions require relatively little communication among them. We often refer to the connection between PEs provided by the network as a communication link.

The system of PEs and networks forms the hardware platform on which the application runs.

The distributed embedded system does not have memory on the bus (unless a memory unit is organized as an I/O device that speaks the network protocol). In particular, PEs do not fetch instructions over the network as they do on the microprocessor bus. We take advantage of this fact when analyzing network performance: the speed at which PEs can communicate over the bus would be difficult if not impossible to predict if we allowed arbitrary instruction and data fetches as we do on microprocessor buses.

WHY DISTRIBUTED

Building an embedded system with several PEs talking over a network is definitely more complicated than using a single large microprocessor to perform the same tasks. So why would anyone build a distributed embedded system? All the reasons for designing accelerator systems also apply to distributed embedded systems, and several more reasons are unique to distributed systems.

In some cases, distributed systems are necessary because the devices that the PEs communicate with are physically separated. If the deadlines for processing the data are short, it may be more cost-effective to put the PEs where the data are located rather than build a higher-speed network to carry the data to a distant, fast PE.

An important advantage of a distributed system with several CPUs is that one part of the system can be used to help diagnose problems in another part. Whether you are debugging a prototype or diagnosing a problem in the field, isolating the error to one part of the system can be difficult when everything is done on a single CPU. If you have several CPUs in the system, you can use one to generate inputs for another and to watch its output.

NETWORK ABSTRACTIONS

Networks are complex systems. Ideally, they provide high-level services while hiding many of the details of data transmission from the other components in the system. In order to help understand (and design) networks, the International Standards Organization has developed a seven-layer model for networks known as the Open Systems Interconnection (OSI) model. Understanding the OSI layers will help us to understand the details of real networks.



HARDWARE AND SOFTWARE ARCHITECTURES

Distributed embedded systems can be organized in many different ways depending upon the needs of the application and cost constraints. One good way to understand possible architectures is to consider the different types of interconnection networks that can be used.

A point-to-point link establishes a connection between exactly two PEs. Point-to-point links are simple to design precisely because they deal with only two components. We do not have to worry about other PEs interfering with communication on the link.

Figure 8.3 shows a simple example of a distributed embedded system built from point-to-point links. The input signal is sampled by the input device and passed to the first digital filter, F1, over a point-to-point link. The results of that filter are sent through a second point-to-point link to filter F2. The results in turn are sent to the output device over a third point-to-point link. A digital filtering system requires that its outputs arrive at strict intervals, which means that the filters must process their inputs in a timely fashion. Using point-to-point connections allows both F1 and F2 to receive a new sample and send a new output at the same time without worrying about collisions on the communications network.



It is possible to build a full-duplex, point-to-point connection that can be used for simultaneous communication in both directions between the two PEs. (A half-duplex connection allows communication in only one direction at a time.)

A bus is a more general form of network since it allows multiple devices to be connected to it.



A bus transaction comprises a series of one-byte transmissions: an address followed by one or more data bytes. I2C encourages a data-push programming style. When a master wants to write a slave, it transmits the slave's address followed by the data. Since a slave cannot initiate a transfer, the master must send a read request with the slave's address and let the slave transmit the data. Therefore, an address transmission includes the 7-bit address and 1 bit for data direction: 0 for writing from the master to the slave and 1 for reading from the slave to the master. (This explains the 7-bit addresses on the bus.) The format of an address transmission is shown in Figure 8.9.
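The address transmission can be sketched in a few lines. This sketch assumes the direction bit occupies the least significant position of the byte, following the 7-bit address, as in standard I2C; the function and constant names are hypothetical.

```python
WRITE = 0   # master writes to the slave
READ = 1    # master reads from the slave


def i2c_address_byte(address, direction):
    """Pack a 7-bit slave address and a 1-bit direction flag into one byte."""
    if not 0 <= address < 128:
        raise ValueError("I2C addresses are 7 bits wide")
    if direction not in (WRITE, READ):
        raise ValueError("direction must be WRITE (0) or READ (1)")
    return (address << 1) | direction


def split_address_byte(byte):
    """Inverse operation: recover (address, direction) from the byte."""
    return byte >> 1, byte & 1
```

For a slave at address 0x50, a write transaction starts with the byte 0xA0 and a read transaction with 0xA1.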

A bus transaction is initiated by a start signal and completed with an end signal as follows:

A start is signaled by leaving the SCL high and sending a 1 to 0 transition on SD

high while the data line assumes its proper value of 0 or 1. An acknowledgment is sent at the end of every 8-bit transmission, whether it is an address or data. For acknowledgment, the transmitter does not pull down the SD


ETHERNET

Ethernet is very widely used as a local area network for general-purpose computing. Because of its ubiquity and the low cost of Ethernet interfaces, it has seen significant use as a network for embedded computing. Ethernet is particularly useful when PCs are used as platforms, making it possible to use standard components, and when the network does not have to meet rigorous real-time requirements.

The physical organization of an Ethernet is very simple, as shown in Figure 8.14. The network is a bus with a single signal path; the Ethernet standard allows for several different implementations such as twisted pair and coaxial cable.



Unlike the I2C bus, nodes on the Ethernet are not synchronized; they can send their bits at any time. I2C relies on the fact that a collision can be detected and quashed within a single bit time thanks to synchronization. But since Ethernet nodes are not synchronized, if two nodes decide to transmit at the same time, the message will be ruined. The Ethernet arbitration scheme is known as Carrier Sense Multiple Access with Collision Detection (CSMA/CD). The algorithm is outlined in Figure 8.15. A node that has a message waits for the bus to become silent and then starts transmitting. It simultaneously listens, and if it hears another transmission that interferes with its transmission, it stops transmitting and waits to retransmit. The waiting time is random, but weighted by an exponential function of the number of times the message has been aborted. Figure 8.16 shows the exponential backoff function both before and after it is modulated by the random wait time. Since a message may be interfered with several times before it is successfully transmitted, the exponential backoff technique helps to ensure that the network does not become overloaded at high demand factors. The random factor in the wait time minimizes the chance that two messages will repeatedly interfere with each other.
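The random, exponentially weighted wait can be sketched as follows. This is a minimal illustration of truncated binary exponential backoff, the variant commonly associated with classic Ethernet; the cap of 10 on the exponent, the use of abstract "slots" as the time unit, and the function name are assumptions rather than details given in the text.

```python
import random


def backoff_slots(attempt, rng, max_exponent=10):
    """Number of slot times to wait after the given collision count.

    attempt      -- how many times this message has been aborted (1, 2, ...)
    rng          -- a random.Random instance supplying the random factor
    max_exponent -- assumed cap so the waiting window stops growing
    """
    k = min(attempt, max_exponent)
    # The window doubles with every collision; the wait is drawn
    # uniformly from 0 .. 2**k - 1 slots.
    return rng.randrange(2 ** k)
```

Two nodes that collide draw their waits independently, so the chance that they pick the same slot again halves with each further collision, while the growing window spreads retransmissions out when the network is busy.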



The maximum length of an Ethernet is determined by the nodes' ability to detect collisions. The worst case occurs when two nodes at opposite ends of the bus are transmitting simultaneously. For the collision to be detected by both nodes, each node's signal must be able to travel to the opposite end of the bus so that it can be heard by the other node. In practice, Ethernets can run up to several hundred meters.

Figure 8.17 shows the basic format of an Ethernet packet. It provides addresses of both the destination and the source. It also provides for a variable-length data payload.
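The collision-detection constraint on bus length can be put into rough numbers. The sketch below assumes that a sender can notice a collision only while it is still transmitting, so the time to send the shortest frame must cover a round trip across the bus. The minimum frame size, bit rate, and propagation speed are illustrative parameters, not values from the text, and the result is an idealized upper bound that ignores repeaters, interface delays, and signal attenuation.

```python
def max_bus_length_m(min_frame_bits, bit_rate_bps, propagation_mps):
    """Idealized upper bound on bus length for collision detection.

    The shortest frame must still be on the wire when the far end's
    signal returns, so frame_time >= 2 * length / propagation speed.
    """
    frame_time = min_frame_bits / bit_rate_bps      # seconds to send the frame
    return propagation_mps * frame_time / 2         # one-way distance in meters
```

With a 512-bit minimum frame at 10 Mb/s and an assumed propagation speed of 2e8 m/s, the bound comes out at roughly 5 km. Practical limits are far shorter because the effects ignored here consume most of that budget.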

NETWORK BASED DESIGN

Designing a distributed embedded system around a network involves some of the same design tasks we faced in accelerated systems. We must schedule computations in time and allocate them to PEs. Scheduling and allocation of communication are important additional design tasks required for many distributed networks. Many embedded networks are designed for low cost and therefore do not provide excessively high communication speed. If we are not careful, the network can become the bottleneck in system design. In this section we concentrate on design tasks unique to network-based distributed embedded systems.

We know how to analyze the execution time of programs and systems of processes on single CPUs, but to analyze the performance of networks we must know how to determine the delay incurred by transmitting messages.



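With no contention, the delay of a single message can be modeled as $t_m = t_x + t_n + t_r$, where $t_x$ is the transmitter-side overhead, $t_n$ is the network transmission time, and $t_r$ is the receiver-side overhead. In I2C, $t_x$ and $t_r$ are negligible relative to $t_n$, as illustrated by Example 8.2.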



If messages can interfere with each other in the network, analyzing communication delay becomes difficult. In general, because we must wait for the network to become available and then transmit the message, we can write the message delay as

$t_y = t_d + t_m$ (8.2)

where $t_d$ is the network availability delay incurred waiting for the network to become available. The main problem, therefore, is calculating $t_d$. That value depends on the type of arbitration used in the network.

If the network uses fixed-priority arbitration, the network availability delay is unbounded for all but the highest-priority device. Since the highest-priority device always gets the network first, unless there is an application-specific limit on how long it will transmit before relinquishing the network, it can keep blocking the other devices indefinitely.

If the network uses fair arbitration, the network availability delay is bounded. In the case of round-robin arbitration, if there are $N$ devices, then the worst-case network availability delay is $N(t_x + t_{arb})$, where $t_{arb}$ is the delay incurred for arbitration. $t_{arb}$ is usually small compared to transmission time.
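The delay terms above can be combined into a small calculator. This is a sketch only: it assumes that the delay of an uncontended message is the sum of the three components defined earlier (transmitter overhead, network transmission time, receiver overhead), and it uses the round-robin bound exactly as stated in the text. The function names are hypothetical and all times share one unit.

```python
def message_delay(t_x, t_n, t_r):
    """Assumed uncontended message delay: t_m = t_x + t_n + t_r."""
    return t_x + t_n + t_r


def round_robin_worst_case_wait(n_devices, t_x, t_arb):
    """Worst-case network availability delay: t_d = N * (t_x + t_arb)."""
    return n_devices * (t_x + t_arb)


def worst_case_total_delay(n_devices, t_x, t_n, t_r, t_arb):
    """Total delay t_y = t_d + t_m under round-robin arbitration."""
    t_d = round_robin_worst_case_wait(n_devices, t_x, t_arb)
    return t_d + message_delay(t_x, t_n, t_r)
```

With four devices, `t_x = 2`, `t_n = 10`, `t_r = 3`, and `t_arb = 1`, the availability delay is 12 and the total is 27, so even with a bounded wait the waiting term can be comparable to the transmission itself.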

Even when round-robin arbitration is used to bound the network availability delay, the waiting time can be very long. If we add acknowledgment and data corruption into the analysis, figuring network delay is more difficult. Assuming that errors are random, we cannot predict a worst-case delay since every packet may contain an error. We can, however, compute the probability that a packet will be delayed for more than a given amount of time.
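One simple way to compute such a probability is sketched below. It rests on assumptions that go beyond the text: each transmission attempt is corrupted independently with the same probability, a corrupted packet is retransmitted immediately, and every attempt takes the same time. Under those assumptions the number of attempts follows a geometric distribution. The function names are illustrative.

```python
def prob_more_than(k, p_error):
    """Probability that a packet needs more than k attempts,
    i.e. that the first k attempts are all corrupted."""
    return p_error ** k


def prob_delay_exceeds(t_limit, t_attempt, p_error):
    """Probability that the delivery time exceeds t_limit when each
    attempt takes t_attempt and retries follow back to back."""
    k = int(t_limit // t_attempt)   # attempts that fit within the limit
    return prob_more_than(k, p_error)
```

The probability falls off geometrically with the time allowed but never reaches zero, which is the sense in which no worst-case delay exists.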

Arbitration on networks is a form of prioritization. In a rate-monotonic communication scheme, the task with the shortest deadline should be assigned the highest priority in the network.
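The priority rule just stated amounts to sorting by deadline. A minimal sketch, in which the function name, the dictionary input, and the convention that 0 is the highest priority are all assumptions made for illustration:

```python
def assign_priorities(deadlines):
    """Map each task name to a priority rank, 0 being the highest.

    deadlines -- dict of task name -> deadline (any consistent time unit)
    """
    ordered = sorted(deadlines, key=deadlines.get)   # shortest deadline first
    return {name: rank for rank, name in enumerate(ordered)}
```

For instance, tasks with deadlines of 5, 20, and 100 time units receive priorities 0, 1, and 2 respectively.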



Our process scheduling model assumed that we could interrupt processes at any point. But network communications are organized into packets. In most networks we cannot interrupt a packet transmission to take over the network for a higher-priority packet. As a result, networks exhibit priority inversion like that introduced in Chapter 6. When a low-priority message is on the network, the network is effectively allocated to that low-priority message, allowing it to block higher-priority messages. This cannot cause deadlock since each message has a bounded length, but it can slow down critical communications. The only solution is to analyze network behavior to determine whether priority inversion causes some messages to be delayed for too long.

Of course, a round-robin arbitrated network puts all communications at the same priority. This does not eliminate the priority inversion problem because processes still have priorities.

Thus far we have assumed a single-hop network: a message is received at its intended destination directly from the source, without going through any other network node. It is possible to build multi-hop networks in which messages are routed through network nodes to get to their destinations. (Using a multistage network does not necessarily mean using a multi-hop network; the stages in a multistage network are generally much smaller than the network PEs.) Figure 8.18 shows an example of a multi-hop communication. The hardware platform has two separate networks (perhaps so that communications between subsets of the PEs do not interfere), but there is no direct path from 1 to 5. The message is therefore routed through 3, which reads it from one network and sends it on to the other one. Analyzing delays through multi-hop systems is very difficult. For example, the time that the message is held at 3 depends on both the computational load of 3 and the other messages that it must handle.



The Internet Protocol (IP) is the fundamental protocol on the Internet. It provides connectionless, packet-based communication. Industrial automation has long been a good application area for Internet-based embedded systems. Information appliances that use the Internet are rapidly becoming another use of IP in embedded computing.

Internet Protocol is not defined over a particular physical implementation; it is an internetworking standard. Internet packets are assumed to be carried by some other network, such as an Ethernet. In general, an Internet packet will travel over several different networks from source to destination. IP allows data to flow seamlessly through these networks from one end user to another. The relationship between IP and individual networks is illustrated in Figure 8.19. IP works at the network layer. When node A wants to send data to node B, the application's data pass through several layers of the protocol stack to reach the IP layer. IP creates packets for routing to the destination, which are then sent to the data link and physical layers. A node that transmits data among different types of networks is known as a router. The router's functionality must go up to the IP layer, but since it is not running applications, it does not need to go to higher levels of the OSI model. In general, a packet may go through several routers to get to its destination. At the destination, the IP layer provides data to the transport layer and ultimately the receiving application. As the data pass through several layers of the protocol stack, the IP packet data are encapsulated in packet formats appropriate to each layer.



The basic format of an IP packet is shown in Figure 8.20. The header and data payload are both of variable length. The maximum total length of the header and data payload is 65,535 bytes.

An Internet address is a number (32 bits in early versions of IP, 128 bits in IPv6). The IP address is typically written in the form xxx.xx.xx.xx. The names by which users and applications typically refer to Internet nodes, such as foo.baz.com, are translated into IP addresses via calls to a Domain Name Server, one of the higher-level services built on top of IP.
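The relationship between the dotted notation and the underlying number can be checked with Python's standard `ipaddress` module. The helper name `describe` and the sample addresses (taken from ranges reserved for documentation) are illustrative choices, not part of the text.

```python
import ipaddress


def describe(text):
    """Return (IP version, address width in bits, address as an integer)."""
    addr = ipaddress.ip_address(text)   # accepts IPv4 and IPv6 notation
    return addr.version, addr.max_prefixlen, int(addr)
```

Calling `describe("192.0.2.1")` reports version 4 and a width of 32 bits, with the four dotted fields packed into a single integer; an IPv6 address such as `"2001:db8::1"` reports version 6 and a width of 128 bits.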

The fact that IP works at the network layer tells us that it does not guarantee that a packet is delivered to its destination. Furthermore, packets that do arrive may come out of order. This is referred to as best-effort routing. Since routes for data may change quickly with subsequent packets being routed along very different paths with different delays, real-time performance of IP can be hard to predict. When a small network is contained totally within the embedded system, performance can be evaluated through simulation or other methods because the possible inputs are limited. Since the performance of the Internet may depend on worldwide usage patterns, its real-time performance is inherently harder to predict.



The Internet also provides higher-level services built on top of IP. The Transmission Control Protocol (TCP) is one such example. It provides a connection-oriented service that ensures that data arrive in the appropriate order, and it uses an acknowledgment protocol to ensure that packets arrive. Because many higher-level services are built on top of TCP, the basic protocol is often referred to as TCP/IP.

Figure 8.21 shows the relationships between IP and higher-level Internet services. Using IP as the foundation, TCP is used to provide File Transport Protocol for batch file transfers, Hypertext Transport Protocol (HTTP) for



embedded system. This non-real-time interaction can be used to monitor the system, set its configuration, and interact with it.

INTERNET SECURITY

Connecting an embedded system to the Internet opens up the system to the same sorts of attacks that are made on PCs and servers every day. However, attacks on embedded systems can destroy not only information but also the physical devices connected to the embedded processor. Dzung et al. listed several example attacks that caused significant damage:

A worm infected the computer network of the CSX railway, causing all trains in the Washington, DC area to be shut down for a half day.

A worm disabled the computer-based safety monitoring system at the Davis-Besse nuclear power plant in Ohio.

A former consultant to a waste water plant in Australia used its computers to release one million liters of sewage into the area waterways.

They point out that security can be enforced at all levels of the network stack. General network security principles can be applied to Internet-enabled embedded systems; various industrial standards also deal with measures specific to industrial networks.
