abubuker-sidique
8/13/2019 Embedded Sys Unit 4
UNIT IV
HARDWARE ACCELERATORS & NETWORKS
Accelerators, accelerated system design, distributed
embedded architecture, networks for embedded
systems, network-based design, Internet-enabled
systems.
CPUs AND ACCELERATORS
One important category of PE (processing element)
for embedded multiprocessors is the accelerator. An
accelerator is attached to CPU buses to quickly execute
certain key functions. Accelerators can provide large
performance increases for applications with
computational kernels that spend a great deal of time in a
small section of code. Accelerators can also provide
critical speedups for low-latency I/O functions.
The design of accelerated systems is one example of
hardware/software co-design: the simultaneous design of
hardware and software to meet system objectives. Thus
far, we have taken the computing platform as a given; by
adding accelerators, we can customize the embedded
platform to better meet our application's demands.
As illustrated in Figure 7.3, a CPU accelerator is
attached to the CPU bus. The CPU is often called the host.
The CPU talks to the accelerator through data and control
registers in the accelerator. These registers allow the CPU
to monitor the accelerator's operation and to give the
accelerator commands.
The CPU and accelerator may also communicate via
shared memory. If the accelerator needs to operate on a
large volume of data, it is usually more efficient to leave
the data in memory and have the accelerator read and
write memory directly rather than to have the CPU shuttle
data from memory to accelerator registers and back.
Both CPUs and accelerators perform computations
required by the specification; at some level we do not care
whether the work is done on a programmable CPU or on a
hardwired unit.
The first task in designing an accelerator is
determining that our system actually needs one. We have
to make sure that the function we want to accelerate will
run more quickly on our accelerator than it will by
executing as software on a CPU. If our system CPU is a
small microcontroller, the race may be easily won, but
competing against a high-performance CPU is a challenge.
We also have to make sure that the accelerated function
will speed up the system. If some other operation is in fact
the bottleneck, or if moving data into and out of the
accelerator is too slow, then adding the accelerator may
not be a net gain.
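The net-gain question above can be checked with a rough estimate before any hardware is designed. The following is a minimal sketch, assuming a simple model in which a fixed fraction of the run time is the kernel to be accelerated and data movement adds a fixed cost; the function and parameter names are illustrative and do not come from the text.

```python
def accelerated_time(total_s, kernel_fraction, kernel_speedup, transfer_s):
    """Estimate total run time after moving one kernel to an accelerator.

    total_s          all-software run time in seconds
    kernel_fraction  share of total_s spent in the kernel (0.0 to 1.0)
    kernel_speedup   how many times faster the accelerator runs the kernel
    transfer_s       assumed extra time to move data in and out
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    software_part = total_s * (1.0 - kernel_fraction)   # stays on the CPU
    kernel_part = total_s * kernel_fraction / kernel_speedup
    return software_part + kernel_part + transfer_s


def is_net_gain(total_s, kernel_fraction, kernel_speedup, transfer_s):
    """True only if the accelerated system is faster than pure software."""
    new_time = accelerated_time(total_s, kernel_fraction,
                                kernel_speedup, transfer_s)
    return new_time < total_s
```

With these assumptions, a kernel that takes only a tenth of the run time cannot pay for two seconds of data transfer on a ten-second job, which is the "some other operation is the bottleneck" case described above.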
Once we have analyzed the system, we need to
design the accelerator itself. In order to have identified our
need for an accelerator, we must have a good
understanding of the algorithm to be accelerated, which is
often in the form of a high-level language program. We
must translate the algorithm description into a hardware
design, a considerable task in itself. We must also design
the interface between the accelerator core and the CPU
bus. The interface includes more than bus handshaking
logic. For example, we have to determine how the
application software on the CPU will communicate with
the accelerator and provide the required registers; we may
have to implement shared memory synchronization
operations; and we may have to add address generation
logic to read and write large amounts of data from system
memory.
Finally, we will have to design the CPU-side
interface to the accelerator. The application software will
have to talk to the accelerator, providing it data and telling
it what to do. We have to somehow synchronize the
operation of the accelerator with the rest of the application
so that the accelerator knows when it has the required data
and the CPU knows when it has received the desired
results.
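One common way to achieve this synchronization is a start bit in a control register and a done bit in a status register, with the data left in shared memory. The sketch below simulates that handshake in software; the class, register names, and bit assignments are hypothetical and stand in for real memory-mapped hardware, and the "computation" is just a sum.

```python
class ToyAccelerator:
    """Software stand-in for an accelerator with control/status registers."""

    START = 0x1  # hypothetical control-register bit: host requests a run
    DONE = 0x1   # hypothetical status-register bit: result is ready

    def __init__(self, memory):
        self.memory = memory   # shared memory, modeled as a plain list
        self.control = 0
        self.status = 0
        self.src = 0           # data register: start index in shared memory
        self.length = 0        # data register: number of words to process
        self.result = 0        # data register: value handed back to the host

    def step(self):
        """One unit of accelerator work (real hardware runs on its own)."""
        if self.control & self.START:
            data = self.memory[self.src:self.src + self.length]
            self.result = sum(data)          # placeholder for the real kernel
            self.control &= ~self.START      # command consumed
            self.status |= self.DONE         # tell the host we are finished


def run_on_accelerator(acc, src, length):
    """Host-side driver: write registers, start, poll until done."""
    acc.src = src
    acc.length = length
    acc.status = 0                 # clear any stale done flag
    acc.control |= acc.START       # the accelerator now knows data is ready
    while not (acc.status & acc.DONE):
        acc.step()                 # in hardware this loop would only poll
    return acc.result
```

Polling is used here only because it is the simplest scheme to show; an interrupt raised by the accelerator is an equally plausible design.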
ACCELERATED SYSTEM DESIGN
VIDEO ACCELERATOR
In this section we use a video accelerator as an
example of an accelerated embedded system. Digital video
is still a computationally intensive task, so it is well suited
to acceleration. Motion estimation engines are used in
real-time search engines; we may want to have one
attached to our personal computer to experiment with
video processing techniques.
ALGORITHM AND REQUIREMENTS
We could build an accelerator for any number of
digital video algorithms. We will choose block motion
estimation as our example here because it is very
computation and memory intensive but it is relatively
easy to explain.
Block motion estimation is used in digital video
compression algorithms so that one frame in the video can
be described in terms of the differences between it and
another frame. Because objects in the frame often move
relatively little, describing one frame in terms of another
greatly reduces the number of bits required to describe the
video.
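To make the idea concrete, here is a minimal sketch of a full-search block motion estimator. It assumes a square block, frames stored as lists of pixel rows, and the sum of absolute differences (SAD) as the matching cost; SAD is a common choice but is an assumption here, as are all function and parameter names.

```python
def sad(frame, top, left, block):
    """Sum of absolute differences between block and frame at (top, left)."""
    total = 0
    for r, row in enumerate(block):
        for c, pixel in enumerate(row):
            total += abs(frame[top + r][left + c] - pixel)
    return total


def estimate_motion(block, prev_frame, top, left, search_range):
    """Search prev_frame around (top, left) for the best match to block.

    Returns (dx, dy, cost): the motion vector and its SAD cost.
    """
    n = len(block)                                   # block is n x n
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidate positions that fall outside the frame.
            if y < 0 or x < 0 or y + n > height or x + n > width:
                continue
            cost = sad(prev_frame, y, x, block)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("block does not fit in the frame at this position")
    return best[1], best[2], best[0]
```

The two nested loops over the search area, each containing a pass over every pixel of the block, are what make the algorithm so computation and memory intensive and therefore a good candidate for acceleration.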
REQUIREMENTS FOR THE SYSTEM
SPECIFICATION
The specification for the system is relatively
straightforward because the algorithm is simple. Figure
7.28 defines some classes that describe basic data types in
the system: the motion vector, the macro block, and the
search area. These definitions are straightforward. Because
the behavior is simple, we need to define only two classes
to describe it: the accelerator itself and the PC. These
classes are shown in Figure 7.29. The PC makes its
memory accessible to the accelerator. The accelerator
provides a behavior compute-mv() that performs the
block motion estimation algorithm. Figure 7.30 shows a
sequence diagram that describes the operation of
compute-mv(). After initiating the behavior, the accelerator
reads the search area and macro block from the PC; after
computing the motion vector, it returns it to the PC.
ARCHITECTURE
The accelerator will be implemented in an FPGA on
a card connected to a PC's PCI slot. Such accelerators can
be purchased or they can be designed from scratch. If you
design such a card from scratch, you have to decide early
on whether the card will be used only for this video
accelerator or if it should be made general enough to
support other applications as well.
SYSTEM TESTING
Testing video algorithms requires a large amount of
data. JPEG [...]
images and put out pixels in the format required by your
accelerator. With a little more cleverness, the resulting
motion vector can be written back onto the image for a
visual confirmation of the result. If you want to be
adventurous and try motion estimation on video, open
source MPEG encoders and decoders are also available.
DISTRIBUTED EMBEDDED ARCHITECTURE
A distributed embedded system can be organized in
many different ways, but its basic units are the PE and the
network as illustrated in Figure 8.1. A PE may be an
instruction set processor such as a DSP, CPU, or
microcontroller, as well as a nonprogrammable unit such
as the ASICs used to implement PE 4. An I/O device such
as PE 1 (which we call here a sensor or actuator,
depending on whether it provides input or output) may
also be a PE, so long as it can speak the network protocol
to communicate with other PEs. The network in this case
is a bus, but other network topologies are also possible. It
is also possible that the system can use more than one
network, such as when relatively independent functions
require relatively little communication among them. We
often refer to the connection between PEs provided by the
network as a communication link.
The system of PEs and networks forms the hardware
platform on which the application runs.
The distributed embedded system does not have
memory on the bus (unless a memory unit is organized as
an I/O device that speaks the network protocol). In
particular, PEs do not fetch instructions over the
network as they do on the microprocessor bus. We take
advantage of this fact when analyzing network
performance: the speed at which PEs can communicate
over the bus would be difficult if not impossible to predict
if we allowed arbitrary instruction and data fetches as we
do on microprocessor buses.
WHY DISTRIBUTED
Building an embedded system with several PEs
talking over a network is definitely more complicated than
using a single large microprocessor to perform the same
tasks. So why would anyone build a distributed embedded
system? All the reasons for designing accelerator systems
also apply to distributed embedded systems, and several
more reasons are unique to distributed systems.
In some cases, distributed systems are necessary
because the devices that the PEs communicate with are
physically separated. If the deadlines for processing the
data are short, it may be more cost-effective to put the PEs
where the data are located rather than build a higher-speed
network to carry the data to a distant, fast PE.
An important advantage of a distributed system with
several CPUs is that one part of the system can be used to
help diagnose problems in another part. Whether you are
debugging a prototype or diagnosing a problem in the
field, isolating the error to one part of the system can be
difficult when everything is done on a single CPU. If you
have several CPUs in the system, you can use one to
generate inputs for another and to watch its output.
NETWORK ABSTRACTIONS
Networks are complex systems. Ideally, they
provide high-level services while hiding many of the
details of data transmission from the other components in
the system. In order to help understand (and design)
networks, the International Organization for
Standardization (ISO) has developed a seven-layer model
for networks known as the Open Systems Interconnection
(OSI) model. Understanding the OSI layers will help us to
understand the details of real networks.
HARDWARE AND SOFTWARE ARCHITECTURES
Distributed embedded systems can be organized in
many different ways depending upon the needs of the
application and cost constraints. One good way to
understand possible architectures is to consider the
different types of interconnection networks that can be
used.
A point-to-point link establishes a connection
between exactly two PEs. Point-to-point links are simple
to design precisely because they deal with only two
components. We do not have to worry about other PEs
interfering with communication on the link.
Figure 8.3 shows a simple example of a distributed
embedded system built from point-to-point links. The
input signal is sampled by the input device and passed to
the first digital filter, F1, over a point-to-point link. The
results of that filter are sent through a second point-to-
point link to filter F2. The results in turn are sent to the
output device over a third point-to-point link. A digital
filtering system requires that its outputs arrive at strict
intervals, which means that the filters must process their
inputs in a timely fashion. Using point-to-point
connections allows both F1 and F2 to receive a new
sample and send a new output at the same time without
worrying about collisions on the communications network.
It is possible to build a full-duplex, point-to-point
connection that can be used for simultaneous
communication in both directions between the two PEs. (A
half-duplex connection allows communication in only one
direction at a time.)
A bus is a more general form of network since it
allows multiple devices to be connected to it.
A bus transaction on the I2C bus comprises a series of
1-byte transmissions: an address followed by one or more
data bytes. I2C encourages a data-push programming
style. When a master wants to write a slave, it transmits
the slave's address followed by the data. Since a slave
cannot initiate a transfer, the master must send a read
request with the slave's address and let the slave transmit
the data. Therefore, an address transmission includes the
7-bit address and 1 bit for data direction: 0 for writing
from the master to the slave and 1 for reading from the
slave to the master. (This explains the 7-bit addresses on
the bus.) The format of an address transmission is shown
in Figure 8.9.
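The address transmission can be sketched as a single byte built from these two fields. The layout below follows the standard I2C convention of placing the 7-bit address in the upper bits and the direction bit in the least significant bit; the function names are illustrative.

```python
def i2c_address_byte(address, read):
    """Pack a 7-bit slave address and the direction bit into one byte.

    read=False gives direction bit 0 (master writes to slave);
    read=True gives direction bit 1 (master reads from slave).
    """
    if not 0 <= address <= 0x7F:
        raise ValueError("I2C addresses are 7 bits wide")
    return (address << 1) | (1 if read else 0)


def decode_i2c_address_byte(byte):
    """Split an address byte back into (address, read)."""
    return byte >> 1, bool(byte & 1)
```

For a slave at address 0x50, the master sends 0xA0 to start a write and 0xA1 to request a read.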
A bus transaction is initiated by a start signal and
completed with an end signal as follows:
A start is signaled by leaving the SCL high and sending
a 1 to 0 transition on SDL.
[...] high while the data line assumes its proper value of
0 or 1.
An acknowledgment is sent at the end of every 8-bit
transmission, whether it is an address or data. For
acknowledgment, the transmitter does not pull down the
SDL [...]
ETHERNET
Ethernet is very widely used as a local area network
for general-purpose computing. Because of its ubiquity
and the low cost of Ethernet interfaces, it has seen
significant use as a network for embedded computing.
Ethernet is particularly useful when PCs are used as
platforms, making it possible to use standard components,
and when the network does not have to meet rigorous
real-time requirements.
The physical organization of an Ethernet is very
simple, as shown in Figure 8.14. The network is a bus with
a single signal path; the Ethernet standard allows for
several different implementations such as twisted pair and
coaxial cable.
Unlike the I2C bus, nodes on the Ethernet are not
synchronized; they can send their bits at any time. I2C
relies on the fact that a collision can be detected and
quashed within a single bit time thanks to synchronization.
But since Ethernet nodes are not synchronized, if two
nodes decide to transmit at the same time, the message
will be ruined. The Ethernet arbitration scheme is known
as Carrier Sense Multiple Access with Collision
Detection (CSMA/CD). The algorithm is outlined in
Figure 8.15. A node that has a message waits for the bus to
become silent and then starts transmitting. It
simultaneously listens, and if it hears another transmission
that interferes with its transmission, it stops transmitting
and waits to retransmit. The waiting time is random, but
weighted by an exponential function of the number of
times the message has been aborted. Figure 8.16 shows the
exponential backoff function both before and after it is
modulated by the random wait time. Since a message may
be interfered with several times before it is successfully
transmitted, the exponential backoff technique helps to
ensure that the network does not become overloaded at
high demand factors. The random factor in the wait time
minimizes the chance that two messages will repeatedly
interfere with each other.
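The random, exponentially weighted wait can be sketched as follows. This is modeled on the truncated binary exponential backoff of classic Ethernet, in which the wait after the n-th collision is a random whole number of slot times between 0 and 2^k - 1 with k = min(n, 10); the cap of 10 and the function names are assumptions that do not appear in the text above.

```python
import random


def backoff_slots(collisions, rng=None, max_exponent=10):
    """Pick a random wait, in slot times, after `collisions` aborted tries.

    The range of possible waits doubles with each collision until the
    exponent reaches max_exponent, after which it stays fixed.
    """
    if collisions < 0:
        raise ValueError("collisions must be non-negative")
    rng = rng or random                 # allow a seeded generator in tests
    exponent = min(collisions, max_exponent)
    return rng.randrange(2 ** exponent)  # uniform in [0, 2**exponent - 1]


def backoff_delay(collisions, slot_time_s, rng=None):
    """Convert the slot count into seconds for a given slot time."""
    return backoff_slots(collisions, rng) * slot_time_s
```

Because two colliding nodes draw their waits independently, the chance that both pick the same slot again shrinks as the range grows, which is the behavior the paragraph above describes.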
The maximum length of an Ethernet is determined
by the nodes' ability to detect collisions. The worst case
occurs when two nodes at opposite ends of the bus are
transmitting simultaneously. For the collision to be
detected by both nodes, each node's signal must be able to
travel to the opposite end of the bus so that it can be heard
by the other node. In practice, Ethernets can run up to
several hundred meters.
Figure 8.17 shows the basic format of an Ethernet
packet. It provides addresses of both the destination and
the source. It also provides for a variable-length data
payload.
NETWORK BASED DESIGN
Designing a distributed embedded system around a
network involves some of the same design tasks we faced
in accelerated systems. We must schedule computations in
time and allocate them to PEs. Scheduling and allocation
of communication are important additional design tasks
required for many distributed networks. Many embedded
networks are designed for low cost and therefore do not
provide excessively high communication speed. If we are
not careful, the network can become the bottleneck in
system design. In this section we concentrate on design
tasks unique to network-based distributed embedded
systems.
We know how to analyze the execution time of
programs and systems of processes on single CPUs, but to
analyze the performance of networks we must know how
to determine the delay incurred by transmitting messages.
With no contention for the network, the message delay
can be modeled as
$t_m = t_x + t_n + t_r$    (8.1)
where $t_x$ is the transmitter-side overhead, $t_n$ is the
network transmission time, and $t_r$ is the receiver-side
overhead. In I2C, $t_x$ and $t_r$ are negligible relative to
$t_n$, as illustrated by Example 8.2.
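As a rough illustration, assume the message delay is the sum of the three components just named: transmitter-side overhead, network transmission time, and receiver-side overhead. The sketch below estimates the network transmission time for an I2C message, assuming one address byte plus the data bytes, 9 bit times per byte (8 bits plus the acknowledgment), and the 100 kbit/s standard-mode rate; start and stop signaling is ignored, and all names are illustrative.

```python
def message_delay(t_x, t_n, t_r):
    """Total delay: transmitter overhead + network time + receiver overhead."""
    return t_x + t_n + t_r


def i2c_transmission_time(data_bytes, bit_rate_hz=100_000):
    """Approximate network time for one I2C message, in seconds.

    Counts one address byte plus data_bytes data bytes, each taking
    8 data bits and 1 acknowledgment bit on the bus.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    bits_on_bus = 9 * (1 + data_bytes)
    return bits_on_bus / bit_rate_hz
```

Under these assumptions a one-byte message occupies the bus for 180 microseconds, so overheads of a few microseconds on either side change the total very little.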
If messages can interfere with each other in the
network, analyzing communication delay becomes
difficult. In general, because we must wait for the network
to become available and then transmit the message, we can
write the message delay as
$t_y = t_d + t_m$    (8.2)
where $t_d$ is the network availability delay incurred
waiting for the network to become available. The main
problem, therefore, is calculating $t_d$. That value depends
on the type of arbitration used in the network.
If the network uses fixed-priority arbitration, the
network availability delay is unbounded for all but the
highest-priority device. Since the highest-priority device
always gets the network first, unless there is an
application-specific limit on how long it will transmit
before relinquishing the network, it can keep blocking the
other devices indefinitely.
If the network uses fair arbitration, the network
availability delay is bounded. In the case of round-robin
arbitration, if there are N devices, then the worst-case
network availability delay is $N(t_x + t_{arb})$, where
$t_{arb}$ is the delay incurred for arbitration. $t_{arb}$ is
usually small compared to transmission time.
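The round-robin bound is simple enough to evaluate directly. This small sketch computes the worst-case availability delay from the expression above and adds it to the message's own delay; the function names and the example figures are illustrative only.

```python
def round_robin_availability_delay(n_devices, t_x, t_arb):
    """Worst-case wait for the network under round-robin: N * (t_x + t_arb)."""
    if n_devices < 1:
        raise ValueError("need at least one device")
    return n_devices * (t_x + t_arb)


def worst_case_message_delay(n_devices, t_x, t_arb, t_m):
    """Availability delay plus the message's own delay t_m."""
    t_d = round_robin_availability_delay(n_devices, t_x, t_arb)
    return t_d + t_m
```

With four devices, 2.0 ms per transmission and 0.5 ms of arbitration, a message may wait up to 10 ms before it is sent, even though its own delay is only a few milliseconds.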
Even when round-robin arbitration is used to bound
the network availability delay, the waiting time can be
very long. If we add acknowledgment and data corruption
into the analysis, figuring network delay is more difficult.
Assuming that errors are random, we cannot predict a
worst-case delay since every packet may contain an error.
We can, however, compute the probability that a packet
will be delayed for more than a given amount of time.
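One way to compute such a probability is sketched below under a deliberately simple model that is not taken from the text: each transmission attempt is corrupted independently with the same probability, every attempt (including its acknowledgment) takes the same fixed time, and a corrupted packet is retransmitted immediately.

```python
def prob_delay_exceeds(p_error, attempt_time, limit):
    """Probability that delivery takes longer than `limit`.

    Assumed model: attempt j completes at time j * attempt_time, and each
    attempt fails independently with probability p_error. Delivery is
    later than `limit` exactly when every attempt that fits within
    `limit` has failed.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be a probability")
    if attempt_time <= 0:
        raise ValueError("attempt_time must be positive")
    attempts_within_limit = int(limit // attempt_time)
    return p_error ** attempts_within_limit
```

The result never reaches zero for a nonzero error rate, which matches the observation that no worst-case delay exists; the designer instead picks a limit whose probability of being exceeded is acceptably small.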
Arbitration on networks is a form of prioritization.
In a rate-monotonic communication scheme, the task with
the shortest deadline should be assigned the highest
priority in the network.
Our process scheduling model assumed that we
could interrupt processes at any point. But network
communications are organized into packets. In most
networks we cannot interrupt a packet transmission to take
over the network for a higher-priority packet. As a result,
networks exhibit priority inversion like that introduced in
Chapter 6. When a low-priority message is on the network,
the network is effectively allocated to that low-priority
message, allowing it to block higher-priority messages.
This cannot cause deadlock since each message has a
bounded length, but it can slow down critical
communications. The only solution is to analyze network
behavior to determine whether priority inversion causes
some messages to be delayed for too long.
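A first step in such an analysis is to bound how long one message can be blocked by a lower-priority packet that is already on the network. The sketch below assumes that packets cannot be interrupted, that at most one lower-priority packet can be in flight when the higher-priority message becomes ready, and that a smaller number means a higher priority; these assumptions and the names used are illustrative.

```python
def worst_case_blocking(packet_times, priority):
    """Longest time a message of `priority` can be blocked by lower ones.

    packet_times maps a priority level (0 = highest) to the transmission
    time of the longest packet sent at that level.
    """
    lower_priority_times = [
        time for level, time in packet_times.items() if level > priority
    ]
    # With nothing of lower priority on the network there is no inversion.
    return max(lower_priority_times, default=0.0)
```

If this blocking time plus the message's own delay exceeds its deadline, the long low-priority packets would need to be split or the network speed reconsidered.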
Of course, a round-robin arbitrated network puts all
communications at the same priority. This does not
eliminate the priority inversion problem because processes
still have priorities.
Thus far we have assumed a single-hop network: A
message is received at its intended destination directly
from the source, without going through any other network
node. It is possible to build multi-hop networks in which
messages are routed through network nodes to get to their
destinations. (Using a multistage network does not
necessarily mean using a multi-hop network; the stages in
a multistage network are generally much smaller than the
network PEs.) Figure 8.18 shows an example of a multi-
hop communication. The hardware platform has two
separate networks (perhaps so that communications
between subsets of the PEs do not interfere), but there is
no direct path from node 1 to node 5. The message is
therefore routed through node 3, which reads it from one
network and sends it on to the other one. Analyzing delays
through multi-hop systems is very difficult. For example,
the time that the message is held at node 3 depends on both
the computational load of node 3 and the other messages
that it must handle.
The Internet Protocol (IP) is the fundamental
protocol on the Internet. It provides connectionless,
packet-based communication. Industrial automation has
long been a good application area for Internet-based
embedded systems. Information appliances that use the
Internet are rapidly becoming another use of IP in
embedded computing.
Internet protocol is not defined over a particular
physical implementation; it is an internetworking
standard. Internet packets are assumed to be carried by
some other network, such as an Ethernet. In general, an
Internet packet will travel over several different networks
from source to destination. The IP allows data to flow
seamlessly through these networks from one end user to
another. The relationship between IP and individual
networks is illustrated in Figure 8.19. IP works at the
network layer. When node A wants to send data to node B,
the application's data pass through several layers of the
protocol stack to reach IP. IP creates packets for
routing to the destination, which are then sent to the data
link and physical layers. A node that transmits data among
different types of networks is known as a router. The
router's functionality must go up to the IP layer, but since
it is not running applications, it does not need to go to
higher levels of the OSI model. In general, a packet may
go through several routers to get to its destination. At the
destination, the IP layer provides data to the transport
layer and ultimately the receiving application. As the data
pass through several layers of the protocol stack, the IP
packet data are encapsulated in packet formats appropriate
to each layer.
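Encapsulation and the router's limited view of a packet can be shown with a toy model. In this sketch each layer simply prepends a short text tag to the data; the tags and function names are invented for illustration and do not resemble real header formats.

```python
def encapsulate(payload, headers):
    """Wrap payload in headers, listed from the innermost layer outward."""
    packet = payload
    for header in headers:
        packet = header + packet
    return packet


def router_forward(frame, incoming_link_header, outgoing_link_header):
    """Swap the link-layer wrapper while leaving the IP packet untouched.

    The router works only up to the IP layer: it removes the header of the
    network the frame arrived on and adds the header of the next network.
    """
    if not frame.startswith(incoming_link_header):
        raise ValueError("frame did not arrive on the expected network")
    ip_packet = frame[len(incoming_link_header):]
    return outgoing_link_header + ip_packet
```

Sending b"data" through a transport tag, an IP tag, and an Ethernet tag gives a frame whose IP packet passes through the router unchanged while only the outermost wrapper is replaced.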
The basic format of an IP packet is shown in Figure 8.20.
The header and data payload are both of variable length.
The maximum total length of the header and data payload
is 65,535 bytes.
An Internet address is a number (32 bits in early
versions of IP, 128 bits in IPv6). A 32-bit IP address is
typically written in the form xxx.xx.xx.xx. The names by
which users and applications typically refer to Internet
nodes, such as foo.baz.com, are translated into IP addresses
via calls to a Domain Name Server, one of the higher-level
services built on top of IP.
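The statement that an address is just a number can be checked with Python's standard ipaddress module. The sketch below is illustrative; the helper names are not from the text, and the sample addresses come from ranges reserved for documentation.

```python
import ipaddress


def address_bits(text):
    """Number of bits in the address: 32 for IPv4, 128 for IPv6."""
    return ipaddress.ip_address(text).max_prefixlen


def address_as_number(text):
    """The same address expressed as a single integer."""
    return int(ipaddress.ip_address(text))
```

The dotted form is only a convenient way of writing the four bytes of the 32-bit number, one decimal value per byte.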
The fact that IP works at the network layer tells us
that it does not guarantee that a packet is delivered to its
destination. Furthermore, packets that do arrive may come
out of order. This is referred to as best-effort routing.
Since routes for data may change quickly with subsequent
packets being routed along very different paths with
different delays, real-time performance of IP can be hard
to predict. When a small network is contained totally
within the embedded system, performance can be
evaluated through simulation or other methods because the
possible inputs are limited. Since the performance of the
Internet may depend on worldwide usage patterns, its
real-time performance is inherently harder to predict.
The Internet also provides higher-level services
built on top of IP. The Transmission Control Protocol
(TCP) is one such example. It provides a connection-
oriented service that ensures that data arrive in the
appropriate order, and it uses an acknowledgment protocol
to ensure that packets arrive. Because many higher-level
services are built on top of TCP, the basic protocol is often
referred to as TCP/IP.
Figure 8.21 shows the relationships between IP and
higher-level Internet services. Using IP as the foundation,
TCP is used to provide File Transfer Protocol (FTP) for
batch file transfers, Hypertext Transfer Protocol (HTTP)
for [...]
embedded system. This non-real-time interaction can be
used to monitor the system, set its configuration, and
interact with it.
INTERNET SECURITY
Connecting an embedded system to the Internet
opens up the system to the same sorts of attacks that are
made on PCs and servers every day. However, attacks on
embedded systems can destroy not only information but
also the physical devices connected to the embedded
processor. Dzung et al. listed several example attacks that
caused significant damage:
A worm infected the computer network of the CSX
railway, causing all trains in the Washington, DC area to
be shut down for a half day.
A worm disabled the computer-based safety monitoring
system at the Davis-Besse nuclear power plant in Ohio.
A former consultant to a waste water plant in Australia
used its computers to release one million liters of sewage
into the area waterways.
They point out that security can be enforced at all
levels of the network stack. General network security
principles can be applied to Internet-enabled embedded
systems; various industrial standards also deal with
measures specific to industrial networks.