Embedded Sys Unit 4


  • 8/13/2019 Embedded Sys Unit 4


    UNIT IV

    HARDWARE ACCELERATORS & NETWORKS

    Accelerators; accelerated system design; distributed embedded architecture; networks for embedded systems; network-based design; Internet-enabled systems.

    CPUs AND ACCELERATORS

    One important category of PE (processing element) for embedded multiprocessors is the accelerator. An accelerator is attached to CPU buses to quickly execute certain key functions. Accelerators can provide large performance increases for applications with computational kernels that spend a great deal of time in a small section of code. Accelerators can also provide critical speedups for low-latency I/O functions.

    The design of accelerated systems is one example of hardware/software co-design: the simultaneous design of hardware and software to meet system objectives. Thus far, we have taken the computing platform as a given; by adding accelerators, we can customize the embedded platform to better meet our application's demands.

    As illustrated in Figure 7.3, a CPU accelerator is attached to the CPU bus. The CPU is often called the host. The CPU talks to the accelerator through data and control registers in the accelerator. These registers allow the CPU to monitor the accelerator's operation and to give the accelerator commands.

    The CPU and accelerator may also communicate via shared memory. If the accelerator needs to operate on a large volume of data, it is usually more efficient to leave the data in memory and have the accelerator read and write memory directly rather than have the CPU shuttle data from memory to accelerator registers and back.


    Both CPUs and accelerators perform computations required by the specification; at some level we do not care whether the work is done on a programmable CPU or on a hardwired unit.

    The first task in designing an accelerator is determining that our system actually needs one. We have to make sure that the function we want to accelerate will run more quickly on our accelerator than it will by executing as software on a CPU. If our system CPU is a small microcontroller, the race may be easily won, but competing against a high-performance CPU is a challenge. We also have to make sure that the accelerated function will speed up the system. If some other operation is in fact the bottleneck, or if moving data into and out of the accelerator is too slow, then adding the accelerator may not be a net gain.
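    The net-gain check described above can be made concrete with a small calculation. The sketch below is a simplified model, assuming that the CPU waits while the accelerator runs (no overlap) and that all data movement is lumped into one transfer term; the function and parameter names are illustrative, not taken from the text.

```python
def accelerated_speedup(t_total_sw, t_kernel_sw, t_kernel_acc, t_transfer):
    """Overall speedup from moving one kernel onto an accelerator.

    Simplified model (an assumption): the CPU waits while the accelerator
    runs, so the remaining software, the accelerated kernel, and the data
    transfers simply add up.

    t_total_sw   -- run time of the whole application in software on the CPU
    t_kernel_sw  -- portion of t_total_sw spent in the kernel to be accelerated
    t_kernel_acc -- time the accelerator needs for the same kernel
    t_transfer   -- time to move data into and out of the accelerator
    """
    if not 0 <= t_kernel_sw <= t_total_sw:
        raise ValueError("kernel time must lie within the total run time")
    t_rest = t_total_sw - t_kernel_sw              # work that stays on the CPU
    t_accelerated = t_rest + t_kernel_acc + t_transfer
    return t_total_sw / t_accelerated


# A 100 ms application that spends 80 ms in one kernel:
print(accelerated_speedup(100, 80, 8, 12))   # fast transfers: 2.5x overall
print(accelerated_speedup(100, 80, 8, 75))   # slow transfers: below 1.0, a net loss
```

    In the second call the kernel itself runs ten times faster, yet the cost of moving data makes the whole system slower, which is exactly the case the paragraph warns about.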

    Once we have analyzed the system, we need to design the accelerator itself. In order to have identified our need for an accelerator, we must have a good understanding of the algorithm to be accelerated, which is often in the form of a high-level language program. We must translate the algorithm description into a hardware design, a considerable task in itself. We must also design the interface between the accelerator core and the CPU bus. The interface includes more than bus handshaking logic. For example, we have to determine how the application software on the CPU will communicate with the accelerator and provide the required registers; we may have to implement shared memory synchronization operations; and we may have to add address generation logic to read and write large amounts of data from system memory.


    Finally, we will have to design the CPU-side interface to the accelerator. The application software will have to talk to the accelerator, providing it data and telling it what to do. We have to somehow synchronize the operation of the accelerator with the rest of the application so that the accelerator knows when it has the required data and the CPU knows when it has received the desired results.
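    A host-side sketch of this handshake follows. Python cannot touch real bus registers, so the accelerator is simulated: the register names (ctrl, status, src, dst, len), the start and done bits, the polling latency, and the squaring kernel are all hypothetical stand-ins. The host tells the accelerator where the data sit in shared memory, issues a start command through the control register, and polls the status register until the done flag appears; a real driver might wait for an interrupt instead.

```python
class SimulatedAccelerator:
    """Stand-in for a bus-attached accelerator with a hypothetical register map."""

    CTRL_START = 0x1      # bit the host sets to issue a "start" command
    STATUS_DONE = 0x1     # bit the accelerator sets when results are ready

    def __init__(self, memory, latency=3):
        self.memory = memory              # shared memory, accessed directly by the accelerator
        self.latency = latency            # status polls before the job completes
        self.regs = {"ctrl": 0, "status": 0, "src": 0, "dst": 0, "len": 0}
        self._polls_left = 0

    def write_reg(self, name, value):
        self.regs[name] = value
        if name == "ctrl" and value & self.CTRL_START:
            self.regs["status"] = 0
            self._polls_left = self.latency
            if self._polls_left == 0:
                self._finish()

    def read_reg(self, name):
        if name == "status" and self._polls_left > 0:
            self._polls_left -= 1         # model time passing between host polls
            if self._polls_left == 0:
                self._finish()
        return self.regs[name]

    def _finish(self):
        # Placeholder kernel: square each word, reading and writing shared memory directly.
        src, dst, n = self.regs["src"], self.regs["dst"], self.regs["len"]
        for i in range(n):
            self.memory[dst + i] = self.memory[src + i] ** 2
        self.regs["ctrl"] = 0
        self.regs["status"] = self.STATUS_DONE


def run_on_accelerator(acc, src, dst, n, max_polls=1000):
    """Host side: describe the job in registers, start it, then poll for completion."""
    acc.write_reg("src", src)              # where the input data sit in shared memory
    acc.write_reg("dst", dst)              # where the accelerator should put its results
    acc.write_reg("len", n)
    acc.write_reg("ctrl", acc.CTRL_START)  # command: start
    for _ in range(max_polls):
        if acc.read_reg("status") & acc.STATUS_DONE:
            return acc.memory[dst:dst + n]
    raise TimeoutError("accelerator did not signal completion")


shared = [1, 2, 3, 0, 0, 0]
acc = SimulatedAccelerator(shared)
print(run_on_accelerator(acc, src=0, dst=3, n=3))   # [1, 4, 9]
```

    The poll limit is one simple way to keep a hung accelerator from hanging the application as well.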

    ACCELERATED SYSTEM DESIGN: VIDEO ACCELERATOR

    In this section we use a video accelerator as an example of an accelerated embedded system. Digital video is still a computationally intensive task, so it is well suited to acceleration. Motion estimation engines are used in real-time search engines; we may want to have one attached to our personal computer to experiment with video processing techniques.

    ALGORITHM AND REQUIREMENTS

    We could build an accelerator for any number of digital video algorithms. We will choose block motion estimation as our example here because it is very computation and memory intensive but it is relatively easy to explain.

    Block motion estimation is used in digital video compression algorithms so that one frame in the video can be described in terms of the differences between it and another frame. Because objects in the frame often move relatively little, describing one frame in terms of another greatly reduces the number of bits required to describe the video.


    REQUIREMENTS FOR THE SYSTEM

    SPECIFICATION

    The specification for the system is relatively straightforward because the algorithm is simple. Figure 7.28 defines some classes that describe basic data types in the system: the motion vector, the macroblock, and the search area. These definitions are straightforward. Because the behavior is simple, we need to define only two classes to describe it: the accelerator itself and the PC. These classes are shown in Figure 7.29. The PC makes its memory accessible to the accelerator. The accelerator provides a behavior compute-mv( ) that performs the block motion estimation algorithm. Figure 7.30 shows a sequence diagram that describes the operation of compute-mv( ). After initiating the behavior, the accelerator reads the search area and macroblock from the PC; after computing the motion vector, it returns it to the PC.
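    The compute-mv( ) behavior can be prototyped in software before it is committed to hardware, which also yields a reference model for testing the FPGA. The text does not specify the matching criterion, so this sketch assumes the common choice of an exhaustive (full) search that minimizes the sum of absolute differences (SAD). Frames are plain lists of lists of pixel values, and the names MotionVector, sad, and compute_mv are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionVector:
    dx: int   # horizontal displacement, in pixels
    dy: int   # vertical displacement, in pixels


def sad(macroblock, search_area, ox, oy):
    """Sum of absolute differences between the macroblock and the
    same-sized window of the search area whose top-left corner is (ox, oy)."""
    n = len(macroblock)
    return sum(abs(macroblock[i][j] - search_area[oy + i][ox + j])
               for i in range(n) for j in range(n))


def compute_mv(macroblock, search_area, cx, cy):
    """Full-search block motion estimation.

    (cx, cy) is the window position that corresponds to zero motion.
    Returns the best motion vector and its SAD; ties keep the first
    candidate found in raster order.
    """
    n = len(macroblock)
    best_mv, best_cost = None, None
    for oy in range(len(search_area) - n + 1):
        for ox in range(len(search_area[0]) - n + 1):
            cost = sad(macroblock, search_area, ox, oy)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = MotionVector(ox - cx, oy - cy), cost
    return best_mv, best_cost


search_area = [[0, 0, 0, 0],
               [0, 0, 5, 6],
               [0, 0, 7, 8],
               [0, 0, 0, 0]]
macroblock = [[5, 6],
              [7, 8]]
print(compute_mv(macroblock, search_area, cx=1, cy=1))   # (MotionVector(dx=1, dy=0), 0)
```

    The nested loops over every candidate position, each computing an n-by-n SAD, are why the text calls block motion estimation computation and memory intensive; their regular structure is also what makes the kernel a good candidate for parallel hardware.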

    ARCHITECTURE

    The accelerator will be implemented in an FPGA on a card connected to a PC's PCI slot. Such accelerators can be purchased or they can be designed from scratch. If you design such a card from scratch, you have to decide early on whether the card will be used only for this video accelerator or if it should be made general enough to support other applications as well.


    SYSTEM TESTING

    Testing video algorithms requires a large amount of data. […] images and put out pixels in the format required by your accelerator. With a little more cleverness, the resulting motion vector can be written back onto the image for a visual confirmation of the result. If you want to be adventurous and try motion estimation on video, open source MPEG encoders and decoders are also available.

    DISTRIBUTED EMBEDDED ARCHITECTURE

    A distributed embedded system can be organized in many different ways, but its basic units are the PE and the network, as illustrated in Figure 8.1. A PE may be an instruction set processor such as a DSP, CPU, or microcontroller, as well as a nonprogrammable unit such as the ASICs used to implement PE 4. An I/O device such as PE 1 (which we call here a sensor or actuator, depending on whether it provides input or output) may also be a PE, so long as it can speak the network protocol to communicate with other PEs. The network in this case is a bus, but other network topologies are also possible. It is also possible that the system can use more than one network, such as when relatively independent functions require relatively little communication among them. We often refer to the connection between PEs provided by the network as a communication link.

    The system of PEs and networks forms the hardware platform on which the application runs.

    The distributed embedded system does not have memory on the bus (unless a memory unit is organized as an I/O device that speaks the network protocol). In particular, PEs do not fetch instructions over the network as they do on the microprocessor bus. We take advantage of this fact when analyzing network performance: the speed at which PEs can communicate over the bus would be difficult if not impossible to predict if we allowed arbitrary instruction and data fetches as we do on microprocessor buses.

    WHY DISTRIBUTED

    Building an embedded system with several PEs talking over a network is definitely more complicated than using a single large microprocessor to perform the same tasks. So why would anyone build a distributed embedded system? All the reasons for designing accelerator systems also apply to distributed embedded systems, and several more reasons are unique to distributed systems.

    In some cases, distributed systems are necessary because the devices that the PEs communicate with are physically separated. If the deadlines for processing the data are short, it may be more cost-effective to put the PEs where the data are located rather than build a higher-speed network to carry the data to a distant, fast PE.

    An important advantage of a distributed system with several CPUs is that one part of the system can be used to help diagnose problems in another part. Whether you are debugging a prototype or diagnosing a problem in the field, isolating the error to one part of the system can be difficult when everything is done on a single CPU. If you have several CPUs in the system, you can use one to generate inputs for another and to watch its output.

    NETWORK ABSTRACTIONS

    Networks are complex systems. Ideally, they provide high-level services while hiding many of the details of data transmission from the other components in the system. In order to help understand (and design) networks, the International Organization for Standardization (ISO) has developed a seven-layer model for networks known as the Open Systems Interconnection (OSI) model. From the bottom up, its layers are physical, data link, network, transport, session, presentation, and application. Understanding the OSI layers will help us to understand the details of real networks.


    HARDWARE AND SOFTWARE ARCHITECTURES

    Distributed embedded systems can be organized in many different ways depending upon the needs of the application and cost constraints. One good way to understand possible architectures is to consider the different types of interconnection networks that can be used.

    A point-to-point link establishes a connection between exactly two PEs. Point-to-point links are simple to design precisely because they deal with only two components. We do not have to worry about other PEs interfering with communication on the link.

    Figure 8.3 shows a simple example of a distributed embedded system built from point-to-point links. The input signal is sampled by the input device and passed to the first digital filter, F1, over a point-to-point link. The results of that filter are sent through a second point-to-point link to filter F2. The results in turn are sent to the output device over a third point-to-point link. A digital filtering system requires that its outputs arrive at strict intervals, which means that the filters must process their inputs in a timely fashion. Using point-to-point connections allows both F1 and F2 to receive a new sample and send a new output at the same time without worrying about collisions on the communications network.
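    The data flow of this example can be mimicked in a few lines. In the sketch below each point-to-point link is a FIFO with exactly one sender and one receiver, so F1 and F2 never contend for a shared medium. The filter coefficients, the Link class, and the one-sample-per-iteration timing are illustrative assumptions, not details of the figure; in hardware the two filters would run concurrently, while the loop here only models the order in which data move.

```python
from collections import deque


class Link:
    """Point-to-point link: one sender, one receiver, first-in first-out."""

    def __init__(self):
        self._fifo = deque()

    def send(self, value):
        self._fifo.append(value)

    def receive(self):
        return self._fifo.popleft()


def fir_filter(coeffs):
    """Return a stateful FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    history = deque([0.0] * len(coeffs), maxlen=len(coeffs))

    def step(x):
        history.appendleft(x)   # newest sample first; the oldest falls off
        return sum(c * h for c, h in zip(coeffs, history))

    return step


def run_pipeline(samples, f1_coeffs, f2_coeffs):
    """input device -> F1 -> F2 -> output device, one dedicated link per hop."""
    link_in, link_mid, link_out = Link(), Link(), Link()
    f1, f2 = fir_filter(f1_coeffs), fir_filter(f2_coeffs)
    outputs = []
    for x in samples:                            # one sample period per iteration
        link_in.send(x)                          # input device samples the signal
        link_mid.send(f1(link_in.receive()))     # F1 filters and forwards
        link_out.send(f2(link_mid.receive()))    # F2 filters and forwards
        outputs.append(link_out.receive())       # output device consumes the result
    return outputs


print(run_pipeline([2, 4, 6], f1_coeffs=[0.5, 0.5], f2_coeffs=[2.0]))   # [2.0, 6.0, 10.0]
```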


    It is possible to build a full-duplex, point-to-point connection that can be used for simultaneous communication in both directions between the two PEs. (A half-duplex connection allows communication in only one direction at a time.)

    A bus is a more general form of network since it

    allows multiple de'ices to be connected to it.


    A bus transaction comprises a series of 1-byte transmissions: an address followed by one or more data bytes. I2C encourages a data-push programming style. When a master wants to write to a slave, it transmits the slave's address followed by the data. Since a slave cannot initiate a transfer, the master must send a read request with the slave's address and let the slave transmit the data. Therefore, an address transmission includes the 7-bit address and 1 bit for data direction: 0 for writing from the master to the slave and 1 for reading from the slave to the master. (This explains the 7-bit addresses on the bus.) The format of an address transmission is shown in Figure 8.9.
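    Packing the 7-bit address and the direction bit into the single address byte can be written as below. The function names are illustrative; the bit layout (address in the upper seven bits, read/write flag in the least significant bit) follows the description above.

```python
I2C_WRITE = 0   # master -> slave
I2C_READ = 1    # slave -> master


def i2c_address_byte(address, read):
    """Build the byte a master sends after a start: 7-bit address + R/W bit."""
    if not 0 <= address <= 0x7F:
        raise ValueError("I2C addresses are 7 bits wide")
    return (address << 1) | (I2C_READ if read else I2C_WRITE)


def decode_address_byte(byte):
    """Split an address byte back into (7-bit address, is_read)."""
    return byte >> 1, bool(byte & 0x01)


print(hex(i2c_address_byte(0x50, read=False)))   # 0xa0
print(hex(i2c_address_byte(0x50, read=True)))    # 0xa1
```

    Only 128 addresses fit in seven bits, which is why the range check rejects anything above 0x7F.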

    A bus transaction is initiated by a start signal and completed with an end signal as follows:

    A start is signaled by leaving SCL high and sending a 1-to-0 transition on the data line. […] high while the data line assumes its proper value of 0 or 1. An acknowledgment is sent at the end of every 8-bit transmission, whether it is an address or data. For acknowledgment, the transmitter does not pull down the data line, so that the receiver can pull it low to acknowledge a correctly received byte.
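    Because part of the timing description above is missing, here is a hedged sketch of the line levels for one complete byte, written from the standard I2C conventions rather than from this text: data sent most significant bit first, the data line changing only while SCL is low except at start and stop, a ninth clock for the acknowledgment, and a stop signaled by a 0-to-1 transition on the data line while SCL is high. Each tuple is (SCL, data line); the function name is illustrative.

```python
def i2c_byte_waveform(byte, ack=True):
    """(SCL, data) levels for START, one 8-bit byte, the ACK clock, and STOP."""
    states = [(1, 1)]                       # bus idle: both lines high

    def drive(scl, sda):
        if states[-1] != (scl, sda):        # record only real level changes
            states.append((scl, sda))

    drive(1, 0)                             # START: data falls while SCL is high
    drive(0, 0)                             # master pulls SCL low to begin clocking
    bits = [(byte >> (7 - k)) & 1 for k in range(8)]   # most significant bit first
    bits.append(0 if ack else 1)            # 9th clock: receiver pulls data low to ACK
    for bit in bits:
        drive(0, bit)                       # data may change only while SCL is low
        drive(1, bit)                       # SCL high: data stable, sampled by receiver
        drive(0, bit)                       # SCL low again
    drive(0, 0)                             # data low before the stop condition
    drive(1, 0)
    drive(1, 1)                             # STOP: data rises while SCL is high
    return states


waveform = i2c_byte_waveform(0xA0)
print(waveform[:3], waveform[-2:])   # [(1, 1), (1, 0), (0, 0)] [(1, 0), (1, 1)]
```

    The only steps in which the data line changes while SCL is high are the start and the stop, which is how a receiver tells those two conditions apart from ordinary data bits.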


    ETHERNET

    Ethernet is very widely used as a local area network for general-purpose computing. Because of its ubiquity and the low cost of Ethernet interfaces, it has seen significant use as a network for embedded computing. Ethernet is particularly useful when PCs are used as platforms, making it possible to use standard components, and when the network does not have to meet rigorous real-time requirements.

    The physical organization of an Ethernet is very simple, as shown in Figure 8.14. The network is a bus with a single signal path; the Ethernet standard allows for several different implementations such as twisted pair and coaxial cable.


    Unlike the I2C bus, nodes on the Ethernet are not synchronized; they can send their bits at any time. I2C relies on the fact that a collision can be detected and quashed within a single bit time thanks to synchronization. But since Ethernet nodes are not synchronized, if two nodes decide to transmit at the same time, the message will be ruined.

    The Ethernet arbitration scheme is known as Carrier Sense Multiple Access with Collision Detection (CSMA/CD). The algorithm is outlined in Figure 8.15. A node that has a message waits for the bus to become silent and then starts transmitting. It simultaneously listens, and if it hears another transmission that interferes with its transmission, it stops transmitting and waits to retransmit. The waiting time is random, but weighted by an exponential function of the number of times the message has been aborted. Figure 8.16 shows the exponential backoff function both before and after it is modulated by the random wait time. Since a message may be interfered with several times before it is successfully transmitted, the exponential backoff technique helps to ensure that the network does not become overloaded at high demand factors. The random factor in the wait time minimizes the chance that two messages will repeatedly interfere with each other.
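    Classic Ethernet implements this random, exponentially weighted wait as truncated binary exponential backoff. The sketch below uses the usual IEEE 802.3 parameters for 10 and 100 Mbit/s networks (a slot of 512 bit times, the exponent capped at 10, the frame abandoned after 16 attempts); the function names are illustrative, and the random number source can be swapped out for testing.

```python
import random

SLOT_BITS = 512        # slot time in bit times (10 and 100 Mbit/s Ethernet)
BACKOFF_CAP = 10       # the exponent stops growing after 10 collisions
MAX_ATTEMPTS = 16      # give up on the frame after 16 collisions


def backoff_slots(collisions, rng=random):
    """Number of slots to wait after the given number of collisions for one frame."""
    if collisions < 1:
        raise ValueError("backoff applies only after a collision")
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(collisions, BACKOFF_CAP)
    return rng.randint(0, 2 ** k - 1)      # uniform over 0 .. 2^k - 1, inclusive


def backoff_delay_seconds(collisions, bit_rate, rng=random):
    """Convert the chosen number of slots into a wait time at the given bit rate."""
    return backoff_slots(collisions, rng) * SLOT_BITS / bit_rate


rng = random.Random(1)
print([backoff_slots(n, rng) for n in range(1, 6)])   # random, but bounded by 2^n - 1
```

    Doubling the range after every collision is the "exponential" part, while drawing uniformly within that range is the random factor that keeps two colliding nodes from choosing the same wait again and again.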


    The maximum length of an Ethernet is determined by the nodes' ability to detect collisions. The worst case occurs when two nodes at opposite ends of the bus are transmitting simultaneously. For the collision to be detected by both nodes, each node's signal must be able to travel to the opposite end of the bus so that it can be heard by the other node. In practice, Ethernets can run up to several hundred meters.

    Figure 8.17 shows the basic format of an Ethernet packet. It provides addresses of both the destination and the source. It also provides for a variable-length data payload.
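    A frame with this layout can be assembled with Python's struct module. Beyond what the text states, the sketch assumes the common Ethernet II layout: 6-byte destination and source addresses, a 2-byte type field, and a payload of 46 to 1500 bytes, with short payloads zero-padded. The preamble and the 4-byte frame check sequence, normally handled by the interface hardware, are omitted; the function names are illustrative.

```python
import struct

ETH_HEADER = struct.Struct("!6s6sH")   # destination, source, type/length (big-endian)
MIN_PAYLOAD, MAX_PAYLOAD = 46, 1500


def build_ethernet_frame(dst_mac, src_mac, ethertype, payload):
    """Assemble an Ethernet II frame, minus the preamble and the 4-byte FCS."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses are 6 bytes long")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 1500 bytes")
    padded = payload.ljust(MIN_PAYLOAD, b"\x00")   # pad short payloads with zeros
    return ETH_HEADER.pack(dst_mac, src_mac, ethertype) + padded


def parse_ethernet_frame(frame):
    """Split a frame into (destination, source, type, payload)."""
    dst, src, ethertype = ETH_HEADER.unpack_from(frame)
    return dst, src, ethertype, frame[ETH_HEADER.size:]


# Broadcast destination, a locally administered source, type 0x0800 (IPv4).
frame = build_ethernet_frame(b"\xff" * 6, bytes.fromhex("020000000001"), 0x0800, b"hello")
print(len(frame))   # 60: 14-byte header + payload padded to 46 bytes
```

    The parser returns the padded payload; the receiver relies on the length information carried by the higher-layer protocol to discard the padding.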

    NETWORK BASED DESIGN

    Designing a distributed embedded system around a network involves some of the same design tasks we faced in accelerated systems. We must schedule computations in time and allocate them to PEs. Scheduling and allocation of communication are important additional design tasks required for many distributed networks. Many embedded networks are designed for low cost and therefore do not provide excessively high communication speed. If we are not careful, the network can become the bottleneck in system design. In this section we concentrate on design tasks unique to network-based distributed embedded systems.

    We know how to analyze the execution time of programs and systems of processes on single CPUs, but to analyze the performance of networks we must know how to determine the delay incurred by transmitting messages. For a single message sent without contention (as on a point-to-point link), the message delay can be modeled as

    $$t_m = t_x + t_n + t_r \qquad (8.1)$$

    where $t_x$ is the transmitter-side overhead, $t_n$ is the network transmission time, and $t_r$ is the receiver-side overhead. In I2C, $t_x$ and $t_r$ are negligible relative to $t_n$, as illustrated by Example 8.2.
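    A rough calculator for the delay model $t_m = t_x + t_n + t_r$, applied to I2C, follows. The transmission-time estimate is an assumption for illustration: each byte takes nine clock periods (eight data bits plus the acknowledgment), the start and stop conditions are counted as one clock period each, and the 100 kHz clock and 1 µs software overheads are example values only.

```python
def i2c_transmission_time(n_bytes, scl_hz=100_000):
    """Approximate t_n for one I2C transaction of n_bytes (address byte included)."""
    clocks = 9 * n_bytes + 2          # 8 data bits + ACK per byte, plus start and stop
    return clocks / scl_hz


def message_delay(t_x, t_n, t_r):
    """t_m = t_x + t_n + t_r for a single message with no contention."""
    return t_x + t_n + t_r


t_n = i2c_transmission_time(3)        # address byte + 2 data bytes
t_m = message_delay(t_x=1e-6, t_n=t_n, t_r=1e-6)
print(f"t_n = {t_n * 1e6:.0f} us, t_m = {t_m * 1e6:.0f} us")   # t_n = 290 us, t_m = 292 us
```

    With these numbers the transmitter and receiver overheads contribute less than 1% of the total, consistent with treating them as negligible for I2C.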


    If messages can interfere with each other in the network, analyzing communication delay becomes difficult. In general, because we must wait for the network to become available and then transmit the message, we can write the message delay as

    $$t_y = t_d + t_m \qquad (8.2)$$

    where $t_d$ is the network availability delay incurred waiting for the network to become available. The main problem, therefore, is calculating $t_d$. That value depends on the type of arbitration used in the network.

    If the network uses fixed-priority arbitration, the network availability delay is unbounded for all but the highest-priority device. Since the highest-priority device always gets the network first, unless there is an application-specific limit on how long it will transmit before relinquishing the network, it can keep blocking the other devices indefinitely.

    If the network uses fair arbitration, the network availability delay is bounded.

    In the case of round-robin arbitration, if there are N devices, then the worst-case network availability delay is $N(t_n + t_{arb})$, where $t_{arb}$ is the delay incurred for arbitration: in the worst case, every other device is granted the network and transmits one message before our turn comes. $t_{arb}$ is usually small compared to transmission time.
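    The round-robin bound can be combined with the message delay to give a worst case for $t_y = t_d + t_m$. This sketch assumes that each device sends at most one message per round and that no message occupies the network for longer than t_n_max; the device count, message time, and arbitration overhead are illustrative numbers.

```python
def round_robin_availability_delay(n_devices, t_n_max, t_arb):
    """Worst-case t_d: every device ahead of us sends one longest message."""
    return n_devices * (t_n_max + t_arb)


def worst_case_total_delay(n_devices, t_n_max, t_arb, t_m):
    """t_y = t_d + t_m."""
    return round_robin_availability_delay(n_devices, t_n_max, t_arb) + t_m


# Four devices, messages of at most 290 us, 10 us arbitration overhead.
t_d = round_robin_availability_delay(4, 290e-6, 10e-6)
print(f"t_d <= {t_d * 1e3:.2f} ms")   # t_d <= 1.20 ms
```

    No comparable function can be written for fixed-priority arbitration: for every device except the highest-priority one, the availability delay has no bound unless the application limits how long higher-priority devices may transmit.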

    Even when round-robin arbitration is used to bound the network availability delay, the waiting time can be very long. If we add acknowledgment and data corruption into the analysis, figuring network delay is more difficult. Assuming that errors are random, we cannot predict a worst-case delay since every packet may contain an error. We can, however, compute the probability that a packet will be delayed for more than a given amount of time.
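    Under the simplest model, assumed here, each transmission attempt is corrupted independently with the same probability p, and each attempt (including its acknowledgment) takes a fixed time. The chance that a packet is still undelivered after k attempts, and is therefore delayed by more than k attempt times, is then p^k. The function names, the 50% corruption rate, and the 1.5 ms attempt time are illustrative.

```python
def prob_still_undelivered(p_error, attempts):
    """P(all of the first `attempts` transmissions were corrupted) = p ** attempts."""
    return p_error ** attempts


def attempts_for_target(p_error, target):
    """Smallest k such that the packet is still undelivered after k attempts
    with probability at most `target`."""
    if not 0 <= p_error < 1:
        raise ValueError("p_error must be in [0, 1)")
    if target <= 0:
        raise ValueError("target probability must be positive")
    k = 1
    while p_error ** k > target:
        k += 1
    return k


t_attempt = 1.5e-3                       # time per attempt, including the acknowledgment
k = attempts_for_target(0.5, 0.001)      # deliberately pessimistic corruption rate
print(f"P(delay > {k * t_attempt * 1e3:.1f} ms) = {prob_still_undelivered(0.5, k):.5f}")
```

    The result is a probabilistic statement rather than a hard bound: a longer delay remains possible, only increasingly unlikely.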

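    Arbitration on networks is a form of prioritization. In a rate-monotonic communication scheme, the task with the shortest period (and therefore, since rate-monotonic analysis takes each deadline to equal its period, the shortest deadline) should be assigned the highest priority in the network.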


    Our process scheduling model assumed that we could interrupt processes at any point. But network communications are organized into packets. In most networks we cannot interrupt a packet transmission to take over the network for a higher-priority packet. As a result, networks exhibit priority inversion like that introduced in Chapter 6. When a low-priority message is on the network, the network is effectively allocated to that low-priority message, allowing it to block higher-priority messages. This cannot cause deadlock since each message has a bounded length, but it can slow down critical communications. The only solution is to analyze network behavior to determine whether priority inversion causes some messages to be delayed for too long.
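    One piece of that analysis can be sketched: assign priorities rate-monotonically (shortest period first) and, because a packet already on the wire cannot be preempted, charge each message stream a blocking time equal to the longest packet of any lower-priority stream. This is only a partial, necessary-condition check that ignores interference from higher-priority streams; the stream names and numbers (in milliseconds) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MessageStream:
    name: str
    period: float     # time between successive messages
    deadline: float   # allowed time from release to delivery
    t_n: float        # network transmission time of one packet


def rate_monotonic_order(streams):
    """Highest priority first: the shorter the period, the higher the priority."""
    return sorted(streams, key=lambda s: s.period)


def blocking_times(streams):
    """Worst-case priority-inversion blocking: longest lower-priority packet."""
    ordered = rate_monotonic_order(streams)
    return {s.name: max((low.t_n for low in ordered[i + 1:]), default=0.0)
            for i, s in enumerate(ordered)}


def inversion_violations(streams):
    """Streams that miss their deadline from blocking plus their own packet alone."""
    blocking = blocking_times(streams)
    return [s.name for s in rate_monotonic_order(streams)
            if blocking[s.name] + s.t_n > s.deadline]


streams = [MessageStream("brake_cmd", period=1.0, deadline=0.3, t_n=0.1),
           MessageStream("log_dump", period=10.0, deadline=10.0, t_n=0.5)]
print(blocking_times(streams))        # {'brake_cmd': 0.5, 'log_dump': 0.0}
print(inversion_violations(streams))  # ['brake_cmd']
```

    Here the high-priority stream fails even though its own packet is short, purely because a long low-priority packet may already be on the network when it is released.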

    Of course, a round-robin arbitrated network puts all communications at the same priority. This does not eliminate the priority inversion problem because processes still have priorities.

    Thus far we have assumed a single-hop network: a message is received at its intended destination directly from the source, without going through any other network node. It is possible to build multi-hop networks in which messages are routed through network nodes to get to their destinations. (Using a multistage network does not necessarily mean using a multi-hop network; the stages in a multistage network are generally much smaller than the network PEs.) Figure 8.18 shows an example of a multi-hop communication. The hardware platform has two separate networks (perhaps so that communications between subsets of the PEs do not interfere), but there is no direct path from P1 to P5. The message is therefore routed through P3, which reads it from one network and sends it on to the other one. Analyzing delays through multi-hop systems is very difficult. For example, the time that the message is held at P3 depends on both the computational load of P3 and the other messages that it must handle.


    The Internet Protocol (IP) is the fundamental protocol on the Internet. It provides connectionless, packet-based communication. Industrial automation has long been a good application area for Internet-based embedded systems. Information appliances that use the Internet are rapidly becoming another use of IP in embedded computing.

    Internet protocol is not defined over a particular physical implementation; it is an internetworking standard. Internet packets are assumed to be carried by some other network, such as an Ethernet. In general, an Internet packet will travel over several different networks from source to destination. IP allows data to flow seamlessly through these networks from one end user to another. The relationship between IP and individual networks is illustrated in Figure 8.19. IP works at the network layer. When node A wants to send data to node B, the application's data pass through several layers of the protocol stack to reach IP. IP creates packets for routing to the destination, which are then sent to the data link and physical layers.

    A node that transmits data among different types of networks is known as a router. The router's functionality must go up to the IP layer, but since it is not running applications, it does not need to go to higher levels of the OSI model. In general, a packet may go through several routers to get to its destination. At the destination, the IP layer provides data to the transport layer and ultimately the receiving application. As the data pass through several layers of the protocol stack, the IP packet data are encapsulated in packet formats appropriate to each layer.
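    Encapsulation, and the reason a router needs to go only up to the IP layer, can be illustrated with toy headers. The byte-string headers below are placeholders rather than real header formats, and the function names are illustrative.

```python
def encapsulate(app_data, headers):
    """Wrap data layer by layer, top of the stack first (e.g. TCP, then IP, then link)."""
    packet = app_data
    for header in headers:
        packet = header + packet
    return packet


def decapsulate(packet, headers):
    """Strip the headers in reverse order, outermost first."""
    for header in reversed(headers):
        if not packet.startswith(header):
            raise ValueError(f"expected header {header!r}")
        packet = packet[len(header):]
    return packet


def route(frame, in_link_header, out_link_header):
    """A router removes the incoming link-layer header and re-wraps the IP packet
    for the next network; it never looks above the IP layer."""
    if not frame.startswith(in_link_header):
        raise ValueError("frame does not carry the expected link-layer header")
    return out_link_header + frame[len(in_link_header):]


stack = [b"<tcp>", b"<ip>", b"<eth>"]
frame = encapsulate(b"data", stack)
print(frame)                               # b'<eth><ip><tcp>data'
print(route(frame, b"<eth>", b"<wifi>"))   # b'<wifi><ip><tcp>data'
```

    The TCP header and application data pass through the router untouched; only the link-layer wrapper changes from one network to the next.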


    The basic format of an IP packet is shown in Figure 8.20. The header and data payload are both of variable length. The maximum total length of the header and data payload is 65,535 bytes.
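    The 65,535-byte limit comes from the 16-bit total-length field of the IPv4 header. The sketch below packs a minimal 20-byte IPv4 header (no options) with the struct module and fills in the standard ones'-complement header checksum. It is a simplified illustration: identification, flags, and fragment offset are left at zero, and the function names are illustrative.

```python
import struct

# version/IHL, TOS, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")   # 20 bytes, no options


def ipv4_checksum(header):
    """Ones'-complement sum of the header's 16-bit words, then complemented."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                         # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src, dst, payload_len, protocol=17, ttl=64):
    """src and dst are 4-byte addresses; protocol 17 is UDP."""
    ihl_words = 5                              # header length in 32-bit words
    total_length = ihl_words * 4 + payload_len
    if total_length > 0xFFFF:
        raise ValueError("total length must fit in 16 bits (max 65,535 bytes)")
    header = IPV4_HEADER.pack((4 << 4) | ihl_words, 0, total_length, 0, 0,
                              ttl, protocol, 0, src, dst)
    checksum = ipv4_checksum(header)           # computed with the checksum field at 0
    return header[:10] + struct.pack("!H", checksum) + header[12:]


hdr = build_ipv4_header(bytes([192, 168, 1, 10]), bytes([192, 168, 1, 20]), 100)
print(len(hdr), struct.unpack_from("!H", hdr, 2)[0])   # 20 120
```

    A receiver checks the header by recomputing the same sum over all ten words, checksum included; a correct header yields 0.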

    An Internet address is a number (32 bits in IPv4, 128 bits in IPv6). An IPv4 address is typically written in dotted-decimal form, as four decimal byte values separated by periods (xxx.xxx.xxx.xxx). The names by which users and applications typically refer to Internet nodes, such as foo.baz.com, are translated into IP addresses via calls to a Domain Name Server, one of the higher-level services built on top of IP.
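    Python's standard ipaddress module illustrates the two address sizes and the dotted-decimal notation. Name resolution through a Domain Name Server would normally go through a call such as socket.getaddrinfo, which needs a live network and is therefore left out of this sketch; the addresses used are arbitrary examples.

```python
import ipaddress


def describe_address(text):
    """Return (IP version, address width in bits, address as an integer)."""
    addr = ipaddress.ip_address(text)
    return addr.version, addr.max_prefixlen, int(addr)


print(describe_address("192.168.1.10"))       # (4, 32, 3232235786)
print(describe_address("2001:db8::1")[:2])    # (6, 128)
print(ipaddress.IPv4Address(3232235786))      # 192.168.1.10
```

    The dotted-decimal string is only a human-readable rendering; on the wire, and inside the IP header, the address is just the 32-bit (or 128-bit) number.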

    The fact that IP works at the network layer tells us that it does not guarantee that a packet is delivered to its destination. Furthermore, packets that do arrive may come out of order. This is referred to as best-effort routing. Since routes for data may change quickly, with subsequent packets being routed along very different paths with different delays, real-time performance of IP can be hard to predict. When a small network is contained totally within the embedded system, performance can be evaluated through simulation or other methods because the possible inputs are limited. Since the performance of the Internet may depend on worldwide usage patterns, its real-time performance is inherently harder to predict.


    The Internet also provides higher-level services built on top of IP. The Transmission Control Protocol (TCP) is one such example. It provides a connection-oriented service that ensures that data arrive in the appropriate order, and it uses an acknowledgment protocol to ensure that packets arrive. Because many higher-level services are built on top of TCP, the basic protocol is often referred to as TCP/IP.

    Figure 8.21 shows the relationships between IP and higher-level Internet services. Using IP as the foundation, TCP is used to provide the File Transfer Protocol (FTP) for batch file transfers, the Hypertext Transfer Protocol (HTTP) for […] embedded system. This non-real-time interaction can be used to monitor the system, set its configuration, and interact with it.

    INTERNET SECURITY

    Connecting an embedded system to the Internet opens up the system to the same sorts of attacks that are made on PCs and servers every day. However, attacks on embedded systems can destroy not only information but also the physical devices connected to the embedded processor. Dzung et al. listed several example attacks that caused significant damage:

    A worm infected the computer network of the CSX railway, causing all trains in the Washington, DC area to be shut down for half a day.

    A worm disabled the computer-based safety monitoring system at the Davis-Besse nuclear power plant in Ohio.

    A former consultant to a wastewater plant in Australia used its computers to release one million liters of sewage into the area's waterways.

    They point out that security can be enforced at all levels of the network stack. General network security principles can be applied to Internet-enabled embedded systems; various industrial standards also deal with measures specific to industrial networks.