Mekelle Institute of Technology
Embedded Systems (CSE507)
Department of Electronics and Communication Engineering / Computer Science and Engineering
Lecture 6 – Hardware Acceleration and Embedded Networks
Hardware Acceleration
● Hardware acceleration is the use of computer hardware to perform some function faster than is possible in software running on a general-purpose CPU. Examples of hardware acceleration include acceleration functionality in graphics processing units (GPUs) and instructions for complex operations in CPUs. (from Wikipedia)
● Normally, processors are sequential: instructions are executed one by one. Various techniques are used to improve performance, and hardware acceleration is one of them. The main difference between hardware and software is concurrency, which allows hardware to be much faster than software. Hardware accelerators are designed for computationally intensive software code. Depending upon granularity, hardware acceleration can vary from a small functional unit to a large functional block.
● The hardware that performs the acceleration, when in a separate unit from the CPU, is referred to as a hardware accelerator, or often more specifically as graphics accelerator or floating-point accelerator, etc. Those terms, however, are older and have been replaced with less descriptive terms like video card or graphics card.
● Many hardware accelerators are built on top of field-programmable gate array chips.
CPUs and accelerators
● Accelerated systems use an additional computational unit dedicated to some functions:
  • hardwired logic
  • extra CPU
  • coprocessor
● Applications:
  – graphics and multimedia (streaming data)
  – encryption and compression
  – communication devices (signal processing)
  – supercomputing, numerical computations
Why Accelerators?
● Better cost/performance.
  • Custom logic may be able to perform an operation faster than a CPU of equivalent cost.
  • CPU cost is a non-linear function of performance.
● Better real-time performance.
  • Put time-critical functions on less-loaded processing elements.
Role of Performance Estimation
● First, determine that the system really needs to be accelerated:
  • How much faster is the accelerator on the core function (speedup)?
  • How much data transfer overhead?
  • How much overhead for synchronization with the CPU?
● Performance estimation must be done on all levels of abstraction:
  • Simulation-based methods (only average case; need to find test patterns that stress the system)
  • Analytic methods (only limited accuracy; quick; used mainly on the system level)
Accelerator Execution Time
● Total accelerator execution time: t_accel = t_in + t_x + t_out, where t_x is the accelerator's core execution time and t_in, t_out are the input/output transfer times.
● Input/output time (bus transactions) includes:
  – flushing register/cache values to main memory;
  – time required for the CPU to set up the transaction;
  – overhead of data transfers by bus packets, handshaking, etc.
Accelerator Gain
● For simplification, let us consider:
  • an application consisting of one task only, repeated n times
  • a task that can be executed completely on the accelerator
● But in general:
  – not all tasks can be executed on the accelerator
  – the CPU also has other obligations
  – possibilities:
    • single-threaded/blocking: the CPU waits for the accelerator
    • multithreaded/non-blocking: the CPU continues to execute along with the accelerator; the CPU must have useful work to do, and the software must support multithreading
Accelerator/CPU Interface Issues
● Synchronization
  • via interrupts
  • via special data and control registers at the accelerator
● Data transfer to main memory
  • assisted by DMA or special logic within the accelerator
  • caching problems, as the CPU works on the cache while the accelerator works on main memory (declare the data area as non-cacheable, invalidate the cache after the transfer, use a write-through cache, …)
System Design Issues
● Hardware/software co-design
  • meeting system-level objectives by exploiting the synergism of hardware and software through their concurrent design
  • joint design of hardware and software architectures
  • co-design problems have a different flavor according to the application domain, implementation technology, and design methodology
● Design a heterogeneous multiprocessor architecture
  • communication (bus, network, …)
  • memory architecture
  • interfaces and I/O
  • processing elements (CPU, application-specific integrated circuit (ASIC), FPGA (field-programmable gate array))
● Program the system
Overheads for Computers as Components 13
Networking for Embedded Systems
● Why we use networks.
● Network abstractions.
● Example networks.
Network elements
[Figure: a distributed computing platform. Several PEs are connected by communication links to a network; the PEs may be CPUs or ASICs.]
Networks in embedded systems
[Figure: a network in an embedded system. A sensor feeds a PE that performs initial processing; its output travels over the network to another PE for more processing, which in turn drives an actuator through a further PE.]
Why distributed?
● Higher performance at lower cost.
● Physically distributed activities: time constants may not allow transmission to a central site.
● Improved debugging: use one CPU in the network to debug others.
● May buy subsystems that have embedded processors.
Hardware architectures
● Many different types of networks, distinguished by:
  • topology;
  • scheduling of communication;
  • routing.
Point-to-point networks
● One source, one or more destinations, no data switching (e.g., a serial port):

[Figure: PE 1 connects to PE 2 over link 1; PE 2 connects to PE 3 over link 2.]
Bus networks
● Common physical connection:

[Figure: PE 1 through PE 4 all attach to a single shared bus.]

● Packet format: header | address | data | ECC
Bus arbitration
● Fixed: same order every time.
● Fair: every PE has the same access over long periods.
  • round-robin: rotate top priority among PEs.

[Figure: with PEs A, B, C all requesting, a fixed scheme grants in the same order every round, while round-robin rotates the top priority so each PE takes a turn going first.]
Multi-stage networks
● Use several stages of switching elements.
● Often blocking.
● Often smaller than a crossbar.
I2C bus
● Designed for low-cost, medium data rate applications.
● Characteristics:
  • serial;
  • multiple-master;
  • fixed-priority arbitration.
● Several microcontrollers come with built-in I2C controllers.
The CAN Bus
● Originally designed for automotive electronics; now used for other applications as well.
● Bit-serial transmission, 500 Kb/s, over twisted pair, up to 40 m.
● Synchronous: nodes synchronize themselves by listening to the bit transitions on the bus.
● Arbitration by Carrier Sense Multiple Access with Arbitration on Message Priority (CSMA/AMP).
● For error handling, a special error frame and an overload frame are used, as well as acknowledgements.