
© Krithi Ramamritham / Kavi Arya

System Software for Embedded Systems

Krithi Ramamritham, Kavi Arya

IIT Bombay

VLSI 2004


Embedded Systems?

Plan

• Embedded Systems
  – Introduction
  – Application Examples

• New Approaches to building ESW
  – New paradigms: Lava, Handel-C
  – Examples + “Engineering Returns to Software”
  – Build a RISC processor in 48hrs
  – Advantages of reconfigurable hardware

• Real-time support for ESW

Embedded Systems

• Single-functioned, e.g. pager, mobile phone
• Tightly constrained
  – cost, size, performance, power, etc.
• Reactive & real-time
  – e.g. car’s cruise controller
  – delay in computation => failure of system

Hardware is not the whole System !!!

A Micro-Electronic System is the result of a projection of …
  – Architecture
  – Hardware
  – Software
… distinguished by its gross Functional Behaviour !

• Software is an important part of the Product and must be part of the Design Process
… or we are only designing a Component of the system.

Why Is Embedded Software Not Just Software On Small Computers?

• Embedded = Dedicated
• Interaction with physical processes
  – sensors, actuators, processes
• Critical properties are not all functional
  – real-time, fault recovery, power, security, robustness
• Heterogeneity
  – hardware/software tradeoffs, mixed architectures
• Concurrency
  – interaction with multiple processes
• Reactivity
  – operating at the speed of the environment

These features look more like hardware!

Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001


What is Embedded SW?

One definition:

“Software that is directly in contact with, or significantly affected by, the hardware that it executes on, or can directly influence the behavior of that hardware.”

What is Embedded SW?

• What is it not?

• Application software can be recompiled and executed on any number of hardware platforms so long as the basic services/libraries are provided.
  – It is divided by vertical market segments (application domains)
  – Well-established methodologies, architectures, …
  – HW platform independent, highly portable

• Any SW that has no direct relationship with HW.

Embedded System Challenges for HW Folks

• PARADIGM CHANGE!
  – Designers’ main tasks shift from processor integration to performance analysis. Concentration on functional requirements instead of integration work
  – Concentration on architectural exploration (including performance analysis)

Re-use and Platform-based design become key!

• Early validation of system/solution correctness
• Parallel hardware and software development
• More effective use of previous work
• Faster ways to build new elements of a solution
• Ways to test more effectively, efficiently, quickly

Software Guys can Learn from Hardware Experts!

• Concurrency
  – the synchrony abstraction
  – event-driven modeling
• Reusability
  – cell libraries
  – interface definition
• Reliability
  – leveraging limited abstractions
  – leveraging verification
• Heterogeneity
  – mixing synchronous and asynchronous designs
  – resource management

Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001

Trade-offs. Methodology ESW Architectural specifics

• Portability
  – ESW itself is intended to provide portability for higher SW layers
  – (At least parts of) ESW is by definition not portable
• Real-time
  – Restricted use of standardized inter-process communication (IPC) mechanisms (CORBA, …) for performance reasons
  – Typically hard real-time requirements
• RTOS dependency
  – Implementation of OS-like services
  – Sometimes shielding of the RTOS from higher-level SW layers
  – Direct dependency on RTOS implementation

Functional Design & Mapping

[Figure: Functional Design (functions F1 to F5) mapped onto the Architectural Design (Thread, RTOS/Drivers, Hardware Interface, HW1 to HW4), with F2, F3, F4 and F5 shown at their mapped positions.]

Source: Ian Phillips, ARM, VSIA 2001

The Embedded Market: Disruptive Change

Traditional Embedded World
• Never small enough
• Never fast enough
• Headless/Character-based
• Standalone
• Boot & Run from ROM
• More Hardware than Software
• Low-Level Programming Model
• Application tied to hardware

Today’s Embedded World
• Never functional enough
• Always connected
• High Integration Chips (ASIC/SOC)
• Architectural diversity
• COTS & custom hardware
• EPROM/Flash/Rotating Media
• Software Intensive
• Web interfaces
• OOP Programming Model
• Standard applications

• Time to Market Pressures
• Shortage of Embed. SW Engineers

Source: Jim Ready, President / CEO, MontaVista Software

Plan

• Embedded Systems

• New Approaches to building ESW
  – New paradigms: Lava, Handel-C
  – Examples + “Engineering Returns to Software”
  – Build a RISC processor in 48hrs
  – Advantages of reconfigurable hardware

• Real-time support for ESW

Motorola Software Survey Findings

• Hardware design is a software task: IC designers write code (VHDL, Verilog, Scripting)!

• We must become a software-intensive embedded system solutions company, focused on integrating our platforms into users’ products; in the future we’ll be neither a hardware nor a software company
  – Focus on developing systems capability, not just a software counterpart to our current hardware capability (though that’s needed too)
  – We should have software content from drivers to applications

• The fundamental goal isn’t 70% margin on software products, it’s helping someone choose your total solution
  – Embedded systems platforms and solutions will be the key to market differentiation and profitable growth

Source: Bob Altizer, BASYS, VSIA 2001

Common Design Metrics

• NRE (Non-recurring engineering) cost
• Unit cost
• Size (bytes, gates)
• Performance (execution time)
• Power (more power => more heat & less battery time)
• Flexibility (ability to change functionality)

Common Design Metrics (cont.)

• Time to prototype
• Time to market
• Maintainability
• Correctness
• Safety (probability that system won’t cause harm)

Time to Market Design Metric

• Simplified revenue model
  – Product life = 2W, peak at W
  – Time of market entry defines a triangle, representing market penetration
  – Triangle area equals revenue
• Loss
  – The difference between the on-time and delayed triangle areas
• Avg. time to market today = 8 mth
• 1 day delay may amount to $Ms
  – see Sony Playstation vs XBox

[Figure: Revenues ($) vs. Time. On-time entry triangle: market rise to peak revenue at W, market fall to 2W. Delayed entry triangle: entry at D, lower peak revenue from delayed entry.]

Source: Embedded System Design: Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)

NRE and unit cost metrics

• Compare technologies by costs -- best depends on quantity
  – Technology A: NRE=$2,000, unit=$100
  – Technology B: NRE=$30,000, unit=$30
  – Technology C: NRE=$100,000, unit=$2
• But, must also consider time-to-market

[Figure: two plots for technologies A, B and C against number of units (volume), 0 to 2400: total cost (x1000), $0 to $200,000, and per-product cost, $0 to $200.]

Source: Embedded System Design: Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)
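The comparison can be tried out numerically. The sketch below is only an illustration: it assumes total cost = NRE + unit cost x volume, uses the three technologies from the slide, and the function names are made up for this example.

```python
# (NRE cost in $, unit cost in $) for each technology, as given on the slide.
TECHNOLOGIES = {
    "A": (2_000, 100),
    "B": (30_000, 30),
    "C": (100_000, 2),
}

def total_cost(tech, units):
    """Total cost = NRE cost + unit cost * number of units."""
    nre, unit = TECHNOLOGIES[tech]
    return nre + unit * units

def per_product_cost(tech, units):
    """Per-product cost = total cost / number of units (units must be > 0)."""
    return total_cost(tech, units) / units

def cheapest(units):
    """Technology with the lowest total cost at the given volume."""
    return min(TECHNOLOGIES, key=lambda t: total_cost(t, units))

def break_even(t1, t2):
    """Volume at which two technologies have the same total cost."""
    nre1, unit1 = TECHNOLOGIES[t1]
    nre2, unit2 = TECHNOLOGIES[t2]
    return (nre2 - nre1) / (unit1 - unit2)

if __name__ == "__main__":
    for n in (100, 800, 1600, 2400, 3000):
        print(n, cheapest(n), {t: total_cost(t, n) for t in TECHNOLOGIES})
    print("A/B break-even:", break_even("A", "B"))
    print("B/C break-even:", break_even("B", "C"))
```

Under these assumptions A is cheapest at low volume, B in the middle range and C only at high volume, which is the "best depends on quantity" point of the slide.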

Losses due to delayed market entry

• Area = 1/2 * base * height
  – On-time = 1/2 * 2W * W
  – Delayed = 1/2 * (W-D+W)*(W-D)
• Percentage revenue loss = (D(3W-D)/(2W^2))*100%
• Try some examples
  – Lifetime 2W=52 wks, delay D=4 wks
  – (4*(3*26-4)/(2*26^2)) = 22%
  – Lifetime 2W=52 wks, delay D=10 wks
  – (10*(3*26-10)/(2*26^2)) = 50%
  – Delays are costly!

[Figure: Revenues ($) vs. Time. On-time entry triangle: market rise to peak revenue at W, market fall to 2W. Delayed entry triangle: entry at D, lower peak revenue from delayed entry.]

Source: Embedded System Design: Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)
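To try further examples, the triangle model can be written out directly. This is a small sketch of the simplified revenue model only; the function names are chosen for this example.

```python
def on_time_revenue(W):
    """Area of the on-time triangle: base 2W, height W."""
    return 0.5 * (2 * W) * W

def delayed_revenue(W, D):
    """Area of the delayed triangle: base (W - D + W), height (W - D)."""
    return 0.5 * (W - D + W) * (W - D)

def percent_revenue_loss(W, D):
    """Loss as a percentage of on-time revenue; equals D(3W-D)/(2W^2)*100."""
    lost = on_time_revenue(W) - delayed_revenue(W, D)
    return lost / on_time_revenue(W) * 100

if __name__ == "__main__":
    for delay in (4, 10):
        print(f"2W=52 wks, D={delay} wks: {percent_revenue_loss(26, delay):.1f}% loss")
```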

Trends

• Moore’s Law
  – IC transistor capacity doubles every 18 mths
  – 1981: leading edge chip had 10k transistors
  – 2002: leading edge chip has 150M transistors
• Designer productivity has improved due to better tools:
  – Compilation/Synthesis tools
  – Libraries/IP
  – Test/verification tools
  – Standards
  – Languages and frameworks (Handel-C, Lava, Esterel, …)
  – 1981: designer produced 100 transistors per month
  – 2002: designer produces 5000 transistors per month

Our New Understanding

• We have simultaneous optimisations of competing design metrics: speed, size, power, complexity, etc.
• We need a “Renaissance Engineer”
  – with holistic view of design process and comfortable with technologies ranging from hardware and software to formal methods
• Maturation of behavioral synthesis tools and other tools has enabled this kind of unified view of hardware/software co-design.
• Design efforts now focus at higher levels of abstraction => abstract specifications now refined into programs and then into gates and logic.
• There is no fundamental difference between what hardware and software can implement.

Designer Productivity

• “The Mythical Man Month” by Frederick Brooks ’75
• More designers on team => lower productivity because of increasing communication costs between groups
• Consider 1M transistor project:
  – Say, a designer has productivity of 5000 transistor/mth
  – Each extra designer => decrease of 100 transistor/mth productivity in group due to comm. costs
  – 1 designer: 1M/5000 = 200 mth
  – 10 designers: 1M/(10*4100) = 24.3 mth
  – 25 designers: 1M/(25*2600) = 15.3 mth
  – 27 designers: 1M/(27*2400) = 15.4 mth
• Need new design technology to shrink the design gap

Source: Embedded System Design: Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)
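The team-size arithmetic can be reproduced with a few lines. This sketch assumes the linear penalty stated on the slide (each additional designer lowers everyone's productivity by 100 transistors/month); the names are illustrative. The slide's figures appear to be these values truncated to one decimal.

```python
PROJECT_TRANSISTORS = 1_000_000
BASE_PRODUCTIVITY = 5_000   # transistors per designer per month, working alone
PENALTY = 100               # productivity lost per designer for each extra team member

def months_to_finish(designers):
    """Project duration in months for a team of the given size."""
    per_designer = BASE_PRODUCTIVITY - PENALTY * (designers - 1)
    if per_designer <= 0:
        raise ValueError("team too large: per-designer productivity is not positive")
    return PROJECT_TRANSISTORS / (designers * per_designer)

if __name__ == "__main__":
    for n in (1, 10, 25, 27):
        print(f"{n:2d} designers: {months_to_finish(n):.2f} months")
    # Search team sizes for which productivity stays positive (1..50).
    best = min(range(1, 51), key=months_to_finish)
    print("shortest schedule with", best, "designers")
```

Beyond roughly 25 designers, adding people lengthens the schedule in this model, which is the point of the Brooks reference.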

Design Productivity Gap

• Designer productivity has grown over the last decade
• Rate of improvement has not kept pace with the chip-capacity growth
• 1981: leading edge chip:
  – 100 designer mth * 100 trans/mth => 10k trans complexity
• 2002: leading edge chip:
  – 30k designer mth * 5k trans/mth => 150M trans complexity
• Designers at avg. of $10k pm => cost of building leading edge chips gone from $1M in 1981 to $300M in 2002
• Need paradigm shift to cope with the complexities of system design
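The cost figures follow from dividing chip complexity by designer productivity. A minimal sketch, using only the numbers on the slides (the function names are invented for this example):

```python
COST_PER_DESIGNER_MONTH = 10_000  # dollars, average quoted on the slide

def designer_months(transistors, productivity):
    """Effort in designer-months = chip complexity / transistors per designer-month."""
    return transistors / productivity

def design_cost(transistors, productivity):
    """Design cost in dollars at the average monthly designer cost."""
    return designer_months(transistors, productivity) * COST_PER_DESIGNER_MONTH

if __name__ == "__main__":
    print("1981:", designer_months(10_000, 100), "designer-months,",
          design_cost(10_000, 100), "dollars")
    print("2002:", designer_months(150_000_000, 5_000), "designer-months,",
          design_cost(150_000_000, 5_000), "dollars")
    # Chip capacity grew far faster than designer productivity.
    print("capacity growth:", 150_000_000 / 10_000, "x")
    print("productivity growth:", 5_000 / 100, "x")
```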

Plan

• Embedded Systems
  – Introduction
  – Application Examples

• New Approaches to building ESW
  – New paradigms: Lava, Handel-C
  – Examples + “Engineering Returns to Software”
  – Build a RISC processor in 48hrs
  – Advantages of reconfigurable hardware

• Real-time support for ESW

Embedded Applications

They are everywhere!

• wristwatches, washing machines,
• microwave ovens,
• elevators, mobile telephones,
• printers, FAX machines,
• telephone exchanges,
• automobiles, aircraft

Embedded Apps

• A modern home
  – has one general purpose desktop PC
  – but has several embedded systems.
• More prevalent in industrial sectors
  – Dozens of embedded computers in modern automobiles
  – chemical and nuclear power plants

Embedded Applications

An embedded system typically has a digital signal processor and a variety of I/O devices connected to sensors and actuators.

Computer (controller) is surrounded by other subsystems, sensors and actuators

Computer -- Controller's function is:
• to monitor parameters of physical processes of its surrounding system
• to control these processes whenever needed.

Simple Examples

A simple thermostat controller
• periodically reads the temperature of the chamber
• switches on or off the cooling system.

A pacemaker
• constantly monitors the heart
• paces the heart when heart beats are missed
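The thermostat's read-then-switch cycle might look like the sketch below. The setpoint, the hysteresis band and the function names are assumptions made for illustration; the slide only says the controller reads the temperature periodically and switches the cooling on or off.

```python
def thermostat_step(temperature, cooling_on, setpoint=25.0, band=1.0):
    """Decide the cooling state for one periodic temperature reading.

    setpoint and band are hypothetical values; the band avoids rapid
    on/off switching when the temperature hovers near the setpoint.
    """
    if temperature > setpoint + band:
        return True            # too warm: switch cooling on
    if temperature < setpoint - band:
        return False           # cool enough: switch cooling off
    return cooling_on          # inside the band: keep the current state

def run(readings):
    """Apply the controller to a sequence of periodic readings."""
    cooling = False
    states = []
    for temperature in readings:
        cooling = thermostat_step(temperature, cooling)
        states.append(cooling)
    return states

if __name__ == "__main__":
    print(run([24.0, 26.5, 25.5, 23.5, 24.5]))
```

In a real controller the loop body would run once per sampling period, with the reading coming from a sensor and the result driving an actuator.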


Open loop temperature control

Closed loop temperature control


Feedback Control

Feedforward Control


Example: Elevator Controller

Remote Camera-based Surveillance

• Observers and the observed sites connected through a network.

• Input from sites displayed at observers' end at regular intervals.

• Need: System should capture, process and transmit images at regular intervals, predictably


When there is an alarm

• Observer redirects one or more cameras to zoom in on to a specific part of a site.

• Sends commands with the necessary pan/tilt/zoom parameters across the network.

• Cameras retarget their views within bounded time and start transmitting as before, scenes from the chosen location.

What do we need?

• timely transmission of user needs from observer to camera.
• camera platform retargeting the camera within bounded time.
• camera capturing images at regular intervals
• images sent to observers predictably across the network

Functional Design & Mapping

[Figure: Functional Design (functions F1 to F5) mapped onto the Architectural Design (Thread, RTOS/Drivers, Hardware Interface, HW1 to HW4), with F2, F3, F4 and F5 shown at their mapped positions.]

Source: Ian Phillips, ARM, VSIA 2001

Examples of Embedded Systems

We will look at the details of

• A simple Digital Camera
• Digital Flight Control
• Plastic Injection Molding

What the future holds… e.g., automotive electronics

Digital camera…

• Only recently possible
  – Systems-on-a-chip
    • Multiple processors and memories on one IC
  – High-capacity flash memory

Source: Embedded System Design: Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)

Designer’s perspective: two key tasks

• Processing images and storing in memory
  – When shutter pressed:
    • Image captured
    • Converted to digital form by charge-coupled device (CCD)
    • Compressed and archived in internal memory
• Uploading images to PC
  – Digital camera attached to PC
  – Special software commands camera to transmit archived images serially

Compression

• Store more images
• Transmit image to PC in less time
• JPEG (Joint Photographic Experts Group)

Requirements Specification

• System’s requirements – what system should do
  – Nonfunctional requirements
    • Constraints on design metrics (e.g., “should use 0.001 watt or less”)
  – Functional requirements
    • System’s behavior (e.g., “output X should be input Y times 2”)
  – ….

Requirements Specification…

Initial specification is general - from marketing dept.
• E.g., short document detailing market need for a low-end digital camera that:
  – captures and stores at least 50 low-res images and uploads to PC,
  – costs around $100 with single medium-size IC costing less than $25,
  – has as long as possible battery life,
  – expected sales vol. = 200,000 if mkt entry < 6 mths
  – 100,000 if between 6 and 12 months,
  – insignificant sales beyond 12 months

Nonfunctional requirements

• Design metrics of importance based on initial specification
  – Performance: time required to process image
  – Size: number of elementary logic gates (2-input NAND gate) in IC
  – Power: measure of avg. electrical energy consumed while processing
  – Energy: battery lifetime (power x time)

Nonfunctional requirements…

• Constrained metrics
  – Values must be below (sometimes above) certain threshold
• Optimization metrics
  – Improved as much as possible to improve product
• Metric can be both constrained and optimization

Nonfunctional requirements…

• Power
  – Must operate below certain temperature (cooling fan not possible)
  – Therefore, constrained metric
• Energy
  – Reducing power or time reduces energy
  – Optimized metric: want battery to last as long as possible

Nonfunctional requirements…

• Performance
  – Must process image fast enough to be useful
  – 1 sec reasonable constraint
    • Slower would be annoying
    • Faster not necessary for low-end of market
  – Therefore, constrained metric
• Size
  – Must use IC that fits in reasonably sized camera
  – Constrained and optimization metric
    • Constraint may be 200,000 gates, but smaller would be cheaper

Informal functional specification

• Flowchart breaks functionality down into simpler functions
• Each function’s details described in English
• Low quality image has resolution of 64 x 64
• Mapping functions to a particular processor type not done at this stage

[Flowchart: CCD input, Zero-bias adjust, DCT, Quantize, Archive in memory, More 8×8 blocks? (yes/no), Done? (yes/no), Transmit serially, serial output e.g., 011010...]

Informal functional specification

[Flowchart: CCD input, Zero-bias adjust, DCT, Quantize, Archive in memory, More 8×8 blocks? (yes/no), Done? (yes/no), Transmit serially, serial output e.g., 011010...]

Refined functional specification

• Refine informal specification into one that can actually be executed
• Can use C-like code to describe each function
  – Called system-level model, prototype, or simply model
  – Also is first implementation

[Figure: Executable model of digital camera. Modules CCD.C, CCDPP.C, CODEC.C, CNTRL.C and UART.C, with an input image file (101011010110101010010101101...) and an output file (1010101010101010101010101010...).]

Design

• Determine system’s architecture
  – Processors
    • Any combination of single-purpose (custom or standard) or general-purpose processors
  – Memories, buses
• Map functionality to that architecture
  – Multiple functions on one processor
  – One function on one or more processors

Design..

• Implementation
  – A particular architecture and mapping
  – Solution space is set of all implementations
• Starting point
  – Low-end general-purpose processor connected to flash memory
    • All functionality mapped to software running on processor
    • Usually satisfies power, size, time-to-market constraints
    • If timing constraint not satisfied then try:
      – use single-purpose processors for time-critical functions
      – rewrite functional specification

Implementation 1: Microcontroller alone

• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However…

Implementation 1: Microcontroller alone…

• However, one image per second not possible
  – 12 MHz, 12 cycles per instruction
    • Executes one million instructions per second
  – CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
    • ~100 assembly instructions each iteration
    • 409,600 (4096 x 100) instructions per image
    • Nearly half of budget for reading image alone
  – Would be over budget after adding compute-intensive DCT and Huffman encoding
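The instruction budget can be checked with a short calculation. A sketch using only the figures on the slide; the ~100 instructions per iteration is the slide's own estimate, and the names are illustrative.

```python
CLOCK_HZ = 12_000_000          # 12 MHz 8051
CYCLES_PER_INSTRUCTION = 12
TIME_BUDGET_S = 1.0            # one image per second

def instructions_per_second():
    """Instruction throughput of the microcontroller."""
    return CLOCK_HZ // CYCLES_PER_INSTRUCTION

def capture_instructions(rows=64, cols=64, instructions_per_iteration=100):
    """Instructions spent in the nested capture loops for one image."""
    return rows * cols * instructions_per_iteration

def capture_seconds():
    """Time needed just to read one image from the CCD."""
    return capture_instructions() / instructions_per_second()

if __name__ == "__main__":
    print("instructions/s:", instructions_per_second())
    print("capture instructions:", capture_instructions())
    print(f"capture time: {capture_seconds():.4f} s "
          f"({capture_seconds() / TIME_BUDGET_S:.0%} of the budget)")
```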

Implementation 2: Microcontroller and CCDPP

[Figure: SOC containing 8051, UART, CCDPP, RAM and EEPROM.]

Implementation 2: Microcontroller and CCDPP

• CCDPP function on custom single-purpose processor
  – Improves performance – fewer microcontroller cycles
  – Increases NRE cost and time-to-market
  – Easy to implement: simple datapath, few states in controller
• Simple UART easy to implement as single-purpose processor also
• EEPROM for program memory and RAM for data memory added as well

[Figure: SOC containing 8051, UART, CCDPP, RAM and EEPROM.]

Microcontroller

• Synthesizable version of Intel 8051 available
  – Written in VHDL
  – Captured at register transfer level (RTL)
• Fetches instruction from ROM
• Decodes using Instruction Decoder
• ALU executes arithmetic operations
  – Source and destination registers reside in RAM
• Special data movement instructions used to load and store externally
• Special program generates VHDL description of ROM from output of C compiler/linker

[Figure: Block diagram of Intel 8051 processor core: Controller, Instruction Decoder, ALU, 4K ROM, 128 RAM, connection to External Memory Bus.]

Implementation 2: Microcontroller and CCDPP

• Analysis of implementation 2
  – Total execution time for processing one image:
    • 9.1 seconds
  – Power consumption:
    • 0.033 watt
  – Energy consumption:
    • 0.30 joule (9.1 s x 0.033 watt)
  – Total chip area:
    • 98,000 gates

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT

• 9.1 seconds still doesn’t meet performance constraint of 1 second
• DCT operation prime candidate for improvement
  – Execution of implementation 2 shows microprocessor spends most cycles here
  – Could design custom hardware like we did for CCDPP
    • More complex so more design effort
  – Instead, will speed up DCT functionality by modifying behavior

DCT floating-point cost

• Floating-point cost
  – DCT uses ~260 floating-point operations per pixel transformation
  – 4096 (64 x 64) pixels per image
  – 1 million floating-point operations per image
  – No floating-point support with Intel 8051
    • Compiler must emulate
      – Generates procedures for each floating-point operation: mult, add
      – Each procedure uses tens of integer operations
  – Thus, > 10 million integer operations per image
  – Procedures increase code size
• Fixed-point arithmetic can improve on this
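To illustrate why fixed-point helps, the sketch below represents fractional values as scaled integers so that a multiply becomes one integer multiply and a shift. The 8 fractional bits and the helper names are assumptions for this example, not the format used in the camera design.

```python
FRACTION_BITS = 8  # assumed format: stored integer = real value * 2**8

# Operation count from the slide: ~260 floating-point ops per pixel, 64 x 64 pixels.
FLOPS_PER_IMAGE = 260 * 64 * 64

def to_fixed(x):
    """Convert a real number to its scaled-integer representation."""
    return int(round(x * (1 << FRACTION_BITS)))

def to_float(a):
    """Convert a scaled integer back to a real number."""
    return a / (1 << FRACTION_BITS)

def fixed_mul(a, b):
    """Multiply two fixed-point values: integer multiply, then rescale by shifting."""
    return (a * b) >> FRACTION_BITS

if __name__ == "__main__":
    print("floating-point ops per image:", FLOPS_PER_IMAGE)
    coefficient = to_fixed(0.7071)   # e.g. a cosine coefficient
    pixel = to_fixed(100)
    print("0.7071 * 100 ~", to_float(fixed_mul(coefficient, pixel)))
```

The result is slightly less precise than floating point, which is the behavioural change traded for speed and smaller code.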

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT

• Analysis of implementation 3
  – Use same analysis techniques as implementation 2
  – Total execution time for processing one image:
    • 1.5 seconds
  – Power consumption:
    • 0.033 watt (same as 2)
  – Energy consumption:
    • 0.050 joule (1.5 s x 0.033 watt)
    • Battery life 6x longer!!
  – Total chip area:
    • 90,000 gates
    • 8,000 fewer gates (less memory needed for code)

Implementation 4: Microcontroller and CCDPP/DCT

• Performance close but not good enough
• Must resort to implementing CODEC in hardware
  – Single-purpose processor to perform DCT on 8 x 8 block

[Figure: SOC containing 8051, UART, CCDPP, CODEC, RAM and EEPROM.]

Implementation 4: Microcontroller and CCDPP/DCT

• Analysis of implementation 4
  – Total execution time for processing one image:
    • 0.099 seconds (well under 1 sec)
  – Power consumption:
    • 0.040 watt
    • Increase over 2 and 3 because SOC has another processor
  – Energy consumption:
    • 0.0040 joule (0.099 s x 0.040 watt)
    • Battery life 12x longer than previous implementation!!
  – Total chip area:
    • 128,000 gates, significant increase over previous implementations

Digital Camera -- Summary

• Digital camera example
  – Specifications in English and executable language
  – Design metrics: performance, power and area
• Several implementations
  – Microcontroller: too slow
  – Microcontroller and coprocessor: better, but still too slow
  – Fixed-point arithmetic: almost fast enough
  – Additional coprocessor for compression: fast enough, but expensive and hard to design
  – Tradeoffs between hw/sw

Summary of implementations

• Implementation 3
  – Close performance
  – Cheaper
  – Less time to build
• Implementation 4
  – Great performance and energy consumption
  – More expensive and may miss time-to-market window
    • If DCT designed ourselves then increased NRE cost and time-to-market
    • If existing DCT purchased then increased IC cost
• Which is better?

                        Implementation 2   Implementation 3   Implementation 4
Performance (second)    9.1                1.5                0.099
Power (watt)            0.033              0.033              0.040
Size (gate)             98,000             90,000             128,000
Energy (joule)          0.30               0.050              0.0040
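The energy row can be recomputed from the other rows (energy = power x time), and the constraints stated earlier (1 second, 200,000 gates) checked against each implementation. A small sketch with illustrative names:

```python
# Execution time (s), power (W) and size (gates) from the summary table.
IMPLEMENTATIONS = {
    2: {"seconds": 9.1, "watts": 0.033, "gates": 98_000},
    3: {"seconds": 1.5, "watts": 0.033, "gates": 90_000},
    4: {"seconds": 0.099, "watts": 0.040, "gates": 128_000},
}

def energy_joules(impl):
    """Energy per image = power * execution time."""
    data = IMPLEMENTATIONS[impl]
    return data["seconds"] * data["watts"]

def meets_constraints(impl, max_seconds=1.0, max_gates=200_000):
    """Check the performance and size constraints from the requirements."""
    data = IMPLEMENTATIONS[impl]
    return data["seconds"] <= max_seconds and data["gates"] <= max_gates

if __name__ == "__main__":
    for impl in IMPLEMENTATIONS:
        print(f"Implementation {impl}: {energy_joules(impl):.4f} J, "
              f"meets constraints: {meets_constraints(impl)}")
```

Only implementation 4 meets the 1 second constraint; implementation 3 misses it by half a second, which is why the slide asks which is better.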

2. Flight Simulator

[Figure: CLIENT (pilot) and SERVER (simulator).]

Constraints on responses to pilot inputs, aircraft state updates

Time Periods to meet Timing Requirements

[Figure: CLIENT and SERVER.]

Requirement: Continuous pilot inputs should be polled at intervals shorter than 16 ms

Choice Made: The time period of the writer on Client should be less than 16 ms

Rationale: The writer thread on the Client polls for the pilot inputs from the joystick

Time Periods to meet Timing Requirements…

[Figure: CLIENT and SERVER.]

Requirement: The state of the aircraft is to be advanced at 12.5 ms time steps

Choice Made: The time period of the Flight Dynamics thread on the Server is 12.5 ms

Rationale: The flight dynamics thread on the Server advances the state of the system

Time Periods to meet Timing Requirements…

Requirement: Response time for pilots should be less than 150 ms for commercial aircraft and 100 ms for fighter aircraft

Choice Made: Reader and Writer threads on Server, and the Reader thread on the Client should be as fast as the system permits. (Time period of 4 ms in our case)

Rationale:
• Delay in data transfer at these threads increases the response time
• These threads should be interrupt driven in order to minimize the response time

Example: Injection Molding

– Keep plastic at proper temperature (liquid, not boiling)
– Control injector solenoid (make sure that the motion of the solenoid terminates before the piston reaches the end of its travel)

Source: “Laboratory for Perceptual Robotics, UMass” Copyright 1996 by Roderic A. Grupen

Controlling a reaction

• we know:
  – if temperature too high, it explodes
  – maximum rate of temperature increase
  – rate of cooling
• events:
  – temperature change
  – temperature > safe threshold
• we can derive:
  – how often we have to check temperature
  – when we have to finish cooling
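One way to derive the checking period is sketched below. It assumes a worst case in which the temperature crosses the safe threshold just after a check and keeps rising at the maximum rate until cooling starts, and that cooling lowers the temperature at a constant rate. All names and numbers are hypothetical; the slides give no concrete values.

```python
def max_polling_period(t_explode, t_safe, max_rise_rate, reaction_latency=0.0):
    """Longest allowed time between temperature checks (seconds).

    Worst case: the threshold is crossed right after a check, so the
    temperature rises for one full period plus the reaction latency
    before cooling begins. It must still stay below t_explode.
    """
    period = (t_explode - t_safe) / max_rise_rate - reaction_latency
    if period <= 0:
        raise ValueError("no safe polling period with these parameters")
    return period

def cooling_time(t_peak, t_safe, cooling_rate):
    """Time needed to bring the temperature back down to the safe threshold."""
    return (t_peak - t_safe) / cooling_rate

if __name__ == "__main__":
    # Hypothetical plant: explodes at 120 C, safe below 100 C,
    # heats at most 2 C/s, cools at 4 C/s, 1 s to switch cooling on.
    print("check at least every", max_polling_period(120.0, 100.0, 2.0, 1.0), "s")
    print("worst-case cooling takes", cooling_time(120.0, 100.0, 4.0), "s")
```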

Example – Injection Molding (cont.)

– Timing constraints


Example – Injection Molding (cont.)

– Concurrent control tasks

Examples of Embedded Systems

We looked at details of

• A simple Digital Camera
• Digital Flight Control
• Plastic Injection Molding

The world gets exciting… e.g. Automotive electronics


Automotive Electronics

Cruise Control

• Controls car speed
• Actuates the throttle valve by a cable connected to an actuator, instead of by pressing a pedal.
• The throttle valve controls the power and speed of the engine by limiting how much air the engine takes in.


Control Architecture for Cruise Control


State Machine for Activation

Adaptive Cruise Control with Driver Alert

• Helps to reduce the need for drivers to manually adjust speed or disengage cruise control when encountering slower traffic.
• Automatically manages vehicle speed to maintain a distance set by the driver.
• Alerts drivers when slower traffic is detected in the path.
• Audible and visual alerts warn the driver when braking is necessary to avoid slower moving vehicles ahead.
• Drivers can adjust system sensitivity to their preferred driving style.


Web Servers… get smaller


iPic : Tiny Web-Server

2mm*2mm,

PIC 12c508

512b ROM, 24b RAM, 6bits IO, 4MHz RC

Plan

• Embedded Systems

• New Approaches to building ESW
  – New paradigms: Lava, Handel-C
  – Examples + “Engineering Returns to Software”
  – Build a RISC processor in 48hrs
  – Advantages of reconfigurable hardware

• Real-time support for ESW


Lava

• Not so much a hardware description language

• More a style of circuit description

• Emphasises connection patterns

• Think of Lego

Lava

• Mary Sheeran, Koen Claessen, & Satnam Singh, Chalmers University (Sweden)

• Based on earlier work on MuFP to describe circuit functionality and layout in single language

• Built using functional programming paradigm

Behaviour and Structure

[Figure: circuits f and g connected in series, written f ->- g.]

Lava Properties

• Higher-order functions
  – Circuits are functions
  – May be passed as arguments to other functions.
  – => Easier to produce parameterized circuits than with VHDL.
• Functions can return circuits as results
  – Circuit combinators take circuits as arguments, return circuits as results.
  – => Powerful glue for composing circuits to form larger systems.
• Circuit combinators combine behavior + layout
  – Combinators lay out circuits in rows, columns, triangles, trees etc.
• Performance of circuit
  – Improved by exploring the layout design space by experimenting with alternative layout combinators.
• Examples of circuits produced:
  – High speed constant coefficient multipliers, finite impulse response filters (1D and 2D), adder tree networks and sorting butterfly networks.

Parallel Connection Patterns

[Figure: circuits f and g placed in parallel, written f -|- g.]

map f

[Figure: four copies of circuit f side by side.]
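The connection patterns are ordinary higher-order functions. The sketch below models them in Python rather than Lava, and covers behaviour only, not layout; the function names are chosen for this example and are not Lava's API.

```python
def serial(f, g):
    """Model of f ->- g: the output of f is fed into g."""
    return lambda x: g(f(x))

def par(f, g):
    """Model of f -|- g: f acts on the first input, g on the second."""
    return lambda pair: (f(pair[0]), g(pair[1]))

def map_circuit(f):
    """Model of map f: one copy of f for each element of the input."""
    return lambda xs: [f(x) for x in xs]

def inv(bit):
    """A one-input example circuit: an inverter."""
    return not bit

def wire(bit):
    """A plain wire: passes its input through unchanged."""
    return bit

if __name__ == "__main__":
    print(serial(inv, inv)(True))                 # two inverters in series
    print(par(inv, wire)((True, True)))           # inverter beside a wire
    print(map_circuit(inv)([True, False, True]))  # a row of inverters
```

Because circuits are functions, the combinators themselves take circuits as arguments and return circuits, which is the "powerful glue" described under Lava Properties.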


Four Sided Tiles


Column

Full Adder

fa (cin, (a,b)) = (sum, cout)
  where
    part_sum = xor (a, b)
    sum = xorcy (part_sum, cin)
    cout = muxcy (part_sum, (a, cin))

[Figure: block fa with inputs a, b and cin, and outputs sum and cout.]

Generic Adder

adder = col fa

[Figure: a column of fa blocks.]
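The behaviour of the full adder and of "adder = col fa" can be modelled in Python as below. This is a behavioural sketch only: it assumes xorcy is an XOR with the carry input, muxcy selects cin when part_sum is 1 and a otherwise, and col threads the carry from one cell to the next. The Python names mirror the slide but are not Lava code.

```python
def fa(cin, ab):
    """Full adder on bits (0 or 1): returns (sum, cout)."""
    a, b = ab
    part_sum = a ^ b
    s = part_sum ^ cin               # models xorcy
    cout = cin if part_sum else a    # models muxcy
    return s, cout

def col(cell):
    """Stack copies of a cell in a column, passing the carry upwards."""
    def column(cin, inputs):
        outputs = []
        carry = cin
        for x in inputs:
            out, carry = cell(carry, x)
            outputs.append(out)
        return outputs, carry
    return column

adder = col(fa)

def add(x, y, width=16):
    """Add two unsigned integers with the bit-level adder (LSB first)."""
    bits = [((x >> i) & 1, (y >> i) & 1) for i in range(width)]
    sums, carry = adder(0, bits)
    return sum(bit << i for i, bit in enumerate(sums)) + (carry << width)

if __name__ == "__main__":
    print(add(12345, 54321))
    print(hex(add(0xFFFF, 1)))
```

The adder is generic in the sense that its width is simply the length of the input list.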

Top Level

adder16Circuit
  = do a <- inputVec "a" (bit_vector 15 downto 0)
       b <- inputVec "b" (bit_vector 15 downto 0)
       (s, carry) <- adder4 (a, b)
       sum <- outputVec "sum" s (bit_vector 16 downto 0)

? circuit2VHDL "add16" adder16Circuit
? circuit2EDIF "add16" adder16Circuit
? circuit2Verilog "add16" adder16Circuit

Xilinx FPGA Implementation

• 16-bit implementation on a XCV300 FPGA
• Vertical layout required to exploit fast carry chain
• No need to specify coordinates in HDL code


16-bit Adder Layout

Source: Mary Sheeran Nov.2002


Four adder trees

Source: Mary Sheeran Nov.2002


No Layout Information

Source: Mary Sheeran Nov.2002

Plan

• Embedded Systems

• New Approaches to building ESW
  – New paradigms: Lava, Handel-C
  – Examples + “Engineering Returns to Software”
  – Build a RISC processor in 48hrs
  – Advantages of reconfigurable hardware

• Real-time support for ESW

Handel-C

• Programming language
  – enables compilation of programs into synchronous hardware
• NOT Hardware Description Language
  – it’s a prog. language aimed at compiling high-level algorithms into gate-level hardware
• Syntax (loosely) based on “C”
• Handel-C is to hardware (gates) what “C” is to micro-assembly code

99© Krithi Ramamritham / Kavi Arya

Handel-C (cont.)

• Inventor - Ian Page, Programming Research Group (Oxford University/UK)

• Semantics based on Hoare’s Communicating Sequential Processes (CSP) model & Occam, the transputer programming language

• Industry heavyweights using tools: Marconi, Ericsson, BAe, Creative Labs, etc.

100© Krithi Ramamritham / Kavi Arya

What this means

• Hardware design produced is exactly the hardware specified in source program

• No intermediate “interpreting” layer as in assembly language targeting general purpose microprocessor

• Logic gates are assembly instructions of Handel-C system

• Design/re-design/optimise at software level!!!

101© Krithi Ramamritham / Kavi Arya

What This Means

• True parallelism
– not the time-shared (interpreted) parallelism of general-purpose computers

• PAR {a;b}
– instructions executed in parallel (//) at the same instant of time by 2 separate pieces of hardware

• Timing
– branches that complete early are forced to wait for the slowest branch before continuing

102© Krithi Ramamritham / Kavi Arya

Comparison with “C”

• Similar:
– Programs inherently sequential
– Similar control-flow constructs: if-then-else, switch, while, for, etc.

• Dissimilar:
– No malloc / dynamic store allocation
– No recursion (limited recursion in macros)
– No nested procedures
– No stdin/stdout
– “void main()”
– Variable width words
– PAR, etc.

103© Krithi Ramamritham / Kavi Arya

Handel-C is based on

• ANSI-standard C without external library-functions:

– I/O functions: printf(), putc(), scanf(), ...
– File functions: fopen(), fclose(), fprintf(), ...
– String functions: strlen(), strcpy(), strcmp(), ...
– Math functions: sin(), cos(), sqrt(), ...
– ...

104© Krithi Ramamritham / Kavi Arya

Supported declarations, statements & instructions:

• Main program structure
• Variables
• Arrays
• Switch statement
• FOR Loop
• Comments
• Constants
• Scope & Variable sharing
• Arithmetic, Relational, Relational Logic ops
• Conditional Execution
• While loop
• Do … While Loop

105© Krithi Ramamritham / Kavi Arya

Channel Communication

• link!v … link?v
– channel input is a form of assignment

• Provides link between parallel (‘//’) branches
– One // branch outputs data onto channel
– Other // branch reads data from channel

• => Synchronisation
– data transfers only when both processes are ready

106© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Channel

    unsigned int 8 a;
    chan unsigned int 8 c;

    c ! 5;
    c ? a;

107© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Prialt

    prialt
    {
        case CommsStatement:
            Statement
            break;
        ...
        default:
            Statement
            break;
    }

[Figure: channel communications between parallel processes, labelled A!1, C?x, B!2, D?y, A?u, B?v, D!9, C!8]

108© Krithi Ramamritham / Kavi Arya

Example 1 (sum)

    void main(void)
    {
        unsigned int 16 sum;    // variable width word
        unsigned int 8 data;
        chanin input;           // input/output
        chanout output;

        sum = 0;
        do
        {
            input ? data;
            sum = sum + (0 @ data);   // IMPORTANT – width!!
        } while (data != 0);
        output ! sum;
    }

109© Krithi Ramamritham / Kavi Arya

Example 2 (divider)

result = integer(a / b)

    #define DATA_WIDTH 16

    void main(void)
    {
        unsigned int DATA_WIDTH a, mult, result;
        unsigned int (DATA_WIDTH*2 - 1) b;
        chanin input;
        chanout output;

        while (1)
        {
            input ? a;
            input ? result;
            b = result @ 0;
            mult = 1 << (DATA_WIDTH-1);
            result = 0;
            <<<<< MAIN LOOP >>>>>
            output ! result;
        }
    }

110© Krithi Ramamritham / Kavi Arya

Example 2 (cont.)

    while (mult != 0)
    {
        if ((0 @ a) >= b)
            par
            {
                a -= b <- width(a);
                result |= mult;
            }
        par
        {
            b = b >> 1;
            mult = mult >> 1;
        }
    }
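The divider's main loop is a shift-and-subtract (restoring) division. As a rough software model only, the sketch below mirrors the same steps in Python; the function name `divide` and the returned remainder are illustrative additions, and Python integers are unbounded, so the hardware word widths are only implied by `DATA_WIDTH`.

```python
DATA_WIDTH = 16  # same parameter as the slide's #define


def divide(a, divisor):
    """Shift-and-subtract division of two DATA_WIDTH-bit unsigned values.

    Returns (quotient, remainder). The name and the remainder output are
    illustrative; the slide only outputs the quotient.
    """
    b = divisor << (DATA_WIDTH - 1)   # "b = result @ 0": divisor aligned to the top
    mult = 1 << (DATA_WIDTH - 1)      # quotient bit currently being tested
    result = 0
    while mult != 0:
        if a >= b:                    # shifted divisor fits: subtract, set the bit
            a -= b
            result |= mult
        b >>= 1                       # the slide performs both shifts in one par block
        mult >>= 1
    return result, a


print(divide(100, 7))
```

In the Handel-C version each `par` block costs one clock cycle, so the loop body takes at most two cycles per quotient bit; the Python model says nothing about timing.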

111© Krithi Ramamritham / Kavi Arya

Example 3

    void main(void)
    {
        chan unsigned int undefined link[2];
        chanin unsigned int 8 input;
        chanout unsigned int 8 output;
        unsigned int undefined state[3];

        par
        {
            while (1)   // first queue location
            {
                input ? state[0];
                link[0] ! state[0];
            }
            while (1)   // second queue location
            {
                link[0] ? state[1];
                link[1] ! state[1];
            }
            while (1)   // third queue location
            {
                link[1] ? state[2];
                output ! state[2];
            }
        }
    }

[Figure: input, state[0], link[0], state[1], link[1], state[2], output connected in a chain]

Illustrates:
• Parallel tasks
• Comm between tasks
• Array of variables
• Array of channels
• Parameterised on width

112© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Timing

An assignment statement takes exactly one clock cycle to execute. Everything else is free.

    void main(void)
    {
        unsigned 8 x, y;
        …
        x = x + y;
    }

113© Krithi Ramamritham / Kavi Arya

Timing/efficiency issues

• One clock source for entire program
– Assignment & delay take one clock cycle
– Expressions are “for free”

• Handel-C is designed such that an experienced programmer can immediately tell which instructions execute on which clock cycles

• Example

    x = y;
    x = (((y*z) + (w*v)) << 2) <- 7;

  both statements take one clock cycle

• Clock at longest logic depth
=> reduce the depth of logic to speed up program
=> pipelining

114© Krithi Ramamritham / Kavi Arya

Porting “C” to Handel-C

• Decide how software maps to hardware platform
• Partition algorithm between multiple FPGAs
• Port C to Handel-C & use simulator to check correctness
• Modify code to take advantage of extra operators in Handel-C - simulate to ensure correctness
• Add fine-grain parallelism through PAR & parallel assignments or parallelise algorithm - simulate
• Add hardware interfaces for target architecture & map simulator channel communications onto these interfaces - simulate
• Use FPGA place & route tools to generate FPGA images

115© Krithi Ramamritham / Kavi Arya

Design Flow Overview

Port algorithm to Handel-C

Compile program to .net file for simulator

Use simulator to evaluate and debug design

Add interfaces to external hardware

Use Handel-C compiler to target h/w netlist

Use FPGA tools to place & route netlist

Program FPGA with result of place & route

Modify/debug program

116© Krithi Ramamritham / Kavi Arya

Essence

• Software approach allows us to rapidly prototype applications for a given domain

• Handel-C provides a seamless approach to derive expressive and fast implementations from the software level

• Cost of silicon is falling & shortage of trained engineers & high cost of programmer time

=> Software based, high-level approaches to solving problems become increasingly attractive.

117© Krithi Ramamritham / Kavi Arya

Handel-C Concepts (Recap)

• Describes hardware: h/w design produced = h/w in source program

• Logic gates are assembly instructions of Handel-C system

• Real parallelism – not interpreted

• Assignment and delay take 1 clock cycle; expression evaluation is free

• No side-effects, i.e. a++ is a statement (not an expression as in ‘C’)

• Variable width words => great performance improvement over software
• Min. datapath widths => minimal h/w usage

118© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Concurrency

    ...
    par
    {
        { … }
        …
        { … }
    }

119© Krithi Ramamritham / Kavi Arya

Concurrency (example)

    void main(void)
    {
        unsigned 8 x, y;
        unsigned 5 temp1;
        unsigned 4 temp2;
        ...
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
        x = (temp2 + (0@temp1[4])) @ temp1[3:0];
    }

120© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Concurrency...

par

{

temp1=(0@(x<-4))+(0@(y<-4));

temp2=(x\\4)+(y\\4);

}

x=(temp2+(0@temp1[4]))@temp1[3:0];

...

121© Krithi Ramamritham / Kavi Arya

Features & Statements (contd.)

• Delay

    ...
    par
    {
        x = 1;
        { delay; x = 2; }
    }

    while (x == 0) delay;

[Figure: two parallel branches; one executes x = 1, the other a delay followed by x = 2]

122© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

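• Channel

    unsigned int 8 a;
    chan unsigned int 8 c;

    c ! 5;
    c ? a;

A single variable (here the channel out) must not be accessed by more than one parallel (//) branch =>

    par
    {
        out ! 3;
        out ! 4;
    }   // illegal

[Figure: two parallel statements linked by a channel: c ! 5 and c ? a]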

123© Krithi Ramamritham / Kavi Arya

Features & Statements(contd.)

• Macros (Examples - contd)

– Combinatorial

    macro expr abs(a) = ((a)[width(a)-1] == 0 ? (a) : (-a));

    shared expr incwrap(e, m) = (((e) == (m)) ? 0 : (e)+1);

– Recursive

    macro expr copy(e, n) = select(n==1, (e), copy(e, n/2) @ copy(e, n-(n/2)));

124© Krithi Ramamritham / Kavi Arya

Features & Statements(contd)

• Operators for Bit Manipulation

    z = x <- 2;      // Take least significant bits
    z = y \\ 2;      // Drop least significant bits
    z = x @ y;       // Concatenation
    z = x[3];        // Bit selection
    z = y[2:3];      // Bus selection
    z = width(x);    // Width of expression

Note: in the form y[m:n] the order is MSB:LSB

    unsigned int 3 y = 4;
    y[0] is 0;
    y[2] is 1;
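For readers more used to software, the bit operators can be approximated with shifts and masks. The helper names below (`take`, `drop`, `concat`, `bit`, `bits`) are hypothetical, chosen only for this sketch; because Python integers carry no width, `concat` has to be told the width of its right-hand operand explicitly.

```python
def take(x, n):
    """Rough model of Handel-C `x <- n`: keep the n least significant bits."""
    return x & ((1 << n) - 1)


def drop(x, n):
    """Rough model of `x \\\\ n`: drop the n least significant bits."""
    return x >> n


def concat(x, y, width_y):
    """Rough model of `x @ y`: x becomes the high part, y the low width_y bits."""
    return (x << width_y) | y


def bit(x, i):
    """Rough model of `x[i]`: select a single bit (bit 0 is the LSB)."""
    return (x >> i) & 1


def bits(x, msb, lsb):
    """Rough model of `x[msb:lsb]`: select a bus, given in MSB:LSB order."""
    return (x >> lsb) & ((1 << (msb - lsb + 1)) - 1)


y = 4  # the slide's 3-bit example value, binary 100
print(bit(y, 0), bit(y, 2))
```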

125© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• External RAM / ROM

ram unsigned int 4 ExtRAM[8] with {offchip = 1,

data = {"P01", "P02", "P03", "P04"},

addr = {"P05", "P06", "P07"},

we = {"P08"}, oe = {"P09"}, cs = {"P10"} };

rom unsigned int 4 ExtROM[8] with {offchip = 1,

data = {"P01", "P02", "P03", "P04"},

addr = {"P05", "P06", "P07"},

we = {}, oe = {"P09"}, cs = {"P10"} };

126© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Internal RAM / ROM

ram unsigned int 8 speicher[256];

rom unsigned int 8 program[] = {1,2,3,4};

unsigned char i;

i = 3;

speicher[i] = 25;

for (i = 0; i < 4; i++) stdout ! program[i];

127© Krithi Ramamritham / Kavi Arya

Recursive Macro Expressions – Example

• Illustrates the generation of large quantities of hardware from simple macros.

• Multiplier whose width depends on the parameters of the macro.

• Starting point for generating large regular hardware structures using macros.

• Single-cycle long multiplication from single macro:

    macro expr multiply(x, y) =
        select(width(x) == 0,
               0,
               multiply(x \\ 1, y << 1) + (x[0] == 1 ? y : 0));

a = multiply (b , c);
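The recursion in the macro is ordinary shift-and-add multiplication, unrolled once per bit of x at compile time. The Python sketch below follows the same recursion at run time; it assumes (this is not stated on the slide) that shifts and additions keep the width of y, so the product is truncated to `width_y` bits. The explicit width parameters are illustrative, since Python integers carry no width.

```python
def multiply(x, y, width_x, width_y):
    """Software model of the recursive multiply macro.

    Assumption: results are truncated to width_y bits, mimicking
    fixed-width hardware arithmetic.
    """
    if width_x == 0:                  # select(width(x) == 0, 0, ...)
        return 0
    mask = (1 << width_y) - 1
    partial = y if (x & 1) else 0     # (x[0] == 1 ? y : 0)
    rest = multiply(x >> 1, (y << 1) & mask, width_x - 1, width_y)
    return (rest + partial) & mask


print(multiply(13, 11, 8, 8))
```

In hardware every level of the recursion becomes its own adder, which is why a single macro call can generate a large combinational multiplier.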

128© Krithi Ramamritham / Kavi Arya

Timing

129© Krithi Ramamritham / Kavi Arya

Additional Features & Statements

• Off-Chip Interface
– Input, registered Input, latched Input
– Output
– Tristate Bus

• Off-Chip Interface (examples)

    interface bus_in (int 4) InBus() with
        {data = {"P1", "P2", "P3", "P4"} };

    int 4 x;
    x = InBus.in;

    interface bus_out () OutBus (x+y) with
        {data = {"P11", "P12", "P13", "P14"} };

130© Krithi Ramamritham / Kavi Arya

Parallel Access to Variables

• Rules of parallelism: the same variable must not be accessed from two separate parallel branches (to avoid resource conflicts on the variables)

• Actually, the same variable must not be assigned to more than once on the same clock cycle, but it may be read as often as required (see wires!)

• Allows some useful and powerful programming techniques, e.g.:

    par
    {
        a = b;
        b = a;
    }   // swaps values of a and b in a single clock cycle

131© Krithi Ramamritham / Kavi Arya

Parallel Access to Variables

• Four place queue:

    while (1)
    {
        par
        {
            int x[3];
            x[0] = in;
            x[1] = x[0];
            x[2] = x[1];   // values at “out” delayed
            out = x[2];    // by 4 clock cycles
        }
    }
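The queue works because every assignment in a `par` block reads the values that held before the clock edge. A minimal Python sketch of that semantics, assuming all registers start at zero (the slide does not give initial values) and using the illustrative names `tick` and `run_queue`:

```python
def tick(state, value_in):
    """One clock cycle of the four-place queue.

    Every right-hand side is read from the old state, then all
    registers update together, as in a Handel-C par block.
    """
    x0, x1, x2, _out = state
    return (value_in, x0, x1, x2)


def run_queue(inputs):
    state = (0, 0, 0, 0)   # assumption: registers start at zero
    outputs = []
    for value in inputs:
        state = tick(state, value)
        outputs.append(state[3])
    return outputs


print(run_queue([1, 2, 3, 4, 5, 6]))
```

The same read-before-write rule is what makes the single-cycle swap legal; in Python the closest equivalent is the tuple assignment `a, b = b, a`.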

132© Krithi Ramamritham / Kavi Arya

Time Efficiency of Handel-C Hardware

• Requirement: the clock period for the program must be longer than the longest path through combinatorial logic in the whole program.

• => once FPGA place and route is done, max. clock rate = 1 / longest-path delay

• Example: FPGA place and route tools calculate that the longest path delay between flip-flops in a design is 70 ns.

• The max. clock rate is then 1 / 70 ns = 14.3 MHz. Speed allowed by system: 400 kHz - 100 MHz

• BUT WHAT IF THIS IS NOT FAST ENOUGH?

133© Krithi Ramamritham / Kavi Arya

Improving Time Efficiency

• Reducing Logic Depth
Avoid multiplication, avoid wide adders, reduce complex expressions into stages, etc.

    unsigned 8 x;
    unsigned 8 y;
    unsigned 5 temp1;
    unsigned 4 temp2;

    par
    {
        temp1 = (0@(x<-4)) + (0@(y<-4));
        temp2 = (x \\ 4) + (y \\ 4);
    }
    x = (temp2+(0@temp1[4])) @ temp1[3:0];

• Pipelining => increased latency for higher throughput
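The split addition trades one 8-bit adder for two narrower adders in the first cycle plus a small correction in the second. The Python sketch below models the same steps with masks standing in for the declared widths; `add8_split` is an illustrative name, and the final check simply compares against ordinary 8-bit addition.

```python
def add8_split(x, y):
    """Two-stage 8-bit addition built from 4-bit pieces, as on the slide."""
    temp1 = (x & 0xF) + (y & 0xF)             # 5-bit sum of the low nibbles
    temp2 = ((x >> 4) + (y >> 4)) & 0xF       # 4-bit sum of the high nibbles
    high = (temp2 + (temp1 >> 4)) & 0xF       # add the carry, temp1[4]
    return (high << 4) | (temp1 & 0xF)        # high @ temp1[3:0]


# Exhaustive comparison with a plain 8-bit add
print(all(add8_split(x, y) == (x + y) & 0xFF
          for x in range(256) for y in range(256)))
```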

136© Krithi Ramamritham / Kavi Arya

Plan

• Embedded Systems

• New Approaches to building ESW
– New paradigms: Lava, Handel-C
– Examples + “Engineering Returns to Software”
– Build a RISC processor in 48hrs
– Advantages of reconfigurable hardware.

• Real-time support for ESW

137© Krithi Ramamritham / Kavi Arya

RISC-Processor

• Features:
– 16 instructions
– 4 bit I/O Ports
– one accumulator
– Program memory (16x8 ROM)
– Data memory (16x4 RAM)

• Problem: Execute a program stored in ROM to calculate the first few members of the Fibonacci number sequence.

1, 2, 3, 5, 8, 13, 21, 34, …

fib(n) = 1                      if n = 0 ∨ n = 1
fib(n) = fib(n-1) + fib(n-2)    if n >= 2

138© Krithi Ramamritham / Kavi Arya

RISC-Processor

• Instruction Set

139© Krithi Ramamritham / Kavi Arya

RISC-Processor (cont.)

• Program:

    chanin input;
    chanout output;

    // Parameterisation
    #define dw      32   /* Data width */
    #define opcw     4   /* Op-code width */
    #define oprw     4   /* Operand width */
    #define rom_aw   4   /* Width of ROM address bus */
    #define ram_aw   4   /* Width of RAM address bus */

    // The opcodes
    #define HALT    0
    #define LOAD    1
    #define LOADI   2
    #define STORE   3
    #define ADD     4
    #define SUB     5
    #define JUMP    6
    #define JUMPNZ  7
    #define INPUT   8
    #define OUTPUT  9

    // The assembler macro
    #define _asm_(opc, opr) (opc + (opr << opcw))

142© Krithi Ramamritham / Kavi Arya

RISC-Processor (cont.)

• Program (cont):

    // Rom program data
    rom unsigned int undefined program[] =
    {
        _asm_(LOADI, 1),    /* 0 */  /* Get a one */
        _asm_(STORE, 3),    /* 1 */  /* Store this */
        _asm_(STORE, 1),    /* 2 */
        _asm_(INPUT, 0),    /* 3 */  /* Read value from user */
        _asm_(STORE, 2),    /* 4 */  /* Store this */
        _asm_(LOAD, 1),     /* 5 */  /* Loop entry point */
        _asm_(ADD, 0),      /* 6 */  /* Make a fib number */
        _asm_(STORE, 0),    /* 7 */  /* Store it */
        _asm_(OUTPUT, 0),   /* 8 */  /* Output it */
        _asm_(ADD, 1),      /* 9 */  /* Make a fib number */
        _asm_(STORE, 1),    /* a */  /* Store it */
        _asm_(OUTPUT, 0),   /* b */  /* Output it */
        _asm_(LOAD, 2),     /* c */  /* Decrement counter */
        _asm_(SUB, 3),      /* d */
        _asm_(JUMPNZ, 4),   /* e */  /* Repeat if not zero */
        _asm_(HALT, 0)      /* f */
    };

143© Krithi Ramamritham / Kavi Arya

RISC-Processor (cont.)

• Program (cont):

    /* RAM for processor */
    ram unsigned int dw data[1 << ram_aw];

    /* Processor registers */
    unsigned int rom_aw pc;          /* Program counter */
    unsigned int (opcw+oprw) ir;     /* Instruction register */
    unsigned int dw x;               /* Accumulator */

    /* Macros to extract opcode and operand fields */
    #define opcode  (ir <- opcw)
    #define operand (ir \\ opcw)

144© Krithi Ramamritham / Kavi Arya

RISC-Processor (cont.)

• Program (cont):

    /* Main program */
    void main(void)
    {
        pc = 0;
        // Processor loop
        do
        {
            // fetch
            par
            {
                ir = program[pc];
                pc = pc + 1;
            }
            /* === MAIN DECODE/EXECUTE === */
        } while (opcode != HALT);
    } /* main program */

145© Krithi Ramamritham / Kavi Arya

RISC-Processor (cont.)

• Program (cont):

    // decode and execute
    switch (opcode)
    {
        case LOAD   : x = data[operand<-ram_aw]; break;
        case LOADI  : x = 0 @ operand; break;
        case STORE  : data[operand<-ram_aw] = x; break;
        case ADD    : x = x+data[operand<-ram_aw]; break;
        case SUB    : x = x-data[operand<-ram_aw]; break;
        case JUMP   : pc = operand<-rom_aw; break;
        case JUMPNZ : if (x!=0) pc=operand<-rom_aw; break;
        case INPUT  : input ? x; break;
        case OUTPUT : output ! x; break;
        default     : while(1) delay;   // unknown opcode
    }
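To see what the ROM program does without an FPGA, the fetch/decode/execute loop can be mimicked in a few lines of Python. This is only an illustrative interpreter, not the Handel-C design: it assumes the data RAM starts zeroed (the slides do not say), it simply stops on HALT or an unknown opcode instead of stalling, and the names `asm`, `PROGRAM` and `run` are local to the sketch.

```python
# Opcodes, numbered as in the slide's #define list
HALT, LOAD, LOADI, STORE, ADD, SUB, JUMP, JUMPNZ, INPUT, OUTPUT = range(10)
OPCW = 4                       # op-code width
DW = 32                        # data width
DATA_MASK = (1 << DW) - 1


def asm(opc, opr):
    """Same packing as the _asm_ macro: opcode in the low bits."""
    return opc + (opr << OPCW)


PROGRAM = [
    asm(LOADI, 1), asm(STORE, 3), asm(STORE, 1), asm(INPUT, 0),
    asm(STORE, 2), asm(LOAD, 1), asm(ADD, 0), asm(STORE, 0),
    asm(OUTPUT, 0), asm(ADD, 1), asm(STORE, 1), asm(OUTPUT, 0),
    asm(LOAD, 2), asm(SUB, 3), asm(JUMPNZ, 4), asm(HALT, 0),
]


def run(program, inputs):
    data = [0] * 16            # assumption: RAM starts zeroed
    source = iter(inputs)
    outputs = []
    pc = 0
    x = 0                      # accumulator
    while True:
        ir = program[pc]                   # fetch
        pc = (pc + 1) & 0xF
        opcode = ir & ((1 << OPCW) - 1)    # ir <- opcw
        operand = ir >> OPCW               # ir \\ opcw
        if opcode == LOAD:
            x = data[operand]
        elif opcode == LOADI:
            x = operand
        elif opcode == STORE:
            data[operand] = x
        elif opcode == ADD:
            x = (x + data[operand]) & DATA_MASK
        elif opcode == SUB:
            x = (x - data[operand]) & DATA_MASK
        elif opcode == JUMP:
            pc = operand
        elif opcode == JUMPNZ:
            if x != 0:
                pc = operand
        elif opcode == INPUT:
            x = next(source) & DATA_MASK
        elif opcode == OUTPUT:
            outputs.append(x)
        else:                              # HALT or unknown opcode
            break
    return outputs


print(run(PROGRAM, [3]))
```

The input value acts as a loop counter: each pass through the loop outputs two Fibonacci numbers, so an input of 3 produces six values.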

146© Krithi Ramamritham / Kavi Arya

RISC-Processor (cont.) • The Final Program! (Don’t worry if you can’t read it - fits on a page!!)

147© Krithi Ramamritham / Kavi Arya

Simulation & debugging

• The simulator is integrated into the compiler.
• Executing a cycle-based simulation.
• Variables are traceable at any clock cycle.
• Port interface will be replaced by standard I/O.
• Handel-C simulator supports debugging at any clock cycle.
• Highlighting of characteristic values, e.g. area of any program line.

148© Krithi Ramamritham / Kavi Arya

Some Representative Work

• “Customising Graphics Applications: Techniques & Programming Interface”, Henry Styles & Wayne Luk, Proceedings of IEEE Symposium on Field Programmable Custom Computing Machines, IEEE Computer Society Press, 2000.

• Exploit custom data-formats and datapath widths to optimise graphics operations such as texture mapping & hidden-surface removal.

• Discusses techniques for balancing graphics pipeline

• Customised architectures captured in Handel-C, compiled for Xilinx Virtex FPGAs

• Handel-C API based on OpenGL standard for automatic speedup of graphics applications, including the Quake-2 action game.

149© Krithi Ramamritham / Kavi Arya

The Graphics Pipeline

150© Krithi Ramamritham / Kavi Arya

Performance Case Studies

• Geometric Visualisation

Implementation Medium   Clock rate (MHz)   Frame rate (FPS)   Cost
Software on PC          400                24                 $1,000
Xilinx XCV1000          40                 41                 $4,000
Nvidia TNT2 Ultra       170                55                 $200

Nvidia is a 3-D graphics chipset, i.e. a specialised graphics ASIC.
Chart => FPGA platform fast approaching performance of dedicated graphics ASIC for general-purpose graphics applications

151© Krithi Ramamritham / Kavi Arya

Performance Case Studies

• Infrared Simulation
requires custom pixel format not supported by graphics ASICs

Implementation Medium   Clock rate (MHz)   Frame rate (FPS)   Cost
Software on PC          400                96                 $1,000
Xilinx XCV1000          40                 330                $4,000
SGI Onyx2 Reality       180                2750               $180,000

Onyx contains two 180 MHz MIPS processors, two Geometry Engine processors and two rasteriser ASICs, with a memory bandwidth of 6.4 GB/sec (i.e. 10X cost & memory bandwidth of FPGA)

152© Krithi Ramamritham / Kavi Arya

Some Observations

• FPGA renderer is a low-cost platform for custom graphics applications

• Development time of a customised FPGA renderer is comparable to optimised software
=> effective to use a reconfigurable platform

• Good for reconfigurable designs where ASIC is not available or too expensive

• Useful in exploring desirable algorithms and architectures for ASICs

• Hardware renderer may be customised to maximise performance for each application

153© Krithi Ramamritham / Kavi Arya

Some Features of the Rapid Prototyping Board

• Full length 32 bit PCI card

• Virtex XCV1000: 1,000,000 system gates

• 131 kBit Block RAM, 393 kBit SelectRAM

• Programmable clock 400 kHz to 100 MHz

• 4 banks of fast asynchronous 32 bit wide SRAM, each 2 Mbytes

• PCI interface: 32 bit, 33 MHz, 132 Mbytes/sec burst

• 2 x PMC sites for VME grade I/O & processing modules

• 50 pin Aux I/O, 8 LEDs

154© Krithi Ramamritham / Kavi Arya

Summary

• Cost of silicon is falling
& products are getting more complex
& time-to-market is shrinking rapidly
& shortage of trained engineers
& cost of programmer time is a major constraint

=> Software based, high-level approaches to solving problems become increasingly attractive.

• New generation of languages let us build systems at high level of abstraction.

• High-density FPGAs and SoCs allow complex designs to be rapidly prototyped => reduce the development cycle of new technology – perhaps even to deploy final product as “soft cores”.

• Broader understanding demanded from system designer – need “Renaissance Engineer” with equal understanding of hardware and software.

155© Krithi Ramamritham / Kavi Arya

Plan

• Embedded Systems

• New Approaches to building ESW

• Real-Time Support
– Special Characteristics of Real-Time Systems
– Real-Time Constraints
– Canonical Real-Time Applications
– Scheduling in Real-time systems
– Operating System Approaches

156© Krithi Ramamritham / Kavi Arya

What is “real” about real-time?

computer world                             real world
e.g., PC                                   industrial system, airplane
average response for user, interactive     events occur in environment at own speed
occasionally longer                        reaction too slow: deadline miss
reaction: user annoyed                     reaction: damage, pot. loss of human life
computer controls speed of user            computer must follow speed of environment
“computer time”                            “real-time”

157© Krithi Ramamritham / Kavi Arya

Real-Time Systems

A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.

[Figure: real-time computing system exchanging I/O data with its environment; an event is followed by an action along a time axis]

158© Krithi Ramamritham / Kavi Arya

CLIENT SERVER

Flight Avionics

Constraints on responses to pilot inputs, aircraft state updates

159© Krithi Ramamritham / Kavi Arya

Constraints:
– Keep plastic at proper temperature (liquid, but not boiling)
– Control injector solenoid (make sure that the motion of the piston reaches the end of its travel)

160© Krithi Ramamritham / Kavi Arya

Real-Time Systems: Properties of Interest

• Safety: Nothing bad will happen.

• Liveness: Something good will happen.

• Timeliness: Things will happen on time -- by their deadlines, periodically, ....

162© Krithi Ramamritham / Kavi Arya

Performance Metrics in Real-Time Systems

• Beyond minimizing response times and increasing the throughput:

– achieve timeliness.

• More precisely, how well can we predict that deadlines will be met?

163© Krithi Ramamritham / Kavi Arya

Types of RT Systems

Dimensions along which real-time activities can be categorized:
• how tight are the deadlines? Deadlines are tight when the laxity (deadline minus computation time) is small.
• how strict are the deadlines? What is the value of executing an activity after its deadline?
• what are the characteristics of the environment? How static or dynamic must the system be?

Designers want their real-time system to be fast, predictable, reliable, flexible.

164© Krithi Ramamritham / Kavi Arya

Hard, soft, firm

• Hard
result useless or dangerous if deadline exceeded

• Soft
result of some (lower) value if deadline exceeded

• Firm
value drops to zero at deadline

Deadline intervals: result required not later and not before

[Figure: value (+/-) plotted against time around the deadline (dl), with curves for hard and soft deadlines]

165© Krithi Ramamritham / Kavi Arya

Examples

• Hard real time systems
– Aircraft
– Airport landing services
– Nuclear Power Stations
– Chemical Plants
– Life support systems

• Soft real time systems
– Multimedia
– Interactive video games

166© Krithi Ramamritham / Kavi Arya

Real-Time: Items and Terms

Task
– program, perform service, functionality
– requires resources, e.g., execution time

Deadline
– specified time for completion of, e.g., task
– time interval or absolute point in time
– value of result may depend on completion time

167© Krithi Ramamritham / Kavi Arya

Plan

• Special Characteristics of Real-Time Systems

• Real-Time Constraints

• Canonical Real-Time Applications

• Scheduling in Real-time systems

• Operating System Approaches

168© Krithi Ramamritham / Kavi Arya

Timing Constraints

Real-time means to be in time: how do we know something is “in time”? How do we express that?

• Timing constraints are used to specify temporal correctness, e.g., “finish assignment by 2pm”, “be at station before train departs”.

• A system is said to be (temporally) feasible if it meets all specified timing constraints.

• Timing constraints do not come out of thin air: the design process identifies events, derives, models, and finally specifies timing constraints

169© Krithi Ramamritham / Kavi Arya

• Periodic
– activity occurs repeatedly
– e.g., to monitor environment values, temperature, etc.

[Figure: periodic activations along a time axis, one per period]

170© Krithi Ramamritham / Kavi Arya

• Aperiodic
– can occur any time
– no arrival pattern given

[Figure: aperiodic activations at irregular points along a time axis]

171© Krithi Ramamritham / Kavi Arya

• Sporadic
– can occur any time, but
– minimum time between arrivals

[Figure: sporadic activations along a time axis, separated by at least the minimum inter-arrival time (mint)]

172© Krithi Ramamritham / Kavi Arya

Who initiates (triggers) actions?

Example: Chemical process
– controlled so that temperature stays below danger level
– warning is triggered before danger point, so that cooling can still occur

Two possibilities:
– action whenever temp rises above warn: event triggered
– look every int time intervals; action if temp measures above warn: time triggered

173© Krithi Ramamritham / Kavi Arya

[Figure: time-triggered (TT) versus event-triggered (ET) activations along a time axis t]

174© Krithi Ramamritham / Kavi Arya

[Figure: time-triggered (TT) versus event-triggered (ET) activations along a time axis t (continued)]

175© Krithi Ramamritham / Kavi Arya

ET vs TT

• Time triggered
– Stable number of invocations

• Event triggered
– Only invoked when needed
– High number of invocations and computation demands if value changes frequently

177© Krithi Ramamritham / Kavi Arya

Other Issues to worry about

• Meet requirements -- some activities may run only:
– after others have completed - precedence constraints
– while others are not running - mutual exclusion
– within certain times - temporal constraints

• Scheduling
– planning of activities, such that required timing is kept

• Allocation
– where should a task execute?

178© Krithi Ramamritham / Kavi Arya

Plan

• Special Characteristics of Real-Time Systems

• Real-Time Constraints

• Canonical Real-Time Applications

• Scheduling in Real-time systems

• Operating System Approaches

179© Krithi Ramamritham / Kavi Arya

A Typical Real time system

[Figure: block diagram with temperature sensor, input port, CPU, memory, output port and heater]

180© Krithi Ramamritham / Kavi Arya

Code for example

    While true do
    {
        read temperature sensor
        if temperature too high
            then turn off heater
        else if temperature too low
            then turn on heater
        else nothing
    }
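The decision inside the polling loop can be written as a small pure function, which keeps it easy to test away from real hardware. In this sketch the thresholds `low` and `high`, the function name `control_step` and the boolean heater state are all assumptions made for illustration; the slide only says "too high" and "too low".

```python
def control_step(temperature, heater_on, low=18.0, high=22.0):
    """One pass of the polling loop body; returns the new heater state.

    low/high are hypothetical thresholds standing in for the slide's
    "too low" and "too high".
    """
    if temperature > high:
        return False          # too hot: turn heater off
    if temperature < low:
        return True           # too cold: turn heater on
    return heater_on          # in range: do nothing


# Feeding a few sample readings through the loop body
state = False
for reading in [17.0, 19.5, 23.1, 21.0]:
    state = control_step(reading, state)
    print(reading, state)
```

A real system would wrap this in the infinite loop from the slide, reading the sensor through the input port and driving the heater through the output port.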

181© Krithi Ramamritham / Kavi Arya

Comment on code

• Code works by polling the device (temperature sensor)
• Code is in the form of an infinite loop
• No other tasks can be executed
• Suitable for dedicated system or sub-system only

182© Krithi Ramamritham / Kavi Arya

Extended polling example

[Figure: one computer running Task 1 to Task 4; each task has a conceptual link to its own temperature sensor (1 to 4) and heater (1 to 4)]

183© Krithi Ramamritham / Kavi Arya

Polling

• Problems
– Arranging task priorities
– Round robin is usual within a priority level
– Urgent tasks are delayed

184© Krithi Ramamritham / Kavi Arya

Interrupt driven systems

• Advantages
– Fast
– Little delay for high priority tasks

• Disadvantages
– Programming
– Code difficult to debug
– Code difficult to maintain

185© Krithi Ramamritham / Kavi Arya

How can we monitor a sensor every 100 ms

Initiate a task T1 to handle the sensor

T1:

Loop

{Do sensor task T2

Schedule T2 for +100 ms

}

Note that the time could be relative (as here) or could be an actual time - there would be slight differences between the methods, due to the additional time to execute the code.

186© Krithi Ramamritham / Kavi Arya

An alternative…

Initiate a task to handle the sensor T1

T1:

Do sensor task T2

Repeat

{Schedule T2 for n * 100 ms

n:=n+1}

There are some subtleties here...
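One of the subtleties is drift. If the next run is scheduled relative to the end of the current one, the time spent running the code accumulates; scheduling at absolute multiples of the period does not. The simulation below makes this visible under an assumed fixed overhead of 2 ms per run; the function names and the overhead value are illustrative only.

```python
def relative_releases(count, period=100, overhead=2):
    """Release times when each run schedules the next for 'now + period'.

    overhead is a hypothetical cost (ms) of running the sensor task
    before the scheduling call is reached.
    """
    releases = []
    now = 0
    for _ in range(count):
        releases.append(now)
        now = now + overhead + period   # schedule relative to finish time
    return releases


def absolute_releases(count, period=100):
    """Release times when the n-th run is scheduled for n * period."""
    return [n * period for n in range(count)]


print(relative_releases(4))
print(absolute_releases(4))
```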

187© Krithi Ramamritham / Kavi Arya

Clock, interrupts, tasks

[Figure: the clock interrupts the processor; the processor examines the job/task queue holding Task 1 to Task 4]

Tasks schedule events using the clock...

188© Krithi Ramamritham / Kavi Arya

Plan

• Special Characteristics of Real-Time Systems

• Real-Time Constraints

• Canonical Real-Time Applications

• Scheduling in Real-time systems

• Operating System Approaches

189© Krithi Ramamritham / Kavi Arya

Why is scheduling important?

Definition:

A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.

[Figure: real-time computing system exchanging I/O data with its environment; an event is followed by an action along a time axis]

190© Krithi Ramamritham / Kavi Arya

Schedulability analysis

a.k.a. feasibility checking: check whether tasks will meet their timing constraints.

191© Krithi Ramamritham / Kavi Arya

Scheduling Paradigms

Four scheduling paradigms emerge, depending on
• whether a system performs schedulability analysis
• if it does,
– whether it is done statically or dynamically
– whether the result of the analysis itself produces a schedule or plan according to which tasks are dispatched at run-time.

192© Krithi Ramamritham / Kavi Arya

1. Static Table-Driven Approaches

• Perform static schedulability analysis by checking if a schedule is derivable.

• The resulting schedule (table) identifies the start times of each task.

• Applicable to tasks that are periodic (or have been transformed into periodic tasks by well known techniques).

• This is highly predictable but, highly inflexible.

• Any change to the tasks and their characteristics may require a complete overhaul of the table.

193© Krithi Ramamritham / Kavi Arya

2. Static Priority Driven Preemptive Approaches

• Tasks have -- systematically assigned -- static priorities.
• Priorities take timing constraints into account:
– e.g. RMA: Rate-Monotonic -- the lower the period, the higher the priority.
– e.g. EDF: Earliest-deadline-first -- the earlier the deadline, the higher the priority.
• Perform static schedulability analysis, but no explicit schedule is constructed
– RMA: sum of task utilizations <= ln 2
– EDF: sum of task utilizations <= 1
• At run-time, tasks are executed highest-priority-first, with preemptive-resume policy.
• When resources are used, need to compute worst-case blocking times.

Task utilization = computation-time / period

194© Krithi Ramamritham / Kavi Arya

Static Priorities: Rate Monotonic Analysis

presented by Liu and Layland in 1973

Assumptions
• Tasks are periodic with deadline equal to period. Release time of tasks is the period start time.
• Tasks do not suspend themselves
• Tasks have bounded execution time
• Tasks are independent
• Scheduling overhead negligible

195© Krithi Ramamritham / Kavi Arya

RMA: Design Time vs. Run Time

At Design Time:
Task priorities are assigned according to their periods; shorter period means higher priority.

Schedulability test
Taskset is schedulable if

$$\sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)$$

where C_i is the computation time and T_i the period of task i.

Very simple test, easy to implement.

Run-time
The ready task with the highest priority is executed.

196© Krithi Ramamritham / Kavi Arya

RMA: Example

taskset: t1, t2, t3, t4, each given as (period, computation time)
t1 = (3, 1)   t2 = (6, 1)   t3 = (5, 1)   t4 = (10, 2)

The schedulability test:
1/3 + 1/6 + 1/5 + 2/10 ≤ 4 (2^(1/4) - 1) ?

0.9 ≤ 0.757 ?

…. test fails: schedulability is not guaranteed by this test
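The utilization test is short enough to check by machine. The sketch below recomputes the example, with tasks written as (period, computation time) pairs as in the fractions above; the function names are illustrative.

```python
def rma_bound(n):
    """Liu and Layland utilization bound n(2^(1/n) - 1) for n tasks."""
    return n * (2 ** (1.0 / n) - 1)


def utilization(tasks):
    """Sum of computation_time / period over (period, computation_time) pairs."""
    return sum(float(c) / t for t, c in tasks)


def passes_rma_test(tasks):
    """True if the sufficient utilization test succeeds."""
    return utilization(tasks) <= rma_bound(len(tasks))


tasks = [(3, 1), (6, 1), (5, 1), (10, 2)]   # t1, t2, t3, t4
print(round(utilization(tasks), 3), round(rma_bound(4), 3), passes_rma_test(tasks))
```

A `False` result here only means the test is inconclusive for this taskset; it does not prove that deadlines will be missed.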

197© Krithi Ramamritham / Kavi Arya

RMA…

A schedulability test is
• Sufficient: there may exist tasksets that fail the test, but are schedulable
• Necessary: tasksets that fail are (definitely) not schedulable

The RMA schedulability test is sufficient, but not necessary.

e.g., when periods are harmonic, i.e., multiples of each other, utilization can be 1.

198© Krithi Ramamritham / Kavi Arya

Exact RMA

by Joseph and Pandya, based on critical instance analysis (longest response time of a task, when it is released at the same time as all higher priority tasks)

What is happening at the critical instance?

• Let T1 be the highest priority task. Its response time R1 = C1, since it cannot be preempted.

• What about T2?
R2 = C2 + delays due to interruptions by T1.

Since T1 has higher priority, it has a shorter period. That means it will interrupt T2 at least once, probably more often. Assume T1 has half the period of T2: then R2 = C2 + 2 x C1.

199© Krithi Ramamritham / Kavi Arya

Exact RMA….

In general:

$$R_i^{n+1} = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i^{n}}{T_j} \right\rceil C_j$$

R_i^n denotes the nth iteration of the response time of task i

hp(i) is the set of tasks with higher priority than task i

200© Krithi Ramamritham / Kavi Arya

Example - Exact Analysis

Let us look at our example, which failed the pure rate monotonic test although we can schedule it. Exact analysis says so.

• R1 = 1; easy
• R3, second highest priority task
hp(t3) = {t1}

$$R_{t_3}^{1} = C_{t_3} + \left\lceil \frac{1}{3} \right\rceil C_{t_1} = 1 + 1 = 2$$

$$R_{t_3}^{2} = C_{t_3} + \left\lceil \frac{2}{3} \right\rceil C_{t_1} = 1 + 1 = 2$$

$$R_{t_3}^{3} = R_{t_3}^{2}$$

R3 = 2

201© Krithi Ramamritham / Kavi Arya

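• R2, third highest priority task
hp(t2) = {t1, t3}

$$R_{t_2}^{1} = C_{t_2} + \left\lceil \frac{1}{3} \right\rceil C_{t_1} + \left\lceil \frac{1}{5} \right\rceil C_{t_3} = 1 + 1 + 1 = 3$$

$$R_{t_2}^{2} = C_{t_2} + \left\lceil \frac{3}{3} \right\rceil C_{t_1} + \left\lceil \frac{3}{5} \right\rceil C_{t_3} = 1 + 1 + 1 = 3$$

$$R_{t_2}^{3} = R_{t_2}^{2}$$

R2 = 3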

202© Krithi Ramamritham / Kavi Arya

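• R4, lowest priority task
hp(t4) = {t1, t3, t2}

$$R_{t_4}^{1} = C_{t_4} + \left\lceil \frac{2}{3} \right\rceil C_{t_1} + \left\lceil \frac{2}{6} \right\rceil C_{t_2} + \left\lceil \frac{2}{5} \right\rceil C_{t_3} = 2 + 1 + 1 + 1 = 5$$

$$R_{t_4}^{2} = C_{t_4} + \left\lceil \frac{5}{3} \right\rceil C_{t_1} + \left\lceil \frac{5}{6} \right\rceil C_{t_2} + \left\lceil \frac{5}{5} \right\rceil C_{t_3} = 2 + 2 + 1 + 1 = 6$$

$$R_{t_4}^{3} = C_{t_4} + \left\lceil \frac{6}{3} \right\rceil C_{t_1} + \left\lceil \frac{6}{6} \right\rceil C_{t_2} + \left\lceil \frac{6}{5} \right\rceil C_{t_3} = 2 + 2 + 1 + 2 = 7$$

$$R_{t_4}^{4} = C_{t_4} + \left\lceil \frac{7}{3} \right\rceil C_{t_1} + \left\lceil \frac{7}{6} \right\rceil C_{t_2} + \left\lceil \frac{7}{5} \right\rceil C_{t_3} = 2 + 3 + 2 + 2 = 9$$

$$R_{t_4}^{5} = C_{t_4} + \left\lceil \frac{9}{3} \right\rceil C_{t_1} + \left\lceil \frac{9}{6} \right\rceil C_{t_2} + \left\lceil \frac{9}{5} \right\rceil C_{t_3} = 2 + 3 + 2 + 2 = 9$$

$$R_{t_4}^{5} = R_{t_4}^{4}$$

R4 = 9

Response times of first instances of all tasks < their periods => taskset feasible under RM scheduling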
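The fixed-point iteration used in the exact analysis is easy to automate. The sketch below is one possible implementation, assuming tasks are given as (period, computation time) pairs in priority order and that each deadline equals the period; `response_time` and `TASKS` are names chosen for this example.

```python
def response_time(c, higher, deadline):
    """Iterate R = C + sum(ceil(R / T_j) * C_j) until it stops changing.

    higher lists (period, computation_time) for every higher-priority task.
    Returns None if the response time grows past the deadline.
    """
    r = c
    while True:
        # -(-r // t) is integer ceiling division
        nxt = c + sum(-(-r // t) * cj for t, cj in higher)
        if nxt == r:
            return r
        if nxt > deadline:
            return None
        r = nxt


# Example taskset in rate-monotonic priority order: t1, t3, t2, t4
TASKS = [(3, 1), (5, 1), (6, 1), (10, 2)]

for index, (period, c) in enumerate(TASKS):
    print(period, response_time(c, TASKS[:index], period))
```

Every response time comes out below the corresponding period, which agrees with the hand calculation: the taskset is feasible under rate-monotonic scheduling even though it fails the utilization test.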

203© Krithi Ramamritham / Kavi Arya

3. Dynamic Planning based Approaches

• Feasibility is checked at run-time -- a dynamically arriving task is accepted only if it is feasible to meet its deadline.
– Such a task is said to be guaranteed to meet its time constraints

• One of the results of the feasibility analysis can be a schedule or plan that determines start times

• Has the flexibility of dynamic approaches with some of the predictability of static approaches

• If feasibility check is done sufficiently ahead of the deadline, time is available to take alternative actions.
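As a rough illustration of such a run-time guarantee, the Python sketch below admits a new task only if a plan exists in which every already guaranteed task and the newcomer finish by their deadlines. It is a deliberately simplified model: one processor, all tasks ready at the time of the check, and earliest-deadline ordering used to build the plan. The function name `try_guarantee` and the tuple layout are assumptions made for this example.

```python
def try_guarantee(guaranteed, new_task, now=0):
    """Admit new_task only if all tasks can still meet their deadlines.

    Each task is a (name, remaining_time, absolute_deadline) tuple.
    Assumes a single processor and that every task is ready at `now`.
    Returns a plan {name: start_time} on success, or None on rejection.
    """
    # Order the candidate set by deadline (earliest first).
    candidate = sorted(guaranteed + [new_task], key=lambda t: t[2])
    plan, clock = {}, now
    for name, remaining, deadline in candidate:
        plan[name] = clock       # planned start time
        clock += remaining       # planned finish time
        if clock > deadline:
            return None          # someone would miss: reject the new task
    return plan


accepted = [("A", 2, 5), ("B", 3, 10)]
print(try_guarantee(accepted, ("C", 4, 9)))  # {'A': 0, 'C': 2, 'B': 6}
print(try_guarantee(accepted, ("D", 6, 8)))  # None: B would finish at 11
```

Because a rejection is known at arrival time rather than at the deadline, the caller still has time to take an alternative action.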


4. Dynamic Best-effort Approaches

• The system tries to do its best to meet deadlines.
• But since no guarantees are provided, a task may be aborted during its execution.
• Until the deadline arrives, or until the task finishes, whichever comes first, one does not know whether a timing constraint will be met.

• Permits any reasonable scheduling approach, EDF, Highest-priority,…
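To make the contrast with guaranteed tasks concrete, here is a small Python simulation of a best-effort EDF dispatcher. It is only a sketch under stated assumptions: all tasks are released at time 0, execution times are positive integers, time advances in unit steps, and a task is aborted when its deadline arrives before it has finished. The function name `best_effort_edf` is hypothetical.

```python
def best_effort_edf(tasks):
    """Simulate best-effort EDF on one processor in unit time steps.

    tasks: list of (name, exec_time, deadline), all released at time 0.
    Returns {name: "met" or "aborted"}.
    """
    remaining = {name: c for name, c, _ in tasks}
    deadline = {name: d for name, _, d in tasks}
    outcome, clock = {}, 0
    while remaining:
        # Abort every unfinished task whose deadline has arrived.
        for name in [n for n in remaining if deadline[n] <= clock]:
            outcome[name] = "aborted"
            del remaining[name]
        if not remaining:
            break
        # Run the task with the earliest deadline for one time unit.
        current = min(remaining, key=lambda n: deadline[n])
        remaining[current] -= 1
        clock += 1
        if remaining[current] == 0:
            outcome[current] = "met"
            del remaining[current]
    return outcome


result = best_effort_edf([("A", 2, 3), ("B", 2, 4), ("C", 2, 5)])
print(result)  # {'A': 'met', 'B': 'met', 'C': 'aborted'}
```

In this overloaded example the fate of C is only discovered at time 5, when its deadline arrives with one unit of work left.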


Cyclic scheduling
• Ubiquitous in large-scale dynamic real-time systems
• Combination of both table-driven scheduling and priority scheduling.
• Tasks are assigned one of a set of harmonic periods.
• Within each period, tasks are dispatched according to a table that just lists the order in which the tasks execute.
• Slightly more flexible than the table-driven approach
– no start times are specified
• In many actual applications, rather than making worst-case assumptions, confidence in a cyclic schedule is obtained by very elaborate and extensive simulations of typical scenarios.
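The dispatching rule described above can be sketched in a few lines of Python. The sketch assumes periods are integers, takes the shortest period as the frame length, and treats a set of periods as harmonic when each one divides the next larger one. The function name `cyclic_schedule` and the task names are made up for illustration.

```python
def cyclic_schedule(table, num_frames):
    """Return, for each frame, the tasks to dispatch in table order.

    table: ordered list of (task_name, period); periods must be harmonic.
    Only the execution order within a frame is fixed; no start times
    are stored.
    """
    periods = sorted({period for _, period in table})
    for shorter, longer in zip(periods, periods[1:]):
        if longer % shorter != 0:
            raise ValueError("periods are not harmonic")
    base = periods[0]  # frame length = shortest period
    return [
        [name for name, period in table if (k * base) % period == 0]
        for k in range(num_frames)
    ]


table = [("sensor", 10), ("control", 20), ("logger", 40)]
frames = cyclic_schedule(table, 4)
for k, frame in enumerate(frames):
    print(k * 10, frame)
# 0 ['sensor', 'control', 'logger']
# 10 ['sensor']
# 20 ['sensor', 'control']
# 30 ['sensor']
```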


Plan
• Special Characteristics of Real-Time Systems

• Real-Time Constraints

• Canonical Real-Time Applications

• Scheduling in Real-time systems

• Operating System Approaches


Real-Time Operating Systems
Support process management and synchronization, memory management, interprocess communication, and I/O.

Three categories of real-time operating systems:
• small, proprietary kernels, e.g. VRTX32, pSOS, VxWorks
• real-time extensions to commercial timesharing operating systems, e.g. RT-Linux, RT-NT
• research kernels, e.g. MARS, ARTS, Spring, Polis


Real-Time Applications Spectrum

[Figure: applications ranging from hard to soft real-time, set against operating systems ranging from real-time to general-purpose. Systems shown: VxWorks, Lynx, QNX, ...; Intime, HyperKernel, RTX; Windows CE; Windows NT.]



Embedded (Commercial) Kernels
Stripped down and optimized versions of timesharing operating systems.
• Intended to be fast
– a fast context switch
– external interrupts recognized quickly
– the ability to lock code and data in memory
– special sequential files that can accumulate data at a fast rate
• To deal with timing requirements
– a real-time clock with special alarms and timeouts
– bounded execution time for most primitives
– real-time queuing disciplines such as earliest deadline first
– primitives to delay/suspend/resume execution
– priority-driven best-effort scheduling mechanism or a table-driven mechanism
• Communication and synchronization via mailboxes, events, signals, and semaphores.


Real-Time Extensions to General Purpose Operating Systems

E.g., extending LINUX to RT-LINUX, NT to RT-NT
• Advantage:
– based on a set of familiar interfaces (standards) that speed development and facilitate portability.
• Disadvantages
– Too many basic and inappropriate underlying assumptions still exist.


Using General Purpose Operating Systems

• GPOS offer some capabilities useful for real-time system builders

• RT applications can obtain leverage from existing development tools and applications

• Some GPOSs accepted as de-facto standards for industrial applications


Real Time Linux approaches

1. Modify the current Linux kernel to handle RT constraints
– Used by KURT

2. Make the standard Linux kernel run as a task of the real-time kernel
– Used by RT-Linux, RTAI


Modifying Linux kernel

• Advantages
– Most problems, such as interrupt handling, already solved
– Less initial labor
• Disadvantages
– No guaranteed performance
– RT tasks don't always have precedence over non-RT tasks.


Running Linux as a process of a second RT kernel

• Advantages
– Can make hard real time guarantees
– Easy to implement a new scheduler
• Disadvantages
– Initial port difficult, must know a lot about underlying hardware
– Running a small real-time executive is not a substitute for a full-fledged RTOS


GPOS -- for RT applications?

• Scheduling and priorities
– Preemptive, priority-based scheduling
   non-degradable priorities
   priority adjustment
– No priority inheritance
– No priority tracking
– Limited number of priorities
– No explicit support for guaranteeing timing constraints


Thread Priority = Process class + level

Thread level    Real-time class   High class   Normal class   Idle class
Time-critical   31                15           15             15
Highest         26                15           11             6
Above Normal    25                14           10             5
Normal          24                13           9              4
Below Normal    23                12           8              3
Lowest          22                11           7              2
Idle            16                1            1              1

High, Normal, and Idle are the dynamic classes.
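The figure above amounts to a small lookup: a per-class base value plus a per-level offset, with Time-critical and Idle as special cases. The Python sketch below encodes the numbers shown on this slide; the names `CLASS_BASE`, `LEVEL_OFFSET`, and `thread_priority` are invented for the example and are not an operating system API.

```python
# Base priority of the "Normal" thread level in each process class,
# as read from the slide's figure.
CLASS_BASE = {"Real-time": 24, "High": 13, "Normal": 9, "Idle": 4}

# Offset of each ordinary thread level relative to the class base.
LEVEL_OFFSET = {
    "Highest": 2,
    "Above Normal": 1,
    "Normal": 0,
    "Below Normal": -1,
    "Lowest": -2,
}


def thread_priority(process_class, level):
    """Return the priority for a process class and thread level."""
    realtime = process_class == "Real-time"
    if level == "Time-critical":
        return 31 if realtime else 15
    if level == "Idle":
        return 16 if realtime else 1
    return CLASS_BASE[process_class] + LEVEL_OFFSET[level]


print(thread_priority("Normal", "Above Normal"))  # 10
print(thread_priority("Real-time", "Idle"))       # 16
print(thread_priority("High", "Lowest"))          # 11
```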


Scheduling Priorities

• Threads scheduled by executive.

• Priority based preemptive scheduling.

Interrupts

Deferred Procedure Calls (DPC)

System and user-level threads


GPOS -- for RT applications? (contd.)

• Quick recognition of external events
– Priority inversion due to Deferred Procedure Calls (DPC)
• I/O management
• Timer granularity and accuracy
– High resolution counter with resolution of 0.8 µsec.
– Periodic and one shot timers with resolution of 1 msec.
• Rich set of synchronization objects and communication mechanisms.
– Object queues are FIFO


Research Operating Systems

• MARS – static scheduling
• ARTS – static priority scheduling
• Spring – dynamic guarantees


MARS -- TU, Vienna (Kopetz)
Offers support for controlling a distributed application based entirely on time events (rather than asynchronous events) from the environment.
• A priori static analysis to demonstrate that all the timing requirements are met.
• Uses flow control on the maximum number of events that the system handles.
• Based on the time driven model -- assume everything is periodic.
• Static table-driven scheduling approach
• A hardware based clock synchronization algorithm
• A TDMA-like protocol to guarantee timely message delivery


ARTS -- CMU (Tokuda, et al)
• The ARTS kernel provides a distributed real-time computing environment.
• Works in conjunction with the static priority driven preemptive scheduling paradigm.
• Kernel is tied to various tools that a priori analyze schedulability.
• The kernel supports the notion of real-time objects and real-time threads.
• Each real-time object is time encapsulated -- a time fence mechanism: the time fence provides a run time check that ensures that the slack time is greater than the worst case execution time for an object invocation
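The time fence idea can be pictured as a guard around each invocation. The Python sketch below is not ARTS code; it is a hypothetical illustration in which "slack" is taken to mean the time left until the caller's absolute deadline, and the names `invoke_with_time_fence` and `TimeFenceViolation` are invented here.

```python
import time


class TimeFenceViolation(Exception):
    """Raised when an invocation is refused by the time fence check."""


def invoke_with_time_fence(operation, wcet, deadline, clock=time.monotonic):
    """Run operation() only if the time left before deadline exceeds wcet.

    wcet and deadline use the same units as clock(); deadline is absolute.
    Treating slack as (deadline - now) is an assumption of this sketch.
    """
    slack = deadline - clock()
    if slack <= wcet:
        raise TimeFenceViolation(f"slack {slack} does not exceed WCET {wcet}")
    return operation()


# With a fake clock reading 5, a deadline of 10 leaves slack 5 > WCET 2.
print(invoke_with_time_fence(lambda: "done", 2, 10, clock=lambda: 5))  # done
```

Refusing the call up front lets the caller handle the timing fault before any of the object's work has started.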


SPRING – Umass. (Ramamritham & Stankovic)

• Real-time support for multiprocessors and distributed systems
• Strives for a more flexible combination of off-line and on-line techniques
– Safety-critical tasks are dealt with via static table-driven scheduling.
– Dynamic planning based scheduling of tasks that arrive dynamically.
• Takes tasks' time and resource constraints into account and avoids the need to a priori compute worst case blocking times
• Reflective kernel retains a significant amount of application semantics at run time; this provides flexibility and graceful degradation.


Polis: Synthesizing OSs
• Given a FSM description of a RT application
• Each FSM becomes a task
• Signals, Interrupts, and polling
• Tasks with waiting inputs handled in FIFO order (priority order: to be done)
• Some interrupts can be made to directly execute the corresponding task
• The OS executive is synthesized to include just what is needed


Configurable Computing Lab -- Hardware Environment


IIT-KReSIT Reconfigurable Computing Lab Projects (2003)

• Network packet-processing
– Packet Classifier (a la Stiliadis/Lakshman)
• Wireless Protocol
– 802.11 interface card
• Video codec
– MPEG-4 with encryption
• Encryption (IDEA, etc.)
• Real-time reactive control systems
– Inertial Navigation System (INS)
– Flight simulation
– Scheduling co-processor

• Satellite Error Correcting codec


References
This tutorial is a short version of a semester-long course. Visit http://www.it.iitb.ac.in/~it606 for all the material from that course.

• Jack Ganssle, "The Art of Designing Embedded Systems", Newnes, 1999.
• David Simon, "An Embedded Software Primer", Addison Wesley, 2000.
• C.M. Krishna and Kang G. Shin, "RTS: Real-Time Systems", McGraw-Hill, 1997, ISBN 0-07-057043.
• Frank Vahid, Tony Givargis, "Embedded System Design: A Unified Hardware/Software Introduction", John Wiley & Sons Inc., 2002.
• J. A. Stankovic and K. Ramamritham, "Advances in Hard Real-Time Systems", IEEE Computer Society Press, Washington DC, September 1993, 777 pages.

• K. Ramamritham and J. A. Stankovic, "Scheduling Algorithms and Operating Systems Support for Real-Time Systems", invited paper, Proceedings of the IEEE, Jan 1994, pp. 55-67.
• Sundeep Kapila, K. Ramamritham, Sudhakar, "Distributed Real-Time Embedded Applications using Off-the-Shelf Components? Experiences Building a Flight Simulator", IEEE/IEE Real-Time Embedded Systems Workshop (held in conjunction with the IEEE Real-Time Systems Symposium), December 2001.
• Real-Time Linux, http://www.fsmlabs.com/articles/archive/usenix.pdf
• Handel-C material based on "Handel-C Language Reference Manual", Celoxica Ltd.

• David Harel, Hagi Lachover, Amnon Naamad, Amir Pnueli, Michal Politi, Rivi Sherman, Aharon Shtull-Trauring, and Mark Trakhtenbrot, "Statemate: A Working Environment for the Development of Complex Reactive Systems", IEEE Transactions on Software Engineering, Vol 16 No. 4, April 1990.
• Ptolemy Project, http://ptolemy.eecs.berkeley.edu/.
• S. Ramesh and P. Bhaduri, "Validation of Pipelined Processors using Esterel Tools: A Case Study", Proc. of Computer Aided Verification, LNCS Vol. 1633, 1999.


Summary
• What are Embedded Systems?
• What is Embedded software?
• New Approaches to building ESW
• Real-time support for ESW