
© K. H. Ecker, T.U. Clausthal. Sept. 2000 4. Hardware Platform: Real-Time Requirements 4-1

4. Hardware Platform: Real-Time Requirements

Contents:

4.1 Evolution of Microprocessor Architecture
4.2 Performance-Increasing Concepts
4.3 Influences on System Architecture
4.4 A Real-Time Hardware Architecture


Obviously, processors have to fulfill the time constraints of the technical process.

Microprocessors are often integrated in technical processes ("embedded systems").

→ r-t systems are the most innovative application areas for today's microprocessors

Question: − architectural concepts of microprocessors; − relevance for r-t systems?

Silicon technology: driving force in processor development
− increase of integration density
− price reduction


4.1 Evolution of Microprocessor Architecture

• CISC versus RISC

RISC characteristics:
− almost all instructions need one clock cycle
− arithmetic and logical operations are confined to registers; load/store operations for accessing memory
− large number of general-purpose registers on chip
− reduced instruction set: about 40 or fewer instructions, as compared to over 120 in CISC
− instruction format has fixed length

• Register organization

performance of procedure calls
stack-, code-, data-cache
register windows (for enhancing program branches)


• Interrupt system

Interrupts are very important in r-t systems because of the time-related interaction with the environment

conventional interrupt systems of microprocessors: software-programmed interrupt controllers

more recent developments: interrupt controllers are integrated in the processor

• Hardware support for multitasking and multi-user environments

- virtual memory management
  mapping of virtual addresses to physical addresses:
  = segmentation (Motorola 68010, Intel 80286)
  = paging (Motorola 68030, Intel 80386)
  = segmentation and paging (Motorola 88000, Intel 80386, AMD 29000)


- memory protection
  reasons for memory protection:
  = support for simplified debugging and testing by run-time checks and exception handling
  = protection of the operating system from erroneous application programs
  = protection of the operating system and application programs against illegal access
- realization
  = access to main memory pages is checked by the MMU
  = introduction of privilege levels with different access rights, used to protect different system components against each other

Trend: integration of memory protection into processor chips; two privilege levels: user mode/supervisor mode


• Multitasking support

Operating systems are significantly determined by their multitasking capabilities.

New microprocessors offer more and more multitasking support integrated in VLSI technology.

Today, we find two levels of on-chip support:
- some processors offer the notion of tasks and explicit context switches implemented in hardware (Intel 80386, Weitek 32100)
- other processors have even implemented complete scheduling algorithms in hardware (Inmos T800)

In typical r-t systems: large number of concurrently running tasks; high rate of context switches


• Debugging and monitoring

Tools for software development:
- single stepping, breakpointing
- event trigger logic
- breakpointing and tracing based on different types of events
- logical and time-related event combination
- and others

In typical r-t systems: because of the increasing complexity of software, the integration of these features into the processor chip is very important


4.2 Performance-Increasing Concepts

In MIPS, the performance of microprocessors has increased by a factor of 2.25 each year in the past

Contributing factors: improvements in silicon technology and in architectural concepts; integration of caches; on-chip pipelining

4.2.1 Pipelining

Multiple phases of different instructions are executed in parallel.

Jumps, branches, procedure calls/returns, interrupts and context switches cause a pipeline "flush".

Pipeline flushes can be reduced by
- code reorganization (delayed execution of jumps, delayed branches)
- doubling of the first pipeline
- prediction of probability of branches based on heuristics (branch prediction)
- adequate register allocation to avoid pipeline flushes

New trends in processor design: more sophisticated concepts such as superpipelined, superscalar, long instruction word (LIW) and very long instruction word (VLIW) architectures

In typical r-t systems: the high interrupt rates and context switches decrease the performance of pipelines substantially
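
The performance loss can be estimated with a very simple model. The following sketch assumes an ideal pipeline that completes one instruction per cycle and a fixed number of lost cycles per flush; the function name `effective_cpi` and all numbers are illustrative assumptions, not measured values.

```python
def effective_cpi(flush_rate: float, flush_penalty: int, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when a fraction `flush_rate` of all
    instructions flushes the pipeline and each flush costs `flush_penalty`
    extra cycles (simplified model, assumed values)."""
    return base_cpi + flush_rate * flush_penalty


# Assumed example: a flush discards 4 partially executed instructions.
quiet = effective_cpi(flush_rate=0.05, flush_penalty=4)  # few branches and interrupts
busy = effective_cpi(flush_rate=0.25, flush_penalty=4)   # frequent interrupts, context switches
slowdown = busy / quiet                                   # relative loss of throughput
```

Under these assumptions the busy case needs noticeably more cycles per instruction than the quiet case, which is the effect described above for r-t systems.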


4.2.2 Cache memories

... used to avoid memory-access latency

In addition: cache memories are a precondition for enhancing branch prediction

Cache efficiency is measured in hit rates:

Code cache:
- 512-byte cache: approx. 80% hit rate
- 2-Kbyte cache: approx. 95% hit rate

Data cache:
- 4-Kbyte cache: approx. 80% hit rate
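
What a hit rate means for the mean access time can be sketched as follows. The hit rates are the approximate figures quoted above; the latencies `T_CACHE` and `T_MEMORY` are assumed values chosen only for illustration.

```python
def avg_access_time(hit_rate: float, t_cache: float, t_memory: float) -> float:
    """Mean access time: hits cost t_cache, misses cost t_memory."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory


T_CACHE = 1.0   # assumed: cycles for an on-chip cache hit
T_MEMORY = 5.0  # assumed: cycles for a main-memory access on a miss

small_code_cache = avg_access_time(0.80, T_CACHE, T_MEMORY)  # 512-byte code cache
large_code_cache = avg_access_time(0.95, T_CACHE, T_MEMORY)  # 2-Kbyte code cache
```

With these assumed latencies the mean access time drops from 1.8 to 1.2 cycles. Note that this is an average: a single access may still take the full miss time, which is what matters for predictability.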

In typical r-t systems:
- high interrupt and context switch rates cause useless entries in the cache and consequently a low hit rate
+ very important routines of real-time programs may fit completely into on-chip caches: execution without misses ("freezing the cache")


4.2.3 Modular design

... to reduce and manage the complexity of new microprocessors

Modularity of new designs: structure the processor design into well-defined reusable modules
- submodules are independent and can be developed separately
- new processors may be combined easily from already existing modules
- processor can be distributed among several chips
- modules can be tuned independently of each other with respect to different requirements
- application-specific or special-purpose modules can be added easily

Requires adequate definitions of interfaces between modules

r-t systems:
- future "real-time" processors are modifications of "standard" microprocessor families
- integration of microprocessor cores in application-specific integrated circuits (ASICs)


4.2.4 Testability

Higher integration and complexity of microprocessors require new ways of chip testing
- complete testing not possible due to the huge number of required test patterns
- pin limitations: increasingly complicated to access internal structures from outside
- testing at the board level becomes more difficult because of higher board integration and the technique of surface-mounted devices

On-chip test circuits: tests are initiated from the outside via additional pins


4.3 Influences on System Architecture

Relationship between processor and major components:

• Processor bus

interface between processor and memory: bottleneck
- separate instruction and data buses
- burst mode operation; pipelined transmissions
- 64 or 128 bit buses

• Multiprocessor architectures

Three classes of available multiprocessor architectures:
- homogeneous parallel computers based on identical microprocessors in each processor node and non-bus-based interconnection networks, e.g. hypercube, tree, array
- tightly coupled processors with specific buses and global shared memory
- loosely coupled processors based on standard bus systems: Ebus; Multibus I, II; Futurebus


• Coprocessors

... for special purposes

Coprocessors enhance system performance by off-loading the CPU

Types of coprocessors:
- network coprocessors
- direct memory access controllers
- I/O processors
- graphics coprocessors
- floating point arithmetic coprocessors

Interface between processor and coprocessor: tightly coupled (CPU specific) or loosely coupled (CPU independent)


• Reliability and fault tolerance

Reliability and availability of microprocessor systems decrease with an increasing number of components

Solution: integration of fault tolerant concepts

Example: integration of master/checker circuits in CPUs

r-t systems: very important in case of security-sensitive applications


4.4 A Real-Time Hardware Architecture

4.4.1 A Basic Architectural Concept

Early days of real-time processing:
- conventional von Neumann computers
- adaptation to the real-time application by including process peripherals and externally available interrupt lines
- all other real-time requirements were met by software: operating system, careful programming

In this chapter: development of a concept of a hardware platform on which predictably behaving real-time systems may be based


General hardware requirements:
- known time for each machine instruction
- hardware must not introduce unpredictably long delays
- fail-safe hardware: support of fault detection; predictable graceful degradation; recovery within predictable intervals

Consequences:
- simple, well-defined architecture
- comprehensive instruction set
- no features like pipelining, caches, virtual memory or general DMA
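
Why a known time for each machine instruction matters can be illustrated with a small sketch: if every instruction has a fixed cycle count, the execution time of a straight-line code sequence is simply a sum. The instruction names and cycle counts in `CYCLES` are hypothetical and do not describe any real processor.

```python
# Hypothetical cycle counts for an illustrative instruction set (assumed values).
CYCLES = {"LOAD": 2, "STORE": 2, "ADD": 1, "CMP": 1, "BRANCH": 2}


def execution_time_us(program: list[str], clock_mhz: float) -> float:
    """Exact execution time in microseconds of a straight-line instruction
    sequence, assuming every instruction always takes its tabulated cycles."""
    total_cycles = sum(CYCLES[op] for op in program)
    return total_cycles / clock_mhz


control_step = ["LOAD", "LOAD", "ADD", "CMP", "BRANCH", "STORE"]
t = execution_time_us(control_step, clock_mhz=10.0)  # 10 cycles at 10 MHz
```

With pipelining, caches or cycle-stealing DMA the per-instruction times would no longer be constants, and such a sum would only give a best case.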


Proposed solution:
• task-oriented hierarchical storage administration scheme
• DMA without cycle stealing
• explicit timing of I/O facilities
• two-processor architecture:
− general task processor for user tasks and for OS tasks that interfere with user tasks: mainly supervisor shell services, e.g. data exchange with peripherals and file management, as initiated by the user tasks
− co-processor for the OS kernel firmware: responsible for system functions such as events, time and task management, communication, synchronization



4.4.2 Layered Structure of R-T Operating Systems

Basic architectural concepts:
• processors for task execution
• separate co-processor for the OS kernel and firmware

OS functions: interrupt handling, task management, communication, synchronization, time administration, I/O routines, operator interface

Distinction between
− the operating system nucleus: OS processes of the first kind, activated by interrupts
− the operating system shell: OS processes of the second kind, handled as user tasks


Advantages:
• normal program flow is only interrupted when required by the scheduling algorithm: no unnecessary context switches
• event-driven tasks are executed in a way that disturbs other active tasks as little as possible
• operating system overhead becomes predictable
• no task preemption due to occurring events
  event: immediate reaction required; the co-processor provides an independently working event recognition mechanism


Three layer structure of a co-processor

1. Hardware layer
− accurate r-t management based on high resolution clock
− exact timing of operations
− separate programmable interrupt generator for software simulation
− event representation by storage element and latch for time of occurrence
− synchronizer representation
− shared variable representation

2. Primary reaction layer
− recognition of events (interrupts, signals, time events, status transfer of synchronizers, value changes of shared variables)
− initiation of secondary reactions
− recording of events for error tracking
− management of time schedules and critical instants

3. Secondary reaction layer
− deadline-driven processor scheduling with overload handling
− task-oriented hierarchical storage management
− execution of (secondary) event reactions (tasks)
− synchronizer management
− shared variable management
− acceptance of requests
− initiation of processor activities


Realization of communication between general processor and co-processor

communication operations require:
− first-in-first-out buffers
− shared variables, system internal data: e.g. task control blocks
− common memory area directly accessible by all system components


4.4.3 Predictable Storage Management

Situation: task software ... usually rather small pieces of software: "paging elements"

for fast and predictable reaction times: entire task segment should be loaded into main memory each time it is needed

Task-oriented hierarchical storage administration: main storage is divided into
− an area for the supervisor (no paging)
− shared data structures (no paging)
− K ≥ 2 page frames (paging)

frame size: should hold the code of a single task


Influence of number of page frames:

(T_1, ..., T_{n_t}) ... list of ready tasks at time t, in EDF order

loading into main memory: subset B = {T_i | i = 1, ..., min{K, n_t}} (K ≥ 2)

each task of B is assigned a free frame; T_1 ∈ B is running

If a task with an earlier deadline than some task in B arrives: task replacement

During execution of T_1: next task (T_2) can be "paged in"; the code of T_2 should be available at the latest when T_1 terminates

Actual choice of K depends on:
− number of I/O channels available for paging
− transfer time
− average execution time of tasks
− frequency of suspension of the running task due to I/O and synchronization, when other tasks are processed while the task is suspended
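
The selection of the resident subset B can be sketched in a few lines. This is a minimal illustration under assumptions: the `Task` class, the function `resident_set` and the example tasks are hypothetical, and the actual loading of code into frames is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    deadline: float  # absolute deadline, arbitrary time unit


def resident_set(ready: list[Task], k: int) -> list[Task]:
    """Return B: the min(k, n_t) ready tasks with the earliest deadlines,
    in EDF order. The first element is the running task."""
    if k < 2:
        raise ValueError("the scheme assumes K >= 2 page frames")
    return sorted(ready, key=lambda task: task.deadline)[:k]


ready = [Task("log", 90.0), Task("valve", 20.0), Task("display", 50.0)]
b = resident_set(ready, k=2)        # valve runs, display can be paged in meanwhile
ready.append(Task("alarm", 30.0))   # arrival with an earlier deadline than display
b_after = resident_set(ready, k=2)  # task replacement: alarm takes display's frame
```

With K = 2, one frame holds the running task and the other is used to page in its successor; a larger K tolerates more suspensions of the running task at the cost of main storage.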


4.4.4 Direct Memory Access

Cycle stealing: slows down other activities in the system indirectly and unpredictably

on the other hand: DMA speeds up I/O transfers of large blocks of data

but: reaction times in control applications cannot be guaranteed in the presence of DMA

DMA without cycle stealing:
− main storage is organized in several independent modules: processor operates on one module, DMA on another module
− dynamic bus subdivision: temporary (and programmed) separations of bus sections; bus accesses in isolated sections only
− dynamic RAMs: refreshing can be connected with DMA
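
The idea of independent storage modules can be illustrated as follows. The sketch assumes that a module is selected by the upper address bits; `MODULE_SIZE` and both function names are hypothetical.

```python
MODULE_SIZE = 0x10000  # assumed: 64 KiB per independent storage module


def module_of(address: int) -> int:
    """Index of the storage module that holds `address`."""
    return address // MODULE_SIZE


def dma_interferes(cpu_address: int, dma_address: int) -> bool:
    """True if processor and DMA controller target the same module, i.e.
    one of them would have to wait (the situation the scheme avoids)."""
    return module_of(cpu_address) == module_of(dma_address)


separate = dma_interferes(cpu_address=0x01234, dma_address=0x30000)  # modules 0 and 3
same = dma_interferes(cpu_address=0x01234, dma_address=0x08000)      # both in module 0
```

As long as transfers are programmed so that the DMA buffer lies in a module the running task does not use, the processor never loses a cycle to the transfer.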


4.4.5 Precise Timing

Requirement: formulation of the following conditions:
− data transmission from the r-t system to the environment at specified times
− data entering the r-t system: time instant must be recorded to ascertain reaction times

Low-level statements for timing (for precisely timed, program-initiated I/O):
TAKE variable FROM source AT clock_expression
SEND expression TO sink AT clock_expression

Externally triggered input operation with time stamp:
TAKE variable FROM source RECEIVED clock_variable
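
The semantics of these statements can be mimicked in a toy model. The sketch below uses a simulated clock instead of real hardware; the class `TimedIO` and its methods are hypothetical names that only imitate SEND ... AT and TAKE ... RECEIVED, and a real system would rely on a hardware clock and latches.

```python
import heapq


class TimedIO:
    """Toy model of time-stamped I/O driven by a simulated clock."""

    def __init__(self) -> None:
        self.now = 0.0
        self._pending: list[tuple[float, int, str, object]] = []
        self._seq = 0  # tie-breaker so equal instants keep their order
        self.sent: list[tuple[float, str, object]] = []

    def send_at(self, value: object, sink: str, at: float) -> None:
        """SEND value TO sink AT at: queue the output for the given instant."""
        heapq.heappush(self._pending, (at, self._seq, sink, value))
        self._seq += 1

    def advance(self, until: float) -> None:
        """Advance the clock, emitting each queued output at its own instant."""
        while self._pending and self._pending[0][0] <= until:
            at, _, sink, value = heapq.heappop(self._pending)
            self.now = at
            self.sent.append((at, sink, value))
        self.now = until

    def take_received(self, value: object) -> tuple[object, float]:
        """TAKE ... RECEIVED: return the input together with its arrival time."""
        return value, self.now


io = TimedIO()
io.send_at(1, "valve", at=5.0)
io.send_at(0, "valve", at=2.0)
io.advance(until=3.0)                  # only the output scheduled for 2.0 is emitted
sample, stamp = io.take_received(42)   # external input arrives at simulated time 3.0
io.advance(until=10.0)                 # the output scheduled for 5.0 follows
```

The point of the model is that the output instants are fixed by the program, not by when the processor happens to execute the statement, and that every input carries the time of its arrival.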


Summary of Chapter 4

Performance increasing concepts for microprocessors:
− RISC architecture
− on-chip interrupt controllers
− operating system support
− multitasking support
− pipelining
− cache memories
− testing

Not all these concepts are necessarily well suited for real-time systems

On the level of system architecture, enhancement concepts concern the processor bus, co-processors, multiprocessors, and the integration of fault tolerant solutions
