Time-Aware Correct-By-Construction Systems Design

SCS Seminar, December 4, 2014

David Broman Associate Professor, KTH Royal Institute of Technology

Assistant Research Engineer, University of California, Berkeley

Part I Time-Aware Systems Design: a Vision

Part II Predictable Processors for Mixed-Criticality Systems


Agenda

Part I

Time-Aware Systems Design: a Vision

Part II

Predictable Processors for Mixed-Criticality Systems





Part I

Time-Aware Systems Design: a Vision





What is a Time-Aware System?

Time-aware systems are systems where time or timing affects the correctness of the system behavior.

Real-time systems are time-aware. For instance, execution of tasks must finish within certain deadlines.

Simulation systems can be time-aware, but are not necessarily real-time.

Cyber-Physical Systems are time-aware and real-time. Emphasis on networks and the interaction between cyber and physical.

Distributed systems can be time-aware without being CPS.





Time-Aware Systems - Examples

Cyber-Physical Systems (CPS)

Aircraft, automotive, process industry and industrial automation

Time-Aware Simulation Systems

Physical simulations (Simulink, Modelica, etc.)

Time-Aware Distributed Systems

Time-stamped distributed systems (e.g., Google Spanner)





Time-Aware Systems Design

[Figure: a system consisting of a physical system (the plant) and a cyber system (embedded computation + networking), connected through sensors and actuators. Modeling turns the system into a model: an equation-based model of the physical plants, connected through physical interfaces, sensors, and actuators to platforms 1-3, which contain computations 1-4, delays 1-2, and a network, and which can be described with various models of computation (MoC). The model supports simulation with timing properties.]





[Figure: the system and model of the previous slide (physical plant, sensors, actuators, and cyber system on one side; equation-based plant model, platforms, computations, delays, and network on the other), extended with the steps that lead from the model back to the system: physical prototyping and compiling/synthesizing.]

Challenge: Compile/synthesize the model’s cyber part, such that the simulated model and the behavior of the real system coincide. The main challenge is to guarantee correct timing behavior.

Model fidelity problem

“Ensuring that the model accurately represents the real system”






What is our goal?

“Everything should be made as simple as possible, but not simpler” (attributed to Albert Einstein)

Execution time should be as short as possible, but not shorter.

[Figure: timeline of a task and its deadline; the slack is the time between the end of the task and the deadline.]

No point in making the execution time shorter, as long as the deadline is met.

Minimize the slack.

Objective: Minimize area, memory, energy.

Challenge: Still guarantee to meet all timing constraints.
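The relation between task, deadline, and slack can be written down directly. The following is a minimal sketch, assuming that the execution time used for the check is a safe worst-case bound (WCET) and that all times are in nanoseconds; the function names are hypothetical and only illustrate the idea.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slack in nanoseconds: non-negative if the task finishes by its deadline.
 * Assumption: wcet_ns is a safe upper bound on the execution time. */
int64_t slack_ns(int64_t wcet_ns, int64_t deadline_ns)
{
    return deadline_ns - wcet_ns;
}

/* The timing constraint is met when the worst case still fits the deadline. */
bool meets_deadline(int64_t wcet_ns, int64_t deadline_ns)
{
    return slack_ns(wcet_ns, deadline_ns) >= 0;
}
```

Minimizing the slack then means choosing the cheapest platform (area, memory, energy) for which `meets_deadline` still holds for every task.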





A Story…

Fly-by-wire technology controlled by software.

Safety critical → Rigorous validation and certification

Success?

They have to purchase and store microprocessors for at least 50 years of production and maintenance…

Why?

Apparently, the software does not specify the behaviour that has been validated and certified!





Programming Model and Time

Timing is not part of the software semantics.

Correct execution of programs (e.g., in C, C++, C#, Java, Scala, Haskell, OCaml) has nothing to do with how long things take to execute.

Traditional Approach

Programming Model

Timing is dependent on the hardware platform

Our Objective

Programming Model

Make time an abstraction within the programming model

Timing is independent of the hardware platform (within certain constraints)





Time-Aware Tool Chain Vision

Modeling Languages

- Modelyze (Broman and Siek, 2012)
- Ptolemy II (Eker et al., 2003)
- Simulink/Stateflow (MathWorks)
- Modelica (Modelica Association)

Programming Languages

- Real-Time Euclid (Kligerman & Stoyenko, 1986)
- Real-time Concurrent C (Gehani and Ramamritham, 1991)
- Giotto and E machine (Henzinger et al., 2003)

Assembly Languages

The assembly languages for today's processors lack the notion of time.

PRET Machines at UC Berkeley (see Part II)






Modeling Languages: Modelica (Modelica Association)

Timed C

Work-in-progress: C extended with timing constructs

Difficult to compute WCET (e.g., determine loop bounds and infeasible paths)

Assembly Languages: PRET ISA



Modeling Languages: Modelica (Modelica Association)

Timed C

PRETIL
- Abstracting away memory hierarchy (scratchpad, DRAM etc.)
- Expose timing constructs

Our current work-in-progress is an extension to LLVM

Assembly Languages: PRET ISA






Modeling Languages: Modelica (Modelica Association)

Timed C

PRETIL
- Abstracting away memory hierarchy (scratchpad, DRAM etc.)
- Expose timing constructs

Time-Aware Compilation

Assembly Languages: PRET ISA, other (non-PRET) ISA









Research Objective: Develop methodologies, algorithms, and a time-aware tool chain that change the way we develop these kinds of systems using a correct-by-construction approach.

Area 1: Programming Languages and APIs with timing constraints
-  Modelyze
-  Timed C
-  Functional Mockup Interface (FMI)

Area 2: Time-aware compilation/synthesis
-  LLVM-based time-aware compiler
-  WCET analysis

Area 3: Predictable Architectures and Clock Synchronization
-  PRET processors
-  Clock Synchronization

[Figure: a modeling/programming language is processed by the time-aware tool chain (compilation/synthesis) and deployed on platforms that are connected by clock synchronization.]





Part II

Predictable Processors for Mixed-Criticality Systems

* This part highlights key aspects of two papers that will appear in RTAS 2014 (April 15-17, Berlin), authored by the following persons:

Michael Zimmer, David Broman, Chris Shaver, Edward A. Lee

Yooseong Kim, David Broman, Jian Cai, Aviral Shrivastava





Modern Systems with Many Processor Platforms

Aerospace

Modern aircraft have many computer-controlled systems
•  Engine control
•  Electric power control
•  Radar system
•  Navigation system
•  Flight control
•  Environmental control system, etc.

Automotive

Modern cars have many ECUs (Electronic Control Units)
•  Airbag control
•  Door control
•  Electric power steering control
•  Power train control
•  Speed control
•  Battery management, etc.

Over 80 ECUs in a high-end model (Albert and Jones, 2010)





Mixed-Criticality Systems

Federated Approach: Each processor has its own task

Issues with too many processors
•  High cost
•  Space and weight
•  Energy consumption

Consolidate into fewer processors

[Figure legend: Task, Processor Platform]

Required for Safety
•  Spatial isolation between tasks
•  Temporal isolation between tasks (necessary to meet deadlines)





Mixed-Criticality Systems

Federated Approach: Each processor has its own task

Issues with too many processors
•  High cost
•  Space and weight
•  Energy consumption

Consolidate into fewer processors

Required for Safety
•  Spatial isolation between tasks
•  Temporal isolation between tasks (necessary to meet deadlines)

…but such safety requirements are only needed for highly critical tasks

Mixed-Criticality Challenge
Reconcile the conflicting requirements of:
•  Partitioning (for safety)
•  Sharing (for efficient resource usage)
(Burns & Davis, 2013)





Our solution

FlexPRET Softcore

Fine-grained multithreaded processor platform (thread-interleaved) implemented on an FPGA

Flexible schedule (1 to 8 active threads) and scheduling frequency (1, 1/2, 2/3, 1/4, 1/8 etc.)

Hard real-time threads (HRTT) with predictable timing behavior
•  Thread-interleaved pipeline (no pipeline hazards)
•  Scratchpad memory instead of cache

Soft real-time threads (SRTT) with cycle stealing from HRTT

WCET-Aware Scratchpad Memory (SPM) Management

Automatic DMA transfer of code to SPM

Optimal mapping for minimizing WCET





Related Work

Software Scheduling for Mixed Criticality
•  Reservation-based partitioning, ARINC 653
•  First priority-based MC (Vestal, 2007)
•  Sporadic task scheduling (Baruah and Vestal, 2008)
•  Slack scheduling (Niz et al., 2009)
•  Review of MC area, 168 references (Burns & Davis, 2013)

WCET Analysis
•  WCET-aware compiler (Falk & Lokuciejewski, 2010)
•  Detection of loop bounds and infeasible paths (Gustafsson et al., 2006)
•  Cache analysis (Ferdinand & Wilhelm, 1999)
•  WCET Survey (Wilhelm et al., 2008)

Predictable and Multithreaded Processors
•  PRET idea (Edwards and Lee, 2007)
•  PTARM (Liu et al., 2012)
•  Patmos (Schoeberl et al., 2011)
•  JOP (Schoeberl, 2008)
•  XMOS X1 (May, 2009)
•  MERASA, MC on multicore (Ungerer, 2010)

Scratchpad Memory Management
•  Average case SPM methods for SMM (Bai et al., 2013; Jung et al., 2010; Pabalkar et al., 2008; Baker et al., 2010)
•  Static SPM WCET methods (Keinaorge 2008, Platzar 2012)
•  SPM management at basic block level (Puaut & Pais, 2007)

Several EU projects related to Mixed-Criticality: MultiPARTES, Recomp, CERTAINTY, Proxima,…





Flexible Scheduling with Cycle Stealing

•  FlexPRET allows arbitrary interleaving
•  Soft real-time threads (SRTT) can steal cycles from hard real-time threads (HRTT)

[Figure: example execution (read top to bottom, left to right); legend: HRTT, SRTT]

Task A (hard): frequency 2/4 = 1/2
Task B (hard): frequency 1/4
Task C (soft): frequency 1/4 + cycle stealing

When task B finishes, its cycles are used by task C (soft thread).

Tasks A and B are temporally isolated.
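The schedule on this slide can be illustrated with a small software simulation. This is a simplified sketch and not the FlexPRET hardware scheduler: the slot table, the task identifiers, and the function `simulate` are assumptions made for illustration, and only a single soft thread is modeled.

```c
#include <stddef.h>

/* Hypothetical thread IDs for this sketch (not FlexPRET identifiers). */
enum { TASK_A = 0, TASK_B = 1, TASK_C = 2, NUM_TASKS = 3, IDLE = -1 };

/* Fixed slot table: A owns 2 of every 4 cycles, B and C own 1 of 4 each. */
static const int slot_table[4] = { TASK_A, TASK_B, TASK_A, TASK_C };

/*
 * Fill trace[0..cycles-1] with the task that executes in each cycle.
 * remaining[t] is the number of cycles task t still needs; it is updated.
 * soft is the ID of the single soft thread that may steal unused cycles.
 */
void simulate(int remaining[NUM_TASKS], int soft, int trace[], size_t cycles)
{
    for (size_t c = 0; c < cycles; c++) {
        int owner = slot_table[c % 4];
        int runner = IDLE;
        if (remaining[owner] > 0) {
            runner = owner;           /* the owner uses its own slot */
        } else if (remaining[soft] > 0) {
            runner = soft;            /* an unused slot is stolen by the soft thread */
        }
        if (runner != IDLE) {
            remaining[runner]--;
        }
        trace[c] = runner;
    }
}
```

In this model the cycles in which task A runs depend only on the slot table and on A itself, never on B or C, which is the temporal isolation property; once B has no work left, its slot goes to C.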





C level programming using real-time

D. Timing Instructions

New timing instructions augment the RISC-V ISA for expressing real-time semantics. In contrast to previous PRET architectures supporting timing instructions [14], [18], [21], our design is targeted for mixed-critical systems.

The FlexPRET processor contains an internal clock that counts the number of elapsed nanoseconds since the processor was booted. The current time is stored in a 64-bit register, meaning that the processor can be active for 584 years without the clock counter wrapping around. Two new instructions can be used to get the current time: get time high GTH r1 and get time low GTL r2 store the higher and lower 32 bits in register r1 and r2, respectively. When GTL is executed, the processor stores internally the higher 32 bits of the clock and then returns this stored value when executing GTH. As a consequence, executing GTL followed by GTH is atomic, as long as the instruction order is preserved.
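The 584-year figure and the GTL/GTH latching behavior can be checked with a small software model. This is only a sketch: the type `clock_model` and the functions below are hypothetical names that mimic the described semantics in plain C, and a year is assumed to have 365 days.

```c
#include <stdint.h>

/* Nanoseconds in a 365-day year (assumption: leap days are ignored). */
#define NS_PER_YEAR (365ULL * 24ULL * 3600ULL * 1000000000ULL)

/* Whole years until a 64-bit nanosecond counter wraps around. */
uint64_t years_until_wrap(void)
{
    return UINT64_MAX / NS_PER_YEAR;
}

/* Software model of the clock (hypothetical; not the FlexPRET hardware). */
typedef struct {
    uint64_t now_ns;       /* free-running 64-bit nanosecond counter */
    uint32_t latched_high; /* upper word captured by the last GTL */
} clock_model;

/* Model of GTL: return the low word and latch the matching high word. */
uint32_t model_gtl(clock_model *clk)
{
    clk->latched_high = (uint32_t)(clk->now_ns >> 32);
    return (uint32_t)(clk->now_ns & 0xFFFFFFFFu);
}

/* Model of GTH: return the high word latched by the preceding GTL. */
uint32_t model_gth(const clock_model *clk)
{
    return clk->latched_high;
}

/* Combine the two 32-bit halves into one 64-bit timestamp. */
uint64_t combine_time(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

Because the high word is captured at the moment GTL executes, a low-word rollover between the two reads cannot produce a timestamp that mixes an old low word with a new high word.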

To provide a lower bound on the execution time for a code fragment, the RISC-V ISA is extended with a delay until instruction DU r1,r2, where r1 is the higher 32 bits and r2 is the lower 32 bits of an absolute time value. Semantically, the thread is delayed (replays this instruction) until the current time becomes larger or equal to the time value specified by r1 and r2. However, in contrast to previous processors supporting timing instructions (e.g., PTARM [14], [18]), the clock cycles are not wasted, but can instead be utilized for other SRTTs.

To provide an upper bound on execution time without constantly polling, a task needs to be interrupted. Instruction exception on expire EE r1,r2 enables a timer exception that is executed when the current time exceeds r1,r2. The jump address is specified by setting a control register with MTPCR (move to program control register). Only one exception per thread can be active at any point in time; nested exceptions must be implemented in software. The instruction deactivate exception on expire DE deactivates the timer exception.

Exception on expire can be used for many purposes, such as detecting and handling a deadline miss, implementing a preemptive scheduler, or performing timed I/O. By first issuing an exception on expire and then executing a new thread sleep TS instruction, the clock cycles for the sleeping thread can be utilized by other active SRTTs. Another use of exception on expire is for anytime algorithms, that is, algorithms that can be interrupted at any point in time and return a better solution the longer they are executed.

E. Memory Hierarchy

For spatial isolation between threads, FlexPRET allows threads to read anywhere in memory, but only write to certain regions. The regions are specified by control registers that can only be set by a thread in supervisory mode with MTPCR. Virtual memory is a standard and suitable approach, but FlexPRET currently uses a different scheme for simplicity. There is one control register for the upper address of a shared region (which starts at the bottom of data memory) and two control registers per thread for the lower and upper addresses of a thread-specific region. Memory is divided into 1kB regions, and a write only succeeds if the address is within the shared or thread-specific region. By specifying all thread-specific regions and the shared region to be disjoint, each thread will have both private memory and access to shared memory.
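The write check described above can be sketched in C as follows. The struct layout and names are hypothetical; the sketch further assumes that the control registers hold region indices, that upper bounds are exclusive, and that data memory starts at address 0, none of which is stated in the excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

#define REGION_SIZE 1024u /* memory is divided into 1 kB regions */

/* Hypothetical register contents for one thread (region indices,
 * upper bounds exclusive; both are assumptions of this sketch). */
typedef struct {
    uint32_t shared_upper; /* shared region covers regions [0, shared_upper) */
    uint32_t thread_lower; /* thread region covers [thread_lower, thread_upper) */
    uint32_t thread_upper;
} write_bounds;

/* A write succeeds only inside the shared or the thread-specific region. */
bool write_allowed(const write_bounds *b, uint32_t addr)
{
    uint32_t region = addr / REGION_SIZE;
    bool in_shared = region < b->shared_upper;
    bool in_thread = region >= b->thread_lower && region < b->thread_upper;
    return in_shared || in_thread;
}
```

With a shared region of two regions and a thread-specific region covering regions 4 and 5, writes to addresses 0 to 2047 and 4096 to 6143 succeed, and all other writes are rejected.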

For timing predictability, FlexPRET uses scratchpad memories [22]. These are local memories that have a separate address space from main memory and are explicitly controlled by software; all valid memory accesses always succeed and are single cycle, unlike caches where execution time depends on cache state. There is active research in scratchpad memory management techniques to reduce WCET [23]. Instructions are stored in instruction scratchpad memory (I-SPM) and data is stored separately in data scratchpad memory (D-SPM). Scratchpad memories are not required; caches could be used instead if the reduction in fine-grained predictability is acceptable. We envision a hybrid approach where HRTT tasks use scratchpads and SRTTs use caches for future versions of FlexPRET.

F. Programming, Compilation, and Timing Analysis

FlexPRET can be programmed using low level programming languages, such as C, that are augmented with constructs for expressing temporal semantics. FlexPRET can be an integral part of a precision timed infrastructure [24] that includes languages and compilers with a ubiquitous notion of time. Such a complete infrastructure with timing-aware compilers is outside the scope of this paper; instead, we use a RISC-V port of the gcc compiler and implement the new timing instructions using inline assembly. The following code fragment illustrates how a simple periodic control loop can be implemented.

    1 int h,l;                  // High and low 32-bit values
    2 get_time(h,l);            // Current time in nanoseconds
    3 while(1){                 // Repeat control loop forever
    4   add_ms(h,l,10);         // Add 10 milliseconds
    5   exception_on_expire(h,l,missed_deadline_handler);
    6   compute_task();         // Sense, compute, and actuate
    7   deactivate_exception(); // Deadline met
    8   delay_until(h,l);       // Delay until next period
    9 }

Before the control loop is executed, the current time (in nanoseconds) is stored in variables h and l (line 2). The time is incremented by 10ms (line 4) and a timer exception is enabled (line 5), followed by task execution (line 6). If a deadline is missed, an exception handler missed_deadline_handler is called. To force a lower bound on the timing loop, the execution is delayed until the time period has elapsed (line 8); the cycles during the delay can be used by an active SRTT. Functions get_time, exception_on_expire, deactivate_exception, and delay_until implement the new RISC-V timing instructions using inline assembly.
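The excerpt does not show how add_ms updates the two 32-bit words. One possible way to do it, sketched below, is a 32-bit addition with carry propagation. The function `add_ms_words` is a hypothetical stand-in: it takes pointers and unsigned words, whereas the paper's add_ms is called with plain variables and is presumably a macro.

```c
#include <stdint.h>

/* Hypothetical helper: add `ms` milliseconds to a nanosecond timestamp
 * that is held as two 32-bit words (high and low). */
void add_ms_words(uint32_t *high, uint32_t *low, uint32_t ms)
{
    uint64_t delta_ns = (uint64_t)ms * 1000000u;  /* 1 ms = 1,000,000 ns */
    uint32_t delta_low = (uint32_t)delta_ns;
    uint32_t delta_high = (uint32_t)(delta_ns >> 32);
    uint32_t old_low = *low;

    *low = old_low + delta_low;          /* unsigned add wraps modulo 2^32 */
    uint32_t carry = (*low < old_low);   /* wrapped iff the result got smaller */
    *high = *high + delta_high + carry;
}
```

Adding the period to the previous deadline, instead of re-reading the clock in every iteration, keeps the loop period free of accumulated drift, which matches the structure of the loop in the excerpt.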

To have full control over timing, real-time applications can be implemented as bare-metal software, using only lightweight libraries for hardware interaction. As a scheduling design methodology, we propose that tasks with the highest criticality level (e.g. A in DO-178C [4]) are assigned individual HRTTs, thus providing both temporal and spatial isolation. The next-highest criticality level tasks (e.g. B in DO-178C) also use HRTTs, but several tasks can share the same thread, thus reducing the hardware enforced isolation. Lower criticality tasks (e.g. C, D, E in DO-178C) can then share SRTTs

•  Currently using a GCC port for RISC-V when compiling programs with C inline assembly macros.

•  Work-in-progress on an LLVM-based WCET-aware compiler

1-2: Get time in nanoseconds (64 bits)

5: Add an exception handler (immediate detection of missed deadline)

6: Compute

7-8: Deactivate and delay (force lower bound)

NOTE: The delay until (DU) instruction is used for cycle stealing





Software Managed Multicores

WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores

Yooseong Kim*†, David Broman*‡, Jian Cai†, and Aviral Shrivastava*†

* University of California, Berkeley, {yooseongkim, davbr, aviral}@berkeley.edu
† Arizona State University, {yooseong.kim, jian.cai, aviral.shrivastava}@asu.edu
‡ Linköping University, [email protected]

Abstract: Software Managed Multicore (SMM) architectures have advantageous scalability, power efficiency, and predictability characteristics, making SMM particularly promising for real-time systems. In SMM architectures, each core can only access its scratchpad memory (SPM); any access to main memory is done explicitly by DMA instructions. As a consequence, dynamic code management techniques are essential for loading program code from the main memory to SPM. Current state-of-the-art dynamic code management techniques for SMM architectures are, however, optimized for average-case execution time, not worst-case execution time (WCET), which is vital for hard real-time systems. In this paper, we present two novel WCET-aware dynamic SPM code management techniques for SMM architectures. The first technique is optimal and based on integer linear programming (ILP), whereas the second technique is a heuristic that is sub-optimal, but scalable. Experimental results with benchmarks from Mälardalen WCET suite and MiBench suite show that our ILP solution can reduce the WCET estimates up to 80% compared to previous techniques. Furthermore, our heuristic can, for most benchmarks, find the same optimal mappings within one second on a 2GHz dual core machine.

I. INTRODUCTION

In real-time [1] and cyber-physical [2] systems, timing is a correctness criterion, not just a performance factor. Execution of program tasks must be completed within certain timing constraints, often referred to as deadlines. When real-time systems are used in safety-critical applications, such as automobiles or aircraft, missing a deadline can cause devastating, life-threatening consequences. Computing safe upper bounds of a task’s worst-case execution time (WCET) is essential to guarantee the absence of missed deadlines.

Real-time systems are becoming more and more complex with increasing performance demands. Performance improvements in recent processor designs have mainly been driven by the multicore paradigm because of power and temperature limitations with single-core designs [3]. Some recent real-time systems architectures are moving towards multicore [4] or multithreaded [5], [6] designs. However, coherent caches, which are popular in traditional multicore platforms, are not a good fit for real-time systems. Coherent caches make WCET analysis difficult and result in pessimistic WCET estimates [7].

This work was supported in part by the iCyPhy Research Center (Industrial Cyber-Physical Systems, supported by IBM and United Technologies), the Swedish Research Council (#623-2011-955), and the Center for Hybrid and Embedded Software Systems (CHESS) at UC Berkeley (supported by the National Science Foundation, NSF awards #0720882 (CSR-EHS: PRET), #1035672 (CPS: Medium: Timing Centric Software), and #0931843 (ActionWebs), the Naval Research Laboratory (NRL #N0013-12-1-G015), and the following companies: Bosch, National Instruments, and Toyota).

[Figure: (a) four cores, each with its own SPM and DMA unit, connected to main memory; (b) a single core with SPM and DMA unit, connected to main memory.]

Fig. 1. (a) SMM architecture vs. (b) traditional architecture with SPM. Cores cannot access main memory directly in SMM architecture. All code and data must be present in SPM at the time of execution.

SMM (Software Managed Multicore) architectures [8], [9] are a promising alternative for real-time systems. In SMM, each core has a scratchpad memory (SPM), so-called local memory, as shown in Fig. 1(a). A core can only access its SPM in an SMM architecture, as opposed to the traditional architecture in Fig. 1(b) where a core can access both main memory and SPM with different latencies. Accesses to the main memory must be done explicitly through the use of direct memory access (DMA) instructions. The absence of coherency makes such architectures scalable and simpler to design and verify compared to traditional multicore architectures [3]. An example of an SMM architecture is the Cell processor that is used in Playstation 3 [10].

If all code and data of a task can fit in the SPM, the timing model of memory accesses is trivial: each load and store always take a constant number of clock cycles. However, if all code or data does not fit in the SPM, it must be dynamically managed by executing DMA instructions during runtime. Dynamic code management strongly affects timing and must consequently be an integral part of WCET analysis.

In traditional architectures that have SPMs, cores can directly access main memory, though it takes a longer time to access main memory than the SPM. In such architectures, the question is what to bring in the SPM to reduce the WCET of a task. This approach is not, however, feasible in SMM architectures because all relevant code must be present in the SPM at the time of execution. For this reason, existing WCET-aware dynamic code management techniques for SPMs [11], [12], which select part of the code to be loaded in the SPM and keep the rest in the main memory, are not applicable in SMM architecture.

There exists previous work on developing dynamic code

This is the author-prepared accepted version. © 2014 IEEE. The published version is: Yooseong Kim, David Broman, Jian Cai, and Aviral Shrivastava. WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores. Proceedings of the 20th IEEE Real-Time and Embedded Technology and Application Symposium (RTAS), Berlin, Germany, April 15-17, 2014.

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.



Traditional use of SPM: Static allocation (partitioning) and direct access to main memory.

Software Managed Multicore (SMM): Only access to SPM. Need DMA.

Examples:
•  Cell processor
•  FlexPRET

In FlexPRET, HRTT can only access scratchpad memory (SPM) directly.

Problem: How can we dynamically load code from the main memory to SPM such that WCET is minimized?





WCET-Aware Scratchpad Allocation: main idea

[Figure: function-to-region mapping. Functions F1, F2, …, FM reside in main memory and are mapped to regions R1, R2, R3, …, RN of the SPM.]

N: number of regions in SPM.

M: number of functions.

Task 1: Given a function-to-region mapping, compute WCET

Task 2: Find an optimal mapping that minimizes WCET

Contribution:
•  Formalized an optimal solution using ILP
•  Developed a scalable, but sub-optimal heuristic
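A function-to-region mapping can be represented as an array that stores a region index for each function. The sketch below only checks how much SPM a given mapping needs; it assumes that each region must be as large as the largest function mapped to it, which is an assumption of this example and not something stated on the slide. It does not compute WCET or search for an optimal mapping, and all names are hypothetical.

```c
#include <stddef.h>

/*
 * SPM bytes required by a mapping, assuming each region is sized to
 * the largest function mapped to it.
 * func_size[f] is the code size of function f in bytes.
 * region_of[f] is the region index of function f.
 * Precondition: every region_of[f] is less than num_regions.
 */
size_t spm_bytes_needed(const size_t func_size[], const size_t region_of[],
                        size_t num_funcs, size_t num_regions)
{
    size_t total = 0;
    for (size_t r = 0; r < num_regions; r++) {
        size_t largest = 0;
        for (size_t f = 0; f < num_funcs; f++) {
            if (region_of[f] == r && func_size[f] > largest) {
                largest = func_size[f];
            }
        }
        total += largest;  /* a region without functions contributes 0 */
    }
    return total;
}
```

Functions that share a region replace each other at run time, so a mapping trades SPM space against the number of DMA reloads; that trade-off is what a WCET-aware mapping has to optimize.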





More info on this topic

Michael Zimmer, David Broman, Chris Shaver, and Edward A. Lee. FlexPRET: A Processor Platform for Mixed-Criticality Systems. Proceedings of the 20th IEEE Real-Time and Embedded Technology and Application Symposium (RTAS), Berlin, Germany, April 15-17, 2014.

Yooseong Kim, David Broman, Jian Cai, and Aviral Shrivastava. WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores. Proceedings of the 20th IEEE Real-Time and Embedded Technology and Application Symposium (RTAS), Berlin, Germany, April 15-17, 2014.


Conclusions





Some key takeaway points:

•  Time-aware systems are systems where time or timing affects the correctness of the system behavior.

•  Cyber-physical systems (CPS) are time-aware, but systems without physical plants can also be time-aware (e.g., distributed time-stamped systems).

•  Overall objective: Develop a new methodology, algorithms, and a tool chain that are time-aware and use a correct-by-construction approach.

•  Mixed-criticality systems can be designed using predictable processors.

Thanks for listening!
