Public

FP7-ICT-2009-4 (247999) COMPLEX

COdesign and power Management in PLatform-based design space EXploration

Project Duration: 2009-12-01 – 2013-03-31
Type: IP

WP no.: WP6
Deliverable no.: D6.2.5
Lead participant: OFFIS

Final publishable summary report

Prepared by: Kim Grüttner (OFFIS), all contributors
Issued by: OFFIS
Document Number/Rev.: COMPLEX/OFFIS/R/D6.2.5/1.0
Classification: COMPLEX Public
Submission Date: 2013-05-08
Due Date: 2013-03-31

Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)

© Copyright 2013 OFFIS e.V., STMicroelectronics srl., STMicroelectronics Beijing R&D Inc, Thales Communications SA, GMV Aerospace and Defence SA, SNPS Belgium NV, EDALab srl, Magillem Design Services SAS, Politecnico di Milano, Universidad de Cantabria, Politecnico di Torino, Interuniversitair Micro-Electronica Centrum vzw, European Electronic Chips & Systems design Initiative.

This document may be copied freely for use in the public domain. Sections of it may be copied provided that acknowledgement is given of this original work. No responsibility is assumed by COMPLEX or its members for any application or design, nor for any infringements of patents or rights of others which may result from the use of this document.


History of Changes

ED. REV. DATE PAGES REASON FOR CHANGES

KG 1.0 2013-05-06 45 Final document


Contents

1 Executive summary
2 Summary description of project context and objectives
2.1 Motivation and context
2.2 Objectives
3 Description of the main S&T results/foregrounds
3.1 MDA Design Entry
3.1.1 Viewpoints
3.1.2 Component-based design
3.1.3 Use-cases, scenarios, verification
3.1.4 Support of Design-Space Exploration
3.1.5 MDE tools and flow
3.2 Executable Specification
3.2.1 Executable application model
3.2.2 System input stimuli
3.2.3 User constrained HW/SW separation & mapping
3.2.4 Architecture/Platform description
3.3 Estimation & Model Generation
3.3.1 Hardware/Software task separation
3.3.2 Custom Hardware estimation
3.3.3 Software estimation
3.3.4 Pre-existing IP & virtual component models
3.3.5 Virtual system generation
3.4 Simulation
3.4.1 Pre-optimized power controller
3.4.2 Timing & power aware executable virtual system prototype in SystemC
3.5 Exploration & Optimization
3.5.1 Simulation trace
3.5.2 Exploration & optimization
3.5.3 Design space definition
3.5.4 Design space instance parameters
4 Potential impact and the main dissemination activities and exploitation of results
4.1 Exploitable Use-Cases
4.1.1 Use-Case 1 – Networked embedded system
4.1.2 Use-Case 2 – Battery powered multi-core system
4.1.3 Use-Case 3 – Model-based space application
4.2 Socio-economic impact and the wider societal implications of the project
4.3 Main dissemination activities
5 Contact information
6 References


1 Executive summary

Power consumption and its management are increasingly important considerations in the design of embedded devices. At present, however, it is difficult to take power information into account as early as the platform exploration phase. In this integrated project, integrated device manufacturers, system integrators, Electronic Design Automation (EDA) vendors and research partners worked together to solve the power and complexity challenges posed by the design of today's heterogeneous HW/SW systems.

The main objective of the COMPLEX project was to increase the competitiveness of the European semiconductor, system integrator and EDA industry by addressing the problem of platform-based design space exploration (DSE) under power and performance constraints early in the design process. High performance usually causes high power consumption. A main challenge in today’s embedded system design is therefore to find the right balance between performance and power. This balance cannot be found efficiently and at high quality, because until now no generic framework has been available for accurately and jointly estimating performance and power consumption starting at the algorithmic level.

As a result, we propose a reference framework and design flow concept that combines system-level power optimization techniques with platform-based rapid prototyping. Virtual executable prototypes are generated from MARTE/UML and functional C/C++ descriptions; these prototypes then allow different platforms, mapping alternatives, and power management strategies to be studied.

Our proposed flow combines system-level timing and power estimation techniques available in commercial tools with platform-based rapid prototyping. We propose an efficient code annotation technique for timing and power properties that enables fast host execution as well as adaptive collection of power traces. Combined with a flexible design-space exploration (DSE) approach, our flow allows a trade-off analysis between different platforms, mapping alternatives, and optimization techniques, based on domain-specific workload scenarios.


2 Summary description of project context and objectives

2.1 Motivation and context

High performance usually causes high power consumption. In embedded system design in particular, one main objective today is to find the right balance between performance and power for a given design. One strategy is to optimise for performance wherever speed is absolutely needed and to design everything else for low power. This helps to achieve the ultimate goal of handheld embedded devices: ensuring the required performance with the longest possible battery life.

Figure 1 shows the key challenges for the embedded mobile device industry. The performance growth of mobile phones per generation, from analogue (1G) through GSM (2G) to high bandwidth (3G and 4G), clearly exceeds Moore’s law of technology development. Instead, it follows the much faster Shannon’s law of application development, which predicts a doubling of complexity every 8.5 months. For more than 30 years, the integration density of integrated circuits has approximately doubled every 18 months (Moore’s Law). In contrast, battery makers need 5 to 10 years to achieve a comparable increase in power density, and memory access time performance doubles only every 12 years.
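To make the size of these gaps concrete, the doubling periods quoted above can be turned into growth factors over a fixed time span. The short Python sketch below is only an illustration of that arithmetic; the function and dictionary names are our own, and the 10-year figure for batteries is the pessimistic end of the range given in the text.

```python
def growth_factor(years, doubling_period_years):
    """Growth of a quantity over `years` if it doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Doubling periods quoted in the text, converted to years.
DOUBLING_PERIODS_YEARS = {
    "application complexity": 8.5 / 12,   # "Shannon's law": 2x in 8.5 months
    "integration density": 18 / 12,       # Moore's law: 2x in 18 months
    "battery power density": 10.0,        # assumption: upper end of "5 to 10 years"
    "memory access performance": 12.0,    # 2x in 12 years
}


def growth_table(years):
    """Growth factor of every listed quantity over the same time span."""
    return {
        name: growth_factor(years, period)
        for name, period in DOUBLING_PERIODS_YEARS.items()
    }


decade = growth_table(10)
for name, factor in decade.items():
    print(f"{name}: x{factor:,.1f} in 10 years")
```

Over one decade the application complexity grows by a factor on the order of 10^4, the integration density by a factor on the order of 10^2, and battery and memory access performance by a factor of at most 2, which is the widening gap that Figure 1 depicts.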

Figure 1: Key Technology Gaps (source: Jan M. Rabaey)

[Figure: performance on a logarithmic axis (1 to 10,000,000) plotted over the years 1980 to 2020 and annotated with the mobile phone generations 1G, 2G, 2.5G, 3G and 4G. Curves: Shannon’s law (2x in 8.5 months), Moore’s law (2x in 18 months), memory access time (2x in 12 years), and Eveready’s law for battery energy density (2x in 10 years). The gaps between the curves are labelled algorithmic complexity, CPU-memory bandwidth, and power reduction.]

The gaps in this figure define the challenges that the industry is facing: (1) the algorithmic complexity gap, (2) the microprocessor and memory bandwidth gap, and (3) the power reduction gap. System design methodologies and EDA support have always had to evolve to keep up with rapid technological advances. Without appropriate design methodologies and tool support, the design gaps grow larger. In the COMPLEX project we addressed all three gaps:

• We observe a rising complexity of applications and execution platforms. The gap between these complexities increases the uncertainty of platform selection and of application-to-platform mapping, and requires EDA tools and methods that are fast enough to cope with the sheer size of recent algorithms.

• Since the power reduction gap is the next main limiting factor, a balance between performance and power needs to be found early in the design process. This can only be done with explicit consideration of the application. To ensure sufficient power control, tools and methods have to be not only fast enough, but also accurate in prediction and efficient in optimization.

• The memory access time gap requires smarter memory organization. Since this influences both power and performance, a multi-objective design space exploration is required.

In custom hardware design, advances in performance and power consumption have been mainly driven by new technologies and by an evolution in design methodology. The latter was achieved through several important steps up in the level of abstraction of the design entry. Each of these steps gave hardware design productivity a great boost and made it possible to manage the steadily growing complexity of integrated circuits. Software processing units such as microcontrollers, SIMD processors and DSPs have also become more efficient. Modern platforms support advanced power management capabilities, including dynamic frequency scaling (sometimes independent for the processing and memory subsystems) and power islands, which need to be controlled effectively to maximise the optimization potential.
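Why frequency scaling offers optimization potential can be seen from the standard first-order model of dynamic CMOS power, P = a · C · V² · f. The Python sketch below uses this textbook model, not a COMPLEX power model, and ignores static (leakage) power. The operating points, the switched capacitance and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One voltage/frequency pair of a scalable core (illustrative values only)."""
    voltage_v: float
    frequency_hz: float


def dynamic_power_w(op, switched_capacitance_f, activity=1.0):
    # First-order dynamic power model: P = a * C * V^2 * f (leakage is ignored).
    return activity * switched_capacitance_f * op.voltage_v ** 2 * op.frequency_hz


def run_task(op, cycles, switched_capacitance_f):
    """Return (execution time in s, dynamic energy in J) for a task of `cycles` cycles."""
    time_s = cycles / op.frequency_hz
    return time_s, dynamic_power_w(op, switched_capacitance_f) * time_s


def pick_operating_point(points, cycles, deadline_s, switched_capacitance_f):
    """Lowest-energy operating point that still meets the deadline, or None."""
    best = None
    for op in points:
        time_s, energy_j = run_task(op, cycles, switched_capacitance_f)
        if time_s <= deadline_s and (best is None or energy_j < best[0]):
            best = (energy_j, op)
    return None if best is None else best[1]


# Hypothetical operating points: lower frequency permits lower supply voltage.
POINTS = [
    OperatingPoint(1.2, 1.0e9),
    OperatingPoint(1.0, 6.0e8),
    OperatingPoint(0.8, 3.0e8),
]
```

In this model the energy of a task with a fixed cycle count depends only on V², so running at the slowest operating point that still meets the deadline minimises dynamic energy. A real power controller also has to account for leakage and mode-switching overheads, which this sketch leaves out.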

For these reasons it was high time to initiate the next ground-breaking step and address the formulated challenges in a holistic approach. This calls for a new design entry at system level in which hardware and software are described in the same way. The latest trends in software engineering and hardware design have some commonalities that might not be apparent at first glance. In both the hardware and the software world we observe the introduction of methodologies that separate the functionality or algorithm from the concrete implementation platform. The HW world calls this evolution ESL (Electronic System Level); the SW world calls it model-based design or MDE (Model-Driven Engineering), with Model-Driven Architecture (MDA). Both follow the Y-chart approach, which separates the functionality (the "what") from the implementation platform (the "how").

The availability of this new design entry, in conjunction with the traditional bottom-up approach, defines a new viewpoint for the design of embedded systems: how to find a mapping and implementation of an application onto an execution platform that fulfils all functional and extra-functional requirements at minimal cost. To avoid expensive redesigns and costly code modifications, the platform decision should be made before money is invested in a concrete target platform. For this purpose, reliable information about the execution behaviour of the application running on the platform, in terms of functionality, performance and power consumption, is absolutely mandatory.

Until now no generic framework for accurately and jointly estimating the performance and power consumption of complete embedded systems has been available at the algorithmic level. Available point tools and predesigned system components need to be bundled and properly integrated into a holistic framework for platform-based design space exploration:

• Behavioural synthesis for the generation of custom hardware from algorithmic descriptions.

• Dedicated embedded SW compilers for the generation of executables from algorithmic descriptions.

• A portfolio of HW intellectual property, ranging from microprocessors, DSPs, memories, on-chip communication structures and communication peripherals to domain-specific accelerators.

• Virtual platforms for early system simulation, analysis and integration of HW and SW components without using costly test chips.

The motivation of COMPLEX was to build a framework on top of these assets that supports a software-like design entry, integrates platforms and software development tool-chains from different European providers, and incorporates European EDA tools and know-how in the area of power and timing estimation of HW, SW and run-time power management. The project outcome is threefold: the connection of this framework to the next-generation system specification and design methodology; the automatic generation of an efficient executable virtual system that gives accurate and reliable timing and power information; and the integration of an automatic design space exploration for finding the optimal design space instance parameters.

2.2 Objectives

The primary scientific and technical objective of COMPLEX was to develop an innovative, highly efficient and productive design methodology and a holistic framework for iteratively exploring the design space of embedded HW/SW systems. This objective has a strategic dimension, since platform providers, EDA providers and system integrators alike would benefit from this framework. These companies are seeking short- to mid-term consolidation and growth of their market shares in business sectors such as telecom, consumer and automotive electronics, in which the European industry holds world-wide technical excellence and commercial leadership.

The R&D activities performed in COMPLEX targeted new modelling and specification methodologies that use a software-like MDA design entry for system design, the integration of HW and SW timing and power estimation into an efficient virtual system simulation, and multi-objective design-space exploration that takes run-time management for power and performance optimization into account. Specifically, the scientific and technical results of the project are:

• A highly efficient and productive design methodology and holistic framework for iterative design space exploration of embedded HW/SW systems. The resulting framework is independent of platform vendor and application domain, provides open interfaces, and enables the integration of new industry players.

• The combination and augmentation of well-established ESL synthesis and analysis tools into a seamless design flow that enables performance- and power-aware virtual prototyping from a combined HW/SW perspective.

• The proposal of an interface to the next-generation model-driven SW design approach (MARTE/UML) and to the industry-standard Matlab/Simulink model-based design environment. This seamless design entry lowers existing barriers between HW and SW developers, giving SW designers more influence on the exploration of the HW platform.

• Multi-objective co-exploration to assess the design quality and to optimize the system platform with respect to performance and power.

• Fast simulation and assessment of the platform at ESL, with up to bus-cycle accuracy, at the earliest instant in the design iteration.

• Optimization benefits from run-time mode adaptation techniques, such as dynamic power management or application adaptation to varying workloads.

• Demonstration of the accuracy and ease of integration of existing EDA tools within the new methodology and framework by comparison with state-of-the-art reference methodologies.

• Demonstration of the applicability and effectiveness of the new methodology and framework through validation against measured data and/or available power- and performance-characterized virtual platforms.

• Demonstration of the usability and effectiveness of the new design methodologies, tools and framework by their application to industry-strength design cases made available by some of the project partners.
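The multi-objective co-exploration named in the list above has no single best design when performance and power conflict. The usual way to assess such results is to keep only the Pareto-optimal design points. The following Python sketch illustrates this generic technique under our own assumptions: the two objectives (latency and energy, both minimised), the class name and the example values are hypothetical, and the sketch does not reproduce the algorithm of any COMPLEX tool.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    """One evaluated design alternative; both objectives are to be minimised."""
    name: str
    latency_ms: float
    energy_mj: float


def dominates(a, b):
    """True if `a` is no worse than `b` in both objectives and better in at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.energy_mj <= b.energy_mj
    strictly_better = a.latency_ms < b.latency_ms or a.energy_mj < b.energy_mj
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical evaluation results of four platform/mapping alternatives.
CANDIDATES = [
    DesignPoint("A", latency_ms=10.0, energy_mj=50.0),
    DesignPoint("B", latency_ms=20.0, energy_mj=30.0),
    DesignPoint("C", latency_ms=25.0, energy_mj=35.0),  # slower and hungrier than B
    DesignPoint("D", latency_ms=40.0, energy_mj=10.0),
]

front = pareto_front(CANDIDATES)
```

The remaining points represent the actual trade-off curve between performance and power; choosing among them is a design decision that depends on the constraints of the application.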

The distinguishing feature of the R&D approach of COMPLEX is that it unifies the development and integration of a next-generation MDA design entry with platform-based design, with existing EDA techniques and tools for estimation and model generation for virtual system prototypes, and with a multi-objective design-space exploration technique and tool. This creates synergies and yields a holistic approach to embedded HW/SW virtual system prototyping, regardless of the target platform and application domain.

The COMPLEX design framework has been developed by research, industry and EDA partners, ensuring its usability in realistic, industry-strength design flows and environments. This allowed the industrial partners to take advantage of the new solutions during the course of the project and to apply the new tools for production purposes shortly after the end of the project.

The technical objectives highlighted above constitute a prerequisite for the commercial targets of the industrial partners, which are geared towards improving their own (and their customers’) competitiveness in the world-wide market of electronic products and applications.

One additional objective of COMPLEX has been the Europe-wide dissemination of the valuable know-how and competence that each partner has acquired through the R&D effort of all the researchers, designers, application engineers and EDA engineers participating in the project.


3 Description of the main S&T results/foregrounds

The design framework proposed in the COMPLEX project is illustrated in Figure 2. As described in the motivation and objectives above, we follow the platform-based design (PBD) approach with a separation of application (a) (e), architecture (d) (g), and mapping description (c). The architecture/platform consists of pre-existing IP components such as processors, buses, hardware accelerators and memories, while the application describes how these resources are used to implement a certain system functionality. To cover different domain-specific application workload scenarios, specified as use-cases (b), we propose to generate a system input stimuli specification (f) that triggers the executable system model.
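The separation of application, architecture and mapping can be pictured as three independent data structures, where only the mapping refers to the other two. The Python sketch below is a minimal illustration of that idea. All class, field and component names are hypothetical and do not correspond to the COMPLEX file formats or tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A unit of application functionality (platform independent)."""
    name: str


@dataclass(frozen=True)
class Resource:
    """A pre-existing platform component."""
    name: str
    kind: str  # assumed kinds: "cpu", "hw_accelerator", "memory", "bus"


@dataclass
class SystemModel:
    application: list   # list of Task
    platform: list      # list of Resource
    mapping: dict       # task name -> resource name

    def validation_errors(self):
        """Report unmapped tasks and mappings onto non-executing resources."""
        task_names = {t.name for t in self.application}
        executors = {r.name for r in self.platform
                     if r.kind in ("cpu", "hw_accelerator")}
        errors = []
        for name in sorted(task_names - set(self.mapping)):
            errors.append(f"task '{name}' is not mapped")
        for task, resource in sorted(self.mapping.items()):
            if task not in task_names:
                errors.append(f"mapping refers to unknown task '{task}'")
            if resource not in executors:
                errors.append(f"task '{task}' is mapped to '{resource}', "
                              "which cannot execute tasks")
        return errors

    def remapped(self, task, resource):
        """New model with one task moved; application and platform are reused as-is."""
        return SystemModel(self.application, self.platform,
                           {**self.mapping, task: resource})


APPLICATION = [Task("decode"), Task("filter")]
PLATFORM = [Resource("cpu0", "cpu"), Resource("acc0", "hw_accelerator"),
            Resource("ram0", "memory")]
baseline = SystemModel(APPLICATION, PLATFORM, {"decode": "cpu0", "filter": "cpu0"})
alternative = baseline.remapped("filter", "acc0")
```

Because a mapping alternative is created without touching the application or the platform description, many alternatives can be evaluated against the same functional code, which is what makes the exploration described in the following sections practical.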

Figure 2: The COMPLEX Reference Framework

The most important property of the proposed framework is that timing and power characterisation is separated from application specification and development. This separation allows platform providers to offer timing- and power-characterized virtual platform component models (IPs) (k). Together with the estimated custom HW (i) and SW components (j), a timing- and power-aware executable virtual system prototype (n) can be generated.

Based on the simulation trace (o) obtained from executing the generated platform model, analysis tools (p) can generate either a report or a visualization of the power consumption per system component over time (q). Metrics applied to the trace drive an automatic or semi-automatic exploration and optimization process (r) that modifies different design parameters (t) in a pre-defined design space (s). These parameters can be applied to the MDA design entry model, the executable SystemC model, or the estimation and model generation tools.
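As an illustration of how a metric can be derived from such a trace, the Python sketch below integrates power over time per component. It assumes a deliberately simple, hypothetical trace layout of `(time in s, component, power in W)` samples in which a power value holds until the next sample of the same component. The actual COMPLEX trace format is not described in this section and may differ.

```python
from collections import defaultdict


def energy_per_component(trace, end_time_s):
    """Integrate piecewise-constant power samples into energy (J) per component.

    `trace` is an iterable of (time_s, component, power_w) tuples; each power
    value is assumed to hold until the component's next sample or `end_time_s`.
    """
    samples_by_component = defaultdict(list)
    for time_s, component, power_w in trace:
        samples_by_component[component].append((time_s, power_w))

    energy_j = {}
    for component, samples in samples_by_component.items():
        samples.sort()
        following = samples[1:] + [(end_time_s, 0.0)]
        energy_j[component] = sum(
            power_w * (next_time_s - time_s)
            for (time_s, power_w), (next_time_s, _) in zip(samples, following)
        )
    return energy_j


def average_power_w(trace, end_time_s):
    """A simple global metric: total energy divided by the simulated time."""
    return sum(energy_per_component(trace, end_time_s).values()) / end_time_s


# Hypothetical trace: the CPU drops into a low-power mode after one second.
TRACE = [
    (0.0, "cpu", 0.5),
    (0.0, "mem", 0.2),
    (1.0, "cpu", 0.1),
]

energy = energy_per_component(TRACE, end_time_s=3.0)
```

A scalar metric such as the average power can then be handed to the exploration process as one of the objectives to minimise.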

The following sections give a more detailed description of the different phases of our proposed rapid prototyping framework. The definition of the application, stimuli and platform specification, and the definition of the tool interfaces, can be found in [1].

3.1 MDA Design Entry

The COMPLEX modelling entry is supported by the COMPLEX UML/MARTE modelling methodology [2][3], which includes a toolset fully integrated in the Eclipse framework [4]. This toolset automates the generation of the code from which the performance executable model is built. The UML/MARTE specification models both the system and the input stimuli environment.

Among all the features, the following are described in the next paragraphs:

• Viewpoints for the separation of functional and extra-functional concerns

• Component-based design approach

• Explicit support for Design-Space Exploration (DSE)

3.1.1 Viewpoints

The COMPLEX UML/MARTE methodology enables the different facets of the system to be specified in different model viewpoints. The COMPLEX model viewpoints are the Data Model View, the Functional View, the Communications and Concurrency (CC) View, the Platform View, the Architectural View, and the Verification View. These views enable separation of concerns and thus raise the level of abstraction, as each view focuses on one specific aspect of the system. The COMPLEX UML/MARTE methodology also defines the relationships among these views and a workflow that guarantees these relationships. This enables the designer to build a concise model (avoiding redundancies and the coherence checks they would require) and enables a cooperative workflow in which the application and the platform can be captured in parallel.

The separation of concerns applies at several levels of the model design. The system (i.e. the application mapped onto platform resources) is separated from the environment. Within the system model, the platform specification (i.e. processing resources, operating system) is separated from the model of the application. Finally, within the application, data structures, functionality (interfaces and classes) and application components are also captured separately.

The extra-functional properties of the application are specified in the CC View, while the extra-functional properties of the platform are described in the Platform View. The Architectural View provides information about the allocation of application components onto platform components, and is also where the DSE parameters and metrics can be captured.


3.1.2 Component-based design

The COMPLEX UML/MARTE methodology also follows a component-based approach. At the application level, the designer encapsulates the functionality in application components in the CC View. A system application is captured as a component, and instances of application components are used to capture the application architecture. This component represents the Platform Independent Model (PIM) according to the MDA paradigm.

The platform architecture is captured in the Platform View by means of instances of SW and HW platform components, e.g. RTOS component instances and instances of HW processor components. This architecture represents the Platform Description Model (PDM) according to the MDA paradigm.

The component is the elemental unit used in the COMPLEX UML/MARTE methodology for deploying functions onto processing resources (e.g. microprocessors, FPGAs) in the Architectural View. For instance, application components can be mapped onto the platform components that represent processing resources.

3.1.3 Use-cases, scenarios, verification

Finally, the COMPLEX UML/MARTE methodology allows designers to specify the input stimuli in a separate view, the Verification View.

As mentioned, this view supports the specification of a set of environmental components and of how they connect to the system component. Using sequence diagrams, it describes the interactions among the environmental components and the system component as a sequence of ordered messages in the context of a use-case scenario. Timing information and ordering constraints among environment events are captured in the sequence diagrams, which enables the documentation and generation of realistic use cases. This is crucial for the dynamic performance estimation enabled by the executable model derived from the UML/MARTE system model.

The methodology enables the definition of multiple scenarios that represent different use cases of the system. Designers may choose any of the scenarios modelled in the Verification View to generate the performance executable model and explore the design space.
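The information content of such a scenario (timed messages between environment and system, plus ordering constraints) can be sketched in a few lines of Python. This is a hypothetical in-memory representation chosen for illustration only; it is neither the UML/MARTE sequence-diagram notation nor the output of the COMPLEX stimuli generator, and all message and component names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One message of a use-case scenario, stamped with its emission time."""
    time_ms: float
    sender: str
    receiver: str
    name: str


def ordered_stimuli(scenario):
    """Stimuli sequence for the executable model: messages sorted by time."""
    return sorted(scenario, key=lambda message: message.time_ms)


def violated_constraints(scenario, must_precede):
    """Return the (before, after) message-name pairs whose ordering is violated.

    A pair is checked on the first occurrence of each message; pairs that
    mention a message absent from the scenario are skipped.
    """
    first_time = {}
    for message in scenario:
        previous = first_time.get(message.name, message.time_ms)
        first_time[message.name] = min(previous, message.time_ms)
    return [
        (before, after)
        for before, after in must_precede
        if before in first_time and after in first_time
        and not first_time[before] < first_time[after]
    ]


# Hypothetical scenario with two environment components driving the system.
SCENARIO = [
    Message(5.0, "sensor", "system", "sample"),
    Message(0.0, "ground_station", "system", "start"),
    Message(9.0, "ground_station", "system", "stop"),
]

stimuli = ordered_stimuli(SCENARIO)
```

Selecting a different scenario then amounts to feeding a different message list to the same executable model.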

3.1.4 Support of Design-Space Exploration

The COMPLEX UML/MARTE methodology has been explicitly designed to support design space exploration. Specifically, the methodology supports the specification of a design space, i.e. a set of design solutions, rather than a single design. Such a design space is described by defining a set of architectural mappings (allocation space), a range of values for platform attributes (parameters of the space), and several platform architectures (architecture space).

To specify this design space, the methodology relies on the MARTE profile and proposes new stereotypes for the missing semantics (i.e. DSE and IP-XACT concepts), which are proposed as a necessary enhancement of the current capabilities of MARTE for embedded system design. Moreover, the COMPLEX UML/MARTE methodology also supports explicit constraints and rules that designers can use to limit the space of solutions to those that are of main interest.
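The three dimensions of the design space and the role of the rules can be illustrated with a small enumeration. The Python sketch below is a plain exhaustive enumeration under assumed parameter names and values; it only illustrates how rules prune the space and does not reflect the search strategies of an actual exploration tool.

```python
from itertools import product


def enumerate_design_space(parameters, rules):
    """Yield every parameter combination that satisfies all rules.

    `parameters` maps a parameter name to its list of allowed values;
    `rules` is a list of predicates over one design point (a dict).
    """
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        point = dict(zip(names, values))
        if all(rule(point) for rule in rules):
            yield point


# Hypothetical design space covering the three dimensions named in the text.
PARAMETERS = {
    "architecture": ["single_core", "dual_core"],   # architecture space
    "filter_allocation": ["cpu0", "cpu1"],          # allocation space
    "cpu_frequency_mhz": [100, 200, 400],           # platform attribute
    "cache_size_kb": [8, 16],                       # platform attribute
}

# Hypothetical rule: the second core only exists in the dual-core architecture.
RULES = [
    lambda p: not (p["filter_allocation"] == "cpu1"
                   and p["architecture"] == "single_core"),
]

valid_points = list(enumerate_design_space(PARAMETERS, RULES))
```

Here the unconstrained space has 2 · 2 · 3 · 2 = 24 points, of which the rule removes the 6 that allocate the task to a core that does not exist, leaving 18 points to evaluate.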


The methodology also supports the specification of system-local metrics. In contrast to global system metrics, such as total power consumption, system-local metrics depend on each specific system model. Examples are the latency for servicing a specific function of an application component, or the miss rate of the instruction cache of a given processor of the platform. This feature provides a capability of paramount importance: all aspects of the system that have some impact on its final performance (application and platform, SW and HW, architectures and architectural mappings, component attributes and different types of metrics) now form part of the DSE loop and can therefore be optimized at once, in a truly holistic approach.

3.1.5 MDE tools and flow

GMV and the University of Cantabria (UC) jointly developed the COMPLEX design framework based on UML/MARTE and Eclipse technologies. Both partners defined a methodology for UML/MARTE modelling oriented towards the support of design space exploration, together with a set of generators capable of mapping the information described in the model to the inputs required by the tools of the flow.

To evaluate the different design space points, the University of Cantabria has developed the SCoPE+ simulator. This simulator provides all the resources needed to simulate and evaluate the performance of the platform-independent code when executed under the different configurations described in the system model, including different allocations (HW and SW) and communication semantics.

All these tools have been integrated with the design space exploration tool MOST, developed by POLIMI, resulting in an automatic exploration framework that can be controlled from an Eclipse GUI, minimizing the effort and knowledge required from the user.

Finally, as part of the exercising of the COMPLEX methods and tools, GMV has implemented a space-domain use case to demonstrate the proposed COMPLEX solutions.

As is well known, performance and power consumption are critical requirements for on-board systems. The UML/MARTE evaluation framework therefore covers these goals within the scope of COMPLEX, managing them at different levels:

The Model Driven Engineering (MDE) methodology developed within COMPLEX by GMV

and UC provides techniques, methods and tools to model both the system functions and the

hardware platform. Moreover, it allows the specification of the system functionality allocation

to platform resources, enabling the exploration of different allocation schemes. The modelling

methodology offers to designers a set of advanced features specifically suited for enabling the

Design Space Exploration (DSE) which make possible to describe a complete space of

possible design solutions to be explored.

By means of UML/MARTE system models, designers and system architects are allowed to

model system functional and non-functional properties, as well as the requirements the system

must fulfil. Additionally, they can also define the system stimuli environment to enable the

later system simulation and performance and power estimation.

For that methodology, new modelling stereotypes have been defined in order to enable

designers to capture design space parameters used for defining different design alternatives

and to organize all the information in different, simpler views. These different design

alternatives define different HW resource characteristics (e.g. frequency), or different

application-HW/SW resources allocations.

According to the information captured in the COMPLEX UML/MARTE model, some code

generators have been implemented in order to extract all the relevant information, which

enables the execution of the DSE process.

In order to execute SCoPE+ simulations, the Eclipse generators create the system object files

and the XML files required as inputs. To compile the object files with the system code, the

generators create different wrappers to integrate and communicate the components, using the

macros provided by SCoPE+ for such purpose. At the same time, they create the Makefiles

required to compile and link these files together with the functional code provided by the user,

creating the executable files. Additionally, the “System Description” XML file is generated.

This file includes the definition of the different HW/SW components used to implement the

HW/SW platform, the description of the application components, and the mapping of these

application components onto the HW/SW platform resources.

For automatic exploration, a file describing the design space is also created. The “Design

Space” XML file includes the specification of all the DSE parameters and DSE rules required

for the DSE process. The DSE parameters are exploration variables which cover HW

characteristics (frequency, memory size…) and application-HW/SW resources allocation. The

DSE rules make it possible to constrain the possible combinations of these DSE parameters

according to logical expressions.
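
The interplay of DSE parameters and DSE rules can be illustrated with a minimal sketch. It does not reproduce the actual “Design Space” XML format or the MOST interface; the parameter names, value ranges and rules below are hypothetical and only show how logical expressions prune the set of parameter combinations.

```python
from itertools import product

# Hypothetical DSE parameters: names and value ranges are illustrative only.
parameters = {
    "cpu_freq_mhz": [100, 200, 400],
    "mem_size_kb": [64, 128],
    "filter_alloc": ["cpu0", "cpu1", "hw"],
}

# Hypothetical DSE rules: each rule is a logical expression over one configuration.
rules = [
    # A HW allocation of the filter is only explored with the larger memory.
    lambda c: c["filter_alloc"] != "hw" or c["mem_size_kb"] == 128,
    # The fastest clock is only explored for SW allocations.
    lambda c: c["cpu_freq_mhz"] < 400 or c["filter_alloc"] != "hw",
]


def enumerate_design_space(parameters, rules):
    """Return every parameter combination that satisfies all rules."""
    names = list(parameters)
    configs = []
    for values in product(*(parameters[name] for name in names)):
        config = dict(zip(names, values))
        if all(rule(config) for rule in rules):
            configs.append(config)
    return configs


feasible = enumerate_design_space(parameters, rules)
print(len(feasible), "of", 3 * 2 * 3, "combinations remain after applying the rules")
```

Each remaining configuration would correspond to one simulation to be launched by the exploration tool.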

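Figure 3: COMPLEX UML/MARTE-based design flow (diagram labels: COMPLEX Eclipse Application; Model-Driven Architecture (MDA) Entry with PDM, PIM and PSM; Makefile, XML and Source Code; SCoPE+ Containers; Performance Analysis; Design Space Exploration with MOST Script; DSE Loop; Final platform after DSE)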

In addition to that, UC has developed the Stimuli code generator. The Stimuli Scenarios code

generator produces the necessary infrastructure to excite the system during the simulation and

performance analysis, and produces the skeletons of the stimuli scenarios so that the

verification engineer can insert the necessary behaviour. The stimuli scenarios enable a DSE

loop to simulate some significant execution scenarios to obtain representative system metrics.

The Stimuli scenarios code generator has been developed as a set of generation templates

written in the standard MTL language. The Stimuli scenarios code generator is included in the

plug-in and integrated in the Eclipse application as an extension plug-in. By means of the

plug-in extension, the stimuli code generator can be triggered, producing the corresponding

files that enable the simulation of the set of stimuli scenarios modelled by UML/MARTE

models.

Finally, an IP-XACT generator has been developed and integrated in the COMPLEX plug-in.

The MARTIX generator is a tool able to automatically produce the IP-XACT description of

the HW platform from a COMPLEX UML/MARTE description of the system. This option

enables the user to select either the SCoPE+ proprietary XML format or the IP-XACT format for

connecting the Eclipse application to the simulator, which also enables the connection of the

UML/MARTE framework with other tools, such as those from Synopsys.

All these code generators have been developed as a set of generation templates written in the

standard MTL language [20]. The development has been done through Acceleo [19], a code

generation framework fully integrated in Eclipse. Using that, these XML generators have

been implemented as independent plug-ins integrated in the COMPLEX Eclipse

Application as plug-in extensions.

The MDE-level process is fully integrated in an Eclipse-based framework through the COMPLEX

Eclipse Application. For that purpose, a GUI has been developed adding to Eclipse the menus

and options required to perform the entire process.

Figure 4: COMPLEX Eclipse Application GUI

The MDA methodology and tools developed in the scope of COMPLEX directly impact the

development of the system by providing early estimation of performance and power

consumption.

The SCoPE+ tool is a high-level, fast simulator developed to provide early timed virtual

platforms where designers can perform the SW design process considering HW

characteristics. The SCoPE+ tool is capable of obtaining the expected performance metrics of the

simulated configurations, providing the user with the needed information to optimize the final

product. The SCoPE+ tool works on top of SystemC, so specific HW components can be

integrated in the internal TLM HW platform model, together with the generic components

provided by the tool.

SCoPE+ is based on the previous SCoPE tool, a virtual simulator based on annotated native

simulation. However, it adds novel critical features to improve the design process at early

design steps. Previous fast simulators, such as SCoPE, are oriented to enable easy exploration

of the design space. However, they have a great drawback: they require a completely refined

SW code for performing the simulations. Inter-thread, inter-process or distributed

communications must be completely fixed in the source code before performing any

simulation. As a result, the real exploration effort can be huge, since the development of all

the codes required to provide communication and concurrency in the system for each explored

configuration can represent a tremendous amount of work.

SCoPE+ overcomes this limitation since it only requires the platform-independent

functional code of the system components. All additional elements for providing

communication and concurrency in the system are automatically generated at simulation time

from the information obtained from the system description. For such purpose a new interface

has been generated, capable of receiving and managing all the information contained in the

UML model. From that information, all the system wrappers required to interconnect the

functional components are automatically created for each simulation, adapted to the

configuration to be explored on each execution.

As a result SCoPE+ provides new features, enabling the definition of multiple possibilities in

the UML model that can be simulated later without additional user effort. This is especially

interesting when considering communication and concurrency features. System functional

services can be identified as cyclical, with different periods, or sporadic. Communications can

be defined as synchronous or asynchronous, protected or not, etc. Additionally,

communications are modelled differently depending on the allocation of client and server

components.

At the same time, SCoPE+ enables considering different implementation alternatives during

the exploration. Allocations of the same components to different processor types, which imply

different annotations, can be handled by the tool in a single exploration. Additionally, HW

allocations of the different components can be simulated together with SW allocations of these

and other components.

SCoPE+ has been connected with the UML/MARTE flow in order to automatically receive

all the information of the high-level model, without requiring intermediate manual operation.

It has also been integrated with MOST, which automatically launches all the SCoPE+

executions required to perform the explorations required by the user. All the compilation

scripts are also generated from the UML/MARTE model, so the use of SCoPE+ can be

hidden by the COMPLEX Eclipse Application GUI, since no manual intervention is required

during the simulation process.

3.2 Executable Specification

The output of the MDA entry is an executable model generated from the PIM. The PDM is

used to generate a structural platform model with virtual processing, memory, and

communication elements. These are used to model the resource constraints of the execution

platform. The generated executable specification (Algorithm Domain), platform description

model (Architecture Domain), and platform mapping are shown in Figure 5.

Figure 5: Example of an Executable Specification and a Platform Mapping

3.2.1 Executable application model

In our executable application description model we perform a separation of behavior

(computation) and protocol (communication). Our concurrent building blocks are tasks or

processes that contain a behavioral and a protocol part.

The behavior part describes the function or algorithm to be executed, written in sequential

C/C++ code. This description is independent from an implementation in either HW or SW.

Behaviors can be composed of functions and describe a pure sequential execution order. This

enables reuse of existing software descriptions. Moreover, the tools mentioned in the concept

and motivation above allow synthesis to a C/C++ representation. An abstract task describes

a “Runnable”, i.e. a process. Each abstract task contains a single behavior. Abstract tasks can

either be active or passive. An active task starts running immediately after its activation and

can be blocked either by a communication request or when its computation is finished.

An active task can be (self) triggered again after a certain amount of time (time triggered task,

or periodic task). Passive tasks can only be triggered by active tasks through explicit requests.

A passive task cannot trigger itself and it cannot trigger any other passive task.

The protocol part describes communication among behaviors. It is realized through a port that

allows active tasks to call services on a passive task's behavior. These calls are blocking,

i.e. the caller's behaviour can only be continued after the service call has been completed. When

multiple active tasks are requesting a service call of the same passive task a scheduling action

is required. More details about this can be found in [5][6]. These service calls abstract from a

certain communication protocol implementation in either HW or SW.
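
The separation of behavior and protocol can be sketched in a few lines. The following example is not the COMPLEX executable model (which is based on C/C++); it is a Python illustration under stated assumptions: the class names `ActiveTask` and `PassiveTask` are hypothetical, and a simple mutual-exclusion lock stands in for the scheduling action that arbitrates concurrent requests to the same passive task.

```python
import threading


class PassiveTask:
    """Passive task: its behavior only runs when an active task requests it."""

    def __init__(self, behavior):
        self._behavior = behavior
        # Assumption: a plain lock stands in for the scheduling action.
        self._arbiter = threading.Lock()

    def call(self, *args):
        # Blocking service call: the caller continues only after completion.
        with self._arbiter:
            return self._behavior(*args)


class ActiveTask(threading.Thread):
    """Active task: starts running on activation and calls services via a port."""

    def __init__(self, name, port, samples):
        super().__init__(name=name)
        self.port = port
        self.samples = samples
        self.results = []

    def run(self):
        for sample in self.samples:
            self.results.append(self.port.call(sample))


state = {"total": 0}


def accumulate(value):
    """Sequential behavior of the passive task (independent of HW or SW)."""
    state["total"] += value
    return state["total"]


shared = PassiveTask(accumulate)
tasks = [
    ActiveTask("producer_a", shared, [1, 2, 3]),
    ActiveTask("producer_b", shared, [10, 20, 30]),
]
for task in tasks:
    task.start()
for task in tasks:
    task.join()
print("accumulated total:", state["total"])
```

The behavior `accumulate` contains no communication code; replacing the lock by another arbitration policy would not require changing it.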

3.2.2 System input stimuli

In order to examine and analyze the parallel application description model under a certain

workload scenario, it needs to be stimulated accordingly. The system stimuli might originate

from user interaction or communication with other components that are part of the system's

environment. These stimuli describe use-case scenarios and can be derived from a

UML/MARTE use-case specification or from an environment model in Matlab/Simulink.

3.2.3 User constrained HW/SW separation & mapping

The user-constrained HW/SW separation and mapping defines the binding of tasks from the

application model to execution resources of the architecture/platform description model.

Active and passive tasks can be mapped to execution resources, while only passive tasks can

be mapped to memories.

3.2.4 Architecture/Platform description

The platform description model is composed independently from the application model. It is a

pure structural and extra-executable representation of the execution platform consisting of

execution resources (like SW processors, DSPs or ASICs), memories, communication

resources (like shared buses), and pre-existing IP components. In addition, constraints that

have a direct influence on the timing and power consumption are represented in the

component's meta-description. For SW processors it is the instruction set architecture (ISA)

including its pipeline behavior, power modes, data and instruction cache models, and bus

interfaces. For custom HW, the RT component library used has to be specified, and for

communication resources, the scheduling policies for shared media.

The principal result for Magillem is the integration of our IP-XACT tool-chain in the

COMPLEX flow. More specifically, our tools are now capable of importing high-level descriptions

coming from the MDE (Model Driven Engineering) world and linking these data with low-level

descriptions of the models of hardware components (SystemC models like in Synopsys tools).

This work is crucial because high-level modeling and specification languages are becoming

commonly used in the embedded systems industry. UML/MARTE models are now used to

address design complexity, but the gap from high-level description to the detailed description

of IP components or design is too wide. Moreover, the generation of simulated platforms is a

very tedious and error-prone task, as it depends on a given platform architecture. The high

level description of the platforms does not contain all the information required for the generation.

If we consider, in the context of the COMPLEX project, the direct generation of the Synopsys

Virtual Platform given a UML/MARTE high-level description (see Figure 6), the following

issues are raised:

The lack of transformation techniques between components descriptions in the

UML/MARTE editor given PCT scripts (1).

The lack of transformation techniques between a UML description of the platform

and its generation technique given scripts (3).

All the transformation techniques are manual (1) (3) (4).

The referencing of components between the different levels is not maintained.

Figure 6: The Generation model flow without IP-XACT

The IP-XACT tool-chain addresses the construction issues of abstract model levels with their

relevant data, especially in the high-level model, that is, the gap to be filled between

specification and implementation, and automation of the transformations. The tool-chain is

based on a centric representation of the data, using the new IEEE 1685 (IP-XACT) standard.

The purpose of the COMPLEX IP-XACT tool-chain is to provide a mechanism for generating

a SystemC description of virtual platforms from a high-level UML/MARTE description

through IP-XACT. IP-XACT is an XML schema for describing the HW system architecture.

Thus, it represents a good centric schema as it handles the structural description of the whole

system, from functional level down to implementation levels, and manages the hierarchical

dependencies. The IP-XACT tool chain transformation mechanisms take into account

transformation rules that are defined to generate IP-XACT descriptions from UML MARTE

designs.

The IP-XACT-based transformation steps are as follows (see Figure 7):

1- High level specifications are derived given platform components PCT scripts.

2- IP-XACT description of components is derived given the high-level specifications.

Each IP-XACT description references the corresponding PCT scripts initial reference.

3- UML models of components are derived given the IP-XACT components descriptions.

4- UML/MARTE users define components assembly for the platform architecture.

5- The architecture specification is derived from MARTE to IP-XACT.

6- The transformation mechanism from IP-XACT to the Synopsys Virtual Platform is

done by providing the PCT scripts of the design elements.
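
As an illustration of step 2, the following sketch emits a heavily simplified IP-XACT-style component description that keeps a reference to the originating PCT script. It is not the Magillem tool-chain and the output is not validated against the IEEE 1685 schema; the component data, the file set name and the script file name are hypothetical. Only the namespace URI and the element names used here (component, vendor, library, name, version, busInterfaces, fileSets) are taken from the 1685-2009 schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the IEEE 1685-2009 schema; everything else here is simplified.
SPIRIT = "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009"
ET.register_namespace("spirit", SPIRIT)


def q(tag):
    """Qualify a tag name with the IP-XACT namespace."""
    return "{%s}%s" % (SPIRIT, tag)


def component_to_ipxact(comp):
    """Serialize a high-level component specification as IP-XACT-style XML."""
    root = ET.Element(q("component"))
    for key in ("vendor", "library", "name", "version"):
        ET.SubElement(root, q(key)).text = comp[key]
    bus_interfaces = ET.SubElement(root, q("busInterfaces"))
    for interface in comp["interfaces"]:
        bus_if = ET.SubElement(bus_interfaces, q("busInterface"))
        ET.SubElement(bus_if, q("name")).text = interface
    # Keep the reference to the originating PCT script in a file set.
    file_set = ET.SubElement(ET.SubElement(root, q("fileSets")), q("fileSet"))
    ET.SubElement(file_set, q("name")).text = "pct_scripts"
    script = ET.SubElement(file_set, q("file"))
    ET.SubElement(script, q("name")).text = comp["pct_script"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical high-level specification of a UART component.
uart = {
    "vendor": "example.org",
    "library": "complex_demo",
    "name": "uart0",
    "version": "1.0",
    "interfaces": ["bus_slave"],
    "pct_script": "uart0_pct.tcl",
}
xml_text = component_to_ipxact(uart)
print(xml_text)
```

Because the script reference travels with the component description, a later step can retrieve the script when the virtual platform is generated.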

Figure 7: Information links between abstraction layers (diagram labels: UML, IP-XACT and PCT-Script layers; uP, BUS, UART and Periph. components; IP description in library and IP Assembly; Specifications with Name, Type, Ports, Interfaces and FileSet; script references; numbered links 1 to 6)

The IP-XACT specification is a backbone for federating the heterogeneous data manipulated

by design, implementation or verification teams/tools. We propose a centric description of the

architecture to also federate the tools of the framework.

3.3 Estimation & Model Generation

In order to allow a fast simulation and estimation, we create annotated C/C++/SystemC code.

This annotated code contains information about the timing and power of each component.

Power and timing information for each of them is obtained using existing and sophisticated

tools.

As described in the motivation, a realistic system consists of components of different types, e.g.

custom hard- and software as well as IP-components, like communication infrastructure. Each

component is estimated individually, using an appropriate tool. Based on the estimation an

augmented version of the component is created, containing a power and timing model of the

component. From these annotated components a virtual prototype is generated. This prototype

is used to estimate the power and timing of the overall system. In the next four paragraphs we

describe these steps in more detail.

3.3.1 Hardware/Software task separation

Depending on the user-defined mapping, each behavior of the parallel application description

is estimated with an appropriate technique. The tools used for HW & SW estimation and

characterization perform a simulation-based estimation. Thus, each component must be

simulated and characterized individually. The system is split into individual components. The

surrounding system serves as testbench/test environment during the simulation. This way we

can simulate each component individually and still obtain estimates, which correspond to the

behavior of the overall system.

In this generation flow, the SMOG tool (developed by OFFIS) is in charge of task separation

and virtual generation tasks, including the synthesis of the TLM interfaces that are needed to

interconnect the system components, and specifically the TLM2 IP components. The SWAT tool

is in charge of SW estimation, while HW estimation is performed through PowerOpt+.

Therefore, this integration enables a performance simulation based on native simulation of

software and post-synthesis custom HW estimation.

Figure 8: System generation front-end, task separation and interface synthesis.

More details on the hardware/software task separation can be found in [14].

3.3.2 Custom Hardware estimation

Timing and power estimation of application specific hardware designs can be done at nearly

every level of abstraction from transistor- up to behavioral level at ESL. Since we address

behavioral tasks that are mapped to hardware and are meant to be implemented in custom

ASIC hardware, we only address behavioral-level estimation here [7].

To consider the challenges of ASIC power-modeling mentioned in Section 2, OFFIS

combined synthesis with cycle-accurate simulation at RT-level and a subsequent phase of

basic-block identification and power/timing annotation. Although extensive power estimation

at RT-level is very time consuming and thus not applicable in HW/SW co-simulation, it can

be used as characterization approach for higher level estimation. This is why we apply lower-

level estimation provided by the OFFIS PowerOpt+ tool to a small but typical testbench and

derive cycle-averaged power estimates. These power values will then be further abstracted to

basic block level and annotated to the internal control- and dataflow graph-representation. We

differentiate between dynamic and static power as well as its source (e.g., functional units,

controller, or clock tree). Leakage power at RT-level is nearly independent of the data pattern [8]

(variation of 15 %) and thus it mainly depends on elapsed time whereas dynamic power

depends on the testbench stimuli.

Figure 9: Hardware characterisation flow

In the proposed estimation and characterisation flow, shown in Figure 9, the

components of the design that should be implemented as full-custom hardware are given as a

synthesisable C description. This description is transformed by the synthesis tool

into a control and data flow graph (CDFG), containing all information about the functional

behaviour. During high-level synthesis this CDFG is transformed into an RT data path. The RT

data path consists of a set of parallel running processes. Each process has its own local

controller and private registers. A design may have several instances of the same process. All

instances perform the same behaviour, but each instance must be estimated and characterised

individually, in order to consider different data dependencies of the individual process

instances. Inter-process communication is performed using a simple hand-shake protocol. For

the generated data path, floor planning is also performed.

For the functional model of the design, the RT data path is analysed and hardware basic

blocks are identified. The functional model covers all aspects related to the behaviour of

the design, e.g. dynamic power dissipation or timing. The extra-functional model is created by

analysing the synthesis artefacts caused, for example, by module selection or by floor planning.

These artefacts include leakage, clock-tree and controller power. Both

the functional model, in terms of identified hardware basic blocks, and the

non-functional design characteristics are used to create an augmented SystemC description of the

overall design's full-custom hardware parts.

[Figure 9 diagram labels: HW task; SystemC frontend; CDFG; High-level synthesis; RT data path; Hardware basic block identification; Basic block information; Hardware basic block characterisation (dyn. power, timing, …); Controller synthesis; Design characterisation (leakage, clock-tree power, controller power, …); Block annotated C++ writer; Augmented SystemC (HW-BAC++); Existing flow; New approach]

The presented custom hardware timing and power estimation technique supports the creation

of virtual prototypes for embedded full-custom hardware modules. Based on the automatically

generated cycle-accurate functional description at register transfer level a characterization of

the module is performed and a high-level, C++-based virtual prototype is generated. It is

augmented with RT level accurate power and timing information. First experiments on data

intensive hardware accelerators show a fast and accurate estimation of power properties with

a total error of about 3.6 % and a speed-up factor of approximately 192 compared to an RT-level

estimation, while obtaining cycle accurate timing information. These properties support early

design space exploration [9].
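
The way a block-annotated model can accumulate timing and energy during simulation may be sketched as follows. This is not the code generated by the block annotated C++ writer; the function name, the basic block names and all numeric values are hypothetical. The sketch only assumes what the text states: dynamic contributions are attributed to executed basic blocks, while leakage mainly depends on elapsed time.

```python
def estimate_hw_energy(block_profile, block_counts, leakage_power_w, clock_hz):
    """Combine per-basic-block annotations with execution counts.

    block_profile maps a basic block name to (cycles, dynamic energy in joules).
    block_counts maps a basic block name to the number of times it was executed.
    """
    cycles = sum(block_profile[b][0] * n for b, n in block_counts.items())
    dynamic_j = sum(block_profile[b][1] * n for b, n in block_counts.items())
    elapsed_s = cycles / clock_hz
    # Leakage is modelled as a constant power drawn over the elapsed time.
    leakage_j = leakage_power_w * elapsed_s
    return {
        "cycles": cycles,
        "time_s": elapsed_s,
        "dynamic_j": dynamic_j,
        "leakage_j": leakage_j,
        "total_j": dynamic_j + leakage_j,
    }


# Hypothetical characterisation data for three hardware basic blocks.
profile = {"load": (4, 2.0e-9), "mac": (10, 8.0e-9), "store": (3, 1.5e-9)}
# Hypothetical execution counts observed for one stimulus.
counts = {"load": 1000, "mac": 1000, "store": 500}

result = estimate_hw_energy(profile, counts, leakage_power_w=1.0e-3, clock_hz=100e6)
print(result)
```

Since only counters are updated per executed block, such a model avoids an RT-level simulation for every explored configuration.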

3.3.3 Software estimation

At the intermediate level of the COMPLEX tool chain, and integrated with software

generation, hardware model characterization and design space exploration, the detailed

software estimation toolset SWAT (SoftWare Analysis Toolset) plays a key role. Its main goal

is to provide more accurate and detailed performance and energy estimates with respect to

those obtained using higher-level and more abstract models [10].

More specifically, SWAT is a collection of tools for embedded software execution time and

energy consumption estimation and optimization. Each tool performs an elementary operation

such as static and dynamic model construction, energy estimation, software analysis and

reporting, back-annotation and so on. Such tools have been organized into "core" flows

designed to have seamless mechanisms for the integration with other tools of the COMPLEX

flow. In detail, the SWAT tool-chain implements modelling, estimation and optimization

techniques for embedded software applications written in pure C code. The tool chain is

organized into a front-end, responsible for the modelling phase (target processor model,

source static model), a set of “core” flows implementing the different functionalities of

SWAT, and a post-processing engine, necessary to analyse the execution traces (event traces).

Figure 10: SWAT flow

Figure 10 depicts the SWAT flow, which is described in the following paragraphs:

Target processor characterization flow. This flow has the goal of expressing the execution

time and energy consumption characteristics of the target core in terms of LLVM instructions.

The input of this flow is an instruction-level characterization of the target processor (provided

by the vendor) and the output is an abstract model expressed in terms of the LLVM

instruction-set.

Estimation flow. It provides the functionality for performing dynamic estimation of the

execution time and energy consumption of a given application executed with a specific set of

data. The models involved in this process are data-independent and thus using a different set of

data does not require changing the models but only re-running the instrumented application.

The input is a set of C source files and the target processor model as derived by the

characterization flow, and the output is an overall estimate of execution time and power.

Analysis and back-annotation flow. The models generated in the front-end phase of the

estimation flow can be analysed in further detail to derive different static and dynamic

metrics. Such metrics are then summarized either in the form of an HTML report or as

back-annotated source code.

Optimization flow. Three types of optimization are implemented: (i) an experimental

C-to-C optimization implemented on top of the LLVM-opt tool, integrated with the MOST tool to

perform iterative compilation; (ii) a power optimization on the selection of the CPU operating

mode (both in terms of voltage and frequency) to be assigned to each function/group of

functions; (iii) a suggestion-oriented optimization used to annotate the critical portions of the

source code with the most suitable transformations to be applied.

Instrumentation and trace flow. This flow has the goal of tracing specific information during

the execution of the application. The flow is split into a static, rule-based instrumentation

phase and a dynamic, optional execution phase. If used as a standalone tool, both phases

are executed and the resulting event trace is fed to a post processor to collect statistics. If used

in conjunction with other COMPLEX tools, only the instrumentation phase is necessary. This

phase produces as output a binary library (or set of object files) implementing the

instrumented version of the application. Such a library is then linked with other parts of the

system’s executable models to allow a complete system simulation and estimation.

The SW power estimation methodology has been applied to several benchmarks and the

estimates compared with the most accurate available figures, i.e. those obtained with a target-

specific, power-enabled instruction-set simulator [11]. The absolute estimation errors obtained

on a set of several benchmarks range from less than 2% up to 13%, with an average of 6%.
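
The relation between the target processor characterization flow and the estimation flow can be sketched as follows. This is not the SWAT implementation: the opcode names, cost figures and the weighted translation from LLVM instructions to target opcodes are assumptions made for illustration. The sketch shows why the abstract model is data-independent: a new data set only changes the instruction counts, not the model.

```python
# Hypothetical vendor characterization: target opcode -> (cycles, energy in joules).
target_costs = {
    "ADD": (1, 0.5e-9),
    "MUL": (3, 1.5e-9),
    "LDR": (2, 1.0e-9),
}

# Assumed average translation of each LLVM instruction into target opcodes.
llvm_to_target = {
    "add": {"ADD": 1.0},
    "mul": {"MUL": 1.0},
    "load": {"LDR": 1.0},
    "getelementptr": {"ADD": 1.0, "MUL": 0.5},
}


def characterize(target_costs, llvm_to_target):
    """Express time and energy of the target core in terms of LLVM instructions."""
    model = {}
    for llvm_op, mix in llvm_to_target.items():
        cycles = sum(target_costs[t][0] * w for t, w in mix.items())
        energy = sum(target_costs[t][1] * w for t, w in mix.items())
        model[llvm_op] = (cycles, energy)
    return model


def estimate(llvm_counts, model, clock_hz):
    """Estimate time and energy from dynamic LLVM instruction counts."""
    cycles = sum(model[op][0] * n for op, n in llvm_counts.items())
    energy_j = sum(model[op][1] * n for op, n in llvm_counts.items())
    time_s = cycles / clock_hz
    return {"cycles": cycles, "time_s": time_s, "energy_j": energy_j,
            "avg_power_w": energy_j / time_s}


llvm_model = characterize(target_costs, llvm_to_target)
# Two hypothetical data sets: same model, different instrumented-run counts.
run_a = estimate({"add": 1000, "mul": 200, "load": 400, "getelementptr": 400},
                 llvm_model, clock_hz=50e6)
run_b = estimate({"add": 5000, "mul": 100, "load": 900, "getelementptr": 900},
                 llvm_model, clock_hz=50e6)
print(run_a)
print(run_b)
```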

Regarding memory, PoliTo has focused on the optimization of the memory sub-system of a

typical embedded device, where energy (primarily) but also other metrics such as reliability

(in terms of device aging) have been considered as optimization metrics.

From the architectural standpoint, the optimization of the memory subsystem relies on a

single architectural transformation consisting of implementing the address space as a

multi-bank memory instead of a single-bank monolithic one. We therefore refer to the basic

transformation used in our optimization strategy as memory partitioning.

Partitioning is beneficial for energy (and other metrics) because the distribution of accesses to

the memory is not uniform: because of the well-known locality principle, some locations are

accessed more often than others. This property, used together with the fact that the

performance and energy cost of accessing a memory increases proportionally with its size,

makes it possible to optimize the common case (a classical low-power optimization principle) by

accessing a smaller block (one bank) for most of the time. The non-accessed banks can be

then power-managed to reduce energy.

Figure 11 shows a conceptual drawing of the partitioning idea for the case of 2 partitions.

Figure 11: Conceptual idea of memory partitioning to improve energy and performance

It is clear that the second configuration is more convenient as a larger number of accesses fits

into a smaller memory. The implementation of the above architecture requires minimal

encoding circuitry to drive an address to the correct sub-block. Notice that the partitions are

mutually exclusive, so only one of the banks is active at any given time.

In COMPLEX, we have implemented the above scheme by generalizing it into an

optimization tool (MEMOPT) that derives the energy-optimal partitions by analysis of the

trace of the memory accesses. The tool (see Figure 12) takes the dynamic execution trace of

the program as input and, based on a set of command-line switches, computes an optimal

(energy-wise) partition of the scratchpad. Additional inputs include (see dotted arrows on the

right side of the box) technological data and the address range. The tool returns

power/energy/performance figures (metrics) and, as a side output, the resulting memory

configuration (dotted arrow).

The tool has only one option suitable for exploration (Max), namely the maximum number of

sub-blocks into which the memory can be split. This bound is required to avoid arbitrarily fine

partitioning, whose overhead would be too high.
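
The role of the bound on the number of sub-blocks can be sketched with a small dynamic program over a per-block access histogram. This is not the MEMOPT algorithm, which is not described here; the function `best_partition`, its cost model (each access costs the size of its bank, plus a fixed `bank_overhead` per bank), and the histogram are assumptions made for illustration.

```python
from functools import lru_cache


def best_partition(block_accesses, max_banks, bank_overhead=0.0):
    """Return (energy, cut points) of the cheapest split into at most
    max_banks contiguous banks.

    Hypothetical cost model: every access to a bank costs the bank size
    (in blocks), and every bank adds a fixed overhead for extra decoding.
    """
    n = len(block_accesses)
    prefix = [0]
    for accesses in block_accesses:
        prefix.append(prefix[-1] + accesses)

    def bank_cost(i, j):
        # Bank covering blocks i .. j-1.
        return (j - i) * (prefix[j] - prefix[i]) + bank_overhead

    @lru_cache(maxsize=None)
    def solve(start, banks_left):
        if start == n:
            return 0.0, ()
        if banks_left == 0:
            return float("inf"), ()
        best = (float("inf"), ())
        for end in range(start + 1, n + 1):
            rest_cost, rest_cuts = solve(end, banks_left - 1)
            cost = bank_cost(start, end) + rest_cost
            if cost < best[0]:
                best = (cost, (end,) + rest_cuts)
        return best

    return solve(0, max_banks)
```

With a non-zero overhead the optimum may use fewer banks than allowed, which is the effect the bound is meant to capture.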

Figure 12: Inputs and outputs of the MEMOPT Tool

The MEMOPT tool can also operate as a pure modeling tool and it is structured in such a way

that it can be invoked to run optimization or simply to yield the energy and performance/aging

cost, for a given input trace.

Figure 13: MEMOPT View as a Simulation and Optimization tool

Figure 13 shows the internal architecture for the MEMOPT tool. On the left side, the

“simulator” contains the memory models (for the various metrics); based on the input trace,

the technological data (different models exist for different technology libraries), and

the size of the memory, the tool returns the total (for the executed trace) execution time,

lifetime, and consumed energy and power.

This modular architecture also allows easy integration into the COMPLEX flow. Specifically,

MEMOPT has been used in the main loop of the Design Space Exploration (DSE) tool

developed in COMPLEX. Figure 14 shows the conceptual integration of MEMOPT (in the

“optimization” version).

Figure 14: Integration of MEMOPT with MOST

3.3.4 Pre-existing IP & virtual component models

Besides custom HW and SW, pre-existing or third-party components must be considered.

Models for communication infrastructure such as buses are provided by different vendors,

typically as parameterizable soft macros that can be adapted to the system to

be built. The macros already contain timing information but typically no information about

power. Thus, communication power is estimated based on the TLM-2.0 transport calls.

Calculation accounts for the size of transferred data, type of access, as well as duration of the

communication. Interruptions and re-arbitration events are also considered and must be

delivered by the communication model.
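
A minimal sketch of such a per-transport estimate is given here. It is written in Python rather than SystemC for brevity, and the record `Transport`, its field names, and all characterization constants are hypothetical; real values would come from a characterization of the actual bus.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    """One observed transport call (field names are illustrative)."""
    num_bytes: int
    is_write: bool
    duration_ns: float
    rearbitrations: int = 0


# Hypothetical characterization data for one bus.
ENERGY_PER_BYTE_PJ = {"read": 2.0, "write": 3.0}
ACTIVE_POWER_MW = 1.5            # bus power while a transfer is in flight
REARBITRATION_ENERGY_PJ = 10.0   # cost of one interruption/re-arbitration


def transport_energy_pj(t):
    """Energy of one transport from data size, access type, and duration."""
    kind = "write" if t.is_write else "read"
    data = ENERGY_PER_BYTE_PJ[kind] * t.num_bytes
    active = ACTIVE_POWER_MW * t.duration_ns  # mW * ns = pJ
    return data + active + REARBITRATION_ENERGY_PJ * t.rearbitrations
```

Summing `transport_energy_pj` over all observed calls gives the communication energy of a simulation run under this assumed model.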

General IP components delivered by third-party vendors cannot be estimated like custom HW

and SW components. System-level representations of these IPs are typically provided as

black-box executable models (e.g. an API to a compiled object file). These black-box modules

typically contain timing but no power information. To obtain at least approximate

power values, a simple wrapper or monitor is used, which observes the component's inputs and

outputs. This information is used to control a power state machine (PSM) [12] inside the

monitor. Power states and power values of the PSM are either obtained from the component's

data-sheet or estimated manually.
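
The monitor idea can be sketched as follows. The class `PowerStateMonitor`, its event names, and the power values are assumptions for illustration and do not reproduce the COMPLEX implementation; the sketch only shows how observed I/O events drive state changes and how energy is accumulated per state.

```python
class PowerStateMonitor:
    """Wrapper-side power state machine driven by observed I/O events."""

    def __init__(self, state_power_mw, transitions, initial):
        self.state_power_mw = state_power_mw  # state -> power in mW
        self.transitions = transitions        # (state, event) -> next state
        self.state = initial
        self.last_time_ns = 0.0
        self.energy_pj = 0.0

    def observe(self, time_ns, event):
        """Account for the time spent in the current state, then switch state."""
        elapsed = time_ns - self.last_time_ns
        self.energy_pj += self.state_power_mw[self.state] * elapsed  # mW * ns = pJ
        self.last_time_ns = time_ns
        # Unknown events leave the state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)


# Example with two data-sheet style states (values are made up).
monitor = PowerStateMonitor(
    state_power_mw={"idle": 0.5, "busy": 5.0},
    transitions={("idle", "request"): "busy", ("busy", "response"): "idle"},
    initial="idle",
)
monitor.observe(100.0, "request")   # 100 ns idle
monitor.observe(300.0, "response")  # 200 ns busy
```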

Latest results on an abstraction methodology for generating time- and power-annotated TLM

models from synthesisable RTL descriptions can be found in [13]. The proposed techniques

allow the integration of existing RTL IP components into virtual platforms for early software

development and platform design, configuration, and exploration. With the proposed

approach, IP models can be natively integrated into SystemC TLM-2.0 platforms and

executed 10-1000 times faster compared to state-of-the-art RTL simulators. The abstraction

methodology guarantees preservation of the behavior and timing of the RTL models. Target

technology dependent power properties of IP components are represented as power state-

machines and integrated into the abstracted TLM models. The experimental results show a

relative error less than 10% of the abstracted model's power consumption compared to state-

of-the-art RTL power simulators [15].

3.3.5 Virtual system generation

During generation of the virtual system, the annotated source from (f) and the selected

models from (i) are combined into a virtual prototype. The example in Figure 15 shows the

virtual platform model obtained from the mapping specified in Figure 5.

Figure 15: Virtual Platform generated from example shown in Figure 5

The timing and power annotated execution-models are integrated with timing and power

characterized platform models. In the example in Figure 15 these platform models are: a

TLM-2.0 router with bus protocol and power model, and a system memory model. For the

integration of the annotated task behavior with the TLM communication network we provide

communication interface (IF) templates. These interfaces translate the function calls of the

active tasks into TLM transaction containers. For passive tasks we synthesize a memory

interface which decouples the TLM transactions from the activation of the behavior. That

means the transaction is stored in the memory completely, before the passive task is activated.

These interface templates are timing and power characterized for the chosen platform.

More details about the task separation and virtual system generation can be found in [14].

Virtual platforms [23] together with SystemC Transaction Level Modelling (TLM) [27] are

becoming the key solution to address embedded software development and verification in

parallel with hardware development. The wide adoption of virtual platforms is constrained by

some open issues:

1) Presence of legacy RTL (low level) IP blocks to be integrated into the virtual

platform which is usually modelled at transaction level (high level) to speed up

simulation. Hand-made transactors to adapt RTL blocks are inefficient and error-

prone. Model re-writing is expensive.

2) Presence of HW models not described in a HW description language. For instance,

Model-Driven Design is gaining attention for the development of embedded

components. Tools like Matlab/Simulink/Stateflow [28] can be used to describe

the event-driven behaviour of hardware components.

3) Modelling of external communications in case of networked embedded systems.

Embedded applications are becoming more distributed, as they are based on devices

that interact over wired/wireless channels using protocols such as WiFi,

Ethernet, TCP/IP, CAN, FlexRay, and ZigBee. To achieve further optimization,

embedded software should be tested in the full network scenario.

HIFSuite is a set of tools and application programming interfaces (APIs) that provides support

for modelling and verification of HW/SW systems [25]. The core of HIFSuite is the

Heterogeneous Intermediate Format (HIF) language upon which a set of front-end and back-

end tools have been developed to allow the conversion of HDL code into HIF code and vice-

versa. HIFSuite allows designers to manipulate and integrate heterogeneous components

implemented by using different hardware description languages (HDLs). Moreover, HIFSuite

includes tools, which rely on HIF APIs, for manipulating HIF descriptions in order to support

model abstraction and post-refinement verification.

Figure 16: HIFSuite architecture.

HIFSuite plays two roles in the extension of virtual platforms:

- HIFSuite translates IP core descriptions in any of the available HDL front-ends to SystemC/TLM.

- HIFSuite abstracts digital components from RTL to TLM [24].

As depicted in Figure 16, HIFSuite consists of:

- The HIF core language and manipulation API: a set of HIF objects corresponding to traditional HDL constructs such as processes, variable/signal declarations, sequential and concurrent statements, etc.

- Front-end tools that parse Stateflow, VHDL, Verilog and SystemC (RTL and TLM) descriptions and generate the corresponding HIF representations.

- Back-end tools that convert HIF models into SystemC models.

- A set of tools developed upon the HIF APIs that manipulate HIF code to support modelling and verification of HW/SW systems. In particular, A2T is a tool that automatically abstracts RTL IPs into TLM models.

3.4 Simulation

3.4.1 Pre-optimized power controller

IMEC’s main contribution consists of a light-weight run-time resource management

framework for embedded heterogeneous multi-core platforms. It allows dynamic adaptation to

changing application contexts and transparent optimization of the platform resource usage

following a distributed and hierarchical approach. The framework consists of a Global

Resource Manager (GRM) that is running in parallel with the central manager of the

application on the host processor of the platform. The operating points of the GRM are

identified in a design-space exploration phase as a set of Pareto-optimal configurations of the

application and their impacts with regards to the quality of experience, performance, and

energy consumption.

The pre-optimized power controller implements a framework for dynamic adaptation to

changing context and transparent optimization of platform resource usage [18]. It follows a

distributed and hierarchical approach. On the one hand, a Global Resource Manager (GRM) is

loaded on the host processor of the platform. It is a software task running in parallel with the

application. It is a middleware providing a bridge between the application, the user and the

platform. It conforms to practices of each Local Resource Manager (LRM) in each platform

IP core (e.g., HW block or SW processor). It is used to adapt both platform and application at

run time and to find global and optimal trade-offs in application mapping based on a given

optimization goal. On the other hand, each IP core can execute its own resource management

without any restriction, through an LRM. Such an LRM encapsulates the local policies and

mechanisms used to initiate, monitor and control computation on its IP core.

In contrast to the collaboration between the GRM and the LRMs, the GRM collaboration with

application and user is visible to the application developer and is performed as follows. First,

the QoS requirements and the optimization goal are set by the user. The goal is then translated

into an abstract mathematical function, called the utility function (e.g., performance, power

consumption, battery life, QoS, or a weighted combination of them). Then, at run time, the GRM

manages and optimizes the application mapping taking into account the possible application

configurations explored at design time, the platform resources currently available, the QoS

requirements, and the utility function.
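
The run-time decision described above can be sketched as a selection among operating points characterized at design time. The function `select_operating_point`, the dictionary layout of the operating points, and the example utility function are hypothetical; they only illustrate how resource availability, QoS requirements, and the utility function interact.

```python
def select_operating_point(points, available, min_qos, utility):
    """Pick the feasible design-time operating point with the highest utility."""
    feasible = [
        p for p in points
        if p["qos"] >= min_qos
        and all(amount <= available.get(res, 0) for res, amount in p["needs"].items())
    ]
    return max(feasible, key=utility) if feasible else None


# Operating points explored at design time (all numbers are made up).
POINTS = [
    {"name": "hd", "qos": 1.0, "power_mw": 900, "needs": {"cores": 4}},
    {"name": "sd", "qos": 0.7, "power_mw": 400, "needs": {"cores": 2}},
    {"name": "low", "qos": 0.4, "power_mw": 150, "needs": {"cores": 1}},
]


def weighted_utility(p):
    """Example utility: reward QoS, penalize power (weights are arbitrary)."""
    return 1000 * p["qos"] - p["power_mw"]
```

Changing the weights in `weighted_utility` corresponds to changing the optimization goal without touching the explored operating points.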

The framework has a generic and structured architecture that is valid for a broad range of

design flows, platforms, and application domains:

- It supports a holistic view of all platform resources, which can be a heterogeneous mix of different kinds of IP cores (e.g. hardware accelerators, FPGAs, multi-core processors, ...), memories, batteries, ...

- It transparently optimizes the overall platform resource usage and application mapping. Energy consumption is controlled through parameters that the platform provides, e.g. dynamic voltage scaling, frequency scaling, and activation/deactivation of IP cores.

- It dynamically adapts to changing contexts, taking into account constraints on the resource usage, while optimizing the overall quality of experience (QoE).

- The optimizations are steered via heuristics that can be easily customized to the application context. Typical parameters taken into account to optimize the QoE are video resolution, video frame rate, audio frequency, etc. In principle, it is even possible to let the user adapt the heuristics at run time, e.g. by changing the weight factors associated with certain quality aspects.

The GRM framework implementation is very lightweight and therefore ideally suited for

embedded platforms:

- Most of the optimization complexity is covered by a design-time exploration and characterization of the operating points. This alleviates the run-time decision making, which is implemented using lightweight data structures and optimization algorithms.

- Data memory, instruction memory, and processing requirements are very small compared to the typical requirements for embedded applications.

- It can easily be instantiated for a given target platform and application domain. The required modifications to the applications are minimal and the communication protocol is simple and lightweight, leading to almost negligible processing overhead.

3.4.2 Timing & power aware executable virtual system prototype in SystemC

During system execution the annotated timing and power information is collected. Depending

on the workload model different execution paths, leading to different timing and power

values, are possible. After simulation, the collected information can be illustrated in a power-

over-time diagram or can be used for a power-breakdown. Our annotations can be traced at

different levels of granularity to allow a user-defined

trade-off between performance and accuracy.

Figure 17 depicts the different timing and power

evaluation levels. On the most abstract level we only

consider analysis on task granularity. This can be easily

performed between active and passive tasks with

blocking communication relation. Execution time and

power of the passive task can simply be inlined and

accumulated with time and power of the active task.

The next level of granularity works on communication

granularity. Power and timing of computation nodes in

the communication graph are accumulated and only

traced at the time points of communication. For a

deeper analysis of the timing and power behavior, traces

on basic block granularity of a CDFG are also possible.
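
The three tracing granularities can be sketched as different ways of collapsing the same sequence of basic-block annotations. The function `accumulate` and the event encoding are hypothetical, and the sketch assumes that each basic block already carries a cycle count and an energy value.

```python
def accumulate(events, granularity):
    """Collapse per-basic-block annotations into cumulative trace samples.

    events: ("block", cycles, energy_pj) or ("comm",) in execution order.
    granularity: "block" emits a sample after every basic block, "comm" only
    at communication points, "task" only once at the end of the task.
    """
    samples, cycles, energy = [], 0, 0.0
    for event in events:
        if event[0] == "block":
            cycles += event[1]
            energy += event[2]
            if granularity == "block":
                samples.append((cycles, energy))
        elif event[0] == "comm" and granularity == "comm":
            samples.append((cycles, energy))
    # Always close the trace with the final totals.
    if not samples or samples[-1] != (cycles, energy):
        samples.append((cycles, energy))
    return samples
```

Coarser granularities produce fewer samples with the same final totals, which is the trade-off between simulation performance and trace detail.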

A possible new simulation view provided by the

COMPLEX project is the so-called Network View

depicted in Figure 18 and enabled by the SystemC

Network Simulation Library (SCNSL) [16]. It is an

extension of SystemC to allow modeling packet-based

networks such as wireless networks, Ethernet, and field

bus. It supports the simulation of packet transmission,

reception, contention on the channel and wireless path

loss. This way, the virtual platform can be used to

simulate networked embedded systems in a realistic communication scenario [26].

Figure 17: Annotation hierarchy (process graph, communication graph, control data flow graph; annotations per basic block: 1) number of cycles, 2) average switched capacitance)

The advantages of SCNSL are:

- simplicity: a single language/tool, i.e. SystemC, is used to model both the system (i.e. CPU, memory, peripherals) and the communication network;

- efficiency: faster simulations can be performed since no external network simulator is required;

- re-use of SystemC IP blocks;

- scalability: support of different abstraction levels in the design description;

- openness: several tools available for SystemC can be exploited seamlessly;

- extensibility: the use of standard SystemC and the source code availability guarantee the extensibility of the library to meet design-specific constraints.

Figure 18: Network View: the virtual platform of a networked embedded system is simulated together

with a model of the network

According to Figure 18, the traditional virtual platform, made of CPU, HW blocks and bus,

can be extended by wrapping it into a network node exchanging packets with other nodes

through a channel by using well-known protocols such as IEEE 802.15.4. The library also

provides traffic sources and sinks to generate concurrent flows with predefined statistical

behaviour. All these elements are provided by SCNSL and they can be instantiated in the code

as traditional SystemC blocks. SCNSL also allows improving timing and power analysis by

introducing the effect of communications which have a direct impact on timing behavior and

power consumption of the system under design [17].

3.5 Exploration & Optimization

The Exploration and Optimization phase creates a feedback loop between the performance

estimation part (including MDA Design Entry, Executable Specification, Estimation & Model

Generation and Simulation phases) and the parameters configuration of the target system.

In particular, the loop is closed by acquiring the system simulation traces to be post processed

before being presented either to the user or to an automatic framework for exploration and

optimization. Then, according to the information gathered by the previous simulation, a new

HW/SW system configuration is selected within the design space. The following subsections

describe these steps in more detail.

3.5.1 Simulation trace

The basic infrastructure for behavioural timing and power annotations within the COMPLEX

framework is based on a flexible and generic tracing interface. This tracing framework can be

used to trace arbitrary user data over time, including non-functional properties like power

consumption. It is used within the simulation platform to provide the outputs required by the

user and the exploration and optimization framework.

To enable both a user-driven analysis based on the visual presentation of simulation results and

the integration into an automated exploration loop as performed by MOST, different

kinds of post-processing back-ends are supported. First, the generation of compact

performance metrics requires a dedicated reporting and accumulation mechanism. This

usually implies on-the-fly pre-processing of the intermediate power values, for instance to

compute the overall energy consumption of a complete simulation of a particular application

scenario or configuration. Second, the integration with the graphical analysis framework of

the Synopsys Virtual Platform solution has been implemented. This is usually based on

more detailed traces of values/events over time. Both aspects are addressed by the tracing and

post-processing mechanisms developed by OFFIS.

Since the COMPLEX simulation models are mostly based on SystemC TLM-2.0, the built-in

tracing capabilities of SystemC (based on sc_trace) are not sufficient, because they cannot

directly cope with temporal decoupling and local time offsets. In all of the estimation

techniques developed, such temporal decoupling techniques are used within the BAC++

simulation models to improve the simulation performance by reducing the number of

synchronisations with the SystemC discrete-event simulation kernel. The underlying core

technique of the annotation API presented to the user is based on so-called timed value

streams. These streams support local simulation time offsets to record values “in the future”

based on a more flexible time handling, either via (value,starting time) or

(value,duration) tuples. An additional, block-based annotation API is available as well,

required to relate source code structures with abstract execution times and (potentially

multiple) data streams.
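
The idea of recording values ahead of the global simulation time can be sketched as follows. The class `TimedValueStream` and its methods are hypothetical and do not reproduce the COMPLEX annotation API; the sketch only covers the (value, duration) variant and is written in Python rather than SystemC.

```python
class TimedValueStream:
    """Records (start, duration, value) samples ahead of the global time."""

    def __init__(self):
        self.samples = []        # (start_ns, duration_ns, value) tuples
        self.local_offset = 0.0  # how far the stream runs ahead of global time

    def push(self, global_now, value, duration):
        """Append a (value, duration) tuple at the stream's local time."""
        self.samples.append((global_now + self.local_offset, duration, value))
        self.local_offset += duration

    def sync(self):
        """Return the time the owner must wait to catch up, and reset the offset."""
        wait, self.local_offset = self.local_offset, 0.0
        return wait


stream = TimedValueStream()
stream.push(100.0, 5.0, 20.0)  # 5 mW for 20 ns, starting at global time 100
stream.push(100.0, 1.0, 30.0)  # recorded "in the future" at local time 120
pending = stream.sync()        # the owner would now wait 50 ns in the kernel
```

Only the call to `sync` corresponds to a synchronisation with the simulation kernel; the two `push` calls happen without advancing global time.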

The overall architecture of the COMPLEX trace generation framework is sketched in Figure

19. The power model within the observer of each augmented component in the simulation

(custom hardware, software, or Power State Machine-enabled IP) provides a given set of

default streams (static/dynamic/overall power, current power mode), which can be selected by

the user according to the hierarchical name of the component.

Additional, user-defined streams can be defined as well. In that case, the driving user

processes are responsible for time synchronisation of the stream object(s), while the default

streams of the power observers handle this transparently.

Figure 19: Sketch of the BAC++ trace generator

Before being processed by the final analysis backend, optional pre-processor objects can be

applied to one or more input streams in order to get the desired output streams. This pre-

processing is useful for data reduction, compositing, and other intermediate value/time

transformations.
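
Two typical pre-processing steps, accumulation into a single energy figure and reduction to windowed averages, can be sketched on a stream of (start, duration, power) samples. The function names and the sample layout are assumptions for illustration.

```python
def energy_pj(power_stream):
    """Integrate (start_ns, duration_ns, power_mw) samples into energy in pJ."""
    return sum(duration * power for _, duration, power in power_stream)


def average_power(power_stream, window_ns):
    """Reduce a stream to one average-power value per fixed time window."""
    totals = {}
    for start, duration, power in power_stream:
        t, end = start, start + duration
        while t < end:
            index = int(t // window_ns)
            # Part of this sample that falls into the current window.
            chunk = min(end, (index + 1) * window_ns) - t
            totals[index] = totals.get(index, 0.0) + power * chunk
            t += chunk
    return {i: e / window_ns for i, e in sorted(totals.items())}
```

The first function yields a compact metric suitable for an automated exploration loop, while the second keeps a reduced power-over-time profile for visual analysis.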

3.5.2 Exploration & optimization

Starting from the definition of the design space, the exploration and optimization step

iteratively generates an instance of the design space based on the knowledge acquired by the

post-processing of the simulation traces of previously selected configurations.

The exploration phase is the step in the design flow that navigates the design space

(by changing the system parameters) in order to find the optimal system configurations among

all the possible alternatives in the design space. Moreover, the design space

exploration loop is also used to gain knowledge about the system parameters (such

as main effects and interaction effects) and the design space (such as the distribution of configurations

with respect to system performance). This phase can be carried out as a user-centric DSE

or as an automatic DSE.

The purpose of an automatic design space exploration and optimization tool is

to interact automatically with the system models, so that, once the target problem is formally defined,

the designer does not need to intervene in the DSE phase (except for the analysis of the results).

The Multi-Objective System Tuner (MOST) is a tool for discrete optimization specifically

designed to enable design space exploration of hardware/software architectures; it is the

COMPLEX tool for the automatic design space exploration phase. MOST

helps drive the designer towards near-optimal solutions to the

architectural exploration problem by supporting the exploration phase with state-of-the-art

Design of Experiments (DoE) techniques, optimization heuristics, and Response Surface Models

(RSMs). The final product of the framework is a Pareto set of configurations within the

design evaluation space of the given architecture, together with an analysis of the effects of the design space

parameters on the objective functions. One of the goals of MOST is to provide a command-line

interface for constructing automated exploration strategies. Those strategies are implemented

by means of command scripts interpreted by the tool without the need for manual intervention.

This structure can easily support the batch execution of complex strategies.
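
The notion of a Pareto set can be made concrete with a short sketch. It is a plain dominance filter over already evaluated configurations, not the MOST implementation; the function `pareto_set` and the assumption that all objectives are minimized are illustrative choices.

```python
def pareto_set(configs, objectives):
    """Keep the configurations that no other configuration dominates.

    All objectives are assumed to be minimized.
    """
    def dominates(a, b):
        # a is no worse than b everywhere and strictly better somewhere.
        return (all(a[o] <= b[o] for o in objectives)
                and any(a[o] < b[o] for o in objectives))

    return [c for c in configs if not any(dominates(d, c) for d in configs)]


# Evaluated configurations with made-up metrics.
EVALUATED = [
    {"name": "A", "energy": 10, "time": 5},
    {"name": "B", "energy": 8, "time": 7},
    {"name": "C", "energy": 12, "time": 6},  # worse than A in both metrics
]
```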

The support provided by MOST is crucial because the exploration phase consists mainly

of repetitive tasks: configuring the simulated platforms and analysing

the results. These tasks are tedious and error-prone; MOST shifts the

designer's effort to higher-level tasks in the DSE phase, such as the analysis of the final results. The

effectiveness of the automatic DSE framework has been demonstrated within the project

through its integration with all the COMPLEX Use Cases, driving the exploration by

using knobs and parameters coming from both platform and application.

Figure 20: MOST tool

3.5.3 Design space definition

The design space is defined by the list of tunable parameters available on the HW/SW

platform. It also includes the set of possible values of each parameter and the rules

that exclude parts of the design space, for example because of interdependencies between

parameters. The design space definition represents the degrees of freedom that the designer or

an automatic tool has for tuning the HW/SW platform.

3.5.4 Design space instance parameters

A design space instance is a valid configuration of parameters within the previously defined

design space, selected either by the designer or by an automatic tool. It is composed of a value for each

parameter of the design space, taken from the set of available levels. Those values are

used to fill in the parameter values in several stages of the design flow (see Figure 2).
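
The relation between a design space definition and its instances can be sketched as follows. The parameter names, their levels, and both rules are invented for illustration and are not taken from the COMPLEX use cases.

```python
from itertools import product

# Hypothetical tunable parameters and their allowed levels.
DESIGN_SPACE = {
    "cores": [1, 2, 4],
    "freq_mhz": [100, 200, 400],
    "spm_banks": [1, 2, 4],
}

# Rules cut away combinations that are not allowed (both rules are invented).
RULES = [
    lambda c: not (c["cores"] == 4 and c["freq_mhz"] == 400),
    lambda c: c["spm_banks"] <= 2 * c["cores"],
]


def is_valid(config):
    """A valid instance has one allowed level per parameter and obeys all rules."""
    return (config.keys() == DESIGN_SPACE.keys()
            and all(config[p] in levels for p, levels in DESIGN_SPACE.items())
            and all(rule(config) for rule in RULES))


def instances():
    """Enumerate every valid design space instance."""
    names = list(DESIGN_SPACE)
    for values in product(*(DESIGN_SPACE[n] for n in names)):
        config = dict(zip(names, values))
        if is_valid(config):
            yield config
```

An exhaustive enumeration like `instances` is only feasible for small spaces; an exploration tool would instead sample or search the valid region.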

In the project, the list of parameters ranges at the MDA Design Entry and Executable

Specification levels from functional reimplementation to mapping of HW/SW tasks and IP

selection, while at Estimation and Model Generation level from IP and memories

configuration to selection of embedded software optimizations and run-time management

strategies.

4 Potential impact and the main dissemination activities and exploitation of results

A wide variety of methods and tools were developed within the COMPLEX project and

integrated into a common COMPLEX reference framework, as shown in Figure 21.

Figure 21: The COMPLEX framework and reference tool set

The COMPLEX Framework is expected to enhance the European Union embedded system

engineering capabilities by developing the first industrially applicable framework for HW/SW

co-design and power management in combination with platform-based design space

exploration.

Figure 22 shows the overall exploitation strategy and timeframe for the COMPLEX

Framework. During the three years of project work, exploitation activities have been

prepared. The exploitation itself will start after the completion of the last project year.

After the end of the project the consortium aims to maintain the COMPLEX reference

framework and enable the EDA partners to turn their COMPLEX compatible tools into

products. Moreover, the COMPLEX reference framework can be re-implemented by EDA

companies to make it more stable and usable to external companies. A commercial

COMPLEX Framework re-implementation and exploitation is preferably foreseen for an

EDA partner of the COMPLEX consortium. Last but not least, COMPLEX demonstrator

designs can either be the basis for new industrial designs, or motivate new companies (e.g.

through the publication of success stories) to perform their next design with the COMPLEX

framework.

Figure 22: Exploitation strategy for the COMPLEX Framework

After the successful completion of the COMPLEX project after 3 years, platform providers

(ST-I, ST-PRC, SNPS) are enabled to:

- provide power and timing characterized IP cores or whole platforms,

- provide platform-specific tool chains, augmented for estimating and modelling execution artefacts of the platform,

- and support the design of future platforms which cover the needs of future applications/workloads.

EDA tool providers (SNPS, EDALab, MDS) benefit from:

- added value through point-tool integration in the COMPLEX framework,

- creation of market-ready tools starting from the technology brought into COMPLEX,

- additional customers and application areas,

- and strengthening of their spin-off.

Application and System Integrators (Thales, GMV) are able to:

- build products with higher quality through early consideration of platform artefacts (timing, power & memory size),

- get faster access to accurate virtual platforms,

- and thus save costs of wrong platform selection/re-designs,

- which decreases time-to-market due to early and confident application benchmarking.

Research Institutes and Universities (OFFIS, PoliMi, PoliTo, UC, IMEC) are able to perform:

- networking with industrial communities,

- education of students and engineers,

- and dissemination through open-source tools.

The industrial participants in the COMPLEX project are acting in the following markets

which are expected to be impacted after the achievement of the COMPLEX main objectives:

- Mobile Communication (Thales)

- Security (Thales)

- Space Applications (communication & positioning systems) (GMV)

- E-Health (Remote Monitoring), Wireless Sensor Networks (ST)

The following main tools have been developed during the COMPLEX project:

| Tool Name | Type | Provider | Exploitation |
|---|---|---|---|
| HIFSuite | Tool for abstraction/synthesis | EDALAB | EDA Tools (proprietary license) |
| Magillem S-CAD | Architecture specification and assembly | MAGILLEM | EDA Tools (proprietary license) |
| SCNSL | WSN SystemC TLM library | EDALAB | Open source (sourceforge) |
| Synopsys VP (Virtualizer) | Simulation environment | Synopsys | EDA Tools (proprietary license) |
| MOST | Automatic Design Space Exploration Tool | POLIMI | EDA Tool (proprietary license) |
| SWAT | SW power estimation and optimization | POLIMI | Open source |
| UML/MARTE TCTool | UML/MARTE front-end | UC | Publicly available (upon request) |
| SCOPE+ | SystemC fast estimation framework | UC | Open source |
| GRM | Global Run-time Resource Management | IMEC | Further development |
| SMOG | HW/SW task separation tool | OFFIS | Further development |
| PowerOpt+ | Behavioral synthesis tool | OFFIS | Further development |
| MMCO/MEMOPT | Memory Modeling Characterization and Optimization | POLITO | Publicly available (upon request) |

4.1 Exploitable Use-Cases

4.1.1 Use-Case 1 – Networked embedded system

ST-I and ST-PRC participate in the COMPLEX design flow by developing a power-annotated SystemC TLM virtual platform. They also play a relevant role in defining the tool requirements and in validating the COMPLEX methods and tools, by leading the implementation of a wireless sensor network subsystem. The application comes from the health care domain: it is a virtual machine oriented to data processing in body sensor networks. A whole Wireless Sensor Network scenario is proposed as a use case in the

COMPLEX project. Figure 23 shows the architecture of the use case, which is composed of several nodes described at different levels of detail. Node 0 is described by fully detailing its inner architecture: the CPU of the SoC executes the application SW, the operating system, the drivers, and the interrupt service routines. The SoC is a ReISC SoC, developed at STM; the chip, in 90 nm technology, was taped out at the end of 2009.

Figure 23: WSN embedded system

An ideal simulation of wireless sensor networks has to include both the system model of the node and the network behavior. Numerous simulation tools have been introduced for wireless sensor networks, and they are generally categorized into network simulators, node simulators, and emulators. Emerging WSN SoCs, however, require an early evaluation of global performance while the SoCs are still in their design flow, as well as the development of applications running on the node that can be refined and optimized under constraints derived from the physical implementation of the node and from the network. State-of-the-art simulators focus either on modelling the protocol stack and the concurrency among nodes in the network, or on modelling the underlying hardware.

In this use case all aspects of the WSN embedded system, namely SW, HW, and NW, are modelled in SystemC, in particular by using the SystemC Network Simulation Library (SCNSL). SystemC offers great flexibility in describing HW, SW, and NW components at different levels of abstraction and provides libraries for Transaction Level Modeling (TLM) and verification. Starting from an application modelled in Stateflow, and using a SystemC code generator and the virtual model of the SoC, a complete global performance analysis has been carried out in a unified simulation environment, without the need for multiple system and network modelling tools or for co-simulation techniques. A Virtual Platform (VP) model of one node is mixed with a pure SystemC model of the application plus protocol stack for the other nodes, which improves both the scalability and the precision of the approach. The use of model-based design ensures that both models (the SystemC code for the nodes and the C code to be run on the processor model in the VP) are generated from the same Stateflow model.

This holistic simulation approach, starting from a model-driven application design, yields a complete scenario that has been modelled and verified, with experimental results on the network side and power analysis results on the node side.

[Figure 23 labels: Application SW, FreeRTOS, peripheral drivers, power management, CPU, HW1 to HW4, radio interface, radio channel, Node 1, Node 2, …, Node N]

The SWAT estimation flow has proven to be much more efficient than ISS-based analysis, namely more than 400 times faster (these results refer to a ReISC III core with a 1.0 V power supply operating at 50 MHz). This speed-up, combined with a satisfactory accuracy, allows the SWAT methodology to be integrated within a design-space exploration framework, in particular the MOST DSE engine.

For this use-case PoliTo has integrated a full design environment for low-energy Wireless Sensor Network applications with the ST-I virtual prototyping environment and the SCNSL library from EDALab.

This design environment can be used to jointly optimize the platform parameters, the network parameters and protocols, and the application, with the goal of minimizing total energy consumption under functional and performance constraints. The environment has been demonstrated successfully with this use-case. This integration has allowed MEMOPT to be tested with a realistic, industry-strength application on the ST embedded platform. Figure 24 shows an example of the execution of MEMOPT as a standalone tool, applied to a memory trace taken from the architecture of Use Case 1. The energy and lifetime savings are significant thanks to the highly non-uniform distribution of memory accesses.

Figure 24: MMCO used as an energy and lifetime optimization tool
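The link between a non-uniform access distribution and energy savings can be illustrated with a minimal sketch. It is not the MEMOPT algorithm: the energy model (energy per access growing with the square root of the bank size), the two-bank split and the access histograms are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def access_energy(bank_words: int) -> float:
    """Hypothetical energy per access (arbitrary units): grows with sqrt(size)."""
    return math.sqrt(bank_words)

def monolithic_energy(counts: list[int]) -> float:
    """Total energy when all addresses live in a single bank."""
    return sum(counts) * access_energy(len(counts))

def two_bank_energy(counts: list[int], hot_words: int) -> float:
    """Total energy when the `hot_words` most-accessed addresses get their own bank."""
    ordered = sorted(counts, reverse=True)
    hot, cold = ordered[:hot_words], ordered[hot_words:]
    energy = sum(hot) * access_energy(len(hot))
    if cold:
        energy += sum(cold) * access_energy(len(cold))
    return energy

def best_split(counts: list[int]) -> tuple[int, float]:
    """Try every split point; return (hot_words, energy) with the lowest energy."""
    candidates = ((k, two_bank_energy(counts, k)) for k in range(1, len(counts)))
    return min(candidates, key=lambda kv: kv[1])

# Illustrative access histograms (accesses per address), not measured data.
skewed = [1000] * 16 + [1] * 1008   # a few hot addresses, many cold ones
uniform = [16] * 1024               # every address accessed equally often

if __name__ == "__main__":
    for name, counts in (("skewed", skewed), ("uniform", uniform)):
        k, e = best_split(counts)
        ratio = e / monolithic_energy(counts)
        print(f"{name}: hot bank of {k} words, energy ratio {ratio:.2f}")
```

Under these assumptions the skewed histogram benefits far more from the split than the uniform one, because most accesses land in a small, cheap bank.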

4.1.2 Use-Case 2 – Battery powered multi-core system

The effectiveness of the GRM framework has been demonstrated on a POSIX-based

implementation of an industrial audio-driven video surveillance application, mapped on

different platforms: an x86-based platform running at 800 MHz, and an ARM-based TI

OMAP 4460 platform running at 700 MHz.

Figure 25 illustrates the evolution of the energy per frame under two platform constraints (different energy budgets, same battery duration), with and without the GRM. Thanks to an optimized adaptive selection of application configurations, the GRM allows optimizing

the QoS of the application while keeping the platform battery alive for the required duration.

In contrast, this cannot be ensured without a resource management framework.

Figure 25: Energy-per-frame evolution with and without GRM

Analysis of the GRM run-time efficiency confirms that the run-time overhead is negligible:

The initialization of the GRM framework takes less than 10 ms.

The time to perform application configuration selection and to reconfigure the

platform is in the order of 1 ms, on both target platforms.

The total processing overhead due to the use of the GRM is about 1.16% of the total

application run-time on the x86 platform, and 0.6% on the TI OMAP platform.

The memory overhead of the GRM implementation is also limited: the GRM library code size is about 100 KB for the x86 platform, whereas the platform- and application-dependent data structures require only a few KB of memory.

Taken together, the experiments show that the proposed approach, which combines design-time exploration of application configurations with their dynamic run-time management, improves the overall QoS of the system, with no significant impact on the application and no significant run-time overhead.
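The run-time selection of application configurations under an energy budget can be pictured with the following minimal sketch. It is not the GRM implementation: the `Config` type, the QoS and energy values, and the rule of spreading the remaining energy evenly over the remaining frames are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    qos: float               # higher is better (hypothetical scale)
    energy_per_frame: float  # energy units per frame (hypothetical values)

def select_config(configs, remaining_energy, remaining_frames):
    """Pick the best-QoS configuration that fits the per-frame energy allowance."""
    allowance = remaining_energy / remaining_frames
    feasible = [c for c in configs if c.energy_per_frame <= allowance]
    if not feasible:
        # Nothing fits: fall back to the cheapest configuration.
        return min(configs, key=lambda c: c.energy_per_frame)
    return max(feasible, key=lambda c: c.qos)

def run(configs, energy_budget, frames):
    """Simulate `frames` frames; return the chosen names and the energy left."""
    remaining = energy_budget
    chosen = []
    for i in range(frames):
        cfg = select_config(configs, remaining, frames - i)
        remaining -= cfg.energy_per_frame
        chosen.append(cfg.name)
    return chosen, remaining

# Hypothetical configurations produced by a design-time exploration.
CONFIGS = [
    Config("low", qos=1.0, energy_per_frame=1.0),
    Config("medium", qos=2.0, energy_per_frame=2.0),
    Config("high", qos=3.0, energy_per_frame=4.0),
]

if __name__ == "__main__":
    print(run(CONFIGS, energy_budget=10.0, frames=4))
```

As long as some configuration fits the allowance, this rule never spends more than the budget, which mirrors the goal of keeping the battery alive for the required duration while raising QoS when energy permits.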

Synopsys participated in the COMPLEX project to integrate its virtual platform simulation technology into the COMPLEX design space exploration flow and to support the validation of the use cases. In addition, Synopsys contributed a modelling technology that supports exploring application mappings on multicore systems and their expected impact on the memory subsystem architecture and on the power consumption of the design. Through its participation in the COMPLEX project Synopsys has obtained two key results:

1. First of all by integrating a virtual platform as a simulation technology for early

software development and validation into the COMPLEX design space exploration

flow for use case 3 we were able to validate the capabilities of a generic virtual

platform in the context of a complete and elaborate design methodology.

2. Secondly a multicore optimization technology has been perfected that enables early

design space exploration experiments for multicore designs. The technology has been

used and tested in a commercial use case and was extended with power modelling and

analysis capabilities.

Internally Synopsys has continued the work on these results in parallel to the COMPLEX

project and these have resulted in the following concrete products:

1. The demonstrator that was developed in the COMPLEX project has been productized

as part of the Synopsys Virtualizer product portfolio. A derivative of the demonstrator

platform is now delivered as a starter-kit which can be used by customers to tune the

virtual platform to their own design needs, but also as a methodology demonstration

vehicle that can be used by Synopsys’ partners to integrate their solutions and

capabilities into. A first version of this product was released in mid-2012, and a number of variants have been made available since. One such variant was announced on March 21, 2012.

Figure 26: Synopsys Ecosystem for mobile platforms

2. In early 2011 Synopsys announced a new technology for optimizing multicore systems (http://synopsys.mediaroom.com/index.php?s=43&item=896), which has been selected as one of the 5 EDA products in the EDN Hot 100 products of 2011 (http://www.edn.com/article/519945-EDN_Hot_100_products_of_2011_EDA_IP_storage.php). This technology uses the task-based virtual platform modelling techniques that were used and extended in the COMPLEX project. In 2013 this product will include a productized version of the power modelling technology that was developed during the COMPLEX project.

Figure 27: Synopsys task based virtual platform modelling techniques

4.1.3 Use-Case 3 – Model-based space application

The main goal of GMV in exercising the COMPLEX methods and tools is the verification of the methodology and the associated tools. As a result, the optimal architecture and the most suitable partitioning of the hardware and software systems can be inferred from the results of the Design Space Exploration (DSE). For this purpose GMV has developed a domain-specific use case: an on-board distributed application in the context of Space Situational Awareness (SSA), consisting of an object survey, tracking and imaging system, shown in Figure 28.

[Figure 28 labels: attitude and altitude sensors; image capturing and filtering; object survey, tracking, hazard analysis; optical device; GPS receiver; startracker; antenna]

Figure 28: Space Situational Awareness system

This use case defines several interconnected components that track the space situation by means of cyclic processes whose computational requirements vary considerably with the system inputs. These characteristics, combined with the

power requirements present in space systems, make it critical to have a powerful design infrastructure that helps the designer define the most suitable system fulfilling all the imposed constraints.

Using the entire framework, the design space exploration tools have demonstrated the ability to find the optimal architecture given the functional and non-functional properties and the requirement specifications embedded in the system model. The COMPLEX tools favour the refinement of the system models and guide architects and designers through the different design choices: system function modelling, platform resource modelling, HW/SW separation, assessment of extra-functional properties, optimisation, and others.

The UML/MARTE COMPLEX flow developed for Use-Case 3 has demonstrated that additional improvements to the general exploration flow described in the previous section are possible. Specifically, it removes the MDA entry from the DSE loop and avoids recompiling the executable performance model for each iteration. In this way, a significant speed-up of the DSE loop has been achieved.
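The idea of keeping model generation and compilation outside the exploration loop can be sketched as follows. This is not the COMPLEX tool flow: `evaluate` stands in for one run of an already-built, configurable performance model, and the parameter names, cost formulas and latency constraint are hypothetical. The loop only changes parameter values between iterations.

```python
from itertools import product

def evaluate(params: dict) -> dict:
    """Stand-in for one run of a pre-built, configurable performance model.

    The formulas are invented for illustration only.
    """
    cores, freq_mhz = params["cores"], params["freq_mhz"]
    latency_ms = 4000.0 / (cores * freq_mhz)
    power_mw = cores * (10.0 + 0.5 * freq_mhz)
    return {"latency_ms": latency_ms, "power_mw": power_mw}

def explore(space: dict, max_latency_ms: float):
    """Exhaustively evaluate the space; return the lowest-power feasible point."""
    names = list(space)
    best = None
    for values in product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        metrics = evaluate(params)      # no model rebuild between iterations
        if metrics["latency_ms"] > max_latency_ms:
            continue                    # violates the performance constraint
        if best is None or metrics["power_mw"] < best[1]["power_mw"]:
            best = (params, metrics)
    return best

# Hypothetical design space: platform parameters read by the model at start-up.
SPACE = {"cores": [1, 2, 4], "freq_mhz": [50, 100, 200]}

if __name__ == "__main__":
    print(explore(SPACE, max_latency_ms=20.0))
```

Because each iteration costs only one model run, the time per design point is dominated by simulation rather than by model generation and compilation, which is where the speed-up described above comes from.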

4.2 Socio-economic impact and the wider societal implications of the project

This information can be found in the “Report on the Wider Societal Implications of the Project”, publicly available at [21].

4.3 Main dissemination activities

This information can be found in the “Final Report on Standardization and Dissemination Activities”, publicly available at [22].

5 Contact information

Title: COdesign and power Management in PLatform-based design space EXploration

Acronym: COMPLEX

Project website: http://complex.offis.de

List of Contractors:

Name | Contact name & E-Mail | Short name | Country
OFFIS e.V. | Philipp A. Hartmann ([email protected]) | OFFIS | Germany
STMicroelectronics srl. | Sara Bocchio ([email protected]) | ST-I | Italy
STMicroelectronics Beijing R&D Inc. | Chris Wu ([email protected]) | ST-PRC | China
Thales Communications SA | Sylvie Raynaud ([email protected]) | Thales | France
GMV Aerospace and Defence SA | Carmen Lomba ([email protected]) | GMV | Spain
SNPS Belgium NV | Bart Vanthournout ([email protected]) | SNPS | Belgium
ChipVision Design Systems AG (until end of 2010) | | CV | Germany
EDALab srl | Davide Quaglia ([email protected]) | EDALab | Italy
Magillem Design Services SAS | Emmanuel Vaumorin ([email protected]) | MDS | France
Politecnico di Milano | William Fornaciari ([email protected]) | PoliMi | Italy
Universidad de Cantabria | Eugenio Villar ([email protected]) | UC | Spain
Politecnico di Torino | Enrico Macii ([email protected]) | PoliTo | Italy
Interuniversitair Micro-Electronica Centrum vzw | Eddy de Greef ([email protected]) | IMEC | Belgium
European Electronic Chips & Systems design Initiative | Adam Morawiec ([email protected]) | ECSI | France

Co-ordinator Contact:

Kim Grüttner

OFFIS - R&D Division Transportation

Escherweg 2 - 26121 Oldenburg - Germany

Phone/Fax.: +49 441 9722-228/-278

E-Mail: [email protected]

6 References

[1] Gianluca Palermo, Carlo Brandolese, Francisco Ferrero, Fernando Herrera, Gunnar

Schomaker, Claus Brunzema, Kim Grüttner, Kai Hylla, Bart Vanthournout, Davide

Quaglia, Luciano Lavagno, Massimo Poncino, Emanuel Vaumorin, Chantal Couvreur

and Saif Ali Butt. Definition of application, stimuli and platform specification, and

definition of tool interfaces. Tech. Rep. COMPLEX/PoliMi/R/D1.2.1/1.1, COMPLEX

project deliverable (October 2010) URL http://complex.offis.de/docs/8

[2] F. Herrera, H. Posadas, P. Penil, E. Villar, F. Ferrero, R. Valencia, An MDD Methodology for Specification of Embedded Systems and Automatic Generation of Fast Configurable and Executable Performance Models, in: International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS’2012, Tampere, FI, 2012.

[3] F. Ferrero, R. Valencia, F. Herrera, E. Villar, L. Lavagno, D. Quaglia, System

specification methodology using MARTE and Stateflow, Tech. Rep.

COMPLEX/GMV/R/D2.1.1/1.1, COMPLEX project deliverable (Dec. 2010). URL

http://complex.offis.de/docs/11

[4] F. Herrera, P. Penil, E. Villar, F. Ferrero, R. Valencia, L. Lavagno, D. Quaglia,

SystemC generation tools from MARTE and Stateflow, Tech. Rep.

COMPLEX/UC/P/D2.1.2/1.0, COMPLEX project deliverable (Jun. 2011). URL

http://complex.offis.de/docs/21

[5] K. Grüttner, C. Grabbe, F. Oppenheimer, W. Nebel, Object Oriented Design and

Synthesis of Communication in Hardware-/Software Systems with OSSS, in:

Proceedings of the SASIMI 2007

[6] K. Grüttner, H. Andreas, P. A. Hartmann, A. Schallenberg, C. Brunzema, OSSS - A

Library for Synthesisable System Level Models in SystemC™ (2008). URL

http://www.system-synthesis.org

[7] K. Hylla, P. Gonzalez, P. Sanchez, F. Herrera, Final report on custom hardware

estimation and model generation, Tech. Rep. COMPLEX/OFFIS/P/D2.4.2/1.0,

COMPLEX project deliverable (Jan. 2012). URL http://complex.offis.de/docs/33

[8] D. Helms, G. Ehmen, W. Nebel, Analysis and Modeling of Subthreshold Leakage of

RT-Components under PTV and State Variation, Proceedings on International

Symposium on Low Power Electronics and Design.

[9] Kai Hylla, Philipp A Hartmann, Domenik Helms and Wolfgang Nebel. Early Power &

Timing Estimation of Custom Hardware Blocks based on Automatically Generated

Combinatorial Macros. In 16. Workshop Methoden und Beschreibungssprachen zur

Modellierung und Verifikation von Schaltungen und Systemen (MBMV'2013).

Rostock, Germany, March 2013.

[10] C. Brandolese, G. Palermo, W. Fornaciari, H. Posadas, F. Herrera, P. Penil, E. Villar,

F. Ferrero, R. Valencia, B. Vanthournout, Final report on embedded software

estimation and model generation, Tech. Rep. COMPLEX/PoliMi/R/D2.2.2/1.0,

COMPLEX project deliverable (Jan. 2012). URL http://complex.offis.de/docs/31

[11] C. Brandolese, W. Fornaciari, Software Energy Optimization Through Fine-Grained

Function-Level Voltage and Frequency Scaling, in: International Conference on

Hardware/Software Codesign and System Synthesis, CODES+ISSS’2012, Tampere,

FI, 2012.

[12] D. Lorenz, P. A. Hartmann, K. Grüttner, W. Nebel, Non–invasive Power Simulation at

System–Level with SystemC, in: International Workshop on Power and Timing

Modeling, Optimization and Simulation, PATMOS’2012, Newcastle upon Tyne, UK,

2012.

[13] D. Lorenz, K. Grüttner, N. Bombieri, V. Guarnieri, S. Bocchio, From RTL IP to

Functional System-Level Models with Extra-Functional Properties, in: International

Conference on Hardware/Software Codesign and System Synthesis,

CODES+ISSS’2012, 2012.

[14] Emmanuel Vaumorin, Bart Vanthournout, Sara Bocchio, Davide Quaglia, Fernando

Herrera, Pablo Peñil del Campo, Eugenio Villar, Kai Hylla, Tiemo Fandrey, Philipp A.

Hartmann, Final report and tools on virtual system generation, Tech. Rep.

COMPLEX/MDS/R/D2.5.3/1.0, COMPLEX project deliverable (December 2012).

[15] S. Bocchio, P. A. Hartmann, D. Lorenz, D. Quaglia, Final report and tools on platform

IP components estimation and model generation, Tech. Rep. COMPLEX/ST-

I/P/D2.3.2/1.1, COMPLEX project deliverable (May 2012). URL

http://complex.offis.de/docs/34

[16] SystemC Network Simulation Library v.2 (2012). URL

http://sourceforge.net/projects/scnsl

[17] P. Sayyah, M. Lazarescu, D. Quaglia, E. Ebeid, S. Bocchio, A. Rosti, Network-aware

Design-Space Exploration of a Power-Efficient Embedded Application, in:

International Conference on Hardware/Software Codesign and System Synthesis,

CODES+ISSS’2012, Tampere, FI, 2012

[18] C. Ykman-Couvreur, P. A. Hartmann, G. Palermo, F. ColasBigey, L. San, Run-time

Resource Management based on Design Space Exploration, in: International

Conference on Hardware/Software Codesign and System Synthesis,

CODES+ISSS’2012, Tampere, FI, 2012

[19] Website. www.acceleo.org Nov., 2010.

[20] OMG. MOF Model to Text Language. Jan., 2008.

[21] Kim Grüttner, Report on the Wider Societal Implications of the Project, Tech. Rep.

COMPLEX/Partner Name/R/D6.2.6/1.0 COMPLEX project deliverable (March 2013)

[22] Adam Morawiec, Ana Pinzari, Final Report on Standardization and Dissemination

Activities, Tech. Rep. COMPLEX/ECSI/R/D5.2.3/1.0 COMPLEX project deliverable

(March 2013).

[23] Open Virtual Platforms, Virtual platforms for software development, URL:

http://www.ovpworld.org

[24] N. Bombieri, F. Fummi, and G. Pravadelli, Automatic Abstraction of RTL IPs into Equivalent TLM Descriptions, IEEE Transactions on Computers, vol. 60, n. 12, 2011, pp. 1730-1743.

[25] EDALab, HIFSuite web site, http://www.hifsuite.com

[26] P. Sayyah, F. Stefanni, M. Lazarescu and D. Quaglia, SystemC model generation for

realistic simulation of networked embedded systems, 15th Euromicro Conference on

Digital System Design (DSD), Sept. 5-8, 2012.

[27] IEEE, “IEEE Std 1666 - 2011 IEEE Standard SystemC Language Reference Manual”,

2012. URL http://standards.ieee.org/findstds/standard/1666-2011.html

[28] TheMathWorks, MATLAB and Simulink for Technical Computing, URL:

http://www.mathworks.com