FP7-ICT-2009-4 (247999) COMPLEX
COdesign and power Management in PLatform-based design space EXploration
Project Duration: 2009-12-01 – 2013-03-31    Type: IP
WP no.: WP6    Deliverable no.: D6.2.5    Lead participant: OFFIS
Final publishable summary report
Final publishable summary report
Prepared by Kim Grüttner (OFFIS) and all contributors
Issued by OFFIS
Document Number/Rev. COMPLEX/OFFIS/R/D6.2.5/1.0
Classification COMPLEX Public
Submission Date 2013-05-08
Due Date 2013-03-31
Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)
© Copyright 2013 OFFIS e.V., STMicroelectronics srl., STMicroelectronics Beijing
R&D Inc, Thales Communications SA, GMV Aerospace and Defence SA, SNPS Belgium
NV, EDALab srl, Magillem Design Services SAS, Politecnico di Milano, Universidad de
Cantabria, Politecnico di Torino, Interuniversitair Micro-Electronica Centrum vzw, European
Electronic Chips & Systems design Initiative.
This document may be copied freely for use in the public domain. Sections of it may
be copied provided that acknowledgement is given of this original work. No responsibility is
assumed by COMPLEX or its members for any application or design, nor for any
infringements of patents or rights of others which may result from the use of this document.
History of Changes
ED. REV. DATE PAGES REASON FOR CHANGES
KG 1.0 2013-05-06 45 Final document
Contents
1 Executive summary
2 Summary description of project context and objectives
2.1 Motivation and context
2.2 Objectives
3 Description of the main S&T results/foregrounds
3.1 MDA Design Entry
3.1.1 Viewpoints
3.1.2 Component-based design
3.1.3 Use-cases, scenarios, verification
3.1.4 Support of Design-Space Exploration
3.1.5 MDE tools and flow
3.2 Executable Specification
3.2.1 Executable application model
3.2.2 System input stimuli
3.2.3 User constrained HW/SW separation & mapping
3.2.4 Architecture/Platform description
3.3 Estimation & Model Generation
3.3.1 Hardware/Software task separation
3.3.2 Custom Hardware estimation
3.3.3 Software estimation
3.3.4 Pre-existing IP & virtual component models
3.3.5 Virtual system generation
3.4 Simulation
3.4.1 Pre-optimized power controller
3.4.2 Timing & power aware executable virtual system prototype in SystemC
3.5 Exploration & Optimization
3.5.1 Simulation trace
3.5.2 Exploration & optimization
3.5.3 Design space definition
3.5.4 Design space instance parameters
4 Potential impact and the main dissemination activities and exploitation of results
4.1 Exploitable Use-Cases
4.1.1 Use-Case 1 – Networked embedded system
4.1.2 Use-Case 2 – Battery powered multi-core system
4.1.3 Use-Case 3 – Model-based space application
4.2 Socio-economic impact and the wider societal implications of the project
4.3 Main dissemination activities
5 Contact information
6 References
1 Executive summary
Accounting for an embedded device's power consumption, and managing it, is increasingly important. Yet it is currently not easy to integrate power information as early as the platform exploration phase. In this integrated project, integrated device manufacturers, system integrators, Electronic Design Automation (EDA) vendors and research partners worked together on solving the power and complexity design challenges of today's heterogeneous HW/SW systems.
The main objective of the COMPLEX project was to increase the competitiveness of the
European semiconductor, system integrator and EDA industry by addressing the problem of
platform-based design space exploration (DSE) under consideration of power and
performance constraints early in the design process. High performance usually causes high power consumption, and a main challenge in today's embedded system design is to find the right balance between the two. Until now this balance could not be found efficiently and at high quality, because no generic framework for accurately and jointly estimating performance and power consumption, starting at the algorithmic level, has been available.
As a result, we propose a reference framework and design flow concept that combines system-level power optimization techniques with platform-based rapid prototyping. Virtual executable prototypes are generated from MARTE/UML and functional C/C++ descriptions, which then allow the study of different platforms, mapping alternatives, and power management strategies.
Our proposed flow combines system-level timing and power estimation techniques available
in commercial tools with platform-based rapid prototyping. We propose an efficient code
annotation technique for timing and power properties enabling fast host execution as well as
adaptive collection of power traces. Combined with a flexible design-space exploration approach, our flow allows trade-off analysis between different platforms, mapping alternatives, and optimization techniques, based on domain-specific workload scenarios.
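The annotation idea can be sketched as follows (a minimal illustration with invented names and cost values, not the project's actual tooling): each basic block of the functional code is instrumented with its estimated target cycle and energy cost, so the code runs natively on the host while accumulating target timing and power figures.

```cpp
#include <cstdint>

// Hypothetical per-block costs, as an estimation tool might back-annotate them.
struct BlockCost { std::uint64_t cycles; double energy_nj; };

// Accumulators for simulated target time and energy during native host execution.
static std::uint64_t g_cycles = 0;
static double g_energy_nj = 0.0;

// Annotation hook a source-to-source instrumenter could insert per basic block.
inline void annotate(BlockCost c) {
    g_cycles += c.cycles;
    g_energy_nj += c.energy_nj;
}

// Functional code, unchanged except for the inserted annotations.
int accumulate(const int* data, int n) {
    annotate({5, 1.2});      // prologue block (assumed costs)
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        annotate({3, 0.8});  // loop body block
        sum += data[i];
    }
    annotate({2, 0.5});      // epilogue block
    return sum;
}
```

Running `accumulate` over four elements leaves `g_cycles` 19 cycles higher (5 + 4·3 + 2) while producing the functionally correct sum; the same hook could equally record power-state changes to feed an adaptive power trace.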
2 Summary description of project context and objectives
2.1 Motivation and context
High performance usually causes high power consumption. Especially in today's embedded system design, one main objective is to find the perfect balance between performance and power for a given design. One strategy is to optimise for performance wherever speed is absolutely needed and to design everything else for low power. This helps to achieve the ultimate goal of handheld embedded devices: ensuring the required performance with the longest possible battery life.
Figure 1 shows the key challenges for the embedded mobile device industry. The performance growth of mobile phones per generation, from analogue (1G) over GSM (2G) to high bandwidth (3G and 4G), clearly exceeds Moore's law of technology development. Instead it follows the much faster Shannon's law of application development, predicting a complexity doubling every 8.5 months. For more than 30 years now, the integration density of integrated circuits has approximately doubled every 18 months (Moore's law). In contrast, battery makers need 5 to 10 years to achieve a comparable increase in power density, and memory access time performance doubles only every 12 years.
Figure 1: Key technology gaps, 1980–2020: Shannon's law (2x in 8.5 months), Moore's law (2x in 18 months), Eveready's law for battery energy density (2x in 10 years), and memory access time (2x in 12 years). Source: Jan M. Rabaey
The gaps in this figure define the challenges which the industry is facing: 1. Algorithmic
complexity gap, 2. Microprocessor and memory bandwidth gap, and 3. Power reduction gap.
In order to keep up with rapid technological advances, system design methodologies and EDA support have always been forced to evolve. Without appropriate methodology and tool support, these design gaps keep growing. In the COMPLEX project we addressed all three gaps:
- We observe a rising complexity of both applications and execution platforms. The gap between these complexities increases the uncertainty of platform selection and application-to-platform mapping, and requires EDA tools and methods fast enough to cope with the sheer size of recent algorithms.
- Since the power reduction gap is the next main limiting factor, a balance between performance and power needs to be found early in the design process. This can only be done under explicit consideration of the application. To ensure sufficient power control, tools and methods have to be not only fast, but also accurate in prediction and efficient in optimization.
- The gap in memory access times requires smarter memory organization. Since this influences both power and performance, a multi-objective design space exploration is required.
In custom hardware design, advances in performance and power consumption have been
mainly influenced by new technologies and an evolution in design methodology. The latter
was achieved by several important steps in climbing up the level of abstraction for the design
entry. All these steps gave the productivity in hardware design a great boost and made it
possible to manage the steadily growing complexity of integrated circuits. Software processing units like microcontrollers, SIMD processors or DSPs have been made more efficient. Modern platforms support advanced power management capabilities, with dynamic frequency scaling (sometimes independent for the processing and memory subsystems) and power islands that need to be effectively controlled to maximise the optimization potential.
For these reasons it was high time to initiate the next ground-breaking step, addressing the formulated challenges in a holistic approach. There is a need for a new design entry at system level where hardware and software are described in the same way. The latest trends in software engineering and hardware design have commonalities which may not be apparent at first glance. In both the hardware and the software world we observe the introduction of methodologies that separate the functionality/algorithm from the concrete implementation platform. The HW world calls this evolution ESL (Electronic System Level); the SW world calls it model-based design or MDE (Model-Driven Engineering), with Model-Driven Architecture (MDA). Both follow the Y-chart approach: separation of the functionality (what) from the implementation platform (how).
The availability of this new design entry in conjunction with the traditional bottom-up
approach defines a new viewpoint for the design of embedded systems: How to find a
mapping and implementation of an application onto an execution platform that fulfils all
functional and extra-functional requirements at minimal cost. To avoid expensive redesigns and costly code modifications, the platform decision should be made before investing money in a concrete target platform. For this purpose, reliable information about the execution behaviour of the application running on the platform, in terms of functionality, performance and power consumption, is absolutely mandatory.
Until now no generic framework for accurately and jointly estimating performance and power
consumption of complete embedded systems has been available at the algorithmic level.
Available point-tools and predesigned system components need to be bundled and properly
integrated into a holistic framework for platform-based design space exploration:
- Behavioural synthesis for the generation of custom hardware from algorithmic descriptions
- Dedicated embedded SW compilers for the generation of executables from algorithmic descriptions
- A portfolio of HW intellectual property, ranging from microprocessors, DSPs, memories, on-chip communication structures and communication peripherals to domain-specific accelerators
- Virtual platforms for early system simulation, analysis and integration of HW and SW components without using costly test-chips
The motivation of COMPLEX was to build a framework on top of these assets that supports a
software-like design entry, integrates platforms and software development tool-chains from
different European providers, and incorporates European EDA tools and know-how in the
area of power and timing estimation of HW, SW and run-time power management. The
project outcome is the connection of this framework to the next-generation system
specification and design methodology, the automatic generation of an efficient executable
virtual system, giving accurate and reliable timing and power information, and the integration
of an automatic design space exploration for finding the optimal design space instance
parameters.
2.2 Objectives
The primary scientific and technical objective of COMPLEX was to develop an innovative,
highly efficient and productive design methodology and a holistic framework for iteratively
exploring the design space of embedded HW/SW systems. This objective has a strategic
dimension, since platform providers, EDA providers and system integrators alike would benefit from this framework. These companies are seeking short to mid-term consolidation
and growth of their market shares in business sectors such as telecom, consumer and
automotive electronics, in which the European industry holds world-wide technical excellence
and commercial leadership.
The R&D activities performed in COMPLEX targeted new modelling and specification methodologies using a software-like MDA design entry for system design, the integration of HW and SW timing and power estimation into efficient virtual system simulation, and multi-objective design-space exploration under consideration of run-time management for power and performance optimization. Specifically, the scientific and technical results of the project are:
- A highly efficient and productive design methodology and holistic framework for iterative design space exploration of embedded HW/SW systems. The resulting framework is platform vendor and application domain independent, provides open interfaces, and enables the integration of new industry players.
- Combination and augmentation of well-established ESL synthesis & analysis tools into a seamless design flow enabling performance & power aware virtual prototyping from a combined HW/SW perspective.
- An interface to the next-generation model-driven SW design approach (MARTE/UML) and the industry-standard Matlab/Simulink model-based design environment. This seamless design entry lowers existing barriers between HW and SW developers, allowing SW designers to take more influence on the exploration of the HW platform.
- Multi-objective co-exploration to assess the design quality and to optimize the system platform with respect to performance and power.
- Fast simulation and assessment of the platform at ESL, with up to bus-cycle accuracy, at the earliest instant in the design iteration.
- Optimization benefits from run-time mode adaptation techniques, such as dynamic power management or application adaptation to varying workloads.
- Demonstration of the accuracy and ease of integration of existing EDA tools within the new methodology and framework, by comparison with state-of-the-art reference methodologies.
- Demonstration of the applicability and effectiveness of the new methodology and framework through validation against measured data and/or power and performance characterized virtual platforms.
- Demonstration of the usability and effectiveness of the new design methodologies, tools and framework by applying them to industry-strength design cases made available by project partners.
A distinguishing feature of the R&D approach of COMPLEX is that it unifies the development and integration of a next-generation MDA design entry with platform-based design, existing EDA techniques and tools for estimation and model generation for virtual system prototypes, and a multi-objective design-space exploration technique and tool. This enables a synergetic, holistic approach to embedded HW/SW virtual system prototyping, regardless of the target platform and application domain.
The COMPLEX design framework has been developed by research, industry and EDA
partners, ensuring its usability in realistic, industry-strength design flows and environments,
thus allowing the industrial partners to take advantage of the new solutions during the course
of the project and to apply the new tools for production purposes shortly after project end.
The technical objectives highlighted above constitute a prerequisite for the commercial targets of the industrial partners, which are geared towards improving their (and their customers') competitiveness in the world-wide market of electronic products and applications.
One additional objective of COMPLEX has been the Europe-wide dissemination of the valuable know-how and competence that each partner has acquired through the R&D effort performed by all the researchers, designers, application and EDA engineers participating in the project.
3 Description of the main S&T results/foregrounds
The design framework proposed in the COMPLEX project is illustrated in Figure 2. As described in the motivation and objectives above, we follow the platform-based design (PBD) approach with a separation of application (a) (e), architecture (d) (g), and mapping description (c). The architecture/platform consists of pre-existing IP components like processors, buses, hardware accelerators and memories, while the application describes how these resources are used to implement certain system functionality. For the different domain-specific application workload scenarios, specified as use-cases (b), we propose to generate a system input stimuli specification (f) for triggering the executable system model.
Figure 2: The COMPLEX Reference Framework
The most important property of the proposed framework is that timing and power characterisation is separated from application specification and development. This separation allows platform providers to offer timing and power characterized virtual platform component models (IPs) (k). Together with the estimated custom HW (i) and SW components (j), a timing and power aware executable virtual system prototype (n) can be generated.
Based on the simulation trace (o), obtained from executing the generated platform model, analysis tools (p) can either generate a report or a visualization of the power consumption per system component over time (q). Applying metrics to the trace drives an automatic or semi-automatic exploration and optimization process (r) that modifies different design parameters (t) in a pre-defined design space (s). These parameters can be applied to
the MDA design entry model, executable SystemC model, or the estimation and model
generation tools.
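To make the trace-driven analysis step concrete, the reduction from a simulation trace to a per-component metric can be sketched as follows (the trace record layout below is invented for illustration, not the project's actual trace format):

```cpp
#include <map>
#include <string>
#include <vector>

// One hypothetical trace record: `component` drew `power_mw` milliwatts
// from `t_start` to `t_end` (milliseconds of simulated time).
struct TraceEntry {
    std::string component;
    double t_start, t_end, power_mw;
};

// Reduce the trace to energy per component (mW * ms = microjoules),
// the kind of figure an analysis tool could report or feed to the DSE loop.
std::map<std::string, double>
energy_per_component(const std::vector<TraceEntry>& trace) {
    std::map<std::string, double> energy_uj;
    for (const auto& e : trace)
        energy_uj[e.component] += e.power_mw * (e.t_end - e.t_start);
    return energy_uj;
}
```

The same traversal could just as well bin the entries over time to produce a power-over-time visualization per component.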
The following sections give a more detailed description of the different phases of our proposed rapid prototyping framework. The definition of the application, stimuli and platform specification, and the definition of tool interfaces can be found in [1].
3.1 MDA Design Entry
The COMPLEX design entry is supported by the COMPLEX UML/MARTE modelling methodology [2][3], which includes a toolset fully integrated in the Eclipse framework [4]. This toolset automates the generation of the code from which the executable performance model is built. The UML/MARTE specification models both the system and the input stimuli environment.
Among its features, the following are described in the next paragraphs:
- Viewpoints for the separation of functional and extra-functional concerns
- A component-based design approach
- Explicit support for Design-Space Exploration (DSE)
3.1.1 Viewpoints
The COMPLEX UML/MARTE methodology enables the specification of the different facets
of the system in different model viewpoints. The COMPLEX model viewpoints are the Data
Model View, the Functional View, the Communications and Concurrency (CC) View, the
Platform View, the Architectural View, and the Verification View. These views enable
separation of concerns and thus raise the level of abstraction as each view focuses on a
specific aspect of interest of the system. The COMPLEX UML/MARTE methodology also defines the relationships among these views, and a workflow which guarantees their consistency. This enables the designer to build a concise model (avoiding redundancies and thus coherence checks) and enables a cooperative workflow in which the application and platform can be captured in parallel.
The separation of concerns applies at several levels of the model design. The system (i.e. the application mapped onto platform resources) is separated from the environment. Within the
system model, the platform specification (i.e. processing resources, operating system) is
separated from the model of the application.
Finally, within the application, data structures, functionality (interfaces and classes) and
application components are also separately captured.
The extra-functional properties of the application are specified in the CC view, while the platform's extra-functional properties are described in the Platform view. The Architectural view provides information about the allocation of application components onto the platform components, and is where the DSE parameters and metrics can be reflected.
3.1.2 Component-based design
The COMPLEX UML/MARTE methodology also follows a component-based approach. At
the application level, the designer encapsulates the functionality using application components
in the CC View. A system application is captured as a component, and instances of application
components are used to capture the application architecture. This component represents the
Platform Independent Model (PIM) according to the MDA paradigm.
The platform architecture is captured in the Platform view by means of instances of SW and
HW platform components, e.g. RTOS component instances, and instances of HW processor
components. This architecture represents the Platform Description Model (PDM) according to
the MDA paradigm.
The component is the elemental unit used in the COMPLEX UML/MARTE methodology for deploying functions onto processing resources (e.g. microprocessors, FPGAs) in the Architectural view. For instance, application components can be mapped onto the platform components which represent processing resources.
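Such a mapping can be pictured as a simple allocation table. The sketch below (all names are invented, not part of the methodology's actual tooling) checks the basic well-formedness rule that every application component instance is allocated to an existing processing resource:

```cpp
#include <map>
#include <set>
#include <string>

// Architectural-view allocation reduced to its essence: PIM component
// instances mapped onto PDM processing resources. All names are hypothetical.
struct Allocation {
    std::set<std::string> app_components;        // application (PIM) instances
    std::set<std::string> resources;             // platform (PDM) instances
    std::map<std::string, std::string> mapping;  // component -> resource

    // Every application component must be mapped onto a known resource.
    bool complete() const {
        for (const auto& c : app_components) {
            auto it = mapping.find(c);
            if (it == mapping.end() || resources.count(it->second) == 0)
                return false;
        }
        return true;
    }
};
```

A DSE tool can then treat each distinct `mapping` table as one point in the allocation space.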
3.1.3 Use-cases, scenarios, verification
Finally, the COMPLEX UML/MARTE methodology allows designers to specify the input stimuli in a separate view, the Verification view.
As mentioned, this view supports the specification of a set of environmental components and
how they connect with the system component. This view provides a description of the
interactions (through sequence diagrams) among the environmental components and the
system component as the sequence of ordered messages in the context of a use case scenario.
Timing information and ordering constraints among environment events are captured in the sequence diagrams, enabling the documentation and generation of realistic use cases.
This is crucial for the dynamic performance estimation enabled by the executable model
derived from the UML/MARTE system model.
The methodology enables the definition of multiple scenarios that represent different use
cases of the system. Designers may choose any scenario from those modeled in the
Verification view to generate the performance executable model and explore the design space.
3.1.4 Support of Design-Space Exploration
The COMPLEX UML/MARTE methodology has been explicitly designed for supporting
design space exploration. Specifically, the methodology supports the specification of a design
space, i.e., a set of design solutions, rather than a single design. The description of such a design space is enabled by defining: a set of architectural mappings (the allocation space), a range of values for platform attributes (the parameters of the space), and several platform architectures (the architecture space).
In order to specify this design space, the methodology relies on the MARTE profile and defines new stereotypes for the missing semantics (e.g. DSE and IP-XACT concepts), which are proposed as a necessary enhancement of the current capabilities of MARTE for embedded system design. Moreover, the COMPLEX UML/MARTE methodology also supports explicit constraints and rules that designers can use to limit the space of solutions to those of main interest.
The methodology also supports the specification of system-local metrics. In contrast to global system metrics, such as total power consumption, local metrics depend on the specific system model: for example, the latency of servicing a specific function of an application component, or the miss rate of the instruction cache of a given processor of the platform. This feature provides a capability of paramount importance: all aspects of the system with some impact on its final performance (application and platform, SW and HW, architectures and architectural mappings, component attributes and different types of metrics) now form part of the DSE loop and can therefore be optimized at once, under a truly holistic approach.
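Two such local metrics can be stated very compactly. The sketch below (counter and function names are invented for illustration) computes an instruction-cache miss rate and a per-function service latency from quantities a simulation run could collect:

```cpp
// Hypothetical counters a simulated instruction cache would maintain.
struct CacheCounters {
    unsigned long accesses;
    unsigned long misses;
};

// Local metric 1: miss rate of one specific cache instance.
double miss_rate(const CacheCounters& c) {
    return c.accesses ? static_cast<double>(c.misses) / c.accesses : 0.0;
}

// Local metric 2: latency of servicing one function call of an application
// component, from request/response timestamps (milliseconds of simulated time).
double service_latency_ms(double request_t, double response_t) {
    return response_t - request_t;
}
```

Because both are attached to a specific model element rather than to the whole system, they can sit alongside global metrics as optimization objectives or constraints in the DSE loop.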
3.1.5 MDE tools and flow
GMV and the University of Cantabria have jointly participated in the development of the COMPLEX design framework based on UML/MARTE and Eclipse technologies. Both partners have defined a methodology for UML/MARTE modelling oriented to the support of design space exploration, and a set of generators capable of mapping the information described in the model to the inputs required by the tools of the flow.
In order to evaluate the different design space points, the University of Cantabria has developed the SCoPE+ simulator. This simulator provides all the resources needed to simulate and evaluate the performance of the platform-independent code when executed under the different configurations described in the system model, including different allocations (HW and SW) and communication semantics.
All these tools have been integrated with the design space exploration tool MOST, developed by POLIMI, resulting in an automatic exploration framework which can be controlled by an Eclipse GUI, minimizing the effort and knowledge required from the user.
Finally, as part of exercising the COMPLEX methods and tools, GMV has implemented a space-domain use case in order to demonstrate the proposed COMPLEX solutions.
As is well known, performance and power consumption are critical requirements for on-board systems. Thus, the UML/MARTE evaluation framework covers these goals in the scope of COMPLEX, managing them at different levels:
The Model Driven Engineering (MDE) methodology developed within COMPLEX by GMV
and UC provides techniques, methods and tools to model both the system functions and the
hardware platform. Moreover, it allows the specification of the system functionality allocation
to platform resources, enabling the exploration of different allocation schemes. The modelling
methodology offers designers a set of advanced features specifically suited to Design Space Exploration (DSE), which make it possible to describe a complete space of possible design solutions to be explored.
By means of UML/MARTE system models, designers and system architects can model the system's functional and non-functional properties, as well as the requirements the system must fulfil. Additionally, they can also define the system stimuli environment to enable the later system simulation and performance and power estimation.
For that methodology, new modelling stereotypes have been defined to enable designers to capture the design space parameters used to define different design alternatives, and to organize all the information in different, simpler views. These design alternatives define different HW resource characteristics (e.g. frequency) or different application-to-HW/SW resource allocations.
According to the information captured in the COMPLEX UML/MARTE model, code generators have been implemented to extract all the relevant information that enables the execution of the DSE process.
In order to execute SCoPE+ simulations, the Eclipse generators create the system object files and the XML files required as inputs. To compile the object files with the system code, the generators create different wrappers to integrate and connect the components, using the macros provided by SCoPE+ for that purpose. At the same time, they create the Makefiles required to compile and link these files together with the functional code provided by the user, creating the executable files. Additionally, the "System Description" XML file is generated. This file includes the definition of the different HW/SW components used to implement the HW/SW platform, the description of the application components, and the mapping of these application components onto the HW/SW platform resources.
For automatic exploration, a file describing the design space is also created. The "Design Space" XML file includes the specification of all the DSE parameters and DSE rules required for the DSE process. The DSE parameters are exploration variables which cover HW characteristics (frequency, memory size…) and application-to-HW/SW resource allocations. The DSE rules make it possible to constrain the possible combinations of these DSE parameters according to logical expressions.
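The interplay of DSE parameters and rules can be sketched in a few lines (the in-memory representation and parameter names below are invented; the actual flow exchanges this information via the XML files described above): enumerate the Cartesian product of all parameter values, then discard the configurations that violate a rule.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// One candidate configuration: DSE parameter name -> chosen value.
using Config = std::map<std::string, int>;
// A DSE rule: a logical expression over a configuration.
using Rule = std::function<bool(const Config&)>;

// Enumerate the Cartesian product of all parameter values, then keep
// only the configurations that satisfy every rule.
std::vector<Config> enumerate_configs(
        const std::map<std::string, std::vector<int>>& params,
        const std::vector<Rule>& rules) {
    std::vector<Config> space{Config{}};
    for (const auto& [name, values] : params) {
        std::vector<Config> next;
        for (const auto& partial : space)
            for (int v : values) {
                Config c = partial;
                c[name] = v;
                next.push_back(c);
            }
        space = std::move(next);
    }
    std::vector<Config> kept;
    for (const auto& c : space) {
        bool ok = true;
        for (const auto& r : rules)
            if (!r(c)) { ok = false; break; }
        if (ok) kept.push_back(c);
    }
    return kept;
}
```

For example, with invented parameters `cores` in {1, 2} and `freq_mhz` in {100, 200}, and a rule forbidding 200 MHz on a single core, three of the four raw combinations survive. Real exploration tools prune and search this space heuristically rather than enumerating it exhaustively.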
Figure 3: COMPLEX UML/MARTE-based design flow (MDA entry with PDM/PIM/PSM models in the COMPLEX Eclipse Application, Makefile/XML/source-code generation, SCoPE+ performance analysis, and a MOST-driven DSE loop yielding the final platform)
In addition to that, UC has developed the Stimuli Scenarios code generator. It produces the infrastructure necessary to excite the system during simulation and performance analysis, and produces the skeletons of the stimuli scenarios so that the verification engineer can insert the necessary behaviour. The stimuli scenarios enable a DSE loop to simulate significant execution scenarios and obtain representative system metrics. The generator has been developed as a set of generation templates written in the standard MTL language, and is included in the plug-in and integrated in the Eclipse application as an extension plug-in. By means of the
plug-in extension, the stimuli code generator can be triggered, obtaining the corresponding files that enable the simulation of the set of stimuli scenarios modelled in UML/MARTE.
Finally, an IP-XACT generator has been developed and integrated in the COMPLEX plug-in. The MARTIX generator automatically produces the IP-XACT description of the HW platform from a COMPLEX UML/MARTE description of the system. This lets the user choose between the SCoPE+ proprietary XML format and the IP-XACT format for connecting the Eclipse application to the simulator, which also enables the connection of the UML/MARTE frameworks with other tools, such as those from Synopsys.
All these code generators have been developed as sets of generation templates written in the standard MTL language [20]. The development has been done with Acceleo [19], a code generation framework fully integrated in Eclipse. These XML generators have been implemented as independent plug-ins, integrated in the COMPLEX Eclipse Application as plug-in extensions.
The MDE-level process is fully integrated in an Eclipse-based framework through the COMPLEX Eclipse Application. For that purpose, a GUI has been developed, adding to Eclipse the menus and options required to perform the entire process.
Figure 4: COMPLEX Eclipse Application GUI
The MDA methodology and tools developed in the scope of COMPLEX directly impact the development of the system by providing early estimates of performance and power consumption.

The SCoPE+ tool is a fast, high-level simulator developed to provide early timed virtual platforms on which designers can carry out the SW design process while considering HW characteristics. SCoPE+ obtains the expected performance metrics of the simulated configurations, providing the user with the information needed to optimize the final product. SCoPE+ works on top of SystemC, so specific HW components can be
integrated in the internal TLM HW platform model, together with the generic components
provided by the tool.
SCoPE+ is based on the previous SCoPE tool, a virtual simulator based on annotated native simulation, but adds novel features that improve the design process at early design steps. Previous fast simulators, such as SCoPE, are oriented towards easy exploration of the design space, but they have a major drawback: they require completely refined SW code to perform the simulations. Inter-thread, inter-process or distributed communications must be completely fixed in the source code before any simulation can be performed. As a result, the real exploration effort can be huge, since developing all the code required to provide communication and concurrency in the system for each explored configuration can represent a tremendous amount of work.
SCoPE+ overcomes this limitation, since it only requires the platform-independent functional code of the system components. All additional elements providing communication and concurrency in the system are automatically generated at simulation time from the information obtained from the system description. For this purpose a new interface has been developed, capable of receiving and managing all the information contained in the UML model. From that information, all the system wrappers required to interconnect the functional components are automatically created for each simulation, adapted to the configuration explored in each execution.
As a result SCoPE+ provides new features, enabling the definition of multiple possibilities in
the UML model that can be simulated later without additional user effort. This is especially
interesting when considering communication and concurrency features. System functional
services can be identified as cyclical, with different periods, or sporadic. Communications can
be defined as synchronous or asynchronous, protected or not, etc. Additionally,
communications are modelled differently depending on the allocation of client and server
components.
At the same time, SCoPE+ makes it possible to consider different implementation alternatives during the exploration. Allocations of the same component to different processor types, which imply different annotations, can be handled by the tool in a single exploration. Additionally, HW allocations of some components can be simulated together with SW allocations of these and other components.
SCoPE+ has been connected with the UML/MARTE flow in order to automatically receive all the information of the high-level model, without requiring intermediate manual operations. It has also been integrated with MOST, which automatically launches all the SCoPE+ executions needed to perform the explorations requested by the user. All the compilation scripts are also generated from the UML/MARTE model, so the use of SCoPE+ can be hidden behind the COMPLEX Eclipse Application GUI, since no manual intervention is required during the simulation process.
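The principle of annotated native simulation underlying SCoPE+ can be illustrated with a small sketch. This is our own minimal illustration, not SCoPE+ code: the `SimContext` class, the `fir_step` example and the cost numbers are invented for the purpose.

```cpp
#include <cassert>
#include <cstdint>

// Our own minimal illustration of annotated native simulation (not SCoPE+ code).
// The functional code executes natively on the host; annotations inserted after
// each basic block accumulate the estimated cycles of the target processor.
struct SimContext {
    uint64_t cycles = 0;   // accumulated target cycles
    double cycle_ns;       // target cycle length in nanoseconds
    explicit SimContext(double ns) : cycle_ns(ns) {}
    void annotate(uint64_t c) { cycles += c; }       // called after each basic block
    double elapsed_ns() const { return cycles * cycle_ns; }
};

// Example task behaviour: a FIR step running natively, with an assumed
// (invented) cost annotation of 4 cycles per MAC plus loop overhead.
int fir_step(SimContext& sim, const int* taps, const int* win, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += taps[i] * win[i];
    sim.annotate(4ull * static_cast<uint64_t>(n) + 10);
    return acc;
}
```

In a real flow the annotation values come from the processor characterization, and the accumulated time would be synchronized with the SystemC kernel at communication or OS calls.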
3.2 Executable Specification
The output of the MDA entry is an executable model generated from the PIM. The PDM is
used to generate a structural platform model with virtual processing, memory, and
communication elements. These are used to model the resource constraints of the execution
platform. The generated executable specification (Algorithm Domain), platform description
model (Architecture Domain), and platform mapping are shown in Figure 5.
Figure 5: Example of an Executable Specification and a Platform Mapping
3.2.1 Executable application model
In our executable application description model we perform a separation of behavior
(computation) and protocol (communication). Our concurrent building blocks are tasks or
processes that contain a behavioral and a protocol part.
The behavior part describes the function or algorithm to be executed, written in sequential C/C++ code. This description is independent of an implementation in either HW or SW. Behaviors can be composed of functions and describe a purely sequential execution order. This enables reuse of existing software descriptions. Moreover, the tools mentioned in the concept and motivation above allow synthesis to a C/C++ representation. An abstract task describes a “Runnable”, i.e. a process. Each abstract task contains a single behavior. Abstract tasks can
either be active or passive. An active task starts running immediately after its activation and
can either be blocked through a communication request or when its computation is finished.
An active task can be (self) triggered again after a certain amount of time (time triggered task,
or periodic task). Passive tasks can only be triggered by active tasks through explicit requests.
A passive task cannot trigger itself and it cannot trigger any other passive task.
The protocol part describes communication among behaviors. It is realized through a port that allows an active task to call services on a passive task's behavior. These calls are blocking, i.e. the caller's behaviour continues only after the service call has completed. When multiple active tasks request a service call of the same passive task, a scheduling action is required. More details about this can be found in [5][6]. These service calls abstract from a particular communication protocol implementation in either HW or SW.
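The task model described above can be sketched in plain C++. This is our own naming, not the actual COMPLEX API: a passive task exposes services, an active task calls them through a port, and calls are blocking and serialized.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Minimal sketch of the task model (our own naming, not the COMPLEX API).
class PassiveTask {
    std::mutex guard;  // serializes requests from multiple active tasks
    int state = 0;
public:
    int service_add(int v) {  // behaviour runs only when a caller triggers it
        std::lock_guard<std::mutex> lk(guard);
        state += v;
        return state;
    }
};

// A port binds an active task's protocol part to one service of a passive task.
class Port {
    std::function<int(int)> svc;
public:
    explicit Port(std::function<int(int)> s) : svc(std::move(s)) {}
    int call(int v) { return svc(v); }  // returns only after the service completes
};

// The active task's behaviour part: pure sequential code using the port.
struct ActiveTask {
    Port& port;
    int run() { return port.call(5); }
};
```

The mutex in `PassiveTask` stands in for the scheduling action needed when several active tasks request the same service concurrently.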
3.2.2 System input stimuli
In order to examine and analyze the parallel application description model under a certain
workload scenario, it needs to be stimulated accordingly. The system stimuli might originate
from user interaction or communication with other components that are part of the system's
environment. These stimuli describe use-case scenarios and can be derived from a
UML/MARTE use-case specification or from an environment model in Matlab/Simulink.
3.2.3 User constrained HW/SW separation & mapping
The user-constrained HW/SW separation and mapping defines the binding of tasks from the application model to execution resources of the architecture/platform description model. Active and passive tasks can be mapped to execution resources, while passive tasks can additionally be mapped to memories.
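The mapping rule can be expressed as a simple validity check. The enum and function names below are ours, introduced purely for illustration:

```cpp
#include <cassert>

// Illustrative encoding of the mapping rule (names are ours): active tasks
// bind to execution resources only, while passive tasks may additionally be
// bound to memories.
enum class TaskKind { Active, Passive };
enum class ResKind { Processor, CustomHW, Memory };

bool valid_binding(TaskKind t, ResKind r) {
    if (t == TaskKind::Active)
        return r == ResKind::Processor || r == ResKind::CustomHW;
    return true;  // passive: execution resource or memory
}
```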
3.2.4 Architecture/Platform description
The platform description model is composed independently of the application model. It is a purely structural, non-executable representation of the execution platform consisting of execution resources (like SW processors, DSPs or ASICs), memories, communication resources (like shared buses), and pre-existing IP components. In addition, constraints that have a direct influence on timing and power consumption are represented in the component's meta-description. For SW processors this is the instruction set architecture (ISA) including its pipeline behavior, power modes, data and instruction cache models, and bus interfaces. For custom HW the RT component library used has to be specified, and for communication resources the scheduling policies for shared media.
The principal result for Magillem is the integration of our IP-XACT tool-chain in the COMPLEX flow. More specifically, our tools are now capable of importing high-level descriptions coming from the MDE (Model Driven Engineering) world and linking these data with low-level descriptions of the models of hardware components (SystemC models, as in Synopsys tools).
This work is crucial because high-level modeling and specification languages are becoming commonly used in the embedded systems industry. UML/MARTE models are now used to address design complexity, but the gap from the high-level description to the detailed description of IP components or designs is too wide. Moreover, the generation of simulated platforms is a very tedious and error-prone task, as it depends on a given platform architecture. The high-level description of the platforms does not contain all the information required for the generation. If we consider, in the context of the COMPLEX project, the direct generation of the Synopsys Virtual Platform given a UML/MARTE high-level description (see Figure 6), the following issues are raised:
- The lack of transformation techniques between component descriptions in the UML/MARTE editor and the given PCT scripts (1).
- The lack of transformation techniques between a UML description of the platform and its generation technique given scripts (3).
- All the transformation techniques are manual (1) (3) (4).
- The referencing of components between the different levels is not maintained.
Figure 6: The Generation model flow without IP-XACT
The IP-XACT tool-chain addresses the construction of the abstract model levels with their relevant data, especially in the high-level model, that is, the gap to be filled between specification and implementation, and the automation of the transformations. It is based on a centric representation of the data, using the new IEEE 1685 (IP-XACT) standard. The purpose of the COMPLEX IP-XACT tool-chain is to provide a mechanism for generating a SystemC description of virtual platforms from a high-level UML/MARTE description through IP-XACT. IP-XACT is an XML schema for describing the HW system architecture. Thus, it represents a good centric schema, as it handles the structural description of the whole system, from functional level down to implementation levels, while managing the hierarchical dependencies. The IP-XACT tool-chain transformation mechanisms follow transformation rules defined to generate IP-XACT descriptions from UML/MARTE designs.
The IP-XACT-based transformation steps are as follows (see Figure 7):
1- High-level specifications are derived given the platform components' PCT scripts.
2- IP-XACT descriptions of the components are derived given the high-level specifications. Each IP-XACT description references the corresponding initial PCT script.
3- UML models of the components are derived given the IP-XACT component descriptions.
4- UML/MARTE users define the component assembly for the platform architecture.
5- The architecture specification is derived from MARTE to IP-XACT.
6- The transformation mechanism from IP-XACT to the Synopsys Virtual Platform is done by providing the PCT scripts of the design elements.
[Figure contents: three layers of descriptions (UML component models, IP-XACT component descriptions, PCT scripts) cross-linked by references; the IP descriptions in the library, the IP assembly and the specifications are connected through the numbered transformation steps 1-6.]
Figure 7: Information links between abstraction layers
The IP-XACT specification is a backbone for federating the heterogeneous data manipulated by design, implementation or verification teams and tools. We propose a centric description of the architecture, also for federating the tools of the framework.
3.3 Estimation & Model Generation
In order to allow fast simulation and estimation, we create annotated C/C++/SystemC code. This annotated code contains information about the timing and power of each component, obtained using existing, sophisticated tools.

As depicted in the motivation, a realistic system consists of components of different types, e.g. custom hardware and software as well as IP components like communication infrastructure. Each component is estimated individually, using an appropriate tool. Based on the estimation, an augmented version of the component is created, containing a power and timing model of the component. From these annotated components a virtual prototype is generated, which is used to estimate the power and timing of the overall system. The following subsections describe these steps in more detail.
3.3.1 Hardware/Software task separation
Depending on the user-defined mapping, each behavior of the parallel application description is estimated with an appropriate technique. The tools used for HW and SW estimation and characterization perform a simulation-based estimation. Thus, each component must be simulated and characterized individually: the system is split into individual components, and the surrounding system serves as testbench/test environment during the simulation. This way we can simulate each component individually and still obtain estimates that correspond to the behavior of the overall system.
In this generation flow, the SMOG tool (developed by OFFIS) is in charge of task separation and virtual system generation, including the synthesis of the TLM interfaces needed to interconnect the system components, and specifically the TLM2 IP components. The SWAT tool is in charge of SW estimation, while HW estimation is performed through PowerOpt+. This integration therefore enables a performance simulation based on native simulation of software and post-synthesis custom HW estimation.
Figure 8: System generation front-end, task separation and interface synthesis.
More details of the hardware/software task separation can be found in [14].
3.3.2 Custom Hardware estimation
Timing and power estimation of application specific hardware designs can be done at nearly
every level of abstraction from transistor- up to behavioral level at ESL. Since we address
behavioral tasks that are mapped to hardware and are meant to be implemented in custom
ASIC hardware, we only address behavioral-level estimation here [7].
To address the challenges of ASIC power modeling mentioned in Section 2, OFFIS combined synthesis with cycle-accurate simulation at RT-level and a subsequent phase of basic-block identification and power/timing annotation. Although extensive power estimation at RT-level is very time consuming and thus not applicable in HW/SW co-simulation, it can be used as a characterization approach for higher-level estimation. This is why we apply the lower-level estimation provided by the OFFIS PowerOpt+ tool to a small but typical testbench and derive cycle-averaged power estimates. These power values are then further abstracted to basic-block level and annotated to the internal control- and dataflow graph representation. We differentiate between dynamic and static power as well as its source (e.g., functional units, controller, or clock tree). Leakage power at RT-level is nearly independent of the data pattern [8]
(variation of 15 %) and thus mainly depends on elapsed time, whereas dynamic power depends on the testbench stimuli.
Figure 9: Hardware characterisation flow
For the proposed estimation and characterisation flow, as shown in Figure 9, the components of the design that should be implemented as full-custom hardware are given as a synthesisable C description. This description is transformed by the synthesis tool into a control and data flow graph (CDFG) containing all information about the functional behaviour. During high-level synthesis this CDFG is transformed into an RT data path, which consists of a set of parallel running processes. Each process has its own local controller and private registers. A design may have several instances of the same process. All instances perform the same behaviour, but each instance must be estimated and characterised individually, in order to consider the different data dependencies of the individual process instances. Inter-process communication is performed using a simple hand-shake protocol. For the generated data path, floorplanning is also performed.
For the functional model of the design, the RT data path is analysed and hardware basic blocks are identified. The functional model covers all aspects related to the behaviour of the design, e.g. dynamic power dissipation and timing. The extra-functional model is created by analysing the synthesis artefacts caused, for example, by module selection or by the floorplanning. These artefacts include leakage, clock-tree and controller power. Both the functional model, in terms of identified hardware basic blocks, and the non-functional design characteristics are used to create an augmented SystemC description of the overall design's full-custom hardware parts.
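How such basic-block annotations could be evaluated during simulation can be sketched as follows. The block table, cost numbers and units are invented for illustration; in the real flow they come from the PowerOpt+ characterization. Dynamic energy is charged per executed block, while leakage depends only on the elapsed time, matching the separation described above.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Sketch of basic-block annotated energy evaluation (numbers are invented).
struct BlockCost { unsigned cycles; double dyn_energy_pj; };

struct HwModel {
    std::map<int, BlockCost> blocks;  // basic block id -> characterized cost
    double leak_power_uw;             // nearly data-independent at RT-level
    double clock_ns;

    double energy_pj(const std::vector<int>& trace) const {
        unsigned long long cycles = 0;
        double dyn = 0.0;
        for (int id : trace) {
            const BlockCost& b = blocks.at(id);
            cycles += b.cycles;           // timing advances per executed block
            dyn += b.dyn_energy_pj;       // dynamic energy depends on the trace
        }
        // leakage depends only on elapsed time: uW * ns = 1e-3 pJ
        double leak = leak_power_uw * static_cast<double>(cycles) * clock_ns * 1e-3;
        return dyn + leak;
    }
};
```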
[Figure contents: the existing flow covers the SystemC frontend, high-level synthesis of the CDFG into the RT data path, and controller synthesis; the new approach adds hardware basic block identification, hardware basic block characterisation (dyn. power, timing, ...), design characterisation (leakage, clock-tree power, controller power, ...), and a block-annotated C++ writer producing augmented SystemC (HW-BAC++).]
The presented custom hardware timing and power estimation technique supports the creation of virtual prototypes for embedded full-custom hardware modules. Based on the automatically generated cycle-accurate functional description at register transfer level, a characterization of the module is performed and a high-level, C++-based virtual prototype is generated, augmented with RT-level-accurate power and timing information. First experiments on data-intensive hardware accelerators show a fast and accurate estimation of power properties, with a total error of about 3.6 % and a speed-up of approximately 192 compared to an RT-level estimation, while obtaining cycle-accurate timing information. These properties support early design space exploration [9].
3.3.3 Software estimation
At the intermediate level of the COMPLEX tool chain, integrated with software generation, hardware model characterization and design space exploration, the detailed software estimation toolset SWAT (SoftWare Analysis Toolset) plays a key role. Its main goal is to provide more accurate and detailed performance and energy estimates with respect to those obtained using higher-level, more abstract models [10].

More specifically, SWAT is a collection of tools for embedded software execution time and energy consumption estimation and optimization. Each tool performs an elementary operation such as static and dynamic model construction, energy estimation, software analysis and reporting, back-annotation, and so on. These tools have been organized into "core" flows designed to provide seamless mechanisms for integration with the other tools of the COMPLEX flow. In detail, the SWAT tool-chain implements modelling, estimation and optimization techniques for embedded software applications written in pure C code. The tool chain is organized into a front-end responsible for the modelling phase (target processor model, source static model), a set of "core" flows implementing the different functionalities of SWAT, and a post-processing engine necessary to analyse the execution traces (event traces).
Figure 10: SWAT flow
Figure 10 depicts the SWAT flow, which is described in the following paragraphs:
Target processor characterization flow. This flow has the goal of expressing the execution
time and energy consumption characteristics of the target core in terms of LLVM instructions.
The input of this flow is an instruction-level characterization of the target processor (provided
by the vendor) and the output is an abstract model expressed in terms of the LLVM
instruction-set.
Estimation flow. It provides the functionality for performing dynamic estimation of the execution time and energy consumption of a given application executed with a specific set of data. The models involved in this process are data-independent, so using a different set of data does not require changing the models but only re-running the instrumented application. The input is a set of C source files and the target processor model derived by the characterization flow; the output is an overall estimate of execution time and power.
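The core idea of this estimation step can be sketched as a per-instruction-class cost model applied to dynamic counts. The struct and function names, as well as the cost figures, are invented for illustration and are not the SWAT file formats or API.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hedged sketch of LLVM-level software estimation (names and numbers invented).
struct Cost { double ns; double nj; };               // per-instruction-class cost
struct Estimate { double time_ns = 0.0; double energy_nj = 0.0; };

// The target model (from the characterization flow) maps LLVM instruction
// classes to costs; dynamic counts come from the instrumented native run.
Estimate estimate(const std::map<std::string, Cost>& target_model,
                  const std::map<std::string, unsigned long long>& dyn_counts) {
    Estimate e;
    for (const auto& kv : dyn_counts) {
        const Cost& c = target_model.at(kv.first);   // model is data-independent
        e.time_ns += c.ns * static_cast<double>(kv.second);
        e.energy_nj += c.nj * static_cast<double>(kv.second);
    }
    return e;
}
```

Changing the input data only changes `dyn_counts` (by re-running the instrumented application), not the model, which mirrors the data-independence property noted above.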
Analysis and back-annotation flow. The models generated in the front-end phase of the estimation flow can be analysed in further detail to derive different static and dynamic metrics. Such metrics are then summarized either in the form of an HTML report or as back-annotated source code.
Optimization flow. Three types of optimization are implemented: (i) an experimental C-to-C optimization implemented on top of the LLVM-opt tool, integrated with the MOST tool to perform iterative compilation; (ii) a power optimization of the selection of the CPU operating mode (both in terms of voltage and frequency) to be assigned to each function or group of functions; (iii) a suggestion-oriented optimization used to annotate the critical portions of the source code with the most suitable transformations to be applied.
Instrumentation and trace flow. This flow has the goal of tracing specific information during the execution of the application. The flow is split into a static, rule-based instrumentation phase and a dynamic, optional execution phase. If used as a standalone tool, both phases are executed and the resulting event trace is fed to a post-processor to collect statistics. If used in conjunction with other COMPLEX tools, only the instrumentation phase is necessary. This phase produces as output a binary library (or set of object files) implementing the instrumented version of the application. Such a library is then linked with the other parts of the system's executable models to allow a complete system simulation and estimation.
The SW power estimation methodology has been applied to several benchmarks and the estimates compared with the most accurate available figures, i.e. those obtained with a target-specific, power-enabled instruction-set simulator [11]. The absolute estimation errors obtained on a set of several benchmarks range from less than 2% up to 13%, with an average of 6%.
Regarding memory, PoliTo has focused on the optimization of the memory sub-system of a
typical embedded device, where energy (primarily) but also other metrics such as reliability
(in terms of device aging) have been considered as optimization metrics.
From the architectural standpoint, the optimization of the memory subsystem relies on a single architectural transformation: implementing the address space as a multi-bank memory instead of a single-bank monolithic one. We therefore speak of memory partitioning for the basic transformation used in our optimization strategy. Partitioning is beneficial for energy (and other metrics) because the distribution of accesses to the memory is not uniform: because of the well-known locality principle, some locations are accessed more often than others. This property, used together with the fact that the
performance and energy cost of accessing a memory increases proportionally with its size, makes it possible to optimize the common case (a classical low-power optimization principle) by accessing a smaller block (one bank) most of the time. The non-accessed banks can then be power-managed to reduce energy.
Figure 11 shows a conceptual drawing of the partitioning idea for the case of 2 partitions.
Figure 11: Conceptual idea of memory partitioning to improve energy and performance
It is clear that the second configuration is more convenient as a larger number of accesses fits
into a smaller memory. The implementation of the above architecture requires minimal
encoding circuitry to drive an address to the correct sub-block. Notice that the partitions are
mutually exclusive, so only one of the banks is active at any given time.
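The partitioning gain can be illustrated with a toy cost model. The linear per-access energy function below is our own simplification; real figures would come from technology-specific memory models. Because the per-access energy grows with bank size, serving the frequently accessed addresses from a small bank reduces the total energy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the memory partitioning gain (linear cost model is assumed).
double access_energy_pj(std::size_t bank_words) {
    return 0.1 + 0.001 * static_cast<double>(bank_words);
}

// accesses[i] = access count of word i; the address space is split after `cut`
// words (cut == 0 models the monolithic, single-bank memory).
double trace_energy_pj(const std::vector<unsigned>& accesses, std::size_t cut) {
    double e_lo = access_energy_pj(cut);                  // small (hot) bank
    double e_hi = access_energy_pj(accesses.size() - cut);// remaining bank
    double total = 0.0;
    for (std::size_t i = 0; i < accesses.size(); ++i)
        total += accesses[i] * (i < cut ? e_lo : e_hi);
    return total;
}
```

An optimizer such as MEMOPT would search over the cut points (and the number of banks, bounded by the Max option) to minimize such a cost function over the input trace.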
In COMPLEX, we have implemented the above scheme by generalizing it into an optimization tool (MEMOPT) that derives the energy-optimal partitions by analysing the trace of the memory accesses. The tool (see Figure 12) takes the dynamic execution trace of the program as input and, based on a set of command-line switches, computes an optimal (energy-wise) partition of the scratchpad. Additional inputs include (see the dotted arrows on the right side of the box) technological data and the address range. The tool returns power/energy/performance figures (metrics) and, as a side output, the resulting memory configuration (dotted arrow).
The tool has only one option suitable for exploration (Max): the maximum number of sub-blocks into which the memory can be split. This is required to avoid arbitrarily fine partitioning, which would incur too high a partitioning overhead.
Figure 12: Inputs and outputs of the MEMOPT Tool
The MEMOPT tool can also operate as a pure modeling tool, and it is structured in such a way that it can be invoked either to run the optimization or simply to yield the energy and performance/aging cost for a given input trace.
Figure 13: MEMOPT View as a Simulation and Optimization tool
Figure 13 shows the internal architecture of the MEMOPT tool. On the left side, the "simulator" contains the memory models (for the various metrics); based on the input trace, the technological data (there are different models for different technology libraries) and the size of the memory, the tool returns the total execution time, lifetime, and consumed energy and power for the executed trace.
This modular architecture also allows easy integration into the COMPLEX flow. Specifically, MEMOPT has been used in the main loop of the Design Space Exploration (DSE) tool developed in COMPLEX. Figure 14 shows the conceptual integration of MEMOPT (in the "optimization" version).
Figure 14: Integration of MEMOPT with MOST
3.3.4 Pre-existing IP & virtual component models
Besides custom HW and SW, pre-existing or third-party components must be considered. Models for communication infrastructure like buses are available from different vendors and typically come as parameterizable soft-macros, allowing an adaptation to the system to be built. The macros already contain timing information but typically no information about power. Thus, communication power is estimated based on the TLM-2.0 transport calls. The calculation accounts for the size of the transferred data, the type of access, and the duration of the communication. Interruptions and re-arbitration events are also considered and must be delivered by the communication model.
General IP components delivered by third-party vendors cannot be estimated like custom HW and SW components. System-level representatives of these IPs are typically provided as black-box executable models (e.g. an API to a compiled object file). These black-box modules typically contain timing but no power information. In order to obtain at least approximate power values, a simple wrapper or monitor is used, which observes the component's in- and outputs. This information is used to control a power state machine (PSM) [12] inside the monitor. Power states and power values of the PSM are either obtained from the component's data-sheet or estimated manually.
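A PSM-based wrapper can be sketched as follows. The states and power values below are placeholders that would normally come from the component's data-sheet; the class and method names are ours. The monitor charges the energy of the state it was in during each observation interval, then updates the state from the observed I/O activity.

```cpp
#include <cassert>
#include <cmath>

// Sketch of a power state machine (PSM) monitor for a black-box IP.
enum class PState { Sleep, Idle, Active };

class PsmMonitor {
    PState st = PState::Idle;
    double energy_pj = 0.0;
public:
    static double power_uw(PState s) {  // assumed data-sheet values
        switch (s) {
            case PState::Sleep: return 1.0;
            case PState::Idle: return 20.0;
            default: return 150.0;      // Active
        }
    }
    // Called by the wrapper for each observation window on the IP's ports.
    void observe(bool activity, double interval_ns) {
        energy_pj += power_uw(st) * interval_ns * 1e-3;  // uW * ns = 1e-3 pJ
        if (activity) st = PState::Active;
        else st = (st == PState::Active) ? PState::Idle : PState::Sleep;
    }
    PState state() const { return st; }
    double energy() const { return energy_pj; }
};
```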
Latest results on an abstraction methodology for generating time- and power-annotated TLM
models from synthesisable RTL descriptions can be found in [13]. The proposed techniques
allow the integration of existing RTL IP components into virtual platforms for early software
development and platform design, configuration, and exploration. With the proposed
approach, IP models can be natively integrated into SystemC TLM-2.0 platforms and
executed 10-1000 times faster compared to state-of-the-art RTL simulators. The abstraction
methodology guarantees preservation of the behavior and timing of the RTL models. Target-technology-dependent power properties of IP components are represented as power state machines and integrated into the abstracted TLM models. The experimental results show a relative error of less than 10% in the abstracted model's power consumption compared to state-of-the-art RTL power simulators [15].
3.3.5 Virtual system generation
During generation of the virtual system, the annotated sources from (f) as well as the selected models from (i) are combined into a virtual prototype. The example in Figure 15 shows the virtual platform model obtained from the mapping specified in Figure 5.
Figure 15: Virtual Platform generated from example shown in Figure 5
The timing- and power-annotated execution models are integrated with timing- and power-characterized platform models. In the example in Figure 15 these platform models are a TLM-2.0 router with bus protocol and power model, and a system memory model. For the integration of the annotated task behavior with the TLM communication network we provide communication interface (IF) templates. These interfaces translate the function calls of the active tasks into TLM transaction containers. For passive tasks we synthesize a memory interface which decouples the TLM transactions from the activation of the behavior. That means the transaction is stored completely in the memory before the passive task is activated. These interface templates are timing- and power-characterized for the chosen platform.
More details about the task separation and virtual system generation can be found in [14].
Virtual platforms [23], together with SystemC Transaction Level Modelling (TLM) [27], are becoming the key solution for addressing embedded software development and verification in parallel with hardware development. The wide adoption of virtual platforms is constrained by some open issues:
1) Presence of legacy RTL (low-level) IP blocks to be integrated into the virtual platform, which is usually modelled at transaction level (high level) to speed up simulation. Hand-made transactors to adapt RTL blocks are inefficient and error-prone, and model rewriting is expensive.
2) Presence of HW models not described in a HW description language. For instance,
Model-Driven Design is gaining attention for the development of embedded
components. Tools like Matlab/Simulink/Stateflow [28] can be used to describe
the event-driven behaviour of hardware components.
3) Modelling of external communications in the case of networked embedded systems. Embedded applications are becoming more distributed, being based on devices which interact over wired/wireless channels using protocols like WiFi, Ethernet, TCP/IP, CAN, FlexRay, and ZigBee. To achieve further optimization, embedded software should be tested in the full network scenario.
HIFSuite is a set of tools and application programming interfaces (APIs) that provides support
for modelling and verification of HW/SW systems [25]. The core of HIFSuite is the
Heterogeneous Intermediate Format (HIF) language, upon which a set of front-end and
back-end tools have been developed to allow the conversion of HDL code into HIF code and
vice versa. HIFSuite allows designers to manipulate and integrate heterogeneous components
implemented by using different hardware description languages (HDLs). Moreover, HIFSuite
includes tools, which rely on HIF APIs, for manipulating HIF descriptions in order to support
model abstraction and post-refinement verification.
Figure 16: HIFSuite architecture.
HIFSuite plays two roles in the extension of virtual platforms:
HIFSuite translates IP core descriptions in any of the available HDL front-ends to
SystemC/TLM.
HIFSuite abstracts digital components from RTL to TLM [24].
As depicted in Figure 16, HIFSuite consists of:
The HIF core language and manipulation API: a set of HIF objects corresponding to
traditional HDL constructs such as, for example, processes, variable/signal declarations,
sequential and concurrent statements, etc.
Front-end tools that parse Stateflow, VHDL, Verilog and SystemC (RTL and TLM)
descriptions and generate the corresponding HIF representations.
Back-end tools that convert HIF models into SystemC models.
A set of tools developed upon the HIF APIs that manipulate HIF code to support
modelling and verification of HW/SW systems. In particular A2T is a tool that
automatically abstracts RTL IPs into TLM models.
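The kind of abstraction A2T performs can be illustrated with a deliberately simplified sketch (plain C++ with hypothetical names, not the HIFSuite API): the same multiplier IP is shown once in an RTL-like, cycle-by-cycle style and once as a single TLM-style transaction that returns its latency as an annotation instead of simulating every clock edge.

```cpp
#include <cassert>
#include <cstdint>

// RTL-style view: a cycle-accurate shift-and-add multiplier modelled
// register by register; each clock() call advances exactly one cycle.
struct RtlMultiplier {
    uint32_t a = 0, b = 0, acc = 0;
    bool busy = false;
    void start(uint32_t x, uint32_t y) { a = x; b = y; acc = 0; busy = true; }
    void clock() {                      // one clock edge
        if (!busy) return;
        if (b & 1) acc += a;
        a <<= 1; b >>= 1;
        if (b == 0) busy = false;
    }
};

// TLM-style view of the same IP: the whole operation becomes one
// transaction; the cycle count is returned as a timing annotation.
struct TlmMultiplier {
    uint32_t mult(uint32_t x, uint32_t y, unsigned& cycles) {
        cycles = 0;
        uint32_t acc = 0;
        while (y) { if (y & 1) acc += x; x <<= 1; y >>= 1; ++cycles; }
        return acc;
    }
};

// Drive the RTL model to completion, counting the simulated cycles.
uint32_t run_rtl(uint32_t x, uint32_t y, unsigned& cycles) {
    RtlMultiplier m;
    m.start(x, y);
    cycles = 0;
    while (m.busy) { m.clock(); ++cycles; }
    return m.acc;
}
```

Both views produce the same result and the same cycle count, but the TLM version avoids one simulation event per clock edge, which is the source of the speed-up mentioned above.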
3.4 Simulation
3.4.1 Pre-optimized power controller
IMEC’s main contribution consists of a light-weight run-time resource management
framework for embedded heterogeneous multi-core platforms. It allows dynamic adaptation to
changing application contexts and transparent optimization of the platform resource usage
following a distributed and hierarchical approach. The framework consists of a Global
Resource Manager (GRM) that is running in parallel with the central manager of the
application on the host processor of the platform. The operating points of the GRM are
identified in a design-space exploration phase as a set of Pareto-optimal configurations of the
application, together with their impact on quality of experience, performance, and energy
consumption.
The pre-optimized power controller implements a framework for dynamic adaptation to
changing context and transparent optimization of platform resource usage [18]. It follows a
distributed and hierarchical approach. On the one hand, a Global Resource Manager (GRM) is
loaded on the host processor of the platform. It is a software task running in parallel with the
application. It is a middleware layer providing a bridge between the application, the user and
the platform, and it coordinates with a Local Resource Manager (LRM) in each platform
IP core (e.g., a HW block or a SW processor). It is used to adapt both platform and application at
run time and to find global and optimal trade-offs in application mapping based on a given
optimization goal. On the other hand, each IP core can execute its own resource management
without any restriction, through an LRM. Such an LRM encapsulates the local policies and
mechanisms used to initiate, monitor and control computation on its IP core.
In contrast to the collaboration between the GRM and the LRMs, the GRM collaboration with
application and user is visible to the application developer and is performed as follows. First,
the QoS requirements and the optimization goal are set by the user. The goal is then translated
into an abstract and mathematical function, called utility function (e.g., performance, power
consumption, battery life, QoS weighted combination of them). Then, at run time, the GRM
manages and optimizes the application mapping taking into account the possible application
configurations explored at design time, the platform resources currently available, the QoS
requirements, and the utility function.
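As a rough illustration of such a utility function (plain C++; the names, weights, and reference values are hypothetical, not the actual GRM interface), a weighted combination of normalized performance, power, and QoS metrics might look like this:

```cpp
#include <cassert>

// Hypothetical utility function: a weighted combination of normalized
// performance, power, and QoS. The weights encode the user's
// optimization goal; a higher utility value is better.
struct UtilityWeights { double perf, power, qos; };

double utility(const UtilityWeights& w,
               double frames_per_s,   // measured performance
               double watts,          // measured power consumption
               double qos_score) {    // application-level QoS in [0, 1]
    // Normalize against assumed reference values (illustrative only).
    const double ref_fps = 30.0, ref_watts = 2.0;
    return w.perf  * (frames_per_s / ref_fps)
         - w.power * (watts / ref_watts)
         + w.qos   * qos_score;
}
```

With battery-oriented weights the same two candidate configurations rank differently than with performance-oriented weights, which is exactly why the goal is expressed as an exchangeable function rather than hard-coded into the manager.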
The framework has a generic and structured architecture that is valid for a broad range of
design flows, platform, and application domains:
It supports a holistic view of all platform resources, which can be a heterogeneous mix
of different kinds of IP cores (e.g. hardware accelerators, FPGAs, multi-core
processors), memories, batteries, etc.
It transparently optimizes the overall platform resource usage and application
mapping. Energy consumption is controlled through parameters that the platform provides, e.g.
dynamic voltage scaling, frequency scaling, and activation/deactivation of IP cores.
It dynamically adapts to changing contexts, taking into account constraints on the
resource usage, while optimizing the overall quality of experience (QoE).
The optimizations are steered via heuristics that can be easily customized to the
application context. Typical parameters taken into account to optimize the QoE are
video resolution, video frame rate, audio frequency, etc. In principle, it is even possible
to let the user adapt the heuristics at run-time, e.g. by changing the weight factors
associated with certain quality aspects.
The GRM framework implementation is very lightweight and therefore ideally suited for
embedded platforms:
Most of the optimization complexity is covered by a design-time exploration and
characterization of the operating points. This alleviates the run-time decision making,
which is implemented using lightweight data structures and optimization algorithms.
Data memory, instruction memory, and processing requirements are very small
compared to the typical requirements for embedded applications.
It can easily be instantiated for a given target platform and application domain. The
required modifications to the applications are minimal and the communication
protocol is simple and lightweight, leading to almost negligible processing overhead.
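A minimal sketch of the run-time decision step (plain C++ with a hypothetical data layout, not the actual GRM implementation): given the design-time characterized operating points, the manager picks the feasible point of highest quality under the currently available resources and power budget.

```cpp
#include <cassert>
#include <vector>

// A design-time characterized operating point of the application:
// its resource demand, power draw, and delivered quality.
struct OperatingPoint { int cores; double watts; double quality; };

// Run-time selection: among the points that fit the currently free
// resources and the power budget, pick the highest-quality one.
// Returns the index of the chosen point, or -1 if none is feasible.
int select_point(const std::vector<OperatingPoint>& pts,
                 int free_cores, double power_budget) {
    int best = -1;
    for (int i = 0; i < (int)pts.size(); ++i) {
        if (pts[i].cores > free_cores || pts[i].watts > power_budget)
            continue;
        if (best < 0 || pts[i].quality > pts[best].quality)
            best = i;
    }
    return best;
}
```

Because the expensive multi-objective optimization already happened at design time, the run-time step reduces to a linear scan over a small table, which matches the negligible overhead reported above.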
3.4.2 Timing & power aware executable virtual system prototype in SystemC
During system execution the annotated timing and power information is collected. Depending
on the workload model different execution paths, leading to different timing and power
values, are possible. After simulation, the collected information can be illustrated in a power-
over-time diagram or can be used for a power-breakdown. Our annotations can be traced at
different levels of granularity to allow a user-defined
trade-off between performance and accuracy.
Figure 17 depicts the different timing and power evaluation levels. At the most abstract level
we only consider analysis at task granularity. This can easily be performed between active and
passive tasks with a blocking communication relation: execution time and power of the
passive task can simply be inlined and accumulated with the time and power of the active
task. The next level works at communication granularity: power and timing of the
computation nodes in the communication graph are accumulated and only traced at the time
points of communication. For a deeper analysis of the timing and power behavior, traces at
basic-block granularity of a control data flow graph (CDFG) are also possible.
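The accumulation across these granularity levels can be sketched as follows (plain C++ with a simplified switched-capacitance energy model; names and units are illustrative, not the COMPLEX tool interfaces): per-basic-block annotations are summed into a task-level figure, and a passive task's figure is inlined into that of its calling active task.

```cpp
#include <cassert>
#include <vector>

// Per-basic-block annotation: cycle count and average switched
// capacitance, as characterized at design time.
struct BlockAnnotation { unsigned cycles; double avg_cap_nF; };

// Task-level (time, energy) figure accumulated from a block trace.
struct TaskCost { double seconds = 0.0; double joules = 0.0; };

// Simplified model: E_dyn per cycle = C * Vdd^2; time = cycles / f.
TaskCost accumulate(const std::vector<BlockAnnotation>& trace,
                    double vdd, double freq_hz) {
    TaskCost c;
    for (const auto& b : trace) {
        c.seconds += b.cycles / freq_hz;
        c.joules  += b.cycles * b.avg_cap_nF * 1e-9 * vdd * vdd;
    }
    return c;
}

// Task-level granularity: a passive task's cost is simply inlined
// into (added to) the cost of the active task that invoked it.
TaskCost inline_passive(TaskCost active, const TaskCost& passive) {
    active.seconds += passive.seconds;
    active.joules  += passive.joules;
    return active;
}
```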
A possible new simulation view provided by the
COMPLEX project is the so-called Network View
depicted in Figure 18 and enabled by the SystemC
Network Simulation Library (SCNSL) [16]. It is an
extension of SystemC to allow modeling packet-based
networks such as wireless networks, Ethernet, and field
bus. It supports the simulation of packet transmission,
reception, contention on the channel and wireless path
loss. This way, the virtual platform can be used to
simulate networked embedded systems in a realistic communication scenario [26].
The advantages of SCNSL are:
Figure 17: Annotation hierarchy, ranging from the C process graph over the communication
graph (communication/computation, shared data, port interface) down to the control data flow
graph, annotated per basic block with 1) the number of cycles and 2) the average switched
capacitance.
simplicity: a single language/tool, i.e. SystemC, is used to model both the system (i.e.
CPU, memory, peripherals) and the communication network;
efficiency: faster simulations can be performed since no external network simulator is
required;
re-use of SystemC IP blocks;
scalability: support of different abstraction levels in the design description;
openness: several tools available for SystemC can be exploited seamlessly;
extensibility: the use of standard SystemC and the source code availability guarantee
the extensibility of the library to meet design-specific constraints.
Figure 18: Network View: the virtual platform of a networked embedded system is simulated together
with a model of the network
According to Figure 18, the traditional virtual platform, made of CPU, HW blocks and bus,
can be extended by wrapping it into a network node exchanging packets with other nodes
through a channel by using well-known protocols such as IEEE 802.15.4. The library also
provides traffic sources and sinks to generate concurrent flows with predefined statistical
behaviour. All these elements are provided by SCNSL and they can be instantiated in the code
as traditional SystemC blocks. SCNSL also allows improving timing and power analysis by
introducing the effect of communications which have a direct impact on timing behavior and
power consumption of the system under design [17].
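The effect of channel contention that such a network model captures can be illustrated with a toy model (plain C++, not the SCNSL API): a half-duplex shared channel where a packet occupies the medium for its serialization time and an overlapping transmission counts as a collision.

```cpp
#include <cassert>

// A packet transmission attempt on a shared medium.
struct Packet { int src, dst; unsigned len_bits; double t_start; };

// Half-duplex shared channel: a packet occupies the channel for
// len_bits / rate seconds; a transmission that starts while the
// channel is still busy is counted as a collision and dropped.
struct Channel {
    double rate_bps;
    double busy_until = 0.0;
    unsigned delivered = 0, collisions = 0;

    // Returns the delivery time, or a negative value on collision.
    double transmit(const Packet& p) {
        if (p.t_start < busy_until) { ++collisions; return -1.0; }
        double t_end = p.t_start + p.len_bits / rate_bps;
        busy_until = t_end;
        ++delivered;
        return t_end;
    }
};
```

Feeding such a channel model with the traffic produced by the virtual platform exposes retransmissions and idle waiting, which is how communication effects feed back into the timing and power figures of the node under design.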
3.5 Exploration & Optimization
The Exploration and Optimization phase creates a feedback loop between the performance
estimation part (including MDA Design Entry, Executable Specification, Estimation & Model
Generation and Simulation phases) and the parameter configuration of the target system.
In particular, the loop is closed by acquiring the system simulation traces to be post processed
before being presented either to the user or to an automatic framework for exploration and
optimization. Then, according to the information gathered by the previous simulation, a new
HW/SW system configuration will be selected within the design space. In the next paragraph
we describe these steps in more detail.
3.5.1 Simulation trace
The basic infrastructure for behavioural timing and power annotations within the COMPLEX
framework is based on a flexible and generic tracing interface. This tracing framework can be
used to trace arbitrary user data over time, including non-functional properties like power
consumption. It is used within the simulation platform to provide the outputs required by the
user and the exploration and optimization framework.
To enable both a user-driven analysis based on the visual presentation of simulation results
and the integration into an automated exploration loop as performed by MOST, different
kinds of post-processing back-ends are supported. Especially for the generation of compact
performance metrics, a dedicated reporting and accumulation mechanism is necessary. This
usually implies on-the-fly pre-processing of the intermediate power values, for instance to
compute the overall energy consumption of a complete simulation of a particular application
scenario or configuration. Secondly, the integration with the graphical analysis framework of
the Synopsys Virtual Platform solution has been implemented. This is then usually based on
more detailed traces of values/events over time. Both aspects are addressed by the tracing and
post-processing mechanisms developed by OFFIS.
Since the COMPLEX simulation models are mostly based on SystemC TLM-2.0, the built-in
tracing capabilities of SystemC (based on sc_trace) are not sufficient, because they cannot
directly cope with temporal decoupling and local time offsets. In all of the estimation
techniques developed, such temporal decoupling techniques are used within the BAC++
simulation models to improve the simulation performance by reducing the number of
synchronisations with the SystemC discrete-event simulation kernel. The underlying core
technique of the annotation API presented to the user is based on so-called timed value
streams. These streams support local simulation time offsets to record values “in the future”
based on a more flexible time handling, either via (value,starting time) or
(value,duration) tuples. An additional, block-based annotation API is available as well,
required to relate source code structures with abstract execution times and (potentially
multiple) data streams.
The overall architecture of the COMPLEX trace generation framework is sketched in Figure
19. The power model within the observer of each augmented component in the simulation
(custom hardware, software, or Power State Machine-enabled IP) provides a given set of
default streams (static/dynamic/overall power, current power mode), which can be selected by
the user according to the hierarchical name of the component.
Additional, user-defined streams can be defined as well. In that case, the driving user
processes are responsible for time synchronisation of the stream object(s), while the default
streams of the power observers handle this transparently.
Figure 19: Sketch of the BAC++ trace generator
Before being processed by the final analysis backend, optional pre-processor objects can be
applied to one or more input streams in order to get the desired output streams. This pre-
processing is useful for data reduction, compositing, and other intermediate value/time
transformations.
3.5.2 Exploration & optimization
Starting from the definition of the design space, the exploration and optimization step
iteratively generates an instance of the design space based on the knowledge acquired by the
post-processing of the simulation traces of previously selected configurations.
The exploration phase is a step in the design flow needed for traversing the design space
(changing the system parameters) in order to find the optimal system configurations among
all the possible alternatives that are part of the design space. Moreover, the design space
exploration loop is also used to derive knowledge about the system parameters (such as main
and interaction effects) and about the design space (such as the configuration distribution
with respect to system performance). This phase can be carried out either as a user-centric
DSE or as an automatic DSE.
The goal of an automatic design space exploration and optimization tool is to interact with
the system models automatically, avoiding designer intervention during the DSE phase
(except for the analysis of the results) once the target problem is formally defined.
The Multi-Objective System Tune (MOST) tool is a discrete optimization tool specifically
designed to enable design space exploration of hardware/software architectures, and it is the
COMPLEX tool for the automatic design space exploration phase. MOST helps drive the
designer towards near-optimal solutions to the architectural exploration problem by
supporting the exploration phase with state-of-the-art Design of Experiments (DoEs),
optimization heuristics and Response Surface Models (RSMs). The final product of the
framework is a Pareto set of configurations within the design evaluation space of the given
architecture, together with an analysis of the effects of the design space parameters on the
objective functions. One of the goals of MOST is to provide a command
line interface to construct automated exploration strategies. Those strategies are implemented
by means of command scripts interpreted by the tool without the need of manual intervention.
This structure can easily support the batch execution of complex strategies.
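The core of the final result, the Pareto filtering of the evaluated configurations, can be sketched as follows (plain C++ over two minimized objectives; this is an illustration of the concept, not MOST's internal algorithm):

```cpp
#include <cassert>
#include <vector>

// A simulated configuration with two objectives to minimize,
// e.g. energy [J] and execution time [s].
struct Config { double energy, time; };

// a dominates b if it is no worse in both objectives and strictly
// better in at least one.
static bool dominates(const Config& a, const Config& b) {
    return a.energy <= b.energy && a.time <= b.time &&
           (a.energy < b.energy || a.time < b.time);
}

// Keep only the non-dominated configurations: the Pareto set.
std::vector<Config> pareto_set(const std::vector<Config>& all) {
    std::vector<Config> front;
    for (const auto& c : all) {
        bool dominated = false;
        for (const auto& d : all)
            if (dominates(d, c)) { dominated = true; break; }
        if (!dominated) front.push_back(c);
    }
    return front;
}
```

DoEs and response surface models then steer which configurations are simulated at all, so the Pareto set is approximated without exhaustively evaluating the design space.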
The support provided by MOST is crucial because the exploration phase mainly consists of
repetitive tasks: the configuration of the simulated platforms and the analysis of the results.
These tasks are tedious and error-prone; MOST shifts the designer's effort to higher-level
tasks of the DSE phase, such as the analysis of the final results. The
effectiveness of the Automatic DSE framework has been demonstrated within the project
through the effective integration of all the COMPLEX Use Cases, driving the exploration by
using knobs and parameters coming from both platform and application.
Figure 20: MOST tool
3.5.3 Design space definition
The design space is defined by the list of tunable parameters available on the HW/SW
platform. Moreover, it includes the set of possible values of each parameter and the rules
defining cuts within the design space, for instance due to interferences between parameters.
The design space definition represents the degrees of freedom that the designer or an
automatic tool has for tuning the HW/SW platform.
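A design space definition in this sense can be sketched as follows (plain C++ with a hypothetical encoding): per-parameter value lists plus rules that cut infeasible combinations, from which all valid design space instances can be enumerated.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A design space: per-parameter value lists plus validity rules that
// cut out infeasible combinations (e.g. interfering parameters).
struct DesignSpace {
    std::vector<std::vector<int>> values;  // values[i] = levels of parameter i
    std::vector<std::function<bool(const std::vector<int>&)>> rules;

    bool valid(const std::vector<int>& cfg) const {
        for (const auto& r : rules)
            if (!r(cfg)) return false;
        return true;
    }

    // Enumerate all valid design space instances.
    std::vector<std::vector<int>> instances() const {
        std::vector<std::vector<int>> out;
        std::vector<int> cfg(values.size());
        std::function<void(size_t)> rec = [&](size_t i) {
            if (i == values.size()) {
                if (valid(cfg)) out.push_back(cfg);
                return;
            }
            for (int v : values[i]) { cfg[i] = v; rec(i + 1); }
        };
        rec(0);
        return out;
    }
};
```

For example, with parameters (cores, frequency) and a rule such as "cores x frequency must stay below a thermal budget", the infeasible corner of the Cartesian product is cut away before any simulation is launched.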
3.5.4 Design space instance parameters
A design space instance is a valid configuration of parameters within the previously defined
design space, selected either by the designer or by an automatic tool. It is composed of a value for each
parameter of the design space, ranging within the set of available levels. Those values will be
used to fill the right parameter values in several stages of the design-flow (see Figure 2).
In the project, the list of parameters ranges at the MDA Design Entry and Executable
Specification levels from functional reimplementation to mapping of HW/SW tasks and IP
selection, while at Estimation and Model Generation level from IP and memories
configuration to selection of embedded software optimizations and run-time management
strategies.
4 Potential impact and the main dissemination activities and exploitation of results
A wide variety of different methods and tools are developed with the COMPLEX project and
are integrated into a common COMPLEX reference framework, as shown in Figure 21.
Figure 21: The COMPLEX framework and reference tool set
The COMPLEX Framework is expected to enhance the European Union embedded system
engineering capabilities by developing the first industrially applicable framework for HW/SW
co-design and power management in combination with platform-based design space
exploration.
Figure 22 shows the overall exploitation strategy and timeframe for the COMPLEX
Framework. During the three years of project work, exploitation activities have been
prepared. The exploitation itself will start after the completion of the last project year.
After the end of the project the consortium aims to maintain the COMPLEX reference
framework and enable the EDA partners to turn their COMPLEX compatible tools into
products. Moreover, the COMPLEX reference framework can be re-implemented by EDA
companies to make it more stable and usable to external companies. A commercial
COMPLEX Framework re-implementation and exploitation is preferably foreseen for an
EDA partner of the COMPLEX consortium. Last but not least, COMPLEX demonstrator
designs can either be the basis for new industrial designs, or motivate new companies (e.g.
through the publication of success stories) to perform their next design with the COMPLEX
framework.
Figure 22: Exploitation strategy for the COMPLEX Framework
After the successful completion of the three-year COMPLEX project, platform providers
(ST-I, ST-PRC, SNPS) are enabled to:
provide power and timing characterized IP cores or whole platforms,
provide platform-specific tool chains, augmented for estimating and modelling
execution artefacts of the platform,
and support the design of future platforms which cover the needs of future
applications/workloads.
EDA tool providers (SNPS, EDALab, MDS) benefit from:
added value through point-tool integration in COMPLEX framework,
creation of market-ready tools starting from the technology brought in COMPLEX,
additional customers and application areas,
and strengthening of their spin-off.
Application and System Integrators (Thales, GMV) are able to:
build products with higher quality through early consideration of platform artefacts
(timing, power & memory size),
get faster access to accurate virtual platforms,
and thus save costs of wrong platform selection/re-designs,
which decreases time-to-market due to early and confident application benchmarking.
Research Institutes and Universities (OFFIS, PoliMi, PoliTo, UC, IMEC) are able to perform:
networking with industrial communities,
education of students and engineers,
and dissemination through open-source tools.
The industrial participants in the COMPLEX project are acting in the following markets
which are expected to be impacted after the achievement of the COMPLEX main objectives:
Mobile Communication (Thales)
Security (Thales)
Space Applications (communication & positioning systems) (GMV)
E-Health (Remote Monitoring), Wireless Sensor Networks (ST)
The following main tools have been developed during the COMPLEX project:
Tool Name | Type | Provider | Exploitation
HIFSuite | tool for abstraction/synthesis | EDALab | EDA tools (proprietary license)
Magillem S-CAD | architecture specification and assembly | Magillem | EDA tools (proprietary license)
SCNSL | WSN SystemC TLM library | EDALab | open source (SourceForge)
Synopsys VP (Virtualizer) | simulation environment | Synopsys | EDA tools (proprietary license)
MOST | automatic design space exploration tool | PoliMi | EDA tool (proprietary license)
SWAT | SW power estimation and optimization | PoliMi | open source
UML/MARTE TCTool | UML/MARTE front-end | UC | publicly available (upon request)
SCoPE+ | SystemC fast estimation framework | UC | open source
GRM | global run-time resource management | IMEC | further development
SMOG | HW/SW task separation tool | OFFIS | further development
PowerOpt+ | behavioural synthesis tool | OFFIS | further development
MMCO/MEMOPT | memory modelling, characterization and optimization | PoliTo | publicly available (upon request)
4.1 Exploitable Use-Cases
4.1.1 Use-Case 1 – Networked embedded system
ST-I and ST-PRC participate in the COMPLEX design flow with the development of a
power-annotated SystemC TLM virtual platform. They also play a relevant role in the
definition of tool requirements and in the validation of COMPLEX methods and tools, by
leading the implementation of a wireless sensor network subsystem. The application comes
from the health-care domain: it is a virtual machine oriented to data processing in body
sensor networks. A whole Wireless Sensor Network scenario is proposed as use case in the
COMPLEX project. Figure 23 shows the architecture of the use case, which is composed of
several nodes described at different levels of detail. Node 0 is described by fully detailing its
inner architecture: the CPU of the SoC executes the application SW, the operating system,
drivers, and the interrupt service routines. The SoC is a ReISC SoC developed within ST; the
chip, in 90 nm technology, was taped out at the end of 2009.
Figure 23: WSN embedded system
The ideal simulation of wireless sensor networks has to include both system modelling of the
node and the network behavior. Numerous simulation tools have been introduced for wireless
sensor networks, and they are generally categorized into network simulators and node
simulators/emulators. Emerging WSN SoCs, however, require early evaluation of global
performance while the SoCs are still in their design flow, as well as the development of node
applications that can be refined and optimized under constraints derived from the physical
implementation of the node and from network aspects. The state-of-the-art simulators focus
either on modelling the protocol stack and the concurrency among nodes in the network, or
on modelling the underlying hardware.
In this use case all aspects related to the WSN embedded system, namely SW, HW, and
network (NW), are modelled by leveraging the SystemC language, in particular the SystemC
Network Simulation Library (SCNSL). SystemC has great flexibility in describing HW, SW
and NW components at different levels of abstraction and provides libraries for Transaction
Level Modeling (TLM) and verification. We further show how, starting from an application
modeled in Stateflow, and by using a SystemC code generator and the virtual model of the
SoC, a complete global performance analysis has been carried out in a unified simulation
environment, without the need for multiple tools for system and network modelling or for
co-simulation techniques. We mix a Virtual Platform (VP) model of one node and a pure
SystemC model of the application plus protocol stack for the other nodes, thus improving at
the same time the scalability and the precision of the approach. In our approach, the use of
model-based design ensures the generation of both models (the SystemC code for the nodes
and the C code to be run on the processor model in the VP) from the same Stateflow model.
This holistic simulation approach, starting from a model-driven application design, yields a
complete scenario that has been modelled and verified, with experimental results on the
network side and power analysis results on the node side.
The SWAT estimation flow has been proven to be much more efficient than ISS-based
analysis, namely more than 400 times faster (these results refer to a ReISC III core with a
1.0 V power supply operating at 50 MHz). This speed-up, combined with a satisfactory
accuracy, allows integrating the SWAT methodology within a design-space exploration
framework, in particular the MOST DSE engine.
For this use case, PoliTo has integrated a full design environment for low-energy Wireless
Sensor Network applications with the ST-I virtual prototyping environment and the SCNSL
library from EDALab.
This design environment can be used to jointly optimize the platform parameters, the network
parameters and protocols, and the application, with the goal of minimizing total energy
consumption under functional and performance constraints. The environment has been
demonstrated successfully with this use case. This integration has allowed testing MEMOPT
with a realistic, industrial-strength application on the ST embedded platform.
Figure 24 shows an example of the execution of MEMOPT as a standalone tool, applied to a
memory trace referring to the architecture of Use Case 1. Notice that the energy and lifetime
savings are significant thanks to the high non-uniformity of the memory access distribution.
Figure 24: MMCO used as an energy and lifetime optimization tool
4.1.2 Use-Case 2 – Battery powered multi-core system
The effectiveness of the GRM framework has been demonstrated on a POSIX-based
implementation of an industrial audio-driven video surveillance application, mapped on
different platforms: an x86-based platform running at 800 MHz, and an ARM-based TI
OMAP 4460 platform running at 700 MHz.
Figure 25 illustrates the evolution of the energy per frame for two platform
constraints (different energy budgets, same battery duration) with and without the GRM. Due
to an optimized adaptive selection of application configurations, the GRM allows optimizing
the QoS of the application while keeping the platform battery alive for the required duration.
In contrast, this cannot be ensured without a resource management framework.
Figure 25: Energy-per-frame evolution with and without GRM
Analysis of the GRM run-time efficiency confirms that the run-time overhead is negligible:
The initialization of the GRM framework takes less than 10 ms.
The time to perform application configuration selection and to reconfigure the
platform is in the order of 1 ms, on both target platforms.
The total processing overhead due to the use of the GRM is about 1.16% of the total
application run-time on the x86 platform, and 0.6% on the TI OMAP platform.
The memory overhead for the GRM implementation is also limited: the GRM library code
size is about 100KB for the x86 platform, whereas the platform and application-dependent
data structures require only a few KB of memory.
In combination, the experiments show that the proposed combined approach of design-time
exploration of application configurations and their dynamic run-time management improves
the overall QoE of the system, with no significant impact on the application and no
significant run-time overhead.
Synopsys participated in the COMPLEX project to integrate its virtual platform simulation
technology into the COMPLEX design space exploration flow and to support the validation of
the use cases. In addition, Synopsys contributed the development of a modelling technology
that supports exploring application mappings on multicore systems and their expected impact
on the memory subsystem architecture as well as on the power consumption of the design.
Through its participation in the COMPLEX project, Synopsys obtained two key results:
1. First of all, by integrating a virtual platform as a simulation technology for early
software development and validation into the COMPLEX design space exploration
flow for use case 3, we were able to validate the capabilities of a generic virtual
platform in the context of a complete and elaborate design methodology.
2. Secondly, a multicore optimization technology has been perfected that enables early
design space exploration experiments for multicore designs. The technology has been
used and tested in a commercial use case and was extended with power modelling and
analysis capabilities.
Internally, Synopsys has continued the work on these results in parallel to the COMPLEX
project, which has resulted in the following concrete products:
1. The demonstrator that was developed in the COMPLEX project has been productized
as part of the Synopsys Virtualizer product portfolio. A derivative of the demonstrator
platform is now delivered as a starter-kit which can be used by customers to tune the
virtual platform to their own design needs, but also as a methodology demonstration
vehicle that can be used by Synopsys’ partners to integrate their solutions and
capabilities into. A first version of this product was released mid-2012; meanwhile, a
number of variants have been made available. One such variant was announced on
March 21, 2012.
Figure 26: Synopsys Ecosystem for mobile platforms
2. In early 2011, Synopsys announced new technology for optimizing multicore systems
(http://synopsys.mediaroom.com/index.php?s=43&item=896), which has been
selected as one of the five EDA products in EDN's Hot 100 products of 2011
(http://www.edn.com/article/519945-EDN_Hot_100_products_of_2011_EDA_IP_storage.php).
This technology uses the task-based virtual platform modelling techniques that were
used and extended in the COMPLEX project. In 2013 this product will include a
productized version of the power modelling technology developed during the
COMPLEX project.
Figure 27: Synopsys task based virtual platform modelling techniques
4.1.3 Use-Case 3 – Model-based space application
The main goal of the exercise performed by GMV is the verification of the COMPLEX
methodology and the associated tools. As a result, it is possible to infer from the results of the
Design Space Exploration (DSE) execution the optimal architecture and the most suitable
partitioning for the hardware and software systems. For this purpose GMV has developed a
domain-specific use case: an on-board distributed application in the context of Space
Situational Awareness (SSA), consisting of an object survey, tracking and imaging system
represented in Figure 28.
[Figure 28 shows: attitude and altitude sensors, a GPS receiver, a startracker, an
optical device and an antenna feeding the image capturing and filtering and the
object survey, tracking and hazard analysis components.]
Figure 28: Space Situational Awareness system
This use case defines several interconnected components which, based on cyclical
processes whose computational requirements vary considerably with the system inputs,
are able to track the space situation. These characteristics, combined with the power
constraints present in space systems, make the availability of a powerful design
infrastructure critical: one capable of helping the designer define the most suitable
system that fulfils all the imposed constraints.
Using the entire framework, the design space exploration tools have demonstrated the
ability to find the optimal architecture given the functional and non-functional
properties and the requirement specifications embedded in the system model. The
COMPLEX tools support the refinement of the system models and guide architects and
designers through the different design choices: system function modelling, platform
resource modelling, HW/SW partitioning, assessment of extra-functional properties,
optimisation, and others.
In particular, the UML/MARTE COMPLEX flow developed for Use-Case 3 has demonstrated
that additional improvements to the general exploration flow described in the previous
section are possible: it removes the MDA entry from the DSE loop and avoids
recompiling the executable performance model for each iteration. In this way, a
significant speed-up of the DSE loop has been achieved.
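The effect of keeping the executable performance model fixed and only reconfiguring it per iteration can be illustrated with a small sketch. This is not the actual COMPLEX tooling: the model, the design points (core count, frequency) and the cost figures below are purely illustrative assumptions; in the real flow each evaluation is a simulation of the pre-compiled performance model rather than an analytic formula.

```python
# Hedged sketch of a configuration-driven DSE loop: the performance model is
# built once, and each iteration only passes a new configuration to it
# (no recompilation). All names and numbers are illustrative assumptions.
from itertools import product

class PerformanceModel:
    """Stands in for the pre-compiled, configurable executable model."""
    def evaluate(self, cfg):
        # Toy cost model: more cores or higher frequency reduce latency
        # but raise power. A real flow would run a simulation here.
        cores, freq_mhz = cfg["cores"], cfg["freq_mhz"]
        latency_ms = 1200.0 / (cores * freq_mhz / 100.0)
        power_mw = 50.0 * cores * (freq_mhz / 100.0) ** 2
        return {"latency_ms": latency_ms, "power_mw": power_mw}

def explore(model, deadline_ms):
    """Sweep a (small) design space; return the feasible configuration
    with minimal power, i.e. the 'optimal architecture'."""
    best_cfg, best_metrics = None, None
    for cores, freq in product([1, 2, 4], [100, 200, 400]):
        cfg = {"cores": cores, "freq_mhz": freq}
        metrics = model.evaluate(cfg)          # reconfigure, don't recompile
        if metrics["latency_ms"] > deadline_ms:
            continue                           # violates the timing constraint
        if best_metrics is None or metrics["power_mw"] < best_metrics["power_mw"]:
            best_cfg, best_metrics = cfg, metrics
    return best_cfg, best_metrics

cfg, metrics = explore(PerformanceModel(), deadline_ms=400.0)
print(cfg, metrics)
```

Because `evaluate` is only a parameter change, the per-iteration cost is one model run; in a compile-per-iteration flow the same loop would additionally pay a full build of the executable model at every design point.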
4.2 Socio-economic impact and the wider societal implications of the project
This information can be found in the “Report on the Wider Societal Implications of the
Project”, publicly available at [21].
4.3 Main dissemination activities
This information can be found in the “Final Report on Standardization and Dissemination
Activities”, publicly available at [22].
5 Contact information
Title: COdesign and power Management in PLatform-based design space EXploration
Acronym: COMPLEX
Project website: http://complex.offis.de
List of Contractors (short name, country):
OFFIS e.V. (OFFIS, Germany): Philipp A. Hartmann ([email protected])
STMicroelectronics srl. (ST-I, Italy): Sara Bocchio ([email protected])
STMicroelectronics Beijing R&D Inc. (ST-PRC, China): Chris Wu ([email protected])
Thales Communications SA (Thales, France): Sylvie Raynaud ([email protected])
GMV Aerospace and Defence SA (GMV, Spain): Carmen Lomba ([email protected])
SNPS Belgium NV (SNPS, Belgium): Bart Vanthournout ([email protected])
ChipVision Design Systems AG (CV, Germany), until end of 2010
EDALab srl (EDALab, Italy): Davide Quaglia ([email protected])
Magillem Design Services SAS (MDS, France): Emmanuel Vaumorin ([email protected])
Politecnico di Milano (PoliMi, Italy): William Fornaciari ([email protected])
Universidad de Cantabria (UC, Spain): Eugenio Villar ([email protected])
Politecnico di Torino (PoliTo, Italy): Enrico Macii ([email protected])
Interuniversitair Micro-Electronica Centrum vzw (IMEC, Belgium): Eddy de Greef ([email protected])
European Electronic Chips & Systems design Initiative (ECSI, France): Adam Morawiec ([email protected])
Co-ordinator Contact: Kim Grüttner
OFFIS - R&D Division Transportation
Escherweg 2 - 26121 Oldenburg - Germany
Phone/Fax.: +49 441 9722-228/-278
E-Mail: [email protected]
6 References
[1] Gianluca Palermo, Carlo Brandolese, Francisco Ferrero, Fernando Herrera, Gunnar
Schomaker, Claus Brunzema, Kim Grüttner, Kai Hylla, Bart Vanthournout, Davide
Quaglia, Luciano Lavagno, Massimo Poncino, Emanuel Vaumorin, Chantal Couvreur
and Saif Ali Butt. Definition of application, stimuli and platform specification, and
definition of tool interfaces. Tech. Rep. COMPLEX/PoliMi/R/D1.2.1/1.1, COMPLEX
project deliverable (October 2010) URL http://complex.offis.de/docs/8
[2] F. Herrera, H. Posadas, P. Penil, E. Villar, F. Ferrero, R. Valencia, An MDD
Methodology for Specification of Embedded Systems and Automatic Generation of
Fast Configurable and Executable Performance Models, in: International Conference
on Hardware/Software Codesign and System Synthesis, CODES+ISSS’2012,
Tampere, FI, 2012.
[3] F. Ferrero, R. Valencia, F. Herrera, E. Villar, L. Lavagno, D. Quaglia, System
specification methodology using MARTE and Stateflow, Tech. Rep.
COMPLEX/GMV/R/D2.1.1/1.1, COMPLEX project deliverable (Dec. 2010). URL
http://complex.offis.de/docs/11
[4] F. Herrera, P. Penil, E. Villar, F. Ferrero, R. Valencia, L. Lavagno, D. Quaglia,
SystemC generation tools from MARTE and Stateflow, Tech. Rep.
COMPLEX/UC/P/D2.1.2/1.0, COMPLEX project deliverable (Jun. 2011). URL
http://complex.offis.de/docs/21
[5] K. Grüttner, C. Grabbe, F. Oppenheimer, W. Nebel, Object Oriented Design and
Synthesis of Communication in Hardware-/Software Systems with OSSS, in:
Proceedings of the SASIMI 2007
[6] K. Grüttner, H. Andreas, P. A. Hartmann, A. Schallenberg, C. Brunzema, OSSS - A
Library for Synthesisable System Level Models in SystemC™ (2008). URL
http://www.system-synthesis.org
[7] K. Hylla, P. Gonzalez, P. Sanchez, F. Herrera, Final report on custom hardware
estimation and model generation, Tech. Rep. COMPLEX/OFFIS/P/D2.4.2/1.0,
COMPLEX project deliverable (Jan. 2012). URL http://complex.offis.de/docs/33
[8] D. Helms, G. Ehmen, W. Nebel, Analysis and Modeling of Subthreshold Leakage of
RT-Components under PTV and State Variation, in: Proceedings of the International
Symposium on Low Power Electronics and Design (ISLPED).
[9] Kai Hylla, Philipp A Hartmann, Domenik Helms and Wolfgang Nebel. Early Power &
Timing Estimation of Custom Hardware Blocks based on Automatically Generated
Combinatorial Macros. In 16. Workshop Methoden und Beschreibungssprachen zur
Modellierung und Verifikation von Schaltungen und Systemen (MBMV'2013).
Rostock, Germany, March 2013.
[10] C. Brandolese, G. Palermo, W. Fornaciari, H. Posadas, F. Herrera, P. Penil, E. Villar,
F. Ferrero, R. Valencia, B. Vanthournout, Final report on embedded software
estimation and model generation, Tech. Rep. COMPLEX/PoliMi/R/D2.2.2/1.0,
COMPLEX project deliverable (Jan. 2012). URL http://complex.offis.de/docs/31
[11] C. Brandolese, W. Fornaciari, Software Energy Optimization Through Fine-Grained
Function-Level Voltage and Frequency Scaling, in: International Conference on
Hardware/Software Codesign and System Synthesis, CODES+ISSS’2012, Tampere,
FI, 2012.
[12] D. Lorenz, P. A. Hartmann, K. Grüttner, W. Nebel, Non–invasive Power Simulation at
System–Level with SystemC, in: International Workshop on Power and Timing
Modeling, Optimization and Simulation, PATMOS’2012, Newcastle upon Tyne, UK,
2012.
[13] D. Lorenz, K. Grüttner, N. Bombieri, V. Guarnieri, S. Bocchio, From RTL IP to
Functional System-Level Models with Extra-Functional Properties, in: International
Conference on Hardware/Software Codesign and System Synthesis,
CODES+ISSS’2012, 2012.
[14] Emmanuel Vaumorin, Bart Vanthournout, Sara Bocchio, Davide Quaglia, Fernando
Herrera, Pablo Peñil del Campo, Eugenio Villar, Kai Hylla, Tiemo Fandrey, Philipp A.
Hartmann, Final report and tools on virtual system generation, Tech. Rep.
COMPLEX/MDS/R/D2.5.3/1.0, COMPLEX project deliverable (December 2012).
[15] S. Bocchio, P. A. Hartmann, D. Lorenz, D. Quaglia, Final report and tools on platform
IP components estimation and model generation, Tech. Rep. COMPLEX/ST-
I/P/D2.3.2/1.1, COMPLEX project deliverable (May 2012). URL
http://complex.offis.de/docs/34
[16] SystemC Network Simulation Library v.2 (2012). URL
http://sourceforge.net/projects/scnsl
[17] P. Sayyah, M. Lazarescu, D. Quaglia, E. Ebeid, S. Bocchio, A. Rosti, Network-aware
Design-Space Exploration of a Power-Efficient Embedded Application, in:
International Conference on Hardware/Software Codesign and System Synthesis,
CODES+ISSS’2012, Tampere, FI, 2012
[18] C. Ykman-Couvreur, P. A. Hartmann, G. Palermo, F. Colas-Bigey, L. San, Run-time
Resource Management based on Design Space Exploration, in: International
Conference on Hardware/Software Codesign and System Synthesis,
CODES+ISSS’2012, Tampere, FI, 2012
[19] Acceleo website, http://www.acceleo.org, Nov. 2010.
[20] OMG, MOF Model to Text Transformation Language, Jan. 2008.
[21] Kim Grüttner, Report on the Wider Societal Implications of the Project, Tech. Rep.
COMPLEX/Partner Name/R/D6.2.6/1.0, COMPLEX project deliverable (March 2013).
[22] Adam Morawiec, Ana Pinzari, Final Report on Standardization and Dissemination
Activities, Tech. Rep. COMPLEX/ECSI/R/D5.2.3/1.0, COMPLEX project deliverable
(March 2013).
[23] Open Virtual Platforms, Virtual platforms for software development, URL:
http://www.ovpworld.org
[24] N. Bombieri, F. Fummi, and G. Pravadelli, Automatic Abstraction of RTL IPs into
Equivalent TLM Descriptions, IEEE Transactions on Computers, vol. 60, no. 12,
2011, pp. 1730-1743.
[25] EDALab, HIFSuite web site, http://www.hifsuite.com
[26] P. Sayyah, F. Stefanni, M. Lazarescu and D. Quaglia, SystemC model generation for
realistic simulation of networked embedded systems, 15th Euromicro Conference on
Digital System Design (DSD), Sept. 5-8, 2012.
[27] IEEE, “IEEE Std 1666 - 2011 IEEE Standard SystemC Language Reference Manual”,
2012. URL http://standards.ieee.org/findstds/standard/1666-2011.html
[28] The MathWorks, MATLAB and Simulink for Technical Computing, URL:
http://www.mathworks.com