27
N.10 - 15 th July 2015 - Page 1 - Internet of Things (IoT) has become a popular expression. It indicates the network of physical objects or “things”, each characterized by a number of capabilities that include communication, sensing, actuation and computation, designed to interact with the environment and with other things. How different is this from a natural ecosystem? A natural ecosystem is an ensemble that includes all of the living things (plants, animals and organisms) in a given area, interacting with each other, and with the environment. In this perspective, the IoT presents itself as a hybrid ecosystem, i.e. the extension of this ensemble to the presence of mobile, autonomous, communicating devices. There is little doubt that in the near future we will assist at the transformation of our natural ecosystems due to the increasing presence of artificial objects that will coexist and interact with the living organisms in our present ecosystems. How is the device powering issue relevant to this perspective? In a parallel with natural ecosystems, we can envision a scenario where cooperation-competition mechanisms will arise in order to secure the finite amount of energy available. How to deal with this? Let us think about it… (L.G.) ICT - ENERGY L E T T E R S EU Projects ENTRA Pag. 2 LANDAUER Pag. 3 PROTEUS Pag. 4 ICT Energy events Doctoral Symposium Pag. 3 Micro Energy Day 2015 Pag. 5 Summer School 2015 Pag. 6 Conferences ICT 2015 Innovate, Connect, Transform Pag. 7 Scientific Papers Pag. 9 This newsletter is edited with the collaboration of the ICT-ENERGY editorial board composed by: - L. Gammaitoni, NiPS Laboratory, University of Perugia (IT) - L. Worschech, University of Wuerzburg (Ger) - J. Ahopelto, VTT (FI) - C. Sotomayor Torres, ICN and ICREA Barcellona (SP) - F. Marchesoni, University of Camerino (IT) - G. Abadal Berini, Universitat Autònoma de Barcelona (SP) - D. Paul, University of Glasgow (UK) - G. Fagas, Tyndall National Institute University College Cork (IR) - S. Lombardi, NiPS Laboratory, University of Perugia (IT) - J.P. Gallagher, Roskilde Univ. (DK) - V. Heuveline, Ruprecht-Karls- universitaet Heidelberg (Ger) - A.C. Kestelman, Barcelona Supercomputing Center (SP) - D. Atienza, EPFL (CH) - D. Williams, Hitachi Europe limited (UK) - K. Eder, University of Bristol (UK) - K. G. Larsen, Aalborg Universitet (DK) - F. Cottone, NiPS Laboratory, University of Perugia (IT)

It indicates the network of physical objects or ICT ... · N. 1 0 th- 15 J u ly 2015 - P a g e 1 - Internet of Things (IoT) has become a popular expression. It indicates the network

  • Upload
    buidieu

  • View
    214

  • Download
    2

Embed Size (px)

Citation preview

N . 1 0 - 1 5 t h J u l y 2 0 1 5

- P a g e 1 -

Internet of Things (IoT) has become a popular expression.

It indicates the network of physical objects or “things”, each

characterized by a number of capabilities that include communication, sensing, actuation and

computation, designed to interact with the

environment and with other things. How different is

this from a natural ecosystem?

A natural ecosystem is an

ensemble that includes all of

the living things (plants,

animals and organisms) in a

given area, interacting with

each other, and with the

environment. In this perspective,

the IoT presents itself as a hybrid

ecosystem, i.e. the extension of

this ensemble to the presence of

mobile, autonomous, communicating devices. There is little

doubt that in the near future we will assist at the transformation

of our natural ecosystems due to the increasing presence of

artificial objects that will coexist and interact with the living

organisms in our present ecosystems. How is the device

powering issue relevant to this perspective? In a parallel with natural

ecosystems, we can envision a scenario where cooperation-competition

mechanisms will arise in order to secure the finite amount of energy

available. How to deal with this? Let us think about it… (L.G.)

ICT-ENERGYL E T T E R S

EU Projects

ENTRA – Pag. 2

LANDAUER – Pag. 3

PROTEUS – Pag. 4

ICT Energy events

Doctoral Symposium – Pag. 3

Micro Energy Day 2015 – Pag. 5

Summer School 2015 – Pag. 6

Conferences

ICT 2015 Innovate, Connect,

Transform – Pag. 7

Scientific Papers

Pag. 9

This newsletter is edited with the collaboration of the ICT-ENERGY editorial board composed by: - L. Gammaitoni, NiPS Laboratory,

University of Perugia (IT)

- L. Worschech, University of Wuerzburg

(Ger)

- J. Ahopelto, VTT (FI)

- C. Sotomayor Torres, ICN and ICREA

Barcellona (SP)

- F. Marchesoni, University of Camerino

(IT)

- G. Abadal Berini, Universitat Autònoma

de Barcelona (SP)

- D. Paul, University of Glasgow (UK)

- G. Fagas, Tyndall National Institute

University College Cork (IR)

- S. Lombardi, NiPS Laboratory,

University of Perugia (IT)

- J.P. Gallagher, Roskilde Univ. (DK)

- V. Heuveline, Ruprecht-Karls-

universitaet Heidelberg (Ger)

- A.C. Kestelman, Barcelona

Supercomputing Center (SP)

- D. Atienza, EPFL (CH)

- D. Williams, Hitachi Europe limited

(UK)

- K. Eder, University of Bristol (UK)

- K. G. Larsen, Aalborg Universitet (DK)

- F. Cottone, NiPS Laboratory,

University of Perugia (IT)

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 -

ENTRA project workshop on Energy-Aware Software Development

Malaga, 6-7, May 2015

The EU ENTRA project (http://entraproject.eu/) aims to promote energy-aware software development

using program analysis and modelling of energy consumption in computers. ENTRA is one of the FET MINECC

cluster of projects investigating how to achieve energy efficient ICT, and the ENTRA coordinator Roskilde

University, Denmark, is a partner in the ICT-Energy Coordination Action (http://www.ict-energy.eu/), as is

ENTRA partner the University of Bristol, UK.

A workshop in the historic city of Malaga on 6-7 May, sponsored by the ENTRA project, combined

presentations of ENTRA project results with invited talks from outside speakers. There are many ICT

applications where reduction of energy consumption is a critical design goal. The invited speakers covered

some of these, including wearable bio-medical devices (Hossein Mamaghanian from EPFL, Switzerland), a

topic targeted by the EU PHIDIAS project. Other applications presented by invited speakers were data

centres (José Manuel Moya from Universidad Politécnica de Madrid) and cloud computing in e-health (José

Ayala from Universidad Complutense de Madrid). There were also presentations from Sudipta Chattopadhyay

(Linköping University, Sweden) describing methods for statically identifying energy hotspots in programs

such as smartphone apps that drain the phone's battery, and from Maurizio Morisio (Politecnico di Torino,

Italy) who presented an agenda for energy-efficient software in green computing.

The overall challenge of energy-aware software development requires tools to bridge a large gap: the gap is

between hardware, where energy is consumed, and high-level programming languages and software

abstractions used in modern software engineering, which in many ways hide resource usage from the

programmer. The ENTRA project researchers presented recent progress in building such tools. IMDEA

Software researchers described recent advances in the CiaoPP resource analysis tools developed at IMDEA

Software. Handling energy analysis of multi-threaded programs was dealt with by researchers from Roskilde

University, Denmark and Bristol University, UK. Energy models derived from hardware, at different levels of

detail and abstraction, were presented by Bristol University, and an approach for modelling energy directly

from high-level source code was presented by Roskilde University. IMDEA Software researchers described

techniques for energy-optimal scheduling of tasks on multi-core computers using genetic algorithms, and

Roskilde University researchers presented a framework for probabilistic analysis of resource usage.

The ENTRA project cases studies are carried out using hardware produced by the industrial partner in the

ENTRA project, XMOS Ltd, UK. This hardware is used for embedded systems especially in automotive and

digital media applications. XMOS staff explained the energy-saving features of its latest chip and the

associated compiling techniques.

In summary, the combination of ENTRA research results along with invited talks stimulated discussion both

at the technical level and around the wider implications and applications of energy-efficient ICT and

energy-aware software engineering. (J.P. Gallagher)

The full workshop programme can be seen at

http://entraproject.eu/programme-entra-workshop-malaga-6-7-may-2015

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 3 -

Doctoral Symposium in Bristol on Sept. 16th 2015

The Barcelona Supercomputing Center is organizing a

Doctoral Symposium to be held in Bristol, on Sept. 16th,

2015.

The Doctoral Symposium is a part of the ICT-Energy

Clustering/Networking workshop 2015 (www.ict-

energy.eu). The event, organized in by the University of

Bristol, will be held on 14-16 September.

This symposium is a great opportunity for PhD students to present their thesis work to a broad audience in

the ICT community from both industry and academia. The forum may also help students to establish

contacts for entering into research related to minimization of energy from various layers. In addition,

representatives from industry and academia get a glance of state-of-the-art in the energy minimization

field.

Topics of interest include, but are not limited to:

- Power- and thermal-aware algorithms, software and hardware

- Sensing and monitoring - Power and thermal behavior and control

- Data centers optimization

- Smart grid and microgrids

- Low-power electronics and systems

- Power-efficient multi/many-core chip design

If you want to know more, please download the Symposium brochure here.

LANDAUER Project consortium meeting in Paris

A consortium meeting of the European project LANDAUER has been held in Paris on June 11-12.

During the meeting the five laboratories that

compose the consortium presented recent

progresses and discussed the steps toward the

end of the project in 2015.

For latest news and information, please visit

the LANDAUER project website

(www.landauer-project.eu).

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 4 -

The PROTEUS project launched

PROTEUS is a project funded under

the H2020 framework program for

research of the European

Commission.

It investigates the smart integration

of chemical sensors based on

carbon nanotube, MEMS based

physical sensors together with a

cognitive engine providing on the fly reconfigurability. Produced deviced will be tested in the context of

water monitoring.

PROTEUS mix competences from integrated smart systems area, Internet of Things, cloud based computing,

long range wireless sensors in the field of water utilities.

NIPS Laboratory group is part of the 9 partners of the PROTEUS consortium, which met in IFSTTAR on 10th

and 11th of February 2015 in Marne la Vallée, close to Paris, to kick-off the project.

This meeting was the opportunity to present the sense city testbed to the consortium. This testbed will

allow lab scale validation of PROTEUS solutions before the deployment in te field.

NiPS laboratory research group at Physics Department at University of Perugia, will carry out the

important research work of design, study, prototyping ant validation of novel hybrid energy harvesting

systems that will add full self-powering capability to wireless sensor network for water quality monitoring.

PROTEUS will run during 36 months, until January 2018.

http://www.proteus-sensor.eu/

(F. Cottone)

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 5 -

Micro Energy Day 2015

Discover smart solutions for mobile electronic energy needs.

Institutions from across Europe celebrated the Micro-Energy Day, during the European Sustainable Energy

Week, to raise awareness about energy use in mobile electronic devices. The Micro-Energy Day, proposed

since 2010 by NiPS Laboratory (www.nipslab.org) at University of Perugia to disseminate a new conceptual

approach to renewable energies at microscales, has grown, involving a larger number of partners all over

Europe and reaching students, families as well as politicians and media.

The 2015 edition of the Micro Energy Day, organized by the partners in the EU funded project ICT-Energy

(www.ict-energy.eu) coordinated by NiPS Lab, included several demonstrations in different European

countries.

In Perugia (Italy) a dissemination activity, held on 20th of

June, made use of several demonstrative devices: from

crystal radios, which can operate without power sources

apart from the collected signals, to small radios operated

by solar cells or thermoelectric generators, to wireless

energy transmitters. Together with the NiPS group, the

local amateur radio community and representatives of the

Perugia Electronic Engineering Department discussed with

the local audience and with the amateur community all

around the world the relevance of the scientific research

on microenergies.

Barcelona Supercomputing Center (Spain) organized a Live Twitter Chat in the ICT-Energy Twitter channel

on the 18th of June, using the hashtag #LessEnergyICT, while discussing with local visitors to the

MareNostrum supercomputer energy issues and the solutions proposed to solve them. The chat was focused

on the energy expenditures in Information and Comunication Technology (from small computing devices to

the exascale supercomputers).

In Cork (Ireland) school children from primary school took part

in a number of fun and interactive demonstrations designed to

encourage us all to become more energy aware, organized by

researchers of the Tyndall Institute. Very rare to see kids asking

so many questions at once!

Moreover, on 18-19 in Heidelberg (Germany), there were

specific lessons introducing the microenergy approach and the

theme „renewable microenergies physics“ to secondary level

pupils at the Ernst-Sigle-Gymnasium in Kornwestheim.

According to the International Energy Agency, consumer

electronics and computer equipment represent 15% of global

residential electricity consumption – and this figure is set to

triple if energy efficiency is not improved. Micro-Energy Day

seeks to highlight this issue and showcase projects under the ICT

Energy umbrella. You’re invited to be part of it!

Further information is available in the Micro-Energy Day website

(microenergyday.eu/) and ICT Energy website (www.ict-

energy.eu).

(S. Lombardi)

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 6 -

NiPS Summer School 2015

ICT-Energy

Energy consumption in future ICT devices

Fiuggi (Italy), 7-12 July 2015

The sixth edition of NiPS Summer School, organized by the Noise in Physical System Laboratory at University

of Perugia as part of the European FET Proactive Coordination Action ICT-Energy (www.ict-energy.eu), was

held in Fiuggi (www.nipslab.org/summerschool).

The 40 participants (graduate students, post-docs and

young researchers) attended 4 thematic groups of

lectures, aimed at teaching the bases of the science of

efficient ICT:

1) Basic on the physics of energy transformations at

micro and nanoscales

2) Introduction to energy harvesting and distributed

autonomous mobile devices

3) Software and energy aware computing

4) High performance computing and systems

Participants to the school had the opportunity to widely discuss with teachers, coming from the ICT-Energy

project Consortium, and colleagues, and to illustrate their research results during a evening Poster Session.

For more information and the slides of the lectures, please visit www.nipslab.org/summerschool2015.

Lectures and social events at NiPS2015 Summer School on ICT-Energy.

Organized by: Supported by:

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 7 -

ICT 2015 Innovate, Connect, Transform

The EC event dedicated to ICT – Lisbon 22-23 October, 2015

European Commission from time to time promotes a

large event dedicated to the world of ICT. The next

one (entitled with little fantasy ICT 2015) will happen

next October 22-23 in Lisbon, Portugal. This is a good

opportunity to do networking activities with other

participants, to interesting debates in the associated

conference, to hear the latest news on the European

Commission's new policies and initiatives with regard

to R&I in ICT and find information about funding

opportunities from the European Union.

The ICT-Energy community has submitted a proposal

for organising a “Networking Session”. Networking

sessions are informal discussion groups organised by

stakeholders participating in the ICT 2015 event. Their

goal is to enable a broad and multi-disciplinary

dialogue, help participants find partners for a

collaborative relationship enlarge and enhance

stakeholder communities and better connect research

to innovation for a Digital Europe.

ICT-Energy Networking Session proposal is entitled: ICT-Energy: decreasing energy consumption toward

zero-power devices.

The objective of the proposed session is twofold:

1) to bring together people from the world of research with people form the world of industry,

interested in lowering energy consumption in ICT devices. They often speak two different languages

and have different perspectives. In this session we will hear from both community in short, focused

communications.

2) to raise awareness in the ICT people (at large) in the importance of decreasing energy dissipation in

the future of ICT devices. This strategic bottleneck is often overlooked when people think at the

future of ICT. We need to put under focus the connection between the need for powering mobile

devices (in the internet-of-things scenario) with the need for decreasing heat production in future

microprocessors. Both these aspects rely on the science of energy transformation at micro and nano

scale. By addressing this very issue we will open novel perspectives for exascale computers and

micro wireless autonomous sensors at the same time.

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 8 -

If approved, the ICT-Energy Networking Session will have

the following format:

- Opening with a 2 min video on the role on energy

on the functioning of ICT devices.

- Then a speaker will briefly (5 min) introduce the

two sides of this topic: excess heat production in

microprocessors and low battery life for mobile

micro devices.

- Podium confrontation: one representative from industry and one from research will illustrate in a 5

min speech the 3 main “wishful thoughts” (industry rep) and the 3 main “things we know” (research

rep).

- Follows a question-and-answer session, much like in a political debate, with public, moderated by a

professional journalist/science communicator.

- During the session there will be a corner installation with video boot “speak-up your view”. Here

everybody has a 3 min speech opportunity for recording a message on this topic. The messages will

be subsequently made available through the web site of the (FET Proactive) ICT-Energy coordination

action (www.ict-energy.eu).

Who should attend the ICT-Energy Networking Session?

The main target is represented by ICT people coming both from industry and from research, generically

interested (or curious) about the problem of energy consumption in ICT devices and thus in their powering

problem.

Specifically we plan to involve people from the following areas:

- low-power microprocessor design

- microprocessor heat dissipation and cooling

- energy harvesting system design

- wireless sensor networks

- mobile communications

- energy aware software

- energy related material science

- green ICT world

ICT 2015 website: https://ec.europa.eu/digital-agenda/en/ict2015-innovate-connect-transform-lisbon-

20-22-october-2015

For further information and updates on the ICT-Energy Networking Session stay tuned to the ICT-Energy

project website (www.ict-energy.eu).

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 9 -

Towards super computers: EU project Exa2Green improves

energy efficiency in high performance computing

Engineering Mathematics and Computing Lab (EMCL),

Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany

Steinbeis-Europa-Zentrum, Steinbeis Innovation gGmbh, Germany

Scientific Computing Group, Department of Informatics, Hamburg University, Germany

High Performance and Computing Architectures (HPCA), University Jaume I, Castellon, Spain

Swiss National Supercomputing Centre (CSCS), ETH Zurich, Switzerland

IBM Research – Zurich, Foundations of Cognitive Solutions, Switzerland

Institute for Meteorology and Climate Research, Karlsruhe Institute of Technology, Germany

Abstract—The FP7-funded Exa2Green project is paving the

road to Exascale computing by improving energy efficiency in high

performance computing (HPC). Slowly approaching the end of the

project, the team can show quite remarkable results.

I. INTRODUCTION

EXASCALE computers, so-called supercomputers of our

future, are machines that will be capable of performing at least

1018

floating point operations per second. The work rate at

Exascale is inconceivably immense, offering the prospect of

transformative progress in energy, national security, the

environment, our economy, and fundamental scientific questions.

However, the path to Exascale involves numerous complex

challenges, with a key one being power consumption. The FP7-

funded Exa2Green project (www.exa2green-project.eu) has

taken up this challenge and is developing a radically new energy-

aware computing paradigm and programming methodology for

Exascale computing. The 3-year research project started in

November 2012.

Comprising an interdisciplinary research team of High

Performance Computing (HPC) experts from Germany,

Switzerland and Spain, the Exa2Green team focuses on three

main activities: first, developing tools for measuring the

performance and energy consumption of computations; second,

analysing existing, widely used computational kernels and

developing new energy-efficient algorithms; and finally,

optimising a compute-intensive climate model to achieve a

considerable reduction of energy consumption in climate

simulations. For this third element, Exa2Green is using the

COSMO-ART weather forecast model as an example of an

intensive simulation, whose energy profile is currently far from

optimal. In the following, we highlight some aspects of our

research in the different fields of work.

II. DESIGN OF TOOLS FOR POWER- AND ENERGY

ANALYSIS ON HPC SYSTEMS

University of Hamburg focused on designing tools for power and

energy analysis on HPC systems. A first goal was achieved with

the design of an integrated power-performance analysis

framework to trace and analyze the power and energy

consumption made by parallel scientific applications. This

framework comprises PMLib, a well-established package for

investigating power usage in HPC applications [4]. Its

implementation provides a general interface to utilize a wide

range of wattmeters, including i) external devices, such as

commercial

PDUs, WattsUp? Pro .Net, ZES Zimmer LMG450, etc., ii)

internal wattmeters, directly attached to the power lines leaving

from PSU, iii) commercial DAS from National Instruments (NI)

and, iv) integrated power measurement interfaces such as Intel

RAPL, NVIDIA NVML, IPMI, etc. The PMLib client side

provides a C library with a set of routines in order to measure the

application code. The traces obtained can be integrated into

existing profiling and tracing frameworks such as

Extrae+Paraver or VampirTrace+Vampir. On the top of this

framework, we have incorporated two useful plug-ins: a module

to detect power-related states and a tool to automatically detect

power bottlenecks [5, 6].

Fig. 1: Graphical representation of the power-performance analysis framework.

Additionally, we have desgined ArduPower a small, low-cost

and accurate DC wattmeter, specifically designed for HPC

infrastructures. ArduPower comprises of two basic hardware

components: a shield of 16 current sensors and an Arduino Mega

2560 processing board. This new wattmeter has been conceived

as a low-cost DAS to measure the instantaneous DC power

consumption of the internal components of computing systems,

wherever the power lines are accessible. In general, it offers a

spatially fine-grained power measurement by providing 16

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 0 -

channels to monitor power consumption of several parts, e.g.,

mainboard, HDD and GPU cards etc. simultaneously, with a

sampling rate varying from 480 to 5,880 Sa/s. The total

production cost of the wattmeter is approximately 100 €, so it

can be easily accommodated in moderate-/large-scale HPC

platforms in order to investigate power consumption of scientific

applications.

Fig 2.: ArduPower wattmeter comprising an Arduino Mega 2560 board and a power sensing shield with 16 current sensros.

III. SETUP FOR INNOVATIVE ENREGY EFFICIENT

ALGORITHMIC KERNELS

IBM Research – Zurich investigated power consumption of

several elementary computational kernels, the so-called

computational dwarfs. The list of analysed kernels ranges from

dense to sparse linear algebra like sparse matrix-vector

multiplication (SpMV) and conjugate gradient (CG) method, as

well as to spectral methods such as the fast Fourier transform [4,

5, 6]. All the kernels have been analyzed on modern HPC

architectures, such as the IBM Blue Gene/Q, the IBM POWER7

and the new IBM POWER8 system.

As an example, in Figure 1 we report measurements and

subsequent classification analysis for the SpMV, one of the most

widely used kernels to solve PDEs and modern engineering

problems. On top of these analysis, IBM Research - Zurich

developed accurate models for the characterization and time-

power-energy prediction of the same kernels, tested on multicore

architectures.

Fig. 1: SpMV performance analysis and classification, for different sparse matrices on IBM POWER8. Each color corresponds to a different class of

matrices. For more details see [4].

As an example, in Figure 2 we show a model of the power

consumption of the most critical sub-kernels of the CG

algorithm. These models represent not only a fundamental utility

tool for cluster and cloud systems (e.g., to predict costs and

optimize placement of jobs on the nodes), but also consist in a

solid base for the subsequent optimization of the kernels.

Fig. 2: Net power cost prediction of the main sub-kernels used during each

iteration of the CG algorithm on IBM POWER7. For more details see [5] Left: 1 thread per core. Right: 4 threads per core.

IV. DEVELOPMENT OF ENERGY-AWARE NUMERICAL

LINEAR ALGEBRA LIBRARIES

The researchers from the HPCA group (University Jaume I,

Spain) focused on developing techniques for energy conservation

in dense and sparse linear algebra problems. One of the primary

goals of this work consisted in assessing the performance, power

consumption and energy efficiency of the CG method on a large

variety of architectures that form the landscape of high

performance computing, such as general-purpose multicore

processors, graphics processing units (GPUs), and low-power

digital signal processors (DSPs) [8]. From this study, we

observed that the performance-per-power unit of the many-core

GPUs from NVIDIA can be matched by several low-power

devices, such as those Intel (Atom), ARM (Cortex-A9) and

Texas Instruments (C66); and even outperformed by more recent

architectures from ARM (Cortex-A15). In addition, while GPUs

deliver high energy efficiency with remarkable performance, the

less power-demanding embedded processors exhibit a much

lower dissipation and/or smaller memories. This reduces the

appeal of these inexpensive architectures for general-purpose

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 1 -

computing, but makes them suitable options for certain

applications. Finally, despite the fact that conventional general-

purpose processors attain neither the throughput of GPUs nor the

performance-per-watt of the low-power systems, they offer an

interesting balance between these two extremes.

Additionally, this effort targeted the efficient exploitation of

task-parallelism, via the OmpSs framework, during the execution

of the multilevel preconditioned CG sparse linear solver

underlying ILUPACK (Incomplete LU Package). For this

purpose, specialized implementations were developed for Non-

Uniform Memory Access (NUMA) platforms and many-core

hardware co-processors based on the Intel Xeon Phi and GPUs

[9]. One key advantage of the new parallel data-flow solver is

that the numerical method/software is decoupled from the

runtime. This feature was exploited to investigate the effect of

different options of the runtime, in particular, schedules that may

favor data locality as well as the integration of energy saving

mechanisms into the runtime scheduler.

Finally, as part of a third line of work, this WP ported the

SuperMatrix runtime task scheduler for dense linear algebra

algorithms to the many-threaded Intel Xeon Phi, paying special

attention to the performance and energy profile of the solution.

From the performance perspective, this work focused on the

implications of balancing task- and data-parallelism. From the

energy-aware point of view, a mechanism was introduced in the

runtime in order to reduce power consumption during the idle

periods inherent to task parallel executions, achieving significant

energy savings [10].

V. ADVANCING HARDWARE-AWARE ALGORITHMS

Recent work of the partner Heidelberg University addresses the

improvement of the energy consumption of linear systems

solvers, in particular for multigrid methods. Multigrid solvers

belong to the most efficient numerical methods for solving

symmetric positive definite linear systems. The computational

complexity is optimal with O(n) for sparse systems with n

unknowns. In multigrid methods, typically a sequence of

problems with decreasing size appears. To improve the energy

consumption, we developed a hardware-aware implementation

that manages the number of active compute nodes with respect to

the size of the linear equation system. It allows to decrease the

number of active compute nodes until the lowest grid level is

reached. Once the system is solved on the lowest grid level, the

number of active compute nodes is increased again,

simultaneously to the prolongation of the solution. The energy

improvement then stems from the fact, that for each multigrid

level, the optimal hardware configuration is chosen. The

efficiency of a multigrid method strongly depends on the

smoothing method employed. The role of the smoother is to

remove high frequency error contributions from the solution.

Classical relaxation schemes such as Jacobi or Gauss-Seidel

iteration and their damped variants are often used as smoothers.

In the context of high-performance computing (HPC), scalability

of all building blocks of the multigrid solver is crucial for good

parallel performance. At the same time, the smoothing properties

must be maintained to sustain the efficiency.The developed

routines and solver libraries are dedicated to reduce the energy

consumption of the solving process. This is achieved by

leveraging the power-saving techniques provided by the

hardware. In addition to the dynamic adjustment of the compute

node activity within a mutligrid cycle, the smoothers can be

executed on the devices in GPU-accelerated systems. The goal is

to exploit the superior ratio of compute performance per

consumed energy which GPUs can yield compared to CPUs [7].

But often, this demands for adapted algorithms tailored to the

hardware. One aspect is the introduction of asynchronism in the

algorithms. If poor performance is encountered in parallel

algorithms, the reason is often the gap between the parallelism

provided by modern hardware, and the parallelism inside the

algorithms. A higher degree of asynchronism may result in

additional work, but due to the more efficient hardware use and

the decreased communication, the runtime performance as well

as the energy consumption can benefit from this approach [8,9].

In order to fully exploit the opportunities for parallelization

and asynchronous techniques in distributed systems, we

seamlessly integrated our developments in the general purpose

parallel finite element package HiFlow3 [10]. This makes the

outcomes usable for a broad scientific and industrial community

and puts strong emphasis on the aspect of energy efficiency in

the area of high performance computing.

VI. SHOWCASE FOR ENERGY-OPTIMISED AEROSOL

CHEMISTRY PACKAGES

Swiss National Supercomputing Centre (CSCS) and

Karlsruhe Institute of Technology (KIT) focus on a regional

scale model system consisting of the COSMO weather forecast

model of the German weather service (DWD) heavily used in

Europe, online coupled with the chemical transport model ART

(Aerosols and Reactive Trace gases), developed at KIT. The

coupled chemistry-meteorology COSMO-ART model enables

the simulation of atmospheric chemistry for studying air quality

and the calculation of the interactions of gases and aerosols with

the state of the atmosphere on the regional to continental scale.

Nevertheless, since a large number of additional tracers and

processes have to be considered, the air quality prediction model

COSMO-ART is much more demanding than COSMO alone and

thus severely limited in terms of applicability and expensive in

terms of energy consumption.

A baseline benchmark of the application was initially

produced on the current, state-of-the-art target platform, a dual

socket Ivybridge cluster at CSCS, to provide a reference

baseline. The assessment of the energy footprint and

performance profile of COSMO-ART was realised on various

HPC platforms (Intel Xeon Ivy Bridge, Xeon Sandy Bridge,

Xeon Westmere) [15]. So far, two different frameworks were

deployed on the testbed HPC systems to analyse the

performance, power dissipation and energy consumption of the

baseline execution, namely the E3METER metering framework

and the high resolution power-performance tracing framework

designed by the University of Hamburg. Profiling has given

detailed insights into power and energy requirements and has

been particularly useful to detect power bottlenecks, basically

constituted by inefficient waiting methods. On the other hand,

the usage of energy-friendly MPI techniques has demonstrated

that it is possible to reduce the energy-to-solution ETS while

maintaining (or even decreasing) the time-to-solution (TTS).

One computationally highly intensive part within the air

quality model COSMO-ART is the numerical simulation of the

atmospheric gas-phase chemistry and more especially the

solution of the chemical kinetics. Therefore, the TTS and ETS

improvements of the port of the Kinetic PreProcessor (KPP) to

multi-threaded CPUs and accelerators (KPPA) were investigated

with a standalone test harness for atmospheric chemistry

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 2 -

simulations. Besides, a comparative study on Rosenbrock

integrators was conducted in order to investigate the potential of

algorithmic changes demonstrated in the PRACE 2IP WP8

within the solution of the chemical reaction kinetics, and identify

an optimal integrator for chemical kinetics.

Other approaches to optimise the energy footprint and

performance of the atmospheric modeling system are currently

considered, such as:

the use of evolving COSMO optimisations within the

PASC initiative,

the use of single-precision arithmetic,

the use of an optimised library for algebraic kernels in

the atmospheric chemical kinetics solver,

the use of transactional memory,

the use of CPU DVFS governors for memory-bounded

code sections,

the use of parallel I/O libraries and asynchronous I/O,

the use of an optimal MVAPICH2 execution mode

(busy-wait: polling or idle-wait: blocking) or

alternatively, the use of an optimal OpenMPI execution

mode (busy-wait: aggressive or idle-wait: degraded), the use of an optimal MPI/OpenMP hybrid parallelism

on several HPC platforms (e.g. Piz Dora at CSCS and

POWER8 at IBM),

the feasibility study on running on low power platforms,

such as ARM processors.

Current achievements and prospective changes indicate that

we are getting close to the ultimate target of a 5-fold reduction in

E2S, with respect to the baseline code. Such an implementation

would allow the community to investigate critical questions at

higher resolutions and over longer periods, at reduced cost to the

environment.

ACKNOWLEDGMENTS

Exa2Green is part of the 'FET (Future and Emerging

Technologies) Proactive Initiative: Minimising Energy

Consumption of Computing to the Limit'. We acknowledge the

financial support of the European Commission under grant

agreement no. 318793.

TRADEMARKS

IBM, the IBM logo, ibm.com, IBM POWER, IBM Blue

Gene, are trademarks or registered trademarks of International

Business Machines Corporation in the United States, other

countries, or both. Other product and service names might be

trademarks of IBM or other companies.

REFERENCES

[1] P. Alonso, R. M. Badia, M. Barreda, M. F. Dolz, J. Labarta, R. Mayo and

E.S. Quintana-Ortí and R.Reyes. Tools for power and energy analysis of parallel scientific applications. 41st International Conference on Parallel

Processing–ICPP, pages 420–429, 2012.

[2] hin , , t n, M. F. Dolz, G. Fabregat, R. Mayo

and E.S. Quintana-Ortí. An Integrated Framework for Power-Performance

Analysis of Parallel

Scientific Workloads. 3rd Int. Conf. on Smart Grids, Green Communications and

IT Energy-aware Technologies, pages 114-119, 2013.

[3] , t n, M. F. Dolz, R. Mayo, E. S. Quintana-Ortí. Automatic detection of power bottlenecks in parallel scientific

applicationss. Fourth International Conference on Energy-Aware High

Performance Computing, Computer Science Research and Development, Vol.29(3-4), pages 114-119, 2013.

[4] A. C. I. Malossi, Y. Ineichen, C. Bekas, A. Curioni, and E. S. Quintana-

O tı' Performance and energy-aware characterization of the sparse matrix-vector multiplication on multithreaded architectures. In Proc. of 43rd ICPP,

Minneapolis, MN, USA, Sep. 2014.

[5] A. C. I. Malossi, Y. Ineichen, C. Bekas, A. Curioni, and E. S. Quintana-O tı' Systematic derivation of time and power models for linear algebra

kernels on multicore architectures. Sustainable Computing: Informatics and

Systems (SUSCOM), 2015 [6] A. C. I. Malossi, Y. Ineichen, C. Bekas, A. Curioni. Fast Exponential

Computation on SIMD Architectures. In Proc. of HIPEAC-WAPCO,

Amsterdam NL, 2015 [7] M. Wlotzka, V. Heuveline: Block-asynchronous and Jacobi smoothers for a

multigrid solver on GPU-accelerated HPC systems, submitted to Auto-

Tuning for Multicore and GPU at IEEE MCSoC-15, Turin, Italy, Sep. 23-25, 2015

[8] H. Anzt, S. Tomov, J. Dongarra, V. Heuveline: A block-asynchronous

relaxation method for graphics processing units, J. Parallel and Distrib. Comput., vol. 73 (2013), pp. 1613-1626

[9] M. Wlotzka, V. Heuveline: Energy-aware mixed precision iterative

refinement for linear systems on GPU-accelerated multi-node HPC clusters, to be published in Proceedings of the 26th GI/ITG PARS Workshop,

Potsdam, Germany, May 7-8, 2015

[10] [4] H. Anzt, F. Wilhelm, J. P. Weiß, C. Subramanian, M. Schmidtobreick, M. Schick, S. Ronnas, S. Ritterbusch, A. Nestler, D. Lukarski, E. Ketelaer,

V. Heuveline, A. Helfrich-Schkarbanenko, T. Hahn, T. Gengenbach, M.

Baumann, W. Augustin and M. Wlotzka: HiFlow3: A Hardware-Aware Parallel Finite Element Package, in Holger Brunst, Matthias S. Müller,

Wolfgang E. Nagel and Michael M. Resch, eds., Tools for High

Performance.

[11] "Unveiling the performance-energy trade-off in iterative linear system

solvers for multithreaded processors". J. I. Aliaga, H. Anzt, M. Castillo, J.

Fernández, G. León, J. Pérez, E. S. Quintana. Concurrency and Computation: Practice & Experience, Vol. 27(4), pp. 885-904, 2

[12] "Exposing data dependencies to leverage task-parallelism with OmpSs in

the preconditioned CG method". J. I. Aliaga, R. M. Badia, M. Barreda, M. Bollhöfer, E. S. Quintana, in 26th Int. Symposium on Computer

Architecture and High Performance Computing-SBAC-PAD 2014, Paris

(France), October 2014. [13] “Balancing Task- and Data-Level Parallelism to Improve Performance and

Energy Consumption of Matrix Computations on the Intel Xeon Phi“

Manuel F. Dolz, Francisco D. Igual, Thomas Ludwig, Luis Piñuel, Enrique S. Quintana-Ortí, Computer and Electrical Engineering (CAEE) journal,

2015. To appear.

[14] Performance Computing 2011, pp. 139-151, Springer, 2012 [15] J. Charles, W. Sawyer, M. Dolz, S. Catalán: Evaluating the performance

and energy efficiency of the COSMO-ART model system, Computer

Science - Research and Development, July 2014

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 3 -

Power and Energy Characterization of Finite Element

Matrices on IBM Power Martin Wlotzka

1, A. Cristiano I. Malossi

2, Vincent Heuveline

1, Costas Bekas

2

1Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany

2IBM Research – Zurich, Foundations of Cognitive Solutions, Switzerland

Abstract— In this work we investigate the performance and

energy characteristics of different finite element discretizations

arising from the numerical solution of PDEs. We measure and

compare performance of sparse matrix-vector multiplications and

we analyze the behavior obtained for different mesh properties as

the degree of the FE basis polynomial and the level of h-refinement.

This analysis is carried out on the new IBM Power 8 platform.

I. INTRODUCTION

Numerical simulations typically involve the solution of linear

systems of equations, where the linear systems generally arise

from the space (and time) discretization of the underlying

continuous model equations. This is valid also in the case of

nonlinear models, where Newton or other method yields a

sequence of linear problems. The resulting matrices are typically

sparse, in particular when they are derived from finite difference,

finite element or finite volume discretizations of partial

differential equations (PDEs). The properties of the system

matrices greatly affect the power and performance behaviour of

the solution process. In particular, the sparse matrix-vector

multiplication (SpMV) plays a key role since it is the dominating

computational part in each iteration of the widely used Krylov

subspace linear solvers such as conjugate gradient (CG) or

general minimal residual (GMRES) method [6], and it may also

appear in multigrid smoothers.

In [5] the authors investigated the fundamental principles of the

power and performance characteristics of the SpMV. A

classification was established based on an artificial training set of

matrices covering a large variety of sparse matrix structures.

From this classification, a prediction model for power and

performance was derived and successfully validated using real

matrices from different application fields.

In this work, we extent those results focusing on the specific type

of matrices resulting from finite element discretizations of PDEs.

In particular, we aim to analyze and understand the effect of

mesh refinement and polynomial degree, among others, on the

power and energy consumption of the SpMV.

The layout of this work is the following: in Section II, we

illustrate with a prototypical example the origin of the sparsity of

finite element matrices. In Section III, we describe our

experimental setup for the measurements on the Power 8

platform, and in Section IV we present the results and finally

conclusions are summarized in Section V.

II. FINITE ELEMENT MATRICES

In modern engineering simulations, the FEM (see, e.g., [4]) is

widely used to numerically approximate the solution of PDEs. In

this work, we use the Laplace equation as a prototype to measure

the basic features of the FEM, which are driven by the sparse

matrix properties. The Laplace equation in the domain reads

(1)

where is the solution vector. This equation can model simple

engineering problems, such as the temperature distribution on

solid surfaces and volumes. Equation (1) is generally closed by a

proper set of boundary conditions. However, in our analysis we

omit these conditions, as they impact only a few rows, and they

do not affect significantly the resulting matrix structure.

The existence and uniqueness of classical or weak solutions

under appropriate boundary conditions is guaranteed by theory

[1,2,3] and not analyzed here. The FEM seeks to find an

approximate solution as a linear combination of finite element

basis functions with coefficients such that

. (2)

The finite element basis functions are piece-wise polynomials on

a discrete computational mesh modeling the original domain

. For Lagrange elements, the basis functions are defined

through the values at regular points on the mesh cells, like

vertices or midpoints of edges.

Fig. 1: 1-D example with two piece-wise linear Lagrange-type finite element basis functions, so-called “h t fun tions” Each hat functions has the value 1 at

exactly one vertex, and the value 0 outside the adjacent mesh cells. Figure 1 shows a one-dimensional (1-D) example of linear basis

functions. Note that the basis functions have only a small

support, i.e., the region where the function is not zero, and

couplings occur only over adjacent cells. The matrix pattern then

follows from the variational formulation of (1), i.e.,

. (3)

Equation (3) represents a linear system of equations for the

coefficient vector which can be stated as

(4)

and the right hand side vector incorporates any boundary

condition terms which we omitted so far. The resulting matrix

is “stiffn ss m t ix” fo histo i sons, wh n this typ

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 4 -

of discretization was used for elasticity problems. is sparse

since the only non-zero entries occur where the basis

functions and share an overlapping support. The sparsity

pattern depends on the structure of the underlying computational

mesh, the finite element type, the number of finite element

variables, the mesh refinement, and the polynomial degree of the

elements.

We create a collection of 384 sparse matrix structures

resulting from different finite element spaces. The precise

definitions are given in Table 1. We use both quadrilateral (quad)

and triangular (tri) meshes for a 2-D square domain, and

hexahedrons and tetrahedrons for a 3-D cube domain. We start

from the coarsest possible meshes 1 and perform successive

uniform mesh refinement (also known as “h- fin m nt”) For

each individual mesh, we use finite elements with a polynomial

degree of to the highest possible degree . The number

of finite element variables is in the 2-D case, and

in the 3-D case. This includes the possibility to

approximate scalar functions and velocity fields. Table 1: Definition of the test matrices used in this work

cell type h-refinement pmax d

quad / tri up to 6 times 7 1 quad / tri up to 7 times 6 1 quad / tri up to 8 times 5 1 quad / tri up to 9 times 2 1 quad / tri up to 10 times 1 1 quad / tri up to 7 times 7 2 quad / tri up to 8 times 3 2 quad / tri up to 9 times 1 2 hex / tet up to 2 times 11 1 hex / tet up to 4 times 7 1 hex / tet up to 5 times 3 1 hex / tet up to 6 times 1 1

hex up to 1 time 10 3 hex up to 3 times 6 3 hex up to 4 times 4 3

III. EXPERIMENTAL SETTING

Our tests are run on one core of the IBM Power 8 [7], using

double-precision real arithmetic. This architecture supports up to

8 hardware threads per core and is equipped with multiple cache

levels. The driver routine, including the CSR-based

implementation of SpMV, is processed using the IBM XL C++

compiler, with the options: -O3 -qstrict -qsmp=noauto:omp. For

simplicity our analysis is restricted to exploit OpenMP thread

parallelism on a single core, using 1, 2, and 4 threads. This does

not represent a limitation for the matrices considered in this

work, as they fit in the memory of the local node. In order to

instrument the code and collect power measurements, we use the

AMESTER library [8]. To collect enough power samples, each

SpMV is repeated many times: the time cost of the auxiliary

repetition loop is negligible and does not affect measurements.

1 The coarsest possible meshes consist of one quadrilateral or two triangles, in the 2-D case, and one hexahedron or five tetrahedrons, in the 3-D case.

IV. RESULTS AND CONCLUSION

Figures 2 and 3 show plots of the measurement data where the

colors indicate the level of h-refinement and the polynomial

degree of the finite element basis functions, respectively. In

general, most of the data points lie very close to each other. This

observation of similar power/energy characteristics is in good

agreement with the fact that our finite element matrices resulted

from a common construction basis. However, we observed a

correlation of the characteristics with the polynomial degree.

(a) – time vs. energy, 1 thread per core

(b) – time vs. power, 1 threads per core

(c) – time vs. energy, 4 threads per core

(d) – time vs. power, 4 threads per core

Figure 2: Performance of the finite element matrices. Colors correspond to the

level of h-refinement. Circles indicate matrices that fit into the cache, rectangles

indicate bigger matrices that are stored in the DDR. In particular the configuration with 4 threads per core benefits

from higher degrees, as it can be observed from the clear color

gradient in Figures 3(c) and 3(d). This is due to the size of the

local dense matrix blocks which become larger for higher order

polynomials, since a larger number of basis functions shares an

overlapping support.

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 5 -

(a) – time vs. energy, 1 thread per core

(b) – time vs. power, 1 thread per core

(c) – time vs. energy, 4 threads per core

(d) – time vs. power, 4 threads per core

Figure 3: Performance of the finite element matrices. Colors correspond to the

polynomial degree of the finite element basis functions. Circles indicate matrices that fit into the cache, rectangles bigger matrices that are stored in the DDR. The values presented in Figures 2 and 3 are normalized to the

number of non-zero elements in each matrix. This has been done

to compare the actual cost of each floating-point multiply-add

operation (see [5] for more details). In practical problems,

however, what really matters is the total time- and energy-to-

solution. To this end, we extracted two sets of different matrices

with equal number of rows (which is often regarded the problem

size of an application). For these matrices we solved (4) using

the CG method, and we compare the number of iterations to

reach an absolute tolerance of . Table 2: Time / energy to solution and number of CG iterations for matrices with

equal number of rows.

nrows h p time / SpMV [s] energy / SpMV

[J] CG iter.

16,641 5 4 8.038e-4 1.8437e-3 338

16,641 6 2 3.425e-4 8.3990e-4 210

16,641 7 1 1.885e-4 4.19E-004 158

263,169 7 4 1.321e-2 3.3492e-2 1501

263,169 8 2 6.184e-3 1.8733e-2 818

263,169 9 1 3.326e-3 9.0704e-3 589

Table 2 summarizes the time and energy for one SpMV and the

number of CG iterations. The normalized data do not reflect

other important properties of the matrix like the condition

number, which can greatly affect the behaviour of a solution

scheme. The comparison of the matrices with equal problem size

clearly shows an advantage of lower order finite elements in

terms of time and energy per sparse matrix-vector multiplication.

This is expected since the lower order discretizations yield less

non-zero matrix entries. The number of CG iterations to reach

the desired accuracy also increases greatly when using higher

order finite elements. This is due to the worsened condition

number of the corresponding matrices. Both effects add up to a

clear advantage of lower order discretizations, which should

therefore be used whenever possible.

V. CONCLUSIONS

The experiments and results analyzed in this work suggest that

lower order discretization schemes should be preferred, when

possible, as they require less operations per row, and thus have

shorter execution time, which leads to lower energy consumtion.

This conclusion is valid for standard CSR-implementations with

no SIMD instructions: other more elaborate and architecture

optimized implementations can lead to different results and

should be investigated in future works.

ACKNOWLEDGMENTS

This work was supported by the European Comission as part

of the project Exa2Green under grant agreement no. 318793. We

acknowledge the financial support of the Future and Emerging

Technologies (FET) programme within the ICT theme of the

Seventh Framework Programme for Research (FP7/2007–2013).

TRADEMARKS

IBM, the IBM logo, ibm.com, IBM POWER, IBM Blue

Gene, are trademarks or registered trademarks of International

Business Machines Corporation in the United States, other

countries, or both. Other product and service names might be

trademarks of IBM or other companies.

REFERENCES

[16] L.C. Evans: Partial differential equations, AMS, 1998, Reprint 2008 [17] R. Adams, J.J.F. Fournier: Sobolev Spaces, 2nd ed., Academic Press, 2003

[18] A. Quarteroni, A. Valli: Numerical Approximation of Partial Differential

Equations, Springer, 2008 [19] A. Ern, J.-L. Guermond: Theory and Practice of Finite Elements, Springer,

2004

[20] A.C.I. Malossi, Y. Ineichen, C. Bekas, A. Curioni, E.S. Quintana-Orti: Performance and Energy-Aware Characterization of the Sparse Matrix-

Vector Multiplication on Multithreaded Architectures, in Proceedings of the

43rd Annual International Conference on Parallel Processing, Minneapolis, Minnesota, USA, 2014.

[21] Y. Saad: Iterative Methods for Sparse Linear Systems, 2000

[22] A.B. Caldeira, B. Grabowski, V. Haug, M.-E. Kahle, A. Laidlaw, C.D.

Maciel, M. Sanchez, S.Y. Sung: IBM Power System S822 Technical

Overview and Introduction, IBM, Tech. Rep. REDP-5102-00, August 2014

[23] C. Lefurgy, X. Wang, M. Ware: Server-level power control, in Proc. of the 4th IEEE onf on Autonomi omputing (I A ’07), 2007

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 6 -

Dynamics of a C-battery Scale Energy Harvester

V. Nico1, E. Boco

1, R. Frizzell

2, J. Punch

1

1Connect, Stokes Institute, University of Limerick

2Efficient Energy Transfer Dept., Bell Labs, Alcatel Lucent, Dublin, Ireland

Abstract— A “C-battery” scale two Degree-of-Freedom (2-DoF)

nonlinear electromagnetic energy harvester is presented in this

paper. The device employs velocity amplification, achieved through

sequential collisions between two free moving masses, to enhance

output power. The nonlinearities are due to the multiple masses and

the use of magnetic springs between the primary mass and the

housing and between the primary and secondary masses. A

theoretical model and its experimental validation are presented in

this work.

I. INTRODUCTION

In recent years, the development of small and low power

electronics has led to the deployment of Wireless Sensor

Networks (WSNs), which are largely used in military and civil

applications. There is often an abundance of unused ambient

energy in the vicinity of WSNs that can be converted to usable

electrical power through the use of energy harvesting techniques.

Among all the different forms, kinetic energy, available from

ambient vibrations, is one of the most common forms.

Conventional vibration energy harvesters (VEHs) are based on

simple linear spring-mass resonator designs for which the

resonant frequency has to be tuned to the main frequency of the

ambient vibration. The spectra of real vibrations are broad-band,

however, with content predominantly at low frequencies [1].

In the literature, different methods have been suggested to

overcome the problem of narrow bandwidth harvesters. Some

groups have proposed multi-frequency systems [2, 3] or methods

for varying the resonance frequency of piezoelectric beams [4].

Cottone et al. [5] instead showed that the use of nonlinear

bistable systems improves the performance of an energy

harvester. Another approach is to employ multi-DoF

configurations and it has been shown that these systems

outperform single-DoF equivalents [6, 7, 8, 9, 10]. These

approaches focus on increasing the bandwidth of the harvester at

low frequencies, but new methods for improving both the

bandwidth and the power of the harvested energy are required.

Cottone et al. [11] introduced the concept of a velocity amplified

electromagnetic generator (VAEG), which featured a series of

impacting masses. In the following studies [12, 13, 14, 15], it

was shown that within a given volume constraint, the 2-DoF

system performs best and that the nonlinearities, introduced by

the impacts, increased both the power and the bandwidth.

The work presented in this paper introduces the theoretical

and experimental analysis of an improved design of a 2-DoF

VAEG, th t is pp oxim t y th siz of “ -b tt y” In th

following sections, the experimental setup and the results of an

experimental characterisation under sinusoidal excitation are

presented. The nonlinear effects of the magnetic springs and the

consequent shift of the resonance peak are outlined. In the final

section, a theoretical model that can represent the dynamics of

the system is developed.

II. SYSTEM OF INTEREST

The system examined in this paper is shown in Figure 1: its

overall size is comparable to a C-battery: 25.5mm diameter by

57.45mm long, 29.340 cm3 total volume (Figure 2).

Figure 1 - Section of the VAEG prototype

Figure 2 - Illustration of the scale of the harvester with reference to a C-battery

The primary mass (Mass 1) is made from Teflon and acts as

the housing for seven coils, while the secondary mass (Mass 2)

comprises a Halbach stack of 5 NeFeB magnets: 3 axial and 2

radial. The detailed description of the transducer mechanism is in

o o t : “Power Optimization of a C-Battery Scale

Vibrational Energy Harvester”. Opposing magnets are affixed to

the base and cap and inside Mass 1 in order to reduce the energy

losses between Mass 1 and the housing and also between the two

masses.

The magnetic springs introduce a hardening nonlinearity in

the system that enhances the bandwidth of the harvester. The

spacing of the magnets can be varied by changing the height of

the cap and this significantly affects the dynamics of the system.

When the stack of magnets moves, an electrical current is

in u into th oi o ing to F y’s w n th in u

voltage is given by Vemf = Blv, where B is the magnetic field

experienced by the coil of total length l and v is the relative

velocity between the coils and magnets. Since the coils are

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 7 -

within the primary mass and the magnets form the secondary

mass, improving the velocity of the secondary mass increases the

output voltage and the output power (Pe ~ v2). The velocity of the

secondary mass can be increased by exploiting the velocity

amplification technique: following an impact, the final velocity

of Mass 2, v2f, is equal to [12]:

where e is the restitution coefficient and the subscript i and f

refer to the initial state before the impact and the final state after

the impact, respectively.

III. EXPERIMENTAL RESULTS

The harvester was tested under sinusoidal excitation at two

different level of acceleration: arms = 0.2 g and arms = 0.4 g (g =

9.81 m/s2). Since the height of the cap can be varied, for this test

a value of h = 50.10 mm was chosen.

The output voltage from the harvester was measured across a

load resistor (RL) and, since the frequency response of the device

varies for the two different excitations, the optimal load

resistance was set before every test.

The output power generated by the device for the two levels

of excitation is shown in Figure 3 over a range of frequencies.

Each discrete data point was measured under sinusoidal

excitation at the corresponding frequency, starting from rest. The

power was calculated using P = V2

rms/RL where Vrms is the rms

voltage measured across the optimal resistive load RL.

Figure 3 - Output power as function of acceleration level and frequency

The response of the system at the lower acceleration (arms =

0.2 g) is reasonably symmetric about the peak value and narrow.

By increasing the acceleration, the system starts to behave

nonlinearly: the peak shifts (from 13.5 Hz at arms = 0.2 g to 15.25

Hz at arms = 0.4 g) and bends to the right due to the hardening

effects of the magnetic springs. This hardening effect is more

predominant at higher accelerations because masses within the

harvester can reach positions closer to the magnets in the cap and

the base, and so experience stronger magnetic repulsive forces

compared to lower acceleration tests. The higher the repulsive

force, the higher the effective elastic constant of the magnetic

springs, which leads to the observed hardening effect. Due to the

increased nonlinearities of the system, the 3 dB bandwidth of the

harvester increases from 2 Hz (arms = 0.2 g) to 4 Hz (arms= 0.4 g).

IV. THEORETICAL MODEL

In order to predict the power output of the harvester, a model

that takes into account the magnetic repulsive force of the

magnets has been developed. A finite element simulation has

been carried to derive the magnetic force between the magnets of

the internal and external magnetic springs. Figure 4 shows the

trend of the simulated force as function of distance and the trend

of the fit, obtained according to Eqn.2:

where

x is the distance between magnets; and

p1, qi are the parameters of the fit.

Figure 4 - Simulated repulsive magnetic force as a function of distance

The dynamics of the system can be represented with three

coupled differential equations: two of which describe the

mechanical motion of the masses, and the third describes the

voltage generated in the coil.

The seven coils have been considered as a single coil with

resistance Rc and inductance Lc equal to the total resistance and

inductance of the seven coils. The Halbach stack of magnets has

been considered as a single magnet with magnetic field B equal

to the average field experienced by the coils.

The three equations that describe the system are:

where:

d1 and d2 are the measured viscous damping

coefficient acting on m1 and m2,

α = Bl/R is the conversion factor,

αVL is the magnetic force due to the transduction

mechanism,

Fint, Fint1, Fmag, Fmag1 are the magnetic forces as

described in Figure 5, and

z1, z2 are the displacement shown in Figure 5.

(1)

(2)

(3)

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 8 -

Figure 5 - Schematic model for the 2-DoF harvester: Mass 1 is in white, while

Mass 2 is in gray

The three differential equations have been solved numerically

using the Runge-Kutta algorithm. The trend of the output power

for different excitation frequencies and amplitude is presented in

Figure 6.

Figure 6 - Simulated output power as function of frequency

As in the experimental data presented in Figure 3, the

response of the system changes with increasing acceleration: the

peak at the lower excitation is quite symmetric, while it shifts

and bends to the right for higher amplitudes, due to the nonlinear

hardening effects of the magnetic springs. The simulations shows

also the peak of the secondary mass at f = 15.5 Hz. The power is

higher (~8x) for the theoretical prediction compared to the

measurements, and this is thought to be due to the fact that the

seven coils and the Halbach stack are considered as just one coil

and one magnet with magnetic field equal to the average field

felt by the coils.

V. CONCLUSIONS

The theoretical and experimental characterisation of a

nonlinear 2-DoF v o ity mp ifi VEH t “ -b tt y” s is

presented in this paper.

The device was tested experimentally under sinusoidal

excitation and demonstrated a maximum power output of 1.03

mW at 15.25 Hz for arms = 0.4 g. The theoretical simulation of

the magnetic force and the dynamic model of the system,

described in Section IV, present good agreement with the

frequency response of the system. The model can be used as a

design tool for choosing the correct configuration of magnetic

springs and geometrical parameters in order to maximise the

output power.

Since the harvester performs well at low frequencies, the next

step of this work will be the realisation of a small wearable

device that can be used to harvest human motion, for powering

low-consumption devices.

ACKNOWLEDGMENT

The authors acknowledge the financial support of Science

Foundation Ireland under Grant No.10/CE/I1853 and the Irish

Research Council (IRC) for funding under their Enterprise

Partnership Scheme (EPS). Bell Labs Ireland would also like to

thank the Industrial Development Agency (IDA) Ireland for their

continued support.

REFERENCES

[24] Roundy S et al, “A study of low level vibrations as a power source or

wireless sensor no s” Computer Communications, 26(11),pp. 1131– 144, July,

2003.

[25] Ferrari M et al, “Pi zo t i multifrequency energy converter for power

harvesting in autonomous mi osyst m” Sensor and Actuators A, 142(1), pp.

329-335, March, 2008 [26] Castagnetti D, “Exp im nt modal analysis of fractal-inspired multi-

frequency structures for piezoelectric energy onv t s” Smart Materials and

Structures, 21(9), p. 094009, August, 2012.

[27] Todorov G et al, “Tuning techniques for kinetic mems energy harvest s” Proceedings of the Telecommunications Energy Conference INTELC),

Amsterdam, Netherland, 9-13 October 2011, pp. 1–6.

[28] Cottone et al, “Non in energy h v sting” Physical Review Letter,

02(8), February, 2009.

[29] Petropoulos T et al., “ ms coupled resonators for power generation and

s nsing” Proceedings of Micromechanics Europe, Leuven, Belgium, 5-7 September 2004.

[30] Vullers RJM et al, “L g power amplification of mems harvester by secondary spring and mass ss mb y” Proceedings of the PowerMEMS 2012,

Atlanta, USA, 2-5 December 2012.

[31] Wu H et al, “D v opm nt of a broadband nonlinear two-degree-of-

freedom piezoelectric energy h v st ” Journal of Intelligent Material Systems and Structures, 25(14), pp. 1875–1889, September, 2014

[32] Galchev TV et al, “H v sting traffic-induced vibrations for structural health monitoring of b i g s” Journal of Micromechanics and Microengineering,

21(10), October, 2011.

[33] Ashraf K et al, “Imp ov energy harvesting from low frequency vibrations

by resonance amplification at multiple f qu n i s” Sensors and Actuators A:

Physical, 195, pp. 123–132, June, 2013

[34] Cottone F et al, “En gy harvesting apparatus having improved ffi i n y”

USA Patent 0074162, 2009.

[35] Cottone F et al, “Enh n vibrational energy harvester based on velocity

mp ifi tion” Journal of Intelligent Material Systems and structures, 25(4), pp.

443–451, August, 2013.

[36] O’Donoghu D et al, “A multiple-degree-of-freedom velocity-amplified vibrational energy harvester: Part A: experimental n ysis”. Proceedings of the

ASME 2014 Conference on Smart Materials, Adaptive Structures and Intelligent

Systems, 2, Newport, Rhode Island, USA, 8-10 September 2014.

[37] Nico V et al, “A multiple-degree-of-freedom velocity-amplified vibrational

energy harvester: Part B mo ing” Proceedings of the ASME 2014 Conference on Smart Materials, Adaptive Structures and Intelligent Systems, 2, Newport,

Rhode Island, USA, 8-10 September 2014.

[38] Boco E et al., “Optimiz tion of coil parameters for a nonlinear two degree-

of-freedom (2dof) velocity-amplified electromagnetic vibrational energy h v st ” Proceedings of the SMARTGREENS 2015 4th International

Conference on Smart Cities and Green ICT Systems, Lisbon, Portugal, 20-22

May 2015.

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 1 9 -

Power Optimization of a C-battery Scale Electromagnetic

Vibrational Energy Harvester

E.Boco1, V.Nico

1, R.Frizzell

2, J.Punch

1

1Connect, Stokes Institute, University of Limerick, Limerick, Ireland

2Efficient Energy Transfer Dept, Bell-Labs Alcatel-Lucent

Abstract— In this paper, a two Degree-of-Freedom (2DoF)

velocity amplified electromagnetic energy harvester is investigated.

Dimensions are very close to those of a “C-battery” (26.2mm

diameter and 50mm length, giving a volume of 27.8cm3), making the

harvester suitable to be integrated in electronic devices. The

harvester consists of two masses; one of which is a Halbach array of

magnets that oscillate inside the second mass, which is a set of seven

coils. Impacts between the masses cause velocity amplification of the

secondary (smaller) mass, which improves the electromagnetic

conversion. A finite element simulation is used to justify the choice

of the Halbach configuration for the magnets. Three configurations

of coils are tested, with different numbers of turns, to verify the

optimization. Tests under harmonic excitation result in a high

volumetric Figure of Merit (FoMV).

I. INTRODUCTION

IN the past few years, the idea of “smart cities” h s become

more prevalent in the European vision of future sustainable

development of society “ m t ity” means an urban space

where services are provided to the citizen following real time

data analysis on various factors, such as air pollution, traffic

conditions, temperature, light conditions and so on [1-3]. What

has to be taken into account in such a vision is that in, order to

have a constant and widespread monitoring in such a large

environment, a great number of sensors, transmitters and

receivers must be employed. This requires a large number of

independently-powered electronic devices to be embedded in

buildings, roads, and infrastructures.

Batteries are often regarded as a solution to power sensor

nodes, but they present significant issues: batteries are polluting

and their disposal has to be carefully planned. Moreover,

substituting them at the end of their lifetime can be difficult due

to the number of devices involved in a sensor network, or

potentially impossible if they are embedded in a structure [4].

For this reason, in the last few years, energy harvesting has been

proposed as a better solution to power low consumption devices.

In this paper, vibrational energy is investigated because it is

always present in environment and can be easily converted.

Electromagnetic conversion is considered, due to its high energy

density at small scales. Moreover, its dependency on the relative

velocity between magnet and coil, allows velocity amplification

technique to be exploited in the device [5-6], which can also

provide nonlinearity to broaden the harvesting band [7-8]. This

paper first examines the benefits of a Halbach stack of magnets,

before presenting an approximate method to optimize the

transducer design. In the final section a comparison with

experimental data is reported that demonstrates the benefits of

the optimization strategy for 2DoF systems through a high

volumetric Figure of Merit (FoMV). The suitability of the

optimization technique is also discussed in relation to nonlinear

devices.

II. HARVESTER DESIGN

The harvester (Fig. 1) consists of a primary (larger) mass,

made up of seven coils, oscillating outside a smaller mass, which

consists of the magnet stack. Both the primary and the secondary

masses oscillate between a set of magnetic springs, which

mediate the impacts between the two masses. These impacts

cause velocity amplification to occur, which enhances the power

harvested and introduces nonlinearities to the system. The cap

height, H, can range between 50.10 mm and 57.45 mm.

Figure 7 Harvester section

III. OPTIMIZATION MODEL

As a first approximation, a vibrational energy harvester can be

regarded as a spring-mass-damper system [9], in which the

damping is both due to the mechanical losses and to the energy

conversion (Fig. 2).

Following the linear approximation, the power output of a

damped harmonic oscillator is known to be:

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 0 -

Figure 8. Linear approximation for a vibrational energy harvester. The

damping is due both to mechanical losses and the conversion method.

Where P is the power dissipated (W), m is the oscillator mass, ω

and ωN are respectively the excitation frequency and the natural

frequency, and

Where:

If the oscillator is driven at resonance, the expression of the

power output simplifies to:

It is possible to find the amount of power dissipated by the

electrical damping:

Finding the derivative and equating to zero, allows the

maximization condition to be found:

Following this guideline, a model was formulated to calculate

the optimal number of turns and the best wire diameter that fit

within the geometrical design constraints.

In order to choose the highest magnetic flux density magnet

configuration, a COMSOL Multiphysics simulation was

performed, comparing a classical stack of magnets with a

Halbach configuration stack [10-11], as shown in Fig. 3.

In a classical stack, the magnetic flux lines spread out in the

volume; a Halbach configuration features diametrically and

axially polarized magnets which closes the flux line along the

stack. Moreover, the neighboring diametrical-axial-diametrical

magnet periods close the flux in the opposite direction, so that

the intensity difference between two periods is double the

intensity difference of a single periods.

Figure 3 COMSOL modeling: (a) classical stack (b) Halbach stack. Black

arrows denote the magnetic field direction, magnetic flux line are shown, and the colour scale refers to magnetic flux density norm.

Once the Halbach configuration was chosen as the most

suitable for the trasducer, a 2DoF harvester was constructed,

where the smaller mass was a Halback stack and the larger mass

was a series of coils (see Fig. 1). For the optimization method it

was necessary to measure the mean magnetic field strength along

the length of the stack and the mechanical damping of the system

using drop tests.

These parameters were found respectively to be

B = 0.017T

These parameters were found respectively to be

By varying the coil length and the wire diameter, it was

possible to determine the number of turns which maximized the

power output and resulted in a coil that fits the geometric

constraints of the harvester design.

A set of 7 coils with 2800 turns each, wound with a 50 µm

diameter wire was the optimal design.

.

IV. EXPERIMENTAL VERIFICATION

Since the system presented in Section 1 is highly nonlinear, due

to the presence of the magnetic springs and to the impacts, an

experimental verification of the power optimization was

required.

Three sets of coils were built in order to compare their power

outputs, that had respectively 1500, 2800 (to fit the calculation

results) and 4500 turns for each coil.

Experimental tests under harmonic excitation at a = 0.4g

(Figure 4) were conducted and showed a good agreement with

the theoretical prediction. The tests confirmed the power output

from the N=2800 coils being the highest. Interesting effects given by the presence of nonlinearity can

be seen for stronger magnetic springs, changing the height of the

cap (Fig.5): this demonstrates that the optimization condition is

not stable for this system.

A deeper study on how the change in the cap height affects the

dynamics adding non-linearity can be found in [26].

(a) (b)

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 1 -

(a) H = 57.45mm

(b) H = 52.40mm

(c) H = 50.10mm

Figure 5 Output power as a function of driving frequency and number of turns

coil. (a) Test with high position of the cap. (b) Test with the cap at 52.40mm

from the base. (c) Test with the lowest position of the cap. The optimization condition is changing..

V. FIGURE OF MERIT

The Volumetric Figure of Merit (FoMV), introduced by

Mitcheson et al.[12] is one of the most commonly used methods

to evaluate the performance of a vibrational energy harvester. It

represents the ratio between the actual power output from the

device and the energy present in a cubic harmonic oscillator

made of gold, with the same volume and mass as the device of

interest, and oscillating at the same frequency:

Figure 6, compares the highest figure of merit obtained from the

device analysed in this paper in this paper with recently

published results [13].

Figure 6 FoMV performance comparison with recent literature, (a) [13], (b)

[14], (c) [15], (d) [16], (e) [17], (f) [9], (g) [18], (h) [19], (i) [20], (j) [21], (k)

[23], (l) [24], (m) [25], (n) [22]. In red, our harvester FoMV

VI. CONCLUSIONS

In this paper, the power optimization of a vibrational energy

harvester employing velocity amplification is presented. The

harmonic oscillator approximation was used as a guideline for

the coil design and its results were compared to experimental

data.

Although some discrepancies due to the system nonlinearuty

were demonstrated, the theoretical power optimization is mostly

in agreement with the experimental data. Finally, the comparison

between the presented harvester performance and devices from

the recent literature demonstrated the effectiveness of velocity

amplification in improving the electromagnetic conversion.

Future work will investigate the parameters affecting nonlinear

electromagnetic optimization, and will study the optimal

mechanical configuration in order to maximize the performance

of 2DoF harvesters.

ACKNOWLEDGMENTS

The authors acknowledge the financial support of Science

Foundation Ireland under Grant No.10/CE/I1853 and the Irish

Research Council (IRC) for funding under their Enterprise

Partnership Scheme (EPS). This work was financially supported

by the Industrial Development Agency (IDA) Ireland.

Figure 4 Voltage and power output exciting the harvester at a = 0.4g. The

voltage output increases with the number of turns, and the power output is

maximum for N = 2800

(a)

(b)

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 2 -

REFERENCES

[39] K nouskos, Ho n , “ imu tion of m t G i ity with oftw Ag nts”, in “Computer Modeling and Simulation, 2009. EMS '09. Third

UKSim European Symposium on”, 25-27 Nov 2009, pages 424-429,

Athens. [40] Valasquez- , “ king m t iti s m t - Using Artificial

Int ig n T niqu s fo m t obi ity”, K ynot p h in

SmartGreens 2015, Lisbon. [41] Kamme st tt ,L ng , kopik,Kupzog,K stn , “P ti Risk Ass ssm nt

Using umu tiv m t G i o ”, P o ing of 4th International

Conference on Smart Grids and Green IT Systems, pages 31-42, June 2015, Lisbon .

[42] Kollias, Nikolaidis, In-Wall Thermoelectric Harvesting for Wireless Sensor

Networks”, P o ings of 4th International Conference on Smart Grids and Green IT Systems (SMARTGREENS 2014), pages 213-221, June

2015, Lisbon .

[43] Cottone, F izz , Goy , K y, Pun h, “Enh n vib tion n gy h v st b s on v o ity mp ifi tion” in Jou n of Int ig nt t i

Systems and Structure, Vol. 25, Sage Journal, 2013.

[44] V Ni o, D O’Donoghu , R F izz , G K y, n J Pun h, “A mu tip degree-of-freedom velocity amplified vibrationalenergy harvester: Part B

modelling”, in P o ings of A E, 2(46155), 2014 [45] D O’Donoghu , V Ni o, R F izz , G K y, n J Pun h, “A mu tip

degree-of-freedom velocity-amplified vibrational energy harvester: Part a

xp im nt n ysis”, P o ings of A E, 2(46155), pt 2014

[46] o o, Ni o, O’Donoghu , F izz , K y, Pun h, “Optimization of Coil Parameters for a Nonlinear Two Degree-of-Freedom (2DOF) Velocity-

amplified Electromagnetic Vibrational Energy Harvester”, Proceeding of

4th International Conference on Smart Grids and Green IT Systems, pages 31-42, June 2015, Lisbon .

[47] S.P.Beeby, R.N.Torah, M.J.Tudor, P.Glynne-Jon s, T O’Donn ,

T R sh , n Roy “A mi o t om gn ti g n to fo vib tion n gy h v sting”, Jou n of i om hanics and microengineering,

17(7), July 2007.

[48] X T ng, T Lin, n L Zuo, 2013 “D sign n optimiz tion of tubu in t om gn ti vib tion n gy h v st ” T ns tions on

Mechatronics, IEEE/ASME,19(2), Mar, pp. 615–622.

[49] Zhu, Beeby, Tudor, and H is “Vib tion n gy harvesting using the h b h y” In m t t i s and Structures, IOP Publishing, 2012.

[50] P.D.Mitcheson, E.M.Yeatman, G.K.Rao, A.S.Holmes, and T.C.Green.

“En gy h v sting f om hum n n m hin motion for wireless electronic vi s”, in Proceedings of the IEEE, 96(9).

[51] Ash f, Khi , D nnis, n h u in, 2013 “Imp ov n gy h v sting from low frequency vibrations by resonance amplification at multiple

f qu n i s” In nso s and Actuators A: Physical, ELSEVIER.

[52] Ashraf, Khi , D nnis, n h u in, “A wi b n f qu n y up-converting bounded vibration energy harvester for a low frequency

nvi onm nt”, in m t t i s and Structures, IOP Publishing, 2013.

[53] G h v, u gh, P t son, n N j fi, “H v sting t ffi -induced vib tions fo st u tu h th monito ing of b i g s”, in Jou n of

Micromechanics and Microengineering, IOP Publishing, 2011.

[54] G h v, Kim, n N j fi, “A p m t i f qu n y in s pow generator for scavenging low frequency ambient vibrations”, in P o i

Chemistry, Volume 1, Issue1, September 2009, Pages 14391442,

ELSEVIER, 2009. [55] R n u , Fio ini, h ijk, n Hoof , “H v sting n gy f om th motion of

human limbs: the design and analysis of an impact-based piezoelectric

g n to ” in m t Materials and Structures, IOP Publishing, 2009. [56] Ay , Zhu, Tu o , n by, 2009 “Autonomous tun b energy

h v st ”, in Pow E , 2009

[57] y, isungsitthisunti, Xu, Rho s, Jung, n P ou is, 2009 “ omp t low frequency meandered piezoelectric n gy h v st ”, in Pow E ,

2009.

[58] Ferro Solutions, VEH-460. [59] ini, n p oni, “An ffi i nt t om gn ti pow h v sting

device for low-f qu n y pp i tions”, in nso s n tu to s A:

Physical, ELSEVIER, 2011. [60] Ching, Wong, Li, Leong, an W n, “A s mi om hin mu ti-modal

son ting pow t ns u fo wi ss s nsing syst ms” in nso s n

actuators A: Physical, ELSEVIER, 2002. [61] Zhu, by, Tu o , n H is, “Vib tion n gy h v sting using th

h b h y”, in m t t ials and Structures, IOP Publishing 2012.

[62] Ku k ni, Koukh nko, To h, Tu o , by, O’Donn , n Roy, 2008

“D sign, f b i tion n t st of int g t mi o-scale vibration-based

t om gn ti g n to ” in nso s n A tu to s A: Physi , ELSEVIER.

[63] Y ng, n L , 2010 “Non-resonant electromagnetic wideband energy

h v sting m h nism fo ow f qu n y vib tions” In i osyst T hno

(2010) 16:961966, Springer-Verlag 2010. [64] Nico, Boco, Punch, Frizzell, Dynamics of a c-battery scale energy

harvester, 10th issue ICT-Energy Letters,2015.

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 3 -

Virtual Machine Power Modelling in Multi-tenant

Ecosystems: Challenges and Pitfalls

Leila Sharifi1, Felix Freitag

1, Luís Veiga

2

1 Computer Architecture Dept., Universitat Politecnica de Catalunya, Barcelona, Spain

2 Técnico Lisboa/INESC-ID Lisboa, Lisboa, Portugal

Abstract— Since energy has emerging as the first class

computing resource, we need to characterize this resource in

different granularity. On the other hand, the computing paradigm

is shifting to the multi-tenant ecosystems. Therefore, characterizing

the power consumption on Virtual Machines(VMs), running in data

center hosts is necessary to attain energy efficient cloud ecosystems.

In this paper, we study the challenges should be addressed in VM

power modeling in cloud service provisioning.

I. INTRODUCTION

Energy and associated environmental costs (cooling,

carbon footprint, etc.) of IT services constitute a remarkable

portion of service dynamic cost. Indeed, estimating energy

consumption at each level of service provisioning stack, i.e. from

hardware to , operating system/Virtual Machine(VM) and

application is the cornerstone of research toward energy

efficiency all through the stack.

Although there is a growing body of work centered on the

energy aware resource management, allocation and scheduling

[7,10,11] they mainly considered the whole system energy

measurement, estimation, improvement and optimization. There

is only limited work focusing on the energy issues per individual

job [1,5,8,18]. However, they only aim at reducing total energy

consumption in the infrastructure without taking into account the

energy-related behavior of each individual , its performance and

price, i.e., how expensive and efficient is the energy employed

for the observed job performance or progress.

Nonetheless, energy-based job pricing confronts some more

challenges further to the system wide energy efficiency issues. In

the system wide energy efficiency, the energy consumption of the

resources are measurable simply by plugging the energy meter

devices or exploiting the embedded sensors of the contemporary

devices. Nonetheless, it is nontrivial to measure the energy

consumed per VM, since we cannot embed a physical sensor in a

VM or plug a metering device to it. Therefore, estimation is still

the only option in this case. Estimation results in a more

complicated model since it has to deal with uncertainty and error.

In this work, we study the challenges of VM power modeling

in a multi-tenant ecosystem. In the next section, we outline the

background terms and hypothesis that we use in this work.

Section 3 surveys the sate of the art in VM power modeling and

introduces the challenges in this area. The work is concluded in

Section 4.

II. BACKGROUND

In this section we introduce the background and hypothesis

required to drive the discussion in the rest of this paper.

Energy Proportionality

The vision of energy proportional system implies the power

model of an ideal system in which no power is used by idle

systems (Ps=0), and dynamic power dissipation is linearly

proportional to the system load.

LDR indicates the maximum difference of the actual power

consumption, P(U), and linear power model over the linear

power model as in (1).

(1)

IPR is the indicator of idle to peak power consumption as

illustrated in (2)

(2)

To measure how far a system power model is from the ideal

(energy proportional) one, Proportionality Gap(PG) (19) is

defined as the normalized difference of the real power value and

the ideal power value, which is indicated as PMax × U, under a

certain utilization level as shown in (3). Therefore, having

proportionality gap values for a given device, we can reconstruct

the power model of the device.

(3)

Given the state of the art hardware, designing hardware

which is fully energy proportional remains an open challenge,

power model of a non-energy proportional system is illustrated

in Figure 1. However, even in the absence of redesigned

hardware, we can approximate the behavior of energy

proportional systems by leveraging combined power saving

mechanisms [16] and engaging heterogeneous commodity

devices combined with powerful server machines in lieu of

homogeneous server hardware platform [19].

( ) ( )s d

s d

P U P PLDR max

P P

idle

Max

PIPR

P

( ) ( )( ) Max

Max

P U P UPG U

P

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 4 -

Figure 9 - Energy Proportionality.

II.1. Server power modeling

State of the art platforms are not capable of fine-grained

power measurement. Therefore, to manage dynamic power

proportionality, a power model is required. Currently, Running

Average Power Limit (RAPL) counters, available in the recent

Intel CPUs, is the closest to the hardware based monitoring.

RAPL allows monitoring of whole CPU package, cores, and

DRAM. Since these counters are not available in all CPUs, to

cope with the heterogeneous infrastructure, a group of works rely

on performance counters for synthesising a power model.

Mapping the counter and power values is usually done through

linear regression. Nonetheless, linear power model is not

sufficient in many cases due to non-proportional power

dissipation characteristics of CPUs.

Moreover, linear models rely on the non-correlated covered

features, which is not a valid assumption in the state of the art

systems. A quadratic solution fits better the power modeling of

multi-core systems [2].

However, Hyperthreading and turbo-boost may still impede

the model from accurate estimation, due to hidden states they

make. A hyperthread aware power modeling mechanism is

introduce in [20]. The introduced model differentiates between

the cases where either single or both hardware threads of a core

are in use.

The most recent work in this line is BitWatts [5], which

introduces a counter based power model for each individual

frequency.

Nonetheless, there is a trade off between accuracy and the

overhead. Targeting the community of commodity devices

forming a collaborative system, in edge layer, integrated with

data center to form P2P assisted clouds, we should particularly

tune the trade off level due to the lower energy consumption in

such devices, which acquires adaptive models.

III. ESTIMATING VM ENERGY

In multi-tenant platforms, the efficiency of VM

consolidation, power dependent cost modeling, and power

provisioning are highly dependent on accurate power models.

Such models are particularly needed because it is not possible to

attach a power meter to a virtual machine. In this section we

review the state of the art VM power models and demonstrate

their shortcomings.

III.1. Related Work

In general, VMs can be monitored as black-box systems for

coarse-grained scheduling decisions. However, for fine-grained

scheduling decisions—e.g., with heterogeneous hardware—

finer-grained estimation at sub-system level is required and

might even need to step inside the VM.

So far, fine-grained power estimation of VMs required

profiling each application separately. To exemplify, WattApp

[12], which relies on application throughput instead of

performance counters as a basis for the power model. PMapper

[17] maps resources using a centralized step-wise decision

algorithm in lieu of application power estimation.

To generalize power estimation, JouleMeter [9] assumes that

each VM only hosts a single application and thus treat VMs as

black boxes. In a multi-VM system, they try to compute the

resource usage of each VM in isolation and feed the resulting

values in a power model. Bertran et al. [1] propose an approach

employes a sampling phase to gather data related to

performance-monitoring counters (PMCs) and compute energy

models from these samples. With the gathered energy models, it

is possible to predict the power consumption of a process, and

therefore apply it to estimate the power consumption of the entire

VM. Another example is VMeter [3], which estimates the

consumption of all active VMs on a system. A linear model is

us to omput th V s’ pow onsumption with the help of

available statistics (processor utilization and I/O accesses) from

each physical node. The total power consumption is

subs qu nt y omput by summing th V s’ onsumption with

the power consumed by the infrastructure.

Janacek et al. [6] exploit a linear power model to compute the

server consumption with postmortem analysis. The computed

power consumption is then mapped to VMs depending on their

load. This technique is not effective when runtime information is

required. In general, VMs can be monitored as black-box

systems for coarse-grained scheduling decisions. However, for

fine-grained scheduling decisions—e.g., with heterogeneous

hardware— finer-grained estimation at sub-system level is

required and might even need to step inside the VM.

So far, fine-grained power estimation of VMs required

profiling each application separately.To exemplify, WattApp

[12], which relies on application throughput instead of

performance counters as a basis for the power model. PMapper

[17] maps resources using a centralized step-wise decision

algorithm in lieu of application power estimation.

IV. CHALLENGES

However, collocation of applications has its own challenges.

Workload intensity is often highly dynamic. The power profile of

the data center hardware is inherently heterogeneous; this makes

the optimal VM power modeling problem more complicated.

The nonlinearity and in some cases unpredictability of the energy

efficiency profile aggravates the complexity of energy efficient

collocation management, due to inaccurate VM power

characterization.

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 5 -

IV.1. Static Power and Overhead Modeling

Energy consumption of the host per job embraces the static

power consumption, independent of the resource utilization, and

the dynamic power, which is degraded not only proportional to

the VM's allocated resources but also on account of the overhead

caused in the hypervisor, and the interference due to collocation.

Estimating this overhead is complicated since the pattern of the

hypervisor overhead is tightly coupled with the number of VMs,

the type of resources each VM asks for, and the number of times

the switching occurs between VMs and hypervisor. Thus, for a

more accurate estimation, further to individual VM's energy, VM

interference energy overhead should also be estimated. Some

estimation methods have been proposed in the state of the art:

e.g. [3,4,14]. In [15] the authors argue that, in virtualized

environments, energy monitoring has to be integrated within the

VM as well as the hypervisor.

Work in [13] introduces an interference coefficient, defined

to model the energy interference. The major contribution of this

work is to estimate the energy interference according to the

previous knowledge of standalone application running on the

same machine. They model interference as a separate implicit

task. Moreover, an energy efficient collocation management

policy is introduced in this work that is modeled as an

optimization problem solvable by data mining techniques. All

the VMs running on the same machine are known as a collection.

The energy consumption of a collection is the sum of idle energy

consumed for the longest VM run, dynamic energy consumed by

each VM if they were run in isolated environment, and the

energy depleted due to interference between each VM pair. The

interference energy can be positive or negative depending on the

intersection of resources between each VM pair. Interference

energy is estimated as the coefficient of the summation of idle

and isolated run for each VM. On the other hand, performance is

measured as the delay, which is measured by modeling the

system as a M/M/1 queue and calculating the imaginary

interference tasks response time as the delay due to interference.

IV.2. Non-energy Proportional Host Effect

Besides the hypervisor and interference overhead in multi-

tenant systems, the non-energy proportional hardware adds more

complexity to the VM power modeling agenda. In non-energy

proportional hardware platform, since the hardware power model

is non-linear, two identical VMs, sharing the same hardware,

may end up with different dynamic power usage estimation

during the runtime, which may lead to unfair energy based

service charging, and planning.

Figure 2, visualizes such a case. In this scenario, there are

two identical VMs, i.e. VM1 and VM2, collocated on a host with

the power model demonstrated in the Figure.

If we only run VM1, the dynamic power estimated for this

VM will be P1, whereas running the second identical VM on the

same machine predicted as P2 < P1. Therefore, in case of

collocation, there should be a strategy to divide the dynamic

power fairly among the running VMs.

Figure 10 - VM power modeling issues in non-energy proportional

systems.

To address the fairness issue introduced in the previous

section we propose the weighted division VM power model. In

this model as illustrated in [4], a particular VM's power

consumption, PVM(i) is calculated according to the relative

utilization, i.e. u(i)/U, contributed by that particular VM. In this

equation, u(i) represents the utilization incurred by VM i, and U

denotes the overall machine utilization.

(4)

V. CONCLUSION

In this paper, we explained the state of the art Virtual

Machine(VM) power modeling techniques and their

shortcomings. We demonstrated that current VM power models,

fail to capture the effect of non-energy proportional hosts in a

multi-tenat cloud ecosystems. We argued that a fair VM power

model needs not only to be able to characterize per VM resource

usage and translate it to power, but also it requires to be aware of

the overall host utilization for fair power division.

Moreover, interference and other vitalization overhead

require an accurate model which can be mappable to the power

consumption. Therefore, future line of work in VM power

modeling should address overhead modeling in multi-tenant

ecosystems baring in mind the properties of non-energy

proportional hosts.

REFERENCES

[65] Bertran, R.; Gonzàlez, M.; Martorell, X.; Navarro, N. & Ayguadé, E. “Counter-based power modeling methods: Top-down vs. bottom-up” The

Computer Journal, Br Computer Soc, 2013, 56, 198-213.

[66] Bircher, W. L. & John, L. K. “Complete system power estimation: A trickle-down approach based on performance events Performance Analysis

of Systems & Software”, 2007. ISPASS 2007. IEEE International

Symposium on, 2007, 158-168.

[67] Bohra, A. E. & Chaudhary, V. VMeter: “Power modelling for virtualized

clouds Parallel & Distributed Processing”, Workshops and Phd Forum

(IPDPSW), 2010 IEEE International Symposium on, 2010, 1-8.

[68] Chen, Q.; Grosso, P.; van der Veldt, K.; de Laat, C.; Hofman, R. & Bal, H.

“Profiling energy consumption of VMs for green cloud computing”,

Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth International Conference on, 2011, 768-775.

[69] Colmant, M.; Kurpicz, M.; Felber, P.; Huertas, L.; Rouvoy, R. & Sobe, A. “Process-level power estimation in VM-based systems”, European

Conference on Computer Systems (EuroSys), 2015, 14.

( )( ) i

VM

u P UP i

U

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 6 -

[70] Janacek, S.; Schroder, K.; Schomaker, G.; Nebel, W.; Ruschen, M. &

Pistoor, G. “Modeling and approaching a cost transparent, specific data

center power consumption”, Energy Aware Computing, 2012 International Conference on, 2012, 1-6.

[71] Jin, X.; Zhang, F.; Wang, L.; Hu, S.; Zhou, B. & Liu, Z. “A Joint

Optimization of Operational Cost and Performance Interference in Cloud Data Centers”, arXiv preprint arXiv:1404.2842, 2014.

[72] Kaewpuang, R.; Chaisiri, S.; Niyato, D.; Lee, B. & Wang, P. “Cooperative

Virtual Machine Management in Smart Grid Environment”, IEEE, 2013.

[73] Kansal, A.; Zhao, F.; Liu, J.; Kothari, N. & Bhattacharya, A. A. ,”Virtual

machine power metering and provisioning”, Proceedings of the 1st ACM

symposium on Cloud computing, 2010, 39-50.

[74] Khosravi, A.; Garg, S. K. & Buyya, R. “Energy and carbon-efficient

placement of virtual machines in distributed cloud data centers”, Euro-Par

2013 Parallel Processing, Springer, 2013, 317-328.

[75] Kim, N.; Cho, J. & Seo, E. “Energy-credit scheduler: An energy-aware

virtual machine scheduler for cloud systems”, Future Generation Computer

Systems , 2012.

[76] Koller, R.; Verma, A. & Neogi, A. “WattApp: an application aware power

meter for shared data centers”, Proceedings of the 7th international

conference on Autonomic computing, 2010, 31-40.

[77] Pore, M.; Abbasi, Z.; Gupta, S. K. & Varsamopoulos, G. “Energy aware

Colocation of Workload in Data centers”, High Performance Computing

(HiPC), 2012 19th International Conference on, 2012, 1-6.

[78] Smith, J. W.; Khajeh-Hosseini, A.; Ward, J. S. & Sommerville, I.

“Clou onito : P ofi ing Pow Us g ”, Cloud Computing (CLOUD),

2012 IEEE 5th International Conference on, 2012, 947-948.

[79] Stoess, J.; Lang, C. & Bellosa, F. “Energy Management for Hypervisor-

Based Virtual Machines”. USENIX annual technical conference, 2007, 1-

14.

[80] Tolia, N.; Wang, Z.; Marwah, M.; Bash, C.; Ranganathan, P. & Zhu, X.

“Delivering Energy Proportionality with Non Energy-Proportional

Systems-Optimizing the Ensemble”. HotPower, 2008, 8, 2-2.

[81] A. Verma P. Ahuja, A. N. “pMapper: power and migration cost aware

application placement in virtualized systems”, 9th ACM/IFIP/USENIX

International Conference on Middleware , 2008, 243-264.

[82] Wang, C.; Urgaonkar, B.; Kesidis, G.; Shanbhag, U. V. & Wang, Q. “A

case for virtualizing the electric utility in cloud data centers”, Proceedings

of the 6th USENIX conference on Hot Topics in Cloud Computing, 2014, 22-22.

[83] Wong, D. & Annavaram, M. “Knightshift: Scaling the energy

proportionality wall through server-level heterogeneity”, Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on,

2012, 119-13.

[84] Zhai, Y.; Zhang, X.; Eranian, S.; Tang, L. & Mars, J. “Happy: Hyperthread-aware power profiling dynamically”, 2014 USENIX Annual Technical

Conference (USENIX ATC 14)(Philadelphia, PA, 2014, 211-217.

N . 1 0 - 1 5 t h J u l y 2 0 1 5 I C T - E N E R G Y L E T T E R S

- P a g e 2 7 -

SAVE THE DATE

D A T E E V E N T W E B S I T E

Aug 31 – Sep 1, 2015 - Dresden EnA-HPC 2015 http://www.ena-hpc.org

September 7-9, 2015 - Copenhagen EnviroInfo & ICT4S 2015 http://enviroinfo2015.org/

September 14-16, 2015 - Bristol ICT-Energy Workshop www.ict-energy.eu

October 20-22, 2015 - Lisbon ICT 2015

Innovate, Connect, Transform

https://ec.europa.eu/digital-agenda/en/ict2015-

innovate-connect-transform-lisbon-20-22-

october-2015

Nov 29 – Dec 4, 2015 - Boston 2015 MRS Fall Meeting & Exhibit http://www.mrs.org/fall2015/

December 11-13, 2015 - Sydney IEEE GreenCom 2015 http://www.swinflow.org/confs/greencom2015/

December 14-16, 2015 – Las Vegas IGSC’15 http://igsc.eecs.wsu.edu/

January 18-20, 2016 - Prague HiPEAC 2016 Conference https://www.hipeac.net/2016/prague/

January 20, 2016 - Prague

ICT-Energy Workshop

Minimizing Energy Consumption of

Computing to the Limit

https://www.hipeac.net/events/activities/7326/ict-

energy/

July, 2016 – Copenhagen ICT-Energy International Conference www.ict-energy.eu

ICT-ENERGY LETTERS is realized with the contribution of LANDAUER

project and ICT-Energy Coordination Action, funded under the Future and

Emerging Technologies (FET) programme within the ICT theme of the Seventh

Framework Programme for Research of the European Commission.

I C T - E N E R G Y L E T T E R S

c/o NiPS Laboratory, Dipartimento di Fisica e Geologia

Università degli Studi di Perugia

Via A. Pascoli, 1 – I-06123 – Perugia (Italy)

www.ict-energyletters.eu