
Dynamic Power Management for Embedded Ubiquitous Systems

Anand Paul#, Bo-Wei Chen*, 1. Jeong#, and Jhing-Fa Wang*

The School of Computer Science and Engineering, Kyungpook National University, Korea #Electronics Engineering, Hanyang University, Korea

*Electrical Engineering, National Cheng Kung University, Tainan, Taiwan

Abstract- In this work, an embedded system working model is designed with one server that receives requests from a requester through a queue and is controlled by a power manager (PM). A novel approach based on reinforcement learning is presented to predict the best policy among existing DPM policies and deterministic Markovian non-stationary policies (DMNSP). We apply reinforcement learning, a computational approach to understanding and automating goal-directed learning and decision-making, to DPM. Reinforcement learning uses a formal framework defining the interaction between agent and environment in terms of states, actions, and rewards. The effectiveness of this approach is demonstrated by an event-driven simulator designed in Java with power-manageable embedded devices. Our experimental results show that the novel dynamic power management with timeout policies gives average power savings of 4% to 21%, and the novel dynamic power management with DMNSP gives average power savings of 10% to 28%, more than previously proposed DPM policies.

Index Terms- Dynamic Power Management, Embedded systems, Reinforcement learning.

I. INTRODUCTION

Energy consumption has become one of the primary concerns in electronic design due to the recent popularity of portable devices and cost concerns related to desktops and servers. Battery capacity has improved very slowly (a factor of 2 to 4 over the last 30 years), while computational demands have drastically increased over the same period. Heat extraction is a large issue for both portable and non-portable electronic systems. Finally, in recent times the operating costs of large electronic systems, such as data warehouses, have become a concern.

At the system level, there are three main sources of energy dissipation: (i) processing units; (ii) storage elements; (iii) interconnects and communication units. Energy efficient system level design must minimize the energy consumption in all three types of components, while carefully balancing the effects of their interaction. The software implementation also strongly affects the system energy consumption. For example, software compilation affects the instructions used and thus the energy consumed by computing elements; software storage and data access in memory affect the energy balance between processing and storage units; and the data representation affects power dissipation of the communication resources.


Embedded systems are collections of components, which may be heterogeneous in nature. For example, a laptop has a digital VLSI component, an analog component (wireless card), a mechanical part (hard disk drive), and an optical component (display). Peak performance in electronic design is required only during some time intervals. As a result, the system components do not always need to be delivering peak performance.

The ability to enable and disable components, as well as to tune their performance to the workload (e.g., user requests), is important in achieving energy-efficient utilization. In this work, we present new approaches for lowering energy consumption in both system design and utilization. The policy presented in this paper has been evaluated using an event-driven simulator, which demonstrated its effectiveness in power savings with little impact on performance or reliability.

The fundamental premise for the applicability of power management schemes is that systems, or system components, experience non-uniform workloads during normal operation time. Non-uniform workloads are common in communication networks and in almost any interactive system.

Dynamic power management (DPM) techniques achieve energy-efficient utilization of systems by selectively placing system components into low-power states when they are idle, as illustrated in Figure 1.

Figure 1. An interactive system is busy or idle depending on the user requests

Dynamic voltage scaling (DVS) algorithms reduce energy consumption by changing processor speed and voltage at run time depending on the needs of the running applications. If only the processor frequency is scaled, the total energy savings are small, since power is inversely proportional to cycle time and energy is proportional to execution time and power. Early DVS algorithms set the processor speed based on processor utilization over fixed intervals and did not consider the individual requirements of the running tasks. A number of voltage scaling techniques have been proposed for real-time

Page 2: [IEEE 2013 1st International Conference on Orange Technologies (ICOT 2013) - Tainan (2013.3.12-2013.3.16)] 2013 1st International Conference on Orange Technologies (ICOT) - Dynamic

systems. The approaches presented in [2] assume that all tasks run at their worst-case execution time (WCET). Workload-variation slack times are exploited on a task-by-task basis in [6] by a voltage scheduler that determines the operating voltage by analyzing application requirements. The scheduling is done at the task level, by setting the processor frequency to the minimum value needed to complete all tasks. For applications with high frame-to-frame variance, such as MPEG video, schedule smoothing is done by scheduling tasks to complete twice the amount of work in twice the allocated time.

In all DVS approaches presented in the past, scheduling was done at the task level, assuming multiple threads. The prediction of task execution times was done either using worst-case execution times or heuristics. Such approaches neglect that DVS can be done within a task or for single-application devices. For instance, in MPEG decoding, the variance in execution time on a per-frame basis can be very large: a factor of three in the number of cycles [15], or a range between 1 and 2000 IDCTs per frame for MPEG video.
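To make the frequency-versus-voltage point above concrete, a standard first-order CMOS model (added here for clarity; these formulas are not spelled out in the paper) gives

P ≈ C_eff · V_dd² · f,    t_exec ≈ N_cycles / f,    E = P · t_exec ≈ C_eff · V_dd² · N_cycles,

so lowering the frequency alone merely stretches the execution time and leaves the energy roughly unchanged, whereas lowering the supply voltage together with the frequency reduces energy roughly quadratically.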

System-level dynamic power management [1] decreases the energy consumption by selectively placing idle components into lower power states. System resources can be modeled using a state-based abstraction where each state trades off performance for power [2]. This is illustrated in Figure 2.

Figure 2. System behavior in different power states: the workload (requests), the device's busy and idle periods, and the power state switching between Working and Sleeping over time instants T1-T4

For example, a system may have an active state, an idle state, and a sleep state that has lower power consumption, but also takes some time to transition to the active state. The transitions between states are controlled by commands issued by a power manager (PM) that observes the workload of the system and decides when and how to force power state transitions. The power manager makes state transition decisions according to the power management policy. The choice of the policy that minimizes power under performance constraints (or maximizes performance under power constraint) is a constrained optimization problem.
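As a concrete illustration of this state-based abstraction and of the timeout policy discussed below, the following Java sketch models a device with active, idle, and sleep states and a power manager that forces transitions. The class names, power figures, and timeout value are assumptions for illustration, not the paper's implementation.

    enum PowerState { ACTIVE, IDLE, SLEEP }

    class ManagedDevice {
        PowerState state = PowerState.IDLE;
        // Illustrative figures only: power drawn in each state (W) and
        // the delay to return from SLEEP to ACTIVE (s).
        static final double P_ACTIVE = 2.0, P_IDLE = 0.9, P_SLEEP = 0.1;
        static final double WAKEUP_DELAY = 1.5;

        void transitionTo(PowerState next) { state = next; }
    }

    class TimeoutPowerManager {
        private final ManagedDevice device;
        private final double timeout;   // idle seconds before issuing a sleep command
        private double idleTime = 0.0;

        TimeoutPowerManager(ManagedDevice device, double timeout) {
            this.device = device;
            this.timeout = timeout;
        }

        // Called once per simulated second with the observed workload.
        void observe(boolean requestPending) {
            if (requestPending) {
                idleTime = 0.0;
                device.transitionTo(PowerState.ACTIVE);     // serve pending requests
            } else {
                idleTime += 1.0;
                if (idleTime >= timeout) {
                    device.transitionTo(PowerState.SLEEP);  // timeout expired: sleep
                } else if (device.state == PowerState.ACTIVE) {
                    device.transitionTo(PowerState.IDLE);   // no work: drop to idle
                }
            }
        }
    }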

In the recent past, several researchers have realized the importance of power management for large classes of applications. Chip-level power management features have been implemented in mainstream commercial microprocessors [2]. The most common power management policy at the system level is a timeout policy implemented in most operating systems. The drawback of this policy is that it wastes power while waiting for the timeout to expire [3].

Predictive policies for hard disks [3,4] and for interactive terminals [1] force the transition to a low power state as soon as a component becomes idle if the predictor estimates that the idle period will last long enough. An incorrect estimate can cause both performance and energy penalties. The distribution of idle and busy periods for an interactive terminal is represented as a time series in [3], and approximated with a


least-squares regression model. The regression model is used for predicting the duration of future idle periods. A simplified power management policy predicts the duration of an idle period based on the duration of the last activity period. The authors of [3] claim that the simple policy performs almost as well as the complex regression model, and it is much easier to implement. In [2], an improvement over the prediction algorithm of [5] is presented, where idleness prediction is based on a weighted sum of the duration of past idle periods, with geometrically decaying weights. The policy is augmented by a technique that reduces the likelihood of multiple mispredictions. All these policies are formulated heuristically, and then tested with simulations or measurements to assess their effectiveness.
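A minimal sketch of the predictive idea described above, in which idleness prediction is a weighted sum of past idle periods with geometrically decaying weights (an exponential average), is given below. The decay factor and break-even time are hypothetical parameters; this is an illustration of the technique attributed to [2], not its actual code.

    class IdlePeriodPredictor {
        private double prediction = 0.0;  // current estimate of the next idle period (s)
        private final double alpha;       // decay factor in (0, 1), hypothetical value

        IdlePeriodPredictor(double alpha) { this.alpha = alpha; }

        // Fold a newly completed idle period into the exponentially weighted estimate:
        // prediction = alpha * observed + (1 - alpha) * previous prediction,
        // i.e., a sum of past idle periods with geometrically decaying weights.
        void update(double observedIdleSeconds) {
            prediction = alpha * observedIdleSeconds + (1.0 - alpha) * prediction;
        }

        // Shut down only if the predicted idle time covers the break-even time of the
        // low-power state; otherwise the transition would cost more than it saves.
        boolean shouldShutDown(double breakEvenSeconds) {
            return prediction >= breakEvenSeconds;
        }
    }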

II. OVERVIEW

In this paper, we present new approaches that combine the advantages of discrete and continuous models. The new DPM algorithms with an intelligent approach are guaranteed to be globally optimal, while allowing event-driven policy evaluation and providing a more flexible and accurate model for the user and the device. The system can be modeled with three components: the user, the device, and the queue, as shown in Figure 3. While the methods presented in this work are general, the optimization of energy consumption under performance constraints (or vice versa) is applied to and measured on the following devices: a WLAN card [5] on a laptop, the Smart Badge [7], and laptop and desktop hard disks. The Smart Badge is used as a personal digital assistant (PDA). The WLAN card enables internet access on the laptop computer running the Linux operating system. The hard disks are both part of Windows machines, one in the desktop and the other in the laptop. The queue models a memory buffer associated with each device. In all examples, the user is an application that accesses each device by sending requests via the queue.

Figure 3. System model

Power management aims at reducing energy

consumption in systems by selectively placing components into low power states. Thus, at run time, the power manager (PM) observes user request arrivals, the state of the device's buffer, the power state and the activity level of the device. When all user requests have been serviced, the PM can choose to place the device into a low power state. This choice is made based on a policy. Once the device is in a low power state, it returns to active state only upon arrival of a new request from a user. Note that a user request can come directly from a human user, from the operating system, or even from another device. Each system component is described probabilistically.
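The queue component of this model can be pictured with the small Java sketch below. The class and method names are assumptions for illustration; the paper does not describe the simulator's actual interfaces.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class Request {
        final double arrivalTime;   // seconds since simulation start
        Request(double arrivalTime) { this.arrivalTime = arrivalTime; }
    }

    class RequestQueue {
        private final Queue<Request> buffer = new ArrayDeque<>();

        void enqueue(Request r) { buffer.add(r); }          // the user issues a request
        Request serveNext()     { return buffer.poll(); }   // the device drains the buffer
        boolean isEmpty()       { return buffer.isEmpty(); }
        int occupancy()         { return buffer.size(); }   // observed by the power manager
    }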

Page 3: [IEEE 2013 1st International Conference on Orange Technologies (ICOT 2013) - Tainan (2013.3.12-2013.3.16)] 2013 1st International Conference on Orange Technologies (ICOT) - Dynamic

The user behavior is modeled by a request interarrival distribution [9]. Similarly, the service time distribution describes the behavior of the device in the active state. The transition distribution models the time taken by the device to transition between its power states. Finally, the combination of interarrival time distribution (incoming jobs to the queue) and service time distribution (jobs leaving the queue) appropriately characterizes the behavior of the queue. These three categories of distributions completely characterize the stochastic optimization problem. The details of each system component are described in the next sections.

A. Requester model

A special entity called a "requester" generates workloads consisting of I/O requests and computation needs. Requester modeling is an essential part of power management because policies predict future workloads based on their requester models. We consider two requester models for designing policies: a single requester and multiple requesters. These models are increasingly complex and close to the programs running on realistic interactive systems such as a laptop computer.

Figure 4. Single requester model: the requester sends requests to the device while the power manager observes them

Figure 4 depicts the concept of the single-requester model. The requester generates requests for the device; meanwhile, the power manager observes the requests. Based on this observation, the power manager issues commands to change the power states of the device. Some policies explicitly use this model in determining their rules to change power states [7]; some other policies implicitly assume a single requester [6].

Multiple-requester model:

In a complex system, there may be more than one entity that generates requests. For example, in a multiprogramming system, several processes may generate requests to the same device. Different processes consume different amounts of energy. In particular, a server consumes a large amount of energy on both the network card and the hard disk.

The request interarrival times in the active state (the state where one or more requests are in the queue) for all three devices follow a Poisson process. Thus, we can model the user in the active state with rate λ and mean request interarrival time 1/λ, where the probability of the hard disk or the Smart Badge receiving a user request within a time interval t is given by the exponential interarrival model of the Poisson process shown below:

P(T ≤ t) = 1 − e^(−λt)

The exponential distribution [13] does not model arrivals in the idle state well. The model we use needs to accurately describe the behavior of long idle times, as the largest power savings are possible over the long low-power periods. We first filter out short user request interarrival times in the idle state in order to focus on the longer idle times.
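In an event-driven simulator this active-state model can be sampled directly. The sketch below draws exponentially distributed interarrival times with rate λ by inverse-transform sampling and filters out short idle-state gaps; the rate and the filter threshold are hypothetical parameters, not the measured values from the paper.

    import java.util.Random;

    class ArrivalModel {
        private final Random rng = new Random();
        private final double lambda;   // request rate in the active state (requests/s)

        ArrivalModel(double lambda) { this.lambda = lambda; }

        // Inverse-transform sampling of an exponential interarrival time:
        // T = -ln(1 - U) / lambda, with mean 1/lambda.
        double nextInterarrival() {
            return -Math.log(1.0 - rng.nextDouble()) / lambda;
        }

        // Keep only gaps longer than minIdleSeconds so the idle-state model
        // focuses on the long idle periods where power can actually be saved.
        static boolean isLongIdle(double interarrivalSeconds, double minIdleSeconds) {
            return interarrivalSeconds >= minIdleSeconds;
        }
    }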


III. PROPOSED POWER-AWARE SYSTEM

A general model of the dynamic power management learning agent is shown in Figure 5. The accumulation of experience that guides the behavior (action policy) is represented by a cost estimator whose parameters are learned as new experiences are presented to the agent.

The agent is also equipped with sensors that define how observations about the external process are made. These observations may, if necessary, be combined with past observations or input to a state estimator, defining an information vector or internal state that represents the agent's belief about the real state of the process. The cost estimator then maps these internal states and presented reinforcements to associated costs, which are basically expectations about how good or bad these states are, given the experience obtained so far. Finally, these costs guide the action policy. The built-in knowledge may affect the behavior of the agent either directly, by altering the action policy, or indirectly, by influencing the cost estimator or sensors.

The experience accumulation and action-taking process is represented by the following sequence. At a certain instant of time, the agent:
1. Makes an observation and perceives any reinforcement signal provided by the process.
2. Takes an action based on the former experience associated with the current observation and reinforcement.
3. Makes a new observation and updates its cumulated experience.
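The loop above can be sketched with a simple tabular cost estimator in Java. This is a generic reinforcement-learning skeleton under assumed names and a hypothetical learning rate, not the paper's exact agent.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class LearningAgent {
        private final Map<String, Double> costTable = new HashMap<>();  // cost estimator
        private final double learningRate = 0.1;                        // hypothetical value

        private String key(String obs, String action) { return obs + "|" + action; }

        // Step 2: choose the action with the lowest estimated cost for this observation.
        String selectAction(String observation, List<String> actions) {
            String best = actions.get(0);
            for (String a : actions) {
                if (costOf(observation, a) < costOf(observation, best)) {
                    best = a;
                }
            }
            return best;
        }

        // Steps 1 and 3: after acting, fold the received reinforcement (cost) into the
        // cumulated experience for that observation/action pair.
        void update(String observation, String action, double reinforcement) {
            double old = costOf(observation, action);
            costTable.put(key(observation, action), old + learningRate * (reinforcement - old));
        }

        private double costOf(String obs, String action) {
            return costTable.getOrDefault(key(obs, action), 0.0);
        }
    }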


Figure 5. A general model of the learning agent

A. Intelligent DPM Model

In this section, the intelligent dynamic power management (IDPM) model is designed using the reinforcement learning agent method, as shown in Figure 6. The agent contains all

Page 4: [IEEE 2013 1st International Conference on Orange Technologies (ICOT 2013) - Tainan (2013.3.12-2013.3.16)] 2013 1st International Conference on Orange Technologies (ICOT) - Dynamic

existing heuristic and DMNSP policies. The agent predicts the most suitable policy online from among these policies and hands control of DPM to the predicted one. The agent learns to make this prediction by the reinforcement learning method.


Figure 6. An IDPM model of the learning agent

B. The Reinforcement Condition of IDPM

The basic assumption of Markov Decision Processes is the Markov condition: any observation made by the agent must be a function only of its last observation, the state transition, and the action taken to select the best policy and hand control to it (plus some random disturbance):

o_t = f(o_{t−1}, a_{t−1}, w_t)    (1)

where o_t is the observation at time t, a_t is the action taken to predict the best policy, and w_t is the reward weight. o_t provides complete information about x_t; this is equivalent to perfect observability of the best policy. Of particular interest is the discounted infinite-horizon formulation of the Markov Decision Process problem. Given:
- a finite set of possible actions a ∈ A,
- a finite set of policies x ∈ X,
- a finite set of bounded reinforcements (payoffs) r(x, a) ∈ ℝ.
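For reference, the discounted infinite-horizon objective referred to above is conventionally written as follows (standard notation with discount factor γ; this formulation is added here for clarity and is not quoted from the paper):

V^{\pi}(x) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(x_t, a_t) \;\middle|\; x_0 = x,\; a_t = \pi(x_t)\right], \qquad 0 \le \gamma < 1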


The agent gives the reward to whichever policy minimizes the power consumption. The condition for a policy to receive the reward is that the power saving P_save during sleep time should exceed the sum of the power consumed at wake-up, P_wake, and the power consumed during idle time, P_idle, of the embedded system:

T_th-sleep × P_save ≥ T_wake × P_wake + T_idle × P_idle

To get the reward, the policy should keep the embedded device in the sleep state until the above condition is satisfied. So the threshold time T_th-sleep is

T_th-sleep ≥ (T_wake × P_wake + T_idle × P_idle) / P_save

To get the reward, the policy should keep the system in the idle state for a time greater than or equal to the threshold time T_th-sleep.
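As a purely illustrative calculation with hypothetical numbers (not measurements from the paper): if the sleep state saves P_save = 1.0 W, waking up draws P_wake = 2.0 W for T_wake = 2 s, and the idle overhead is P_idle = 0.5 W over T_idle = 4 s, then

T_{th\text{-}sleep} \ge \frac{T_{wake} P_{wake} + T_{idle} P_{idle}}{P_{save}} = \frac{2 \times 2.0 + 4 \times 0.5}{1.0} = 6\ \text{s},

so a policy earns the reward only if the device stays in the low-power state for at least 6 seconds.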

The pseudo-code for the IDPM is given below.

IDPM:
    for every second:
        if (any request)
            { ... }
        else if (no request)
            ...

REINFORCEMENT AGENT:
    for every 10 seconds:
        the agent selects the winner from the reward table;
        Th = winner policy's Th;
        if (Th < idle_time)
            make the system sleep;
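A hedged Java reading of this pseudo-code is sketched below: each second the idle time is tracked against the threshold currently in force, and every 10 seconds the agent adopts the threshold of the policy with the highest accumulated reward. The elided parts of the original pseudo-code are filled with plausible behaviour, and all names are assumptions for illustration.

    import java.util.List;

    interface SleepControl {          // minimal stand-in for the managed device
        void sleep();
        void wake();
    }

    class CandidatePolicy {
        final String name;
        final double threshold;       // sleep threshold Th proposed by this policy (s)
        double reward;                // entry in the reward table
        CandidatePolicy(String name, double threshold) {
            this.name = name;
            this.threshold = threshold;
        }
    }

    class Idpm {
        private final List<CandidatePolicy> policies;
        private final SleepControl device;
        private double th;            // threshold of the current winner policy
        private double idleTime = 0.0;
        private int secondsSinceSelection = 0;

        Idpm(List<CandidatePolicy> policies, SleepControl device) {
            this.policies = policies;
            this.device = device;
            this.th = policies.get(0).threshold;
        }

        // Called once per simulated second (the "for every second" loop).
        void tick(boolean anyRequest) {
            if (anyRequest) {
                idleTime = 0.0;
                device.wake();                      // assumed behaviour of the elided branch
            } else {
                idleTime += 1.0;
                if (th < idleTime) {
                    device.sleep();                 // "make the system sleep"
                }
            }
            if (++secondsSinceSelection >= 10) {    // the "every 10 seconds" agent step
                th = selectWinner().threshold;
                secondsSinceSelection = 0;
            }
        }

        // Agent selects the winner (largest accumulated reward) from the reward table.
        private CandidatePolicy selectWinner() {
            CandidatePolicy best = policies.get(0);
            for (CandidatePolicy p : policies) {
                if (p.reward > best.reward) best = p;
            }
            return best;
        }
    }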

IV. EXPERIMENTAL RESULTS

All the policies suggested so far [14] suffer from either under-prediction or over-prediction, for which they pay a performance or power penalty. Our policy makes sure that the server is ON whenever there is an event in the service requester or the service queue, which means that under-prediction or over-prediction will never occur. Hence, the proposed scheme never incurs a performance penalty.

The following figures compare the experimental results of the proposed scheme with other existing methods.

Page 5: [IEEE 2013 1st International Conference on Orange Technologies (ICOT 2013) - Tainan (2013.3.12-2013.3.16)] 2013 1st International Conference on Orange Technologies (ICOT) - Dynamic

Figure 7. Power consumption (in joules) of the IBM hard disk drive under different DPM policies (Always On, Timeout, Greedy, DMNSP, IDPM(TO), IDPM(MP))

Figure 8. Power consumption (in joules) of the Fujitsu hard disk drive under different DPM policies

Figure 9. Power consumption (in joules) of the HP Smart Badge under different DPM policies


Figure 10. Power consumption (in joules) of the WLAN card under different DPM policies

REFERENCES

[1] L. Benini, R. Hodgson, and P. Siegel, "System-Level Power Estimation and Optimization," International Symposium on Low Power Electronics and Design, pp. 173-178, 1998.
[2] L. Benini, G. Paleologo, A. Bogliolo, and G. De Micheli, "Policy Optimization for Dynamic Power Management," IEEE Transactions on Computer-Aided Design, vol. 18, no. 6, pp. 813-833, June 1999.
[3] E. Chung, L. Benini, and G. De Micheli, "Dynamic Power Management using Adaptive Learning Trees," International Conference on Computer-Aided Design, 1999.
[4] R. Golding, P. Bosch, and J. Wilkes, "Idleness is not sloth," HP Laboratories Technical Report HPL-96-140, 1996.
[5] Anand Paul and Ebenezer Jeyakumar, "Power Aware Energy Efficient Real time Embedded Systems," Proceedings of ADCOM 2003, PSG College of Technology.
[6] E. Chung, L. Benini, and G. De Micheli, "Dynamic Power Management for nonstationary service requests," Design, Automation and Test in Europe, pp. 77-81, 1999.
[7] Q. Qiu and M. Pedram, "Dynamic power management based on continuous-time Markov decision processes," Design Automation Conference, 1999.
[8] T. Simunic, L. Benini, and G. De Micheli, "Energy-Efficient Design of Battery-Powered Embedded Systems," International Symposium on Low Power Electronics and Design, 1999.
[9] Hamdy A. Taha, Operations Research: An Introduction, Prentice Hall of India, 2002.
[10] S. Irani, S. Shukla, and R. Gupta, "Competitive Analysis of Dynamic Power Management Strategies for Systems with Multiple Power Saving States," in Proceedings of the Design Automation and Test in Europe Conference (DATE 2002), 2002.
[11] Carlos Henrique Costa Ribeiro, "A Tutorial on Reinforcement Learning Techniques," Division of Computer Science, Department of Theory of Computation, Technological Institute of Aeronautics, São José dos Campos, Brazil.
[12] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998 (A Bradford Book).
[13] Richard A. Johnson, Probability and Statistics for Engineers, Prentice Hall of India, 2001.
[14] Yung-Hsiang Lu and Giovanni De Micheli, "Comparing System-Level Power Management Policies," Stanford University, IEEE, 2001.
[15] Y. Lu, E. Chung, T. Simunic, L. Benini, and G. De Micheli, "Quantitative Comparison of PM Algorithms," Design, Automation and Test in Europe, pp. 20-26, 2000.