Context Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-learning based Approach
Nirmalya Roy, Abhishek Roy & Sajal K. Das
Presented by: Viraj Bhat (virajbATcaip*rutgers*edu)
ECE-572 - Parallel and Distributed Computing 2
Q-learning
A reinforcement learning (RL) algorithm: it does not need a model of its environment and can be used on-line.
Estimates the values of state-action pairs: Q(s,a) is the expected discounted sum of future payoffs obtained by taking action a from state s and following an optimal policy thereafter.
The optimal action from any state is the one with the highest Q-value.
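The tabular Q-learning update described above can be sketched in a few lines (a minimal illustration; the dictionary-based table, function name, and parameter values are my own choices, not from the paper):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: nudge Q(s,a) toward the Bellman target."""
    # Value of the best action available from the successor state
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    # Move the old estimate a fraction alpha toward r + gamma * best_next
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

After enough visits to every state-action pair, acting greedily with respect to Q recovers the highest-Q-value action mentioned above.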
Nash Equilibrium
Game theory: selecting the best option when the costs and benefits of each choice are not pre-determined but depend on the future behavior of the other players.
Zero-sum game: a player benefits only at the expense of others (chess, Go, matching pennies).
Non-zero-sum game: one player's gain does not necessarily correspond to a loss by another (prisoner's dilemma).
A Nash equilibrium is a set of mixed strategies for a finite, non-cooperative game between two or more players in which no player can improve his or her payoff by unilaterally changing strategy.
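For the prisoner's dilemma mentioned above, a pure-strategy Nash equilibrium can be checked mechanically (an illustrative sketch; the payoff values are the textbook ones, and the helper name is mine):

```python
def is_pure_nash(payoffs, profile):
    """Check that neither player can gain by a unilateral deviation.

    payoffs maps a joint action (a1, a2) to the payoff pair (u1, u2)."""
    actions = sorted({a for pair in payoffs for a in pair})
    a1, a2 = profile
    u1, u2 = payoffs[(a1, a2)]
    no_gain_1 = all(payoffs[(d, a2)][0] <= u1 for d in actions)
    no_gain_2 = all(payoffs[(a1, d)][1] <= u2 for d in actions)
    return no_gain_1 and no_gain_2

# Prisoner's dilemma: C = cooperate, D = defect
pd = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
```

Here `is_pure_nash(pd, ('D', 'D'))` holds while `is_pure_nash(pd, ('C', 'C'))` does not: mutual defection is the equilibrium even though mutual cooperation pays both players more, which is exactly what makes the game non-zero-sum.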
Entropy
Entropy in information theory describes how much randomness (or, alternatively, 'uncertainty') there is in a signal or random event [Shannon entropy].
Entropy satisfies these assumptions:
The measure is continuous: changing one of the probabilities by a very small amount changes the entropy only by a small amount.
If all outcomes are equally likely, then increasing the number of outcomes always increases the entropy.
The entropy of a composite result is a weighted sum of the component entropies.
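Shannon's entropy, and the equal-likelihood property above, can be checked directly (a small sketch; the function name is my own):

```python
from math import log2

def shannon_entropy(probs):
    """H(p) = -sum_i p_i * log2(p_i), with 0 * log 0 taken as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)
```

A fair coin gives 1 bit, a certain event 0 bits, and four equally likely outcomes 2 bits, so adding equally likely outcomes does increase the entropy.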
Smart Home
Goal: provide inhabitants with maximum possible comfort, minimize resource consumption, and reduce the overall cost of maintaining the home.
How it does it: autonomously acquires and applies knowledge about its inhabitants ("context awareness"); the infrastructure needs to be cognizant of its context; it adapts to the inhabitants' behavior and preferences (striking a balance).
Problem: the mobility of individuals creates uncertainty about location and subsequent activities; optimal location prediction across multiple inhabitants is NP-hard (proved in this paper); the contexts of multiple inhabitants in the same environment are inherently correlated and interdependent.
Contributions
Proves that location prediction across multiple inhabitants is an NP-hard problem.
Develops Nash H-learning, which exploits the correlation of mobility patterns across inhabitants and minimizes the overall joint uncertainty.
Predicts the most likely routes followed by multiple inhabitants; knowledge of inhabitants' contexts, such as location and associated activities, helps control automated devices.
Shows that Nash H-learning performs better than predictive schemes optimized for individual inhabitants' location/activity: collective prediction beats individual prediction.
Single-Inhabitant Location Tracking
Symbolic interpretation of the inhabitant's movement (mobility) profile as captured by RFID readers, sensors, and pressure switches.
The inhabitant's current location is a reflection of a mobility/activity profile that can be learned over time in an online fashion.
Mobility is a stochastic process (routes are repetitive in nature) and piecewise stationary; the LZ-78 text-compression algorithm is used to minimize the overall entropy.
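The LZ-78 parse that underpins this route model can be sketched as follows (illustrative only; the paper applies LZ-78 to symbolic location strings, and details here, such as dropping an unfinished final phrase, are my simplifications):

```python
def lz78_parse(route):
    """Split a symbolic route string into incremental LZ-78 phrases.

    Each new phrase extends a previously seen phrase by one symbol; the
    dictionary's growth rate is tied to the entropy of the source."""
    dictionary = set()
    phrases = []
    phrase = ""
    for symbol in route:
        phrase += symbol
        if phrase not in dictionary:
            dictionary.add(phrase)
            phrases.append(phrase)
            phrase = ""  # start a fresh phrase
    return phrases  # an unfinished trailing phrase is dropped
```

For a repetitive route such as "ababab" the parse is ['a', 'b', 'ab']: phrases lengthen quickly, so a low-entropy (highly repetitive) mobility profile yields few, long phrases.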
Multi-Inhabitant Location Prediction
For a group of inhabitants residing in a smart home consisting of L different locations, the objective is to maximize the number of successful location predictions.
Maximizing the number of successful predictions of multiple inhabitants' locations is NP-hard. Proof: reduction from Set Packing.
Set Packing: the goal is to maximize the number of mutually disjoint subsets chosen from S.
Tracking: each location, identified by a sensor, is occupied by at most one inhabitant.
Predictive Nash H-learning: Background
Assumption: every agent wants to satisfy its own preferences.
Goal: achieve a suitable balance among the preferences of all inhabitants residing in a smart home (non-cooperative game theory). Every n-player stochastic game possesses at least one Nash equilibrium.
Entropy-based H-learning: learn to perform actions that optimize reward; the agent learns a value function that maps state-action pairs to future reward using an entropy measure H.
(new experience + old value function) → statistically improved value function.
Nash-H Algorithm
1. The learning agent, indexed by i, forms an arbitrary guess of its H-values at time 0; they are initially assumed to be zero.
2. At each time t, agent i observes the current state and takes its action.
3. Agent i observes its own reward, the actions taken by the other agents, their rewards, and its new state, then calculates a Nash equilibrium:
   a. The other agents' H-values are not given, so they too are initialized to zero.
   b. Agent i observes the other agents' immediate rewards and previous actions.
   c. Agent i updates its beliefs about agent j's H-function according to the same updating rule it applies to its own.
The learning parameters lie in (0, 1).
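Step 3's update can be sketched as a Nash-Q-style rule applied to entropy values (a rough sketch under my own assumptions: the Nash-equilibrium value of the next state is treated as a precomputed number, and the function name and parameter values are illustrative, not from the paper):

```python
def nash_h_update(H, state, joint_action, h_cost, nash_value, alpha=0.5, beta=0.9):
    """Blend the observed entropy cost with the discounted Nash value,
    in the spirit of the Nash-Q update of Hu & Wellman."""
    old = H.get((state, joint_action), 0.0)  # unseen pairs start at zero, as in step 1
    H[(state, joint_action)] = (1 - alpha) * old + alpha * (h_cost + beta * nash_value)
    return H[(state, joint_action)]
```

Each agent keeps one such table for itself and one per other agent (its "beliefs"), all updated with the same rule, matching steps 3a-3c.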
NHL algorithm
Convergence of the NHL Algorithm
Conditions:
1. Every state s ∈ S and every action a ∈ A_k, for k = 1, …, n, are visited infinitely often.
2. Updates of the H-function occur only at the current state.
3. The learning rate α_t satisfies the usual stochastic-approximation conditions: Σ_t α_t = ∞ and Σ_t α_t² < ∞.
Proof: the iterative utility functions converge to the Nash equilibrium with probability 1, so the predictive H-learning framework used to predict entropies converges to a Nash equilibrium.
Worst-Case Analysis
The ratio of the worst possible Nash equilibrium cost to the "social optimum" is the measure of the effectiveness of the system: worst-case coordination ratio = worst possible cost / optimal cost.
This is similar to the problem of throwing m balls into m bins and finding the expected maximum number of balls in any bin.
The worst-case coordination ratio for m inhabitants taking m actions is Θ(lg m / lg lg m).
The coordination ratio for any number of inhabitants with m actions is at most 3 + √(4m lg m).
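The balls-and-bins analogy can be simulated directly; the average maximum load grows very slowly with m, in line with the Θ(lg m / lg lg m) behavior (a quick sketch; the function name, trial count, and seed are my own choices):

```python
import random

def avg_max_load(m, trials=200, seed=0):
    """Throw m balls into m bins uniformly at random, `trials` times,
    and return the average load of the fullest bin."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bins = [0] * m
        for _ in range(m):
            bins[rng.randrange(m)] += 1  # each ball picks a bin uniformly
        total += max(bins)
    return total / trials
```

For m = 100 the fullest bin typically holds only a handful of balls, far below the worst case of all m landing together, which is why the worst-case coordination ratio stays modest.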
Inhabitants' Joint Typical Routes
Uses the concepts of the jointly typical set and the asymptotic equipartition property (AEP) to derive small subsets of highly probable routes.
The system captures the typical set of inhabitant movement profiles from the H-learning scheme and uses it to predict the inhabitants' most likely routes.
A parameter δ measures the gap between the ideal probability of a typical route and the probability that the route is stored in the dictionary; in the experiments, δ ≤ 0.01.
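The AEP membership test behind a typical set can be written down directly (a sketch; the paper's jointly typical criterion involves joint entropies, so this shows only the single-source case, and the function name is mine):

```python
from math import log2

def is_typical(p_route, n, H, delta=0.01):
    """A length-n route with probability p_route is delta-typical when its
    empirical entropy rate -(1/n) log2 p is within delta of the source entropy H."""
    return abs(-log2(p_route) / n - H) <= delta
```

For a fair binary source (H = 1), a length-4 route with probability 1/16 is typical, while an unusually likely route with probability 1/4 is not; keeping only typical routes is what lets the dictionary shrink to a small subset of all routes.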
Smart Home: Resource and Comfort Management
Mobility-aware energy consumption: devices such as lights, fans, or air conditioners operate proactively to conserve energy during occupants' absence from particular places.
Smart temperature control: a distributed control system; the preconditioning period is the time needed to bring the temperature to a specified level.
Estimation of inhabitants' comfort: a subjective, experienced measure that is difficult to categorize and is therefore approximated as a joint function of temperature deviation, number of manual device operations, and time spent.
Expected average energy consumed:
E = Σ_{i=1}^{R} Σ_j Q_ij (t2 − t1)
where Q_ij is the power of the j-th device in the i-th zone, R is the number of zones, and (t2 − t1) is the time for which the device is on, i.e. the interval between the predictive device operation and the entry time of the first occupant.
Comfort = f(ΔT, M, t), a joint function of the temperature deviation ΔT, the number of manual device operations M, and the time spent t.
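The expected-energy expression can be evaluated with a simple double sum (a sketch of the formula as described on this slide; the variable names and example figures are my own):

```python
def expected_energy(power, on_time):
    """E = sum over zones i and devices j of Q_ij * (t2 - t1), where
    power[i][j] is the rating of device j in zone i (watts) and
    on_time[i][j] is the interval between predictive switch-on and the
    first occupant's entry (hours); the result is in watt-hours."""
    return sum(q * t
               for zone_q, zone_t in zip(power, on_time)
               for q, t in zip(zone_q, zone_t))
```

Two zones with a 60 W and a 100 W device on for 2 h and 1 h, plus a 40 W device on for 3 h, consume 340 Wh; shorter predictive intervals (better location prediction) directly cut this figure, which is the mechanism behind the energy savings reported later.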
Experiments
Inhabitants in a smart home are equipped with RF tags sensed by an RF reader; the sensors placed in the home work in coordination with the RF readers.
Energy management was studied with no predictive scheme and with per-inhabitant location prediction as baselines.
Simulation experiments ran for 12 weeks over 4 inhabitants and 2 visitors.
Raw data
Predictive Location Estimation
H-learning: the entropy of a resident lies between 1 and 3; a visitor's entropy is about 4 (higher uncertainty).
Nash H-learning: an individual's entropy is 4.0 at the start; as the system learns, it reduces to 1.0, and the total entropy falls from about 10.0 to 4.0.
Figures: H-learning and Nash H-learning.
Prediction Successes
H-learning is capable of estimating the locations of residents with 90% accuracy within 3 weeks; accuracy for visitors is between 50% and 60%.
Nash H-learning is initially slower but reaches a higher success rate of 95% for joint predictions; because individual predictions are correlated, about 80% is the maximum achievable individually.
Nash H-learning thus leads to a higher success rate than simple H-learning.
Figures: H-learning and Nash H-learning.
Inhabitants' Joint Typical Routes
The size of the individual and joint typical sets is initially 50% of the total routes; it shrinks to less than 10% as the system captures the inhabitants' movements.
Storage Overhead
The Nash-H scheme has low storage (memory) overhead, about 10 KB, compared to about 40 KB total for the existing per-inhabitant prediction scheme.
Energy Savings
Without prediction, daily energy consumption is about 20 kWh.
Using the predictive schemes, average daily energy consumption is kept at about 4 kWh (after the system learns).
Comfort
The Nash H-learning scheme reduced both the manual operations performed and the time spent for all inhabitants.
Comments and Discussion
How does this system scale as the number of individuals in the system increases?
The system uses joint typical sets, which are hoped to characterize the entire set and represent the most probable routes; what if this is not the case?
Results on the entropy of visitors (far worse than that of residents) are not presented for the Nash-H framework in the prediction-success results.
To get successful predictions in learning schemes, the memory required for storage keeps increasing; Figure 8 is cleverly plotted to highlight the Nash-H scheme.
References
1. N. Roy, A. Roy, S. K. Das, "Context-Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-Learning based Approach", Pervasive Computing and Communications (PerCom), Pisa, Italy, 13-17 March 2006.
2. "Information entropy", http://en.wikipedia.org/wiki/Shannon_entropy
3. J. Hu, M. P. Wellman, "Nash Q-Learning for General-Sum Stochastic Games", Journal of Machine Learning Research 4 (2003), 1039-1069.
4. "Nash equilibrium", http://en.wikipedia.org/wiki/Nash_equilibrium