Context Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-learning based Approach
Nirmalya Roy, Abhishek Roy & Sajal K. Das
Presented by: Viraj Bhat (virajbATcaip*rutgers*edu)
ECE-572 - Parallel and Distributed Computing 2
Q-learning
A reinforcement learning (RL) algorithm: it does not need a model of its environment and can be used on-line.
Estimates the values of state-action pairs: Q(s,a) is the expected discounted sum of future payoffs obtained by taking action a from state s and following an optimal policy thereafter.
The optimal action from any state is the one with the highest Q-value.
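The tabular Q-learning update described above can be sketched in a few lines (a minimal illustration; the dictionary-based table, function name, and parameter values are my own choices, not from the paper):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: nudge Q(s,a) toward the Bellman target."""
    # Value of the best action available from the successor state
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    # Move the old estimate a fraction alpha toward r + gamma * best_next
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

After enough visits to every state-action pair, acting greedily with respect to Q recovers the highest-Q-value action mentioned above.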
Nash Equilibrium
Game theory: selecting the best option when the costs and benefits of each choice are not pre-determined but depend on the future behavior of the other players.
Zero-sum game: a player benefits only at the expense of others (chess, Go, matching pennies).
Non-zero-sum game: one player's gain does not necessarily correspond to a loss by another (prisoner's dilemma).
A Nash equilibrium is a set of mixed strategies for a finite, non-cooperative game between two or more players in which no player can improve his or her payoff by unilaterally changing strategy.
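For the prisoner's dilemma mentioned above, a pure-strategy Nash equilibrium can be checked mechanically (an illustrative sketch; the payoff values are the textbook ones, and the helper name is mine):

```python
def is_pure_nash(payoffs, profile):
    """Check that neither player can gain by a unilateral deviation.

    payoffs maps a joint action (a1, a2) to the payoff pair (u1, u2)."""
    actions = sorted({a for pair in payoffs for a in pair})
    a1, a2 = profile
    u1, u2 = payoffs[(a1, a2)]
    no_gain_1 = all(payoffs[(d, a2)][0] <= u1 for d in actions)
    no_gain_2 = all(payoffs[(a1, d)][1] <= u2 for d in actions)
    return no_gain_1 and no_gain_2

# Prisoner's dilemma: C = cooperate, D = defect
pd = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
```

Here `is_pure_nash(pd, ('D', 'D'))` holds while `is_pure_nash(pd, ('C', 'C'))` does not: mutual defection is the equilibrium even though mutual cooperation pays both players more, which is exactly what makes the game non-zero-sum.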
Entropy
Entropy in information theory describes how much randomness (or, alternatively, 'uncertainty') there is in a signal or random event [Shannon entropy].
Entropy satisfies these assumptions:
The measure is continuous: changing one of the probabilities by a very small amount changes the entropy only by a small amount.
If all outcomes are equally likely, then increasing the number of outcomes always increases the entropy.
The entropy of a composite result is a weighted sum of the component entropies.
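Shannon's entropy, and the equal-likelihood property above, can be checked directly (a small sketch; the function name is my own):

```python
from math import log2

def shannon_entropy(probs):
    """H(p) = -sum_i p_i * log2(p_i), with 0 * log 0 taken as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)
```

A fair coin gives 1 bit, a certain event 0 bits, and four equally likely outcomes 2 bits, so adding equally likely outcomes does increase the entropy.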
Smart Home
Goal: provide inhabitants with maximum possible comfort, minimize resource consumption, and reduce the overall cost of maintaining the home.
How it does it: autonomously acquires and applies knowledge about its inhabitants ("context awareness"); the infrastructure needs to be cognizant of its context; it adapts to the inhabitants' behavior and preferences (striking a balance).
Problem: the mobility of individuals creates uncertainty about location and subsequent activities; optimal location prediction across multiple inhabitants is NP-hard (proved in this paper); the contexts of multiple inhabitants in the same environment are inherently correlated and interdependent.
Contributions
Proves that location prediction across multiple inhabitants is an NP-hard problem.
Develops Nash H-learning, which exploits the correlation of mobility patterns across inhabitants and minimizes the overall joint uncertainty.
Predicts the most likely routes followed by multiple inhabitants; knowledge of inhabitants' contexts, such as location and associated activities, helps control automated devices.
Shows that Nash H-learning performs better than predictive schemes optimized for individual inhabitants' location/activity: collective prediction beats individual prediction.
Single-Inhabitant Location Tracking
Symbolic interpretation of the inhabitant's movement (mobility) profile as captured by RFID readers, sensors, and pressure switches.
The inhabitant's current location is a reflection of a mobility/activity profile that can be learned over time in an online fashion.
Mobility is a stochastic process (routes are repetitive in nature) and piecewise stationary; the LZ-78 text-compression algorithm is used to minimize the overall entropy.
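The LZ-78 parse that underpins this route model can be sketched as follows (illustrative only; the paper applies LZ-78 to symbolic location strings, and details here, such as dropping an unfinished final phrase, are my simplifications):

```python
def lz78_parse(route):
    """Split a symbolic route string into incremental LZ-78 phrases.

    Each new phrase extends a previously seen phrase by one symbol; the
    dictionary's growth rate is tied to the entropy of the source."""
    dictionary = set()
    phrases = []
    phrase = ""
    for symbol in route:
        phrase += symbol
        if phrase not in dictionary:
            dictionary.add(phrase)
            phrases.append(phrase)
            phrase = ""  # start a fresh phrase
    return phrases  # an unfinished trailing phrase is dropped
```

For a repetitive route such as "ababab" the parse is ['a', 'b', 'ab']: phrases lengthen quickly, so a low-entropy (highly repetitive) mobility profile yields few, long phrases.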
Multi-Inhabitant Location Prediction
For a group of inhabitants residing in a smart home consisting of L different locations, the objective is to maximize the number of successful location predictions.
Maximizing the number of successful predictions of multiple inhabitants' locations is NP-hard. Proof: reduction from Set Packing.
Set Packing: the goal is to maximize the number of mutually disjoint subsets chosen from S.
Tracking: each location, identified by a sensor, is occupied by at most one inhabitant.
Predictive Nash H-learning: Background
Assumption: every agent wants to satisfy its own preferences.
Goal: achieve a suitable balance among the preferences of all inhabitants residing in a smart home (non-cooperative game theory). Every n-player stochastic game possesses at least one Nash equilibrium.
Entropy-based H-learning: learn to perform actions that optimize reward; the agent learns a value function that maps state-action pairs to future reward using an entropy measure H.
(new experience + old value function) → statistically improved value function.
Nash-H Algorithm
1. The learning agent, indexed by i, forms an arbitrary guess of its H-values at time 0; they are initially assumed to be zero.
2. At each time t, agent i observes the current state and takes its action.
3. Agent i observes its own reward, the actions taken by the other agents, their rewards, and its new state, then calculates a Nash equilibrium:
   a. The other agents' H-values are not given, so they too are initialized to zero.
   b. Agent i observes the other agents' immediate rewards and previous actions.
   c. Agent i updates its beliefs about agent j's H-function according to the same updating rule it applies to its own.
The learning parameters lie in (0, 1).
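Step 3's update can be sketched as a Nash-Q-style rule applied to entropy values (a rough sketch under my own assumptions: the Nash-equilibrium value of the next state is treated as a precomputed number, and the function name and parameter values are illustrative, not from the paper):

```python
def nash_h_update(H, state, joint_action, h_cost, nash_value, alpha=0.5, beta=0.9):
    """Blend the observed entropy cost with the discounted Nash value,
    in the spirit of the Nash-Q update of Hu & Wellman."""
    old = H.get((state, joint_action), 0.0)  # unseen pairs start at zero, as in step 1
    H[(state, joint_action)] = (1 - alpha) * old + alpha * (h_cost + beta * nash_value)
    return H[(state, joint_action)]
```

Each agent keeps one such table for itself and one per other agent (its "beliefs"), all updated with the same rule, matching steps 3a-3c.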
NHL algorithm
Convergence of the NHL Algorithm
Conditions:
1. Every state s ∈ S and every action a ∈ A_k, for k = 1, …, n, are visited infinitely often.
2. Updates of the H-function occur only at the current state.
3. The learning rate α_t satisfies the usual stochastic-approximation conditions: Σ_t α_t = ∞ and Σ_t α_t² < ∞.
Proof: the iterative utility functions converge to the Nash equilibrium with probability 1, so the predictive H-learning framework used to predict entropies converges to a Nash equilibrium.
Worst-Case Analysis
The ratio of the worst possible Nash equilibrium cost to the "social optimum" is the measure of the effectiveness of the system: worst-case coordination ratio = worst possible cost / optimal cost.
This is similar to the problem of throwing m balls into m bins and finding the expected maximum number of balls in any bin.
The worst-case coordination ratio for m inhabitants taking m actions is Θ(lg m / lg lg m).
The coordination ratio for any number of inhabitants with m actions is at most 3 + √(4m lg m).
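The balls-and-bins analogy can be simulated directly; the average maximum load grows very slowly with m, in line with the Θ(lg m / lg lg m) behavior (a quick sketch; the function name, trial count, and seed are my own choices):

```python
import random

def avg_max_load(m, trials=200, seed=0):
    """Throw m balls into m bins uniformly at random, `trials` times,
    and return the average load of the fullest bin."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bins = [0] * m
        for _ in range(m):
            bins[rng.randrange(m)] += 1  # each ball picks a bin uniformly
        total += max(bins)
    return total / trials
```

For m = 100 the fullest bin typically holds only a handful of balls, far below the worst case of all m landing together, which is why the worst-case coordination ratio stays modest.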
Inhabitants' Joint Typical Routes
Uses the concepts of the jointly typical set and the asymptotic equipartition property (AEP) to derive small subsets of highly probable routes.
The system captures the typical set of inhabitant movement profiles from the H-learning scheme and uses it to predict the inhabitants' most likely routes.
A parameter δ measures the gap between the ideal probability of a typical route and the probability that the route is stored in the dictionary; in the experiments, δ ≤ 0.01.
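The AEP membership test behind a typical set can be written down directly (a sketch; the paper's jointly typical criterion involves joint entropies, so this shows only the single-source case, and the function name is mine):

```python
from math import log2

def is_typical(p_route, n, H, delta=0.01):
    """A length-n route with probability p_route is delta-typical when its
    empirical entropy rate -(1/n) log2 p is within delta of the source entropy H."""
    return abs(-log2(p_route) / n - H) <= delta
```

For a fair binary source (H = 1), a length-4 route with probability 1/16 is typical, while an unusually likely route with probability 1/4 is not; keeping only typical routes is what lets the dictionary shrink to a small subset of all routes.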
Smart Home: Resource and Comfort Management
Mobility-aware energy consumption: devices such as lights, fans, or air conditioners operate proactively to conserve energy during occupants' absence from particular places.
Smart temperature control: a distributed control system; the preconditioning period is the time needed to bring the temperature to a specified level.
Estimation of inhabitants' comfort: a subjective, experienced measure that is difficult to categorize and is therefore approximated as a joint function of temperature deviation, number of manual device operations, and time spent.
Expected average energy consumed:
E = Σ_{i=1}^{R} Σ_j Q_ij (t2 − t1)
where Q_ij is the power of the j-th device in the i-th zone, R is the number of zones, and (t2 − t1) is the time for which the device is on, i.e. the interval between the predictive device operation and the entry time of the first occupant.
Comfort = f(ΔT, M, t), a joint function of the temperature deviation ΔT, the number of manual device operations M, and the time spent t.
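The expected-energy expression can be evaluated with a simple double sum (a sketch of the formula as described on this slide; the variable names and example figures are my own):

```python
def expected_energy(power, on_time):
    """E = sum over zones i and devices j of Q_ij * (t2 - t1), where
    power[i][j] is the rating of device j in zone i (watts) and
    on_time[i][j] is the interval between predictive switch-on and the
    first occupant's entry (hours); the result is in watt-hours."""
    return sum(q * t
               for zone_q, zone_t in zip(power, on_time)
               for q, t in zip(zone_q, zone_t))
```

Two zones with a 60 W and a 100 W device on for 2 h and 1 h, plus a 40 W device on for 3 h, consume 340 Wh; shorter predictive intervals (better location prediction) directly cut this figure, which is the mechanism behind the energy savings reported later.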
Experiments
Inhabitants in a smart home are equipped with RF tags sensed by an RF reader; the sensors placed in the home work in coordination with the RF readers.
Energy management was studied with no predictive scheme and with per-inhabitant location prediction as baselines.
Simulation experiments ran for 12 weeks over 4 inhabitants and 2 visitors.
Raw data
Predictive Location Estimation
H-learning: the entropy of a resident lies between 1 and 3; a visitor's entropy is about 4 (higher uncertainty).
Nash H-learning: an individual's entropy is 4.0 at the start; as the system learns, it reduces to 1.0, and the total entropy falls from about 10.0 to 4.0.
Figures: H-learning and Nash H-learning.
Prediction Successes
H-learning is capable of estimating the locations of residents with 90% accuracy within 3 weeks; accuracy for visitors is between 50% and 60%.
Nash H-learning is initially slower but reaches a higher success rate of 95% for joint predictions; because individual predictions are correlated, about 80% is the maximum achievable individually.
Nash H-learning thus leads to a higher success rate than simple H-learning.
Figures: H-learning and Nash H-learning.
Inhabitants' Joint Typical Routes
The size of the individual and joint typical sets is initially 50% of the total routes; it shrinks to less than 10% as the system captures the inhabitants' movements.
Storage Overhead
The Nash-H scheme has low storage (memory) overhead, about 10 KB, compared to about 40 KB total for the existing per-inhabitant prediction scheme.
Energy Savings
Without prediction, daily energy consumption is about 20 kWh.
Using the predictive schemes, average daily energy consumption is kept at about 4 kWh (after the system learns).
Comfort
The Nash H-learning scheme reduced both the manual operations performed and the time spent for all inhabitants.
Comments and Discussion
How does this system scale as the number of individuals in the system increases?
The system uses joint typical sets, which are hoped to characterize the entire set and represent the most probable routes; what if this is not the case?
Results on the entropy of visitors (far worse than that of residents) are not presented for the Nash-H framework in the prediction-success results.
To get successful predictions in learning schemes, the memory required for storage keeps increasing; Figure 8 is cleverly plotted to highlight the Nash-H scheme.
References
1. N. Roy, A. Roy, S. K. Das, "Context-Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-Learning based Approach", Pervasive Computing and Communications (PerCom), Pisa, Italy, 13-17 March 2006.
2. "Information entropy", http://en.wikipedia.org/wiki/Shannon_entropy
3. J. Hu, M. P. Wellman, "Nash Q-Learning for General-Sum Stochastic Games", Journal of Machine Learning Research 4 (2003), 1039-1069.
4. "Nash equilibrium", http://en.wikipedia.org/wiki/Nash_equilibrium