Approximate Dynamic Programming Methods for Residential Water Heating A thesis submitted in partial fulfillment for the degree of Master’s of Science in the Department of Electrical Engineering by Matthew Motoki December 3, 2015


Page 1: Master's Thesis Slides

Approximate Dynamic Programming Methods for Residential Water Heating

A thesis submitted in partial fulfillment for the degree of Master’s of Science in the

Department of Electrical Engineering

by Matthew Motoki

December 3, 2015

Page 2: Master's Thesis Slides

Outline

1 Motivation

2 Problem Formulation
    System Variables
    State Dynamics
    Objective Function
    Problem Statement

3 Methodology
    Finite-Horizon DP
    Average Cost DP for Periodic MDP’s
    Approximate Dynamic Programming
    Prescient Lower Bound (PLB)

4 Results
    Numerical Simulations Setup
    Set-Point Methods
    Temperature Aggregation
    Usage History Aggregation

5 Problem Extensions
    Solar Water Heating
    Automated Demand Response

6 Conclusion


Page 4: Master's Thesis Slides

Motivation

Why do we need a smarter water heater?

• Energy efficiency is important.
    - Electricity is expensive.
    - Burning fossil fuels is bad for the environment.

• Can we do better than water heaters with an adjustable set-point?
    - If so, are there any provable guarantees that can be made?
    - Theoretically, what is the best that we can do?

• The legacy grid is becoming obsolete.
    - Renewable energy sources are variable and distributed.
    - Energy storage capabilities of water heaters have been fully exploited.

1 / 31


Page 7: Master's Thesis Slides

Problem Formulation

State Variable

Define $t \in \{0, \Delta t, \ldots, (N-1)\Delta t\}$ and $t_k = \operatorname{mod}(k, N)\,\Delta t$, where $k = 0, 1, \ldots$ is the simulation time stage.

The state $x := (T, h)$ summarizes the information needed to make a decision.

We require $T \in [T_{\text{amb}}, T_{\max}]$. The temperature at $t_k$ is written $T_k$.

The hot water usage history is $h_k := \{(t_i, w_i) \mid 0 \le i < \operatorname{mod}(k, N)\}$, where $w_k$ is the intensity of the hot water draw at time $t_k$.

2 / 31

Page 8: Master's Thesis Slides

Problem Formulation

Decision Variable

The decision variable is

$$u_k := \begin{cases} 1, & \text{if the water heater is on} \\ 0, & \text{if the water heater is off.} \end{cases}$$

We assume that the decision $u_k$ is constant during the interval $[t_k, t_{k+1})$. A feasible decision $u_k \in \Omega_u$ is one that does not violate $T \in [T_{\text{amb}}, T_{\max}]$. A policy $\mu$ is a mapping from a state into a feasible decision.

3 / 31

Page 9: Master's Thesis Slides

Problem Formulation

Hot Water Demand (Disturbance Variable) 1

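We model hot water demand as a cyclostationary random process $W(t)$ given by

$$W(t) := \text{specific heat} \cdot \sum_{\tau \in \Omega_\tau} \sum_{i=1}^{N_{\text{people}}} \sum_{j=1}^{N_{\tau i}} F^{(j)}_{\tau,i} \cdot \big(T^{(j)}_{\tau,i} - T_{\text{amb}}\big) \cdot \mathbb{I}\big\{S^{(j)}_{\tau,i} \le t < S^{(j)}_{\tau,i} + D^{(j)}_{\tau,i}\big\},$$

where $\Omega_\tau := \{\text{shower}, \text{bath}, \ldots, \text{dishwasher}\}$ is the set of possible usage events, $N_{\text{people}}$ is the number of people in a household, $N_{\tau i}$ is the number of events of type $\tau$ corresponding to the $i$th person in the household, and the following are random variables:

$S^{(j)}_{\tau,i}$ := the start time of $E^{(j)}_{\tau,i}$,

$D^{(j)}_{\tau,i}$ := the duration of $E^{(j)}_{\tau,i}$,

$F^{(j)}_{\tau,i}$ := the flow rate of $E^{(j)}_{\tau,i}$,

$T^{(j)}_{\tau,i}$ := the desired temperature of $E^{(j)}_{\tau,i}$.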

4 / 31

Page 10: Master's Thesis Slides

Problem Formulation

Hot Water Demand (Disturbance Variable) 2

We can only observe $W(t)$ at pre-specified times $t \in \Omega_t$; therefore, we approximate $W(t)$ using a piecewise linear interpolation

$$\widetilde{W}(t) := W(t_k) + \frac{t - t_k}{\Delta t}\,\big[W(t_k + \Delta t) - W(t_k)\big],$$

for all $k = 0, 1, \ldots$ and $t \in [t_k, t_k + \Delta t)$. We define the discrete-time analog of $W(t)$ to be the average of $\widetilde{W}(t)$ over $t \in [t_k, t_k + \Delta t)$,

$$W_k := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} \widetilde{W}(t)\, dt = \tfrac{1}{2}\,\big[W(t_k) + W(t_k + \Delta t)\big].$$

We denote particular realizations of $W(t)$ and $W_k$ using $w(t)$ and $w_k$, respectively. We discretize $w_k \in \{0, \Delta w, \ldots, w_{\max}\}$. We write the conditional probability mass function of $W_k$ given $h_k$ as $p_{W_k}(w_k \mid h_k)$.
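The averaging and discretization steps on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `stage_average_demand` and `quantize` are hypothetical, and snapping to the nearest grid point is an assumed discretization rule, since the slide does not say how values are mapped onto the grid.

```python
def stage_average_demand(samples):
    """Average the piecewise-linear interpolant over each interval.

    samples[k] is the observed demand W(t_k). Because the interpolant is
    linear on [t_k, t_k + dt), its average is the mean of the two endpoints.
    The result has one fewer entry than `samples`.
    """
    return [0.5 * (a + b) for a, b in zip(samples, samples[1:])]


def quantize(w, dw, w_max):
    """Snap w to the grid {0, dw, ..., w_max}.

    Assumption: nearest grid point, clipped to the grid range.
    """
    level = round(w / dw)
    return min(max(level * dw, 0.0), w_max)
```

For example, observations `[0.0, 2.0, 4.0]` give stage averages `[1.0, 3.0]`.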

5 / 31


Page 12: Master's Thesis Slides

Problem Formulation

State Equation

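The state equation maps the current state $x_k$, current decision $u_k$, and current disturbance $w_k$ into the next state $x_{k+1}$ according to

$$x_{k+1} = f(x_k, u_k, w_k) := \big(f_T(T_k, u_k, w_k),\; f_h(t_k, h_k, w_k)\big),$$

where

$$T_{k+1} = f_T(T_k, u_k, w_k) := \max\big\{T_k - r_{\text{cool}}\Delta t\,(T_k - T_{\text{amb}}) + r_{\text{heat}}\Delta t\, u_k - r_{\text{loss}}\Delta t\, w_k,\; T_{\text{amb}}\big\},$$

$$h_{k+1} = f_h(t_k, h_k, w_k) := \begin{cases} \{(t_k, w_k)\} \cup h_k, & t_k \ne (N-1)\Delta t \\ \emptyset, & \text{otherwise,} \end{cases}$$

for all $k = 0, 1, \ldots$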
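As a rough sketch of the state equation, the two components can be written as plain Python functions. The names `next_temperature` and `next_history` are illustrative, the history is stored as a tuple of `(stage, w)` pairs, and the stage index `k mod N` stands in for the time $t_k$; all of these are assumptions made for the example.

```python
def next_temperature(T, u, w, *, dt, r_cool, r_heat, r_loss, T_amb):
    """f_T: one-step tank temperature update, floored at ambient."""
    T_next = (T
              - r_cool * dt * (T - T_amb)   # standby cooling toward ambient
              + r_heat * dt * u             # heating when the element is on
              - r_loss * dt * w)            # loss from the hot water draw
    return max(T_next, T_amb)


def next_history(h, k, w, *, N):
    """f_h: append the current draw, or reset at the last stage of a cycle."""
    stage = k % N
    if stage == N - 1:
        return ()                 # empty history at the start of a new cycle
    return h + ((stage, w),)
```

The temperature floor matters when a large draw would otherwise push the model below ambient, which the thermodynamics cannot do.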

6 / 31


Page 14: Master's Thesis Slides

Problem Formulation

Objective Function

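The objective is to minimize, over all policies $\mu$, the following function

$$J_\mu(x_0) = \lim_{K \to \infty} E_W\left[\frac{1}{K} \sum_{k=0}^{K-1} g\big(X_k, \mu(X_k), W_k; \theta\big) \,\middle|\, x_0\right] = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} E_{W_0, W_1, \ldots, W_k}\Big[g\big(X_k, \mu(X_k), W_k; \theta\big) \,\Big|\, x_0\Big],$$

where $X_0 = x_0$ is given and $X_k = f\big(X_{k-1}, \mu(X_{k-1}), W_{k-1}\big)$ for all $k = 1, 2, \ldots$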

7 / 31

Page 15: Master's Thesis Slides

Problem Formulation

Stage Cost

The stage cost is

$$g(x_k, u_k, w_k; \theta) := \alpha\, g_{\text{discomfort}}(x_k, u_k, w_k; T_{\min}) + (1 - \alpha)\, g_{\text{operating}}(x_k, u_k),$$

where $\theta := \{\alpha, T_{\min}\}$ is a customer-defined parameter set, $\alpha \in [0, 1]$ is the relative weighting of the objectives, and $T_{\min}$ is the minimum desirable temperature during a hot water use.

Operating Cost

The operating cost is

$$g_{\text{operating}}(x_k, u_k) := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} C(t)\, \text{rating}\; u_k\, dt,$$

where $C(t)$ is the cost of power and rating is the power rating of the water heater.

8 / 31

Page 16: Master's Thesis Slides

Problem Formulation

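Discomfort Cost

The discomfort cost is

$$g_{\text{discomfort}}\big(x_k, u_k, w_k; T_{\min}\big) := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} \max\big\{T_{\min} - T(t),\, 0\big\} \cdot \mathbb{I}\{w(t) > 0\}\, dt,$$

where

$$T(t) := T_k + \frac{t - t_k}{\Delta t}\,\big[f_T(T_k, u_k, w_k) - T_k\big],$$

for all $k = 0, 1, \ldots$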
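Because $T(t)$ is linear within a stage, the discomfort integral has a closed form, which the following sketch evaluates. Two simplifications are assumed here and are not taken from the slides: the draw is treated as active over the whole interval whenever $w_k > 0$, and the price $C(t)$ enters only through its interval average `mean_price`. All function names are illustrative.

```python
def mean_positive_part(d0, d1):
    """Average of max(d(s), 0) for d linear from d0 (s=0) to d1 (s=1)."""
    if d0 <= 0 and d1 <= 0:
        return 0.0
    if d0 >= 0 and d1 >= 0:
        return 0.5 * (d0 + d1)
    # One endpoint is positive: the positive part is a triangle whose base
    # is the fraction p / (p - n) of the interval and whose height is p.
    p, n = max(d0, d1), min(d0, d1)
    return 0.5 * p * p / (p - n)


def discomfort_cost(T_k, T_next, w_k, T_min):
    """Stage discomfort, assuming the draw lasts the whole interval."""
    if w_k <= 0:
        return 0.0
    return mean_positive_part(T_min - T_k, T_min - T_next)


def operating_cost(u_k, mean_price, rating):
    """Stage operating cost using the interval-average price."""
    return mean_price * rating * u_k


def stage_cost(T_k, T_next, u_k, w_k, *, alpha, T_min, mean_price, rating):
    """Weighted sum of discomfort and operating cost."""
    return (alpha * discomfort_cost(T_k, T_next, w_k, T_min)
            + (1 - alpha) * operating_cost(u_k, mean_price, rating))
```

A temperature that falls from 2 degrees above $T_{\min}$ to 2 degrees below it during a draw is uncomfortable for half the stage, giving an average shortfall of 0.5.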

9 / 31


Page 18: Master's Thesis Slides

Problem Formulation

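Problem Statement

Find a feasible on/off policy that minimizes an expected objective cost.

$$\begin{aligned} \underset{\mu}{\text{minimize}} \quad & \lim_{K \to \infty} E_W\left[\frac{1}{K} \sum_{k=0}^{K-1} g\big(X_k, \mu(X_k), W_k; \theta\big) \,\middle|\, x_0\right] \\ \text{subject to} \quad & X_{k+1} = f\big(X_k, \mu(X_k), W_k\big), \quad \mu(x_k) \in \{0, 1\}, \\ & T_k \in [T_{\text{amb}}, T_{\max}], \quad \text{for all } k = 0, 1, \ldots \end{aligned}$$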

This is a discrete-time, average cost periodic Markov decision problem (MDP).

10 / 31


Page 21: Master's Thesis Slides

Methodology

Finite-Horizon Dynamic Programming

The goal is to minimize over all policies µ, the following function

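$$J_\mu(x_0) = E_W\left[g_{\text{terminal}}(X_M) + \sum_{k=0}^{M-1} g\big(X_k, \mu_k(X_k), W_k; \theta\big) \,\middle|\, x_0\right],$$

where $M$ is the horizon and $g_{\text{terminal}}$ is a terminal cost function.

The optimal policy $\mu^*$ is the minimizer of Bellman’s equations

$$J^*(x_M) = g_{\text{terminal}}(x_M),$$

$$J^*(x_k) = \min_{u_k \in \{0, 1\}} E_{W_k}\Big[g(x_k, u_k, W_k; \theta) + J^*\big(f(x_k, u_k, W_k)\big) \,\Big|\, x_k\Big],$$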

where J∗ is known as the optimal cost-to-go function.
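The Bellman recursion above can be sketched as a backward pass over a finite state set. This is a generic illustration rather than the thesis implementation: it assumes the states have already been discretized, that `f` maps back into `states`, and that `pmf(k, x)` returns a dictionary from disturbance values to probabilities. The function name and argument names are hypothetical.

```python
def backward_dp(states, controls, pmf, f, g, g_terminal, M):
    """Finite-horizon DP by backward recursion.

    Returns the stage-0 cost-to-go and a list of per-stage policies,
    where policy[k][x] is the minimizing decision at stage k in state x.
    """
    J = {x: g_terminal(x) for x in states}
    policy = []
    for k in reversed(range(M)):
        J_new, mu = {}, {}
        for x in states:
            best_u, best_q = None, float("inf")
            for u in controls:
                # Expected stage cost plus cost-to-go of the next state.
                q = sum(p * (g(k, x, u, w) + J[f(k, x, u, w)])
                        for w, p in pmf(k, x).items())
                if q < best_q:
                    best_u, best_q = u, q
            J_new[x], mu[x] = best_q, best_u
        J = J_new
        policy.insert(0, mu)
    return J, policy
```

Ties are broken in favor of the decision listed first in `controls`, which is a choice of this sketch.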

11 / 31


Page 23: Master's Thesis Slides

Methodology

Average Cost Dynamic Programming for Periodic MDP’s

Relative value iteration (VI) can be used to solve average cost periodic MDP’s.

1. Initialize $J$ and $\mu$ arbitrarily and fix a reference state $x_{\text{ref}}$.

2. Calculate the new cost-to-go function $J'$ by solving an $N$-horizon MDP using $J(x_0)$ as the terminal cost function.

3. Update the current cost-to-go function using $J(x_k) \leftarrow J'(x_k) - J'(x_{\text{ref}})$.

4. Repeat steps 2 and 3 until convergence is achieved.

The relative value iteration algorithm terminates with $J$ being a differential cost function, interpreted as the minimum expected $N$-stage costs relative to the reference state $x_{\text{ref}}$; furthermore, the subtracted offset $J'(x_{\text{ref}})$ is interpreted as the average cost of completing a cycle.
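The four steps can be sketched as follows for a finite state set. This is a minimal illustration under stated assumptions: `pmf(k)` returns the disturbance distribution at stage `k` of the cycle as a dictionary, `f` maps back into `states`, and the function returns both the differential cost function and the offset $J'(x_{\text{ref}})$ subtracted in the last sweep. The names are hypothetical.

```python
def relative_value_iteration(states, controls, pmf, f, g, N, x_ref,
                             tol=1e-9, max_iter=10_000):
    """Relative VI for an N-periodic average-cost MDP on finite states."""
    J = {x: 0.0 for x in states}
    cycle_cost = 0.0
    for _ in range(max_iter):
        # Step 2: solve the N-stage problem with J as the terminal cost.
        J_new = J
        for k in reversed(range(N)):
            J_new = {
                x: min(
                    sum(p * (g(k, x, u, w) + J_new[f(k, x, u, w)])
                        for w, p in pmf(k).items())
                    for u in controls
                )
                for x in states
            }
        # Step 3: subtract the value at the reference state.
        cycle_cost = J_new[x_ref]
        J_next = {x: J_new[x] - cycle_cost for x in states}
        # Step 4: stop once the differential costs no longer change.
        change = max(abs(J_next[x] - J[x]) for x in states)
        J = J_next
        if change < tol:
            break
    return J, cycle_cost
```

Subtracting the reference value each sweep keeps the iterates bounded; without it the cost-to-go would grow by roughly one cycle cost per sweep.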

12 / 31


Page 25: Master's Thesis Slides

Methodology

Approximate Dynamic Programming (ADP)

Exact dynamic programming is hard because of the large state space; in particular, $T_k$ is continuous and the dimension of $h_k$ increases at every stage (except the last stage of a cycle). Simplify the model to get a more tractable problem.

1. Temperature Aggregation

2. Usage History Aggregation

3. Approximate Transition Probabilities Using Density Estimation

4. Q-Learning

13 / 31

Page 26: Master's Thesis Slides

Methodology

Temperature Aggregation

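Discretize temperature $T \in \{T_{\text{amb}}, T_{\text{amb}} + \Delta T, \ldots, T_{\text{amb}} + (n-1)\Delta T\}$.

Let $A(T)$ be the following random function of $T$

$$A(T) := \begin{cases} \operatorname{sgn}(T - \bar{T})\,\Delta T, & \text{w.p. } |T - \bar{T}| / \Delta T \\ 0, & \text{w.p. } 1 - |T - \bar{T}| / \Delta T, \end{cases}$$

where $\bar{T} = \operatorname{round}(T / \Delta T)\,\Delta T$. The aggregate problem has the following modified thermodynamics

$$\tilde{T}_{k+1} = \tilde{f}_T(\tilde{T}_k, u_k, w_k) := \operatorname{round}(T_{k+1} / \Delta T)\,\Delta T + A(T_{k+1}),$$

where $T_{k+1} = f_T(\tilde{T}_k, u_k, w_k)$.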
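The random function $A(T)$ amounts to a stochastic rounding rule whose expected value equals the unrounded temperature. A small sketch, with the hypothetical name `aggregate_temperature` and the grid assumed to be multiples of `dT` as in the rounding formula above:

```python
import random


def aggregate_temperature(T, dT, rng=random):
    """Round T to the grid, then step toward T with probability |T - T_bar| / dT.

    Assumption: grid points are integer multiples of dT.
    """
    T_bar = round(T / dT) * dT
    residual = T - T_bar
    if rng.random() < abs(residual) / dT:
        # Move one grid step in the direction of the rounding error.
        return T_bar + (dT if residual > 0 else -dT)
    return T_bar
```

With `T = 40.25` and `dT = 1.0`, the result is 41.0 with probability 0.25 and 40.0 otherwise, so the mean over many draws stays near 40.25 instead of drifting to the nearest grid point.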

14 / 31

Page 27: Master's Thesis Slides

Methodology

Usage History Aggregation

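Here the goal is to find a low-dimensional feature vector $\phi_k$ such that $p_{W_k}(w_k \mid h_k) \approx p_{W_k}(w_k \mid \phi_k)$. We are interested in $\phi_k$ with simple update rules $\phi_{k+1} = f_\phi(\phi_k, w_k)$. For example,

$$\phi^{(1a)}_k = \mathbb{I}\{w_{k-1} > 0\}, \qquad \phi^{(1a)}_{k+1} = \mathbb{I}\{w_k > 0\},$$

$$\phi^{(2a)}_k = \sum_{i = i_{\text{StartUse}}}^{k-1} \mathbb{I}\{w_i > 0\}, \qquad \phi^{(2a)}_{k+1} = \mathbb{I}\{w_k \ne 0\} \cdot \big(\phi^{(2a)}_k + \mathbb{I}\{w_k > 0\}\big),$$

$$\phi^{(3a)}_k = \sum_{i = i_{\text{StartCycle}}}^{k-1} \mathbb{I}\{w_i > 0\}, \qquad \phi^{(3a)}_{k+1} = \mathbb{I}\{\operatorname{mod}(k+1, N) \ne 0\} \cdot \big(\phi^{(3a)}_k + \mathbb{I}\{w_k > 0\}\big).$$

The aggregate problem uses $\tilde{x}_k = (\tilde{T}_k, t_k, \phi_k)$ in place of $x_k$.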
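The three example features can be maintained with one-line updates. In this sketch the function names are illustrative, draws are assumed non-negative, and the cycle counter is assumed to reset to zero whenever stage $k+1$ starts a new cycle, which is what the sum from $i_{\text{StartCycle}}$ implies.

```python
def update_phi_1a(phi, w):
    """Was there a draw in the most recent stage?"""
    return int(w > 0)


def update_phi_2a(phi, w):
    """Number of consecutive stages with a draw in the current use."""
    return phi + 1 if w > 0 else 0


def update_phi_3a(phi, w, k, N):
    """Number of stages with a draw since the start of the cycle."""
    if (k + 1) % N == 0:
        return 0                  # stage k + 1 begins a new cycle
    return phi + int(w > 0)
```

Each feature depends only on its previous value and the current draw, so the aggregate state stays low-dimensional no matter how long the usage history is.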

15 / 31

Page 28: Master's Thesis Slides

Methodology

Approximate Transition Probabilities Using Density Estimation

• A closed-form expression for pW is hard to find.

• Use kernel density estimation to get an estimate of $p_{W_k}(w_k \mid h_k)$.

• Estimation of high-dimensional pdf’s is difficult, so use usage history aggregation to estimate $p_{W_k}(w_k \mid \phi_k)$ instead.

• Use the estimate $\hat{p}_{W_k}(w_k \mid \phi_k)$ to calculate the transition probabilities

$$\Pr\big[\tilde{f}_T(\tilde{T}_k, u_k, W_k) = \tilde{T}_{k+1} \mid \tilde{T}_k, \phi_k, u_k\big] \quad \text{and} \quad \Pr\big[f_\phi(\phi_k, W_k) = \phi_{k+1} \mid \phi_k\big].$$
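A conditional pmf estimate of this kind can be sketched with a hand-written Gaussian kernel evaluated on the discretized usage grid. The kernel choice, the fixed `bandwidth`, and the function names are assumptions for illustration; the slides do not specify which kernel or bandwidth rule was used.

```python
import math
from collections import defaultdict


def kde_pmf(samples, grid, bandwidth):
    """Gaussian-kernel estimate of a pmf on `grid`, normalized to sum to 1."""
    weights = [
        sum(math.exp(-0.5 * ((point - s) / bandwidth) ** 2) for s in samples)
        for point in grid
    ]
    total = sum(weights)
    return [wt / total for wt in weights]


def conditional_pmfs(pairs, grid, bandwidth):
    """Estimate p(w | phi) from observed (phi, w) pairs, one pmf per phi."""
    by_phi = defaultdict(list)
    for phi, w in pairs:
        by_phi[phi].append(w)
    return {phi: kde_pmf(ws, grid, bandwidth) for phi, ws in by_phi.items()}
```

Grouping samples by the feature value keeps each estimate one-dimensional, which is the point of replacing $h_k$ with $\phi_k$.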

16 / 31

Page 29: Master's Thesis Slides

Methodology

Model-Free Q-Learning

• Model-Free Q-Learning involves learning from trajectories of the form $(x_0, u_0), (x_1, u_1), \ldots, (x_p, u_p)$, where $u_k = \mu(x_k)$.

• Q-factors are updated using the following formula

$$Q(x_k, u_k) \leftarrow (1 - \gamma)\, Q(x_k, u_k) + \gamma \Big[g(x_k, u_k, w_k; \theta) + \min_{v_{k+1}} Q(x_{k+1}, v_{k+1})\Big],$$

where $x_{k+1} = f(x_k, u_k, w_k)$ and $0 \le \gamma \le 1$ is the learning rate.

• The policy is updated using $\mu(x_k) \leftarrow \mathbb{I}\{Q(x_k, 1) < Q(x_k, 0)\}$.

• Model-Free Q-Learning does not require knowledge of the transition probabilities, but it suffers from the problem of “Exploration vs. Exploitation”.

• An ε-greedy algorithm can be used to trade off between exploration and exploitation.
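The update rule and the ε-greedy choice can be sketched with a dictionary of Q-factors. This is an illustration only: the table is a `defaultdict` initialized to zero, states are any hashable values, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def q_update(Q, x, u, cost, x_next, gamma, controls=(0, 1)):
    """Blend the old Q-factor with the sampled cost plus best next Q-factor."""
    target = cost + min(Q[(x_next, v)] for v in controls)
    Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target


def greedy(Q, x):
    """Turn the heater on only if that has the strictly lower Q-factor."""
    return int(Q[(x, 1)] < Q[(x, 0)])


def epsilon_greedy(Q, x, eps, rng=random):
    """Explore with probability eps, otherwise follow the greedy policy."""
    if rng.random() < eps:
        return rng.choice((0, 1))
    return greedy(Q, x)
```

A typical loop would create `Q = defaultdict(float)`, pick `u = epsilon_greedy(Q, x, eps)`, observe the stage cost and next state, and then call `q_update`.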

17 / 31

Page 30: Master's Thesis Slides

Methodology

Model-Based Q-Learning

• Model-Based Q-Learning involves learning from usage trajectories $w_0, w_1, \ldots, w_p$.

• The model of the system is used to obtain a family of state-decision pair trajectories corresponding to each usage trajectory.

• The Q-factors are updated using the same formula.

• Model-Based Q-Learning does not require knowledge of the transition probabilities, and it does not have the problem of “Exploration vs. Exploitation”.

18 / 31


Page 32: Master's Thesis Slides

Methodology

Prescient Lower Bound (PLB)

1. Generate/observe a series of usage trajectories.

2. Solve the finite-horizon problem corresponding to these trajectories exactly.

3. The average of the optimal costs is a lower bound for the objective function.

This lower bound represents the minimum possible objective cost, given that hot water usage is known.
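The three steps can be sketched as a deterministic DP per trajectory followed by an average. The sketch assumes a finite state set, a zero terminal cost, a common start state `x0`, and total (unnormalized) cost per trajectory; these are illustrative choices, and the names are hypothetical.

```python
def prescient_cost(w_traj, states, controls, f, g, x0):
    """Optimal cost for one usage trajectory that is known in advance."""
    J = {x: 0.0 for x in states}          # assumed zero terminal cost
    for k in reversed(range(len(w_traj))):
        w = w_traj[k]
        # With w known, the expectation in Bellman's equation disappears.
        J = {x: min(g(k, x, u, w) + J[f(k, x, u, w)] for u in controls)
             for x in states}
    return J[x0]


def prescient_lower_bound(trajectories, states, controls, f, g, x0):
    """Average the per-trajectory optimal costs."""
    costs = [prescient_cost(w, states, controls, f, g, x0)
             for w in trajectories]
    return sum(costs) / len(costs)
```

No implementable policy can beat this average, because each per-trajectory solution is allowed to see the future draws.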

19 / 31


Page 35: Master's Thesis Slides

Results

Numerical Simulations Setup

Figure: Simulate Hot Water Usage Data

20 / 31

Page 36: Master's Thesis Slides

Results

Numerical Simulations Setup

Figure: Hot Water Usage Probability Mass Function

20 / 31

Page 37: Master's Thesis Slides

Results

Numerical Simulations Setup

Figure: Time-Varying Price of Power (vertical axis: Price of Power in $/kW, 0.18 to 0.3; horizontal axis: 0 to 24)

20 / 31


Page 39: Master's Thesis Slides

Results

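Set-Point Methods

The policy of a set-point water heater maps $(T_k, u_{k-1})$ to $u_k$:

$$\mu_{\text{set-point}}(T_k, u_{k-1}; \vartheta) := \begin{cases} 0, & \text{if } T_k > T_{\text{set}}(t_k) + \delta(t_k) \\ 1, & \text{if } T_k < T_{\text{set}}(t_k) - \delta(t_k) \\ u_{k-1}, & \text{otherwise} \end{cases}$$

for all $k = 0, 1, \ldots$, where $\vartheta := \{T_{\text{set}}, \delta\}$.

A simple case occurs when $\delta(t_k) \equiv 0$:

$$\mu_{\text{simple}}\big(T_k; T_{\text{set}}\big) := \mathbb{I}\big\{T_k < T_{\text{set}}(t_k)\big\},$$

for all $k = 0, 1, \ldots$

Relative VI with state $x_k = T_k$ does no worse than simple set-points.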
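Both set-point rules are short enough to write out directly. In this sketch the set-point and dead-band are passed as plain numbers for the current stage, which stands in for the time-varying $T_{\text{set}}(t_k)$ and $\delta(t_k)$; the function names are illustrative.

```python
def set_point_policy(T, u_prev, T_set, delta):
    """Set-point rule with dead-band: hold the previous decision inside the band."""
    if T > T_set + delta:
        return 0
    if T < T_set - delta:
        return 1
    return u_prev


def simple_set_point_policy(T, T_set):
    """Zero dead-band case: heat exactly when below the set-point."""
    return int(T < T_set)
```

With `T_set = 50` and `delta = 3`, a tank at 48 keeps whatever decision it had in the previous stage, which is what prevents rapid on/off switching near the set-point.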

21 / 31

Page 40: Master's Thesis Slides

Results

Simple Set-Point with HECO Pricing

Figure: Discomfort Cost (°C/use) versus Operating Cost ($/day). Series: Simple Set-Point Solution, Dynamic Programming Solution, Prescient Lower Bound. Color scale: Set-Point (°C), 25 to 55.

22 / 31

Page 41: Master's Thesis Slides

Results

Simple Set-Point with HECO Pricing

Figure (zoomed view): Discomfort Cost (°C/use), 0 to 1, versus Operating Cost ($/day), 1.2 to 1.75. Color scale: Set-Point (°C), 25 to 55.

22 / 31

Page 42: Master's Thesis Slides

Results

Simple Set-Point with Constant Pricing

Figure: Discomfort Cost (°C/use) versus Operating Cost ($/day). Series: Simple Set-Point Solution, Dynamic Programming Solution, Prescient Lower Bound. Color scale: Set-Point (°C), 25 to 55.

23 / 31

Page 43: Master's Thesis Slides

Results

Simple Set-Point with Constant Pricing

Figure (zoomed view): Discomfort Cost (°C/use), 0 to 2.4, versus Operating Cost ($/day), 1.2 to 1.6. Color scale: Set-Point (°C), 25 to 55.

23 / 31


Page 45: Master's Thesis Slides

Results

Temperature Aggregation

Figure: Discomfort Cost (°C/use) versus Operating Cost ($/day). Series: Hard, 1; Hard, 1/3; Hard, 1/10; Coarse, 1; Coarse, 1/3; Coarse, 1/10; PLB.

24 / 31

Page 46: Master's Thesis Slides

Results

Temperature Aggregation

Figure (zoomed view): Discomfort Cost (°C/use), 0 to 0.75, versus Operating Cost ($/day), 1.2 to 1.6. Series: Hard, 1; Hard, 1/3; Hard, 1/10; Coarse, 1; Coarse, 1/3; Coarse, 1/10; PLB.

24 / 31


Page 48: Master's Thesis Slides

Results

Usage History Aggregation

Figure: Discomfort Cost (°C/use), 0 to 0.75, versus Operating Cost ($/day), 1.2 to 1.6. Series: ∅, φ(1a), φ(2a), φ(3a), PLB.

25 / 31


Page 51: Master's Thesis Slides

Problem Extension

Solar Water Heating

Let $V_k$ be a random variable representing the solar irradiance at time $t_k$. In practice, we will have to estimate $v_k$ using forecasting methods. Let $\text{efficiency}(v_k)$ convert irradiance into usable power. The modified temperature equation is

$$f_T(T_k, u_k, w_k, v_k) = \max\big\{T_k - r_{\text{cool}}\Delta t\,(T_k - T_{\text{amb}}) + r_{\text{heat}}\Delta t\, u_k - r_{\text{loss}}\Delta t\, w_k + r_{\text{solar}}\Delta t \cdot \text{efficiency}(v_k),\; T_{\text{amb}}\big\},$$

where $r_{\text{solar}}$ is a conversion factor from power to temperature.

26 / 31


Page 53: Master's Thesis Slides

Problem Extensions

Demand Response

Compensate customers for reducing/shifting electricity use.

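Figure: block diagram linking the Water Heater (KDE → $\hat{p}_W$ → DP → $\mu$ → Expected Load $L$) and the Utility, which receives $L$ and returns the price $C$.

The Utility solves

$$\begin{aligned} \underset{C}{\text{minimize}} \quad & \sum_{k=1}^{n} \big(a L_k^2 + b L_k + c\big) \\ \text{subject to} \quad & L = f'(C), \\ & \frac{1}{n} \sum_{k=1}^{n} C(k) = C_{\text{avg}}, \\ & C_{\min} \le C(k) \le C_{\max}. \end{aligned}$$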

27 / 31

Page 54: Master's Thesis Slides

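Problem Extensions - Automated Demand Response

Heuristic for Setting Price

Find $\beta_1, \beta_2 \ge 0$ such that

$$C = \beta_1 L + \beta_2, \qquad \frac{1}{N} \sum_{k=1}^{N} C(k) = C_{\text{avg}}, \qquad C_{\min} \le C(k) \le C_{\max},$$

and $\beta_1$ is maximal. The averaging constraint fixes $\beta_2 = C_{\text{avg}} - \beta_1 L_{\text{avg}}$, and both price bounds limit $\beta_1$ from above, so the closed-form solution is

$$\beta_1^* = \min\left\{\frac{C_{\max} - C_{\text{avg}}}{L_{\max} - L_{\text{avg}}},\; \frac{C_{\min} - C_{\text{avg}}}{L_{\min} - L_{\text{avg}}}\right\} \quad \text{and} \quad \beta_2^* = C_{\text{avg}} - \beta_1^* L_{\text{avg}}.$$

The update is

$$C \leftarrow (1 - \eta)\, C + \eta\,(\beta_1^* L + \beta_2^*).$$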
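A worked sketch of the price heuristic follows. It rests on one derivation worth stating: with $\beta_2 = C_{\text{avg}} - \beta_1 L_{\text{avg}}$, the price at stage $k$ is $C_{\text{avg}} + \beta_1 (L_k - L_{\text{avg}})$, so the upper bound binds at $L_{\max}$ and the lower bound at $L_{\min}$, and the largest feasible slope is the smaller of the two ratios. The function names and the flat-load fallback are assumptions of this example.

```python
def price_coefficients(L, C_avg, C_min, C_max):
    """Largest slope beta1 (and matching beta2) that keeps prices in bounds."""
    L_avg = sum(L) / len(L)
    L_max, L_min = max(L), min(L)
    if L_max == L_min:
        return 0.0, C_avg         # flat load: assumed fallback to a flat price
    beta1 = min((C_max - C_avg) / (L_max - L_avg),
                (C_min - C_avg) / (L_min - L_avg))
    return beta1, C_avg - beta1 * L_avg


def price_update(C, L, C_avg, C_min, C_max, eta):
    """Move the price profile a fraction eta toward beta1 * L + beta2."""
    beta1, beta2 = price_coefficients(L, C_avg, C_min, C_max)
    return [(1 - eta) * c + eta * (beta1 * l + beta2) for c, l in zip(C, L)]
```

For the load profile `[1.0, 1.0, 4.0]` with prices bounded in `[0.2, 0.3]` around an average of `0.25`, the slope is limited by the upper bound at the peak, and the target prices come out near `[0.225, 0.225, 0.3]`.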

28 / 31

Page 55: Master's Thesis Slides

Problem Extensions

Automated Demand Response Simulation

29 / 31


Page 57: Master's Thesis Slides

Conclusion

Summary

• Formulated the problem of minimizing a weighted sum of operating and discomfort costs as an average cost MDP.

• Considered approximate DP methods such as aggregation, density estimation, and Q-Learning.

• Approximate DP is at least as good as simple set-points.

• Applications of water heaters optimized with approximate DP include solar water heating and automated demand response.

• A longer cycle (e.g., a week or a month) should be considered.

• Non-stationary usage patterns should be considered.

30 / 31

Page 58: Master's Thesis Slides

Thank You

31 / 31