Optimization and machine learning for smart-microgrids · Optimization and machine learning for smart-microgrids Bertrand Cornélusse September 2017 overview

Optimization and machine learning for smart-microgrids

Bertrand Cornélusse September 2017

overview

About the Montefiore Institute

Electrical Engineering and Computer Science department of the University

Engineering degrees in electronics, power systems, computer science, data science

In the power systems group:

• Challenge power system design and regulation

• New management of security in power systems (Garpur project)

• Global-grid concept

• Micro-grid concept (grid-tied)

A bit about my background

PhD: scheduling of EDF’s generation assets

Management and design of the European day-ahead market algorithm (Euphemia)

I apply optimization and machine learning to power systems

Active management of distribution networks and hosting capacity computation (GREDOR project coordination)

Microgrids

A (grid-tied) microgrid offers many value creation mechanisms

Function | Description | BSS*
Energy markets | Decide on the price you are willing to pay or sell at | ++
Ancillary services | Sell services to the grid | ++
Peak reduction | Through local and community optimization | ++
UPS functionality | Operate in islanded mode | ++
Efficiency | Through optimized load and generation management |
Community | Exchange energy locally at a preferred tariff | ++

*BSS: Battery Storage System

A few definitions

Single-user microgrid: loads, generation, storage, a network, a connection to the grid

Community microgrid: a group of single-user microgrids + a microgrid operator

[Figure: community microgrid with MG 1, MG 2, Storage and Operator, connected to the Public grid]

Advantages for the public grid

Function | Description | BSS*
Peak reduction / flow management | Momentarily set constraints to the microgrid | ++
Voltage support | Reactive power flexibility of battery storage and PV | ++
Phase balancing | Using storage DC buffer | ++
Power factor correction | Flexibility of inverters | ++
Frequency support | Primary or secondary reserve | ++

*BSS: Battery Storage System

“These advantages can be offered only by a smart microgrid energy management system”

– The Smart(micro)grids Team –

A standard energy management system

• Energy monitoring

• Fixed rules for storage operation
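As an illustration of what "fixed rules" can mean, here is a minimal sketch of one rule-based control step: store any PV surplus, discharge on any deficit, and let the grid cover the rest. The function name, the sign conventions and the one-hour time step are assumptions made for this example, not a description of a specific product.

```python
def rule_based_step(pv_kw, load_kw, soc_kwh, capacity_kwh, max_power_kw, dt_h=1.0):
    """One step of a fixed-rule storage controller (hypothetical example).

    Returns (battery_kw, grid_kw, new_soc_kwh) where
    battery_kw > 0 means charging and grid_kw > 0 means importing.
    """
    surplus = pv_kw - load_kw
    if surplus >= 0:
        # Rule 1: charge with the PV surplus, limited by power rating and free room.
        room_kw = (capacity_kwh - soc_kwh) / dt_h
        battery_kw = min(surplus, max_power_kw, room_kw)
    else:
        # Rule 2: discharge to cover the deficit, limited by power rating and stored energy.
        available_kw = soc_kwh / dt_h
        battery_kw = -min(-surplus, max_power_kw, available_kw)
    # The public grid balances whatever the battery does not absorb or supply.
    grid_kw = load_kw - pv_kw + battery_kw
    return battery_kw, grid_kw, soc_kwh + battery_kw * dt_h
```

Such rules ignore prices and forecasts: the battery is used the same way whether tomorrow is sunny or not.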


A smart microgrid energy management system …

• exploits data to make the microgrid flexible and robust, and to extract the maximum value!

• has a community management feature


Functional modules that exploit data

Modules of the smart EMS

• Monitoring: pull and store data

• Analytics: present data, decisions and results

• Forecasting: forecast consumption and production using past data

• State estimation: calibrate models using data

• Operational planning: take decisions for the next day

• Real-time control: take decisions for the next seconds

• Energy market participation: participate actively in energy markets

• Reserve market participation: participate actively in reserve markets

Arrows indicate a dependency between functional modules, not a flow of information!

We also plan to implement a design / sizing tool based on these components

A combination of AI methods

Discipline | Description
Machine learning | Deep neural nets for forecasting
Stochastic optimization | Mixed Integer Programming formulations of operational planning problems
Reinforcement learning | Autocalibration of operational policies
Model predictive control | For the real-time battery management problem

Community management (1/2)

• Each member of the community can decide, at every moment, either to connect to the public grid or to the community microgrid

✦ This is not physical, this is accounting

✦ Members have their own retailer

• Each member only sees data pertaining to their own activities, plus information for the community


Community management (2/2)

• Members see what their optimal profit would be if they were not part of the community (selfish profit)

✦ Community always makes at least the same profit as the sum of selfish profits

✦ Fairness achieved through an “a priori” repartition rule
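The proposed method itself is not described on these slides, so the following is only a toy single-period sketch of the two properties stated above. It assumes each member settles a net energy position with the public grid at a buy price that is at least the sell price, and it uses a hypothetical weight-based rule, fixed in advance, to share the community gain. All names and numbers are illustrative.

```python
def settlement_cost(net_kwh, buy_price, sell_price):
    """Cost of settling a net position with the public grid (negative = revenue)."""
    return net_kwh * buy_price if net_kwh >= 0 else net_kwh * sell_price

def community_vs_selfish(net_positions, buy_price, sell_price):
    """Return (selfish profit per member, community profit) for one period."""
    selfish = {m: -settlement_cost(e, buy_price, sell_price)
               for m, e in net_positions.items()}
    # The community nets members' positions before settling with the grid.
    community = -settlement_cost(sum(net_positions.values()), buy_price, sell_price)
    return selfish, community

def share_gain(selfish, community_profit, weights):
    """Hypothetical a priori rule: split the community gain according to fixed weights."""
    gain = community_profit - sum(selfish.values())
    total = sum(weights.values())
    return {m: selfish[m] + gain * weights[m] / total for m in selfish}

# MG 1 needs 3 kWh, MG 2 has 2 kWh of surplus (illustrative values).
selfish, community = community_vs_selfish({"MG 1": 3.0, "MG 2": -2.0}, 0.25, 0.05)
shares = share_gain(selfish, community, {"MG 1": 1.0, "MG 2": 1.0})
```

With a buy price above the sell price, netting positions can only reduce the total bill, so in this toy setting the community profit is never below the sum of selfish profits, and every member ends up at least as well off as alone.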


A paper on the proposed method should be available soon

Operational planning

• Optimize operation by anticipating the evolution of load, generation and prices, taking into account the technical constraints of the microgrid

• Typically with a horizon of one day

• Important to plan the operation of storage systems, and other devices having a highly “time-coupled” behavior such as flexible loads, or steerable generators

• Islanded mode: take preventive decisions to maintain the power to critical loads as long as possible.
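To make the "time-coupled" nature of storage planning concrete, here is a minimal sketch that schedules a battery over a short horizon. It uses dynamic programming on a discretized state of charge rather than the Mixed Integer Programming formulations mentioned earlier, simply to stay dependency-free. Hourly steps, integer kWh levels, a lossless battery and perfect forecasts are all simplifying assumptions, and the function name is hypothetical.

```python
def plan_battery(net_load, buy_price, sell_price, capacity=4, max_power=2, soc0=0):
    """Plan battery actions over the horizon by backward dynamic programming.

    net_load[t] is load minus local generation (kWh) at step t.
    Returns (total_cost, schedule); schedule[t] > 0 charges, < 0 discharges (kWh).
    """
    horizon = len(net_load)
    # cost_to_go[s]: minimal cost from the step after the current one, starting with s kWh stored.
    cost_to_go = [0.0] * (capacity + 1)
    best_action = []
    for t in range(horizon - 1, -1, -1):
        new_cost = [float("inf")] * (capacity + 1)
        actions = [0] * (capacity + 1)
        for soc in range(capacity + 1):
            for a in range(-max_power, max_power + 1):
                nxt = soc + a
                if nxt < 0 or nxt > capacity:
                    continue  # would violate the energy limits of the battery
                grid = net_load[t] + a  # > 0: import, < 0: export
                step = grid * buy_price[t] if grid >= 0 else grid * sell_price[t]
                if step + cost_to_go[nxt] < new_cost[soc]:
                    new_cost[soc] = step + cost_to_go[nxt]
                    actions[soc] = a
        cost_to_go = new_cost
        best_action.append(actions)
    best_action.reverse()
    # Forward pass: follow the stored optimal actions from the initial state of charge.
    schedule, soc = [], soc0
    for t in range(horizon):
        schedule.append(best_action[t][soc])
        soc += schedule[-1]
    return cost_to_go[soc0], schedule

# Two hours of PV surplus followed by two hours of expensive demand (illustrative data).
cost, schedule = plan_battery([-2, -2, 3, 3], [0.1, 0.1, 0.3, 0.3], [0.05] * 4)
```

In this example the plan stores the surplus first and releases it during the expensive hours, a decision that cannot be made by looking at one time step in isolation.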

[Figure: modules of the smart EMS, with Operational planning in focus: take decisions for next day]

Advanced energy/ancillary services market participation

• Optimal bidding in day-ahead market using anticipated load, generation, and prices.

• Adjust energy exchanges in intra-day market to match changes in load, generation, and prices.

• Exploit balancing opportunities by reacting to the TSO’s signals.

• Provide remunerated flexibility margins that the TSO can activate for balancing purposes.

[Figure: modules of the smart EMS, with the energy markets and reserve markets modules in focus]

We are applying these concepts to a pilot project

[Figure: pilot microgrid with Hydro-gen, SME 1, SME 2, SME 3, BSS and PV panels]

With the support of the Walloon Government, in collaboration with Nethys, CE+T, Sirris, MeryTherm, SPI

Example of application of machine learning

Deep Reinforcement Learning Solutions for Energy Microgrids Management

Vincent François-Lavet

David Taralla

Damien Ernst

Raphael Fonteneau

Department of Electrical Engineering and Computer Science, University of Liege, Belgium

Abstract

This paper addresses the problem of efficiently operating the storage devices in an electricity microgrid featuring photovoltaic (PV) panels with both short- and long-term storage capacities. The problem of optimally activating the storage devices is formulated as a sequential decision making problem under uncertainty where, at every time-step, the uncertainty comes from the lack of knowledge about future electricity consumption and weather dependent PV production. This paper proposes to address this problem using deep reinforcement learning. To this purpose, a specific deep learning architecture has been designed in order to extract knowledge from past consumption and production time series as well as any available forecasts. The approach is empirically illustrated in the case of a residential customer located in Belgium.

1. Introduction

An electricity microgrid is an energy system consisting of local electricity generation, local loads (or energy consumption) and storage capacities. In this paper, we consider microgrids that are provided with different types of storage devices in order to be able to address both short- and long-term fluctuations of electricity production using photovoltaic (PV) panels (typically, batteries for short-term fluctuations, and hydrogen/fuel cells for long-term fluctuations). Distinguishing short- from long-term storage is mainly a question of cost: batteries are currently too expensive to be used for addressing seasonal variations. Energy microgrids face a dual stochastic-deterministic structure: one of the main challenges to meet when operating microgrids is to find storage strategies capable of handling uncertainties related to future electricity production and consumption; besides this, microgrids also have the characteristic that their dynamics react deterministically to storage management actions.

In this paper, we propose to design a storage management strategy which exploits this characteristic. We assume that we have access to: (i) an accurate simulator of the (deterministic) dynamics of a microgrid and (ii) time series describing past load and production profiles, which are realizations of some unknown stochastic processes. In this context, we propose to design a deep Reinforcement Learning (RL) agent (Mnih et al. (2015)) for approximating the optimal strategy through interaction with the environment. The deep RL algorithm proposed in this paper has been specifically designed for this setting, which is original in the sense that the environment is partly described with a deterministic simulator (from which we can generate as much data as necessary), and partly with a limited batch of real

François-Lavet, Vincent, et al. "Deep reinforcement learning solutions for energy microgrids management." European Workshop on Reinforcement Learning. 2016.

Assumptions

• We assume that we have access to:

✦ an accurate simulator of the dynamics of a microgrid

✦ time series describing past load and production profiles, which are realizations of some unknown stochastic processes



Architecture of the deep neural net for learning the state-action value function Q(s,a)

(i) a base case with minimal information available to the agent:

$s_t = \left[ [c_{t-h_c}, \ldots, c_{t-1}], [\phi_{t-h_p}, \ldots, \phi_{t-1}], s_t^{MG} \right]$

where $h_c = 12$ h and $h_p = 12$ h are the lengths of the time series considered as input to the neural network (consumption and production respectively);

(ii) the case where information on the season is provided:

$s_t = \left[ [c_{t-h_c}, \ldots, c_{t-1}], [\phi_{t-h_p}, \ldots, \phi_{t-1}], s_t^{MG}, \zeta_s \right]$

where $\zeta_s$ is the smallest number of days to the solar solstice (21st of June), which is then normalized into [0,1];

(iii) the case where accurate production forecasting is available:

$s_t = \left[ [c_{t-h_c}, \ldots, c_{t-1}], [\phi_{t-h_p}, \ldots, \phi_{t-1}], s_t^{MG}, \zeta_s, \rho_{24}, \rho_{48} \right]$

where $\rho_{24}$ (resp. $\rho_{48}$) is the (known) solar production for the next 24 hours (resp. 48 hours).
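The three state definitions can be sketched as a single helper that appends the optional features. This is an illustration only: it assumes hourly samples (so 12 h means 12 values), a flat list as state, a microgrid state passed as a list of floats, and a normalization of the season feature by 183 days; none of these details are specified in the excerpt.

```python
from datetime import date

def days_to_solstice(day):
    """Smallest number of days between `day` and a 21st of June (previous, current or next year)."""
    solstices = [date(y, 6, 21) for y in (day.year - 1, day.year, day.year + 1)]
    return min(abs((day - s).days) for s in solstices)

def season_feature(day, max_days=183):
    """Season feature normalized into [0, 1]; dividing by 183 is an assumption."""
    return days_to_solstice(day) / max_days

def build_state(consumption, production, t, s_mg, day=None, forecasts=None, h_c=12, h_p=12):
    """Assemble the state at step t as a flat list.

    Case (i): past consumption, past production and microgrid state.
    Case (ii): pass `day` to append the season feature.
    Case (iii): also pass `forecasts` = (production next 24 h, production next 48 h).
    """
    if t < max(h_c, h_p):
        raise ValueError("not enough history before step t")
    state = list(consumption[t - h_c:t]) + list(production[t - h_p:t]) + list(s_mg)
    if day is not None:
        state.append(season_feature(day))
    if forecasts is not None:
        state.extend(forecasts)
    return state
```

For example, `build_state(c, p, 12, [0.5])` gives the base case, while adding `day=date(2016, 6, 21)` and `forecasts=(3.0, 5.0)` gives the third case.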

4.1 Neural network architecture

We propose a Neural Network (NN) architecture where the inputs are provided by the state vector, and where each separate output represents the Q-values for each discretized action. Possible actions a are whether to charge or discharge the hydrogen storage device with the assumption that the batteries handle at best the current demand (avoid any value of loss load whenever possible). We consider three discretized actions: (i) discharge at full rate the hydrogen storage, (ii) keep it idle or (iii) charge it at full rate.

The NN processes time series thanks to a set of convolutions with 16 filters of $2 \times 1$ with stride 1 followed by a convolution with 16 filters of $2 \times 2$ with stride 1. The output of the convolutions as well as the other inputs are then followed by two fully connected layers with 50 and 20 neurons and the output layer. The activation function used is the Rectified Linear Unit (ReLU) except for the output layer where no activation function is used.

[Figure: Input #1, Input #2, Input #3, ... feed the convolutions, followed by the fully connected layers and the outputs]

Figure 1: Sketch of the structure of the NN architecture. The NN processes time series thanks to a set of convolutional layers. The output of the convolutions as well as the other inputs are followed by fully connected layers and the output layer. Architectures based on LSTMs instead of convolutions obtain close results and the reader is welcome to experiment with the source code.

4.2 Splitting time series to avoid overfitting

We consider the case where the agent is provided with two years of actual past realizations of $(c_t)$ and $(\phi_t)$. In order to avoid overfitting, these past realizations are split into a training environment ($y = 1$) and a validation environment ($y = 2$). The training environment is used to train the policy while the validation environment is used at each epoch to estimate

• Time series of consumption, etc.

• Convolutional layers to extract meaningful features from time series

• Value of decision to store in a given state of the microgrid


Example results (long term + short term storage)

how well the policy performs on the undiscounted objective $M_y$ and selects the final NN. The final NN is then used in a test environment ($y = 3$) to provide an independent estimation on how well the policy performs.
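The splitting and selection logic can be sketched in a few lines. This is a simplified illustration: it assumes the data is one long series cut into consecutive one-year environments, and that a validation score has been recorded after each epoch. Both function names are hypothetical.

```python
def split_environments(series, steps_per_year):
    """Cut a time series into consecutive one-year environments y = 1, 2, 3, ..."""
    n_years = len(series) // steps_per_year
    return {y + 1: series[y * steps_per_year:(y + 1) * steps_per_year]
            for y in range(n_years)}

def select_final_network(validation_scores):
    """Return the index of the epoch whose network scored best on the validation environment."""
    return max(range(len(validation_scores)), key=lambda k: validation_scores[k])

# Toy data: 3 "years" of 3 steps each; y = 1 trains, y = 2 validates, y = 3 tests.
envs = split_environments(list(range(9)), steps_per_year=3)
best_epoch = select_final_network([1.0, 3.5, 2.0])
```

Only the network selected on the validation environment is evaluated on the test environment, so the test score is not used for any choice.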

4.3 Training

By starting with a random Q-network, we perform at each time step the update given in Eq. 1 and, in the meantime, we fill up a replay memory with all observations, actions and rewards using an agent that follows an $\epsilon$-greedy policy s.t. the policy $\pi(s) = \arg\max_{a \in \mathcal{A}} Q(s, a; \theta_k)$ is selected with a probability $1 - \epsilon$, and a random action (with uniform probability over actions) is selected with probability $\epsilon$. We use a decreasing value of $\epsilon$ over time. During the validation and test phases, the policy $\pi(s) = \arg\max_{a \in \mathcal{A}} Q(s, a; \theta_k)$ is applied (with $\epsilon = 0$). As discussed in François-Lavet et al. (2015), we use an increasing discount factor along with a decreasing learning rate through the learning epochs so as to enhance learning performance.
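A minimal sketch of the $\epsilon$-greedy action choice and of a decreasing $\epsilon$ is given below. The linear decay and its start, end and duration values are assumptions chosen for illustration; the excerpt only states that $\epsilon$ decreases over time.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly decrease epsilon from `start` to `end` over `decay_steps` steps (assumed schedule)."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)

# Three discretized actions: 0 = discharge, 1 = idle, 2 = charge the hydrogen storage.
q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)  # validation/test behaviour
```

Setting `epsilon=0.0` reproduces the purely greedy policy used during the validation and test phases.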

4.4 Results and discussions

We consider a robust microgrid sizing provided by François-Lavet et al. (2016). The size of the battery is $x^{B} = 15$ kWh, the instantaneous power of the hydrogen storage is $x^{H_2} = 1.1$ kW and the peak power generation of the PV installation is $x^{PV} = 12$ kWp. We first run the base case with minimal information available. The selected policy is based on the best validation score. The typical behaviour of the policy is illustrated in Figure 2 (test data). Since the microgrid has no information about the future, it builds up (during the night) a sufficient reserve in the short-term storage device so as to be able to face the next day consumption without suffering too much loss load. It also avoids wasting energy (when the short-term storage is full) by storing in the long-term storage device whenever possible.

[Figure panels: (a) Typical policy during summer; (b) Typical policy during winter]

Figure 2: Computed policy with minimal information available to the agent. H action = 0 means discharging the hydrogen reserve at maximum rate; H action = 1 means doing nothing with the hydrogen reserve; H action = 2 means building up the hydrogen reserve at maximum rate.

We now investigate the effect of providing additional information to the agent. We report in Figure 3(a) the operational revenue on the test data $M_y^{\pi_q}$ for the three cases as a function of a unique percentage of the initial sizings $x^{B}$, $x^{H_2}$, $x^{PV}$. For each configuration, we run the


Other flavors of learning

• Transfer learning: can we reuse a deep neural network on another microgrid?

• Imitative learning: Learn from “optimal” trajectories => optimize offline, learn, apply learned strategy online


We are also developing a microgrid laboratory


Microgrid laboratory

Contact

Prof. Bertrand Cornélusse

Smart microgrids – Montefiore Institute – Chaire Nethys

Electrical engineering and computer science department


Allée de la découverte 10, B-4000 Liège (Belgium)

+32 (0)477 32 42 75
