Predicting Electricity Distribution Feeder Failures using Machine Learning
Marta Arias 1, Hila Becker 1,2
1 Center for Computational Learning Systems, 2 Computer Science, Columbia University
LEARNING ‘06
Overview of the Talk
Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
Early solution using MartiRank, a boosting-like algorithm for ranking
Current solution using Online learning
Related projects
The Electrical System: 1. Generation, 2. Transmission, 3. Primary Distribution, 4. Secondary Distribution
Electricity Distribution: Feeders
Problem
Distribution feeder failures result in automatic feeder shutdowns called "Open Autos" or O/As
O/As stress networks, control centers, and field crews
O/As are expensive ($ millions annually)
Proactive replacement is much cheaper and safer than reactive repair
Our Solution: Machine Learning
Leverage Con Edison's domain knowledge and resources
Learn to rank feeders based on susceptibility to failure
How?
Assemble data
Train model based on past data
Re-rank frequently using model on current data
New York City
Some facts about feeders and failures
About 950 feeders: 568 in Manhattan, 164 in Brooklyn, 115 in Queens, 94 in the Bronx
Some facts about feeders and failures
About 60% of feeders failed at least once; on average, feeders failed 4.4 times (between June 2005 and August 2006)
Some facts about feeders and failures
Mostly 0-5 failures per day, with more in the summer: strong seasonality effects
Feeder data
Static data: compositional/structural, electrical
Dynamic data: outage history (updated daily), load measurements (updated every 5 minutes)
Roughly 200 attributes for each feeder; new ones are still being added
Feeder Ranking Application
Goal: rank feeders according to likelihood of failure (if high risk, place near the top)
Application needs to integrate all types of data
Application needs to react and adapt to incoming dynamic data; hence, the feeder ranking is updated every 15 minutes
Application Structure
[Architecture diagram: static data, outage data, transformer (Xfmr) stress data, and feeder load data feed a SQL Server DB; an ML Engine maintains ML Models and Rankings, which feed the Decision Support application (Decision Support GUI, Action Driver, Action Tracker).]
Goal: rank feeders according to likelihood of failure
Overview of the Talk
Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
Early solution using MartiRank, a boosting-like algorithm for ranking: pseudo ROC and pseudo AUC, MartiRank, performance metric, early results
Current solution using Online learning
Related projects
(pseudo) ROC
[Figure: feeders sorted by score, with per-feeder outage counts (e.g. 0, 0, 0, 1, 2, 1, 3).]
[Figure: cumulative number of outages (210 total) vs. number of feeders (941), and the same curve normalized to fraction of outages.]
[Figure: fraction of outages vs. fraction of feeders, both in [0,1]; the pseudo AUC is the area under this ROC curve.]
Some observations about the (p)ROC
Adapted to positive labels (not just 0/1)
Best pAUC is not always 1 (actually it almost never is)
E.g. for the ranking below: pAUC = 11/15 = 0.73; the "best" pAUC achievable with this data is 14/15 = 0.93, corresponding to the ranking 2, 1, 0, 0, 0

ranking  outages
1        1
2        0
3        2
4        0
5        0
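As a concrete illustration (not from the talk), a minimal Python sketch of this pseudo AUC for count-valued labels might look as follows; pseudo_auc is a hypothetical helper name, and the input is the list of outage counts in ranked order (best-ranked feeder first).

    def pseudo_auc(outage_counts):
        """Pseudo AUC: area under the curve of cumulative fraction of outages
        vs. fraction of feeders, for outage counts in ranked order."""
        n = len(outage_counts)
        total = sum(outage_counts)
        cumulative = 0
        area = 0.0
        for count in outage_counts:
            cumulative += count
            area += cumulative / total   # step height after this feeder
        return area / n                  # each step has width 1/n

    # The examples from the slide:
    print(pseudo_auc([1, 0, 2, 0, 0]))   # 11/15 ~ 0.73
    print(pseudo_auc([2, 1, 0, 0, 0]))   # 14/15 ~ 0.93 (best achievable here)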
MartiRank
Boosting-like algorithm by [Long & Servedio, 2005]
Greedy, maximizes pAUC at each round; adapted to ranking
Weak learners are sorting rules: each attribute is a sorting rule
Attributes are numerical only; if categorical, then convert to an indicator vector of 0/1
MartiRank (illustration)
Feeder list begins in random order
Round 1: sort the list by the "best" variable
Round 2: divide the list in two, splitting outages evenly; choose a separate "best" variable for each part and sort
Round 3: divide the list in three, splitting outages evenly; choose a separate "best" variable for each part and sort
continue…
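The talk gives no code, but a rough Python sketch of this greedy procedure, under stated assumptions, might look as follows. The names martirank, best_sort, and split_evenly are hypothetical; each feeder is assumed to be a dict of numeric attributes plus an "outages" count, and "best" means highest pAUC after sorting.

    def pseudo_auc(outage_counts):
        n, total = len(outage_counts), sum(outage_counts)
        cum, area = 0, 0.0
        for c in outage_counts:
            cum += c
            area += cum / total
        return area / n

    def best_sort(block, attributes):
        """Sort this block by the attribute (and direction) maximizing its pAUC."""
        if sum(f["outages"] for f in block) == 0:
            return block                      # nothing to discriminate in this block
        best = None
        for attr in attributes:
            for reverse in (False, True):
                candidate = sorted(block, key=lambda f: f[attr], reverse=reverse)
                score = pseudo_auc([f["outages"] for f in candidate])
                if best is None or score > best[0]:
                    best = (score, candidate)
        return best[1]

    def split_evenly(ranked, parts):
        """Split a ranked list into `parts` contiguous blocks with roughly
        equal numbers of outages in each block."""
        total = sum(f["outages"] for f in ranked)
        blocks, current, cum = [], [], 0
        for f in ranked:
            current.append(f)
            cum += f["outages"]
            if len(blocks) < parts - 1 and cum >= total * (len(blocks) + 1) / parts:
                blocks.append(current)
                current = []
        blocks.append(current)
        return blocks

    def martirank(feeders, attributes, rounds=4):
        ranked = list(feeders)                # begins in (random) input order
        for r in range(1, rounds + 1):
            blocks = split_evenly(ranked, r)  # r blocks, outages split evenly
            ranked = []
            for block in blocks:
                # choose a separate "best" variable for each part, then sort
                ranked.extend(best_sort(block, attributes))
        return ranked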
MartiRank
Advantages: fast, easy to implement, interpretable, only 1 tuning parameter ("number of rounds")
Disadvantages: 1 tuning parameter ("number of rounds"), which was set to 4 manually
Using MartiRank for real-time ranking of feeders
MartiRank is a "batch" algorithm, hence it must deal with the changing system by:
Continually generating new datasets with the latest data: use data within a window, aggregating dynamic data within that period in various ways (quantiles, counts, sums, averages, etc.); see the sketch after this list
Re-training a new model and throwing out the old model (seasonality effects are not taken into account)
Using the newest model to generate the ranking
Must implement "training strategies": re-train daily, or weekly, or every 2 weeks, or monthly, or…
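A minimal sketch of the kind of window aggregation described above, assuming a pandas DataFrame of 5-minute load measurements with hypothetical columns feeder_id, timestamp, and load_mw:

    import pandas as pd

    def aggregate_window(load, start, end):
        """Aggregate dynamic load data within [start, end) into one row of
        derived attributes per feeder (counts, sums, averages, quantiles)."""
        window = load[(load["timestamp"] >= start) & (load["timestamp"] < end)]
        return window.groupby("feeder_id")["load_mw"].agg(
            load_count="count",
            load_sum="sum",
            load_mean="mean",
            load_q25=lambda s: s.quantile(0.25),
            load_q75=lambda s: s.quantile(0.75),
        )

The resulting per-feeder rows would then be joined with the static attributes to form one training dataset per window.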
Performance Metric

avgRank = 1 − ( Σ_i rank(failure_i) ) / ( #failures × #feeders )

Normalized average rank of failed feeders
Closely related to the (pseudo) Area-Under-ROC-Curve: when labels are 0/1, pAUC = avgRank + 1/#examples
Essentially, the difference comes from the 0-based pAUC vs. the 1-based ranks
Performance Metric Example

1 − (2 + 3 + 5) / (3 × 8) = 0.5833

ranking  outages
1        0
2        1
3        1
4        0
5        1
6        0
7        0
8        0

(For the same ranking, pAUC = 17/24 ≈ 0.71.)
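A small Python sketch of this normalized average-rank metric (avg_rank is a hypothetical helper name; the input is the outage count at each rank position, best-ranked first):

    def avg_rank(outage_counts):
        """1 - (sum of 1-based ranks of failures) / (#failures * #feeders);
        a rank position with k outages contributes its rank k times."""
        n = len(outage_counts)
        failures = sum(outage_counts)
        rank_sum = sum(rank * count for rank, count in enumerate(outage_counts, start=1))
        return 1 - rank_sum / (failures * n)

    outages = [0, 1, 1, 0, 1, 0, 0, 0]   # the example above
    print(avg_rank(outages))             # 1 - (2+3+5)/(3*8) = 0.5833
    # With 0/1 labels, pAUC = avg_rank + 1/#examples: 0.5833 + 1/8 = 17/24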
How to measure performance over time
Every ~15 minutes, generate a new ranking based on the current model and the latest data
Whenever there is a failure, look up its rank in the latest ranking before the failure
After a whole day, compute the normalized average rank
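A schematic sketch of this evaluation protocol (all names hypothetical; rankings maps a generation timestamp to the ranking produced at that time):

    def daily_avg_rank(rankings, failures, n_feeders):
        """rankings: {timestamp: {feeder_id: rank}} for one day;
        failures: list of (timestamp, feeder_id).
        Returns the normalized average rank of the day's failures."""
        rank_sum = 0
        for failed_at, feeder in failures:
            # latest ranking generated strictly before the failure
            latest = max(t for t in rankings if t < failed_at)
            rank_sum += rankings[latest][feeder]
        return 1 - rank_sum / (len(failures) * n_feeders)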
MartiRank Comparison: training every 2 weeks
Using MartiRank for real-time ranking of feeders
MartiRank seems to work well, but…
The user decides when to re-train
The user decides how much data to use for re-training
… and other things like setting parameters, selecting algorithms, etc.
Want to make the system 100% automatic!
Idea: still use MartiRank, since it works well with this data, but keep/re-use all models
Overview of the Talk
Introduction to the Electricity Distribution Network of New York City: what are we doing and why?
Early solution using MartiRank, a boosting-like algorithm for ranking
Current solution using Online learning: overview of learning from expert advice and the Weighted Majority Algorithm, new challenges in our setting and our solution, results
Related projects
Learning from expert advice
Consider each model as an expert
Each expert has an associated weight (or score)
Reward/penalize experts for good/bad predictions
The weight is a measure of confidence in the expert's predictions
Predict using a weighted average of the top-scoring experts
Learning from expert advice
Advantages:
Fully automatic: no human intervention needed
Adaptive: changes in the system are learned as it runs
Can use many types of underlying learning algorithms
Good performance guarantees from learning theory: performance is never too far off from the best expert in hindsight
Disadvantages:
Computational cost: need to track many models "in parallel"
Models are harder to interpret
Weighted Majority Algorithm [Littlestone & Warmuth '88]
Introduced for binary classification
Experts make predictions in [0,1] and obtain losses in [0,1]
Pseudocode: the learning rate β in (0,1] is the main parameter; there are N "experts", each with initial weight 1
For t = 1, 2, 3, …
  Predict using the weighted average of the experts' predictions
  Obtain the "true" label; each expert i incurs loss l_i
  Update the experts' weights: w_i,t+1 = w_i,t · pow(β, l_i)
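A minimal sketch of this multiplicative update in Python (WeightedMajority is a hypothetical class name; predictions and losses are in [0,1] as above):

    class WeightedMajority:
        def __init__(self, n_experts, beta=0.9):
            self.beta = beta                   # learning rate in (0, 1]
            self.weights = [1.0] * n_experts   # initial weight 1 for all experts

        def predict(self, expert_predictions):
            """Weighted average of the experts' predictions."""
            total = sum(self.weights)
            return sum(w * p for w, p in zip(self.weights, expert_predictions)) / total

        def update(self, expert_losses):
            """Multiplicative update: w_i <- w_i * beta**l_i."""
            self.weights = [w * self.beta ** l
                            for w, l in zip(self.weights, expert_losses)]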
In our case, we can't use WM directly
We use ranking as opposed to binary classification
More importantly, we do not have a fixed set of experts
Dealing with ranking vs. binary classification
Ranking loss is the normalized average rank of failures (as seen before), so the loss is in [0,1]
To combine rankings, use a weighted average of the feeders' ranks, as sketched below
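A small sketch of combining rankings this way (combine_rankings is a hypothetical name; each expert's ranking is a dict mapping feeder id to its rank, with 1 = most susceptible):

    def combine_rankings(rankings, weights):
        """Weighted average of each feeder's rank across experts, then
        re-rank feeders by that average (lowest average rank first)."""
        total_weight = sum(weights)
        feeders = rankings[0].keys()
        avg_rank = {
            f: sum(w * r[f] for w, r in zip(weights, rankings)) / total_weight
            for f in feeders
        }
        return sorted(feeders, key=lambda f: avg_rank[f])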
Dealing with a moving set of experts
Introduce new parameters:
B: "budget" (max number of models), set to 100
p: new models' weight percentile, in [0,100]
an age penalty in (0,1]
When training new models, add them to the set of models with a weight corresponding to the p-th percentile of the current weights
If there are too many models (more than B), drop the models with poor q-score, where q_i = w_i · pow(age penalty, age_i)
I.e., the age penalty is the rate of exponential decay
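A schematic sketch of this bookkeeping (all names hypothetical; the age-penalty symbol is not legible in the extracted slides, so it is spelled out as age_penalty):

    import numpy as np

    class ExpertPool:
        def __init__(self, budget=100, percentile=50, age_penalty=0.99):
            self.budget = budget              # B: max number of models
            self.percentile = percentile      # p: weight percentile for new models
            self.age_penalty = age_penalty    # rate of exponential decay, in (0, 1]
            self.experts = []                 # dicts with keys: model, weight, age

        def q_score(self, expert):
            return expert["weight"] * self.age_penalty ** expert["age"]

        def add_model(self, model):
            # New model enters at the p-th percentile of the current weights
            weights = [e["weight"] for e in self.experts]
            weight = np.percentile(weights, self.percentile) if weights else 1.0
            self.experts.append({"model": model, "weight": weight, "age": 0})
            # If over budget, drop the models with the poorest q-scores
            self.experts.sort(key=self.q_score, reverse=True)
            del self.experts[self.budget:]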
Other parameters
How often do we train and add new models?
Hand-tuned over the course of the summer: every 7 days
Seems to achieve a balance between generating new models to adapt to changing conditions and not overflowing the system
Alternatively, one could train when the observed performance drops (not used yet)
How much data do we use to train models?
Based on observed performance and early experiments: 1 week worth of data, and 2 weeks worth of data
Performance
[Figure: distribution of failures' ranks]
[Figure: daily average rank of failures]
Other things that I have not talked about but took a significant amount of time
DATA
Data is spread over many repositories: difficult to identify useful data, and difficult to arrange access to it
Volume of data: gigabytes accumulated on a daily basis, which required an optimized database layout and the addition of a preprocessing stage
Had to gain an understanding of the data semantics
Software Engineering (this is a deployed application)
Current Status
Summer 2006: system has been debugged, fine-tuned, tested, and deployed
Now fully operational; ready to be used next summer (in test mode)
After this summer, we're going to do systematic studies of parameter sensitivity and comparisons to other approaches
Related work-in-progress
Online learning:
Fancier weight updates with better guaranteed performance in "changing environments"
Explore "direct" online ranking strategies (e.g. the ranking perceptron)
Data-mining project:
Aims to exploit seasonality
Learn a "mapping" from environmental conditions to the characteristics of well-performing experts
When the same conditions arise in the future, increase the weights of experts that have those characteristics
Hope to learn this as the system runs, continually updating the mappings
MartiRank:
In the presence of repeated/missing values, sorting is non-deterministic and the pAUC takes different values depending on the permutation of the data
Use statistics of the pAUC to improve the basic learning algorithm
Instead of taking the number of rounds as input, stop when the pAUC increase is not significant
Use better estimators of the pAUC that are not sensitive to permutations of the data
Other related projects within the collaboration with Con Edison
Finer-grained component analysis: ranking of transformers, ranking of cable sections, ranking of cable joints; merging of all systems into one
Mixing ML and Survival Analysis
Acknowledgments
Con Edison: Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Charles Lawson, Frank Doherty, Arthur Kressner, Matt Sniffen, Elie Chebli, George Murray, Bill McGarrigle, and the Van Nest team
Columbia:
CCLS: Wei Chu, Martin Jansche, Ansaf Salleb, Albert Boulanger, David Waltz, Philip M. Long (now at Google), Roger Anderson
Computer Science: Philip Gross, Rocco Servedio, Gail Kaiser, Samit Jain, John Ioannidis, Sergey Sigelman, Luis Alonso, Joey Fortuna, Chris Murphy
Stats: Samantha Cook