Survival Analysis using Deep Learning for Predictive Aging Models of Batteries in Electric Vehicles

Ganesh Chaitanya Kudaka

September 17, 2019

Internal Supervisor: Thomas Hellström
External Supervisor: Christian Fleischer, Volvo Cars, Gothenburg
Examiner: Ola Ringdahl
Master Thesis (5DV206), 30 ECTS
Master of Science in Computer Science (Specialization in Robotics and Control), 120 ECTS

Source: umu.diva-portal.org/smash/get/diva2:1360100/FULLTEXT01.pdf





I would like to dedicate this thesis to my loving parents


Acknowledgements

Many people have supported me throughout my Master's thesis, and I would like to acknowledge the most significant ones here.

First, I thank Volvo Cars and Umeå University for approving the collaboration in order to carry out the industrial research thesis. I would like to express my sincere gratitude to Christian Fleischer, who has provided me with this opportunity and been an excellent supervisor who always took time to provide the required infrastructure to accelerate research at Volvo Cars. Many thanks to Fabio Delgado, who cooperated with me in this thesis through insightful discussions. Thanks for the prompt feedback relating to my work and the knowledge transfer from your previous research on Battery Data Analytics. The thesis couldn't have been done as smoothly without your involvement. Working with you has been a very intellectually stimulating experience.

I would additionally like to thank Ashok Koppisetty and Herman Johnsson from the Big Data team, who were instrumental in providing me with the required datasets and explaining their structural arrangement, which helped in creating a custom data parser for my software toolchain. Your advice helped me kickstart my research and eventually choose the right problems to tackle.

I would also like to thank Johnny Ngu from Analysis and Validation, whose input helped me in understanding feature engineering while working with battery data. The Battery Traction team at Volvo Cars has been indispensable in providing a conducive environment for research and in helping me get acquainted with the battery-related terminology. I will forever be indebted to every one of you.

I am deeply grateful to Georgios Tsiatsios, my batchmate during this master's program. With his tenacity and sangfroid, we could accomplish incredible milestones while enduring the most trying times. I am extremely fortunate to have gained an excellent friend and an incredible teammate.

Lastly, I am grateful for the consistent support of my wonderful family. The strong sense of integrity and conscientiousness I imbibed from my father has been vital in pushing me towards the 20% of the Pareto distribution (when α = log₄5) in all my professional endeavors.


Abstract

With the growing EV market, predictive maintenance of batteries is one of the key challenges faced by the car industry. Since this technology is still nascent, most of the battery data obtained is through simulations, which may not give an accurate estimate of the battery behavior in the real world. The goal of this thesis is to predict the probability of occurrence of an event in the future when we have incomplete information about the life cycle of the battery parameters from the entire fleet of cars in operation under real-world conditions. We achieve this through Survival Analysis, a technique that has proven to be a reliable solution for time-to-failure analysis. In this thesis we estimate the level of degradation of a PHEV car battery through time based on the data collected so far. Through our software toolchain we have obtained well-defined estimates up to four months into the future through statistical analysis. The toolchain parameters are flexible enough to adapt it to any kind of raw battery data. It has been demonstrated, as a proof of concept, that some of these methods can also be substituted with neural networks. The results of these deep learning algorithms have a large scope for improvement as we collect more data for the entire fleet in the future. Some critical observations and future research strategies are discussed at the end.


Table of contents

List of figures xi

List of tables xiii

1 Introduction 1
  1.1 Problem Statement 2
  1.2 Goals 2
  1.3 Research questions 3

2 Related Research 5
  2.1 Predictions 5
    2.1.1 Neural Networks 5
    2.1.2 Survival Analysis 7

3 Theory 11
  3.1 Data Description 11
  3.2 Data Science 13
    3.2.1 Principal Component Analysis 13
    3.2.2 K-means Clustering 14
    3.2.3 Variance Threshold 15
    3.2.4 Correlation Matrices 16
  3.3 Survival Analysis 17
    3.3.1 Censorship and Truncation 17
    3.3.2 Kaplan Meier Estimate 18
    3.3.3 Nelson Aalen Estimator 19
    3.3.4 Cox Proportional Hazard Function 20
  3.4 Survival Analysis Neural Networks 21
    3.4.1 Recurrent Neural Networks 21
    3.4.2 Weibull Time to Event Analysis 22
    3.4.3 Cox Proportional Hazard Function 25

4 Method 29
  4.1 Modelling Procedure 29
  4.2 Final pipeline 30
    4.2.1 Data Handling 31
    4.2.2 Statistical Analysis 37
    4.2.3 Neural Network Analysis 42

5 Results 55
  5.1 Conclusions 55
  5.2 Future Research 56

References 57


List of figures

1.1 Overall Prediction Architecture 3

3.1 MDF File Structure 12
3.2 RNN Architecture [9] 23
3.3 Weibull Probability Density Function 24

4.1 Battery Modelling Pipeline 29
4.2 Battery Modelling Pipeline 30
4.3 Parallel Algorithm 31
4.4 Duration (sec) per file for the Capacity Database 33
4.5 Duration (sec) per file for the SOH Database 34
4.6 PCA Analysis Results for Minimum Cell Capacity 36
4.7 PCA Analysis Results for SOH 36
4.8 Capacity Correlation Heatmap 37
4.9 SOH Correlation Heatmap 38
4.10 Battery Capacity Degradation Profile 39
4.11 Battery Capacity Degradation Prediction 40
4.12 Battery SOH Degradation Profile 42
4.13 Battery SOH Degradation Prediction 43
4.14 Battery Capacity Cumulative Hazard Rate 44
4.15 Battery Capacity Hazard Rate Prediction 45
4.16 Battery SOH Cumulative Hazard Rate 46
4.17 Battery SOH Hazard Rate Prediction 47
4.18 Cox Survival Function (Capacity) 47
4.19 Cox Hazard Function (Capacity) 48
4.20 Cox Survival Function (SOH) 48
4.21 Cox Hazard Function (SOH) 49
4.22 Weibull Time-to-event Neural Network Structure 50
4.23 Discrete log-likelihood for Weibull hazard function 51
4.24 Weibull Activation function 51
4.25 Average Weibull Distribution 52
4.26 Weibull Distribution 52
4.27 Weibull Distribution training Trend 53
4.28 Cox Proportional Hazards Neural Network 53


List of tables

4.1 Summary Statistics of Clean Data 38
4.2 Global Battery degradation Kaplan Meier Estimate 41
4.3 Global Battery Hazard Rate Estimate 46
4.4 Weibull predicted parameters for each car 49


Chapter 1

Introduction

Traditionally, we have used fossil fuels as the main source of energy for automobiles. This makes vehicles a huge contributor to the carbon footprint, which has a detrimental impact on the environment. Many companies nowadays are taking a more eco-friendly approach by electrifying new automobiles and, to a large extent, also the existing fleet in operation. The main source of portable power for these automobiles will be batteries. Therefore, it is of prime importance to research batteries in order to commercialize them and understand their viability in the field of transportation and vehicle propulsion.

Among the existing chemical combinations of batteries, Lithium-ion batteries have proved to be the best choice for Electric Vehicles (EVs). The reasons for this choice are fast charge capability, high power density and energy efficiency, a wide operating temperature range, a low self-discharge rate, light weight, and small size [44]. However, the battery wears out due to many factors such as temperature, depth of discharge, charge/discharge rates [39], driving style, route terrain, etc.

The report is systematically divided into chapters. The first chapter is a basic introduction to the concept. I explain the problem statement in depth and its significance in the field of vehicular transportation. This section also contains a brief description of the raw data provided and the research questions which arise while processing the data. The second chapter contains the explanation of a general battery management system along with the technical knowledge derived from various previous data-driven studies, which leads to the answers to the research questions. The third chapter contains the history and the structure of the raw data provided for this experiment. It also includes a comprehensive explanation of the derived knowledge which was useful in making a software pipeline to process the recorded battery data. The fourth chapter contains the explanation of the exact order of implementation of the various algorithms in order to illustrate the sequence in which the data



can be processed to obtain insightful observations. The final chapter contains the results and future research directions of this experiment.

1.1 Problem Statement

This research study focuses primarily on the car battery aging pattern. The relevant data has been collected from 32 Volvo cars situated at various places in Gothenburg. The main motivation behind this study is to predict the behaviour of batteries when subjected to various real environmental conditions. This helps us plan the maintenance schedule of the batteries ahead of severe degradation through the prediction of Minimum Cell Capacity, i.e. the irreversible capacity degradation, also known as battery aging. The data are a collection of signals coming from sensors integrated into the Battery Management System (BMS). This data has been collected over several months from Plug-in Hybrid Electric Vehicles (PHEV), and the purpose of this work is to observe whether a deep learning model can be used to find complex relationships within the data that can estimate how the battery ages. A deep learning model is a type of machine learning algorithm that uses a neural network with many layers in order to capture patterns [46]. The data provided contains hundreds of different time series of battery behavior parameters (also referred to as features or signals), with each signal having its own time steps. The thesis aims at demonstrating how we can use neural networks to capture the battery degradation pattern.

1.2 Goals

With the current technology available, it is difficult to get an accurate estimate of the Remaining Useful Lifetime (RUL) of a battery. Car manufacturers get battery lab test data from the suppliers. The lab test data usually consists of battery tests in controlled environments with fixed driving styles [1]. This only gives an approximate analysis of the car battery degradation pattern, as the tests are conducted in an artificial simulation. However, we aim to capture driving patterns in real-world scenarios. Real-world scenarios capture much more information, such as the driving behaviors that vary all over the world according to different city plans and the several climatic conditions impacting the battery in various ways. The data collected from all over the world can be further used for analysis and predictions. Traditionally, estimations of battery health parameters like the State of Health (SOH) and Minimum Cell Capacity were done with Kalman filters [19][23][24]. This is a tried and tested method and has proven to give very accurate estimations for Electric Vehicles (EVs). However, we aim to predict the future behaviour based on the available



Fig. 1.1 Overall Prediction Architecture

data recorded in the past. To make this possible, we intend to use statistical analysis to assess the feasibility of predictions, and then convert the methods to neural networks by capturing all the parameters which affect the battery behavior significantly. The thesis demonstrates the potential of neural networks as a next-generation solution to simulate the Battery Management System (BMS) inside a car according to the different environments it operates in.
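To make the contrast with the traditional approach concrete, a scalar Kalman filter of the kind long used for battery state estimation can be sketched as follows. This is a hedged, minimal illustration: the noise values, initial capacity and measurement series are all hypothetical, not taken from the thesis data.

```python
def kalman_1d(measurements, q=1e-4, r=0.25, x0=100.0, p0=1.0):
    """Scalar Kalman filter: q = process noise, r = measurement noise,
    x0/p0 = initial state estimate and its variance (all hypothetical)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: capacity assumed locally constant
        k = p / (p + r)          # Kalman gain balances trust in model vs sensor
        x = x + k * (z - x)      # update the estimate towards the measurement
        p = (1.0 - k) * p        # shrink uncertainty after the update
        estimates.append(x)
    return estimates

# Hypothetical noisy capacity readings around a true value near 95 Ah
noisy = [95.4, 94.6, 95.2, 94.9, 95.1, 94.8]
smoothed = kalman_1d(noisy, x0=95.0)
```

The filter yields a smoothed estimate of the current state; the aim of this thesis, by contrast, is to predict future behaviour, which the filter alone does not provide.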

The architecture of this system, as illustrated in Figure 1.1, is defined in a way where the cars operating in various places act as individual agents, and these agents send their respective time series information to a cloud processing unit. The communication between the global and the local model is expected to increase the accuracy of predictions over time. This thesis will focus only on the data processing at the global level.

1.3 Research questions

The thesis aims to explore the following concepts about battery behaviour predictions:

1. How is it possible to predict an event in the future when a neural network hasn't been trained to recognize the whole pattern yet?

2. What is the data pipeline to be used to reduce the amount of data without losing important variables?

3. Which signals inside the dataset produce a significant variance in battery behavior?

4. What kind of Statistical Analysis can be used to predict future battery behavior?

5. How can statistical methods be converted into Neural Networks? What kind of loss functions can be used?


Chapter 2

Related Research

2.1 Predictive Analytics

Predictive analytics is a branch of science where statistical and analytical calculations are used to estimate future events or behaviours [35]. This is mainly done through the development of complex models, whose form depends on the event or behaviour to be predicted. The performance of these models is then compared using various evaluation strategies. Data collection and processing is a preliminary step which heavily determines the performance of the predictive model. The data must be analyzed to detect patterns, noise, anomalies, etc. This process helps in filtering the best possible representative data subset for the final model evaluation. Typically, this type of analysis is given a huge amount of input in the form of features or signals which are collected over time. These features are processed in order to predict a single output variable, often using heavy statistical models. Such models are typically used in actuarial science [7], marketing [18], financial services [28], telecommunications [43], travel [48], healthcare [42], social networking [15], and so on.

This type of analysis is oftentimes very computationally heavy [11], mainly because of the complexity of the model, the amount of data to process, or both. In our case, we are given a fixed amount of resources to process a huge amount of data. Hence, in the subsequent sections, we explore the various techniques which have been previously researched to create models that achieve reliable predictions.

2.1.1 Neural Networks

In the paper written by Hicham Chaoui [5], a neural network with delayed inputs is used to predict the State of Charge (SOC) and State of Health (SOH) values of Li-ion batteries. The benefit of this method lies in the fact that the network doesn't need



any information regarding the battery model or parameters. They have used previously recorded parameters like ambient temperature, voltage, current, etc. as inputs. Another advantage of this method is that it compensates for the non-linear behaviour of the batteries, such as hysteresis and the reduction in performance due to aging. From the results we learn that this method provides high accuracy and robustness with a simple algorithm to evaluate a LiFePO4 battery.

Conventionally, batteries were modelled by defining an empirical model dependent on the equivalent electric circuit, which can be made through actual measurements or estimation of battery parameters using extrapolation. The problem is that such models can only describe the battery characteristics up until the battery is charged again. In a research paper written by Massimo [36], the inter-cycle battery effects are automatically taken into account and implemented into the generic model used as a reference template. This method was used to model a commercial lithium iron phosphate battery to check the flexibility and accuracy of the described strategy.

In the paper written by Ecker [10], a multivariate analysis of the parameters of batteries subjected to accelerated aging procedures has been performed. Impedance and capacity were used as the output variables. A change in these parameters was observed when the battery was subjected to different conditions of State of Charge and temperature. This led to the development of battery functions which model the aging process, bolstered by the physical measurements on the battery. These functions led to the development of a general battery aging model. These models were subjected to different drive cycles and management strategies, which gave a better understanding of their impact on battery life.

The study done by Tina [41] highlights the use of several layers of Recurrent Neural Networks (RNNs) to take into account the non-linear behaviour of photovoltaic batteries while modelling. In this manner, the trends of change in the input and output parameters of the battery are taken into account during charge and discharge processes. This has proved to be a powerful tool in spite of the heavy computations required to run the neural network architecture. In this model, the electric current supplied to the battery is the only effective external input parameter because it is dependent on the user application. The voltage and SOC are taken as the output variables.

In the study done by Chemali [6], LSTM cells were used in a Recurrent Neural Network (RNN) to estimate the State of Charge (SOC). The significance of the research presented in this paper lies in the fact that they used a standalone neural network to take into account all the dependencies, without using any prior knowledge of the battery model or employing any filters for estimation, like the Kalman filter. The network is capable of generalizing all kinds of feature changes it is trained on, using datasets containing different



battery behaviour subjected to different initial environment conditions. One such feature taken into account while recording the datasets is ambient temperature. The model was able to predict a very accurate SOC under variation in external temperature in this case.

In the research conducted by Saha [37], a Bayesian network was used to estimate the remaining useful life of a battery whose internal state is difficult to measure under normal operating conditions. Therefore, these states need to be estimated using a Bayesian statistical approach based on previously recorded data, indirect measurements and different operational conditions. Concepts from electrochemical processes, like the equivalent electric circuit, and from statistics, like state transition models, were combined in a systematic framework to model the changes in the aging process and predict the Remaining Useful Life (RUL). Particle filters and Relevance Vector Machines (RVMs) were used to provide uncertainty bounds for these models.

In the study conducted by You [47], the State of Health (SOH) of the battery has been estimated to determine the schedule for the maintenance and replacement of the battery or to estimate the driving mileage. Most of the previous research was done in a constrained environment where a full charge cycle with constant current was a common assumption. However, these assumptions cannot be used for EVs, as the charge cycles are mostly partial and dynamic. The objective of this study was to come up with a robust model to estimate SOH where batteries are subjected to real-world environment conditions. This method demonstrated a technique to make use of historic data collected on current, voltage and temperature in order to estimate the State of Health (SOH) of the battery in real time with high accuracy.

When using memory-based deep learning techniques, the time correlation of the signals for each car, used to predict the corresponding output sequence, has to be taken into consideration [46]. This makes data input straightforward when applying a deep learning model, since during data treatment we introduce a time delay to train the neural network on previous inputs.

This is exactly what a recurrent deep learning model does. A recurrent deep learning model remembers previous inputs and has the ability to find complex relationships in data that are time dependent. An efficient type of recurrent deep learning model is the Long Short-Term Memory (LSTM) network, and it is the recurrent deep learning model that will be tested on the data.
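As a sketch of this memory mechanism (not of the thesis toolchain itself), one step of an LSTM cell can be written in plain NumPy. The weight shapes, gate ordering and random inputs below are illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. x: input (n_in,); h_prev/c_prev: previous hidden
    and cell state (n_hid,); W: (4*n_hid, n_in), U: (4*n_hid, n_hid),
    b: (4*n_hid,) for the input, forget, output and candidate gates stacked."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    i, f, o = sig(z[:n]), sig(z[n:2*n]), sig(z[2*n:3*n])
    g = np.tanh(z[3*n:])          # candidate cell state
    c = f * c_prev + i * g        # cell state keeps a trace of past inputs
    h = o * np.tanh(c)            # hidden state exposed to the next layer
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                # unroll over a short input sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

The forget gate f decides how much of the old cell state survives each step, which is what lets the network retain time-dependent relationships over long sequences.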

2.1.2 Survival Analysis

Survival Analysis has been employed for a variety of research projects, most of which involve degradation and lifetime analysis. This is generally known as reliability theory in the field



of engineering, duration analysis in the field of economics, and event history analysis in sociology. In this section, we explore a few research papers with practical applications of this technique for real-life prediction challenges in the absence of data.

In the paper published by Pine [13], the medicine tamoxifen is studied for its therapeutic effects on patients suffering from breast cancer. In this randomized trial, it was noticed that the tumour can reappear or the patient may die due to natural causes. This study uses survival analysis to build a semi-parametric transformation model to calculate the potency of competing risks, which is dependent on the covariates. Generally, independent right censoring is used to model survival data; this research, however, is an extension of the conventional approach. Essentially, regression is used in order to perform curve fitting, and the estimation of the coefficients is achieved with a rank-based least squares criterion. The ideal baseline probability was estimated using a separate function. With this kind of study, it was concluded that it may help to target tamoxifen at women who may benefit at a reasonable expense and with minimal complications. This derived knowledge can be beneficial for the patient by giving an estimate of the reduction in the uncertainty of the course of breast cancer.

Christiana [27] has summarized and illustrated the various techniques used in the field of survival analysis. These include the treatment of the dataset when predictions have to be made on the basis of incomplete data, as well as the estimation algorithms themselves. The dataset used to visualize these algorithms was from a clinical trial on colon cancer adjuvant therapy [32]. This is a built-in dataset in the statistical software R and contains 929 colon cancer patients. In this paper, non-parametric models like the Kaplan Meier estimate, the semi-parametric Cox proportional hazards model, and parametric regression models like the Weibull and log-logistic distributions have been demonstrated with examples. It has been stated that this kind of analysis can be used to make frailty models and to predict recurrent events, as well as for risk prediction, individual matching, etc.
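As a hedged sketch of the first of these techniques, the Kaplan Meier estimate can be computed in a few lines of plain Python; the durations and censoring flags below are made-up toy data, not from any of the cited trials.

```python
def kaplan_meier(times, events):
    """Kaplan Meier survival estimate.
    times: observed durations; events: 1 = event occurred, 0 = right-censored.
    Returns (event_times, survival_probabilities)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s, out_t, out_s = 1.0, [], []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n = 0
        # group all subjects sharing the same observed time t
        while i < len(order) and times[order[i]] == t:
            d += events[order[i]]
            n += 1
            i += 1
        if d > 0:
            s *= 1.0 - d / at_risk   # conditional survival past time t
            out_t.append(t)
            out_s.append(s)
        at_risk -= n                 # censored subjects leave the risk set
    return out_t, out_s

# 1 = failure observed, 0 = still running when data collection stopped
t, s = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Each factor (1 - d/at_risk) is the conditional probability of surviving past a distinct event time; censored subjects shrink the risk set without forcing a drop in the curve, which is how the estimator handles incomplete data.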

In the study conducted by Lane [45], the Cox proportional hazards model was used to predict the failure of banks. This model has been used extensively in biomedical research; in this paper, however, we see its potential to be applied in the financial sector as well. It is stated in the paper that the key advantage of this model over other classification techniques is that it provides the expected time to failure, which is of huge importance to empiricists in the field of finance and economics. This semi-parametric approach shows the relationship between the input features and the output survival time. In the financial sector, it may be used as an early warning system, as this method provides an exact time estimate rather than the probability of occurrence of an event over a large time slice.
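The relative-risk idea at the heart of the Cox model can be illustrated with a toy computation; the coefficients and covariates below are invented for illustration, not fitted to any bank data.

```python
import math

# Cox model: h(t | x) = h0(t) * exp(beta . x). The baseline hazard h0(t)
# cancels in the ratio, so relative risk needs no knowledge of the baseline.
beta = [0.9, -1.4]       # hypothetical coefficients: [leverage, liquidity]
bank_a = [1.2, 0.3]      # covariate vectors for two hypothetical banks
bank_b = [0.8, 0.9]

def risk_score(x):
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

hazard_ratio = risk_score(bank_a) / risk_score(bank_b)  # time-independent
```

Under the proportional-hazards assumption this ratio is constant over time, which is what makes the model useful as an early warning signal even before the baseline hazard is known.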

A research study was conducted by Ihwah [20] in the field of customer churn analysis. In this study, the factors underlying consumers' decisions to buy a product were analyzed through



a time-to-event approach. This method is essentially a survival or failure model. In order to examine the relationship between the input parameters and the outcome of the customer, the semi-parametric Cox proportional hazards model was used. Another advantage of this method is that it provided a hazard ratio between two different individuals with different covariates. The consumers' purchasing decisions were studied, where the response was the time taken by the consumer to buy a product and the input variables were the level of education, occupation and monthly income of the consumer. The program developed in this experiment was used to estimate the regression coefficients of the model. From the hazard ratio results in three separate cases, it was concluded that the time it takes a junior high school consumer to decide to buy a product is shorter than for an elementary school consumer; the time it takes civil servant consumers to decide to buy a product is shorter than for non-civil servant consumers; and the time required for consumers earning Indonesian Rupiah (IDR) 3,000,000 to decide to buy a product is shorter than for consumers earning an income of IDR 1,000,000.

Weibull distributions have been explored by Mudholkar and Srivastava[33]. These distributions were previously used for lifetime degradation analysis where the hazard function was monotonic, i.e., either continuously increasing, signifying a growing probability of failure in the case of wear-out, or continuously decreasing, signifying a falling probability of failure in the case of burn-in. Burn-in can be defined as a phase of rigorous testing of machine components, before assembly into the final product, to certify satisfactory quality. Such monotonic functions do not accommodate the hazard rates of most real-life cases. Therefore, this paper presents a generalized Weibull distribution called the exponentiated Weibull family. This general equation can be adjusted to represent bathtub, unimodal and a broader class of monotonic failure rates.

The practical application of the Weibull distribution for hazard analysis of failure data was explained by Jardine[22]. In this paper, Jardine used the Weibull distribution to explain the hazard rate of aircraft and marine engine failures. The inputs were diagnostic variables collected on both machines while operating in real-world environments. These variables were then modeled into hazard rate equations which until then had commonly been used only in medical research. Examination of the residuals shows a good fit of the Weibull proportional hazards model to the data.


Chapter 3

Theory

3.1 Data Description

The battery data was collected from Volvo V60 Plug-in Hybrid Electric Vehicles (PHEV) by a sensor made by the Wireless Information Collection Environment (WICE) team at Volvo Cars. The sensor collects numerous signals from the cars while they are in operation. These PHEVs are based on Volvo's proprietary Scalable Product Architecture (SPA) platform. The working principle of a typical PHEV is the combination of an Internal Combustion Engine (ICE) and a battery-powered electric motor. When the battery is at full capacity, the car runs on battery power; when the battery is empty, the car switches to the ICE as its power source. Since these vehicles partly run on electricity, they produce significantly fewer emissions, making them environmentally friendly. The data provided for this thesis was specific to the electric motor.

The 32 cars in this experiment were equipped with sensor modules by the WICE team. These sensors collected various signals from the cars in operation, with the customers' approval, including current, voltage, battery temperature, ambient temperature, battery capacity estimations, etc. The experiment started with a few cars fitted with an initial sensor which collected about 180 signals. As time progressed, new cars were added and the sensors were upgraded to record about 1700 signals, including those recorded in the initial stages. Some of these signals were calculated from other raw signals, as mentioned before. We therefore have data from cars with different start times in the experiment. This inconsistency in the data collection pattern arose because the research was nascent and rapidly developing, which led to the discovery of new signals deemed better for battery analysis.

All kinds of sensors in the experiment collect a large number of signals and store them in a Master Database File. This is the primary format of data collection in the Microsoft SQL Server. These files contain time-dependent information for each drive cycle. The typical structure of the file is shown in Figure 3.1.

Fig. 3.1 MDF File Structure

In this file format, the most important element of the header is the absolute time (abs_time), which marks the beginning of each drive cycle. The data block contains various channels with a local time axis which starts at zero for each drive cycle. The time axis of each channel has a different timestep. Each time axis inside a channel can consist of one or more signals of similar characteristics. The signals are grouped in this way because multiple sensors record the same attribute of the battery. For example, in this dataset there are 9 battery temperature modules observed continuously during a drive cycle.

The experiment was started on 7th March 2018 for the first car. The total duration of the experiment at the time of data extraction from the sensors was 223 days for the oldest car, and more cars are still being added. Since the duration of the data collection is rather short, we cannot record the event of a battery failure. This means we only have a very small part of the degradation pattern we are supposed to predict. If we proceed to use a neural network, we have no way to evaluate the accuracy of its output, as the event has not yet been observed. To train a neural network to recognize a pattern, it is essential to have the event recorded in the training data at least once. Also, we have a lot of data from different cars over a very short period of time, which means the collected data is not time correlated. Therefore, neural networks cannot be used to predict the End of Life of a battery in this case. An alternative analysis is to predict the likelihood of battery degradation: the collected data can still be used for degradation analysis where the event to be predicted is a 5% degradation in Minimum Cell Capacity.

3.2 Data Science

The pipeline of this project can be broadly divided into two parts: the first consists of data analysis and the second of modelling and predictions. Below are a few important concepts used for data analysis.

3.2.1 Principal Component Analysis

Principal component analysis (PCA) is a tool that transforms a given set of correlated data points into a set of linearly uncorrelated variables. It is a non-parametric method to extract information from complex datasets[40]. A large table of numbers is very difficult to comprehend; it can be simplified by finding the relationships between the columns of the dataset. The resulting vectors calculated with this method explain the entire variance of the dataset, from the most significant component to the least. Through this procedure, PCA has proved to be a very powerful tool for exploratory data analysis and for building predictive models. We have used it in particular to reduce the dimension of the dataset.

Suppose the provided battery dataset has n observations of p features; we need to reduce the amount of data to be processed in the rest of the pipeline to make the whole algorithm more computationally efficient. Geometrically, the n observations lie in a p-dimensional space. However, not all of these dimensions have an equal impact on the Minimum Cell Capacity of the car over time, and low-impact variables can be omitted from the dataset without significant loss of information. In this experiment, we consider only the first and most significant principal component for the rest of the analysis.

Z_1 = φ_11 X_1 + φ_21 X_2 + φ_31 X_3 + ... + φ_p1 X_p (3.1)

where:

φ_11, φ_21, ..., φ_p1 = loadings (weights, the components of the leading eigenvector)
X_1, X_2, ..., X_p = features


Equation 3.1 shows the first principal component, which captures the maximum variance in the dataset. All the loadings together form the principal component vector[21]:

φ_1 = (φ_11, φ_21, φ_31, ..., φ_p1)^T (3.2)

The subsequent principal components can be found in the same way. Each principal component found by this procedure is uncorrelated with the others, and the corresponding loading vectors are orthogonal to each other.
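As an illustration of this dimension-reduction step, the first principal component can be computed in NumPy via an eigendecomposition of the covariance matrix. This is a minimal sketch on synthetic data, not the thesis implementation; variable names and data are invented for illustration:

```python
import numpy as np

# Hypothetical illustration: reduce a feature matrix X (n x p) to its
# first principal component, as done for the battery signals.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]    # introduce correlation between columns

Xc = X - X.mean(axis=0)                    # center each feature
cov = np.cov(Xc, rowvar=False)             # p x p covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

phi1 = eigvecs[:, -1]                      # loading vector of PC1 (largest eigenvalue)
Z1 = Xc @ phi1                             # scores: Z1 = phi_11*X1 + ... + phi_p1*Xp

explained = eigvals[-1] / eigvals.sum()    # share of variance captured by PC1
```

The variance of the score vector `Z1` equals the leading eigenvalue, so `explained` is the fraction of total variance retained when keeping only the first component.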

3.2.2 K-means Clustering

Clustering is an unsupervised learning algorithm. Since we have a lot of data points and it is difficult to define the relationships between all of them just by observation, we group them into K distinct, non-overlapping clusters using K-means clustering[21]. The clustering criterion is that, for each cluster C_k of the K clusters, we minimize the amount of variation between the observations inside the cluster. The quantity to be minimized can therefore be written as in equation 3.3:

minimize over C_1, ..., C_K: { Σ_{k=1}^{K} W(C_k) } (3.3)

To calculate the within-cluster variation W(C_k), we can use the distance between two points in the dataset. The most common choice is the squared Euclidean distance, so W(C_k) can be defined as in equation 3.4:

W(C_k) = (1/|C_k|) Σ_{i,i′∈C_k} Σ_{j=1}^{p} (x_ij − x_i′j)^2 (3.4)

By combining 3.3 and 3.4,

minimize over C_1, ..., C_K: { Σ_{k=1}^{K} (1/|C_k|) Σ_{i,i′∈C_k} Σ_{j=1}^{p} (x_ij − x_i′j)^2 } (3.5)

where

|C_k| = number of observations in cluster k
x_ij = value of feature j for observation i
p = number of features per observation

Page 29: Deep learning for big data analytics for predictive …umu.diva-portal.org/smash/get/diva2:1360100/FULLTEXT01.pdfSurvival Analysis using Deep Learning for Predictive Aging Models of

3.2 Data Science 15

A simplified explanation of this iterative algorithm can be written as follows:

1. The algorithm is initialized by randomly assigning each observation a cluster number between 1 and K.

2. Minimization step:

(a) For each cluster k, compute the centroid. The centroid is the per-feature mean of the data points currently in the cluster.

(b) Each observation is given a new cluster number based on the nearest centroid.

Therefore, each data point is assigned to a cluster and the configuration changes until the system reaches an optimal solution. Through this algorithm, the clustering only reaches a local optimum, so the final cluster configuration depends on the random initial assignment.
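The steps above can be sketched in NumPy. This is a minimal illustration of the iterative procedure on synthetic data (random initial assignment, then alternating centroid and assignment steps), not the implementation used in the thesis:

```python
import numpy as np

# Minimal K-means sketch following the described algorithm.
def kmeans(X, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))          # step 1: random initial clusters
    centroids = np.zeros((K, X.shape[1]))
    for _ in range(n_iter):
        # step 2a: centroid = per-feature mean of each cluster
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(len(X))]              # reseed an empty cluster
            for k in range(K)
        ])
        # step 2b: reassign each observation to the nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):        # converged (a local optimum)
            break
        labels = new_labels
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centroids = kmeans(X, K=2)
```

Because only a local optimum is reached, practical implementations typically run several random restarts and keep the assignment with the lowest within-cluster variation.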

3.2.3 Variance Threshold

In the dataset, the sensor records two different types of data.

1. Event driven - These are state variables which describe the state of the system. Hence, for a majority of event-driven features, we observe the same readings over long periods of time.

2. Continuous - These are timeseries variables which may have large fluctuations over time.

The aim of using this technique was to filter out event-driven variables, for example the signal communicating which drive profile is active in the car, the passenger seat position, the sunroof position, etc. A typical normal probability density[4] can be defined as in equation 3.6:

f(x | µ, σ^2) = (1/√(2πσ^2)) e^{−(x−µ)^2 / (2σ^2)} (3.6)

where:

µ = mean or expectation of the distribution
σ = standard deviation
σ^2 = variance


All the features following this distribution are normalized to bring them onto a common scale. By normalization, we adjust the mean of each feature to zero and its variance to one. The notation[31] for a standard normal distribution is given by equation 3.7:

N(µ, σ^2) = N(0, 1) (3.7)

Therefore, the probability density function of the standard normal distribution is given by equation 3.8:

ϕ(x) = (1/√(2π)) e^{−x^2/2} (3.8)

In this configuration, we only consider signals with variance greater than 10^-8 in order to filter out constant signals. The threshold is kept this small in order to lose as little data as possible.

3.2.4 Correlation Matrices

In data analysis, we need to find the dependence of the variables on each other in order to simplify the dataset, especially when we have multivariate timeseries data. This improves our understanding of the relationships between the variables. In our case, since we have a clearly defined output feature, we compute the correlations between the remaining features and the output feature to understand the predictive relationship between them. This can be used to select the most useful input variables, i.e., those with a high correlation with the output variable.

Given two random variables X and Y, the correlation between them can be defined as:

ρ_{X,Y} = corr(X, Y) = cov(X, Y) / (σ_X σ_Y) = E[(X − µ_X)(Y − µ_Y)] / (σ_X σ_Y) (3.9)

where:

E = expected value operator
cov = covariance
µ_X, µ_Y = expected values
σ_X, σ_Y = standard deviations
corr = widely used alternative notation for the correlation coefficient

ρ_{X,Y} is also called the Pearson correlation coefficient. Since we deal with multivariate data, let us assume we have n random variables X_1, ..., X_n. The correlation matrix in this case is an n×n matrix whose (i, j) entry is corr(X_i, X_j). The correlation matrix is always symmetric because the correlation between X_i and X_j is the same as the correlation between X_j and X_i.
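As a sketch of how this selection might look, the following builds a correlation matrix with NumPy and ranks two invented input signals by their absolute correlation with a stand-in output feature (not the actual Minimum Cell Capacity data):

```python
import numpy as np

# Sketch: rank input signals by absolute Pearson correlation with the output.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
output = 0.9 * x1 + 0.1 * rng.normal(size=500)   # output strongly driven by x1

R = np.corrcoef(np.vstack([x1, x2, output]))     # 3x3 symmetric correlation matrix
corr_with_output = R[-1, :-1]                    # corr(X_i, output) for each input
```

Here `x1` would be retained as a useful predictor, while `x2`, being nearly uncorrelated with the output, could be dropped.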

3.3 Survival Analysis

In survival analysis, subjects are followed over a specified time period and the focus is on the time at which the event of interest occurs. This technique suits our needs since the information about the occurrence of the event is incomplete: we can only predict the probability of the event occurring beyond a certain period of time. In our case, the battery data does not span from the beginning of life to the End of Life (EOL), which makes it difficult to predict how long a battery remains operational. Therefore, we use survival and degradation functions to capture the battery behaviour with the available information.

A survival function typically denoted by S can be written as in equation 3.10:

S(t) = Pr(T > t) (3.10)

where:

t = time of interest
T = random variable denoting the time of death
Pr = probability

This function gives the probability that the event (death) has not occurred by time t. The probability of survival is therefore a non-increasing function, i.e., S(u) ≤ S(t) if u ≥ t. Since we do not have the full battery lifecycle data needed to model the driving behavior, we use survival analysis to estimate the probability of the battery surviving a given amount of time in the future, based on the data collected from the batteries so far.

3.3.1 Censorship and Truncation

One basic concept in time-to-event prediction is censoring. This is a part of feature engineering where we derive an additional feature from the given data. Censoring depends on whether the event occurs within a given time frame. There are two types of censoring[29]:

1. Right Censored - An observation is right censored when the event does not happen within the time frame of the experiment. The subject might have left the experiment before the event occurred, or the event might occur after the experiment has ended. In the battery data, a battery is therefore censored if the required degradation of its minimum cell capacity is not recorded by the time the experiment ends. Thus, if the experiment takes place during the time (0, t), the event might occur in the time frame (t, ∞).

2. Left Censored - An observation is left censored when the event occurs before theexperiment began.

Assumptions

In this thesis, we have made the following important assumptions, as the data provided was not sufficient:

1. The 32 cars are assumed to have the same starting date for the data collection.

2. We slightly modify the definitions of censorship to fit the data into the likelihood function. If the event has occurred during the time frame, we consider the data point observed; if it has not, we consider it a censored data point. This can be written as follows:

δ = { 1 if T < C; 0 if T > C } (3.11)

where

δ = censorship indicator (random variable)
T = time to event
C = time of the censoring event
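Equation 3.11 amounts to a simple comparison between event times and the censoring horizon. A sketch with hypothetical event times and the 223-day observation window from section 3.1:

```python
import numpy as np

# Sketch of equation 3.11: delta = 1 if the event time T precedes the
# censoring time C (event observed), 0 otherwise. Times are hypothetical.
T = np.array([120.0, 300.0, 80.0, 500.0])   # time to 5% capacity degradation
C = np.array([223.0, 223.0, 223.0, 223.0])  # end of the observation window

delta = (T < C).astype(int)                  # 1 = observed, 0 = right censored
```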

3.3.2 Kaplan Meier Estimate

The Kaplan-Meier estimate is based on the lifetime distribution function, which is the complement of the survival function in equation 3.10. The lifetime distribution function is typically denoted as in equation 3.12:

F(t) = Pr(T ≤ t) = 1−S(t) (3.12)

The event density can be defined by differentiating the probability as in Equation 3.13

f(t) = F′(t) = (d/dt) F(t) (3.13)


By combining Equations 3.12 and 3.13 we get the survival density function:

s(t) = S′(t) = (d/dt) S(t) = (d/dt) ∫_t^∞ f(u) du = (d/dt)[1 − F(t)] = −f(t) (3.14)

The Kaplan-Meier estimate[26] is the most common non-parametric survival function used to understand the lifetime of an entity when we do not have complete data until the end. It is most commonly used in the medical field for predicting the lifetime of a patient; in this thesis we use it to analyze the lifetime of a battery based on the available data. The Kaplan-Meier estimate is also called the product-limit estimator.

The survival function estimate S(t), which represents the probability that the lifetime is longer than time t, is given by:

S(t) = ∏_{i: t_i ≤ t} (1 − d_i / n_i) (3.15)

where:

t_i = a time at which at least one event has happened
d_i = number of events that happened at t_i
n_i = number of individuals known to have survived up to t_i

This function gives us a discrete distribution showing the probability of the event in each successive interval[25]. The curve estimates the lifetimes even when some of the observations are not available.
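Equation 3.15 can be computed directly. The following is a small sketch on synthetic durations with an event indicator (1 = event observed, 0 = censored); in practice a library such as lifelines provides the same estimator:

```python
import numpy as np

# Direct sketch of equation 3.15, the product-limit estimate.
def kaplan_meier(durations, observed):
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed)
    event_times = np.unique(durations[observed == 1])
    S, surv = 1.0, {}
    for t in event_times:
        d = np.sum((durations == t) & (observed == 1))  # events at t_i
        n = np.sum(durations >= t)                      # at risk just before t_i
        S *= 1.0 - d / n                                # multiply the (1 - d/n) factors
        surv[t] = S
    return surv

surv = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 0])
```

The survival curve steps down only at observed event times; censored subjects still count toward the at-risk set `n` until they leave the study.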

3.3.3 Nelson Aalen Estimator

The Nelson-Aalen estimator is a non-parametric cumulative hazard rate function[34]. A cumulative hazard function Λ can be calculated according to equation 3.16:

Λ(t) = −log S(t) (3.16)

where S(t) is the survival function. By differentiating both sides:

(d/dt) Λ(t) = −S′(t)/S(t) = λ(t) (3.17)

From equation 3.17, the cumulative hazard function from the beginning until time t can be written as:

Λ(t) = ∫_0^t λ(u) du (3.18)


The Nelson-Aalen estimator gives the expected cumulative number of events up to time t. It can be calculated by equation 3.19:

H(t) = Σ_{t_i ≤ t} d_i / n_i (3.19)

where

d_i = number of events at t_i
n_i = total number of individuals at risk at t_i
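Equation 3.19 is a running sum over event times. A sketch with the same synthetic convention as the Kaplan-Meier example (1 = event, 0 = censored):

```python
import numpy as np

# Sketch of equation 3.19: cumulative hazard H(t) as the running sum of d_i/n_i.
def nelson_aalen(durations, observed):
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed)
    H, out = 0.0, {}
    for t in np.unique(durations[observed == 1]):
        d = np.sum((durations == t) & (observed == 1))  # events at t_i
        n = np.sum(durations >= t)                      # individuals at risk
        H += d / n
        out[t] = H
    return out

H = nelson_aalen([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 0])
```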

3.3.4 Cox Proportional Hazard Function

The Cox proportional hazards model captures the failure rate within a particular population based on the observed covariates associated with time[8]. The failure rate can be defined as the frequency with which a given set of entities in a population stop functioning as intended. A reliability function can be defined in terms of the lifetime distribution function F(t) as in equation 3.20:

R(t) = 1−F(t) (3.20)

where

R(t) = reliability function
F(t) = lifetime distribution function

This equation represents the probability of survival beyond time t, and hence the reliability of the component. The failure rate λ(t) can be written as in equation 3.21:

λ(t) = f(t) / R(t) (3.21)

where f(t) is the failure density function. In order to express the failure rate over a time interval ∆t, we combine 3.20 and 3.21:

λ(t) = (R(t_1) − R(t_2)) / ((t_2 − t_1) · R(t_1)) = (R(t) − R(t + ∆t)) / (∆t · R(t)) (3.22)

This is the conditional probability of failure in the interval, given survival until t, expressed in terms of the reliability function. Since the battery data we have is multivariate, suppose we have K covariates X_1, X_2, ..., X_K affecting the output variable at time t. The hazard function for the ith individual at time t can then be denoted λ(t | X_1i, X_2i, ..., X_Ki). We define a baseline hazard function λ_0(t) for X_1i = 0, X_2i = 0, ..., X_Ki = 0. To find the relative risk at any time t depending on the values of the covariates, we can form the following hazard ratio:


λ_1(t) / λ_0(t) (3.23)

The log of this ratio is a linear combination of covariates which can be written as follows:

log( λ(t | X_1i, X_2i, ..., X_Ki) / λ_0(t) ) = β_1 X_1i + β_2 X_2i + ... + β_K X_Ki (3.24)

The ratio of hazard functions can be considered a ratio of risk functions, so the proportional hazards regression model can be viewed as a function of relative risk (while logistic regression models are a function of an odds ratio). The proportionality assumption means that if a covariate doubles the risk of the event on day one, it also doubles the risk of the event on any other day. Sir David Cox observed that if the proportional hazards assumption holds, it is possible to estimate the effect parameters without any consideration of the hazard function. Changes in a covariate have a multiplicative effect on the baseline risk. The model in terms of the hazard function at time t is:

λ(t | X_1i, X_2i, ..., X_Ki) = λ_0(t) exp(β_1 X_1i + β_2 X_2i + ... + β_K X_Ki) (3.25)

This approach to survival data is called the Cox proportional hazards model, sometimes abbreviated to the Cox model or the proportional hazards model.
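The multiplicative structure of equation 3.25 can be sketched with assumed, illustrative coefficients (not fitted to the battery data). The hazard ratio between two individuals is independent of the follow-up time:

```python
import numpy as np

# Sketch of equation 3.25: hazard = baseline hazard * exp(beta . x).
beta = np.array([0.7, -0.2])              # hypothetical fitted coefficients
baseline = 0.01                           # baseline hazard lambda_0(t) at some fixed t

def hazard(x):
    return baseline * np.exp(beta @ x)

x_a = np.array([1.0, 0.0])                # individual with first covariate set
x_b = np.array([0.0, 0.0])                # baseline individual
ratio = hazard(x_a) / hazard(x_b)         # hazard ratio; lambda_0(t) cancels out
```

Because the baseline hazard cancels in the ratio, the relative risk exp(β_1) can be estimated without modelling λ_0(t) itself, which is exactly Cox's observation.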

3.4 Survival Analysis Neural Networks

3.4.1 Recurrent Neural Networks

Recurrent neural networks (RNNs) are used to detect patterns and trends in the training data because they have a temporal dimension[12]. These networks take time and space into consideration; they therefore capture the information context along with the direct relation between input and output. This has made RNNs remarkably accurate in prediction tasks.

The typical structure of an RNN is given in Figure 3.2. On the left of this diagram is a single RNN cell: an input vector x_t enters the neural network, is processed by the hidden layer h_t, and yields the output vector ŷ_t. This output is fed back as an input for the next training iteration, and the process is repeated until the error is reduced to a minimum. The error at each step is calculated as follows:

e_t = (y_t − ŷ_t)^2 (3.26)


where

y_t = actual output
ŷ_t = predicted output

This can be reimagined as several neural networks arranged in a sequence one after the other. On the right side of Figure 3.2 is a graphical representation of this unrolled architecture: each timestep has its own time-dependent hidden layer, and the output of the previous step acts as an input to the next. Therefore, the time correlation of the input data is an important aspect to consider while training memory-based neural networks.

The training of a typical RNN takes place through gradient descent. The direction of propagation for the loss function, in order to minimize the output error, is given by differentiating 3.26 as in equation 3.27:

∇e_t = −2(y_t − ŷ_t) ∇ŷ_t (3.27)

With multiple layers, the training strategy is called backpropagation through time[17]. Throughout the iterations, the hidden layer and the output are updated as in equations 3.28 and 3.29:

h_t = σ_h(W_h x_t + U_h h_{t−1} + b_h) (3.28)

ŷ_t = σ_y(W_y h_t + b_y) (3.29)

There are many variations of the RNN architecture. Among the most commonly used are Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). These are mainly used for recognising temporal patterns in handwriting recognition[16], speech recognition[38] and weather prediction[2].
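Equations 3.28 and 3.29 describe a single forward pass; a minimal NumPy sketch makes the recurrence concrete. Shapes, weights and activations here are illustrative, not the thesis architecture:

```python
import numpy as np

# Sketch of equations 3.28-3.29: single-layer RNN forward pass.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 3, 4, 1, 5

W_h = rng.normal(scale=0.1, size=(n_hidden, n_in))     # input-to-hidden weights
U_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden)) # hidden-to-hidden weights
b_h = np.zeros(n_hidden)
W_y = rng.normal(scale=0.1, size=(n_out, n_hidden))    # hidden-to-output weights
b_y = np.zeros(n_out)

xs = rng.normal(size=(T, n_in))        # one input vector per timestep
h = np.zeros(n_hidden)                 # initial hidden state
ys = []
for x_t in xs:
    h = np.tanh(W_h @ x_t + U_h @ h + b_h)   # eq. 3.28 with sigma_h = tanh
    ys.append(W_y @ h + b_y)                 # eq. 3.29 with linear sigma_y
ys = np.array(ys)
```

The same hidden state `h` carries information forward between timesteps; in training, backpropagation through time differentiates the loss in equation 3.26 through this whole unrolled chain.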


3.4.2 Weibull Time to Event Analysis

A Weibull distribution provides the probability of the occurrence of a possible outcome in an experiment. In the field of reliability engineering, Weibull distributions are the most commonly used model for observing the aging process of machines. In this thesis, we use one to capture the aging process of batteries.

A typical Weibull probability density function is given by equation 3.30:

f(x; λ, k) = { (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k} if x ≥ 0; 0 if x < 0 } (3.30)

where

k = shape parameter
λ = scale parameter
x = time to failure

Equation 3.30 can estimate the failure rate as a function of time. According to the variation of k, as shown in Figure 3.3, we can interpret the hazard rates as follows:

1. k < 1 Failure rate decreases over time.

2. k = 1 Failure rate is constant over time.

Fig. 3.2 RNN Architecture[9]

where

x_t = input vector
h_t = hidden layer vector
y_t = output vector
W, U = parameter matrices
V = output weight matrix


Fig. 3.3 Weibull Probability Density Function

3. k > 1 Failure rate increases over time. Typically represents the aging process.
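The three regimes above follow directly from the Weibull failure rate h(x) = (k/λ)(x/λ)^{k−1}, obtained from equation 3.30 via λ(t) = f(t)/R(t). A small sketch with invented parameter values confirms each case:

```python
import numpy as np

# Sketch of the Weibull hazard rate and the role of the shape parameter k.
def weibull_hazard(x, lam, k):
    # h(x) = (k/lam) * (x/lam)**(k-1), the Weibull failure rate
    return (k / lam) * (x / lam) ** (k - 1)

x = np.linspace(0.1, 10.0, 50)
h_aging = weibull_hazard(x, lam=5.0, k=2.0)     # k > 1: increasing (aging)
h_const = weibull_hazard(x, lam=5.0, k=1.0)     # k = 1: constant (exponential)
h_burnin = weibull_hazard(x, lam=5.0, k=0.5)    # k < 1: decreasing (burn-in)
```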

We use this function for churn prediction[30]. The expectations for our churn model are that it should:

1. Work with recurrent events

2. Handle time varying covariates (battery features)

3. Learn temporal patterns

4. Handle sequences of varying length

5. Learn with censored data

6. Make flexible predictions

The goal for churn prediction is to maximize the function in equation 3.31[30]:

Σ_{n=1}^{N} Σ_{t=0}^{T_n} [ u_t^n · log Pr(Y_t^n = y_t^n | x_{0:t}^n) + (1 − u_t^n) · log Pr(Y_t^n > y_t^n | x_{0:t}^n) ] (3.31)


where:

y_t^n = time to event for user n = 1, ..., N at timestep t = 0, 1, ..., T_n
x_{0:t}^n = data up to time t
u_t^n = indicator of whether the data point is censored (u_t^n = 0) or not (u_t^n = 1)
Y_t^n = random variable of the experiment

In order to train the neural network, the objective (loss) function when y ∈ (0, ∞) is continuous is given by equation 3.32:

error = −log( f(y)^u S(y)^{1−u} ) = −u · log[λ(y)] + Λ(y) (3.32)

where λ(y) = Λ′(y). If y ∈ {0, 1, ...} is discrete, the loss is given by 3.33:

error = −log( p(y)^u S(y+1)^{1−u} ) = −u · log[ e^{d(y)} − 1 ] + Λ(y+1) (3.33)

where:

y = observed time
u = censorship indicator such that u = 1 if y is uncensored, u = 0 otherwise
Λ(y) = a recurrent cumulative hazard function parametrized by θ
θ = output of the RNN

By combining the continuous objective function in equation 3.32 and the churn model in equation 3.31, we optimize this function through the neural network:

max log L(w, y, u, x) := Σ_{t=0}^{T} ( u_t · [ β_t · log(y_t / α_t) + log(β_t) ] − (y_t / α_t)^{β_t} ) (3.34)

Similarly, by combining the discrete objective function in equation 3.33 and the churn model in equation 3.31:

max log L_d(w, y, u, x) := Σ_{t=0}^{T} ( u_t · log[ exp( ((y_t + 1)/α_t)^{β_t} − (y_t/α_t)^{β_t} ) − 1 ] − ((y_t + 1)/α_t)^{β_t} ) (3.35)

The aim of this analysis is to find the values of α and β by maximizing these equations.
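The continuous objective in equation 3.34 can be evaluated directly for a single sequence. The sketch below treats α and β as fixed stand-ins for the per-timestep RNN outputs; all values are invented for illustration:

```python
import numpy as np

# Sketch of the continuous Weibull log-likelihood in equation 3.34.
def weibull_log_lik(y, u, alpha, beta):
    # sum_t( u_t*[beta_t*log(y_t/alpha_t) + log(beta_t)] - (y_t/alpha_t)**beta_t )
    y, u, alpha, beta = map(np.asarray, (y, u, alpha, beta))
    return np.sum(u * (beta * np.log(y / alpha) + np.log(beta))
                  - (y / alpha) ** beta)

y = np.array([5.0, 3.0, 1.0])        # observed times to event
u = np.array([1, 1, 0])              # last timestep is censored
alpha = np.array([4.0, 4.0, 4.0])    # hypothetical scale outputs of the RNN
beta = np.array([1.5, 1.5, 1.5])     # hypothetical shape outputs of the RNN

ll = weibull_log_lik(y, u, alpha, beta)
```

For a censored step (u = 0) only the survival term −(y/α)^β contributes, exactly as equation 3.32 prescribes; a training loop would maximize this quantity (minimize its negative) with respect to the network weights producing α and β.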

3.4.3 Cox Proportional Hazard Function

The Cox proportional hazards model described in section 3.3.4 can also be implemented using neural networks. We use a discrete-time survival Cox neural network which is trained with mini-batch gradient descent. Suppose we have n subjects to train the neural network on, with corresponding lifetime durations [t_1, t_2, ..., t_n]. The conditional hazard probability h_j for each subject is defined as the probability of failure during interval j, given that the individual has survived at least to the beginning of the interval. The probability that the individual lasts until the end of interval j is given by equation 3.36:

S_j = ∏_{i=1}^{j} (1 − h_i) (3.36)

This survival model can be turned into a discrete-time likelihood. There are two ways to achieve this: we discretize based on time or on the individual. For this mini-batch trained neural network, we discretize based on subjects. For any subject in the dataset, the likelihood is then the probability that they survive through intervals 1 to j−1 multiplied by the probability that they fail during interval j, as in equation 3.37:

lik = h_j ∏_{i=1}^{j−1} (1 − h_i) (3.37)

In order to calculate the log-likelihood, we take the log on both sides as in equation 3.38:

loglik = ln(h_j) + Σ_{i=1}^{j−1} ln(1 − h_i) (3.38)

If the subject has a censored time t_c such that

(1/2)(t_{j−2} + t_{j−1}) ≤ t_c < (1/2)(t_{j−1} + t_j) (3.39)

then the likelihood is the probability that the subject survives intervals 1 to j−1. Equation 3.37 can therefore be rewritten as:

lik = ∏_{i=1}^{j−1} (1 − h_i) (3.40)

Similarly, log-likelihood can be written as:

loglik = Σ_{i=1}^{j−1} ln(1 − h_i) (3.41)

One observation here is that subjects whose censoring time lies in the second half of an interval are given credit for surviving that interval. If they are not given credit, a downward bias appears in the survival estimates[3].


A custom loss function was implemented in Keras in Python[14]. For subjects observed during the time interval 1 to j, the length-n vector survs marking the intervals through which the subject survived is given by:

survs(j) = { 1 if t ≥ t_j; 0 otherwise } (3.42)

For subjects where failure occurred during the interval:

surv_f(j) = { 1 if t_{j−1} ≤ t < t_j; 0 otherwise } (3.43)

For censored individuals:

survs(j) = { 1 if t ≥ (1/2)(t_{j−1} + t_j); 0 otherwise } (3.44)

surv_f(j) = 0 (3.45)

Therefore, the log likelihood of each individual can be found by applying these conditions to equations 3.38 and 3.41:

loglik = Σ_{i=1}^{n} ( ln(1 + survs(i) · (survpred(i) − 1)) + ln(1 − surv_f(i) · survpred(i)) ) (3.46)

The summation of the log likelihoods of all individuals gives the global model of the hazard rate of the whole dataset. In the neural network, we take the negative of equation 3.46 in order to maximize the likelihood, so the loss can be minimized using mini-batch gradient descent. The Cox neural network used in this case has the following features:

1. It is theoretically justified and fits into the established literature on survival modeling.

2. The loss function depends only on the information contained in the current mini-batch, which enables rapid training with mini-batch SGD and application to arbitrary-size datasets.

3. It is flexible and can be adapted to specific situations. For instance, for small sample sizes where we wish to minimize the number of neural network parameters, it is easy to incorporate a proportional hazards constraint so that the effect of the input data on the hazard function does not vary with follow-up time.

Chapter 4

Method

4.1 Modelling Procedure

An initial idea of the pipeline, according to the knowledge gathered from the available literature, can be visualized as in Figure 4.1.

Fig. 4.1 Battery Modelling Pipeline

The basic idea behind modelling battery data would be to use the raw data provided as the input and get the relevant predictions as the output. This can be broken down broadly into two parts: Data Processing, as explained in section 4.2.1, and Battery Aging Modelling, as explained in sections 4.2.2 & 4.2.3.

4.2 Final pipeline

Fig. 4.2 Battery Modelling Pipeline

Given in Figure 4.2 is a more detailed flowchart of the entire battery aging analysis software toolchain, where each algorithm is broken down into individual function blocks according to their sequence of application. The data was initially given to us in a compressed format in the form of mdf files. This was not easily accessible to humans, as it was saved in a highly compressed format, as described in more detail in section 3.1. The data had to be cleaned and converted carefully into dataframes, then saved into csv files. This algorithm is explained in detail in section 4.2.1. After we create a human-readable database of all features collected by the WICE sensor in all cars, we isolate the feature names pertaining to the car battery aging. This algorithm is explained in 4.2.1. The selected feature values were then extracted from the database in order to create two different aggregated dataframes, which are assumed to represent the whole dataset, as explained in 4.2.1. The aggregated data is further processed to decrease the overall size and increase the importance of the most significant variables, as in section 4.2.1. The final dataset is then used for Survival Analysis.

Since we deal with Big Data, it is advisable to parallelize algorithms to enable faster processing. The software toolchain shown in Figure 4.2 is implemented to leverage the power of multiple cores in the CPU provided to us. The multiprocessing library was used in order to execute specific functions on individual files where we do not intend to extract correlated information between different files. This parallel function was made flexible so that it can be used for multiple unique processes. The pseudo code for the parallel algorithm is given in Figure 4.3:

Initialize a process manager
return_dictionary = Declare the return type for the function
jobs = Initialize list of processes

For each car in all cars:
    m = Process definition(Process function, Process name, Arguments to the function)
    Append the process definition to the list of jobs
    Start the process

For each process in the jobs list:
    Join all processes after the individual tasks end

return_dictionary = return a dictionary of values with process names as keys, data as values

Fig. 4.3 Parallel Algorithm
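
The pseudo code above maps directly onto Python's multiprocessing library. The sketch below is illustrative only: `run_parallel` and the `square` worker are hypothetical names, not the toolchain's actual functions, and the worker must be a top-level function so it can be started in a separate process.

```python
import multiprocessing as mp

def run_parallel(worker, tasks):
    """Run worker(name, payload, result) once per task in its own process.

    tasks maps a process name (e.g. a car id) to the worker's argument;
    results are collected in a manager dict keyed by process name.
    """
    manager = mp.Manager()          # initialize a process manager
    result = manager.dict()         # shared return dictionary
    jobs = []
    for name, payload in tasks.items():
        p = mp.Process(target=worker, name=name, args=(name, payload, result))
        jobs.append(p)              # append the process definition to the jobs list
        p.start()                   # start the process
    for p in jobs:                  # join all processes after the tasks end
        p.join()
    return dict(result)

def square(name, x, result):
    """Trivial stand-in for a per-car processing function."""
    result[name] = x * x
```

For example, `run_parallel(square, {"car01": 3, "car02": 4})` returns a dictionary with process names as keys and the per-process outputs as values.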

4.2.1 Data Handling

The data collection followed a hierarchical structure. We had 32 parent folders, one for each car. Each folder contained numerous folders following the timestamp of the record. These folders contained the corresponding drive cycle information in individual files. In total, we deal with 5576 files. To automate processing this amount of data, we need a high amount of organization and precision.

One effective solution to this problem was to traverse the whole database and collect the names and paths of all folders and files in the form of lists. This exploration strategy scans the whole database once in the beginning. The consolidated list containing all filenames was then used as input strings for further data processing.
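
This one-pass exploration can be sketched with Python's `os.walk`; the function name `index_database` is an illustrative assumption, not the toolchain's actual name:

```python
import os

def index_database(root, extension=".mdf"):
    """Scan the database tree once and return sorted paths of all matching files."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # collect full paths of files with the requested extension
        paths.extend(os.path.join(dirpath, f)
                     for f in filenames if f.endswith(extension))
    return sorted(paths)
```

The returned list can then be fed directly into the downstream conversion and feature selection steps.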

Feature Selection

The WICE experiment is still in the developmental stage and the sensors are still expanding the number of features recorded. Therefore, we have different types of sensors recording different types of features. Our aim is to get a list of features which will impact the battery aging process.

We start with prior knowledge that features like current, voltage, resistance, state of charge, capacity, etc. will affect the battery aging process. These features are stored in a signal list. Therefore, we extract the feature names recorded in each mdf file and find an intersection among these features such that the signal list is always present in the file. If the signal list is not present in the mdf file, we drop the file, since we would have incomplete features among the basic battery features in the signal list; including it might give us erroneous predictions. Two different feature lists are created using this method: one focusing on the Minimum Cell Capacity and the other dependent on the State of Health of the battery. This operation is performed for all the 32 cars across all sensors in order to extract two final lists of features which we want to extract from the mdf files for further processing. This yields us 267 features for the SOH and 250 features for the minimum cell capacity.
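
The selection rule described above — drop any file missing the base signal list, then intersect the remaining files' feature names — can be sketched as follows; the function name and the tiny example signals are illustrative assumptions:

```python
def select_features(file_features, signal_list):
    """file_features: {filename: iterable of feature names recorded in that file}.

    Files missing any base signal are dropped; the common feature set of the
    remaining files becomes the final feature list.
    """
    base = set(signal_list)
    # keep only files that contain every base battery signal
    kept = {f: set(feats) for f, feats in file_features.items()
            if base <= set(feats)}
    if not kept:
        return set()
    # intersect the feature names across all kept files
    return set.intersection(*kept.values())
```

A file lacking even one base signal is excluded entirely, so the final list only contains features present in every retained file.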

Data Conversion

The output of the WICE sensor modules is stored in mdf files, which are highly compressed. The aim of data conversion is to convert the file structure into a format that is human readable and suitable for applying data processing algorithms.

Conventionally, this can be achieved using built-in functions in various Python libraries and MATLAB. However, conversion through the built-in Python functions proved unreliable when comparing them to the MATLAB conversion output. Therefore, a custom function was programmed in order to get a greater extent of control while handling individual signals in each channel of the mdf file, as explained in section 3.1.

We create a function to convert an mdf file into a csv file. We first extract the global time from the header file. Then we go to the mdf data content, which is split into several channels. These channels contain a local time axis and one or more corresponding signals. The list of feature intersections obtained from the feature selection process are the signals we will be interested in analyzing to understand battery aging. This list is fed into a function to extract these specific signals with their corresponding time axis. However, each signal may be sampled with a different time step. Therefore, each signal is extracted with its time axis and then resampled to a 1 sec timestep. The global time is added to the local time axis to obtain one signal in the global timeframe. This process is iteratively applied to all signals in all channels. These signals are then merged in order to create one dataframe containing one common time axis with all the interesting features along the subsequent columns. Finally, to fill missing information in the resulting dataframe, linear interpolation is used. Other, more complex forms of interpolation were deemed computationally intensive and unnecessary, as the variance between the signals between each time step was minimal. The resulting dataframe is stored in a csv file.
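
The per-signal resampling and global-time alignment step can be sketched with pandas; the mdf reading itself is project-specific, and `to_global_1s` is an illustrative helper name, not the thesis code:

```python
import pandas as pd

def to_global_1s(values, local_time_s, global_start):
    """Shift a signal's local time axis (seconds) to global time, resample to
    a 1 s step, and fill gaps by linear interpolation."""
    idx = pd.to_datetime(global_start) + pd.to_timedelta(local_time_s, unit="s")
    series = pd.Series(values, index=idx)
    return series.resample("1s").mean().interpolate(method="linear")
```

Applying this helper to each signal and then concatenating the results column-wise (e.g. `pd.concat(signals, axis=1)`) yields one dataframe with a common 1-second global time axis.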

At the end of this process we will have converted one mdf file into a csv file. This has to be repeated over 5576 files, which is a very time and resource consuming process. Therefore, we run the code in parallel, where we start one process for each car through the Python multiprocessing function. This process iterates through all the files in each car, collects all the mdf files, converts them into csv files and saves them into another database consisting of only the human-readable csv files. There were three main challenges faced in this process:

1. File Read/Write - The most time consuming part of this algorithm was the file read/write operations. This was seen as an inevitable step, as it was necessary to have a record of the converted files to develop a proof-of-concept software toolkit. It can be skipped in the later stages of development.

2. Corrupt channels - There were a few channels where a few observations in the time axis were either negative or inf. These were regarded as corrupt signals and the whole file was dropped.

3. Resource constraint - Even after making a parallel algorithm, it took about 8-10 hours to make the whole dataset. This had to be done twice, because we wanted one dataset focusing on the minimum cell capacity and the other on State of Health.

Fig. 4.4 Duration(sec) per file for the Capacity Database

Fig. 4.5 Duration(sec) per file for the SOH Database

In Figures 4.4 and 4.5, we have line plots where each color represents one of the 32 cars. On the x-axis is the number of files locally stored for each car. On the y-axis is the number of datapoints or rows per file. Since the data is resampled, the y-axis also represents the number of seconds per drive cycle recorded, or the duration of the drive cycle in seconds.

Data Aggregation

Once we have a new database, we have the same hierarchical folder structure for each car as described above. However, we have csv files in place of mdf files. Each csv file inside each folder holds the information for its corresponding drive cycle. Taking into consideration that the variation in battery health within each drive cycle is very small, we take the mean of all features for that particular drive cycle. This gives us one data point for each drive cycle. At the end, we have 974 drive cycles when the data is aggregated focusing on SOH and 6863 drive cycles when the data is aggregated focusing on Minimum Cell Capacity.
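
Since each csv file holds one drive cycle, the aggregation amounts to a column-wise mean per file. A minimal pandas sketch (the function name is an illustrative assumption):

```python
import pandas as pd

def aggregate_drive_cycles(cycle_frames):
    """Collapse each drive-cycle dataframe to its column means,
    giving one row (data point) per drive cycle."""
    rows = [df.mean(numeric_only=True) for df in cycle_frames]
    return pd.DataFrame(rows).reset_index(drop=True)
```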

Feature Engineering

In this section we describe the procedure to perform data cleaning and reduction. We reduce the data further, but also increase the importance of the significant variables in the aggregated dataset.

The first step in this process is to remove highly correlated signals. We calculate the Pearson correlation between the output variable and the rest of the features, then filter out signals where the correlation is more than 0.9. This is an important procedure, because among the features we have a few complex signals which are calculated based on the other signals collected in the raw data. By filtering out signals like these, we avoid the risk of including directly dependent signals as input to the battery model. Including them would produce a very high accuracy in the output, as the neural network's outputs would only be dependent on these signals. In the beginning of this process, we start with 250 features in the aggregated dataframe focused on Minimum Cell Capacity; at the end, we manage to reduce them to 207 features. In the case of the aggregated dataframe focused on SOH, we start with 268 features and end up with 216 features.
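
This filtering step can be sketched with pandas; the 0.9 threshold is the one stated above, while the function name and toy columns are illustrative assumptions:

```python
import pandas as pd

def drop_output_correlated(df, target, threshold=0.9):
    """Drop features whose absolute Pearson correlation with the output
    variable exceeds `threshold` (they are likely derived from it)."""
    corr = df.drop(columns=[target]).corrwith(df[target]).abs()
    to_drop = corr[corr > threshold].index.tolist()
    return df.drop(columns=to_drop)
```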

We use K-means to add a cluster label as a feature, in order for the system to understand the interdependence of the rest of the features. Through K-means, all the cars are assigned a cluster label according to the exhibited features. The cars with similar features are given the same cluster labels. This can help us understand a pattern in the battery degradation according to driving behavior. Had we had the geographical position of operation for each car, it might have been concluded that the cars within a cluster were approximately from the same climatic conditions.

The final dataframe is scaled so that it follows a standard normal distribution: each feature is normalized to have zero mean and unit variance. There are two main reasons for scaling:

1. Filtering - If the variance of a few features is very close to zero, we remove those features using a built-in variance threshold function. This is an important step, because the WICE sensor records several different types of variables. We are interested in looking at only those variables which change significantly during the lifetime of a car. This also helps us in removing event-driven status signals which may not change throughout the analysis.

2. PCA - PCA is sensitive to scaling. Running PCA on an unscaled dataset would result in an analysis based on covariances, whereas on a scaled dataset the analysis output will be based on correlations of the features in the dataset.

PCA is performed on the resulting dataframe. The results for both datasets are shown in Figures 4.6 and 4.7.
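
The variance filtering, scaling and PCA steps map directly onto scikit-learn. A minimal sketch, where the function name, the number of components and the variance threshold are illustrative assumptions rather than the thesis settings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

def reduce_features(X, n_components=2, var_threshold=1e-8):
    """Drop near-constant signals, standardize to zero mean / unit variance,
    then project onto the leading principal components."""
    X = VarianceThreshold(threshold=var_threshold).fit_transform(X)
    X = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    projections = pca.fit_transform(X)
    return projections, pca.explained_variance_ratio_
```

The explained-variance ratios returned here are what the scree and explained-variance plots in Figures 4.6 and 4.7 visualize.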

(a) Scree Plot (b) Explained Variance Plot

Fig. 4.6 PCA Analysis Results for Minimum Cell Capacity

(a) Scree Plot (b) Explained Variance Plot

Fig. 4.7 PCA Analysis Results for SOH

We concatenate the PCA projections to the existing dataset as new features. Then we find the correlations across the whole dataset again. This time, we aim to remove the least correlated variables. Therefore, we hardcode a lower limit of 0.3 to eliminate the least correlated features. At the end, we get a descriptive (hierarchical) dataframe which tells us the PCA component (as index), the signal names inside and the correlation value of each signal. The correlation heat map of the first principal component shown in Figure 4.8 is for the features in the Minimum Cell Capacity dataset. The correlation heat map of the first principal component shown in Figure 4.9 is for the features in the SOH dataset. This is essentially a square matrix where the x and y axes consist of the feature names. (The names have been removed as per the confidentiality clause with Volvo Cars.) The legend indicates that the features with low correlation are marked with a light blue color and, as the correlation increases, the color intensity increases.

At the end of feature engineering, we reduce our Minimum Cell Capacity dataset to 73 features and the SOH dataset to 67 features. The summary statistics of the data are given in Table 4.1.

Fig. 4.8 Capacity Correlation Heatmap

4.2.2 Statistical Analysis

We start analyzing the data with Statistical Analysis. The purpose of Statistical Analysis is to get a global estimate of the chances of survival of a car battery beyond a particular period of time with well-established statistical methods before using Neural Networks. This is a proof-of-concept that the data available can be used for predictions, and it is a learning experience to get a deeper understanding of the battery degradation behavior. This process also gives us a better idea of the predictive capabilities of the datasets. The outputs were also useful to understand the reliability of the collected features in the raw data.

From the data processing described in section 4.2.1, we have a list of features which significantly affect the output parameter for each dataset. These features are to be looked at while performing predictive analytics in this section. Therefore, the stored significant variables for each output parameter are then taken into account for making a new database specific to each kind of survival analysis.

Fig. 4.9 SOH Correlation Heatmap

For Statistical Analysis, we take each car as one n-dimensional datapoint containing n features. To make the new database, we traverse through the raw data containing csv files and collect the significant features for each car. Mean aggregation is used in order to consolidate the information pertaining to each car into one row of a final dataframe. We add additional information to this dataframe, like the ’Start Date’, ’End Date’ and ’Duration’ (in days), to give us information on the lifetime of the car. We also want to capture the degradation behaviour, which we do through a ’Censorship’ column. We declare a variable to check for 5% degradation of the battery. For example, we calculate the maximum degradation of the Minimum Cell Capacity, which is indicated by the total degradation of Minimum Cell Capacity at the end of the experiment compared to its initial Minimum Cell Capacity. We apply the same algorithm for the SOH dataset. If the degradation is more than 5%, the event we are trying to track has occurred; therefore, we classify the data point as observed and assign it a ’1’. If the degradation is less than 5%, the event we are trying to track will occur in the future; therefore, we classify the data point as censored and assign it a ’0’.
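
The labelling rule for the ’Censorship’ column can be written as a small helper; the function name is an illustrative assumption, while the 5% threshold is the one stated above:

```python
def censorship_label(initial_value, final_value, threshold=0.05):
    """Return 1 ('observed') if the battery degraded by more than `threshold`
    relative to its initial value, else 0 ('censored')."""
    degradation = (initial_value - final_value) / initial_value
    return 1 if degradation > threshold else 0
```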

Table 4.1 Summary Statistics of Clean Data

Output Variable         Maximum Duration   Censored Cars   Observed Cars
Minimum Cell Capacity   223 Days           12              19
State Of Health         79 Days            14              17

Kaplan Meier Estimate

In this method, we have a univariate analysis where only the duration and the censorship variable are taken into consideration. The curve in Figure 4.10 represents a global battery degradation model for all cars in common. The curve is an estimate of the Minimum Cell Capacity degradation based on the data already available to us. Using this, we try to predict the probability of degradation of the censored cars in the dataset in Figure 4.11.

In Figure 4.10, at around 135 days, we encounter the first observed event, due to which the survivability of all the cars is reduced. In the same manner, the graph goes down to zero as time increases. The values for the KM estimate are given in Table 4.2.

In Figure 4.11, the red line represents the maximum time duration for which we have data available among all cars. We can observe that the analysis around capacity shows that the remaining cars will reach a 5% degradation in capacity around one month into the future.

Fig. 4.10 Battery Capacity Degradation Profile

We do the same analysis using SOH as an output variable. Figure 4.12 shows a 5% SOH degradation in exactly 60 days for all cars, which is well before the end of the experiment. We also try to predict the future SOH values, without any significant results, as the event has already occurred during the experiment.
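
Curves such as the one in Figure 4.10 follow the product-limit formula S(t) = ∏_{t_i ≤ t} (1 − d_i/n_i). The thesis presumably used an off-the-shelf survival library; for illustration, the estimator itself can be sketched in a few lines (function name is an assumption):

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Return {event time: survival estimate} from durations (e.g. days) and
    0/1 censorship indicators, using the Kaplan-Meier product-limit estimator."""
    t = np.asarray(durations, dtype=float)
    e = np.asarray(observed, dtype=int)
    survival = 1.0
    estimate = {}
    for ti in np.unique(t[e == 1]):
        at_risk = int(np.sum(t >= ti))             # cars still under observation
        events = int(np.sum((t == ti) & (e == 1))) # cars reaching 5% degradation
        survival *= 1.0 - events / at_risk
        estimate[ti] = survival
    return estimate
```

Censored cars contribute to the at-risk counts but never trigger a drop in the curve, which is exactly why the second-half-of-interval credit discussed in section 3.4 matters.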

Fig. 4.11 Battery Capacity Degradation Prediction

Nelson Aalen Estimate

We use the same dataset to perform another univariate analysis in order to capture a cumulative measure of the increase in the rate of occurrence of the event tracked in this dataset. For this analysis, only the duration and the censorship variable are used. Figure 4.14 gives a global estimate of the rate of increase in hazard from the available data for Minimum Cell Capacity. In the graph, the cumulative hazard rate increases with time as we encounter the ’observed’ cars achieving 5% degradation. The values for the NA estimate are given in Table 4.3. We used the observed data in order to make the model and perform predictive analysis on the censored observations, as shown in Figure 4.15. In this graph, we observe that the model predicts that all cars will be at peak hazard rate one month into the future.

The same analysis has been performed on the SOH dataset. Figure 4.16 shows that all the observed cars reached peak hazard rate in 60 days. Predictive analysis performed on this dataset, as shown in Figure 4.17, doesn’t yield any significant results.
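
Analogously, the cumulative hazard shown in Figures 4.14 and 4.16 follows the Nelson-Aalen estimator H(t) = Σ_{t_i ≤ t} d_i/n_i, sketched here for illustration (function name is an assumption):

```python
import numpy as np

def nelson_aalen(durations, observed):
    """Return {event time: cumulative hazard} from durations and 0/1
    censorship indicators, using the Nelson-Aalen estimator."""
    t = np.asarray(durations, dtype=float)
    e = np.asarray(observed, dtype=int)
    hazard = 0.0
    estimate = {}
    for ti in np.unique(t[e == 1]):
        at_risk = int(np.sum(t >= ti))
        events = int(np.sum((t == ti) & (e == 1)))
        hazard += events / at_risk   # increments accumulate; the curve never decreases
        estimate[ti] = hazard
    return estimate
```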

Index   KM_estimate
0       1
20      1
40      1
60      1
80      1
100     1
120     1
140     1
160     0.739
180     0.622
200     0.430
220     0.101
240     0.067
260     0.067
280     0.067
300     0.067
320     0.067

Table 4.2 Global Battery degradation Kaplan Meier Estimate

Cox Proportional Hazards Model

This method is a multivariate analysis of battery health. This type of statistical model uses all the provided variables in the dataset in order to train a model which completely captures the battery behaviour. In Figure 4.18, we observe that every car has a different behaviour curve. In this method, we have a better resolution into car battery behaviour according to their individual battery parameters. Each line in this graph represents one car. Also, since we are able to utilize all the data provided to us, we have a look-ahead of almost 4 months into the future. With this analysis, we conclude that all cars will reach 5% degradation in Minimum Cell Capacity within 4 months after the experiment has ended. We have a hazard prediction plot in Figure 4.19, which also gives us a prediction 4 months after the experiment has ended.

The same analysis is done on the SOH dataset. Figure 4.20 shows the survival curves of the cars. All cars in the experiment are expected to reach a 5% degradation in SOH by 60 days after the beginning of the experiment. We observe in the plot in Figure 4.21 that the hazard increases for the cars over the duration of the experiment.
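
The Cox model is fitted by maximizing the partial likelihood over the covariates. As a sketch of the objective (the optimization over β itself would be handled by a fitting library), the Breslow-convention partial log-likelihood can be written directly; the function name is an illustrative assumption:

```python
import numpy as np

def cox_partial_loglik(beta, X, durations, observed):
    """Breslow partial log-likelihood of coefficient vector `beta` given
    covariates X (n x p), durations, and 0/1 event indicators."""
    t = np.asarray(durations, dtype=float)
    e = np.asarray(observed, dtype=int)
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    loglik = 0.0
    for i in np.where(e == 1)[0]:
        risk_set = t >= t[i]   # subjects still at risk at the event time t_i
        loglik += eta[i] - np.log(np.sum(np.exp(eta[risk_set])))
    return loglik
```

With β = 0, every subject carries equal risk and the expression reduces to minus the sum of the log risk-set sizes, which makes for an easy sanity check.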

Fig. 4.12 Battery SOH Degradation Profile

4.2.3 Neural Network Analysis

Since the Survival Analysis models are essentially regression models, they can be substituted by Neural Networks. Two models have been implemented in this thesis project and the results are discussed in the sections below.

WTTE-RNN

The input data was prepared in a slightly different manner for this neural network. The network needs continuous measurements of features from the beginning to the end of the experiment. Therefore, from the database, we take each day as one datapoint for each car. At the end, for each car, the number of data points will be equal to the duration of the lifetime of the car in days. The overall neural network structure is given in Figure 4.22.

Fig. 4.13 Battery SOH Degradation Prediction

In this case, we need a special loss function which helps evaluate the accuracy and perform backpropagation through the whole network. A well-researched formula exists for Weibull distributions. It takes all samples into consideration while calculating the log-likelihood. It will punish the model for predicting a high probability when we know that no event has occurred during that time frame. Therefore, if a car has not reached a 5% degradation in Minimum Cell Capacity in 200 days, and our model predicts a high probability that it reaches 5% degradation during the same time frame, the loss function will punish the model heavily. In addition, in the case of observed cars, where the time of the event is known to us, the log-likelihood will reward distributions that give a high chance of experiencing the event at that point in time. Censored observations are not taken into account during training. The loss function can be implemented as in Figure 4.23.

We use an exponential activation function for alpha and softplus for beta. Since Keras doesn’t support applying different activation functions to individual neurons, we make a custom activation function as in Figure 4.24.

In Figure 4.25, we can observe the average Weibull distribution of all cars. Here, the predicted peak hazard periods of the cars are depicted by the gray histogram. The blue, red and green dotted lines are the mean, median and mode of the Weibull parameters, while the blue line plot is the Weibull distribution. Since the mean Weibull distribution overlaps the true distribution, we can conclude that our model is working.

The output of the neural network is given in Table 4.4. In this table, the mean, median and mode are calculated for each car as follows:

Mean = α Γ(1 + 1/β)    (4.1)

Median = α (ln 2)^{1/β}    (4.2)

Mode = α ((β − 1)/β)^{1/β} if β > 1; 0 if β ≤ 1    (4.3)
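
Equations 4.1–4.3 translate directly into code; plugging in the first row of Table 4.4 (α = 96.198, β = 4.081) reproduces the reported mean, median and mode, which serves as a check on the table:

```python
import math

def weibull_mean(alpha, beta):
    return alpha * math.gamma(1.0 + 1.0 / beta)           # eq. 4.1

def weibull_median(alpha, beta):
    return alpha * math.log(2.0) ** (1.0 / beta)          # eq. 4.2

def weibull_mode(alpha, beta):
    if beta <= 1.0:                                       # eq. 4.3, degenerate case
        return 0.0
    return alpha * ((beta - 1.0) / beta) ** (1.0 / beta)
```

Note that the mode is very sensitive to β near 1, so the three-decimal rounding of the parameters in Table 4.4 noticeably affects the reproduced mode for the cars with β ≈ 1.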

The resulting Weibull distributions obtained from this method are shown in Figure 4.26.

Fig. 4.14 Battery Capacity Cumulative Hazard Rate

In Figure 4.26, the different curves represent different car behaviours; each car has its own hazard curve. The available duration for each car is represented by the dots on the curves. The Weibull curve represents the hazard rates for the cars to reach 5% Minimum Cell Capacity degradation. All the curves are in the shape of bell curves, which represents the fact that when the car begins operation, the probability of 5% degradation is zero. As time passes, the probability increases. The car reaches 5% degradation in the stipulated amount of time, as covered inside the bell curve. In these curves, the neural network tries to predict a few peak hazard rates at the very beginning of the time duration for the cars in the test set. This is mainly due to very little training data. This can be improved by providing more timeseries training samples with cleaner data.

The loss while training the neural network is shown in Figure 4.27a. We can see that the loss is steadily going down, which is a positive sign of progress while training this neural network. We don’t observe large fluctuations when checking against the test set either.

In Figure 4.27b, the bias of alpha has a decreasing trend, a symptom of overfitting, while the beta term seems to stabilize at the end. The smaller the weights, the less the predictions depend on the input covariates and the more the bias term acts as the default predictor.

Fig. 4.15 Battery Capacity Hazard Rate Prediction

Cox-NN

We couldn’t obtain any useful results from the Cox-NN with the given data. The survival was predicted as one at all instants of time, as shown in Figure 4.28a. The neural network was trained on the observed car timelines and the prediction was done on the censored cars.

However, we ran a simulation by increasing the number of cars and generating synthetic data. The ideal probability-of-survival graph for two datasets with different degradation behaviors may look as in Figure 4.28b.

Index   NA_estimate
0       0
20      0
40      0
60      0
80      0
100     0
120     0
140     0
160     0.298
180     0.461
200     0.813
220     2.074
240     2.324
260     2.324
280     2.324
300     2.324
320     2.324

Table 4.3 Global Battery Hazard Rate Estimate

Fig. 4.16 Battery SOH Cumulative Hazard Rate

Fig. 4.17 Battery SOH Hazard Rate Prediction

Fig. 4.18 Cox Survival Function (Capacity)

Fig. 4.19 Cox Hazard Function (Capacity)

Fig. 4.20 Cox Survival Function (SOH)

Fig. 4.21 Cox Hazard Function (SOH)

Car label   Weibull_Alpha   Weibull_Beta   Mean      Median    Mode
1           96.198          4.081          87.295    87.936    89.796
2           94.869          1.030          93.708    66.480    3.137
4           173.219         5.894          160.540   162.776   167.841
16          178.921         6.873          167.210   169.631   174.875
19          94.850          1.003          94.725    65.821    0.302
22          94.930          1.008          94.582    66.011    0.865
24          184.053         7.751          173.063   175.553   180.802
25          173.354         15.370         167.529   169.269   172.597
27          94.899          1.011          94.436    66.061    1.167
28          94.810          1.002          94.699    65.784    0.267
30          195.857         3.679          176.699   177.290   179.687
32          94.862          1.178          89.656    69.511    19.144

Table 4.4 Weibull predicted parameters for each car

Fig. 4.22 Weibull Time-to-event Neural Network Structure

def weibull_loglik_discrete(y_true, ab_pred):
    # unpack time-to-event (y) and 0/1 event indicator (u)
    y_ = y_true[:, 0]
    u_ = y_true[:, 1]
    # unpack predicted Weibull alpha (a) and beta (b)
    a_ = ab_pred[:, 0]
    b_ = ab_pred[:, 1]

    hazard0 = k.pow((y_ + 1e-35) / a_, b_)
    hazard1 = k.pow((y_ + 1) / a_, b_)

    return -1 * k.mean(u_ * k.log(k.exp(hazard1 - hazard0) - 1.0) - hazard1)

Fig. 4.23 Discrete log-likelihood for the Weibull hazard function, where

y_true = tensor containing time-to-event (y) and a 0/1 event indicator (u)
ab_pred = tensor containing predicted Weibull α (a) and β (b) parameters

def activate(ab):
    a = k.exp(ab[:, 0])       # exponential activation for alpha
    b = k.softplus(ab[:, 1])  # softplus activation for beta

    a = k.reshape(a, (k.shape(a)[0], 1))
    b = k.reshape(b, (k.shape(b)[0], 1))

    return k.concatenate((a, b), axis=1)

Fig. 4.24 Weibull activation function, where

ab = tensor containing Weibull α (a) and β (b) parameters

Fig. 4.25 Average Weibull Distribution

Fig. 4.26 Weibull Distribution


(a) Loss Curve (b) Biases

Fig. 4.27 Weibull Distribution Training Trend

(a) Survival Curve with given data (b) Reference Cox NN curve

Fig. 4.28 Cox Proportional Hazards Neural Network


Chapter 5

Results

5.1 Conclusions

There are a few observations from this experiment which need to be highlighted to enable efficient future research work:

1. Capacity estimates - The Minimum Cell Capacity estimation plotted in Figure 4.10 shows that the capacity degrades by 5% in about 6 months for a few cars. However, it is implausible for a commercial battery to degrade at such a pace: the car batteries carry an 8-year warranty, and battery degradation is known to be non-linear. Therefore, there appears to be an error in how the Minimum Cell Capacity was calculated or measured in the raw data.

2. SOH - The SOH of the battery was calculated based on the electrolyte resistance of the battery. These estimates are clearly off, given their rapid decline in Figure 4.12. Therefore, this output parameter was deemed not useful for any further analysis.

3. More cars - For Survival Analysis (Neural Networks) to work, more cars are needed in the experiment so that the system can learn the degradation trend better. Increasing the duration of the experiment, on the other hand, mainly improves the accuracy of future estimates.

4. Feature correlations - At the end of data processing, a few status variables still proved to be highly correlated with the battery Minimum Cell Capacity. This can be eliminated if the duration of the experiment is increased.


5. Car 8 - The Minimum Cell Capacity in Car #8 was observed to reset to 100% in the middle of the experiment. The cause of this abrupt change is unknown, so the car was dropped from all analyses.

6. Neural Networks - The Neural Networks were built only as a proof of concept. The models do not exhibit sufficient accuracy for the data available. However, they show promising results in capturing the battery behaviour and can be improved with a larger data set in the future.

7. Feature set - A fixed feature set, defined from the beginning for all cars, may have helped significantly. In this experiment, the cars were equipped with different sensors which collected different amounts of information, so the data collection lacked consistency.
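Observation 4 suggests a simple automated guard for future runs: drop any remaining status variable whose correlation with the target capacity exceeds a cutoff. A hedged pandas sketch (function and column names are hypothetical, not from the thesis toolchain):

```python
import numpy as np
import pandas as pd

def drop_correlated_with_target(df, target, threshold=0.95):
    # Keep the target itself plus every feature whose absolute Pearson
    # correlation with the target is at or below the threshold.
    corr = df.corr()[target].abs()
    keep = [c for c in df.columns if c == target or corr[c] <= threshold]
    return df[keep]

rng = np.random.default_rng(0)
capacity = rng.normal(95.0, 1.0, size=200)
frame = pd.DataFrame({
    "min_cell_capacity": capacity,
    "status_var": 2.0 * capacity + 1.0,   # perfectly correlated leak
    "ambient_temp": rng.normal(20.0, 5.0, size=200),
})
pruned = drop_correlated_with_target(frame, "min_cell_capacity")
```

Here the leaking status variable is removed while the genuinely independent feature survives; the 0.95 cutoff is an assumed value that would need tuning on the real WICE data.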

5.2 Future Research

The software toolchain was developed in a modular format and can be modified to implement more feature-engineering methods. A lot of my time was spent on understanding the mdf dataset and producing reliable conversion software. In terms of scientific impact, this was not the most significant part of the thesis. Therefore, I have made the software flexible, so that future programmers at Volvo Cars can add and remove functions, implemented in parallel, for initial data processing. This thesis report is intended to provide an in-depth understanding of the ongoing WICE dataset collection protocol, the amount of data available for processing, and the methods that can be used to process the data until at least one of the batteries reaches End of Life. Therefore, if the project continues data collection under the present constraints, the toolkit from this thesis work can be used for the coming 8 years with increasing prediction accuracy.

There is still a lot of scope for research on this dataset in the field of neural networks. The WICE data is ever expanding and currently exceeds the capabilities of the hardware resources provided. The immediate next step would be to convert the new dataset into csv files to obtain more time-series data per file. This data can then be fed through the regular feature-engineering pipeline to check whether the final feature list can be distilled down to only the most essential battery-aging parameters; the aim is to remove all the status variables by the end of feature engineering. The Neural Networks can also be trained with different regularization parameters for better results.
