


Research Article

Machine learning based decline curve analysis for short-term oil production forecast

Amine Tadjer, Aojie Hong and Reidar B Bratvold

Abstract

Traditional decline curve analyses (DCAs), both deterministic and probabilistic, use specific models to fit production data for production forecasting. Various decline curve models have been applied for unconventional wells, including the Arps model, stretched exponential model, Duong model, and combined capacitance-resistance model. However, it is not straightforward to determine which model should be used, as multiple models may fit a dataset equally well but provide different forecasts, and hastily selecting a model for probabilistic DCA can underestimate the uncertainty in a production forecast. Data science, machine learning, and artificial intelligence are revolutionizing the oil and gas industry by utilizing computing power more effectively and efficiently. In this paper, we propose a data-driven approach for short-term forecasting of unconventional oil production. Two state-of-the-art models, DeepAR and Prophet time series analysis, have been tested on petroleum production data. Compared with the traditional approach using decline curve models, the machine learning approach can be regarded as "model-free" (non-parametric) because the pre-determination of decline curve models is not required. The main goal of this work is to develop and apply neural networks and time series techniques to oil well data without requiring substantial knowledge of the extraction process or the physical relationship between the geological and dynamic parameters. For evaluation and verification purposes, the proposed method is applied to a selected well of the Midland field in the USA. By comparing our results, we can infer that both DeepAR and Prophet analysis are useful for gaining a better understanding of the behavior of oil wells, and can mitigate over/underestimates resulting from using a single decline curve model for forecasting. In addition, the proposed approach performs well in propagating model uncertainty to uncertainty in production forecasting; that is, we end up with a forecast that outperforms the standard DCA methods.

Department of Energy Resources, University of Stavanger, Stavanger, Norway

Corresponding author:

Amine Tadjer, University of Stavanger, Kjell Arholms gate 41, Stavanger 4036, Norway.

Email: [email protected]

Energy Exploration & Exploitation 0(0) 1–23

© The Author(s) 2021

DOI: 10.1177/01445987211011784

journals.sagepub.com/home/eea

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).


Keywords

Deep learning, probabilistic modeling, production forecasting, time series analysis

Introduction

Hydrocarbon production forecasting includes estimation of the ultimate recoveries and the lifetimes of wells, which are material factors for decision-making in the oil and gas industry because they can significantly impact economic evaluation and field development planning. Although mathematically richer forecasting models (e.g., grid-based reservoir simulation models) have been developed over the past decades, decline curve analysis (DCA) is still widely used because of its simplicity: the mathematical formulations of DCA models are simple with only a few parameters, and only production data are required to calibrate the parameters. The Arps model (Arps, 1945) has been used for DCA for more than 60 years and has been proved to perform well for conventional reservoirs. However, because of the complexity of flow behaviors in unconventional reservoirs, where several flow regimes are involved (Adekoya, 2009; Joshi, 2012; Nelson, 2009), the Arps model may not be ideal, and many other models have been proposed (e.g., the stretched exponential decline model (Valkó and Lee, 2010), the Duong model (Duong, 2011) and the combined capacitance-resistance model proposed by Pan (Pan, 2016), subsequently referred to as the Pan CRM in this paper). Some researchers (e.g., Gonzalez et al., 2012) have attempted to identify a single best model, while others (e.g., Hong et al., 2019) have argued that selecting a single model and using it with full confidence (i.e., trusting the single model 100%) can cause significant over/underestimates. Thus, their proposed approach incorporates multiple models by using Monte Carlo simulation to assess the probability of each model and consequently provides a probabilistic forecast of production.
Some limitations of Hong et al.'s approach are: (1) a collection of DCA models still needs to be predefined, and (2) the assessed probability of each model is only a measure of the model's goodness relative to the other models. If, for example, all the candidate models overestimate production, using Hong et al.'s approach will still result in an overestimated forecast. Thus, an approach that does not require the predefinition of DCA models, i.e., a non-parametric model, is deemed preferable. Machine learning (ML) is still a relatively new technique in the oil and gas industry. Several researchers have discussed the applications of ML for DCA. For instance, Gupta et al. (2014) used neural networks (NNs) for DCA. They first trained the NNs using historical data to capture the decline in production in shale formations, and the trained model was then used for prediction. Their study also used the autoregressive integrated moving average (ARIMA) (George et al., 2015), a time series analysis, to analyze the historical data and identify the trends and relationships of historical and predicted data. Although they applied these two methods to a sample of around 30 wells, they did not quantify uncertainties in the forecasted results. Ma and Liu (2018) predicted oil production using a novel multivariate nonlinear model based on the traditional Arps decline model and a kernel method. Aditya et al. (2017) developed a novel predictive modeling methodology that linked well completion and location features to DCA model parameters. The objective of the methodology was to generate predicted decline curves at potential new well locations. Han et al. (2020) used Random Forest (RF) to develop a



predictive model that can be used to predict productivity during the early phase of production (within 6 months). The required datasets were obtained from 150 wells targeting shale gas in the Eagle Ford shale formations. Reservoir properties, well stimulation and completion were considered as key input parameters, whilst the cumulative production of gas during a span of 3 years was identified as the target variable. Although the results of Aditya et al. (2017) and Han et al. (2020) were promising, the applicability of their methodology depends heavily on the presence of specific geological, well stimulation and completion data; the quality and accuracy of the data have a big influence, and any anomaly in the data consequently makes their results less promising. In the context of deep learning, Luo et al. (2019) built non-linear models using RF and Deep Neural Network (DNN) algorithms to forecast the cumulative production of oil during a span of 6 months. The whole dataset was obtained from around 3600 wells positioned in the Eagle Ford formations. Key geological parameters such as structural depth, thickness of the formation, total organic carbon (TOC), number of calcite layers and average thickness of the layers (thickness of the formation divided by the total count of the layers) were identified as the input variables that impacted the productivity of wells in Eagle Ford. In the context of deep recurrent neural networks, Lee et al. (2019) used the long short-term memory (LSTM) algorithm to develop a model for forecasting future shale-gas production. The gas production and shut-in periods of the past were taken into account to derive the input features. The training dataset was collected from 300 wells located in Alberta, Canada, in the Duvernay formation. The model was tested on 15 wells stationed in the same field. The trained model demonstrated the ability to predict production rates over a longer period (55 months).
They found that the method can be used to forecast future production rates quickly and to analyze the impact of added attributes such as the shut-in period. It was highlighted that the approach would provide a more reliable and accurate forecast of the production of shale gas and that this method can be used in both conventional and unconventional scenarios. In terms of reliability and utilization, further tuning and improvement of the feature selection process will produce a system with improved predictive capabilities. Stimulation parameters, attributes derived from geological knowledge, and refracturing were proposed as possible features that have a critical impact on shale gas production and could improve the method. The high-intensity drilling associated with unconventional hydrocarbon resources and the underperformance of DCA make such techniques more valuable in those circumstances. Zhan et al. (2019) tested the ability of the LSTM method to predict the production of oil over two years or even further by using very little previous data acquired during the initial production phases. The required dataset was obtained from over 300 wells stationed in unconventional onshore formations. Over the first few production years, it is possible to recover around 70% of the total EUR from shale wells, after which a rapid decline is observed. The steepness of the decline makes it hard to survey the trend, which causes over-estimation. In such a dynamic situation, they highlighted this methodology's value for forecasting production and assessing the reservoir. They found that the average difference between the estimated and observed accumulated production stayed within 0.2%, while the variance did not exceed 5%. Sagheer and Kotb (2019) tested the predictive efficacy of a deep LSTM (DLSTM) network, in which more LSTM layers were stacked to address shallow-structure limitations when operating with data from long-interval time series.
They found that the proposed approach performed much better than other models used in the analysis, such as those based on ARIMA, the deep gated recurrent unit (GRU), and the deep RNN. Based on their applicability, the models



were tested and validated in two real-field case studies: India's Cambay Basin oil field and China's Huabei oil field. An ensemble empirical mode decomposition (EEMD) based LSTM was suggested by Liu et al. (2020) to increase oil production forecasting speed and accuracy. Two real-field cases, the JD and SJ oilfields based in China, were used to determine and verify the efficacy of the model. The EEMD-LSTM model was contrasted with EEMD-Support Vector Machine (SVM) and EEMD-Neural Network models. The EEMD-LSTM model was found to perform much better than the other models, producing high-quality forecasts. Several machine learning and deep learning models have thus been proposed to better handle multiple seasonal patterns in oil production data. However, to the best of our knowledge, no studies have yet applied such probabilistic models for production forecasting. The novelty of this work is to improve upon the existing techniques used by petroleum engineers to analyze and appraise oil wells. Evaluating oil well potential is a lengthy investigation process. This is because the production profiles can be complex, as they are driven by reservoir physics and made even more challenging by a variety of operational events. Petroleum engineers analyze and evaluate the production profiles of oil wells, understand their underlying behavior, forecast their expected production, and identify opportunities for performance improvements. The investigation process is, nevertheless, time-consuming. This introduces opportunities to optimize these processes. Thus, we consider the state-of-the-art probabilistic machine learning methods DeepAR (David et al., 2019) and Prophet time series analysis (Taylor and Letham, 2007), which are known to be effective in pattern recognition and to outperform state-of-the-art forecasting methods on several problems.
These two algorithms can be used to understand and predict the behavior of oil wells. Our objective is to determine the viability of these algorithms in predicting the distribution of future outcomes, specifically with time series data representing oil production, without having substantial knowledge regarding the extraction process or the physical relationship between the geological and dynamic parameters. In the remainder of this paper, we first review the deep learning (DL) and time series analysis models that will be used to accomplish the task. Thereafter, we explain the evaluation metrics used to assess the quality of the forecast; and finally, we present the experimental results and a discussion of our work.

Time series analysis and DCA

Time series analysis

A time series is a sequence of data obtained at many regular or irregular time intervals and stored in successive time order; for example, a sequence of measured oil production rates over time. The objective of time series analysis is to extract useful statistical characteristics (e.g., trend, pattern, and variability) from a time series, to determine a model that describes the characteristics, to use the model for forecasting, and ultimately to leverage insights gained from the analysis for decision support and decision making. Traditionally, time series models can be classified into generative and discriminative models, depending on how the target outcomes are modeled (Ng and Jordan, 2002). The main difference between the two is that generative models predict the conditional distribution of the future values of the time series given relevant covariates, while discriminative models use the past values. In this study, we will use discriminative models, as they are more flexible and require fewer parameters and structural assumptions than generative models. For more details about



generative and discriminative models, see (David et al., 2019; Gasthaus et al., 2019; Ng and Jordan, 2002; Ruofeng et al., 2018).

A critical aspect of discriminative models is the process of reconstructing a single sequence of data points to yield multiple response observations. To solve this, sequence-to-sequence (seq2seq) (Cho et al., 2014) and autoregressive recurrent network (David et al., 2019) approaches were used to feed and generate output from time series prediction models. In seq2seq, the model is fed a sequence of time series values as input and produces a time series sequence as output, unlike the autoregressive model, which reduces the sequence prediction to a one-step-ahead problem.

DCA

DCA is a type of time series analysis for oil production data. DCA aims to predict the future production of a well or a field based on historical data. The prediction is useful for evaluating the economics of the future production and supporting decisions such as whether a well or a field should be abandoned. The Pan capacitance-resistance model (Pan CRM) is a DCA method designed to capture the major flow regimes relevant for an unconventional well: the transient and semi-steady-state flow regimes. (Pan, 2016) proposed a model to capture the productivity index behavior over both linear transient and boundary-dominated flow. Its formula is given as:

$$J = \frac{b}{\sqrt{t}} + J_\infty \qquad (1)$$

where $J$ is the productivity index, $J_\infty$ is the constant productivity index that a well will eventually reach at boundary-dominated flow, and $b$ is the parameter of the linear transient flow; $b$ is related to the permeability in the analytical solution of linear flow into fractured wells presented by (Wattenbarger et al., 1998). Pan obtained the empirical solution of rate over time by combining the previous equation with a tank material-balance equation. The standard form is given as:

$$q(t) = \Delta P \left( \frac{b}{\sqrt{t}} + J_\infty \right) e^{-\left( 2b\sqrt{t} + J_\infty t \right) / \left( c_t V_p \right)} \qquad (2)$$

where $c_t$ is the total compressibility, $V_p$ the drainage pore volume, and $\Delta P$ the difference between the initial reservoir pressure and the assumed constant flowing bottomhole pressure. For small $t$, the Pan CRM may offer an unrealistically high rate, as $q(t)$ approaches infinity when $t$ approaches 0. The Pan CRM is analytically derived and has all the parameters associated with a reservoir system. $c_t$, $V_p$, $\Delta P$, $b$ and $J_\infty$ are determined through history matching, with the goal of minimizing a predefined loss (or objective) function by adjusting the model parameters.
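As a concrete illustration, the Pan CRM rate of equations (1) and (2) can be evaluated numerically. The sketch below is not from the paper: the parameter values are made up for demonstration, and real values would come from history matching.

```python
import numpy as np

def pan_crm_rate(t, dp, b, j_inf, ct, vp):
    """Oil rate q(t) from the Pan CRM, equation (2).

    t must be > 0 in consistent time units (the model diverges as t -> 0).
    All parameter values used here are illustrative placeholders.
    """
    j = b / np.sqrt(t) + j_inf                                   # productivity index, equation (1)
    return dp * j * np.exp(-(2.0 * b * np.sqrt(t) + j_inf * t) / (ct * vp))

# Made-up parameters, purely for demonstration
t = np.array([1.0, 10.0, 100.0])
q = pan_crm_rate(t, dp=1000.0, b=0.5, j_inf=0.01, ct=1e-5, vp=1e7)
```

For these parameters the rate declines monotonically with time, as expected of a decline curve.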

Machine-learning models and techniques for time series analysis

Prophet forecasting model

Prophet is a Bayesian nonlinear univariate generative model for time series forecasting, which was developed by the Facebook Research team (Taylor and Letham,



2007) for the purpose of creating high-quality multistep-ahead forecasts. This model tries to address the following difficulties common to many types of time series forecasting and modeling:

• Seasonal effects caused by human behavior: weekly, monthly, and yearly cycles; dips and peaks on public holidays;

• Changes in trends due to new products and market events;
• Outliers.

The Prophet forecasting model utilizes an additive regression model, which comprises the following components:

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \qquad (3)$$

where $y(t)$ is the variable of interest, $g(t)$ is the piecewise-linear or logistic growth curve for modeling non-periodic changes in a time series, the seasonality $s(t)$ represents periodic changes (e.g., weekly or yearly seasonality), $h(t)$ reflects the effects of irregular holidays, and $\epsilon_t$ represents the error term that accounts for any uncertain changes not accommodated by the model (usually, $\epsilon_t$ is modeled as normally distributed noise).

We invoke the growth trend $g(t)$ as a core component of the entire Prophet model. The trend illustrates how the entire time series evolves and how it is projected to evolve in the future. For analysts, Prophet proposes two trend models: a piecewise-linear model and a saturating-growth model.

Nonlinear, saturating growth is modeled using the logistic growth model, which in its most basic form is:

$$g(t) = \frac{C}{1 + \exp\left(-k(t - m)\right)} \qquad (4)$$

where $m$ is an offset parameter, $k$ is the growth rate, and $C$ is the carrying capacity. However, $C$ is not necessarily constant and usually varies over time; it is therefore replaced by a time-varying capacity $C(t)$. Moreover, the growth rate $k$ is not constant. Therefore, change-points where the growth rate changes are incorporated, and the growth rate between two change-points is assumed constant.

The piecewise logistic-growth model is formed as follows:

$$g(t) = \frac{C(t)}{1 + \exp\left(-\left(k + a(t)^T d\right)\left(t - \left(m + a(t)^T c\right)\right)\right)} \qquad (5)$$

where $d$ is the vector of rate adjustments at change-points, $c$ is the vector of offset adjustments at change-points, and $k + a(t)^T d$ is the growth rate at time $t$. $a(t)$ is defined by the following:

$$a_j(t) = \begin{cases} 1, & t \ge s_j \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

where $s_j$ is the time point of the $j$-th change in the growth rate.



Linear growth is modeled using a piecewise-constant growth rate, and its formula is given as:

$$g(t) = \left(k + a(t)^T d\right) t + \left(m + a(t)^T c\right) \qquad (7)$$

where $a(t)$, $k$, $d$, and $c$ are the same as in the nonlinear trend model. In a time series, seasonality reflects periodic changes: daily, weekly, monthly and yearly

seasonality. To provide a versatile model of periodic effects, the Prophet forecasting model

depends on a Fourier series. Its smooth fitting formula is given as:

$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right) \qquad (8)$$

where $P$ is the regular period that the time series may have (for example, $P = 7$ for weekly data or $P = 365$ for annual data) and $N$ is the number of such cycles that we want to use in the model. The final seasonal model appears as follows when combining all seasonal time series models in $s(t)$ into a vector $X(t)$:

$$s(t) = X(t)\beta \qquad (9)$$

where a prior $\beta \sim \mathrm{Normal}(0, \sigma^2)$ is placed on the seasonality coefficients to enforce smoothing.

Holidays and events: to capture the effect of holidays on a business time series, or of other major events such as a workover or a production shutdown for operations, these constraints are explicitly set in the Prophet forecast model.

Recurrent neural network (RNN)

Compared with the traditional artificial neural network (ANN), the structure of an RNN neuron differs from that of an ANN by adding a cyclic connection, which forms feedback loops in the hidden layers; hence, information from the previous item in the sequence can be transmitted to the current item. The structure of an RNN neuron is shown in Figure 1. When the

Figure 1. The structure of Recurrent Neural Network.



time series $X = (x_1, x_2, x_3, \ldots, x_n)$ is input, the sequence of the hidden layer is $H = (h_1, h_2, h_3, \ldots, h_n)$ and the sequence of the output layer is $Y = (y_1, y_2, y_3, \ldots, y_n)$.

The relationships of $X$, $H$ and $Y$ are given by the following equations:

$$h_n = \sigma\left(W_{xh} x_n + W_{hh} h_{n-1} + b_h\right), \qquad y_n = W_{hy} h_n + b_y \qquad (10)$$

where $\sigma$ is the non-linear activation function; $W_{xh}$, $W_{hh}$ and $W_{hy}$ are the weight matrices from input to hidden layer, hidden layer to hidden layer, and hidden layer to output, respectively; and $b_h$ and $b_y$ are bias terms.
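One step of this recurrent update (equation (10)) can be sketched in NumPy, with tanh as the activation $\sigma$. The weights here are randomly initialized (untrained) and purely illustrative.

```python
import numpy as np

def rnn_step(x_n, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla-RNN step, equation (10):
    h_n = tanh(W_xh x_n + W_hh h_{n-1} + b_h),  y_n = W_hy h_n + b_y."""
    h_n = np.tanh(W_xh @ x_n + W_hh @ h_prev + b_h)
    y_n = W_hy @ h_n + b_y
    return h_n, y_n

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 1, 4, 1                    # tiny illustrative sizes
W_xh = rng.normal(size=(n_hid, n_in))
W_hh = rng.normal(size=(n_hid, n_hid))
W_hy = rng.normal(size=(n_out, n_hid))
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

h = np.zeros(n_hid)                             # initial hidden state
for x in [np.array([0.1]), np.array([0.2]), np.array([0.3])]:
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
```

Note that the same weight matrices are reused at every timestep, which is the parameter sharing property of RNNs.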

Long short-term memory neural network (LSTM)

The LSTM neural network model (Greff et al., 2017; Hochreiter and Schmidhuber, 1997) is a type of RNN structure, which is widely used to solve sequence problems. An LSTM tends to learn long-term dependencies and to mitigate the vanishing gradient problem (Grosse, 2017), an issue observed when training ANNs with gradient-based learning techniques and backpropagation algorithms. An LSTM allows the storage of information extracted from data over an extended time period, and shares the same parameters (i.e., network weights) across all timesteps.

The structure of the LSTM, shown in Figure 2, consists of the long-term state ($c_t$) and three multiplicative gates, the input gate ($i_t$), output gate ($o_t$), and forget gate ($f_t$), which equivalently write, read, and reset information within the model's cells. These three multiplicative gates enable the LSTM memory cells to store and access information over long time periods. The gates control the amount of information fed into the memory cell at a given timestep. Unlike traditional RNN methods that overwrite content at each timestep, the LSTM state vector and weights are modified at each timestep to take into account any evolution of the input-output relation occurring over time and carry that information over a long

Figure 2. Architecture of an LSTM cell (Geron, 2017).



distance. The LSTM functions are listed as follows:

$$\begin{aligned}
i_t &= \sigma\left(W_{hi} h_{t-1} + W_{ci} c_{t-1} + W_{xi} x_t + b_i\right) \\
f_t &= \sigma\left(W_{hf} h_{t-1} + W_{cf} c_{t-1} + W_{xf} x_t + b_f\right) \\
\tilde{c}_t &= \tanh\left(W_{hc} h_{t-1} + W_{xc} x_t + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_{ho} h_{t-1} + W_{xo} x_t + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned} \qquad (11)$$

where the input gate ($i_t$), forget gate ($f_t$) and candidate cell state ($\tilde{c}_t$) control the current cell state ($c_t$), and the output gate ($o_t$) and current cell state ($c_t$) are used to control the hidden state ($h_t$) at time $t$. $\sigma$ is the element-wise sigmoid function, $\odot$ denotes the element-wise product, $x_t$ is the input vector at time $t$, and $h_{t-1}$ is the hidden state vector that stores all the useful information prior to time $t$. $W_{xi}$, $W_{xf}$, $W_{xc}$, and $W_{xo}$ denote the weight matrices of the different gates for the input $x_t$; $W_{hi}$, $W_{hf}$, $W_{hc}$, and $W_{ho}$ are the weight matrices for the hidden state $h_t$; $W_{ci}$ and $W_{cf}$ denote the weight matrices of the cell state $c_{t-1}$; and $b_i$, $b_f$, $b_c$, and $b_o$ denote the bias vectors.
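The cell update of equation (11) can be sketched directly in NumPy. The weights below are random and untrained, purely to show the data flow; the peephole weights $W_{ci}$ and $W_{cf}$ are taken as diagonal matrices, a common but not universal convention that we assume here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following equation (11); p holds the weights and biases."""
    i = sigmoid(p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["W_xi"] @ x_t + p["b_i"])
    f = sigmoid(p["W_hf"] @ h_prev + p["W_cf"] @ c_prev + p["W_xf"] @ x_t + p["b_f"])
    c_tilde = np.tanh(p["W_hc"] @ h_prev + p["W_xc"] @ x_t + p["b_c"])
    c = f * c_prev + i * c_tilde          # gated update of the long-term state
    o = sigmoid(p["W_ho"] @ h_prev + p["W_xo"] @ x_t + p["b_o"])
    h = o * np.tanh(c)                    # hidden state read out through the output gate
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 1, 3                        # tiny illustrative sizes
p = {k: rng.normal(scale=0.5, size=(n_hid, n_hid)) for k in ["W_hi", "W_hf", "W_hc", "W_ho"]}
p.update({k: rng.normal(scale=0.5, size=(n_hid, n_in)) for k in ["W_xi", "W_xf", "W_xc", "W_xo"]})
p.update({k: np.diag(rng.normal(scale=0.5, size=n_hid)) for k in ["W_ci", "W_cf"]})
p.update({k: np.zeros(n_hid) for k in ["b_i", "b_f", "b_c", "b_o"]})

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [np.array([0.5]), np.array([-0.2])]:
    h, c = lstm_step(x, h, c, p)
```

Because $h_t = o_t \odot \tanh(c_t)$ with $o_t \in (0,1)$, the hidden state stays bounded in $(-1, 1)$ even as $c_t$ accumulates information over time.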

Gated recurrent unit (GRU)

The GRU is similar to the LSTM, but with a simplified structure and fewer parameters. It was first introduced by (Kyunghyun et al., 2014). GRUs have been used in a variety of tasks that require capturing long-term dependencies (Junyoung et al., 2014). Similar to the LSTM, the GRU contains gating units that modulate the flow of information inside the unit. However, unlike the LSTM, the GRU does not include separate memory cells, and contains only two gates, the update gate and the reset gate, as displayed in Figure 3. The update gate $z_t$ decides how much the unit updates its activation; this takes a linear interpolation between the existing state and a newly computed state. The second gate within the GRU, the reset gate $r_t$, acts to forget the previously computed state. The update functions are listed as follows:

$$\begin{aligned}
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \\
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W x_t + U\left(r_t \odot h_{t-1}\right)\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right)
\end{aligned} \qquad (12)$$

Figure 3. Gated Recurrent Unit: $r$ and $z$ are the reset and update gates, and $h_t$ and $\tilde{h}_t$ are the activation and the candidate activation (Kyunghyun et al., 2014).



where the activation $h_t$ of the GRU at time $t$ is a linear interpolation between the previous activation $h_{t-1}$ and the candidate activation $\tilde{h}_t$, $W$ denotes the input weight matrices, $x_t$ is the input vector at time $t$, and $U$ denotes the recurrent (state) weight matrices.
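Analogously to the LSTM, one GRU step (equation (12)) can be sketched as follows; again the weights are random and untrained, purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W, U):
    """One GRU step following equation (12)."""
    z = sigmoid(W_z @ x_t + U_z @ h_prev)            # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)            # reset gate
    h_tilde = np.tanh(W @ x_t + U @ (r * h_prev))    # candidate activation
    return (1.0 - z) * h_prev + z * h_tilde          # interpolate old and new state

rng = np.random.default_rng(2)
n_in, n_hid = 1, 3                                   # tiny illustrative sizes
W_z, W_r, W = (rng.normal(size=(n_hid, n_in)) for _ in range(3))
U_z, U_r, U = (rng.normal(size=(n_hid, n_hid)) for _ in range(3))

h = np.zeros(n_hid)
for x in [np.array([0.3]), np.array([0.1])]:
    h = gru_step(x, h, W_z, U_z, W_r, U_r, W, U)
```

Compared with the LSTM sketch, there is no separate cell state and no output gate, which is exactly the simplification the text describes.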

DeepAR

DeepAR is a generative, auto-regressive model. It consists of a recurrent neural network (RNN) using Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells that takes the previous time points and covariates as input. In this study, we use the forecasting model from Salinas et al. (David et al., 2019). Unlike other methods of forecasting, DeepAR jointly learns from every time series. In the publication of (David et al., 2019), DeepAR

outperformed the state-of-the-art forecasting methods on many problems.

Let $z_{i,t}$ be the value of time series $i$ at time $t$. The objective is to model the conditional distribution $P(z_{i,t_0:T} \mid z_{i,1:t_0-1}, x_{i,1:T})$ of the future of each time series $[z_{i,t_0}, z_{i,t_0+1}, \ldots, z_{i,T}] := z_{i,t_0:T}$, given its past $[z_{i,1}, \ldots, z_{i,t_0-2}, z_{i,t_0-1}] := z_{i,1:t_0-1}$, where $t_0$ represents the time point from which $z_{i,t}$ is assumed to be unknown at prediction time, and $x_{i,1:T}$ are covariates that are presumed to be known for all time points. The time ranges $[1 : t_0-1]$ and $[t_0 : T]$ are the context range and the prediction range, respectively. The model is based on an autoregressive recurrent network, summarised in Figure 4. The model distribution $Q_\Theta(z_{i,t_0:T} \mid z_{i,1:t_0-1}, x_{i,1:T})$ is considered to be a product of likelihood factors:

$$Q_\Theta(z_{i,t_0:T} \mid z_{i,1:t_0-1}, x_{i,1:T}) = \prod_{t=t_0}^{T} Q_\Theta(z_{i,t} \mid z_{i,1:t-1}, x_{i,1:T}) = \prod_{t=t_0}^{T} \ell\left(z_{i,t} \mid \theta(h_{i,t}, \Theta)\right) \qquad (13)$$

Figure 4. Model summary: the network inputs at each step $t$ are the covariates $x_{i,t}$, the target value at the previous step $z_{i,t-1}$, and the previous network output $h_{i,t-1}$. The network output $h_{i,t} = h(h_{i,t-1}, z_{i,t-1}, x_{i,t}, \Theta)$ is then used to compute the parameters $\theta_{i,t} = \theta(h_{i,t}, \Theta)$ of the likelihood $\ell(z \mid \theta)$ that is used to train the parameters of the model. A sample $\tilde{z}_{i,t} \sim \ell(\cdot \mid \theta_{i,t})$ is fed back to the next step instead of the true value when $z_{i,t}$ is unknown (David et al., 2019).



with $h_{i,t}$ the output of the autoregressive recurrent network, $h_{i,t} = h(h_{i,t-1}, z_{i,t-1}, x_{i,t}, \Theta)$, which is fed as input to the next time step to compute $h_{i,t+1}$. Here $h(\cdot)$ is a function implemented by a multi-layer recurrent neural network with LSTM or GRU cells parametrized by $\Theta$, and the likelihood $\ell(z_{i,t} \mid \theta(h_{i,t}, \Theta))$ is a fixed distribution whose parameters are given by a function $\theta(h_{i,t}, \Theta)$ of the network output. The initial state $h_{i,t_0-1}$ encodes the information from the context range that is required to predict values in the prediction range.

Given the model parameters $\Theta$, joint samples $\tilde{z}_{i,t_0:T} \sim Q_\Theta(z_{i,t_0:T} \mid z_{i,1:t_0-1}, x_{i,1:T})$ can be obtained directly by ancestral sampling: first, $h_{i,t_0-1}$ is obtained by unrolling the recurrent network over the context range $t = 1, \ldots, t_0-1$; then, for $t = t_0, \ldots, T$, we sample $\tilde{z}_{i,t} \sim \ell(\cdot \mid \theta(\tilde{h}_{i,t}, \Theta))$, where $\tilde{h}_{i,t} = h(\tilde{h}_{i,t-1}, \tilde{z}_{i,t-1}, x_{i,t}, \Theta)$ is initialized with $\tilde{h}_{i,t_0-1} = h_{i,t_0-1}$ and $\tilde{z}_{i,t_0-1} = z_{i,t_0-1}$. These samples make it possible to compute quantities of interest, such as quantiles of the value distribution at a particular time in the prediction range.
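The ancestral sampling procedure above can be sketched in a few lines. Everything here is a hypothetical stand-in: the scalar cell replaces a trained multi-layer LSTM/GRU stack, the weights are invented for illustration, and covariates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for a trained LSTM/GRU stack: a single scalar
# recurrent cell plus a Gaussian output head (all weights hypothetical).
W_h, W_z, b = 0.9, 0.1, 0.0           # state-update parameters in Theta
w_mu, b_mu = 1.0, 0.0                 # mean head
w_sig, b_sig = 0.5, -2.0              # std head (softplus keeps sigma > 0)

def cell(h_prev, z_prev):
    """h_{i,t} = h(h_{i,t-1}, z_{i,t-1}, Theta); covariates omitted."""
    return np.tanh(W_h * h_prev + W_z * z_prev + b)

def head(h):
    """theta(h_{i,t}, Theta): Gaussian mean and softplus-positive std."""
    return w_mu * h + b_mu, np.log1p(np.exp(w_sig * h + b_sig))

def ancestral_sample(z_context, horizon, n_samples=500):
    """Draw sample paths over the prediction range [t0, T]."""
    paths = np.empty((n_samples, horizon))
    for s in range(n_samples):
        # 1) Unroll over the context range using the observed values.
        h = 0.0
        for z in z_context[:-1]:
            h = cell(h, z)
        # 2) Sample the prediction range, feeding each sample back in
        #    place of the (unknown) true value.
        z_prev = z_context[-1]
        for t in range(horizon):
            h = cell(h, z_prev)
            mu, sigma = head(h)
            z_prev = rng.normal(mu, sigma)
            paths[s, t] = z_prev
    return paths

paths = ancestral_sample([1.0, 0.8, 0.7, 0.6], horizon=6)
# Quantiles per forecast step, computed from the sampled paths.
p5, p50, p95 = np.percentile(paths, [5, 50, 95], axis=0)
```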

Likelihood model. The likelihood $\ell(z \mid \theta)$ should reflect the statistical properties of the data. It can be chosen from among candidate distributions, for example, Bernoulli, Gaussian, negative binomial, etc. For instance, in the Gaussian case the parameters are the mean and the standard deviation, $\theta = (\mu, \sigma)$. The mean is given by an affine function of the network output, and the standard deviation is obtained by applying a softplus activation to ensure $\sigma > 0$:

$$\ell_G(z \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(z - \mu)^2}{2\sigma^2}\right),$$
$$\mu(h_{i,t}) = w_\mu^T h_{i,t} + b_\mu,$$
$$\sigma(h_{i,t}) = \log\big(1 + \exp(w_\sigma^T h_{i,t} + b_\sigma)\big) \qquad (14)$$
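As a minimal sketch of the output head in equation (14), mapping a network output vector to $(\mu, \sigma)$ (the weight vectors here are invented for illustration, not trained values):

```python
import numpy as np

def gaussian_head(h, w_mu, b_mu, w_sigma, b_sigma):
    """Map an RNN output vector h to Gaussian parameters (mu, sigma).

    sigma uses the softplus transform log(1 + exp(.)) so that it is
    always positive, as in equation (14).
    """
    mu = w_mu @ h + b_mu
    sigma = np.log1p(np.exp(w_sigma @ h + b_sigma))
    return mu, sigma

h = np.array([0.2, -0.5, 1.0])        # illustrative network output
mu, sigma = gaussian_head(h, np.ones(3), 0.0, np.ones(3), 0.0)
```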

Loss function. The model parameters $\Theta$, consisting of the parameters of the RNN $h(\cdot)$ and of $\theta(\cdot)$, can be learned by maximizing the log-likelihood:

$$\mathcal{L} = \sum_{i=1}^{N} \sum_{t=t_0}^{T} \log \ell\big(z_{i,t} \mid \theta(h_{i,t})\big) \qquad (15)$$

given the time series dataset $\{z_{i,1:T}\}_{i=1,\ldots,N}$ and the known associated covariates $x_{i,1:T}$. In contrast to state-space models with latent variables, no inference is needed to compute equation (15), as $h_{i,t}$ is a deterministic function of the inputs. The log-likelihood can therefore be optimized directly with respect to $\Theta$.
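Training minimizes the negative of equation (15); for a Gaussian likelihood this reduces to the familiar negative log-likelihood below (the numbers are toy values, not field data):

```python
import numpy as np

def gaussian_nll(z, mu, sigma):
    """Negative log-likelihood of observations z under N(mu, sigma^2).

    Minimizing this sum over all series i and prediction-range steps t
    is equivalent to maximizing equation (15).
    """
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + (z - mu)**2 / (2 * sigma**2))

# Toy check: the loss is smaller when the predicted mean matches the data.
z = np.array([2.0, 2.1, 1.9])
loss_good = gaussian_nll(z, mu=np.full(3, 2.0), sigma=np.full(3, 0.5))
loss_bad = gaussian_nll(z, mu=np.full(3, 5.0), sigma=np.full(3, 0.5))
```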

Measures for evaluating forecast

As previously mentioned, the purpose of this task is to predict several future time steps of the target time series. In addition to point forecasts of the exact values, prediction intervals are provided; these are based on percentiles calculated from a probability distribution represented by a fixed number of samples (e.g., for the DeepAR model). To evaluate forecast accuracy, we use the mean continuous ranked probability score (mean CRPS).

Mean CRPS is used to quantify both the accuracy and precision of a probabilistic forecast (Hersbach, 2000). A higher value of mean CRPS indicates less accurate results. The CRPS can be defined as:

$$\text{CRPS} = \int_{-\infty}^{\infty} \big[ P(x) - H(x - x_{obs}) \big]^2 \, dx \qquad (16)$$

Here, $P(x) = \int_{-\infty}^{x} p(y)\,dy$ is the cumulative distribution of the quantity of interest, and $H(x - x_{obs})$ is the step function, i.e.,

$$H(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases} \qquad (17)$$

For $N$ samples, the CRPS can be evaluated as follows:

$$\text{CRPS} = \sum_{i=0}^{N} c_i, \qquad c_i = \alpha_i p_i^2 + \beta_i (1 - p_i)^2 \qquad (18)$$

where $p_i = P(x) = i/N$ for $x_i < x < x_{i+1}$ (a piecewise constant function),

$$\alpha_i = \begin{cases} 0 & \text{if } x_{obs} < x_i \\ x_{obs} - x_i & \text{if } x_i < x_{obs} < x_{i+1} \\ x_{i+1} - x_i & \text{if } x_{obs} > x_{i+1} \end{cases} \qquad (19)$$

$$\beta_i = \begin{cases} x_{i+1} - x_i & \text{if } x_{obs} < x_i \\ x_{i+1} - x_{obs} & \text{if } x_i < x_{obs} < x_{i+1} \\ 0 & \text{if } x_{obs} > x_{i+1} \end{cases} \qquad (20)$$
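Equations (18) to (20) can be implemented directly for an ensemble forecast; the following sketch assumes only a list of forecast samples and one observation:

```python
import numpy as np

def crps_from_samples(samples, x_obs):
    """CRPS of an ensemble forecast via equations (18)-(20).

    The sorted samples x_1 <= ... <= x_N define a piecewise-constant
    empirical CDF with p_i = i/N on the interval (x_i, x_{i+1}).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    crps = 0.0
    for i in range(n - 1):
        p = (i + 1) / n
        lo, hi = x[i], x[i + 1]
        if x_obs <= lo:            # observation below the interval
            alpha, beta = 0.0, hi - lo
        elif x_obs >= hi:          # observation above the interval
            alpha, beta = hi - lo, 0.0
        else:                      # observation inside the interval
            alpha, beta = x_obs - lo, hi - x_obs
        crps += alpha * p**2 + beta * (1 - p)**2
    # Outlier terms: observation outside the whole ensemble range
    # (p_0 = 0 below the smallest sample, p_N = 1 above the largest).
    if x_obs < x[0]:
        crps += x[0] - x_obs
    if x_obs > x[-1]:
        crps += x_obs - x[-1]
    return crps
```

A perfect deterministic ensemble scores zero, e.g. `crps_from_samples([2.0, 2.0, 2.0], 2.0)` returns `0.0`, and the score grows as the ensemble drifts away from the observation.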

Data collection and preparation procedure

In this work, we use oil production data from wells in the Midland field. We have selected 22 Midland wells with relatively smooth data, which indicates fewer significant operational changes. The selected wells have been completed in a naturally fractured reservoir and measured monthly. However, for each selected well there are some missing measurements (i.e., no recorded values) for a few months; we simply ignore these missing values. Some measurements have recorded zero values, which we suspect indicate temporary shutdowns for operations (e.g., a workover). Because the zero values may interfere with the training process, we remove them from the data, and the datasets are then rescaled by standardization. Standardization is used in deep learning to improve neural network convergence. Table 1 lists the lengths of the production histories of the selected wells, which range from 105 to 362 months. Regardless of how long a well's production history is, we use the data of the last 24 months (regarded as the short term) for the blind test. Taking Well-ID3 as an example: as shown in Figure 5, the data cover 108 months. The data from Month 1 to 84 are used for building and training the forecasting models, DeepAR and Prophet, and the data from Month 85 to 108 are used for blind testing to assess the performance of the prediction results. The same procedure is applied to each of the 22 selected wells individually.
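The preparation steps above (drop missing months, remove suspected shut-in zeros, standardize, hold out the last months for the blind test) can be sketched as follows; the rates below are invented toy values, not Midland data:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly rates for one well; NaN marks a month with no
# recorded value and 0.0 marks a suspected temporary shut-in.
rates = pd.Series([410.0, 395.0, np.nan, 380.0, 0.0, 361.0, 350.0, 344.0])

# Drop missing values and suspected shut-in (zero) months, as in the text.
clean = rates.dropna()
clean = clean[clean > 0]

# Standardize to zero mean / unit variance to help network convergence.
standardized = (clean - clean.mean()) / clean.std()

# Hold out the last months as the blind test (24 in the paper; 2 here
# only because the toy series is short).
train, blind_test = standardized.iloc[:-2], standardized.iloc[-2:]
```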

Models implementation

The two models considered, DeepAR and the Prophet time series model, are evaluated on the Midland datasets. The experimental setup shared by all dataset evaluations is described first, followed by the experiments themselves.

DeepAR. DeepAR is a model presented by David et al. (2019) and implemented in Gluon Time Series (GluonTS), a toolkit developed by Amazon scientists based on the Gluon framework (Alexander et al., 2019). It aims to gather all the tools required to build deep learning models for time series forecasting and anomaly detection. DeepAR configurations are trained with early stopping during the hyperparameter

Table 1. Production history length of selected Midland field wells.

Well-ID 3 5 8 9 11 14 15 16 17 18 20

Length (months) 108 105 105 109 106 308 315 319 362 344 311

Well-ID 21 22 72 142 156 157 171 181 206 249 524

Length (months) 314 307 112 235 246 253 162 136 134 133 105

Table 2. DeepAR fixed training hyperparameters.

Hyperparameter Value

Epochs 100

Batch size 32

Batches/epoch 100

Table 3. DeepAR hyperparameters to optimize for each well.

Hyperparameter Value

Context length 24

Layers 1, 2, 4

Cell type GRU, LSTM

Cell hidden state size 1,54,560

Gaussian number 1, 3, 8

Dropout rate 0.1, 0.4, 0.6

Learning rate 1e-4, 1e-3, 1e-2


optimization (cf. Tables 2 and 3). If necessary, the final models are retrained with the best parameters, without early stopping or a validation set, on a new training set. For each selected well, the optimization is performed over the following parameters:

• The context length and the learning rate: The context length is the number of prior time steps used to produce the forecasts; the tested context lengths depend on the data provided.
• The number of stacking layers: The number of layers in the recurrent neural network.
• The cell type: The type of cell in the recurrent neural network (GRU or LSTM).
• The number of Gaussians: The number of Gaussian components in the distribution used as the probability distribution of each time step.
• The dropout rate: The output of each LSTM cell is fed to a Zoneout2 cell that uses this dropout rate (David et al., 2017).
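The per-well search over the values in Tables 2 and 3 can be organized as a simple grid search. The objective below is a deterministic placeholder for the real step of training a DeepAR configuration (e.g., in GluonTS) with early stopping and scoring it on held-out months; the grid values mirror Table 3:

```python
from itertools import product

# Hypothetical search grid mirroring Table 3.
grid = {
    "num_layers": [1, 2, 4],
    "cell_type": ["gru", "lstm"],
    "num_gaussians": [1, 3, 8],
    "dropout_rate": [0.1, 0.4, 0.6],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}

def validation_crps(config):
    """Placeholder objective; a real run would train a DeepAR model with
    this configuration and return the mean CRPS on a validation window."""
    # Illustrative deterministic score so the sketch is runnable.
    return config["dropout_rate"] + config["learning_rate"]

keys = list(grid)
candidates = [dict(zip(keys, vals)) for vals in product(*grid.values())]
best = min(candidates, key=validation_crps)  # lowest validation CRPS wins
```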

Prophet time series implementation. An open-source implementation of the Prophet time series model, published with the paper by Taylor and Letham (2007), can be found in the online documentation.3 For each well, the main hyperparameters that can be tuned are:

• Changepoint prior scale: This is likely the most impactful parameter. It determines the flexibility of the trend, in particular how much the trend changes at the trend changepoints. If it is too large, the trend will overfit; if it is too small, the trend will underfit, and variation that should have been modeled with trend changes will be handled by the noise term instead. The default value of 0.05 works for

several time series, but it can be tuned; a sensible range is [0.001, 0.5].
• Seasonality prior scale: This parameter regulates the flexibility of the seasonality. Similarly, a large value allows the seasonality to fit large fluctuations, while a small value shrinks its magnitude. The default value is 10, which applies practically no regularisation; overfitting rarely occurs here because there is inherent regularisation (the seasonality is modeled with a truncated Fourier series, so it is effectively low-pass filtered). [0.01, 10] would likely be a good tuning range.
• Growth: Options are linear or logistic; this parameter sets the form of the trend.

Figure 5. History of oil production rate, Well-ID3.

Figure 6. Oil production time series forecast - DeepAR. Left: overview of forecast. Right: zoomed forecast.

Table 4. Mean CRPS of probabilistic forecast from DeepAR model.

Well-ID 3 5 8 9 11 14 15 16 17 18 20
Mean CRPS 69 29.15 74 93.5 80.9 15.25 7.89 19.17 9.91 12.01 15.18
Well-ID 21 22 72 142 156 157 171 181 206 249 524
Mean CRPS 23.18 22.14 8.35 27.39 3.47 10.50 21.67 21.37 19.69 27.80 29.91

Figure 7. Oil production time series forecast - Prophet.
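To make concrete what the changepoint prior regulates, the following sketch reproduces Prophet's piecewise-linear trend, $g(t) = (k + a(t)^T\delta)\,t + (m + a(t)^T\gamma)$ with $\gamma_j = -s_j \delta_j$ to keep the trend continuous; the changepoint prior scale is the scale of the Laplace prior placed on the rate adjustments $\delta$. All numbers here are illustrative, not fitted values:

```python
import numpy as np

# Prophet-style trend: piecewise-linear growth with rate changes delta_j
# at changepoints s_j. A larger changepoint prior scale permits larger
# |delta_j|, i.e., a more flexible trend.
t = np.arange(0.0, 10.0)
k, m = -2.0, 100.0                    # base slope and offset
s = np.array([3.0, 7.0])              # changepoint locations
delta = np.array([-1.0, 0.5])         # rate adjustments at changepoints

A = (t[:, None] >= s[None, :]).astype(float)   # a(t): active changepoints
gamma = -s * delta                              # keeps the trend continuous
trend = (k + A @ delta) * t + (m + A @ gamma)
```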

Results

Figure 6 shows the forecast results for some selected wells using DeepAR: the means of the forecasts (dashed steel-blue curves) compared with the blind-test data (dashed red curves) and the Pan CRM model (black curves). In general, the production forecasts appear reasonable. The DeepAR model captures both the upward and downward trends well and outperforms the Pan CRM model. The prediction intervals mostly contain the correct values, except for Well-ID11; this could be explained by the model being incapable of predicting when changes in production are going to happen. We quantify the accuracy of the probabilistic forecasts using the mean CRPS, as listed in Table 4. The prediction accuracy is quite satisfactory. In most cases, the mean CRPS decreases as the length of the production history increases, indicating that a longer production history (i.e., more data) improves the DeepAR forecast. A major drawback of DeepAR is that it has little to no interpretability: we cannot extract any physical meaning from the trained DeepAR model parameters.

Figure 7 shows forecasts from the trained Prophet models. The means of the forecasts (dashed steel-blue curves) follow the blind-test data (dashed red curves in Figure 7) generally well. The P5-P95 prediction intervals (grey bands in Figure 7) cover most of the blind-test data. However, for Well-ID8 the forecast deviates significantly from the blind-test data and fails to capture both the trends and the peaks and troughs reasonably; more specifically, the forecast underestimates the oil production rates.

Compared to Prophet, the DeepAR models show distinct trends in the mean CRPS, as listed in Table 5. This is possibly because DeepAR gives more weight to the most recent historical data. In addition, DeepAR yields the lowest CRPS errors, although the differences are minimal and this statement holds only for the 5th and 95th percentiles; the better coverage achieved over the longer periods compensates for the lower accuracy of the 50th percentile.

Limitation: In the previous sections, we presented DeepAR and Prophet forecasts for a horizon of 24 months (2 years). We also evaluate the performance of the two methods for a forecast horizon of 48 months, as displayed in Figures 8 and 9. The two methods perform almost equally well when a well's production history exceeds 300 months: for the most part, they capture the trends of the oil production rate in the blind tests well, and the predictions yielded by the two models appear quite similar. The models were good at predicting trends and flat lines, but sometimes

Table 5. Mean CRPS of probabilistic forecast from Prophet model for each well.

Well-ID 3 5 8 9 11 14 15 16 17 18 20

Mean CRPS 63.4 29.15 174 153.5 180.9 22.83 14.45 32.06 9.41 16.45 25.73

Well-ID 21 22 72 142 156 157 171 181 206 249 524

Mean CRPS 38.21 33.06 14.55 34.27 13.42 14.20 24.65 27.24 18.39 28.63 36.21


Figure 8. Oil production time series forecast - DeepAR - 48-month horizon forecast.


Figure 9. Oil production time series forecast - Prophet - 48-month horizon forecast.


undershot/overshot the peaks and troughs, e.g., for Well-ID8. However, when the historical data length was small, neither Prophet nor DeepAR matched the production data, including its uncertainty quantification. Based on these results, we can highlight that the two methods enrich the family of time series analysis models by extracting weighted differencing/trend features, contribute to better performance in short-term oil production forecasts, and can serve as an alternative for oil production forecasting in practical applications.

Discussion and conclusion

The purpose of this work is to demonstrate a machine learning method that could replace or accelerate manual DCA for short-term oil and gas well forecasting. A probabilistic Prophet time series analysis and a more expressive deep learning model, DeepAR, were considered for this problem. These two were selected because they outperform state-of-the-art forecasting methods on many tasks. Prophet is a Bayesian non-linear univariate generative model for time series forecasting proposed by Facebook. It is a structural time series method that explicitly models the impacts of trend, seasonality, and events. For Prophet, the cyclical duration and event date parameters are set the same as in our model. In contrast, DeepAR is an autoregressive model based on recurrent neural networks with GRU or LSTM cells. It learns, for each forecast horizon, the parameters of a given probability distribution; then, by sampling several times, one can sample from these probability distributions to forecast each horizon or compute prediction intervals. The models were validated on 22 separate Midland reservoir field oil production datasets. Each had its outliers and missing data removed, and the data were standardized as pre- and post-processing to increase model accuracy. Performance was evaluated using the mean CRPS metric. The prediction length was initially fixed at 24 months and then increased to 48 months. The models first went through a hyperparameter optimization to select the optimal parameters of each method for each well. The results show that both the deep learning approach and the Prophet analysis yield satisfactory short-term forecasts, but they may fail to identify long-term trends unless the predictions are constantly adjusted. Moreover, both approaches rely on the volume and granularity of the data to develop the capability to predict production over a long time horizon.

It is important to highlight some potential drawbacks of applying deep learning time series models to oil production prediction. Deep learning models may suffer significant errors when used for long-term forecasts, in addition to their limited interpretability. This is because the predictions are computed sequentially and depend on past predictions that have been appended to the data; thus, error accumulates gradually over time. Deep learning models must be retrained periodically as more data are collected; otherwise, their predictions become highly inaccurate after a long period. Furthermore, another difficulty that may arise when applying deep learning is that an intermediate-to-expert level of knowledge may be required during model creation and training, as opposed to other out-of-the-box machine learning methods that can be trained easily by adjusting their hyperparameters. General NNs may therefore require some adjustments to their cell architecture.

In conclusion, the precise prediction and learning performance presented in this paper suggest that both Prophet and DeepAR are eligible for use in the petroleum industry's


non-linear short-term forecasting problems. Many steps could be taken to further improve forecasting performance over long time horizons, such as applying the approach to spatiotemporal tasks, using a sequence-to-sequence encoder-decoder in which contextual data (static and dynamic) are integrated into the model architecture, or integrating physics constraints during the training of the deep neural network. An advantage of the latter approach is that physics can be introduced into ML methods, which could replace or speed up manual DCA for long-term forecasting of oil and gas wells.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/

or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this

article.

Notes

1. The term vanishing gradient refers to the fact that, in a feedforward network (FFN), the backpropagated error signal typically decreases (or increases) exponentially as a function of the distance from the final layer.
2. Zoneout is an application of dropout in which the values are reset to their previous state ($h_t = h_{t-1}$) instead of being dropped out ($h_t = 0$).
3. https://facebook.github.io/prophet/

References

Adekoya F (2009) Production decline analysis of horizontal well in gas shale reservoirs. Master thesis, West Virginia University, Morgantown, WV, USA.
Aditya V, Akhil DG and Srikanta M (2017) Modeling early time rate decline in unconventional reservoirs using machine learning techniques. In: Abu Dhabi international petroleum exhibition & conference, Abu Dhabi, UAE, 13–16 November.
Alexander A, Konstantinos B, Michael BS, et al. (2019) GluonTS: Probabilistic time series models in Python. Available at: https://arxiv.org/abs/1906.05264
Arps JJ (1945) Analysis of decline curves. Trans.
Cho K, van Merrienboer B, Bahdanau D, et al. (2014) On the properties of neural machine translation: Encoder-decoder approaches. In: Eighth workshop on syntax, semantics and structure in statistical translation. Available at: https://nyuscholars.nyu.edu/en/publications/on-the-properties-of-neural-machine-translation-encoder-decoder-a
David K, Tegan M, Janos K, et al. (2017) Zoneout: Regularizing RNNs by randomly preserving hidden activations.
David S, Valentin F and Jan G (2019) DeepAR: Probabilistic forecasting with autoregressive recurrent networks.
Duong AN (2011) Rate-decline analysis for fracture-dominated shale reservoirs. SPE Reservoir Evaluation & Engineering 14(3): 377–387.
Gasthaus J, Benidis K, Wang Y, et al. (2019) Probabilistic forecasting with spline quantile function RNNs. In: Proceedings of Machine Learning Research (eds K Chaudhuri and M Sugiyama), volume 89, pp. 1901–1910.
George EPB, Gwilym MJ, Gregory CR, et al. (2015) Time Series Analysis: Forecasting and Control. Hoboken: John Wiley & Sons.
Geron A (2017) Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. USA: O'Reilly Media.
Gonzalez RA, Gong X and McVay DA (2012) Probabilistic decline curve analysis reliably quantifies uncertainty in shale gas reserves regardless of stage of depletion. In: SPE eastern regional meeting.
Greff K, Srivastava RK, Koutnik J, et al. (2017) LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10): 2222–2232.
Grosse R (2017) Lecture 15: Exploding and vanishing gradient. Available at: www.cs.toronto.edu/~rgrosse/courses/csc321_2017/readings/L15%20Exploding%20and%20Vanishing%20Gradients.pdf
Gupta S, Fuehrer F and Benin CJ (2014) Production forecasting in unconventional resources using data mining and time series analysis. In: SPE/CSUR unconventional resources conference, Canada, 30 September–2 October 2014.
Han D, Jung J and Kwon S (2020) Comparative study on supervised learning models for productivity forecasting of shale reservoirs based on a data-driven approach. Applied Sciences 10(4): 1267.
Hersbach H (2000) Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting 15(5): 559–570.
Hochreiter S and Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8): 1735–1780.
Hong A, Bratvold RB, Lake LW, et al. (2019) Integrating model uncertainty in probabilistic decline-curve analysis for unconventional-oil-production forecasting. SPE Reservoir Evaluation & Engineering 22(3): 861–876.
Joshi KJ (2012) Comparison of various deterministic forecasting techniques in shale gas reservoirs with emphasis on the Duong method. PhD thesis, Texas A&M University, College Station, TX, USA.
Junyoung C, Caglar G, KyungHyun C, et al. (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling.
Kyunghyun C, Bart van M, Caglar G, et al. (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation.
Lee K, Lim J, Yoon D, et al. (2019) Prediction of shale-gas production at Duvernay formation using deep-learning algorithm. SPE Journal 24(6): 2423–2437.
Liu W, Liu DW and Gu J (2020) Forecasting oil production using ensemble empirical model decomposition based long short-term memory neural network. Journal of Petroleum Science and Engineering 189: 107013.
Luo G, Tian Y, Sharma A, et al. (2019) Eagle Ford well insights using data-driven approaches. In: International petroleum technology conference, Beijing, China, March 2019.
Ma X and Liu Z (2018) Predicting the oil production using the novel multivariate nonlinear model based on Arps decline model and kernel method. Neural Computing and Applications 29(2): 579–591.
Nelson PH (2009) Pore-throat sizes in sandstones, tight sandstones, and shales. AAPG Bulletin 93(3): 329–340.
Ng AY and Jordan MI (2002) On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems 14: 841–848.
Pan Z (2016) Revised productivity index equation to improve transient history match for the capacitance resistance model. Master thesis, University of Texas at Austin, Austin, USA.
Ruofeng W, Kari T, Balakrishnan N, et al. (2018) A multi-horizon quantile recurrent forecaster.
Sagheer A and Kotb M (2019) Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 323: 203–213.
Taylor SJ and Letham B (2007) Forecasting at scale.
Valkó PP and Lee WJ (2010) A better way to forecast production from unconventional gas wells. In: SPE annual technical conference and exhibition.
Wattenbarger RA, El-Banbi AH, Villegas ME, et al. (1998) Production analysis of linear flow into fractured tight gas wells. In: SPE Rocky Mountain regional/low-permeability reservoirs symposium, Denver, USA, 5–8 April.
Zhan C, Sankaran S, LeMoine V, et al. (2019) Application of machine learning for production forecasting for unconventional resources. In: Unconventional resources technology conference (URTEC).
