Sales Forecasting

Statistics for sales forecasting

  • [QMM-TN-05] Time series forecasting

    Introduction. This note presents some elementary methods for time series data. This expression applies to data collected as repeated observations of the same variable at different time points. These time points are denoted by t (t = 1, 2, 3, . . .). Standard examples are monthly data on sales, weekly returns of a financial index and quarterly data on GNP. For instance, with monthly data, t = 1 corresponds to the first month in the data set, t = 2 to the second month, etc. It is easy to add a column with the t values in an Excel worksheet (with the command Fill/Series), as we do in the example. Figure 1 is a graphical representation of a time series (monthly data) as a line plot. We put t on the horizontal axis and the data on the vertical axis.

    A typical problem associated with time series data is the forecasting of future values. There are numerous methods for this, from the simple ones presented in this note to the complex algorithms used in modern finance. The methods described here are based on modelling the series as the sum of a predictable component plus an unpredictable, random component. We use the future values of the predictable part as forecasts. This part is modelled in a simple way, using a trend plus seasonality approach.

    Trends. A parametric trend is given by an elementary function of t, such as $a + bt$ (linear) or $a + bt + ct^2$ (quadratic). This makes forecasting trivial, because the trend formula can be used for the future as well as for the past. The coefficients (a and b in the linear case) can be derived from a regression analysis (see notes QMM-TN-02 and QMM-TN-03), or obtained graphically, as we do in the example of this note using Excel's Chart Tools.
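
    As an illustration of the regression route, the following is a minimal sketch of a least-squares fit of the linear trend $a + bt$ in plain Python. The function names and the six data points are hypothetical and are not taken from the companion Excel file.

```python
def fit_linear_trend(values):
    """Least-squares estimates (a, b) of the linear trend a + b*t, with t = 1..N."""
    n = len(values)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    x_mean = sum(values) / n
    # Slope: co-variation of t and x divided by the variation of t.
    sxy = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, values))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = x_mean - b * t_mean
    return a, b


def linear_trend(a, b, t):
    """Trend value at time index t; the same formula serves past and future t."""
    return a + b * t


# Hypothetical series lying exactly on the line 10 + 2t.
sales = [12, 14, 16, 18, 20, 22]
a, b = fit_linear_trend(sales)
next_value = linear_trend(a, b, len(sales) + 1)  # trend forecast for t = 7
```

    Because the trend is a formula in t, forecasting only requires evaluating it at future values of t, as in the last line.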

    On request, Excel can add one or several parametric trends to a line plot. Other trend choices, such as polynomial, exponential and logarithmic, are allowed. These options are not considered in this note.

    Nonparametric trends are not given as a function of t. If the actual value is available, the trend value is a linear combination of the actual and the preceding values. The forecast of a future value is based on the last trend values available. Among the methods based on a nonparametric trend, the Holt-Winters method is the preferred forecasting method in the business context.

    Seasonality. Patterns that are periodically repeated in time series data are called seasonal effects. Seasonality is typical in monthly data (period 12), and also in quarterly data (period 4). In monthly data, seasonality is managed through 12 terms, called seasonals, which account for the typical evolution of the series within the year. The seasonals can be additive or multiplicative.

    The additive seasonals, which come in the same units as the data, are added to the trend to improve the approximation. The model is based on the formula

    PREDICTION = TREND + SEASONAL.

    [QMM-TN-05] Time series forecasting / 1 20141128

    Additive seasonals are positive or negative, reflecting that, in some months, the series is above trend and, in other months, below trend.

    The multiplicative seasonals are numbers with no units, which operate as factors. The equation has the form

    PREDICTION = TREND × SEASONAL.

    In the same way in which additive seasonals can be positive or negative, multiplicative seasonals can be higher or lower than one. For instance, when the seasonal factor is 1.2 for a particular month, we understand that the expected value for that month is 20% above trend. If the seasonal is 0.8, the expected value is 20% below trend.

    Why multiplicative seasonals? Of course, additive seasonals are simpler, but they cannot cope with a situation like that of the example, where the oscillations above and below trend due to the seasonality increase with the trend. With monthly sales data, this is the rule more than the exception. Multiplicative seasonals allow for describing these oscillations in percentage terms.

    For the time interval covered by the data, the predictions can be compared to the actual data. This allows the analyst to assess the model and, eventually, to choose between rival models. To forecast the future values of the series, trend and seasonals have to be continued. In parametric models, it is assumed that the trend formula and the seasonals will remain valid during the time interval covered by the forecast. In nonparametric models, the forecast is based on the last available values for both the trend and the seasonals. This note explains how to do this in the Holt-Winters method.

    Prediction error. The prediction error is the actual value minus the predicted value. The analysis of the prediction error allows the assessment of a forecasting model. Forecasts based on a parametric model are sometimes given with confidence limits. These limits are derived from the distribution of the prediction errors. The rule MEAN ± 2 SIGMA, based on the normal distribution, is very popular.

    Holt-Winters forecasting (1): generalities. The last two sections of this note, dealing with Holt-Winters forecasting, are more technical than the rest. Nevertheless, with the use of analytical software, Holt-Winters forecasts are easy to obtain. Therefore, you can leave aside the formulas, focusing on a general understanding of the method. Holt-Winters forecasting is available in software from many sources, including some (commercial) add-ins for Excel, with slight differences in the formulas used. There is one version with additive seasonals and another version with multiplicative seasonals (the one used in this note). There is also a version with a multiplicative slope, and versions for quarterly data.

    Although the formulas presented below may look a bit complex, they are simple enough to be easily implemented in an Excel worksheet. For the example, the companion Excel file qmm-tn-05.xls includes a worksheet with all the calculations, which can be easily adapted to another data set.

    As mentioned in the discussion of the example, there are some variants of the method used to initialize the Holt-Winters series. So, our forecasts can be slightly different from those given by a particular commercial software package.

    The Holt-Winters model is based on a trend plus seasonality approach. The trend can be seen as a linear trend in which the slope is updated every time that a new observation is available. The seasonals, either additive or multiplicative, are updated as well.

    The model is usually presented as based on three components, the trend, the slope and the seasonal. Roughly speaking, the trend and the seasonal are used to calculate the predicted values, while the slope is used to update the trend. More specifically, the slope is interpreted as the change between consecutive trend values. The inspiration for this is that, in a linear trend $a + bt$, the difference between two consecutive trend values is equal to the slope b.

    The Holt-Winters model is based on a linear trend, but one in which the slope is not fixed: it changes across months, being updated for each new observation. It can even happen that the slope is positive in one part of the series but negative in another part. When we arrive at the end of the actual data, the slope is no longer updated, so that in forecasting the future values we use the last slope available. The seasonals, either additive or multiplicative, are also continuously updated. The last values available are those used to forecast future values.

    In the Holt-Winters case, the trend is usually called level, but we have preferred to keep the term trend here, to make it simpler. Also, level may be wrongly understood as intercept, that is, as the constant term of the linear trend formula.

    For each component, there is a parameter used to weight the new observation when updating. These parameters, denoted by $\alpha$, $\beta$ and $\gamma$, are called the smoothing parameters. The choices $\alpha = \beta = \gamma = 0.2$ are typical, and we use them in the example without discussion. Nevertheless, you can change these values (within the range from 0 to 1) in the worksheet used for calculating the forecasts.

    Holt-Winters forecasting (2): additive seasonals. The additive Holt-Winters model is based on the TREND + SEASONAL formula. Let us assume that we have monthly data (the formulas can be easily adapted for quarterly data) and denote the length of the series by $N$, the series by $x_t$, the trend by $a_t$, the slope by $b_t$ and the seasonal by $s_t$.

    The basic formulas are:

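    $$a_t = \alpha\,(x_t - s_{t-12}) + (1 - \alpha)(a_{t-1} + b_{t-1}), \qquad (1)$$

    $$b_t = \beta\,(a_t - a_{t-1}) + (1 - \beta)\,b_{t-1}, \qquad (2)$$

    $$s_t = \gamma\,(x_t - a_t) + (1 - \gamma)\,s_{t-12}. \qquad (3)$$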

    The Holt-Winters formulas are based on a weighted average of the actual and the preceding observations. Let us take $\alpha = 0.2$ in formula (1). Then, the trend value is obtained as a combination of the trend resulting from updating the previous trend value by adding the slope (80%) and the new sales value after discounting the seasonal pattern (20%). With $\beta = 0.2$, formula (2) updates the slope as a combination of the previous slope (80%) and the current change in the trend (20%). A similar idea is applied in formula (3).

    For the first year, when there are not enough past data available, and for the forecast, when we do not know the actual sales, formulas (1)-(3) are slightly changed. Let us start with the first year. First, the subscript $t - 12$ does not make sense in formulas (1) and (3). Also, the subscript $t - 1$ does not make sense in any of the formulas for the first January. There are many ways to fix this, but the method is not relevant for the forecaster, because the model adjusts itself as time runs. In this note, we have chosen a simple option.

    For the seasonals, users input in the first year the best values available. The natural choice in the example is to use the additive seasonals obtained in a previous analysis. Then, the January slope is set to zero and the trend to the first observation minus the seasonal ($a_1 = x_1 - s_1$, $b_1 = 0$). For the months February to December, the slope is calculated with formula (2), and the trend with

    $$a_t = \alpha\,(x_t - s_t) + (1 - \alpha)(a_{t-1} + b_{t-1}). \qquad (1)$$

    In textbooks, it is frequently recommended to set the first 12 seasonals to zero, which is reasonable if no information about seasonality is available. This has little effect on the final forecasts, as you can check in the companion Excel file. The Holt-Winters method corrects the wrong estimates of the seasonals that you may input in the first year.

    Finally, we adapt formulas (1)-(3) to forecast sales one year ahead. Since there are no actual sales available, the weighted averages no longer make sense. So, we take $\alpha = \beta = \gamma = 0$, getting

    $$a_t = a_{t-1} + b_{t-1}, \qquad (1)$$

    $$b_t = b_{t-1}, \qquad (2)$$

    $$s_t = s_{t-12}. \qquad (3)$$
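
    The additive recursion, the first-year adjustment and the forecasting step can be put together as in the following sketch. It is one possible reading of the formulas above, not the companion worksheet: the function name, the argument names and the quarterly test series are hypothetical, and the series is assumed to start at the first position of the seasonal cycle and to cover at least one full period.

```python
def holt_winters_additive(x, first_seasonals, alpha=0.2, beta=0.2, gamma=0.2,
                          period=12, horizon=12):
    """Additive Holt-Winters with the simple initialization a1 = x1 - s1, b1 = 0.

    x               -- observed series (at least one full period)
    first_seasonals -- `period` starting seasonals supplied by the user
    Returns the lists (trend, slope, seasonal, forecast).
    """
    n = len(x)
    a, b = [0.0] * n, [0.0] * n
    s = list(first_seasonals) + [0.0] * (n - period)
    a[0] = x[0] - s[0]
    for t in range(1, n):
        # In the first year the supplied seasonal s[t] stands in for s[t - period].
        s_ref = s[t] if t < period else s[t - period]
        a[t] = alpha * (x[t] - s_ref) + (1 - alpha) * (a[t - 1] + b[t - 1])
        b[t] = beta * (a[t] - a[t - 1]) + (1 - beta) * b[t - 1]
        if t >= period:
            s[t] = gamma * (x[t] - a[t]) + (1 - gamma) * s[t - period]
    # Forecast: last trend extended with the last slope, last seasonals repeated.
    forecast = [a[-1] + k * b[-1] + s[n - period + (k - 1) % period]
                for k in range(1, horizon + 1)]
    return a, b, s, forecast


# Hypothetical quarterly series: flat trend of 50 plus a fixed seasonal pattern.
pattern = [5, -5, 10, -10]
series = [50 + pattern[i % 4] for i in range(12)]
trend, slope, seasonal, forecast = holt_winters_additive(
    series, pattern, period=4, horizon=4)
```

    With a flat trend and correct starting seasonals, the updates leave trend, slope and seasonals unchanged, so the forecast simply repeats the pattern around 50.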

    Holt-Winters forecasting (3): multiplicative seasonals. The multiplicative Holt-Winters model is based on the TREND × SEASONAL formula. The formulas for the trend, the slope and the seasonal are similar, but with some of the minus signs replaced by quotients. We rewrite them completely to avoid any confusion. The basic formulas are now:

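    $$a_t = \alpha\,\frac{x_t}{s_{t-12}} + (1 - \alpha)(a_{t-1} + b_{t-1}), \qquad (1)$$

    $$b_t = \beta\,(a_t - a_{t-1}) + (1 - \beta)\,b_{t-1}, \qquad (2)$$

    $$s_t = \gamma\,\frac{x_t}{a_t} + (1 - \gamma)\,s_{t-12}. \qquad (3)$$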

    For the first year, for which the above formulas cannot yet be applied, the solutions are similar to those recommended in the additive case. We start with $a_1 = x_1/s_1$, $b_1 = 0$. For months 2 to 12, the updates are calculated with

    $$a_t = \alpha\,\frac{x_t}{s_t} + (1 - \alpha)(a_{t-1} + b_{t-1}).$$

    The 12 seasonals of the first year are set either to one or to a set of reasonable values. Finally, to forecast 12 months ahead, we can use the formulas:

    $$a_t = a_{t-1} + b_{t-1}, \qquad (1)$$

    $$b_t = b_{t-1}, \qquad (2)$$

    $$s_t = s_{t-12}. \qquad (3)$$

    You will find that, in practice, with either additive or multiplicative seasonals, the method fits the actual data equally well, but, when the seasonals need much updating, as in the example, the multiplicative seasonals would give better forecasts for the forthcoming year.
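
    The multiplicative version can be sketched in the same way. The code below is an illustration under stated assumptions, not the worksheet itself: the function and variable names are hypothetical, and the inputs are the 1949 bookings and the rounded starting seasonals listed in Table 2 of the example.

```python
def holt_winters_multiplicative(x, first_seasonals, alpha=0.2, beta=0.2,
                                gamma=0.2, period=12, horizon=12):
    """Multiplicative Holt-Winters, initialized with a1 = x1 / s1 and b1 = 0."""
    n = len(x)
    a, b = [0.0] * n, [0.0] * n
    s = list(first_seasonals) + [0.0] * (n - period)
    a[0] = x[0] / s[0]
    for t in range(1, n):
        # First year: use the supplied seasonal of the same month instead of s[t - period].
        s_ref = s[t] if t < period else s[t - period]
        a[t] = alpha * x[t] / s_ref + (1 - alpha) * (a[t - 1] + b[t - 1])
        b[t] = beta * (a[t] - a[t - 1]) + (1 - beta) * b[t - 1]
        if t >= period:
            s[t] = gamma * x[t] / a[t] + (1 - gamma) * s[t - period]
    # Forecast: (last trend + k * last slope) times the matching last seasonal.
    forecast = [(a[-1] + k * b[-1]) * s[n - period + (k - 1) % period]
                for k in range(1, horizon + 1)]
    return a, b, s, forecast


# 1949 bookings and starting seasonals as printed (rounded) in Table 2.
bookings_1949 = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
seasonals_1949 = [0.912, 0.894, 1.017, 0.985, 0.982, 1.109,
                  1.230, 1.219, 1.054, 0.918, 0.795, 0.891]
trend, slope, seasonal, forecast = holt_winters_multiplicative(
    bookings_1949, seasonals_1949)
```

    With these rounded inputs, the trend for February and March 1949 comes out close to the 124.63 and 125.96 printed in Table 2. Small differences are to be expected, since the worksheet presumably works with unrounded seasonals.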

    Prepared by professors I Alegre (Universitat Internacional de Catalunya) and MA Canela (IESE Business School).

  • Example: Forecasting air passenger bookings

    Presentation. This example (companion Excel file qmm-tn-05.xls) uses data on the number of international passenger bookings (thousands) per month for the airline Pan Am, obtained from the US Federal Aviation Administration, for the period 1949-1960. The company used the data to predict future demand before ordering new aircraft and starting to train more aircrew. The data can be found in the worksheet Data.

    The line plot (Figure 1) suggests the presence of a linear trend and a strong seasonal pattern, which could be expected from the nature of the data. We explore first the use of a parametric trend. In Figure 2 we find, superimposed on the series of bookings, a linear and a quadratic trend (dashed line). In Excel, these trends can be obtained graphically, or extracted from a regression analysis. We skip the regression analysis, using the graphical approach.

    Estimating the trend. The trends are added to the line plot using the Chart Tools. To do this, we select the Layout toolbar and then Trendline. The option Linear Trendline gives us the linear trend. For the quadratic trend, we select More Trendline Options and then Polynomial, and set Order to 2. These trends can be reformatted by selecting them, right-clicking on them and then choosing Format Trendline. One interesting option is Display Equation on chart, which we use for this example.

    The linear trend has an equation

    PASSENGERS = 87.653 + 2.6572 t,

    in which t is a time index: t = 1 in January 1949, t = 2 in February 1949, t = 3 in March 1949, and so on, until t = 144 in December 1960. If we add t as an additional column, we can use the equation to calculate the trend values, as can be seen in the worksheet Linear trend.

    The equation of the quadratic trend is

    PASSENGERS = 112.380 + 1.641 t + 0.007 t^2.

    The trend values are calculated in worksheet Quadratic trend. We will use these trend values in our forecast, but you can slightly modify the worksheets that follow to replace the quadratic trend by the linear, which has, in this example, a similar performance.
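
    Continuing the trend into 1961 only requires evaluating the quadratic equation at t = 145, ..., 156. A minimal sketch, using the rounded coefficients displayed on the chart (the function and variable names are hypothetical):

```python
def quadratic_trend(t):
    """Quadratic trend of the example, with the rounded coefficients shown on the chart."""
    return 112.380 + 1.641 * t + 0.007 * t ** 2


# t = 145 is January 1961 and t = 156 is December 1961.
trend_1961 = [quadratic_trend(t) for t in range(145, 157)]
```

    Rounded to one decimal, these values agree with the TREND column of Table 1 (497.5 for January, 538.7 for December).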

    Seasonality. Figure 2 shows clearly that additive seasonality would not be adequate here, since the distance of the seasonal peak values to the trend increases with the trend value. So, we use multiplicative seasonals.

    Figure 1. Air passenger data. Axes: Time (horizontal), Passengers in 1000s (vertical).

    Figure 2. Air passenger data with linear and quadratic trend. Axes: Time (horizontal), Passengers in 1000s (vertical).

    A simple approach to the estimation of the seasonal factors can be as follows. By dividing the actual bookings by the trend values we get a series of multiplicative deviations (the column DEVIATION in the worksheet Seasonality). Then, for January, we take the mean of all the available January deviations as the seasonal factor. The same for February, March, etc. Thus, we obtain 12 seasonal values: 0.912, 0.893, 1.017, 0.985, 0.982, 1.109, 1.230, 1.218, 1.054, 0.918, 0.795 and 0.891 (column SEASONAL). These are the factors by which we multiply the trend to obtain a prediction for the corresponding month. The seasonal factors are plotted in Figure 3, which can be regarded as the profile of a typical year. Figure 4 compares the actual bookings with those predicted as TREND × SEASONAL (dashed line).

    Figure 3. Seasonal factors. Horizontal axis: Month (1 to 12).
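
    The estimation of the seasonal factors as monthly means of the deviations can be sketched as follows. This is an illustration only: the function name and the two-year quarterly data are hypothetical, and the series is assumed to start at the first position of the seasonal cycle.

```python
def seasonal_factors(actual, trend, period=12):
    """Mean multiplicative deviation (actual / trend) for each position in the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for i, (x, tr) in enumerate(zip(actual, trend)):
        sums[i % period] += x / tr   # deviation of observation i from its trend value
        counts[i % period] += 1
    return [total / c for total, c in zip(sums, counts)]


# Hypothetical quarterly data built as trend times a known seasonal profile.
profile = [1.2, 0.8, 1.0, 1.0]
trend_values = [100, 110, 120, 130, 140, 150, 160, 170]
actual_values = [tr * profile[i % 4] for i, tr in enumerate(trend_values)]
factors = seasonal_factors(actual_values, trend_values, period=4)
```

    Since the hypothetical data contain no random component, the estimated factors recover the profile used to build them.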

    Serious textbooks prefer to estimate the seasonals by regression analysis as well, but this would not change the results much, and would involve the use of logarithms. Also, some authors advise adjusting the seasonals so that their average is one.

    Additive seasonals can be estimated in a similar way, but using additive deviations from the trend.

    Forecasting future bookings. Now, we can use these results to forecast the air passenger bookings of 1961, with the formula

    FORECAST = TREND × SEASONAL.

    The seasonal factors are fixed (those of Figure 3), and the trend values for 1961 are calculated with the equation of the quadratic trend given above. By examining the random part of the model, we can guess from the data available the extent to which these forecasts may be wrong.

    We calculate the prediction error as the actual minus the predicted bookings,

    ERROR = PASSENGERS - TREND × SEASONAL.

    We see these errors in Figure 5. Note that they are not the residuals of any regression and, therefore, the sum is different from zero (but not far from it). A rough estimate of the uncertainty of the forecasts can be derived from a visual inspection of Figure 5. A more rigorous method is based on the residual standard deviation. The standard deviation of the prediction errors is 13.23, which represents 11% of that of the raw data (119.97).

    Figure 4. Actual and predicted bookings (quadratic trend). Axes: Time (horizontal), Passengers in 1000s (vertical).

    Figure 5. Prediction error (quadratic trend). Axes: Time (horizontal), Passengers in 1000s (vertical).

    Confidence limits for the error can be calculated using the fact that approximately 95% of the observations of a normally distributed variable fall within the limits MEAN ± 2 SIGMA. Table 1 gives the one-year-ahead forecast, with the 95% limits.
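
    A minimal sketch of the MEAN ± 2 SIGMA rule applied to a forecast follows. The function name is hypothetical, and the mean of the prediction errors is assumed to be zero, since the note only says that their sum is not far from zero. The worksheet may compute the limits somewhat differently.

```python
def two_sigma_limits(forecast, sigma, mean_error=0.0):
    """Approximate 95% limits: forecast plus (MEAN - 2 SIGMA, MEAN + 2 SIGMA) of the errors."""
    lower = forecast + mean_error - 2 * sigma
    upper = forecast + mean_error + 2 * sigma
    return lower, upper


# January 1961 forecast (453.9) and error standard deviation (13.23) from the note.
lower, upper = two_sigma_limits(453.9, 13.23)
```

    Under these assumptions the January limits are about 427.4 and 480.4, close to those shown in Table 1.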

    Table 1. Forecast for 1961

    MONTH   TREND   SEASONAL   FORECAST   LOWER LIMIT   UPPER LIMIT
    Jan     497.5   0.912      453.9      427.4         480.3
    Feb     501.2   0.894      447.9      421.3         474.4
    Mar     504.9   1.017      513.4      486.9         539.9
    Apr     508.6   0.985      500.8      474.3         527.3
    May     512.3   0.982      503.0      476.4         529.5
    Jun     516.0   1.109      572.1      545.5         598.8
    Jul     519.8   1.230      639.4      612.7         666.1
    Aug     523.5   1.219      638.0      611.2         664.8
    Sep     527.3   1.054      555.7      528.8         582.6
    Oct     531.1   0.918      487.4      460.4         514.4
    Nov     534.9   0.795      425.3      398.2         452.4
    Dec     538.7   0.891      480.1      452.9         507.2

    Holt-Winters forecasting. On the left side of the worksheet Holt-Winters of the companion Excel file, we find the approximation of the data calculated with the formulas of the Holt-Winters method with multiplicative seasonals. The approximation of the actual bookings is based on the three columns TREND, SLOPE and SEASONAL, and it is calculated as TREND × SEASONAL. The prediction errors are, in general, a bit better than those obtained with the quadratic trend. The standard deviation is now 8.97. Figures 6 and 7 are the same as Figures 4 and 5, but obtained with the Holt-Winters method.

    Table 2. Holt-Winters approximation for the first year

    MONTH    ACTUAL   TREND    SLOPE   SEASONAL   HOLT-WINTERS
    Jan-49   112      123.00    0.00   0.912      112.00
    Feb-49   118      124.63    0.37   0.894      111.37
    Mar-49   132      125.96    0.56   1.017      128.09
    Apr-49   129      127.42    0.74   0.985      125.47
    May-49   121      127.18    0.55   0.982      124.86
    Jun-49   135      126.53    0.31   1.109      140.29
    Jul-49   148      125.53    0.05   1.230      154.42
    Aug-49   148      124.76   -0.12   1.219      152.02
    Sep-49   136      125.52    0.06   1.054      132.27
    Oct-49   119      126.40    0.22   0.918      116.00
    Nov-49   104      127.46    0.39   0.795      101.34
    Dec-49   118      128.76    0.57   0.891      114.74

    In Table 2, we find the bookings data, the three components of the Holt-Winters model and the prediction for the first year (1949). All the calculations have been performed with the formulas given in this note. The first value of the trend series is obtained from the first observation and its seasonal ($a_1 = x_1/s_1$), and the first value of the slope is zero. The rest of these two columns are calculated using the smoothing parameters $\alpha = \beta = 0.2$. The seasonals are those of Figure 3.

    Figure 6. Actual and predicted bookings (Holt-Winters). Axes: Time (horizontal), Passengers in 1000s (vertical).

    Table 3. Holt-Winters approximation for the last year

    MONTH    ACTUAL   TREND    SLOPE   SEASONAL   HOLT-WINTERS
    Jan-60   417      460.14   6.10    0.910      418.82
    Feb-60   391      461.93   5.23    0.873      403.11
    Mar-60   419      455.98   3.00    0.999      455.45
    Apr-60   461      460.67   3.34    0.989      455.68
    May-60   472      465.54   3.64    1.003      467.09
    Jun-60   535      469.28   3.66    1.139      534.64
    Jul-60   622      476.78   4.43    1.272      606.45
    Aug-60   606      481.97   4.58    1.251      602.98
    Sep-60   508      485.25   4.32    1.056      512.40
    Oct-60   461      491.98   4.80    0.923      453.92
    Nov-60   390      495.14   4.47    0.796      394.21
    Dec-60   432      496.89   3.93    0.885      439.75

    Table 3 shows the Holt-Winters results for the last year (1960). Although the seasonals have been updated (using $\gamma = 0.2$), you can see they have not changed much with respect to the starting values in Table 2. Indeed, if you inspect the variation in the column SEASONAL, you may conclude that, in this example, there is no need to update the seasonals (this is equivalent to setting $\gamma = 0$).

    Figure 7. Prediction error (Holt-Winters). Axes: Time (horizontal), Passengers in 1000s (vertical).

    Table 4. Holt-Winters forecast for 1961

    MONTH    FORECAST   MONTH    FORECAST
    Jan-61   455.85     Jul-61   667.01
    Feb-61   440.47     Aug-61   660.96
    Mar-61   508.08     Sep-61   562.02
    Apr-61   507.05     Oct-61   494.69
    May-61   518.26     Nov-61   430.01
    Jun-61   592.95     Dec-61   481.48

    The last values of the trend (496.89) and the slope series (3.93) and the twelve seasonals of Table 3 are used to forecast the future bookings. The forecast for January 1961 is

    (496.89 + 3.93) × 0.910 = 455.85,

    for February 1961,

    (496.89 + 2 × 3.93) × 0.873 = 440.47,

    and so on, until December 1961,

    (496.89 + 12 × 3.93) × 0.885 = 481.48.

    These forecasts are presented in Table 4. The calculations are done on the right side of the worksheet Holt-Winters.
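
    The forecasting step can be checked with a few lines of Python. The sketch below uses the last trend and slope values and the rounded 1960 seasonals printed in Table 3, with hypothetical variable names.

```python
# Last trend and slope of the series (December 1960) and the 1960 seasonals of Table 3.
last_trend, last_slope = 496.89, 3.93
seasonals_1960 = [0.910, 0.873, 0.999, 0.989, 1.003, 1.139,
                  1.272, 1.251, 1.056, 0.923, 0.796, 0.885]

# k months ahead: (trend + k * slope) times the seasonal of the target month.
forecast_1961 = [(last_trend + k * last_slope) * s
                 for k, s in enumerate(seasonals_1960, start=1)]
```

    Because the seasonals are rounded to three decimals, the results differ from Table 4 by a few tenths at most.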

    Source: GEP Box, GM Jenkins & GC Reinsel (1976), Time Series Analysis, Forecasting and Control, Holden-Day.
