MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002



ABSTRACT

This study forecasts pharmacy and drug store retail sales in the US. Several forecasting techniques are examined: the moving average, simple exponential smoothing, Holt’s exponential smoothing, Winters’ exponential smoothing, simple regression, multiple regression, time series decomposition, and an ARIMA model. Quarterly data are used to predict retail sales with each model. The ARIMA forecasts are found to be the best among the models. The assessment criteria are minimum RMSE and MAPE and maximum R².

1. INTRODUCTION

The retail sales of pharmacies and drug stores in the US are essential economic data for pharmaceutical companies. They have a significant impact on the market decisions of managers, who must predict future sales, inventory needs, personnel requirements, and other economic or business quantities. However, many variables may affect a forecast of retail sales. We are therefore interested in forecasting the retail sales of pharmacies and drug stores in the US and in building a workable forecasting model.

Monthly and quarterly data on the economic variables are obtained from the following source: (http://www.economagic.com/em-cgi/data.exe/cenret/nrt28). The monthly data are arranged into quarterly format for this investigation. Forty quarterly data points from 1992 to 2001 are used.

[Figure: The retail sales of pharmacies and drug stores in the US. x-axis: Time, Q1-92 to Q3-01; y-axis: The retail sale (Million), 16500 to 41500.]

The time series plot reveals two clear characteristics of the retail sales of pharmacies and drug stores in the US from 1992 to 2001:

1. There is a positive trend. The upward movement reflects growth in the population and in health care standards, so more money is spent on these products. Moreover, recent advances in pharmacy have led to more effective and more expensive drugs than the conventional ones.

2. There is a seasonal pattern. Retail sales rise significantly in the fourth quarter. The likely reasons are:

   (a) Colds and flu become more common in this quarter.

   (b) The fourth quarter is the holiday season, so pharmaceutical and related products are purchased as gifts more than in the other quarters.

   (c) In general, due to globalization, companies nowadays are involved in many other types of business. One company may invest in a sister company, and the whole retail of the company takes effect in the fourth quarter.

The data are separated into two groups. The first, the historical data used to fit the forecasting models, covers 36 periods from Q1 1992 to Q4 2000. The second is a holdout set used to test the goodness of fit, covering 4 periods from Q1 2001 to Q4 2001.
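The split and the three assessment criteria named in the abstract can be sketched as follows. This is a minimal illustration, not the ForecastX computation: the function names are hypothetical, the series is made-up placeholder data, and "RMSE / Mean" is assumed to mean RMSE divided by the mean of the actual values.

```python
import math

def split_holdout(series, n_holdout=4):
    """Return (history, holdout); the last n_holdout points are held out."""
    return series[:-n_holdout], series[-n_holdout:]

def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / n

def r_squared(actual, fitted):
    """Share of the variance of `actual` explained by `fitted`."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse_over_mean(actual, forecast):
    """Assumed definition of 'RMSE / Mean': RMSE as a percent of mean actual."""
    return 100.0 * rmse(actual, forecast) / (sum(actual) / len(actual))

# Placeholder series of 40 quarters (made-up values, NOT the study's data).
series = [17000 + 450 * t for t in range(40)]
history, holdout = split_holdout(series)
print(len(history), len(holdout))
```

Each model is fitted on `history` only, and the holdout figures in the tables are computed on the four 2001 quarters.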

2. FORECASTING TECHNIQUES AND THEIR RESULTS:

2.1 Moving Average

A moving average is used as the first forecast model for the retail sales data. A four-quarter moving average is chosen because the seasonal pattern repeats every four quarters.
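As a rough sketch of the idea, the one-step-ahead forecast is the mean of the four most recent quarters. The function name and the sample values are hypothetical and are not the study's data.

```python
def moving_average_forecasts(series, window=4):
    """One-step-ahead forecasts from a simple moving average.

    Element i is the forecast for period i + window, computed from the
    `window` observations before it. The last element is the forecast for
    the first period after the end of the series.
    """
    return [
        sum(series[t - window:t]) / window
        for t in range(window, len(series) + 1)
    ]

# Made-up quarterly values, for illustration only.
sample = [10, 12, 11, 15, 14, 16]
print(moving_average_forecasts(sample))
```

Because every window of four consecutive quarters contains each season once, seasonal swings average out. The forecast still lags behind a trending series.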

[Figure: The US Retail Sales: Pharmacies and Drug Stores. Series 1, Forecast of Series 1 and Fitted Values, Apr-92 to Oct-01; y-axis 16500 to 41500.]

Method                                          4-Quarter Moving Average
Mean Absolute Percentage Error (MAPE)           4.35%
R-Square                                        89.54%
Root Mean Square Error (historic, before 2001)  1,540.34
RMSE / Mean (holdout, Q1 2001 - Q4 2001)        1.93%

[Figure: ACF, lags 1 to 20, with upper and lower limits.]

[Figure: PACF, lags 1 to 20, with upper and lower limits.]

2.2 Simple Exponential Smoothing (SES)

Simple exponential smoothing (SES) is applied next to forecast pharmacy and drug store retail sales in the US. The ForecastX output is shown below.

[Figure: Y, Forecast of Y and Fitted Values, Apr-92 to Oct-01; y-axis 16500 to 41500.]

Method                                          Exponential Smoothing
Mean Absolute Percentage Error (MAPE)           4.17%
R-Square                                        89.54%
Root Mean Square Error (historic, before 2001)  1,540.52
RMSE / Mean (holdout, Q1 2001 - Q4 2001)        1.52%

Method Statistics                               Value
Alpha                                           0.62
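The SES recursion behind this output can be sketched as below. This is not the ForecastX implementation: the initialization with the first observation is an assumption, and the sample series is made up. Only the value alpha = 0.62 comes from the table above.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead SES forecasts.

    forecasts[t] is the forecast for series[t]; the final element is the
    forecast for the next, unobserved period.
    Update rule: F[t+1] = alpha * Y[t] + (1 - alpha) * F[t].
    """
    forecasts = [series[0]]  # assumption: start from the first observation
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

# Made-up values, for illustration only.
sample = [20000, 20500, 19800, 22000, 21000]
print(simple_exponential_smoothing(sample, alpha=0.62)[-1])
```

The forecast for every future period is the same last smoothed value, which is why SES cannot follow a trend or a seasonal pattern.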

2.3 Holt’s Exponential Smoothing

This method brings the forecast values closer to the observed values when the data series exhibits a trend, which is true of our series. It does not, however, model seasonality.

[Figure: Series 1, Forecast of Series 1 and Fitted Values, Apr-92 to Oct-01; y-axis 16500 to 41500.]

Method                                          Exponential Smoothing
Mean Absolute Percentage Error (MAPE)           3.47%
R-Square                                        94.48%
Root Mean Square Error (historic, before 2001)  1,118.62
RMSE / Mean (holdout, Q1 2001 - Q4 2001)        0.89%

Method Statistics                               Value
Alpha                                           0.10
Gamma                                           0.89
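A minimal sketch of Holt's two-equation update follows. It assumes that Alpha smooths the level and Gamma smooths the trend, and it uses a simple initialization. Both are assumptions, since the source does not show how ForecastX sets them up. The sample series is made up.

```python
def holt_forecast(series, alpha, gamma, horizon=4):
    """Holt's linear exponential smoothing; returns `horizon` forecasts."""
    level = series[0]
    trend = series[1] - series[0]  # assumption: first difference as start
    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous projection.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    # m-step-ahead forecast extends the last level along the last trend.
    return [level + m * trend for m in range(1, horizon + 1)]

# Made-up, perfectly linear series, for illustration only.
print(holt_forecast([10, 12, 14, 16, 18], alpha=0.10, gamma=0.89))
```

Unlike SES, the forecasts keep rising with the estimated trend, but all four quarters of a year sit on one straight line, so the fourth-quarter peak is still missed.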

2.4 Winters’ Exponential Smoothing

This method, like the previous one, extends the basic smoothing model. Holt’s method handles trend only, whereas Winters’ method is used for data that exhibit both trend and seasonality.

[Figure: Series 1, Forecast of Series 1 and Fitted Values, Apr-92 to Oct-01; y-axis 16500 to 41500.]

Method                                          Exponential Smoothing
Mean Absolute Percentage Error (MAPE)           1.08%
R-Square                                        99.51%
Root Mean Square Error (historic, before 2001)  334.20
RMSE / Mean (holdout, Q1 2001 - Q4 2001)        0.50%

Method Statistics                               Value
Alpha                                           0.80
Beta                                            0.82
Gamma                                           0.25

As stated previously, there is seasonality in the retail sales data. The seasonal index of the fourth quarter is 1.07, noticeably higher than those of the other three quarters.

Season   Seasonal Index
Q1       0.99
Q2       1.00
Q3       0.97
Q4       1.07
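How multiplicative seasonal indices enter a Winters-type forecast can be sketched as follows. The roles assumed here (Alpha for level, Beta for seasonality, Gamma for trend) and the crude initialization from the first two years are assumptions, not details confirmed by the source. The sample series is built from the four indices above on a flat, made-up base of 100.

```python
def winters_forecast(series, alpha, beta, gamma, season_length=4, horizon=4):
    """Multiplicative Winters smoothing; needs at least two full seasons."""
    s = season_length
    first, second = series[:s], series[s:2 * s]
    # Assumption: initialize from the first two seasons.
    level = sum(first) / s
    trend = (sum(second) - sum(first)) / (s * s)
    seasonals = [y / level for y in first]
    for t in range(s, len(series)):
        y, idx, prev_level = series[t], t % s, level
        # Deseasonalize the observation before updating the level.
        level = alpha * y / seasonals[idx] + (1 - alpha) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
        seasonals[idx] = beta * y / level + (1 - beta) * seasonals[idx]
    n = len(series)
    # Forecast: extend the trend line, then multiply by the season's index.
    return [
        (level + m * trend) * seasonals[(n + m - 1) % s]
        for m in range(1, horizon + 1)
    ]

# Flat made-up base of 100 times the reported indices, repeated three years.
sample = [99.0, 100.0, 97.0, 107.0] * 3
print(winters_forecast(sample, alpha=0.80, beta=0.82, gamma=0.25))
```

On this artificial series the forecasts reproduce the seasonal shape, with the fourth quarter about 7% above the second.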

[Figure: ACF, lags 1 to 20, with upper and lower limits.]

[Figure: PACF, lags 1 to 20, with upper and lower limits.]

2.5 Simple Regression

We hypothesize that Personal Consumption Expenditures in Medical Care (X1) is influential in determining US Retail Sales: Pharmacies and Drug Stores (Y), so we first examine a scatter plot of the two variables.

[Figure: Linear regression model. Scatter plot of RS against PCE with fitted line RS = -7635.55 + 0.0113290 PCE; S = 1396.76, R-Sq = 94.3%, R-Sq(adj) = 94.1%.]

The scatter plot shows a clear positive linear relationship between the two variables, so simple regression can be used here.

[Figure: Y, Forecast of Y and Fitted Values, Apr-92 to Oct-01; y-axis 16500 to 41500.]

The regression equation is

Y = - 7170 + 0.0112 X1

Predictor   Coef        SE          T       P
Constant    -7170       1683        -4.26   0.000
X1          0.0111557   0.0005885   18.96   0.000

Analysis of Variance

Source           DF   SS          MS          F        P
Regression        1   746023507   746023507   359.39   0.000
Residual Error   34   70577628    2075813
Total            35   816601135
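For readers who want to see what the fit computes, ordinary least squares for one predictor can be sketched as below. The function names and the X1 value of 3,000,000 are hypothetical. Only the two coefficients passed to `predict` are taken from the output above.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Point forecast from a fitted line."""
    return intercept + slope * x

# Tiny made-up data set that lies exactly on y = 2 + 3x.
print(fit_line([1, 2, 3, 4], [5, 8, 11, 14]))

# Reported coefficients applied to a hypothetical X1 of 3,000,000.
print(predict(-7170, 0.0111557, 3_000_000))
```

Forecasting Y for a future quarter requires a value of X1 for that quarter, which must itself be known or forecast.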


An essential diagnostic check based on residual analysis is carried out, as shown in the figure below. The residuals show a pattern, which means that the simple regression model cannot fit the data properly. To overcome this drawback, a nonlinear term may be added to the regression line.

[Figure: Residual Analysis. RESI1 versus FITS1.]

[Figure: Linear regression model. Scatter plot of RS against PCE with fitted curve RS = 41072.4 - 0.0232084 PCE + 0.0000000 PCE**2; S = 1068.70, R-Sq = 95.4%, R-Sq(adj) = 95.1%.]

[Figure: Residual Analysis. RESI3 versus FITS3.]

The figures above illustrate that adding a quadratic term improved the model and satisfied the residual assumption.


Method                                          Linear Regression
Mean Absolute Percentage Error (MAPE)           4.99%
R-Square                                        95.4%
Root Mean Square Error (historic, before 2001)  24,401.03
RMSE / Mean (holdout, Q1 2001 - Q4 2001)        1.04%

The regression equation is

Y = 41072.4 - 2.32E-02 X1 + 5.99E-09 X1^2

R-Sq = 95.4 %

Analysis of Variance

Source       DF   SS         MS         F         P
Regression    2   7.79E+08   3.89E+08   340.994   0.00
Error        33   37689909   1142118
Total        35   8.17E+08

Source       DF   Seq SS     F         P
Linear        1   7.46E+08   359.389   0.00
Quadratic     1   32887719   28.7954   6.26E-06
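The arithmetic linking these ANOVA entries can be checked with a few lines. The inputs are copied from the regression output in this section. The linear sum of squares, 746023507, is the unrounded value from the simple regression ANOVA. The variable names are hypothetical.

```python
# Values copied from the ANOVA tables in this section.
ss_linear = 746023507        # sequential SS of the linear term (unrounded)
seq_ss_quadratic = 32887719  # sequential SS added by the squared term
ss_error, df_error = 37689909, 33

# Mean square error of the quadratic model.
mse = ss_error / df_error

# Partial F statistic for adding the squared term (1 numerator df).
f_quadratic = seq_ss_quadratic / mse

# R-square as explained SS over total SS.
ss_total = ss_linear + seq_ss_quadratic + ss_error
r_sq = (ss_linear + seq_ss_quadratic) / ss_total

print(round(mse), round(f_quadratic, 4), round(100 * r_sq, 1))
```

The results agree with the reported MS for Error, the F value on the Quadratic row, and the R-Sq of 95.4%.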

2.6 Multiple-Regression Model

Many variables may affect retail sales of pharmacies and drug stores in the US, including the total population, gross domestic product (GDP), personal income, personal consumption expenditures on health insurance, and the number of outpatient visits. However, some of the proposed variables may be correlated with one another, which would cause serious error in the regression forecast model. Three explanatory variables are chosen:

1. X1: Personal Consumption Expenditures in Medical Care
   (http://www.economagic.com/em-cgi/data.exe/beana/m206u033)
   The retail sales of pharmacies and drug stores are strongly related to personal consumption expenditures in medical care. This explanatory variable implicitly carries the information that results from growth in population and personal income. A positive correlation is expected for this variable.

2. X2: Unemployment rate
   (http://www.economagic.com/em-cgi/data.exe/feddal/ru)
   The unemployment rate is an index of economic conditions. The monthly unemployment rates are averaged to approximate the quarterly unemployment rate.

3. X3: Inflation in Consumer Price
   (http://www.economagic.com/em-cgi/data.exe/var/inflation-ar-cpiu)
   The amount of retail sales is affected by inflation in consumer prices, so this explanatory variable is incorporated in our model. In the same manner, the monthly data are averaged to estimate the quarterly inflation in consumer prices.

The correlations among the three explanatory variables:

Correlations: X1, X2, X3

        X1       X2
X2   -0.649
      0.000
X3   -0.210    0.037
      0.194    0.818

Cell Contents: Pearson correlation
               P-Value

The results above show no serious multicollinearity among the three explanatory variables. The strongest pairwise correlation, between X1 and X2, is moderate (-0.649).
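A short sketch of the Pearson coefficient, together with the variance inflation factor (VIF) that a pairwise correlation implies when a model has only two predictors, is given below. The function names and the sample lists are hypothetical. Only the value -0.649 comes from the correlation table, and the VIF is a supplementary check that the source does not report.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def vif_two_predictors(r):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)

# Made-up data: a perfectly decreasing relationship.
print(pearson([1, 2, 3], [3, 2, 1]))

# VIF implied by the reported correlation between X1 and X2.
print(round(vif_two_predictors(-0.649), 2))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign. The value implied here is well below that range.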

Personal Consumption Expenditures in Medical care (X1), Unemployment rate (X2) and

Inflation in Consumer Price (X3) are used as the explanatory variables.

The regression equation is

Y = - 19104 + 0.0133 X1 + 954 X2 + 225 X3

Predictor   Coef        SE Coef     T       P
Constant    -19104      4510        -4.24   0.000
X1          0.0132927   0.0008566   15.52   0.000
X2          953.7       367.1       2.60    0.014
X3          224.5       184.9       1.21    0.233

Given that the other two variables are in the model, X3 is not significant (P = 0.233). The regression is therefore rerun without X3, giving the equation

Y = - 16854 + 0.0129 X1 + 833 X2

Predictor   Coef        SE Coef     T       P
Constant    -16854      4138        -4.07   0.000
X1          0.0129425   0.0008117   15.94   0.000
X2          832.5       355.6       2.34    0.025

Analysis of Variance

Source           DF   SS           MS          F        P
Regression        2   1226570711   613285355   351.44   0.000
Residual Error   37   64568180     1745086
Total            39   1291138890

Dummy variables are added to the model to capture the seasonality in the data. Q2, Q3 and Q4 are coded as follows:

Q2 = 1 for all second quarters and zero otherwise

Q3 = 1 for all third quarters and zero otherwise

Q4 = 1 for all fourth quarters and zero otherwise
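The coding rule above can be written as a small helper. The function name is hypothetical, and treating the first quarter as the baseline follows from the fact that no Q1 dummy is defined.

```python
def quarter_dummies(quarter):
    """Return the (Q2, Q3, Q4) dummy values for a quarter number 1 to 4.

    The first quarter is the baseline, so it maps to (0, 0, 0).
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return tuple(1 if quarter == q else 0 for q in (2, 3, 4))

# One year of dummy rows, Q1 through Q4.
for q in (1, 2, 3, 4):
    print(q, quarter_dummies(q))
```

Only three dummies are used for four quarters. A fourth would be perfectly collinear with the constant term, so each dummy coefficient is read as a shift relative to the first quarter.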

The regression equation is

Y = - 16182 + 0.0127 X1 + 771 X2 + 207 Q2 - 659 Q3 + 1731 Q4

Predictor   Coef        SE Coef     T       P
Constant    -16182      3143        -5.15   0.000
X1          0.0127178   0.0006208   20.49   0.000
X2          770.6       270.7       2.85    0.007
Q2          207.1       448.4       0.46    0.647
Q3          -658.8      449.8       -1.46   0.152
Q4          1731.0      451.7       3.83    0.001

The variables Q2 and Q3 are not significant in the presence of the other variables, so the regression is carried out again without them. This gives the equation

Y = - 15970 + 0.0126 X1 + 746 X2 + 1886 Q4

Predictor   Coef        SE Coef     T       P
Constant    -15970      3230        -4.94   0.000
X1          0.0126391   0.0006355   19.89   0.000
X2          745.8       277.6       2.69    0.011
Q4          1886.5      377.9       4.99    0.000

S = 1029   R-Sq = 97.0%   R-Sq(adj) = 96.8%

This makes sense because only the fourth quarter has a significant impact on retail sales.

2.7 Time Series Decomposition

The trend cycle can be estimated by smoothing the series to reduce the random

variation.

[Figure: 2X4 Moving Average, periods 1 to 40; y-axis 16500 to 41500.]

The 2x4 MA plot above shows a trend in the RS data.
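A 2x4 moving average averages two adjacent four-quarter means so that the result is centered on a quarter. A minimal sketch with a hypothetical function name and made-up data:

```python
def centered_ma_2x4(series):
    """2x4 centered moving average, aligned with `series`.

    Positions where the five-point window does not fit are None.
    Equivalent weights on t-2 .. t+2 are 1/8, 1/4, 1/4, 1/4, 1/8.
    """
    n = len(series)
    out = [None] * n
    for t in range(2, n - 2):
        first = sum(series[t - 2:t + 2]) / 4   # quarters t-2 .. t+1
        second = sum(series[t - 1:t + 3]) / 4  # quarters t-1 .. t+2
        out[t] = (first + second) / 2
    return out

# Made-up linear series: the centered average reproduces the line.
print(centered_ma_2x4([0, 1, 2, 3, 4, 5, 6, 7]))
```

Two values are lost at each end of the series, which is one reason a decomposition needs a separate step to extend the trend into the forecast period.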

[Figure: Y-Q and Fitted Values, Oct-97 to Dec-00; y-axis 16500 to 41500.]

The plot above uses a weighted MA smoothing technique. At the right end of the curve, the fitted values are clearly smaller than the actual values, which suggests a rapid increase in RS in the coming year.

After removing the trend and isolating the seasonal component, Exponential Smoothing is

used to fit the data.

[Figure: Y, Forecast of Y and Fitted Values, Apr-92 to Oct-01; y-axis 0 to 40000.]

Method                                     Exponential Smoothing
Mean Absolute Percentage Error (MAPE)      0.57%
R-Square                                   99.77%
RMSE / Mean (holdout, Q1 2001 - Q4 2001)   1.91%

2.8 ARIMA Model

Second-order differencing is applied to remove non-stationarity from the time series.

[Figure: Second-Order Differences, periods 1 to 37; y-axis -6000 to 6000.]

Also seasonal differencing is used to remove the seasonal factor.
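Both kinds of differencing can be sketched with one helper. The function name and the sample series are hypothetical. Applying the helper twice corresponds to the d = 2 and D = 2 settings of the selected model.

```python
def difference(series, lag=1):
    """Differences at the given lag: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Second-order (regular) difference: difference the series twice at lag 1.
quadratic = [0, 1, 4, 9, 16]
print(difference(difference(quadratic)))

# Seasonal difference for quarterly data: lag 4 removes a fixed pattern.
seasonal = [1, 2, 3, 4, 1, 2, 3, 4]
print(difference(seasonal, lag=4))
```

Each regular difference shortens the series by one point and each seasonal difference by four, so heavy differencing leaves fewer observations for estimating the model.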

[Figure: Second Seasonal Difference, periods 1 to 35; y-axis -1500 to 1500.]

An ARIMA model is then fitted to the data:

[Figure: Y, Forecast of Y and Fitted Values, Apr-92 to Oct-01; y-axis 16500 to 41500.]

ARIMA (2,2,0)*(1,2,1)

Method                                     ARIMA (p,d,q)*(P,D,Q)
Mean Absolute Percentage Error (MAPE)      0.91%
R-Square                                   99.36%
RMSE / Mean (holdout, Q1 2001 - Q4 2001)   0.24%

This model forecasts the values of the last year (2001) well.

Method Statistics   Value
Method Selected     Box Jenkins
Model Selected      ARIMA(2,2,0) * (1,2,1)

[Figure: Error plot, periods 1 to 31; y-axis -1,000.00 to 1,000.00.]

3. DISCUSSION

The moving average method is suitable for stationary data, but our data are non-stationary. The R-square of this model is only about 89.54%, and the holdout RMSE/Mean is about 1.93%. The ACF and PACF show that some autocorrelations are significantly different from zero (at lags 4, 8 and 12), which confirms the fourth-quarter seasonality.

No significant difference is found between the SES technique and the moving average technique, although SES attained a smaller holdout RMSE/Mean (SES: 1.52%, MA(4): 1.93%). Like the moving average, SES is designed for stationary data, which is not the case here.

Because Holt’s exponential smoothing adds a growth (or trend) factor to the equation to adjust for the trend, the model performs better than the former two. The holdout RMSE/Mean for 2001 is reduced to 0.89%. However, this model still does not account for seasonality, so there is room for improvement.

For Winters’ exponential smoothing, the holdout RMSE/Mean is only 0.50% for the last year. MAPE is also significantly reduced (1.08%), and R-Square is nearly 100% (99.51%). In addition, no significant autocorrelation is found for this forecasting technique.

The results show that the MAPE of the simple regression (4.99%) is larger than that of the previous forecast models, and the R-Square values of both the simple and the multiple regression (95.4% and 97.0%) still fall short of that of Winters’ method (99.51%).


The time series decomposition appears to fit the historic data well (R-Square is 99.77%); however, the holdout RMSE/Mean for the last year is somewhat larger (1.91%).

Finally, the ARIMA model is evaluated using second-order differencing to achieve stationarity in the data and second-order seasonal differencing to de-seasonalize the data. The ARIMA model is found to have the minimum holdout RMSE/Mean ratio (0.24%) among the models. The error pattern seems to follow a white noise model.

4. CONCLUSION

Different forecasting methods are used to predict pharmacy and drug store retail sales in the US. The ARIMA technique performs best among the models examined: ARIMA (2,2,0)*(1,2,1) gives the lowest holdout RMSE/Mean (0.24%), with a MAPE of 0.91%. The table below shows the ARIMA forecasts for 2002, which serves as a holdout period with actual values, and for the following two years.

Forecast -- Box Jenkins Selected

Date       Actual      Forecast    Forecast
           Quarterly   Quarterly   Annual
Mar-2002   34346       33,984.84
Jun-2002   35358       34,980.57
Sep-2002   34932       33,868.30
Dec-2002   38412       37,141.17   139,974.88
Mar-2003               34,119.04
Jun-2003               34,624.10
Sep-2003               33,437.70
Dec-2003               35,973.18   138,154.01
Mar-2004               33,306.69
Jun-2004               33,326.83
Sep-2004               31,281.33
Dec-2004               33,003.27   130,918.14
Avg                    34,087.25   136,349.01
Max                    37,141.17   139,974.88
Min                    31,281.33   130,918.14

Holdout period: the 2002 rows, for which actual values are shown.
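The 2002 rows of the forecast table can be checked directly against the actual values. The sketch below uses only numbers copied from that table. The variable names are hypothetical, and "RMSE / Mean" is assumed to be RMSE divided by the mean of the actual values.

```python
import math

# 2002 rows of the forecast table (Mar, Jun, Sep, Dec).
actual_2002 = [34346, 35358, 34932, 38412]
forecast_2002 = [33984.84, 34980.57, 33868.30, 37141.17]

# The annual forecast is the sum of the four quarterly forecasts.
annual_forecast = sum(forecast_2002)

# Absolute percentage error per quarter, then the mean (MAPE).
ape = [100 * abs(a - f) / a for a, f in zip(actual_2002, forecast_2002)]
mape_2002 = sum(ape) / len(ape)

# RMSE as a percent of the mean actual value (assumed definition).
rmse_2002 = math.sqrt(
    sum((a - f) ** 2 for a, f in zip(actual_2002, forecast_2002)) / 4
)
rmse_over_mean_2002 = 100 * rmse_2002 / (sum(actual_2002) / 4)

print(round(annual_forecast, 2))
print([round(e, 2) for e in ape])
print(round(mape_2002, 2), round(rmse_over_mean_2002, 2))
```

All four 2002 forecasts fall below the actual values, so the errors share the same sign.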