20
Senior Design Final Report TopGolf Inc: Demand Forecasting Prepared by: Kaitlyn Farmer, Janie Flowers, Zach Taher May 8th, 2014

TopGolf Inc: Demand Forecasting - s2.smu.edubarr/4395/history/Presentations/2014/TopGolf... · study is to help TopGolf to predict customer treads by creating a demand forecasting

  • Upload
    lekhanh

  • View
    220

  • Download
    2

Embed Size (px)

Citation preview

Senior Design Final Report

TopGolf Inc: Demand Forecasting Prepared by: Kaitlyn Farmer, Janie Flowers, Zach Taher May 8th, 2014

2

Table of Contents

Management Summary ............................................................................. 1

Background and Description of Project .............................................. 2

Analysis of the Situation .......................................................................... 3

Technical Description of the Model ..................................................... 4

Dallas R Software Output ...................................................................... 6

Dallas R Software Output (No 2014) ................................................. 8

Austin R Software Outputs ................................................................ 10

Analysis and Managerial Interpretation ......................................... 12

Austin, TX Site Regression Chart ..................................................... 12

Dallas, TX Site Regression Chart ..................................................... 12

Dallas, TX Site (No 2014) Regression Chart................................. 13

Conclusions and Critique ...................................................................... 14

Appendix A - Austin Diagnosis ............................................................ 16

Appendix B - Austin Plot 2 .................................................................... 16

Appendix C - Dallas Plot ........................................................................ 17

Appendix D - Dallas Diagnostics ......................................................... 17

Appendix E- Dallas Plot (No 2014) .................................................... 18

Appendix F - Dallas Diagnostics (No 2014) .................................... 18

1

Management Summary

TopGolf is a unique entertainment facility that provides guests with a new type of golfing

experience that allows players of all ages and skill levels to participate in a fun point-scoring golf

game. Players can hit microchip golf balls at targets located in the outfield and track their hits with

the balls unique tracking system. TopGolf is also a premier event venue that provides guests with

upscale food and beverage options, VIP memberships, special events, private event services, and

much more. They currently have 10 locations in two countries and will have opened 90 more sites

within the next five years. That’s over 900% growth in five years. Despite TopGolf’s great amount

of success and growth, they currently have no form of revenue forecasting models or programs to

help them better predict their daily business operations. TopGolf’s quick growth and massive

success has allowed them to get by without having to forecast their customers spending patterns.

However, without any type of demand forecasting models, TopGolf indicated to us the difficulty this

creates in many daily business decisions such as optimizing the staffing schedule.

With this in mind, our team decided that our senior design project would be to create a type

of forecasting model to help TopGolf better track revenue data and predict daily demand. Our team

was also interested in finding out if weather conditions had any effect on customer play and sales at

TopGolf. Our project would look at these variables and create a future revenue forecasting model to

be implemented to help staff scheduling, so that TopGolf can better meet Key Performance

Indicators, such as sales per employee hour.

Our team was given over 1.8 million data points from TopGolf’s data revenue collection file.

Our team set out to identify potential factors that could influence revenue, build an accurate model

that will forecast future demand with respect to these factors. Over the course of our project our

team used AWK, a software tool, to break down the large revenue file into multiple files based on

specific site data. We also used Microsoft Excel to organize, manipulate, and run basic regressions

of the weather and revenue data. Finally, we used R software to create models, analyze the data,

and build our final forecasting models.

The first step of analysis was discovering which variables would be significant in

forecasting revenue. We looked at TopGolf’s revenue data, day of the week, month of the year, and

historical weather data to create a forecasting model. The data was inconclusive for the Dallas, TX

location due to the new collection process; however, the Austin, TX site had a good collection of

data we were able to use to create a model. We concluded using the Austin data that day of the

week plays the largest role in determining revenue for a given day at TopGolf. Through our model,

we found that weekends yield better results than the weeknights. In addition, we found that

TopGolf also performs very well when the weather is between 51 and 70 degrees Fahrenheit.

2

Background and Description of Project TopGolf is a new company, with operations only spanning over five years. After Ken May,

Chief Operating Officer, came on board with the company he made it his mission to incorporate

Business Intelligence strategies into daily business operations. Less than a year ago, Ken May

created a Business Intelligence and Analytics group at the company to start tracking and modeling

data. Coming into this project, the Business Intelligence and Analytics group at TopGolf was still in

its preliminary stages of working towards condensing all the data collection that had been

untouched for the entire life of the company. TopGolf’s had many problems when it came to

Business Intelligence. TopGolf has all this data that is not being tracked, organized, or put into any

standardization models. They have no current forecasting software. The company is manually

forecasting daily sales in spreadsheets. With all of these problems, it is very difficult for TopGolf to

come up with a standardization in making daily business decisions such as creating optimal staffing

schedules for their multiple sites.

Our senior design team was motivated to apply our expertise in Management Science

modeling and analytics to help TopGolf create this solution to this problem. The purpose of our

study is to help TopGolf to predict customer treads by creating a demand forecasting model, which

will allow them to make more informed business decisions.

Our team had to first decide what software we would use to create the forecasting model.

TopGolf currently was using an advanced Microsoft Business Package. So we decided to start

running our models in excel. We ran preliminary regressions in excel to see if there was any

correlation. However, Excel is only able run regressions using 16 variables. So after seeing some

correlation in the data using excel and speaking with TopGolf about what software they could

utilize, we converted the data to CSV format and uploaded it into R software which is able to handle

more variables.

Another decision we had to make is how to group the variables. We decided to group the

variables by the day of the week, month of the year, and temperature range. We used these

variables against the TopGolf revenue data to analyze if they affected the revenue for a given time.

We chose these variables through deductive reasoning using personal experiences to hypothesize

what would be largest influencers on revenue.

There were many variables to consider when looking at what weather data to include in the

study. We were able to find historic daily weather data that included tons information from

humidity to wind direction. Our team decided to only focus on a few key weather factors that we

believed might have an impact on customers TopGolf usage.

This model will affect the Business Intelligence (BI) group because they will be able to use

our model to implement future decisions in regards to staffing. Our results will help the BI group

compare the trends they have seen in-house and what we found. As TopGolf employees become

more aware of customer trends they will be able to better forecast revenues and become a more

efficient company.

3

Analysis of the Situation Our senior design teams problem/ goal to tackle this project was to create a forecasting

model that looked at the daily correlation between date, climate factors, and daily revenues

collected at a specific TopGolf sites. Our team’s general approach to the problem of demand

forecasting for TopGolf began with data mining the revenue data we received. We converted this

data to a txt file to ensure that it could later be put into excel. After reviewing the data file, we found

that it had over 1.8 million data points. Therefore from this point, we decided that to even consider

tackling the data, we must first break it down. We decided the most practical way to divide up the

data was to break the file down by the different TopGolf site locations. We were able to use AWK to

split this data up into the different sites. Once we had separated the data for the different TopGolf

sites, we were able to input it into Excel for further data scrubbing.

First, we started out comparing weather data and revenue data against each other in basic

Excel pivot charts and graphs. Next, our team created regression models in Excel using the Dallas,

TX site revenue data and Dallas weather data for further analysis. After analysis of the results we

saw a large spike in 2014 revenue data. After speaking with TopGolf, they stated that this spike was

due to the cause of their new revenue collection where customers rented bays per hour and no

longer by purchasing physical quantities of balls. Our team decided to first take a look at the results

without 2014 data, to see the effect the new pricing strategy had on the regression. Surprisingly,

these results turned out to provide an even worse result for forecasting the revenue. After finally

finding a somewhat significant correlation with the Austin, TX site, we decided to starting running

the data in R software to create models using larger amounts of data. Our team started running the

Dallas and Austin, Texas site data in R software. We decided to not use the Houston, TX site data

because it showed poor correlation. It seems that the reporting for the Houston site was off due to

the total revenues were very low and often below $10,000 for certain days. This caused a significant

impact in the results and was not even able to produce R2 values above 0.1 in Excel, despite running

it many different ways. Fortunately, the results of the Austin data were promising, achieving an R2

value around 0.7. The only factor that reduces this results credibility is the fact that the Austin,

Texas TopGolf site data only holds about nine months of data.

Our team was able to find distinct weekly and monthly trends in this data which can be seen

in the graphs in the Appendix. Dallas regressions were not as promising; with the R2 value resulting

in 0.2, so we decided to run the regression twice. The first time we used the data is that was

provided to our team by TopGolf. However, as you can see in the Dallas plot graph (Appendix C),

there is an obvious change in reporting between 2013 and 2014 which throws off the regression.

To account for this change, we took away all of the 2014 data (see Appendix E) and ran the

regression again. Unfortunately, the results were still not providing strong correlation, as you can

see from the actual vs. forecasted plotted points (Appendix E). The full models created can be found

further in this report to further clarify variables, technical analysis, and project results.

4

Technical Description of the Model Data was initially received from Top Golf in a crystal reports format. Crystal Reports is a

SAP program which pulls database queried information into a viewable report. Another reason files

are exported in this format is so that the data can be compressed and large amounts of data are not

hogging computer processing power. We initially tried to install a free report viewer for the data

however the data would not open using this program and we were forced to look for other methods

to view the data. By changing the file extension to a .txt file, we were able to open the file with

Notepad. The data file was very large, over 800 megabytes in size, and contained over 1.8 million

rows of data. This was due to the fact that the file contained every transaction for the last two years

at all 10 operating top golf sites.

Once we were able to actually view the data, we needed to find a method in which to clean

up the data to find the chunks that would be useful to us. Initially we read the data into Excel,

however Excel would only take slightly over one million rows of data in, leaving half of our data file

unused. Fortunately, the data was ordered by site ID. The Top Golf site ID for the Dallas location

was 5 and we were able to run data validation in Excel which left us only with the data for the

Dallas location. A problem we ran into was that the date recorded for each entry was not split up by

month day and year. Instead it was provided as a “Date ID” with a value such as “20130411”. By

analyzing multiple lines of data, we were able to infer that the first four numbers of the Date ID

contained the year, the next two numbers contained the day of the week, and the last two numbers

were the month number. We were able to use Excel’s =RIGHT(string, #characters), =MID(string,

start character #, #characters) and =LEFT(string, #characters), functions to change the info into

separate year, month, and day columns. Using this data we were then able to use the

=WEEKDAY(date) function in Excel to determine the weekday for each data point, which would be

7 of the variables that we used in our analysis.

The data still required more scrubbing because it could not yet be used in a regression. We

needed to delete all of the entries with no reported revenue. These showed up as NULL and could

not be used as their value would greatly skew any regression models. Unneeded columns such as

transaction type and transaction amount were deleted. We considered using transaction type as a

variable however there were over 25 different transaction types and it was not known how we

would assign numerical values to each different transaction type. Lastly, we put the revenue into a

pivot table so that we would have daily sums.

After cleaning up all the data, we needed to add weather data. We pulled the data from

www.wunderground.com which had daily reports of weather statistics for the past five years. Our

data only went back three years so we had weather data for each TopGolf operating. The weather

data files contained much more data than we needed and had info such as minimum temperature,

maximum temperature, average temperature, wind speed, humidity, visibility, and precipitation.

We decided that the temperature and precipitation data were the data that would have the greatest

impact and decided to use these in our model. This data was able to be added to the TopGolf data

using multiple VLookUps in excel where the formula would search for the date in the weather

spreadsheet and then pull the relevant information. To illustrate, we would use the date in the

5

TopGolf revenue sheet, the formula would then look for the corresponding date in the weather

spreadsheet and then look into the specified column (temperature or precipitation) for that row.

This accurately and quickly pulled all of the relevant weather data into our main top golf data file.

After the data had been cleaned, we needed to select variables which looked to have a

significant impact on revenue. To help us decide on these variables, we graphed the raw daily

revenue data in which there were obvious seasonality and weather trends. We also predicted

originally that weather would have a significant impact on the revenue and decided to use

temperature and precipitation as variables as well. Originally, and mistakenly, we were using the

variables as typical numerical values. Thus we essentially only had four variables, weekday (with

values ranging from 1-7), month of year (values ranging from 1-12), precipitation, and temperature

(ranging from 40-100+). We ran an initial regression on the data using Excel’s regression analysis

tool and came away with an R2 value less than 0.1 indicating poor correlation. We concluded that

this was poor model design due to our variables. In this type of scenario, we would be assuming

that later in the week, later in the year, and the hotter it was, the greater revenue would be. Under

this assumption, a 110 degree day on a Sunday in December would produce the highest revenue.

Thus we decided to change our variables into binary variables. This meant each day of the week

and each month would become its own independent variable. The weather data we would then

stratify and use IF formulas in excel to assign either a 1 or 0 depending on if the average

temperature fell within that range. We kept precipitation as a normal variable assuming that it

would have a negative coefficient and that the more it rained in a day, the less revenue top golf

would recognize.

We again attempted to run the regression in Excel; however Excel’s regression tool will only

take 16 variables. We used day of the week, and temperature data, and precipitation for a total of

16 variables. The regression gave us a correlation coefficient of 0.28 which was still poor. The

coefficient for the precipitation data was strangely positive as well indicating that with more rain,

there would be more revenue. This seemed incorrect and after reassessing the precipitation data,

determined that it was incorrect. There had been days where there had been rain according to

other sources where our data did not indicate such. As a result we decided to remove the

precipitation data from our model. This increased our R2 value to 0.3 which was a slight

improvement.

We noticed there were a few data points that were large outliers. There were certain days

with revenue over $500,000 when a typical day sat between $50,000 and $100,000 in revenue. We

decided to remove these days from our model. We also noticed a large jump in sales for 2014 data.

Daily revenue was almost 4 times what it was in previous years. This was not simply a result

increased business but was a reporting difference. We deleted the 2014 entries and ran another

regression which resulted in a R2 = .35 which was another improvement.

In an effort to get better results, we decided to look at another site. That site ended up being

the Austin site which we chose because the site was opened with the new reporting system. The

methods above were repeated to clean the Austin data and after running a regression, we were able

6

to achieve an R2 = .65 which showed that there was actually correlation between the daily revenue

and the variables that we selected. We still had not run the regression with all 25 variables due to

Excel’s variables limits so we decided to use the R statistical package.

To get the data into R, we saved the Excel files into csv format which can easily be read into

R using the read.csv(csv name) command. Each file was read into the program and saved as its own

data frame. We then used R’s lm command to designate a linear regression and designated which

variables we wanted to use. The variables analyzed in the regression are below:

Temperature range (50 and below, 51-60 degrees, 61-70 degrees, 71-80 degrees, 81-90

degrees, above 90 degrees)

Day of the Week (Monday, Tuesday, Wednesday, etc)

Month of the year (January, February, March, etc)

The regression was run three different times, once for all of the Dallas data, once for the Dallas data

without 2014 revenue, and once for the Austin location. The outputs of R’s linear regression are

below:

Dallas R Software Output

Call:

lm(formula = tgd$Total.Amount ~ tgd$Below.50 + tgd$X51.60 + tgd$X61.70 +

tgd$X71.80 + tgd$X81.90 + tgd$ABOVE.90 + tgd$Monday + tgd$Tuesday +

tgd$Wednesday + tgd$Thursday + tgd$Friday + tgd$Saturday +

tgd$Sunday + tgd$January + tgd$February + tgd$March + tgd$April +

tgd$May + tgd$June + tgd$July + tgd$August + tgd$September +

tgd$October + tgd$November + tgd$December)

Residuals:

Min 1Q Median 3Q Max

-10332 -2551 -71 1049 60006

Coefficients: (2 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

7

(Intercept) 7402.8 2307.2 3.209 0.001400 **

tgd$Below.50 -1076.8 2141.7 -0.503 0.615273

tgd$X51.60 741.3 2134.4 0.347 0.728465

tgd$X61.70 245.9 2158.8 0.114 0.909337

tgd$X71.80 -881.5 2322.0 -0.380 0.704334

tgd$X81.90 -696.6 2456.5 -0.284 0.776832

tgd$ABOVE.90 -886.8 2738.1 -0.324 0.746151

tgd$Monday -593.8 1039.2 -0.571 0.567940

tgd$Tuesday -42.1 1044.0 -0.040 0.967842

tgd$Wednesday 207.8 1041.4 0.200 0.841934

tgd$Thursday 430.1 1044.6 0.412 0.680701

tgd$Friday 2052.2 1041.0 1.971 0.049108 *

tgd$Saturday 3652.9 1037.2 3.522 0.000459 ***

tgd$Sunday NA NA NA NA

tgd$January -6666.0 1300.8 -5.125 3.94e-07 ***

tgd$February 576.4 1338.5 0.431 0.666884

tgd$March 1090.6 1398.4 0.780 0.435731

tgd$April -7034.9 1692.5 -4.156 3.67e-05 ***

tgd$May -5736.4 1754.1 -3.270 0.001131 **

tgd$June -6266.9 1871.1 -3.349 0.000857 ***

tgd$July -6370.5 1908.2 -3.338 0.000891 ***

tgd$August -6465.6 1920.7 -3.366 0.000807 ***

tgd$September -6597.4 1772.3 -3.722 0.000214 ***

tgd$October -7311.5 1451.8 -5.036 6.17e-07 ***

tgd$November 277.6 1346.8 0.206 0.836777

tgd$December NA NA NA NA

8

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7199 on 647 degrees of freedom

Multiple R-squared: 0.2278, Adjusted R-squared: 0.2004

F-statistic: 8.3 on 23 and 647 DF, p-value: < 2.2e-16

Dallas R Software Output (No 2014)

Call:

lm(formula = tgd$Total.Amount ~ tgd$Below.50 + tgd$X51.60 + tgd$X61.70 +

tgd$X71.80 + tgd$X81.90 + tgd$ABOVE.90 + tgd$Monday + tgd$Tuesday +

tgd$Wednesday + tgd$Thursday + tgd$Friday + tgd$Saturday +

tgd$Sunday + tgd$January + tgd$February + tgd$March + tgd$April +

tgd$May + tgd$June + tgd$July + tgd$August + tgd$September +

tgd$October + tgd$November + tgd$December)

Residuals:

Min 1Q Median 3Q Max

-1657.9 -381.0 -35.0 255.5 6455.2

Coefficients: (2 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1662.86 274.89 6.049 2.87e-09 ***

tgd$Below.50 -831.70 254.58 -3.267 0.001163 **

tgd$X51.60 -681.00 245.29 -2.776 0.005705 **

tgd$X61.70 -567.34 246.78 -2.299 0.021920 *

9

tgd$X71.80 -445.17 257.40 -1.730 0.084340 .

tgd$X81.90 -487.23 268.99 -1.811 0.070697 .

tgd$ABOVE.90 -473.90 294.07 -1.612 0.107704

tgd$Monday -454.83 116.93 -3.890 0.000114 ***

tgd$Tuesday -491.54 117.24 -4.193 3.27e-05 ***

tgd$Wednesday -429.33 116.45 -3.687 0.000252 ***

tgd$Thursday -324.84 116.74 -2.783 0.005597 **

tgd$Friday 145.94 116.49 1.253 0.210851

tgd$Saturday 601.05 116.18 5.173 3.34e-07 ***

tgd$Sunday NA NA NA NA

tgd$January 245.35 182.08 1.348 0.178434

tgd$February 185.24 187.48 0.988 0.323611

tgd$March 507.19 182.88 2.773 0.005758 **

tgd$April 254.09 189.55 1.340 0.180712

tgd$May 796.26 194.63 4.091 5.01e-05 ***

tgd$June 226.26 205.13 1.103 0.270555

tgd$July -11.34 208.43 -0.054 0.956623

tgd$August -38.72 209.47 -0.185 0.853425

tgd$September -123.22 196.21 -0.628 0.530309

tgd$October -185.41 176.51 -1.050 0.294014

tgd$November 59.00 188.43 0.313 0.754335

tgd$December NA NA NA NA

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 707.3 on 496 degrees of freedom

10

Multiple R-squared: 0.3243, Adjusted R-squared: 0.293

F-statistic: 10.35 on 23 and 496 DF, p-value: < 2.2e-16

Austin R Software Outputs

Residuals:

Min 1Q Median 3Q Max

-61981 -9307 106 10391 56856

Coefficients: (4 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 37867 13806 2.743 0.006516 **

tga$MONDAY -11384 4360 -2.611 0.009553 **

tga$TUESDAY -2987 4335 -0.689 0.491455

tga$WEDNESDAY 3502 4360 0.803 0.422497

tga$THURSDAY 11532 4360 2.645 0.008675 **

tga$FRIDAY 23180 4366 5.309 2.37e-07 ***

tga$SATURDAY 40430 4374 9.243 < 2e-16 ***

tga$SUNDAY NA NA NA NA

tga$BELOW.50 34155 14310 2.387 0.017716 *

tga$X51.60 49210 14492 3.396 0.000792 ***

tga$X61.70 47933 14593 3.285 0.001162 **

tga$X71.80 -2500 5325 -0.469 0.639165

tga$X81.90 -1034 7665 -0.135 0.892813

tga$Above.90 6951 10313 0.674 0.500899

tga$January -20957 5032 -4.164 4.25e-05 ***

tga$February -52789 5184 -10.183 < 2e-16 ***

11

tga$March -54662 5911 -9.247 < 2e-16 ***

tga$April NA NA NA NA

tga$May NA NA NA NA

tga$June -61289 9510 -6.445 5.62e-10 ***

tga$July -24679 9012 -2.738 0.006603 **

tga$August -25144 9270 -2.712 0.007125 **

tga$September -28664 8418 -3.405 0.000767 ***

tga$October -32609 5885 -5.541 7.38e-08 ***

tga$November -35586 5166 -6.889 4.25e-11 ***

tga$December NA NA NA NA

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19280 on 259 degrees of freedom

Multiple R-squared: 0.618, Adjusted R-squared: 0.587

F-statistic: 19.95 on 21 and 259 DF, p-value: < 2.2e-16

The results of the regression were a little worse than what we had achieved in excel. We

then graphed the forecasted revenue against the actual revenue for each scenario using R as well as

ran regression analysis and graphed these to illustrate the accuracy/inaccuracy of our models.

These graphs can be found in the Appendix.

12

Analysis and Managerial Interpretation We came up with the following forecasting models for each of the three scenarios.

Austin, TX Site Regression Chart

Austin Regression

Day of Week Month Temperature

Variable Coefficient Variable Coefficient Variable Coefficient

Monday -11,384 January -20,957 50 and Below 34,155

Tuesday -2,987 February -52,798 51-60 Degrees 49,210

Wednesday 3,502 March -54,662 61-70 Degrees 47,933

Thursday 11,532 April N/A 71-80 Degrees -2,500

Friday 23,180 May N/A 81-90 Degrees -1,034

Saturday 40,430 June -61,289 Above 90 Degrees 6,951

Sunday N/A July -24,679

August -25,144

September -28,664

October -32,609

November -35,586

December N/A

Intercept of 37867

*Coefficients with N/A are due to singularities

Dallas, TX Site Regression Chart

Dallas Regression

Day of Week Month Temperature

Variable Coefficient Variable Coefficient Variable Coefficient

Monday -594 January -6,666 50 and Below -1,077

Tuesday -42 February 576 51-60 Degrees 741

Wednesday 208 March 1,091 61-70 Degrees 246

Thursday 430 April -7,035 71-80 Degrees -882

Friday 2,052 May -5,736 81-90 Degrees -697

Saturday 3,653 June -6,267 Above 90 Degrees -887

Sunday N/A July -6,371

August -6,466

September -6,597

October -7,312

November 278

December N/A

Intercept of 7402.8

*Coefficients with N/A are due to singularities

13

Dallas, TX Site (No 2014) Regression Chart

Dallas Regression (No 2014 Data)

Day of Week Month Temperature

Variable Coefficient Variable Coefficient Variable Coefficient

Monday -455 January 245 50 and Below -832

Tuesday -492 February 185 51-60 Degrees -681

Wednesday -429 March 507 61-70 Degrees -567

Thursday -325 April 254 71-80 Degrees -445

Friday 146 May 796 81-90 Degrees -487

Saturday 601 June 226 Above 90 Degrees -474

Sunday N/A July -11

August -39

September -123

October -185

November 59

December N/A

Intercept of 1662.9

*Coefficients with N/A are due to singularities

The results for the Dallas location showed poor correlation between our selected variables

and the revenue reported. This is most likely due to the change in reporting at the Dallas site. The

system that Top Golf had in place prior to January of 2014 poorly reported revenue on transactions

and showed in the results, generating a poor regression model. This is one of the reasons we

decided to repeat our methods in Austin where the new reporting software had been used since the

opening of the facility. Our results from the Austin model show that there is some validity to our

forecasted model. With time, as Top Golf collects more and more revenue, it is likely that more

trends will begin to emerge in the data and it will be easier to accurately forecast revenue.

The Austin location is the best site to look at if we are going to make any conclusions from

the model. First, we can look at the coefficients from the day of the week data. It is obvious that

revenues are much higher on weekends than they are on weekdays. In fact, as the week goes on,

revenue slightly increases with a peak on Saturday, a decrease on Sunday, and then a sharp decline

for Monday of the following week. Another conclusion that can be made is that revenue seems to be

higher when it is below 70 degrees or above 90 degrees. This could be due to customers knowing

that Top Golf has heated and air conditioned bays. Our hypothesis on this result is that people

would rather be outside when the weather is ideal. They know that if it is cold outside, they can

play Top Golf with heaters on when they could not play golf on a golf course. The same applies for

the high temperatures when customers may want to avoid staying out of the blistering Texas heat.

Another possible reason for increased revenue in high temperature is that these high temperatures

happen in the summer when customers are more likely to go out for entertainment than during the

school year.

14

We believe that the Austin regression model should be applied but cautiously so. The issues

with the model is that it is coming from less than a year of data, the results are somewhat correlated

but not very highly correlated, and the coefficients are large which could cause very large variations

in day to day predictions. The best application of this model would be to consider these variables

when TopGolf implements their new forecasting software in the future. I would also recommend

looking into many more variables and combinations of variables for these forecasts. Revenue

prediction is as much of an art as it is a science and it takes many iterations of trial and error to find

an accurate revenue model. It would be interesting to look at information such as how prior day

revenue affects the next day revenue. Other variables such as events in the area, promotions

(coupons, sales, etc), the number of memberships sold, and corporate bookings, may have large

impacts on revenue and should be considered. In addition to the many variables that need to be

considered, every day new data is added which may change the model. This means that analysis

will have to be done iteratively as new data is received. This is a large and complex issue which

requires a lot of dedication to solve.

Conclusions and Critique Based on our analysis and testing of various cases, we recommend to management that

correlation can be found, as shown in the Austin, TX site data. Also this information can be used to

forecast future demand using our model. However at this time we recommend that TopGolf,

continue to gather revenue data for all sites to create stronger models for all site locations.

Hopefully after multiple years of data collection, they will have more knowledge that will better

explain customer behaviors and clearly define all possible outliers to encompass into future

forecasting models that will allow them to create a very accurate model.

This potential for creating accurate forecasting models with future site date, the business

intelligence team to supply the rest of the company with demand forecast to help the business make

better daily operating decisions, such as staffing. Also, these results conclude that there is a

possibility for future models to be created to build software that can automatically calculate

amount of employees that should be staffed based on the weather forecast and date. This could

create huge opportunity for TopGolf to help reduce their operating cost and move towards

becoming a more efficient workplace.

As a senior design team we were faced with a lot of unavoidable problems. One area we

believe we could have approved upon would be to research outlying events that could affect data

before deciding to move forward with a project. It would have been helpful to know before running

multiple forecasts with the Dallas, TX site data that the whole company had switched over to a new

revenue collection model. TopGolf provided us with many years of data that we thought would

allow us to create very accurate forecasting models; however our team did not know that the

company had changed its revenue collection over to a new system a few years ago. Luckily we were

able to discover this information early on because any old revenue data before the change in

revenue collection would be pointless in forecasting future revenue and predicting demand.

However, this had a large impact in the forecasting of future revenues with their new pricing model

because very few data of this type existed. If our team was to have known about this change before

15

hand, it would have been easy to conclude that the data would not be mature enough to provide

sound results for TopGolf to implement in their daily business operations. As TopGolf does continue

to collect data from their sites over the next few years we are hoping that they will be able to

implement a more accurate version of the revenue forecasting models we created.

16

Appendix A - Austin Diagnosis

Appendix B - Austin Plot 2

17

Appendix C - Dallas Plot

Appendix D - Dallas Diagnostics

18

Appendix E- Dallas Plot (No 2014)

Appendix F - Dallas Diagnostics (No 2014)