GSBS6002 Foundation of Business Analysis Assignment 1: Data Analysis Report
Postgraduate student of the University of Newcastle, Australia
Page 1 of 51
Table of Contents
Executive Summary
1. Introduction
2. Data Screening
3. Demographic Profile of Respondents
4. Data Analysis and Findings
5. Conclusion
References
Executive Summary
This data analysis report provides recommendations for decision making with regard
to Divine Elegance, an upscale fine-dining restaurant to be opened in a large metropolitan area.
The data screening, demographic profile of respondents, analysis methods, results of
each analysis and appropriate recommendations are discussed in detail.
The ten areas of analysis and findings are as follows:
Q1: Price of entrée items. Potential patrons are willing to pay around $18 for an entrée item.
Q2: Amount spent per month by potential patrons. The average amount spent per month is expected to be less than $200.
Q3: Location of the restaurant. The best location for the restaurant is Location B.
Q4: Likelihood to patronise and household income level. Likelihood to patronise tends to be high when income level is high and low when income level is low.
Q5: Restaurant décor. The restaurant should have simple décor.
Q6: Live entertainment. The restaurant should have jazz combo live entertainment.
Q7: Advertising in radio programmes. Advertisements should be placed during Rock and Easy Listening radio programmes.
Q8: Likelihood of patronage. The significant predictors of likelihood to patronise are: Prefer Waterfront View, Prefer Formal Waitstaff Wearing Tuxedos, Prefer Large Variety of Entrées, Prefer Unusual Entrées, Prefer Simple Décor, Prefer Elegant Décor, and Prefer Jazz Combo.
Q9: Average age of probable and non-probable patrons. The average age of probable patrons is higher than that of non-probable patrons.
Q10: Gender of probable patrons. Men and women are equally likely to be probable patrons.
Recommendations are also given to conduct a subsequent qualitative analysis to gather
more insights.
1. Introduction
The objective of this data analysis report is to provide recommendations for decision
making with regard to Divine Elegance, an upscale fine-dining restaurant to be opened in a
large metropolitan area. The collected survey data is analysed to determine a variety of
factors such as the most successful location for the restaurant and price of entrée items.
The following sections provide details of the data screening process, demographic
profile of the respondents, analysis methods, results of each analysis and appropriate
recommendations.
2. Data Screening
Before data analysis, a data screening process was performed on the sample data to
determine if there were any errors. One of the contingency tables revealed the
following.
It is highly likely that the single case circled in red had an error since the respondent
responded “Very Unlikely” to patronize the new restaurant and yet responded “Yes” to
being a probable patron. The error could have been made by the respondent or during
data entry. If it is indeed an error, the case should be removed from the sample data so
that it does not affect the subsequent analysis.
Note:
For the purpose of this assignment, no amendment is made to the sample data.
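The cross-check described in this section can be sketched in code. The following is a minimal illustration only, assuming each case is held as a dictionary. The field names `id`, `likelihood` and `probable_patron` and the three example cases are hypothetical and are not taken from the survey data file.

```python
# Hypothetical records: field names and values are illustrative only.
cases = [
    {"id": 1, "likelihood": "Very Likely", "probable_patron": "Yes"},
    {"id": 2, "likelihood": "Very Unlikely", "probable_patron": "No"},
    {"id": 3, "likelihood": "Very Unlikely", "probable_patron": "Yes"},
]


def flag_inconsistent(records):
    """Return ids of cases that answered "Very Unlikely" yet are marked as probable patrons."""
    return [
        r["id"]
        for r in records
        if r["likelihood"] == "Very Unlikely" and r["probable_patron"] == "Yes"
    ]


suspect_ids = flag_inconsistent(cases)
print(suspect_ids)
```

Flagged cases would be reviewed by hand before any decision to remove them, since the inconsistency could come from either the respondent or data entry.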
3. Demographic Profile of Respondents
The sample data consists of 400 cases. 49% of the respondents are female and almost
28% of the respondents indicated that they will probably patronise the new restaurant.
Respondents from Location B and Location C form the majority, taking up 30% and 55%
respectively.
The largest group of respondents belongs to households with before-tax income in the range
$50,000 to $74,999, followed by $25,000 to $49,999, and $150,000+ in descending
order.
68% of the respondents are married, while 23% are single. Family size of respondents
ranges from one to seven.
4. Data Analysis and Findings
Definitions
Potential patrons: Respondents who answered “Yes” to the question “Do you eat at this type of restaurant at least once every two weeks?” In the sample data file, all cases are potential patrons.
Probable patrons: Respondents who answered “Yes” to the question “Probable Patron of the new restaurant?”
Q1 Price of entrée items
The following histogram shows the frequency of expected entrée item prices from the
respondents. The distribution is unimodal and skewed to the right, with a mean of
$18.84 and standard deviation of $9.828. Its median and mode are both $16.
Hypothesis testing: A one-sample t-test for the mean is appropriate for comparing the mean value of a sample to an assumed population mean (Sharpe, De Veaux & Velleman, 2010).
Null hypothesis H0: µ = $18
Alternate hypothesis HA: µ ≠ $18
Where µ is the average expected price of an entrée item
Assumptions
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data is obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population follows a Normal distribution.
Nearly Normal Condition: The sample distribution is unimodal and skewed to the right. The Q-Q plots below also indicate that the sample distribution is not Normal, but since the sample size is large, this is not a concern.
Method/Technique: A Student’s t-model with n-1 = 339 degrees of freedom is used and a one-sample t-test for the mean is performed.
The null hypothesis is not rejected at the 5% significance level (t(339) = 1.567, two-tailed
p = 0.118 > 0.05). There is insufficient evidence to suggest that the average expected price is not
$18. The 95% confidence interval for the average expected price is ($18 - $0.2131, $18
+ $1.8837) = ($17.79, $19.88).
This means that potential patrons expect (and are thus willing to pay) around $18 for an
entrée item, hence the entrée items should be priced around this amount.
Q2 Amount spent per month by potential patrons
The following histogram shows the frequency of amount spent per month in restaurants
from the respondents. The distribution is multimodal and skewed to the right, with a
mean of $150.05 and standard deviation of $92.706. Its median and mode are $135 and
$110 respectively.
Hypothesis testing: A one-sample t-test for the mean is appropriate for comparing the mean value of a sample to an assumed population mean (Sharpe et al., 2010).
Null hypothesis H0: µ = $200
Alternate hypothesis HA: µ ≠ $200
Where µ is the average amount spent per month in restaurants
Assumptions
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data is obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population follows a Normal distribution.
Nearly Normal Condition: The sample distribution is multimodal and skewed to the right. The Q-Q plots below also indicate that the sample distribution is not Normal.
Method/Technique: Under these conditions, a Student’s t-model with n-1 = 399 degrees of freedom is used and a one-sample t-test for the mean is performed with caution.
The null hypothesis is rejected at the 5% significance level (t(399) = -10.775, two-tailed p < 0.001).
There is a statistically significant difference between the average amount spent per
month in restaurants and $200. The 95% confidence interval for the average amount
spent per month is ($200 - $59.0602, $200 - $40.8348) = ($140.94, $159.17).
This means that it is not realistic to expect patrons to spend an average of $200 per
month in restaurants; instead they are likely to spend between $141 and $159 on
average. Since this is significantly lower than $200, more marketing efforts are required
to attract more customers in order to sustain the business.
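As a rough check on the reported statistic, the one-sample t value can be recomputed from the summary figures quoted above (mean $150.05, standard deviation $92.706, n = 400, hypothesised mean $200). This is a sketch under the assumption that those rounded figures are the only inputs; the function name `one_sample_t` is illustrative and not part of any statistics package.

```python
from math import sqrt


def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test from summary statistics."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1


# Summary figures quoted in the report for amount spent per month.
t_stat, df = one_sample_t(150.05, 92.706, 400, 200.0)
print(f"t({df}) = {t_stat:.3f}")
```

Because the mean is rounded to two decimals here, the recomputed value can differ from the reported -10.775 in the third decimal place.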
Q3 Location of the restaurant
The following stacked bar charts (count and percentage) indicate that a larger
proportion of the potential patrons in location B is more likely to patronise the
restaurant than not, hence location B may be a good choice.
The following stacked bar charts (count and percentage) indicate that a larger
proportion of potential patrons in location B prefer a drive of less than 30 minutes to
the restaurant, whereas a larger proportion of potential patrons in other locations do
not prefer a drive of less than 30 minutes. This also indicates that the restaurant should
be located in location B.
The following side-by-side box plots show the amount spent per month in restaurants
by respondents from different locations. This indicates that respondents in Location B
spend more money in restaurants per month on average.
Hypothesis testing: A one-way ANOVA test is appropriate for comparing the means of more than two independent groups (Sharpe et al., 2010).
Null hypothesis H0: µA = µB = µC = µD
Alternate hypothesis HA: at least one mean is different
Where µA, µB, µC, µD = average amount spent per month by potential patrons in locations A, B, C and D respectively
Assumptions
Independence Assumption: Since the sample is random, the groups should be independent of each other.
Randomisation Condition: The data is from a random sample.
Equal Variance Assumption: It is assumed that the variances of each group are equal.
Similar Variance Condition: The box plots of the four groups in the sample data above indicate that their variances are not similar.
Normal Population Assumption: It is assumed that the residuals follow a Normal distribution.
Method/Technique: Under these conditions, a one-way ANOVA test is performed with caution on the means with a post hoc Tukey test.
The null hypothesis is rejected at the 5% significance level (F(3, 396) = 313.333, p < 0.001).
There is a statistically significant difference among the average amounts spent in
restaurants per month by potential patrons from the four locations.
A Tukey post hoc test indicated that potential patrons from location B (M=$250.7250,
n=120) have a statistically significantly higher average amount spent per month than
those from location C (M=$132.5455, n=220). Also, potential patrons from location C
have a statistically significantly higher average amount spent per month than those from
location A or D.
It is recommended that the restaurant be located at location B because there will likely
be more patrons and they are likely to spend more on meals.
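The F statistic behind a one-way ANOVA can be illustrated with a short sketch. The monthly spend figures below are hypothetical values chosen only to show the calculation, not the survey data, and the function name `one_way_anova_f` is an assumption of this example. The Tukey post hoc comparison is not covered.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical monthly spend ($) for locations A, B, C and D.
spend = [
    [60, 75, 80, 70],
    [240, 260, 255, 250],
    [130, 125, 140, 135],
    [65, 85, 70, 90],
]
f_stat, df_b, df_w = one_way_anova_f(spend)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F value indicates that the variation between group means is large relative to the variation within groups, which is the pattern reported for the four locations.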
Q4 Likelihood to patronise and household income level
The following stacked bar charts (count and percentage) indicate that potential patrons
with higher household income are more likely to patronise the restaurant.
The following contingency table shows the likelihood to patronise the restaurant by
household income.
If likelihood to patronise is independent of income level, the expected values for each
cell would be:
More than 20% of cells have expected values less than 5. This means that the conditions
to perform a chi square test are not satisfied. Hence recoding is performed on the
household income variable so that the rows are combined to the following:
Now, all cells have an expected value > 5.
Hypothesis testing: A chi-square test of independence is appropriate for testing whether there is an association between two categorical/ordinal variables (Sharpe et al., 2010).
Null hypothesis H0: Likelihood to patronise and household income are independent
Alternate hypothesis HA: Likelihood to patronise and household income are not independent
Assumptions
Counted Data Condition: The data are counts of respondents categorised on two categorical/ordinal variables.
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data is from a random sample.
Sample Size Assumption
Expected Cell Frequency Condition: No more than 20% of expected counts are less than 5, and no expected count is less than 1.
Method/Technique: The conditions are satisfied, so a χ² model with (5 – 1) × (5 – 1) = 16 df is used and a chi-square test of independence is performed.
The null hypothesis is rejected at the 5% significance level (χ²(16) = 633.842, p < 0.001).
There is a moderate, statistically significant association between likelihood to
patronise and household income (Cramér’s V = 0.629, p < 0.001).
Note: Cramér’s V is used as the measure of association because the table is larger than 2×2. It measures strength only, so the positive direction of the association is read from the contingency table and charts.
There is strong evidence to suggest an association between likelihood to patronise and
household income. Likelihood to patronise is likely to be high when income level is high
and likely to be low if income level is low. This is consistent with market research that
frequent restaurant diners are more likely to have household income of at least
$150,000 (Casual & Fine Dining, 2008). Hence marketing efforts should be directed at
potential patrons with higher income.
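The reported effect size can be reproduced from the chi-square statistic. The sketch below assumes that all 400 cases entered the 5 × 5 table (the report does not state the valid N for this test) and uses the standard definition of Cramér’s V; the function name `cramers_v` is illustrative.

```python
from math import sqrt


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# chi2 and table size are taken from the report; n = 400 is an assumption (no missing cases).
v = cramers_v(633.842, 400, 5, 5)
print(round(v, 3))
```

Under that assumption the result agrees with the reported value of 0.629 to three decimals.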
Q5 Restaurant décor
The following bar charts show potential patrons’ preference for simple and elegant
décor. There are more respondents who prefer simple décor and also more respondents
who do not prefer elegant décor.
Hypothesis testing: A paired t-test is appropriate for comparing the means of two variables measured on the same cases (respondents), i.e. the test is on the difference between the paired variables (Sharpe et al., 2010).
Null hypothesis H0: Mean scores for Prefer Simple Décor and Prefer Elegant Décor are the same, i.e. the mean difference is zero: µd = 0
Alternate hypothesis HA: Mean scores for Prefer Simple Décor and Prefer Elegant Décor are not the same, i.e. the mean difference is not zero: µd ≠ 0
Assumptions
Paired Data Assumption: The data for the two variables are paired because the same respondents answered both questions.
Independence Assumption: Since the sample is random, the pairwise differences should be independent.
Randomisation Condition: The data is obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population of pairwise differences follows a Normal model.
Nearly Normal Condition: The sample distribution of pairwise differences is not Normal according to the following histogram and Q-Q plots.
Method/Technique: Under these conditions, a Student’s t-model with n-1 = 399 degrees of freedom is used and a paired t-test is performed with caution.
The null hypothesis is rejected at the 5% significance level (t(399) = 8.564, p < 0.001).
The mean score for Prefer Simple Décor (M=3.58, SD=1.492) is significantly different
from the mean score for Prefer Elegant Décor (M=2.33, SD=1.510). The 95% confidence
interval of the difference is (0.961, 1.534).
This means that potential patrons’ average preference for simple décor is likely to be
between about 1.0 and 1.5 survey points higher than for elegant décor. Hence the restaurant should
have simple décor. This is contrary to other fine-dining restaurants that use exotic
décor as a “competitive weapon” (Duecy, 2005, p.65), hence a qualitative analysis is
recommended to understand the reasons (see Conclusion).
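The paired t-test works on the per-respondent differences between the two scores rather than on the two columns separately. A minimal sketch follows; the five pairs of ratings are hypothetical and the function name `paired_t` is an assumption of this example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample SD of the differences.
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical ratings from five respondents (same respondent in each position).
simple_decor = [5, 4, 4, 3, 5]
elegant_decor = [2, 3, 1, 3, 2]
t_stat, df = paired_t(simple_decor, elegant_decor)
print(f"t({df}) = {t_stat:.3f}")
```

A positive t value here means the first variable scores higher on average, matching the sign convention of the reported result for simple versus elegant décor.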
Q6 Live entertainment
The following bar charts show potential patrons’ preference for string quartet and jazz
combo. There are more respondents that do not prefer string quartet and more
respondents that prefer jazz combo.
Hypothesis testing: A paired t-test is appropriate for comparing the means of two variables measured on the same cases (respondents), i.e. the test is on the difference between the paired variables (Sharpe et al., 2010).
Null hypothesis H0: Mean scores for Prefer String Quartet and Prefer Jazz Combo are the same, i.e. the mean difference is zero: µd = 0
Alternate hypothesis HA: Mean scores for Prefer String Quartet and Prefer Jazz Combo are not the same, i.e. the mean difference is not zero: µd ≠ 0
Assumptions
Paired Data Assumption: The data for the two variables are paired because the same respondents answered both questions.
Independence Assumption: Since the sample is random, the pairwise differences should be independent.
Randomisation Condition: The data is obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population of pairwise differences follows a Normal model.
Nearly Normal Condition: The sample distribution of pairwise differences is not Normal according to the following histogram and Q-Q plots.
Method/Technique: Under these conditions, a Student’s t-model with n-1 = 399 degrees of freedom is used and a paired t-test is performed with caution.
The null hypothesis is rejected at the 5% significance level (t(399) = -10.030, p < 0.001).
The mean score for Prefer String Quartet (M=2.50, SD=1.420) is significantly different
from the mean score for Prefer Jazz Combo (M=3.70, SD=1.221). The 95% confidence
interval for the difference is (-1.426, -0.959).
This means that potential patrons’ average preference for string quartet is likely to be
between about 1.0 and 1.4 survey points lower than for jazz combo. Hence the restaurant should
provide live entertainment by a jazz combo.
Q7 Advertising in radio programmes
The following pie chart shows the percentage of potential patrons who listen to each
type of radio programme. The largest portion listens to Rock (39.75%), indicating that
advertising should be placed during Rock programmes.
Hypothesis testing: A one-way ANOVA test is appropriate for comparing the means of more than two independent groups (Sharpe et al., 2010).
Null hypothesis H0: µC = µE = µR = µT
Alternate hypothesis HA: at least one mean is different
Where µC, µE, µR, µT = average score of likelihood to patronise for potential patrons who listen to Country & Western, Easy Listening, Rock and Talk/News respectively
Assumptions
Independence Assumption: Since the sample is random, the groups should be independent of each other.
Randomisation Condition: The data is from a random sample.
Equal Variance Assumption: It is assumed that the variances of each group are equal.
Similar Variance Condition: The box plots of the four groups in the sample data are shown below, which indicate that their variances are not similar.
Normal Population Assumption: It is assumed that the residuals follow a Normal distribution.
Method/Technique: Under these conditions, a one-way ANOVA test is performed with caution on the means with a post hoc Tukey test.
The null hypothesis is rejected at the 5% significance level (F(3, 381) = 131.581, p < 0.001).
There is a statistically significant difference among the average scores of likelihood to
patronise of the different groups of potential patrons.
A Tukey post hoc test indicated that potential patrons who listen to Easy Listening have
a statistically significantly higher average score of likelihood to patronise (M=4.24, n=78)
than those who listen to Talk/News (M=3.65, n=82), Rock (M=2.72, n=159) and Country
& Western (M=1.61, n=66).
This means that potential patrons who listen to Easy Listening are most likely to
patronise the restaurant. Advertisements should be placed with radio stations that provide
Easy Listening and Rock programmes. The former is to create awareness amongst those
who are most likely to patronise and the latter is to attract the largest pool of potential
patrons.
Q8 Likelihood of patronage
Multiple regression is appropriate for analysing relationships between a dependent
variable (likelihood to patronise) and multiple independent variables (variables 11 –
20, age, family size and gender) (Sharpe et al., 2010).
Note:
The dependent variable is an ordinal variable, thus a more appropriate regression analysis
is ordinal regression. For the purpose of this assignment, multiple regression is performed
instead.
Hypothesis testing (F-test)
Null hypothesis H0: β1 = β2 = β3 = . . . = β13 = 0
Alternate hypothesis HA: at least one β ≠ 0
Where β1 to β13 = slope coefficients of variables 11 – 20, age, family size and gender.
Assumptions
Linearity Assumption: It is assumed that there is a linear relationship between the dependent variable and each of the predictor variables.
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data is obtained from a random sample.
Equal Variance Assumption: It is assumed that the variances of the residuals are equal.
Normality Assumption: It is assumed that the residuals follow a Normal distribution.
Method/Technique: Under these conditions, a multiple regression analysis is performed with caution.
The null hypothesis is rejected at the 5% significance level (F(13, 386) = 59.427, p < 0.001).
There is statistically significant evidence that at least one slope coefficient is not zero. The adjusted R²
indicates that 65.6% of the variation in likelihood to patronise can be explained by the
regression model.
When all the predictor variables are considered simultaneously, the regression equation is:
Likelihood to patronise = 1.4 + 0.189(Prefer Waterfront View) + 0.002(Prefer Drive Less than 30 Minutes) + 0.305(Prefer Formal Waitstaff Wearing Tuxedos) + 0.091(Prefer Unusual Desserts) – 0.194(Prefer Large Variety of Entrées) + 0.130(Prefer Unusual Entrées) – 0.288(Prefer Simple Décor) + 0.162(Prefer Elegant Décor) + 0.075(Prefer String Quartet) + 0.131(Prefer Jazz Combo) + 0.003(Age) + 0.000(Family Size) – 0.34(Gender)
The results indicate that the significant predictors of likelihood to patronise are:
Prefer Waterfront View (t = 3.141, p = 0.002)
Prefer Formal Waitstaff Wearing Tuxedos (t = 4.298, p < 0.001)
Prefer Large Variety of Entrées (t = -3.390, p = 0.001)
Prefer Unusual Entrées (t = 2.082, p = 0.038)
Prefer Simple Décor (t = -4.249, p < 0.001)
Prefer Elegant Décor (t = 2.343, p = 0.020)
Prefer Jazz Combo (t = 3.009, p = 0.003)
To construct a regression model with only significant predictors, the regression analysis
is re-performed without the non-significant predictors.
The refined regression equation is:
Likelihood to patronise = 2.143 + 0.149(Prefer Waterfront View) + 0.327(Prefer Formal Waitstaff Wearing Tuxedos) – 0.206(Prefer Large Variety of Entrées) + 0.163(Prefer Unusual Entrées) – 0.336(Prefer Simple Décor) + 0.194(Prefer Elegant Décor) + 0.112(Prefer Jazz Combo)
This means that the above variables have a significant impact on whether a potential
patron is likely to patronise the restaurant. Hence they should be considered in detail
during the decision making process.
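To show how the refined equation would be applied, the sketch below turns it into a small prediction function. The intercept and coefficients are copied from the refined regression equation; the dictionary keys, the function name `predict_likelihood` and the example respondent are hypothetical, and a 1 to 5 rating scale is assumed for the preference variables.

```python
# Intercept and coefficients copied from the refined regression equation.
INTERCEPT = 2.143
COEFFICIENTS = {
    "waterfront_view": 0.149,
    "formal_waitstaff": 0.327,
    "large_variety_entrees": -0.206,
    "unusual_entrees": 0.163,
    "simple_decor": -0.336,
    "elegant_decor": 0.194,
    "jazz_combo": 0.112,
}


def predict_likelihood(ratings):
    """Predicted likelihood-to-patronise score for one respondent's preference ratings."""
    return INTERCEPT + sum(COEFFICIENTS[name] * ratings[name] for name in COEFFICIENTS)


# Hypothetical respondent who rates every preference 3 (assuming a 1-5 scale).
neutral = {name: 3 for name in COEFFICIENTS}
print(round(predict_likelihood(neutral), 3))
```

Negative coefficients (large variety of entrées, simple décor) lower the predicted score as the rating rises, while positive ones raise it.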
Q9 Average age of probable and non-probable patrons
The following shows the histograms and side-by-side box plots of the ages of probable
and non-probable patrons. They indicate that the average age of probable patrons
(M=62.15, SD=4.779) is likely to be higher than that of non-probable patrons (M=51.61,
SD=9.262).
Hypothesis testing: A two-sample t-test (independent samples t-test) is appropriate for comparing the mean values of two independent samples (Sharpe et al., 2010).
Null hypothesis H0: µprobable = µnon-probable
Alternate hypothesis HA: µprobable ≠ µnon-probable
Where µprobable is the average age of probable patrons and µnon-probable is the average age of non-probable patrons
Assumptions
Independence Assumption: Since the sample is random, the data values within each group should be independent.
Randomisation Condition: The data is from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the populations of both groups follow a Normal distribution.
Nearly Normal Condition: The histograms show that the distributions are skewed for both groups. However, since the sample size is large, this is not a concern.
Independent Groups Assumption: Probable and non-probable patrons are independent groups and there is no reason to think that those in one group can affect the other group.
Method/Technique: A Student’s t-model is used and a two-sample (independent samples) t-test for equality of means is performed.
Based on Levene’s test for equality of variances (F = 22.723, p < 0.001), equal variances
cannot be assumed, so the “equal variances not assumed” t-test is used.
The null hypothesis is rejected at the 5% significance level (t(365.652) = 14.868,
p < 0.001). The average age of probable patrons (M=62.1532, SD=4.77912,
n=111) is significantly different from the average age of non-probable patrons
(M=51.6125, SD=9.26212, n=289). The 95% confidence interval of the difference is
(9.14657, 11.93482).
This means that the average age of probable patrons is likely to be between 9 and 12 years
higher than that of non-probable patrons, which is consistent with market research that a
larger proportion of frequent restaurant diners belong to older age groups (Casual &
Fine Dining, 2008). The restaurant should be designed to cater to the needs of elderly
patrons, e.g. sufficient movement space for wheelchairs.
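The “equal variances not assumed” result corresponds to Welch’s t-test, and both its t value and its fractional degrees of freedom can be recomputed from the group summaries reported above. The sketch below uses the standard Welch–Satterthwaite formula; the function name `welch_t` is an assumption of this example.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors per group
    t_stat = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Group summaries quoted in the report: probable vs non-probable patrons.
t_stat, df = welch_t(62.1532, 4.77912, 111, 51.6125, 9.26212, 289)
print(f"t({df:.3f}) = {t_stat:.3f}")
```

The fractional degrees of freedom explain why the reported statistic is t(365.652) rather than t(398).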
Q10 Gender of probable patrons
The following contingency table shows the number of probable patrons by gender.
If probable patron is independent of gender, the expected values for each cell would be:
All cells have an expected value > 5.
Hypothesis testing: A chi-square test of independence is appropriate for testing whether there is an association between two categorical variables (Sharpe et al., 2010).
Null hypothesis H0: Probable patron and gender are independent
Alternate hypothesis HA: Probable patron and gender are not independent
Assumptions
Counted Data Condition: The data are counts of respondents categorised on two categorical variables.
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data is from a random sample.
Sample Size Assumption
Expected Cell Frequency Condition: No more than 20% of expected counts are less than 5, and no expected count is less than 1.
Method/Technique: The conditions are satisfied, so a χ² model with (2 – 1) × (2 – 1) = 1 df is used and a chi-square test of independence is performed.
The null hypothesis is not rejected at the 5% significance level (χ²(1) = 0.285, p = 0.593 > 0.05).
There is insufficient evidence to suggest a relationship between probable patron and
gender.
This means that there is no evidence that men and women differ in their likelihood of being
probable patrons, hence marketing efforts should be directed at both genders.
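The expected counts and the Pearson chi-square statistic used in this kind of test can be sketched as follows. The 2 × 2 counts below are hypothetical, since the contingency table itself is not reproduced in this text, and the function name `chi_square_independence` is illustrative. No continuity correction is applied.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts: rows = male, female; columns = probable, non-probable.
observed = [[55, 149], [56, 140]]
chi2, df, expected = chi_square_independence(observed)
print(f"chi2({df}) = {chi2:.3f}")
```

The `expected` table returned by the function is what the Expected Cell Frequency Condition is checked against before the test is run.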
5. Conclusion
In summary, it is recommended that the restaurant be located in Location B (post codes
3, 4 and 5). It should have simple décor, jazz combo live entertainment and the price of
entrée items should be around $18. More marketing efforts should be carried out on
both male and female potential patrons with higher income and during Rock and Easy
Listening radio programmes. Last but not least, the restaurant should cater to the needs
of elderly patrons. However, there is insufficient data to provide evidence for the
conditions required under the forecasting model.
It is recommended to perform a qualitative analysis (e.g. interviews and focus groups)
following this quantitative analysis using an explanatory sequential approach or
embedded approach (Creswell, 2011). Such mixed methods will help to provide more
comprehensive insights and better understanding of the key success factors.
(2,059 words computed by Microsoft Word from Introduction to Conclusion, excluding
tables and charts.)
References
Casual & Fine Dining. (2008). Leisure Market Research Handbook (pp. 189-192). Richard
K. Miller & Associates.
Creswell, J. W. (2011). Educational Research: Planning, Conducting, and Evaluating
Quantitative and Qualitative Research (4th ed.). Upper Saddle River, New Jersey:
Pearson Education Inc.
Duecy, E. (2005, July 18). Fine-dining restaurants: Exotic decors a 'competitive weapon'.
Nation's Restaurant News, 39(29), 65.
Sharpe, N. D., De Veaux, R. D., & Velleman, P. (2010). Business Statistics (2nd ed.). Upper
Saddle River, New Jersey: Pearson Education Inc.