Gsbs6002 Sample Data Report (1)

GSBS6002 Foundation of Business Analysis Assignment 1: Data Analysis Report

Postgraduate student of the University of Newcastle, Australia

Page 1 of 51

Table of Contents

Executive Summary

1. Introduction

2. Data Screening

3. Demographic Profile of Respondents

4. Data Analysis and Findings

5. Conclusion

References


Executive Summary

This data analysis report provides recommendations for decision making regarding Divine Elegance, an upscale fine-dining restaurant to be opened in a large metropolitan area.

The data screening, demographic profile of respondents, analysis methods, results of

each analysis and appropriate recommendations are discussed in detail.

The ten areas of analysis and findings are as follows:

Q1: Price of entrée items. Potential patrons are willing to pay around $18 for an entrée

item.

Q2: Amount spent per month by potential patrons. The average amount spent per month is expected to be less than $200.

Q3: Location of the restaurant. The best location for the restaurant is in location B.

Q4: Likelihood to patronise and household income level. Likelihood to patronise is likely to be high when income level is high and low when income level is low.

Q5: Restaurant décor. The restaurant should have simple décor.

Q6: Live entertainment. The restaurant should have jazz combo live entertainment.

Q7: Advertising in radio programmes. Advertisements should be placed during Rock and Easy Listening radio programmes.

Q8: Likelihood of patronage. The significant predictors of likelihood to patronise are:

Prefer Waterfront View, Prefer Formal Waitstaff Wearing Tuxedos, Prefer Large Variety of Entrées, Prefer Unusual Entrées, Prefer Simple Décor, Prefer Elegant Décor, and Prefer Jazz Combo.

Q9: Average age of probable and non-probable patrons. The average age of probable patrons is higher than that of non-probable patrons.

Q10: Gender of probable patrons. There is no evidence that men and women differ in their likelihood of being a probable patron.

Recommendations are also given to conduct a subsequent qualitative analysis to gather

more insights.


1. Introduction

The objective of this data analysis report is to provide recommendations for decision making regarding Divine Elegance, an upscale fine-dining restaurant to be opened in a large metropolitan area. The collected survey data is analysed to determine factors such as the most successful location for the restaurant and the price of entrée items.

The following sections provide details of the data screening process, demographic

profile of the respondents, analysis methods, results of each analysis and appropriate

recommendations.

2. Data Screening

Before data analysis, a data screening process was performed on the sample data to

determine if there were any errors. One of the contingency tables revealed the

following.

It is highly likely that the single case circled in red contains an error, since the respondent answered “Very Unlikely” to patronising the new restaurant and yet answered “Yes” to being a probable patron. The error could have been made by the respondent or during data entry. If it is indeed an error, the case should be removed from the sample data so that it does not affect the subsequent analysis.

Note:

For the purpose of this assignment, no amendment is made to the sample data.
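The screening check described above can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`likelihood`, `probable_patron`) and the label "Somewhat Likely" are hypothetical and do not reflect the layout of the actual survey file.

```python
# Hypothetical records; field names and values are assumptions,
# not the real layout of the survey data file.
cases = [
    {"id": 1, "likelihood": "Very Likely", "probable_patron": "Yes"},
    {"id": 2, "likelihood": "Very Unlikely", "probable_patron": "No"},
    {"id": 3, "likelihood": "Very Unlikely", "probable_patron": "Yes"},  # contradictory
    {"id": 4, "likelihood": "Somewhat Likely", "probable_patron": "No"},
]


def find_contradictions(rows):
    """Return cases that answered 'Very Unlikely' yet are flagged as probable patrons."""
    return [
        row
        for row in rows
        if row["likelihood"] == "Very Unlikely" and row["probable_patron"] == "Yes"
    ]


suspect = find_contradictions(cases)
print([row["id"] for row in suspect])
```

Flagging such cases rather than deleting them automatically leaves the decision to remove them with the analyst.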


3. Demographic Profile of Respondents

The sample data consists of 400 cases. 49% of the respondents are female and almost

28% of the respondents indicated that they will probably patronise the new restaurant.


Respondents from Location B and Location C form the majority, taking up 30% and 55%

respectively.

The largest group of respondents belongs to households with before-tax income in the range $50,000 to $74,999, followed by $25,000 to $49,999 and $150,000+.


68% of the respondents are married, while 23% are single. Family size of respondents

ranges from one to seven.


4. Data Analysis and Findings

Definitions

Potential patrons: Respondents who answered “Yes” to the question “Do you eat at this type of restaurant at least once every two weeks?” In the sample data file, all cases are potential patrons.

Probable patrons: Respondents who answered “Yes” to the question “Probable Patron of the new restaurant?”


Q1 Price of entrée items

The following histogram shows the frequency of expected entrée item prices from the

respondents. The distribution is unimodal and skewed to the right, with a mean of

$18.84 and standard deviation of $9.828. Its median and mode are both $16.

Hypothesis testing: A one-sample t-test for the mean is appropriate for comparing the mean value of a sample to an assumed population mean (Sharpe, De Veaux & Velleman, 2010).

Null hypothesis H0: µ = $18

Alternate hypothesis HA: µ ≠ $18

where µ is the average expected price of an entrée item.

Assumptions

Independence Assumption: Since the sample is random, the data values should be independent.

Randomisation Condition: The data is obtained from a random sample.

10% Condition: The sample size is fewer than 10% of the total population.

Normal Population Assumption: It is assumed that the population follows a Normal distribution.

Nearly Normal Condition: The sample distribution is unimodal and skewed to the right. The Q-Q plots below also indicate that the sample distribution is not Normal, but since the sample size is large, this is not a concern.

Method/Technique: A Student’s t-model with n − 1 = 339 degrees of freedom is used and a one-sample t-test for the mean is performed.


The null hypothesis is not rejected at the 5% significance level (t(339) = 1.567, two-tailed p = 0.118 > 0.05). There is insufficient evidence to suggest that the average expected price is not $18. The 95% confidence interval for the average expected price is ($18 − $0.2131, $18 + $1.8837) = ($17.79, $19.88).

This means that potential patrons expect (and are thus willing to pay) around $18 for an entrée item, hence the entrée items should be priced around this amount.
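The t statistic reported for Q1 can be reproduced from the summary figures alone. The sketch below is a check under stated assumptions, not the original software output: the helper `one_sample_t` is hypothetical, n = 340 is inferred from the 339 degrees of freedom, and the unrounded sample mean is taken as the midpoint of the reported confidence interval.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (t statistic, standard error, degrees of freedom) for H0: mu = mu0."""
    se = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / se, se, n - 1


# Reported values: SD = 9.828, df = 339 (so n = 340 is assumed), H0 mean = 18.
# The sample mean is taken as 18 plus the midpoint of the reported CI offsets
# (-0.2131, +1.8837), because the rounded mean of 18.84 loses precision.
mean_estimate = 18 + (-0.2131 + 1.8837) / 2
t_stat, se, df = one_sample_t(mean_estimate, 9.828, 340, 18)
print(round(t_stat, 3), round(se, 4), df)
```

Using the rounded mean of $18.84 instead gives a slightly larger t value, which is why the interval midpoint is used here.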


Q2 Amount spent per month by potential patrons

The following histogram shows the frequency of amount spent per month in restaurants

from the respondents. The distribution is multimodal and skewed to the right, with a

mean of $150.05 and standard deviation of $92.706. Its median and mode are $135 and

$110 respectively.

Hypothesis testing: A one-sample t-test for the mean is appropriate for comparing the mean value of a sample to an assumed population mean (Sharpe et al., 2010).

Null hypothesis H0: µ = $200

Alternate hypothesis HA: µ ≠ $200

where µ is the average amount spent per month in restaurants.

Assumptions

It is assumed that the population follows a Normal distribution.

Nearly Normal Condition: The sample distribution is multimodal and skewed to the right. The Q-Q plots below also indicate that the sample distribution is not Normal.

Method/Technique: Under these conditions, a Student’s t-model with n − 1 = 399 degrees of freedom is used and a one-sample t-test for the mean is performed with caution.


The null hypothesis is rejected at the 5% significance level (t(399) = −10.775, p < 0.001). There is a statistically significant difference between the average amount spent per month in restaurants and $200. The 95% confidence interval for the average amount spent per month is ($200 − $59.0602, $200 − $40.8348) = ($140.94, $159.17).

This means that it is not realistic to expect patrons to spend an average of $200 per month in restaurants; instead, they are likely to spend between $141 and $159 on average. Since this is significantly lower than $200, more marketing efforts are required to attract more customers in order to sustain the business.
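The confidence interval arithmetic above can be cross-checked against the reported standard deviation and sample size. The following is a rough consistency check only; it backs out the critical value implied by the interval rather than looking it up, and the variable names are illustrative.

```python
import math

mu0 = 200.0
sd, n = 92.706, 400
lower_offset, upper_offset = -59.0602, -40.8348  # reported CI of (mean - mu0)

se = sd / math.sqrt(n)                          # standard error of the mean
mean_diff = (lower_offset + upper_offset) / 2   # midpoint of the interval
half_width = (upper_offset - lower_offset) / 2
t_stat = mean_diff / se
implied_critical = half_width / se              # critical value implied by the CI

ci = (mu0 + lower_offset, mu0 + upper_offset)
print(round(t_stat, 3), round(implied_critical, 3), [round(x, 2) for x in ci])
```

The implied critical value should sit close to the two-tailed 5% critical value of a t-distribution with 399 degrees of freedom (about 1.966), which indicates that the reported t statistic and interval are mutually consistent.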


Q3 Location of the restaurant

The following stacked bar charts (count and percentage) indicate that a larger

proportion of the potential patrons in location B is more likely to patronise the

restaurant than not, hence location B may be a good choice.


The following stacked bar charts (count and percentage) indicate that a larger

proportion of potential patrons in location B prefer a drive of less than 30 minutes to

the restaurant, whereas a larger proportion of potential patrons in other locations do

not prefer a drive of less than 30 minutes. This also indicates that the restaurant should

be located in location B.


The following side-by-side box plots show the amount spent per month in restaurants

by respondents from different locations. This indicates that respondents in Location B

spend more money in restaurants per month on average.

Hypothesis testing: A one-way ANOVA test is appropriate for testing more than two independent means (Sharpe et al., 2010).

Null hypothesis H0: µA = µB = µC = µD

Alternate hypothesis HA: at least one mean is different

where µA, µB, µC, µD = average amount spent per month by potential patrons in locations A, B, C and D respectively.

Assumptions

Since the sample is random, the groups should be independent of each other.

Randomisation Condition: The data is from a random sample.

Equal Variance Assumption: It is assumed that the variances of each group are equal.

Similar Variance Condition: The box plots of the four groups in the sample data above indicate that their variances are not similar.

It is assumed that the residuals follow a Normal distribution.

Method/Technique: Under these conditions, a one-way ANOVA test is performed with caution on the means with a post hoc Tukey test.

The null hypothesis is rejected at the 5% significance level (F(3, 396) = 313.333, p < 0.001). There is a statistically significant difference between the average amounts spent in restaurants per month by potential patrons from the four locations.
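For readers unfamiliar with how the F statistic is formed, a minimal pure-Python sketch of a one-way ANOVA follows. The three small groups are made-up numbers, not the survey data, and `one_way_anova` is an illustrative helper; only the degrees of freedom on the last lines relate to the report (4 locations, 400 respondents).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three tiny groups (not the survey data).
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(toy)

# Degrees of freedom for the report's design: 4 locations, 400 respondents.
report_df = (4 - 1, 400 - 4)
print(round(f_stat, 2), df_b, df_w, report_df)
```

The degrees of freedom computed for the report's design match the F(3, 396) quoted above.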


A Tukey post hoc test indicated that potential patrons from location B (M = $250.7250, n = 120) spend a statistically significantly higher average amount per month than those from location C (M = $132.5455, n = 220). Also, potential patrons from location C spend a statistically significantly higher average amount per month than those from location A or D.

It is recommended that the restaurant be located at location B because there will likely

be more patrons and they are likely to spend more on meals.


Q4 Likelihood to patronise and household income level

The following stacked bar charts (count and percentage) indicate that potential patrons

with higher household income are more likely to patronise the restaurant.


The following contingency table shows the likelihood to patronise the restaurant by

household income.

If likelihood to patronise is independent of income level, the expected values for each

cell would be:

More than 20% of cells have expected values less than 5. This means that the conditions

to perform a chi square test are not satisfied. Hence recoding is performed on the

household income variable so that the rows are combined to the following:


Now, all cells have an expected value > 5.

Hypothesis testing: A chi-square test of independence is appropriate for testing whether there is an association between two categorical/ordinal variables (Sharpe et al., 2010).

Null hypothesis H0: Likelihood to patronise and household income are independent

Alternate hypothesis HA: Likelihood to patronise and household income are not independent

Assumptions

Counted Data Condition: The data are counts of respondents categorised on two categorical/ordinal variables.

Sample Size Assumption

Expected Cell Frequency Condition: No more than 20% of expected counts are less than 5, and no expected count is less than 1.

Method/Technique: The conditions are satisfied, so a χ² model with (5 – 1) × (5 – 1) = 16 df is used and a chi-square test of independence is performed.


The null hypothesis is rejected at the 5% significance level (χ²(16) = 633.842, p < 0.001). There is a moderate, statistically significant association between likelihood to patronise and household income (Cramér’s V = 0.629, p < 0.001).

Note: Cramér’s V is used as the measure of association because the table is larger than 2×2. It ranges from 0 to 1 and carries no sign, so the positive direction of the association is read from the stacked bar charts and the contingency table.
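Cramér’s V can be recomputed directly from the chi-square statistic. The sketch below assumes that all 400 cases entered the 5 × 5 table; `cramers_v` is an illustrative helper rather than a library function.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(chi_square / (n * (min(n_rows, n_cols) - 1)))


rows, cols = 5, 5                 # recoded income levels x likelihood levels
df = (rows - 1) * (cols - 1)      # degrees of freedom of the chi-square test
v = cramers_v(633.842, 400, rows, cols)  # n = 400 is an assumption
print(df, round(v, 3))
```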

There is strong evidence to suggest an association between likelihood to patronise and

household income. Likelihood to patronise is likely to be high when income level is high

and likely to be low if income level is low. This is consistent with market research that

frequent restaurant diners are more likely to have household income of at least

$150,000 (Casual & Fine Dining, 2008). Hence marketing efforts should be directed at

potential patrons with higher income.


Q5 Restaurant décor

The following bar charts show potential patrons’ preference for simple and elegant

décor. There are more respondents who prefer simple décor and also more respondents

who do not prefer elegant décor.


Hypothesis testing: A paired t-test is appropriate for testing the means of two variables where both are from the same cases (respondents), i.e. the testing is on the difference between the paired variables (Sharpe et al., 2010).

Null hypothesis H0: Mean scores for Prefer Simple Décor and Prefer Elegant Décor are the same: mean difference is zero: µd = 0

Alternate hypothesis HA: Mean scores for Prefer Simple Décor and Prefer Elegant Décor are not the same: mean difference is not zero: µd ≠ 0

Assumptions

Paired Data Assumption: The data for the two variables are paired because the same respondents answered both questions.

Since the sample is random, the pairwise differences should be independent.

It is assumed that the population of pairwise differences follows a Normal model.

Nearly Normal Condition: The sample distribution of pairwise differences is not Normal according to the following histogram and Q-Q plots.

Method/Technique: Under these conditions, a Student’s t-model with n − 1 = 399 degrees of freedom is used and a paired t-test is performed with caution.

The null hypothesis is rejected at the 5% significance level (t(399) = 8.564, p < 0.001). The mean score for Prefer Simple Décor (M = 3.58, SD = 1.492) is significantly different from the mean score for Prefer Elegant Décor (M = 2.33, SD = 1.510). The 95% confidence interval of the difference is (0.961, 1.534).

This means that potential patrons’ average preference for simple décor is likely to be between 1 and 1.5 survey points higher than for elegant décor. Hence the restaurant should have simple décor. This is contrary to other fine-dining restaurants that use exotic décor as a “competitive weapon” (Duecy, 2005, p.65), hence a qualitative analysis is recommended to understand the reasons (see Conclusion).
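The mechanics of the paired t-test used for Q5 and Q6 can be illustrated with a small sketch. The five pairs of scores below are invented for illustration and are not taken from the survey; `paired_t` is a hypothetical helper built on the standard library.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, mean difference, df) for paired observations."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_d = statistics.mean(diffs)
    # Standard error of the mean difference uses the SD of the differences.
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d / se, mean_d, len(diffs) - 1


# Hypothetical preference scores from the same five respondents.
simple_decor = [4, 5, 3, 4, 5]
elegant_decor = [2, 3, 3, 1, 2]
t_stat, mean_d, df = paired_t(simple_decor, elegant_decor)
print(round(t_stat, 3), mean_d, df)
```

The test works on the per-respondent differences, which is why the standard deviations of the two separate variables do not appear in the calculation.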


Q6 Live entertainment

The following bar charts show potential patrons’ preference for string quartet and jazz

combo. There are more respondents that do not prefer string quartet and more

respondents that prefer jazz combo.


Hypothesis testing: A paired t-test is appropriate for testing the means of two variables where both are from the same cases (respondents), i.e. the testing is on the difference between the paired variables (Sharpe et al., 2010).

Null hypothesis H0: Mean scores for Prefer String Quartet and Prefer Jazz Combo are the same: mean difference is zero: µd = 0

Alternate hypothesis HA: Mean scores for Prefer String Quartet and Prefer Jazz Combo are not the same: mean difference is not zero: µd ≠ 0

Assumptions

Paired Data Assumption: The data for the two variables are paired because the same respondents answered both questions.

Since the sample is random, the pairwise differences should be independent.

It is assumed that the population of pairwise differences follows a Normal model.

Nearly Normal Condition: The sample distribution of pairwise differences is not Normal according to the following histogram and Q-Q plots.

Method/Technique: Under these conditions, a Student’s t-model with n − 1 = 399 degrees of freedom is used and a paired t-test is performed with caution.

The null hypothesis is rejected at the 5% significance level (t(399) = −10.030, p < 0.001). The mean score for Prefer String Quartet (M = 2.50, SD = 1.420) is significantly different from the mean score for Prefer Jazz Combo (M = 3.70, SD = 1.221). The 95% confidence interval for the difference is (−1.426, −0.959).

This means that potential patrons’ average preference for string quartet is likely to be between 1 and 1.4 survey points lower than for jazz combo. Hence the restaurant should provide live entertainment by a jazz combo.


Q7 Advertising in radio programmes

The following pie chart shows the percentage of potential patrons who listen to each

type of radio programme. The largest portion listens to Rock (39.75%), indicating that

advertising should be placed during Rock programmes.

Hypothesis testing: A one-way ANOVA test is appropriate for testing more than two independent means (Sharpe et al., 2010).

Null hypothesis H0: µC = µE = µR = µT

Alternate hypothesis HA: at least one mean is different

where µC, µE, µR, µT = average score of likelihood to patronise for potential patrons that listen to Country & Western, Easy Listening, Rock and Talk/News respectively.

Assumptions

Since the sample is random, the groups should be independent of each other.

It is assumed that the variances of each group are equal.

Similar Variance Condition: The box plots of the four groups in the sample data are shown below, which indicate that their variances are not similar.

It is assumed that the residuals follow a Normal distribution.

Method/Technique: Under these conditions, a one-way ANOVA test is performed with caution on the means with a post hoc Tukey test.

The null hypothesis is rejected at the 5% significance level (F(3, 381) = 131.581, p < 0.001). There is a statistically significant difference between the average scores of likelihood to patronise of the different groups of potential patrons.


A Tukey post hoc test indicated that potential patrons that listen to Easy Listening have a statistically significantly higher average score of likelihood to patronise (M = 4.24, n = 78) than those that listen to Talk/News (M = 3.65, n = 82), Rock (M = 2.72, n = 159) and Country & Western (M = 1.61, n = 66).


This means that potential patrons who listen to Easy Listening are the most likely to patronise the restaurant. Advertisements should be placed with radio stations that provide Easy Listening and Rock programmes. The former is to create awareness amongst those who are most likely to patronise and the latter is to reach the largest pool of potential patrons.
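The group sizes and means quoted in the Tukey results are enough to confirm the ANOVA degrees of freedom and the Rock share shown in the pie chart. The sketch below assumes the pie chart percentage is based on all 400 respondents; the dictionary layout is illustrative only.

```python
# (mean likelihood score, group size) as reported in the Tukey post hoc results.
groups = {
    "Easy Listening": (4.24, 78),
    "Talk/News": (3.65, 82),
    "Rock": (2.72, 159),
    "Country & Western": (1.61, 66),
}

n_total = sum(size for _, size in groups.values())
anova_df = (len(groups) - 1, n_total - len(groups))

# Share of all respondents who listen to Rock (base of 400 is an assumption).
rock_share = 100 * groups["Rock"][1] / 400

# Rank programme types by mean likelihood to patronise, highest first.
ranked = sorted(groups, key=lambda name: groups[name][0], reverse=True)
print(n_total, anova_df, rock_share, ranked)
```

The two criteria behind the recommendation are visible here: Easy Listening ranks first on mean likelihood, while Rock has the largest audience.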


Q8 Likelihood of patronage

Multiple regression is appropriate for analysing relationships between a dependent

variable (likelihood to patronise) and multiple independent variables (variables 11 –

20, age, family size and gender) (Sharpe et al., 2010).

Note:

The dependent variable is an ordinal variable, thus a more appropriate regression analysis

is ordinal regression. For the purpose of this assignment, multiple regression is performed

instead.

Hypothesis testing (F-test)

Null hypothesis H0: β1 = β2 = β3 = . . . = β13 = 0

Alternate hypothesis HA: at least one β ≠ 0

where β1 to β13 = slope coefficients of variables 11 – 20, age, family size and gender.

Assumptions

Linearity Assumption: It is assumed that there is a linear relationship between the dependent variable and each of the predictor variables.

It is assumed that the variances of residuals are equal.

Normality Assumption: It is assumed that the residuals follow a Normal distribution.

Method/Technique: Under these conditions, a multiple regression analysis is performed with caution.


The null hypothesis is rejected at the 5% significance level (F(13, 386) = 59.427, p < 0.001), so at least one slope coefficient is significantly different from zero. The adjusted R² indicates that 65.6% of the variation in likelihood to patronise can be explained by the regression model.
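The adjusted R² can be recovered from the overall F statistic and its degrees of freedom, which gives a quick check that the two reported figures agree. The helper names below are illustrative; the relationships used are the standard identities between F, R² and adjusted R² for a model with an intercept.

```python
def r2_from_f(f_stat, k, df_resid):
    """R^2 implied by the overall F test with k predictors."""
    return (k * f_stat) / (k * f_stat + df_resid)


def adjusted_r2(r2, n, k):
    """Adjusted R^2 penalises R^2 for the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


k, df_resid = 13, 386        # from F(13, 386)
n = df_resid + k + 1         # implied sample size
r2 = r2_from_f(59.427, k, df_resid)
adj = adjusted_r2(r2, n, k)
print(n, round(r2, 3), round(adj, 3))
```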


When all the predictor variables are considered simultaneously, the regression equation is:

Likelihood to patronise = 1.4
+ 0.189(Prefer Waterfront View)
+ 0.002(Prefer Drive Less than 30 Minutes)
+ 0.305(Prefer Formal Waitstaff Wearing Tuxedos)
+ 0.091(Prefer Unusual Desserts)
– 0.194(Prefer Large Variety of Entrées)
+ 0.130(Prefer Unusual Entrées)
– 0.288(Prefer Simple Décor)
+ 0.162(Prefer Elegant Décor)
+ 0.075(Prefer String Quartet)
+ 0.131(Prefer Jazz Combo)
+ 0.003(Age)
+ 0.000(Family Size)
– 0.34(Gender)

The results indicate that the significant predictors of likelihood to patronise are:

Prefer Waterfront View (t = 3.141, p = 0.002)
Prefer Formal Waitstaff Wearing Tuxedos (t = 4.298, p < 0.001)
Prefer Large Variety of Entrées (t = -3.390, p = 0.001)
Prefer Unusual Entrées (t = 2.082, p = 0.038)
Prefer Simple Décor (t = -4.249, p < 0.001)
Prefer Elegant Décor (t = 2.343, p = 0.020)
Prefer Jazz Combo (t = 3.009, p = 0.003)

To construct a regression model with only significant predictors, the regression analysis

is re-performed without the non-significant predictors.


The refined regression equation is:

Likelihood to patronise = 2.143
+ 0.149(Prefer Waterfront View)
+ 0.327(Prefer Formal Waitstaff Wearing Tuxedos)
– 0.206(Prefer Large Variety of Entrées)
+ 0.163(Prefer Unusual Entrées)
– 0.336(Prefer Simple Décor)
+ 0.194(Prefer Elegant Décor)
+ 0.112(Prefer Jazz Combo)

This means that the above variables have a significant impact on whether a potential

patron is likely to patronise the restaurant. Hence they should be considered in detail

during the decision making process.
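To show how the refined regression equation would be applied, the sketch below evaluates it for one hypothetical respondent. The coefficients are those reported above; the variable keys and the assumption that each preference is scored on a 1 to 5 scale are illustrative and not confirmed by the report.

```python
# Coefficients of the refined regression equation reported above.
INTERCEPT = 2.143
COEFFICIENTS = {
    "waterfront_view": 0.149,
    "formal_waitstaff": 0.327,
    "large_variety_entrees": -0.206,
    "unusual_entrees": 0.163,
    "simple_decor": -0.336,
    "elegant_decor": 0.194,
    "jazz_combo": 0.112,
}


def predict_likelihood(scores):
    """Evaluate the refined equation for one respondent's preference scores."""
    return INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)


# Hypothetical respondent who gives the midpoint score (3) to every item,
# assuming a 1-5 response scale.
neutral = {name: 3 for name in COEFFICIENTS}
pred = predict_likelihood(neutral)
print(round(pred, 3))
```

Because the dependent variable is ordinal, as noted earlier, such predictions are best read as a rough indication rather than an exact score.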


Q9 Average age of probable and non-probable patrons

The following shows the histograms and side-by-side box plots of the ages of probable and non-probable patrons. They indicate that the average age of probable patrons (M=62.15, SD=4.779) is likely to be higher than that of non-probable patrons (M=51.61, SD=9.262).


Hypothesis testing: A two-sample t-test (independent samples t-test) is appropriate for comparing the mean values of two independent samples (Sharpe et al., 2010).

Null hypothesis H0: µprobable = µnon-probable

Alternate hypothesis HA: µprobable ≠ µnon-probable

where µprobable is the average age of probable patrons and µnon-probable is the average age of non-probable patrons.

Assumptions

Since the sample is random, the data values within each group should be independent.

It is assumed that the populations of both groups follow a Normal distribution.

Nearly Normal Condition: The histograms show that the distributions are skewed for both groups. However, since the sample size is large, this is not a concern.

Independent Groups Assumption: Probable and non-probable patrons are independent groups and there is no reason to think that those in one group can affect the other group.

Method/Technique: A Student’s t-model is used and a two-sample (independent samples) t-test for equality of means is performed.

Based on Levene’s test for equality of variances (F = 22.723, p < 0.001), equal variances cannot be assumed, so the “equal variances not assumed” t-test is used.

The null hypothesis is rejected at the 5% significance level (t(365.652) = 14.868, p < 0.001). The average age of probable patrons (M = 62.1532, SD = 4.77912, n = 111) is significantly different from the average age of non-probable patrons (M = 51.6125, SD = 9.26212, n = 289). The 95% confidence interval of the difference is (9.14657, 11.93482).
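The “equal variances not assumed” result corresponds to Welch’s t-test, and both the t statistic and the fractional degrees of freedom can be reproduced from the reported group summaries. The function below is a minimal sketch written for this check, not the output of the original statistics package.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic, Welch-Satterthwaite df and standard error."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # squared standard errors per group
    se = math.sqrt(v1 + v2)
    t_stat = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, se


# Reported summaries: probable patrons vs non-probable patrons.
t_stat, df, se = welch_t(62.1532, 4.77912, 111, 51.6125, 9.26212, 289)
print(round(t_stat, 3), round(df, 1), round(se, 4))
```

The non-integer degrees of freedom arise because the two group variances are weighted by their sample sizes rather than pooled.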


This means that probable patrons are on average likely to be between 9 and 12 years older than non-probable patrons, which is consistent with market research that a larger proportion of frequent restaurant diners belong to older age groups (Casual & Fine Dining, 2008). The restaurant should be designed to cater to the needs of elderly patrons, e.g. sufficient movement space for wheelchairs.


Q10 Gender of probable patrons

The following contingency table shows the number of probable patrons by gender.

If probable patron is independent of gender, the expected values for each cell would be:

All cells have an expected value > 5.
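Expected counts under independence are obtained as (row total × column total) / n. The sketch below applies this to marginal totals inferred from figures elsewhere in the report: 111 probable and 289 non-probable patrons, and 196 female and 204 male respondents, the latter assumed from the rounded "49% female" of 400 cases. The exact gender split is therefore an assumption.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


gender_totals = [196, 204]   # female, male (assumed from 49% female of 400)
patron_totals = [111, 289]   # probable, non-probable (from the Q9 group sizes)

expected = expected_counts(gender_totals, patron_totals)
smallest = min(min(row) for row in expected)
print([[round(x, 2) for x in row] for row in expected], round(smallest, 2))
```

Under these assumed margins every expected count is well above 5, in line with the condition stated above.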

Hypothesis testing: A chi-square test of independence is appropriate for testing whether there is an association between two categorical variables (Sharpe et al., 2010).

Null hypothesis H0: Probable patron and gender are independent

Alternate hypothesis HA: Probable patron and gender are not independent

Assumptions

Counted Data Condition: The data are counts of respondents categorised on two categorical/ordinal variables.

Sample Size Assumption

Expected Cell Frequency Condition: No more than 20% of expected counts are less than 5, and no expected count is less than 1.

Method/Technique: The conditions are satisfied, so a χ² model with (2 – 1) × (2 – 1) = 1 df is used and a chi-square test of independence is performed.

The null hypothesis is not rejected at the 5% significance level (χ²(1) = 0.285, p = 0.593 > 0.05). There is insufficient evidence to suggest a relationship between probable patron and gender.

This means that there is no evidence that men and women differ in their likelihood of being a probable patron, hence marketing efforts should be directed at both genders.


5. Conclusion

In summary, it is recommended that the restaurant be located in Location B (post codes

3, 4 and 5). It should have simple décor, jazz combo live entertainment and the price of

entrée items should be around $18. More marketing efforts should be carried out on

both male and female potential patrons with higher income and during Rock and Easy

Listening radio programmes. Last but not least, the restaurant should cater to the needs

of elderly patrons. However, there is insufficient data to provide evidence for the

conditions required under the forecasting model.

It is recommended to perform a qualitative analysis (e.g. interviews and focus groups)

following this quantitative analysis using an explanatory sequential approach or

embedded approach (Creswell, 2011). Such mixed methods will help to provide more

comprehensive insights and better understanding of the key success factors.

(2,059 words computed by Microsoft Word from Introduction to Conclusion, excluding

tables and charts.)


References

Casual & Fine Dining. (2008). Leisure Market Research Handbook (pp. 189-192). Richard K. Miller & Associates.

Creswell, J. W. (2011). Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research (4th ed.). Upper Saddle River, New Jersey: Pearson Education Inc.

Duecy, E. (2005, July 18). Fine-dining restaurants: Exotic decors a 'competitive weapon'. Nation's Restaurant News, 39(29), 65.

Sharpe, N. D., De Veaux, R. D., & Velleman, P. (2010). Business Statistics (2nd ed.). Upper Saddle River, New Jersey: Pearson Education Inc.

