Data Analysis: Review and Practical Application using SPSS

Page 1: Data Analysis: Review and Practical Application using SPSS

Page 2: Data of Interest

National Insurance Company
– 1000 questionnaires sent
– 285 respondents

Questionnaire presentation
– Copy given in class

Page 3: Coding

Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis.

Steps
– Transforming responses to each question into a set of meaningful categories
– Assigning numerical codes to the categories
– Creating a data set suitable for computer analysis

Page 4: Transforming Responses into Meaningful Categories

A structured question is pre-categorized.

Responses to a nonstructured or open-ended question need to be grouped into a meaningful and manageable set of categories.

Q1: In this questionnaire, how many questions are not pre-categorized?

Page 5: Missing-Value Category

A missing value can stem from
– A respondent's refusal to answer a question
– An interviewer's failure to ask a question or record an answer
– A "don't know" that does not seem legitimate

Best way to treat missing-value responses
– Sound questionnaire design
– Tight control over fieldwork

Page 6: Assigning Numerical Codes

Assign appropriate numerical codes to responses that are not already in quantified form.

When assigning numerical codes, the researcher should aim to facilitate computer manipulation and analysis of the responses.
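A minimal sketch of code assignment, written in Python rather than SPSS. The codebook, the missing-value code 9, and the raw answers are illustrative assumptions, not taken from the National questionnaire.

```python
# Hypothetical coding sheet for one yes/no question (assumed labels and codes).
RECOMMEND_CODES = {"yes": 1, "no": 2}
MISSING_CODE = 9  # assumed code reserved for the missing-value category


def code_response(raw, codebook, missing_code=MISSING_CODE):
    """Return the numerical code for a raw answer, or the missing code."""
    if raw is None:
        return missing_code
    key = raw.strip().lower()
    # Answers that are not on the coding sheet fall into the missing category.
    return codebook.get(key, missing_code)


raw_answers = ["Yes", "no", None, " YES ", "maybe"]
coded = [code_response(answer, RECOMMEND_CODES) for answer in raw_answers]
print(coded)  # [1, 2, 9, 1, 9]
```

Keeping the codebook in one place plays the role of the coding sheet: anyone entering or checking data applies the same mapping.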

Page 7: Multiple Response Question – Rank Order Question

Please rank the following insurance companies by placing a 1 beside the company you think is best overall, a 2 beside the company you think is second best, and so on.

__________ Progressive
__________ All State
__________ National

Q2: How would you code the previous question if it were added to the questionnaire?

This question requires as many variables (and columns) as there are objects to be ranked: 3 separate variables are needed.
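One possible way to code the rank-order question, sketched in Python. The variable names (`rank_progressive` and so on) and the sample respondent are assumptions made for illustration.

```python
COMPANIES = ["Progressive", "All State", "National"]


def code_ranking(ranking):
    """Map one respondent's ranking to one rank variable per company.

    `ranking` lists the company names from best (rank 1) to worst.
    """
    if sorted(ranking) != sorted(COMPANIES):
        raise ValueError("each company must be ranked exactly once")
    return {
        "rank_" + name.lower().replace(" ", "_"): ranking.index(name) + 1
        for name in COMPANIES
    }


# Hypothetical respondent who ranks National first and Progressive last.
row = code_ranking(["National", "All State", "Progressive"])
print(row)  # {'rank_progressive': 3, 'rank_all_state': 2, 'rank_national': 1}
```

Each respondent contributes three values, one per company, which matches the three columns the slide calls for.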

Page 8: Creating a Data Set

A data set is an organized collection of data records.

Each sample unit within the data set is called a case or observation.

Structure of a data set
– The number of observations = n
– The total number of variables embedded in the questionnaire = m
– The data set is then an n x m matrix of numbers

Importance of the coding sheet: anybody can enter or check the data set. (Copy of coding sheet)

Page 9: SPSS Data Set

2 views: Variable and Data.

Raw variable (labels and values)

Transformed variable (compute and recode)

Page 10: Preliminary Data Analysis: Basic Descriptive Statistics

Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set.

Measurement level dictates what to do.

Get a feeling for the data.

What can we do? Limitations are on the next slide. Run descriptives. (outputs 1)

Page 11: Measures of Central Tendency and Dispersion for Different Types of Variables

Page 12: Why Averages May be Misleading

Researchers tested a new sauce product and found
– Mean rating of the taste test was close to the middle of the scale, which had "very mild" and "very hot" as its bipolar adjectives

Researcher’s conclusion
– Consumers need neither really hot nor really mild sauce

Page 13: Why Averages May be Misleading (Cont’d)

Deeper examination revealed
– The existence of a large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot

Moral of the story:
– A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences. Talk about skewness and kurtosis.
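The sauce story can be illustrated with made-up numbers. The ratings below are hypothetical (a 1 to 7 scale is assumed, with 1 = "very mild" and 7 = "very hot"); they are built so that the mean sits at the midpoint while most respondents sit near the two ends.

```python
from collections import Counter
from statistics import mean

# Hypothetical taste-test ratings: two large camps and few in the middle.
ratings = [1] * 20 + [2] * 25 + [4] * 10 + [6] * 25 + [7] * 20

average = mean(ratings)
print("mean rating:", average)  # equals the scale midpoint

# A frequency table shows what the mean hides.
frequencies = Counter(ratings)
for value in range(1, 8):
    print(value, "#" * frequencies[value])
```

The frequency listing shows two peaks, so a single average would describe almost nobody in this sample.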

Page 14: Crosstabs: Occurrences in a Specific Condition

Most of the time with categorical variables

Examples to run

Page 15: Cross-Tabulations – Comparing Frequencies: Chi-square Contingency Test

Technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables

Page 16: Cross-Tabulation Using SPSS for National Insurance Company

One crucial issue in the customer survey of National Insurance Company was how a customer's education was associated with whether or not she or he would recommend National to a friend.

Page 17: Need to Conduct Chi-square Test to Reach a Conclusion

The hypotheses are:

– H0: There is no association between educational level and willingness to recommend National to a friend (the two variables are independent of each other).

– Ha: There is some association between educational level and willingness to recommend National to a friend (the two variables are not independent of each other).

– Let’s do it….

Page 18: Conducting the Test

Test involves comparing the actual, or observed, cell frequencies in the cross-tabulation with a corresponding set of expected cell frequencies ($E_{ij}$)

Page 19: Expected Values

$$E_{ij} = \frac{n_i \, n_j}{n}$$

where $n_i$ and $n_j$ are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively, and $n$ is the total sample size
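A short Python sketch of the expected-frequency calculation. The 2 x 3 table of observed counts is hypothetical (rows: would recommend yes/no, columns: three education levels) and is not the National survey data.

```python
# Hypothetical observed counts, one inner list per row of the cross-tabulation.
observed = [
    [30, 40, 50],
    [30, 20, 10],
]

row_totals = [sum(row) for row in observed]        # marginal frequencies n_i
col_totals = [sum(col) for col in zip(*observed)]  # marginal frequencies n_j
n = sum(row_totals)                                # total sample size

# Expected count for cell (i, j) under independence: n_i * n_j / n.
expected = [[ni * nj / n for nj in col_totals] for ni in row_totals]

for row in expected:
    print(row)
```

With these counts every column holds 60 respondents, so the expected counts are 40 per cell in the first row and 20 per cell in the second.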

Page 20: Chi-square Test Statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected cell frequencies, and r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi-square statistic is given by the product (r - 1)(c - 1).
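The statistic and its degrees of freedom can be sketched in a few lines of Python. The observed counts are hypothetical, and the function name `chi_square` is an illustrative choice. The p-value is left out because it needs the chi-square distribution, which SPSS reports directly.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected frequency
            statistic += (o_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return statistic, df


# Hypothetical 2 x 3 cross-tabulation, not the National survey data.
table = [
    [30, 40, 50],
    [30, 20, 10],
]
statistic, df = chi_square(table)
print(statistic, df)  # 15.0 2
```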

Page 21: National Insurance Company Study

[SPSS output: chi-square test results, with callouts marking the computed chi-square value and the p-value]

Page 22: National Insurance Company Study – P-Value Significance

The actual significance level (p-value) = 0.019.

The chances of getting a chi-square value as high as 10.007 when there is no relationship between education and recommendation are about 19 in 1,000.

The apparent relationship between education and recommendation revealed by the sample data is unlikely to have occurred because of chance.

We can reject the null hypothesis at the conventional .05 level of significance.

Page 23: Precautions in Interpreting Cross-Tabulation Results

Two-way tables cannot show conclusive evidence of a causal relationship

Watch out for small cell sizes

The risk of drawing erroneous inferences increases when more than two variables are involved

Page 24: Overview of Techniques for Examining Associations

Spearman Correlation Coefficient Technique

The technique is appropriate when
– The degree of association between two sets of ranks (pertaining to two variables) is to be examined

Illustrative Research Question(s) This Technique Can Answer:
– Is there a significant relationship between motivation levels of salespeople and the quality of their performance? Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable

Page 25: Overview of Techniques for Examining Associations (Cont’d)

Pearson Correlation Coefficient Technique

This technique is appropriate when
– The degree of association between two metric-scaled (interval or ratio) variables is to be examined

Illustrative Research Question(s) This Technique Can Answer:
– Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?

Page 26: Spearman Correlation Coefficient

A Spearman correlation coefficient is a measure of association between two sets of ranks

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where
$d_i$ = the difference between the ith sample unit's ranks on the two variables
$n$ = the total sample size
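A minimal Python sketch of the Spearman formula. The two rank lists are hypothetical (five salespeople ranked on motivation and on performance), and the shortcut formula is assumed to be applied to ranks without ties.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman coefficient from two lists of untied ranks of equal length."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of the same length (at least 2)")
    # Sum of squared rank differences d_i^2.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for 5 salespeople.
motivation = [1, 2, 3, 4, 5]
performance = [2, 1, 4, 3, 5]
rs = spearman_rs(motivation, performance)
print(round(rs, 3))  # 0.8
```

Here the squared differences sum to 4, so the coefficient is 1 - 24/120 = 0.8.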

Page 27: Pearson Correlation Coefficient

The Pearson correlation coefficient measures the degree of association between variables that are interval- or ratio-scaled.

For two such variables X and Y, the Pearson correlation coefficient ($r_{xy}$) between them is given by

$$r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y}$$

where
$n$ = sample size (total number of data points)
$\bar{X}$ and $\bar{Y}$ = means
$X_i$ and $Y_i$ = values for any sample unit i
$s_x$ and $s_y$ = standard deviations
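The Pearson formula translates directly into Python. The age and image-rating values below are hypothetical, and `stdev` from the standard library is the sample standard deviation (n - 1 in the denominator), which is what the formula assumes.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation using sample means and sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    x_bar, y_bar = mean(xs), mean(ys)
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / ((n - 1) * stdev(xs) * stdev(ys))


# Hypothetical data: age in years and company image on a 1-to-7 scale.
ages = [25, 32, 41, 50, 63]
image_ratings = [3, 4, 4, 6, 7]
r_xy = pearson_r(ages, image_ratings)
print(round(r_xy, 4))
```

A value near +1 or -1 indicates a strong linear association; a value near 0 indicates a weak one.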

Page 28: National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs

National Insurance Company was interested in the correlations between respondents’ overall service-quality perceptions (on the 10-point scale) and their average ratings along each of the five dimensions of Service Quality

Page 29: National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs Using SPSS

Page 30: Interpreting Pearson Correlation Coefficients

Each of the five service-quality measures (reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to the overall quality (OQ) at the .001 level of significance

Responsiveness has the strongest correlation (.8625)

Tangibles have the weakest correlation (.5038)

All the correlations are strong enough to be meaningful

Page 31: Comparing Means

Mainly T-tests and ANOVAs

T-test on OQ and gender.

Page 32: Independent T-tests

Independent variable with 2 categories max.

Equality of variance (cf. output)

p-value of .88: if there were no true difference between the two groups, a difference at least as large as the observed .04 would be expected about 88% of the time. Cannot reject the null hypothesis.

Page 33: Analysis of Variance

ANOVA is appropriate in situations where the independent variable is set at certain specific levels (called treatments in an ANOVA context) and metric measurements of the dependent variable are obtained at each of those levels

Page 34: Example

24 stores chosen randomly for the study

8 stores randomly chosen for each treatment
– Treatment 1: Store brand sold at the regular price
– Treatment 2: Store brand sold at 50¢ off the regular price
– Treatment 3: Store brand sold at 75¢ off the regular price

Monitor sales of the store brand for a week in each store

Page 35: Table 15.2 Unit Sales Data Under Three Pricing Treatments

Treatment                  Regular price   50¢ off   75¢ off
Unit sales in each store   37              46        46
                           38              43        49
                           40              43        48
                           40              45        48
                           38              45        47
                           38              43        48
                           40              44        49
                           39              44        49
Number of stores           8               8         8
Mean sales                 38.75           44.13     48.00

Page 36: ANOVA – Grocery Store Hypothesis

Grocery Store Example
– H0: $\mu_1 = \mu_2 = \mu_3$
– Ha: At least one $\mu$ is different from one or more of the others

Hypotheses for K treatment groups or samples
– H0: $\mu_1 = \mu_2 = \dots = \mu_k$
– Ha: At least one $\mu$ is different from one or more of the others

Page 37: Exhibit 15.1 SPSS Computer Output for ANOVA Analysis

Between-Subjects Factors

                      Value Label     N
Treatment group   1   Regular price   8
                  2   50 cents off    8
                  3   75 cents off    8

Page 38: Exhibit 15.1 SPSS Computer Output for ANOVA Analysis (Cont’d)

Tests of Between-Subjects Effects

Dependent Variable: SALES

Source            Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model   345.250 (a)               2    172.625       137.445     .000
Intercept         45675.375                 1    45675.375     36367.123   .000
TREAT             345.250                   2    172.625       137.445     .000
Error             26.375                    21   1.256
Total             46047.000                 24
Corrected Total   371.625                   23

a. R Squared = .929 (Adjusted R Squared = .922)

There is less than a .001 probability of obtaining an F-value as high as 137.445 if the three treatment means were equal
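The sums of squares and the F-value in Exhibit 15.1 can be reproduced by hand from the Table 15.2 data. The Python sketch below is one way to do that; the function and dictionary names are illustrative, and the p-value is not computed because it needs the F distribution.

```python
# Unit sales from Table 15.2, one list per pricing treatment.
sales = {
    "regular price": [37, 38, 40, 40, 38, 38, 40, 39],
    "50 cents off": [46, 43, 43, 45, 45, 43, 44, 44],
    "75 cents off": [46, 49, 48, 48, 47, 48, 49, 49],
}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, means)
    )
    ss_within = sum(
        (value - m) ** 2 for group, m in zip(groups, means) for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


ss_b, ss_w, df_b, df_w, f_value = one_way_anova(list(sales.values()))
print(ss_b, ss_w, df_b, df_w, round(f_value, 3))
```

The results line up with the TREAT and Error rows of the exhibit: 345.25 and 26.375 for the sums of squares, 2 and 21 degrees of freedom, and an F-value of about 137.445.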

Page 39: ANOVA

OQ recommendation and OQ, individual variable

OQ and EDUC (graph), and post hoc tests

Page 40: Overview of Techniques for Examining Associations (Cont’d)

Simple Regression Analysis Technique

This technique is appropriate when
– A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that values of one of the two variables are dependent on the values of the other

Page 41: Overview of Techniques for Examining Associations – Simple Regression Analysis (Cont’d)

Illustrative Research Question(s) This Technique Can Answer:
– Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)?
– What proportion of the variation in sales is accounted for by variation in advertising expenditures? How sensitive are sales to changes in advertising expenditures?

Page 42: Overview of Techniques for Examining Associations (Cont’d)

Multiple Regression Analysis Technique

This technique is appropriate when
– The same conditions as for simple regression analysis hold, except that more than two variables are involved, wherein one variable is assumed to be dependent on the others

Page 43: Overview of Techniques for Examining Associations (Cont’d)

Illustrative Research Question(s) This Technique Can Answer:
– Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)?
– What proportion of the variation in sales is accounted for by advertising and price? How sensitive are sales to changes in advertising and price?

Page 44: Simple Regression Analysis

Generates a mathematical relationship (called the regression equation) between one variable designated as the dependent variable (Y) and another designated as the independent variable (X)

Page 45: Independent Variable vs. Dependent Variable

Independent variable
– Explanatory or predictor variable
– Often presumed to be a cause of the other

Dependent variable
– Criterion variable
– Influenced by the independent variable

Page 46: Practical Applications of Regression Equations

The regression coefficient, or slope, can indicate how sensitive the dependent variable is to changes in the independent variable

The regression equation is a forecasting tool for predicting the value of the dependent variable for a given value of the independent variable
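Both uses can be seen in a small least-squares sketch in Python. The advertising and sales figures are hypothetical (both assumed to be in thousands of dollars), and `fit_line` is an illustrative helper, not an SPSS procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope: change in Y per unit change in X
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data, both variables in thousands of dollars.
advertising = [10, 20, 30, 40, 50]
sales = [120, 135, 160, 170, 190]

a, b = fit_line(advertising, sales)
print(a, b)  # 102.5 1.75

# Forecast for an advertising level inside the observed range (10 to 50).
forecast = a + b * 35
print(forecast)  # 163.75
```

In this made-up example the slope of 1.75 says that each extra thousand dollars of advertising goes with about 1.75 thousand dollars more in sales, and the fitted equation gives a forecast for any advertising level within the observed range.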

Page 47: Precautions In Using Regression Analysis

Only capable of capturing linear associations between dependent and independent variables

A significant R²-value does not necessarily imply a cause-and-effect association between the independent and dependent variables

A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation

Page 48: Precautions In Using Regression Analysis (Cont’d)

A regression equation based on relatively few data points cannot be trusted

The ranges of data on the dependent and independent variables can affect the meaningfulness of a regression equation

Page 49: Multiple Regression Analysis

$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$$

$Y_i$ is the predicted value of the dependent variable for some unit i;

$X_{1i}, X_{2i}, \dots, X_{ki}$ are values on the independent variables for unit i;

$b_1, b_2, \dots, b_k$ are the regression coefficients;

$a$ is the Y-intercept representing the prediction for Y when all independent variables are set to zero

Page 50: National Insurance Company – Multiple Regression Using SPSS

Jill and Tom were interested in conducting a multiple regression analysis wherein overall service quality perceptions is the dependent variable and the average ratings along the five dimensions are the independent variables

Page 51: Factor Analysis

A data and variable reduction technique that attempts to partition a given set of variables into groups of maximally correlated variables

Page 52: Factor Analysis Output and Its Interpretation

Primary output of factor analysis is a factor-loading matrix

Page 53: Table 15.4 Factor-Loading Matrix Based on Data from Study of Star Customers

Variable                                                               F1      F2      Achieved communality
X4: My friends are very impressed with the Star VCR                    0.96    0.06    .926
X6: No other brand of VCR even comes close to matching the Star        0.92    0.17    .875
X1: I did not mind paying the high price for my Star VCR               0.89    0.15    .815
X3: I hardly ever worry about anything going wrong with my Star VCR    0.18    0.94    .916
X5: The Star VCR has the latest technology built into it               0.09    0.88    .782
X2: I am pleased with the variety of things that a Star VCR can do     0.16    0.86    .766
Eigenvalues: standardized variance explained by each factor            2.626   2.454
Proportion of the total variance explained by each factor              0.438   0.409

3 variables (X4, X6, X1) load high on factor 1

3 variables (X3, X5, X2) load high on factor 2
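The summary rows of Table 15.4 follow from the loadings themselves: a variable's communality is the sum of its squared loadings, and the variance explained by a factor is the sum of squared loadings in its column. The Python sketch below checks this with the table's rounded loadings, so small differences in the third decimal are expected.

```python
# Factor loadings (F1, F2) copied from Table 15.4.
loadings = {
    "X4": (0.96, 0.06),
    "X6": (0.92, 0.17),
    "X1": (0.89, 0.15),
    "X3": (0.18, 0.94),
    "X5": (0.09, 0.88),
    "X2": (0.16, 0.86),
}

# Communality: share of a variable's variance captured by the two factors.
communalities = {
    name: f1 ** 2 + f2 ** 2 for name, (f1, f2) in loadings.items()
}

# Variance explained by each factor: column sum of squared loadings.
eigenvalues = [
    sum(pair[k] ** 2 for pair in loadings.values()) for k in range(2)
]

# Each standardized variable contributes 1 unit of variance, 6 in total.
proportions = [value / len(loadings) for value in eigenvalues]

for name, value in communalities.items():
    print(name, round(value, 3))
print([round(value, 3) for value in eigenvalues])
print([round(value, 3) for value in proportions])
```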

Page 54: Reducing Star Data

X1, X4, and X6 can be combined into one factor

X2, X3, and X5 can be combined into a second factor

6 variables can be reduced to two factors

Page 55: Potential Applications of Factor Analysis

Used to
– Develop concise but comprehensive, multiple-item scales for measuring various marketing constructs
– Illuminate the nature of distinct dimensions underlying an existing data set
– Convert a large volume of data into a set of factor scores on a limited number of uncorrelated factors

Page 56: Cluster Analysis

Segment objects into groups so that members within each group are similar to one another in a variety of ways

Useful for segmenting customers, market areas, and products

Page 57: Use of Cluster Analysis

A firm offering recreational services wanted to enter a new region of the country

They gathered data on more than 100 characteristics including
– Demographics
– Expenditures on recreation
– Leisure time activities
– Interests of household members

The firm identified one or several household segments that are likely to be most responsive to its advertising and to its services

Page 58: How Does Cluster Analysis Work?

Cluster analysis measures the similarity between objects on the basis of their values on the various characteristics
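The slides do not say which similarity measure is used. As one common choice, the Python sketch below uses Euclidean distance on two characteristics, where a smaller distance means more similar objects. The household labels and scores are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical scores: (participation in outdoor sporting events,
# watching outdoor sporting events on TV), each on an assumed 1-10 scale.
households = {
    "A": (8, 9),
    "B": (7, 8),
    "C": (2, 1),
}

# Euclidean distance between every pair of households.
distances = {
    (p, q): dist(households[p], households[q])
    for p, q in combinations(households, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 2))

# The most similar pair is the natural candidate for the same cluster.
closest = min(distances, key=distances.get)
print("most similar pair:", closest)  # ('A', 'B')
```

Households A and B score close together on both characteristics, so they would be grouped together before C joins either of them.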

Page 59: Exhibit 15.8 Clusters Formed by Using Data on Two Characteristics

[Figure: scatter plot of clusters. Horizontal axis: extent of participation in outdoor sporting events (low to high). Vertical axis: extent of watching outdoor sporting events on TV (low to high).]