Data analysis

Dr. Ajay Pandit

September 14, 2010

Data Analysis

Data Analysis involves three stages:

1. Testing association between variables

2. Determining the degree of association between the variables

3. Estimating the values of the variables

Identifying the technique

The choice of technique depends largely on the scale of measurement of the variables, i.e. nominal, ordinal, interval, or ratio.

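Bi-variate Analysis

Rows give the scale of the dependent variable; columns give the scale of the independent variable.

Dependent        Independent: Nominal   Independent: Interval/Ratio
Nominal          Chi-Square (χ²)        Discriminant
Interval/Ratio   ANOVA                  Regression/Correlation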
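The grid above can be read as a simple lookup. The sketch below is only an illustration: the names `TECHNIQUES` and `choose_technique` are hypothetical, and ordinal scales are left out because the grid does not list them.

```python
# Hypothetical lookup built from the bivariate grid; names are illustrative only.
TECHNIQUES = {
    # (dependent scale, independent scale): technique
    ("nominal", "nominal"): "chi-square",
    ("nominal", "interval/ratio"): "discriminant analysis",
    ("interval/ratio", "nominal"): "ANOVA",
    ("interval/ratio", "interval/ratio"): "regression/correlation",
}


def choose_technique(dependent: str, independent: str) -> str:
    """Return the technique listed in the grid for a pair of measurement scales."""
    return TECHNIQUES[(dependent, independent)]


print(choose_technique("interval/ratio", "nominal"))  # ANOVA
```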

CHI-SQUARE (χ²)

The technique uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent.

This test can also be interpreted as a comparison of two or more populations.

Example

The demand for an MBA program’s optional courses and majors is quite variable year over year.

The research hypothesis is that the academic background of the students (i.e. their undergrad degrees) affects their choice of major.

A random sample of data on last year’s MBA students was collected and summarized in a contingency table…

Example: The Data

Undergrad degree (rows) by MBA major (columns):

UG Degree   Accounting   Finance   Marketing   Total
BA          31           13        16          60
BEng        8            16        7           31
BBA         12           10        17          39
Other       10           5         7           22
Total       61           44        47          152

Example

We are interested in determining whether or not the academic background of the students affects their choice of MBA major. Thus our research hypothesis is:

H1: The two variables are dependent

Our null hypothesis, then, is:

H0: The two variables are independent.

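Example

In this case, our test statistic is:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

(where k is the number of cells in the contingency table, i.e. rows × columns, f_i is the observed frequency of cell i, and e_i is its expected frequency)

Our rejection region is:

$$\chi^2 > \chi^2_{\alpha,\nu}$$

where the number of degrees of freedom is ν = (r–1)(c–1)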
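As a rough sketch of how the statistic and its degrees of freedom could be computed, the snippet below sums (f − e)²/e over all cells. The function names and the small 2 × 2 table are made up for illustration and are not part of the example data.

```python
def chi_square_statistic(observed, expected):
    """Sum (f - e)^2 / e over all k cells, given flat lists of frequencies."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def degrees_of_freedom(rows, cols):
    """Degrees of freedom of an r x c contingency table: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


# Hypothetical 2 x 2 table, flattened row by row (not from the slides).
observed = [10, 20, 30, 40]
expected = [12, 18, 28, 42]  # row total * column total / 100 for each cell

stat = chi_square_statistic(observed, expected)
print(round(stat, 4))            # 0.7937
print(degrees_of_freedom(2, 2))  # 1
```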

Example

In order to calculate our χ2 test statistic, we need to calculate the expected frequencies for each cell…

The expected frequency of the cell in row i and column j is:

$$e_{ij} = \frac{\text{Row } i \text{ total} \times \text{Column } j \text{ total}}{\text{Sample size}}$$
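The expected-frequency formula can be sketched in a few lines, using the observed counts from the MBA data table. The helper name `expected_frequencies` is an assumption made for this illustration.

```python
# Observed counts from the MBA example (rows: BA, BEng, BBA, Other;
# columns: Accounting, Finance, Marketing).
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]


def expected_frequencies(table):
    """e_ij = (row i total * column j total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
print(round(expected[1][2], 2))  # e23 = 31 * 47 / 152, i.e. 9.59
```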

Contingency Table Set-up…

Example COMPUTE

Compute expected frequencies: e_ij = (Row i total × Column j total) / Sample size

Undergrad Degree   Accounting   Finance   Marketing   Total
BA                 31           13        16          60
BEng               8            16        7           31
BBA                12           10        17          39
Other              10           5         7           22
Total              61           44        47          152

e23 = (31)(47)/152 = 9.59; compare this to f23 = 7

Example

We can now compare observed with expected frequencies and calculate our test statistic:

Observed frequencies, with expected frequencies in parentheses:

Undergrad Degree   Accounting    Finance       Marketing
BA                 31 (24.08)    13 (17.37)    16 (18.55)
BEng               8 (12.44)     16 (8.97)     7 (9.59)
BBA                12 (15.65)    10 (11.29)    17 (12.06)
Other              10 (8.83)     5 (6.37)      7 (6.80)
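To make the comparison of observed and expected frequencies concrete, the following sketch recomputes the expected counts from the row and column totals and sums (f − e)²/e over the twelve cells. Variable names are illustrative.

```python
# Observed counts (rows: BA, BEng, BBA, Other; columns: Accounting, Finance, Marketing).
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f - e)^2 / e over every cell of the table.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, f in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (f - e) ** 2 / e

print(round(chi2, 2))  # 14.7
```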

Example INTERPRET

We compare χ² = 14.70 with:

$$\chi^2_{\alpha,\nu} = \chi^2_{.05,(4-1)(3-1)} = \chi^2_{.05,6} = 12.5916$$

Since our test statistic falls into the rejection region, we reject

H0: The two variables are independent.

in favor of

H1: The two variables are dependent.

That is, there is evidence of a relationship between undergrad degree and MBA major.
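The decision can be checked with a short sketch. It compares the statistic with the tabulated critical value and, as an extra not shown on the slides, computes a p-value with the closed-form upper-tail probability that holds only when the degrees of freedom are even. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with EVEN df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here only applies to even, positive df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


chi2_stat = 14.70            # test statistic from the example
critical = 12.5916           # tabulated chi-square value for alpha = .05, df = 6
df = (4 - 1) * (3 - 1)

reject_h0 = chi2_stat > critical
p_value = chi2_sf_even_df(chi2_stat, df)

print(reject_h0)             # True
print(round(p_value, 4))     # 0.0227
```

The p-value is below .05, which agrees with the statistic falling in the rejection region.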

Required Condition – Rule of Five…

In a contingency table where one or more cells have expected values of less than 5, we need to combine rows or columns to satisfy the rule of five.

Note: combining rows or columns changes r or c, so the degrees of freedom (r–1)(c–1) must be recalculated as well.
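A minimal sketch of a rule-of-five check is shown below, using the rounded expected frequencies from the MBA example. The helper name `cells_below_five` is an assumption for illustration; how rows or columns should be combined is a judgment call that the code does not attempt.

```python
# Expected frequencies from the MBA example (rounded, as on the slide).
expected = [
    [24.08, 17.37, 18.55],  # BA
    [12.44, 8.97, 9.59],    # BEng
    [15.65, 11.29, 12.06],  # BBA
    [8.83, 6.37, 6.80],     # Other
]


def cells_below_five(table):
    """Return (row, column, value) for every expected frequency below 5 (1-based indices)."""
    return [
        (i + 1, j + 1, e)
        for i, row in enumerate(table)
        for j, e in enumerate(row)
        if e < 5
    ]


violations = cells_below_five(expected)
print(violations)  # [] : every expected frequency is at least 5
```

In this example the smallest expected frequency is 6.37, so no rows or columns need to be combined.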

Type of Measurement   Differences between three or more independent groups
Interval or ratio     One-way ANOVA

ANOVA

SAMPLE RESULTS OF PACKAGE SALES

SAMPLE OBSERVATION   PACKAGE 1   PACKAGE 2   PACKAGE 3   PACKAGE 4
1                    3           7           3           5
2                    3           5           2           3
3                    4           2           4           5
4                    5           9           6           2
5                    3           7           2           3
6                    3           6           3           3
7                    4           6           1           2
TOTALS               25          42          21          23
MEANS                3.6         6.0         3.0         3.3
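The totals and means can be reproduced from the seven observations per package. The dictionary layout below is just one convenient way to hold the data and is not prescribed by the slides.

```python
# Sales observations per package, read from the sample results table.
sales = {
    "Package 1": [3, 3, 4, 5, 3, 3, 4],
    "Package 2": [7, 5, 2, 9, 7, 6, 6],
    "Package 3": [3, 2, 4, 6, 2, 3, 1],
    "Package 4": [5, 3, 5, 2, 3, 3, 2],
}

totals = {name: sum(values) for name, values in sales.items()}
means = {name: sum(values) / len(values) for name, values in sales.items()}

for name in sales:
    print(name, totals[name], round(means[name], 1))
```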

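ONE WAY ANOVA

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{GM})^2, \qquad df = nk - 1$$

$$SSB = n \sum_{i=1}^{k} (\bar{X}_i - \bar{X}_{GM})^2, \qquad df = k - 1$$

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2, \qquad df = k(n - 1)$$

$$SST = (3 - 4)^2 + (5 - 4)^2 + \dots + (2 - 4)^2 = 97$$

$$SSB = 7\left[(3.6 - 4)^2 + (6 - 4)^2 + (3 - 4)^2 + (3.3 - 4)^2\right] \approx 40$$

$$SSW = SST - SSB = 97 - 40 = 57$$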
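The three sums of squares can be sketched directly from their definitions. Unlike the hand calculation, which rounds the grand mean to 4, this version keeps the unrounded grand mean (111/28 ≈ 3.96), so the results differ slightly from 97, 40, and 57. Variable names are illustrative.

```python
# Sales observations for the k = 4 packages, n = 7 observations each.
groups = [
    [3, 3, 4, 5, 3, 3, 4],
    [7, 5, 2, 9, 7, 6, 6],
    [3, 2, 4, 6, 2, 3, 1],
    [5, 3, 5, 2, 3, 3, 2],
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Total, between-group, and within-group sums of squares.
sst = sum((x - grand_mean) ** 2 for x in all_values)
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

print(round(sst, 3), round(ssb, 3), round(ssw, 3))  # 96.964 39.821 57.143
```

Up to floating-point error, `ssb + ssw` equals `sst`, which is the identity the slide uses to obtain SSW by subtraction.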

ANOVA SUMMARY TABLE

SOURCE OF VARIATION   SUM OF SQUARES   df   MS      F
BETWEEN GROUPS        40               3    13.33   5.52
WITHIN GROUPS         57               24   2.38
TOTAL                 97               27

PACKAGE SALES DESCRIPTIVE STATISTICS

Descriptives: SALES (Lower Bound and Upper Bound are the 95% Confidence Interval for Mean)

Package   N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1.00      7    3.5714   .7868            .2974        2.8438        4.2991        3.00      5.00
2.00      7    6.0000   2.1602           .8165        4.0021        7.9979        2.00      9.00
3.00      7    3.0000   1.6330           .6172        1.4897        4.5103        1.00      6.00
4.00      7    3.2857   1.2536           .4738        2.1264        4.4451        2.00      5.00
Total     28   3.9643   1.8951           .3581        3.2295        4.6991        1.00      9.00
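The per-package rows of the descriptives table can be approximated with the standard library. This is only a sketch: the t critical value 2.4469 (two-sided 95%, 6 degrees of freedom) is taken from a standard t table rather than from the slides, and the dictionary keys are illustrative.

```python
import math
import statistics

sales = {
    "1.00": [3, 3, 4, 5, 3, 3, 4],
    "2.00": [7, 5, 2, 9, 7, 6, 6],
    "3.00": [3, 2, 4, 6, 2, 3, 1],
    "4.00": [5, 3, 5, 2, 3, 3, 2],
}

T_CRIT_DF6 = 2.4469  # assumption: t(.025, 6) from a standard t table

descriptives = {}
for package, values in sales.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    se = sd / math.sqrt(len(values))     # standard error of the mean
    descriptives[package] = {
        "n": len(values),
        "mean": mean,
        "sd": sd,
        "se": se,
        "lower": mean - T_CRIT_DF6 * se,
        "upper": mean + T_CRIT_DF6 * se,
        "min": min(values),
        "max": max(values),
    }

for package, row in descriptives.items():
    print(package, {k: round(v, 4) for k, v in row.items()})
```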

PACKAGE SALES ANOVA SUMMARY TABLE

ANOVA: SALES

Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   39.821           3    13.274        5.575   .005
Within Groups    57.143           24   2.381
Total            96.964           27
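As a cross-check on the package output, the sketch below builds the ANOVA table from the raw observations. The function name `one_way_anova` is hypothetical, and the Sig. column is not reproduced because the F distribution is not available in the standard library.

```python
def one_way_anova(groups):
    """Return sums of squares, df, mean squares, and F for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    msb = ssb / df_between
    msw = ssw / df_within
    return {
        "ssb": ssb, "ssw": ssw, "sst": ssb + ssw,
        "df_between": df_between, "df_within": df_within,
        "msb": msb, "msw": msw, "F": msb / msw,
    }


packages = [
    [3, 3, 4, 5, 3, 3, 4],
    [7, 5, 2, 9, 7, 6, 6],
    [3, 2, 4, 6, 2, 3, 1],
    [5, 3, 5, 2, 3, 3, 2],
]

table = one_way_anova(packages)
print(round(table["msb"], 3), round(table["msw"], 3), round(table["F"], 3))
# 13.274 2.381 5.575
```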

THANKS
