DATA ANALYSIS Dr.Ajay Pandit September 14, 2010

Data analysis


Data Analysis

Data Analysis involves three stages:

1. Testing association between variables

2. Determining the degree of association between the variables

3. Estimating the values of the variables


Identifying the technique

The choice of technique depends largely on the scales of measurement of the variables, i.e. nominal, ordinal, interval or ratio.


Bivariate Analysis

                          Independent variable
Dependent variable    Nominal            Interval/Ratio
Nominal               Chi-square (χ2)    Discriminant analysis
Interval/Ratio        ANOVA              Regression/Correlation
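The table above amounts to a lookup on the two measurement scales. The following Python sketch encodes it as a dictionary; the function name `choose_technique` and the scale labels are illustrative assumptions, and ordinal variables, which the table does not cover, are rejected rather than guessed.

```python
# Lookup table from the bivariate-analysis slide.
# Keys are (dependent scale, independent scale); labels are illustrative.
TECHNIQUES = {
    ("nominal", "nominal"): "Chi-square test",
    ("nominal", "interval/ratio"): "Discriminant analysis",
    ("interval/ratio", "nominal"): "ANOVA",
    ("interval/ratio", "interval/ratio"): "Regression/Correlation",
}


def _normalize(scale: str) -> str:
    """Map 'interval' and 'ratio' onto the table's combined category."""
    scale = scale.strip().lower()
    if scale in ("interval", "ratio", "interval/ratio"):
        return "interval/ratio"
    return scale


def choose_technique(dependent: str, independent: str) -> str:
    """Return the technique for the given scales (hypothetical helper)."""
    key = (_normalize(dependent), _normalize(independent))
    if key not in TECHNIQUES:
        # e.g. ordinal variables, which the table does not cover
        raise ValueError(f"no technique listed for scales {key}")
    return TECHNIQUES[key]


print(choose_technique("interval", "nominal"))  # ANOVA
```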


CHI-SQUARE (χ2 )

The technique uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent.

This test can also be interpreted as a comparison of two or more populations.


Example

The demand for an MBA program’s optional courses and majors varies considerably from year to year.

The research hypothesis is that the academic background of the students (i.e. their undergraduate degrees) affects their choice of major.

A random sample of data on last year’s MBA students was collected and summarized in a contingency table…


Example: The Data

                          MBA Major
UG Degree    Accounting   Finance   Marketing   Total
BA               31          13         16        60
BEng              8          16          7        31
BBA              12          10         17        39
Other            10           5          7        22
Total            61          44         47       152


Example

We are interested in determining whether or not the academic background of the students affects their choice of MBA major. Thus our research hypothesis is:

H1: The two variables are dependent

Our null hypothesis, then, is:

H0: The two variables are independent.


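Example

In this case, our test statistic is:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where k is the number of cells in the contingency table (rows × columns), and f_i and e_i are the observed and expected frequencies of cell i.

Our rejection region is:

$$\chi^2 > \chi^2_{\alpha,\nu}$$

where the number of degrees of freedom is ν = (r - 1)(c - 1) for a table with r rows and c columns.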
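A minimal Python sketch of the test statistic and the rejection rule, assuming the observed and expected frequencies are supplied as equal-shaped lists of rows. The function names and the small 2 × 2 table are illustrative assumptions, and the critical value still has to come from a χ2 table or a statistics library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f - e)**2 / e over every cell of the table."""
    stat = 0.0
    for f_row, e_row in zip(observed, expected):
        for f, e in zip(f_row, e_row):
            stat += (f - e) ** 2 / e
    return stat


def degrees_of_freedom(table):
    """nu = (r - 1)(c - 1) for a table with r rows and c columns."""
    rows, cols = len(table), len(table[0])
    return (rows - 1) * (cols - 1)


def in_rejection_region(stat, critical_value):
    """H0 is rejected when the statistic exceeds chi2_{alpha, nu}."""
    return stat > critical_value


# Tiny illustration with made-up counts (not from the MBA example).
observed = [[10, 20], [20, 10]]
expected = [[15, 15], [15, 15]]  # row and column totals are all 30, n = 60
stat = chi_square_statistic(observed, expected)
print(stat, degrees_of_freedom(observed))
```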


Example: Compute

In order to calculate our χ2 test statistic, we need to calculate the expected frequency of each cell.

The expected frequency of the cell in row i and column j is:

$$e_{ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Sample size}}$$
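A short Python sketch of the expected-frequency formula, applied to the MBA contingency table. The helper name `expected_frequencies` is an illustrative assumption; it multiplies each row total by each column total and divides by the sample size.

```python
def expected_frequencies(observed):
    """e_ij = (row i total * column j total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# MBA example: rows = BA, BEng, BBA, Other; columns = Accounting, Finance, Marketing
mba = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]
expected = expected_frequencies(mba)
print(round(expected[1][2], 2))  # BEng / Marketing (e23): 9.59
```

The expected frequencies sum to the sample size (152), which is a quick sanity check on the row and column totals.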


Contingency Table Set-up…


Example: Compute

Compute the expected frequency of each cell as (row i total × column j total) / sample size, using the totals of the contingency table.

For example, for BEng (row 2) and Marketing (column 3), the row total is 31, the column total is 47 and the sample size is 152, so

e23 = (31)(47)/152 = 9.59, compared with the observed frequency f23 = 7.


Example

We can now compare observed with expected frequencies (expected values in parentheses) and calculate our test statistic:

                              MBA Major
Undergrad Degree   Accounting    Finance       Marketing
BA                 31 (24.08)    13 (17.37)    16 (18.55)
BEng                8 (12.44)    16 (8.97)      7 (9.59)
BBA                12 (15.65)    10 (11.29)    17 (12.06)
Other              10 (8.83)      5 (6.37)      7 (6.80)
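Putting the expected frequencies and the test statistic together, this self-contained Python sketch recomputes χ2 for the MBA data from the observed counts alone. The function name is hypothetical; expected values are kept unrounded, so individual cells can differ slightly from the two-decimal figures in the table, but the total comes out at about 14.70.

```python
def chi_square_from_counts(observed):
    """Chi-square statistic for independence, from observed counts only."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency e_ij
            stat += (f - e) ** 2 / e
    return stat


# Rows: BA, BEng, BBA, Other; columns: Accounting, Finance, Marketing
mba = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]
print(f"chi-square = {chi_square_from_counts(mba):.2f}")  # chi-square = 14.70
```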


Example: Interpret

We compare χ2 = 14.70 with:

$$\chi^2_{\alpha,\nu} = \chi^2_{.05,(4-1)(3-1)} = \chi^2_{.05,6} = 12.5916$$

Since our test statistic falls into the rejection region (14.70 > 12.5916), we reject

H0: The two variables are independent.

in favor of

H1: The two variables are dependent.

That is, there is evidence of a relationship between undergraduate degree and MBA major.
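The critical value 12.5916 is normally read from a χ2 table, or obtained from a library routine such as SciPy's `scipy.stats.chi2.ppf`. As a dependency-free sketch, the code below uses the closed-form χ2 survival function that exists only for an even number of degrees of freedom (here ν = 6) and finds the critical value by bisection. The function names are hypothetical, and this shortcut does not extend to odd ν.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi2_critical_value(alpha, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Find c with P(X > c) = alpha by bisection (the survival function decreases in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


nu = (4 - 1) * (3 - 1)
critical = chi2_critical_value(0.05, nu)
p_value = chi2_sf_even_df(14.70, nu)
print(f"critical value = {critical:.4f}")  # critical value = 12.5916
print(f"p-value = {p_value:.4f}")
print("reject H0" if 14.70 > critical else "do not reject H0")  # reject H0
```

The p-value printed here is an extra quantity not given on the slide; it is below α = .05, which is the same conclusion as comparing the statistic with the critical value.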


Required Condition: Rule of Five

In a contingency table where one or more cells have an expected frequency of less than 5, we need to combine rows or columns so that every cell satisfies the rule of five.

Note: combining rows or columns reduces the size of the table, so the degrees of freedom (r - 1)(c - 1) must be recalculated as well.
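A sketch of the rule-of-five check and of combining two sparse rows, in Python. The helper names and the small `sparse` table are made-up illustrations, not data from the example. The point is that merging rows both lifts the small expected frequencies and lowers the degrees of freedom. The MBA table itself passes, since its smallest expected frequency is about 6.37.

```python
def expected_frequencies(observed):
    """e_ij = (row i total * column j total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def satisfies_rule_of_five(observed, minimum=5.0):
    """True when every expected frequency is at least `minimum`."""
    return all(e >= minimum for row in expected_frequencies(observed) for e in row)


def merge_rows(observed, i, j):
    """Return a new table with rows i and j added together (placed last)."""
    merged = [a + b for a, b in zip(observed[i], observed[j])]
    kept = [row for k, row in enumerate(observed) if k not in (i, j)]
    return kept + [merged]


def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


mba = [[31, 13, 16], [8, 16, 7], [12, 10, 17], [10, 5, 7]]
print(satisfies_rule_of_five(mba))  # True

# Made-up table whose first two rows are too sparse.
sparse = [[4, 6], [3, 5], [30, 40], [25, 35]]
combined = merge_rows(sparse, 0, 1)
print(satisfies_rule_of_five(sparse), satisfies_rule_of_five(combined))  # False True
print(degrees_of_freedom(sparse), degrees_of_freedom(combined))  # 3 2
```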


Type of measurement    Differences between three or more independent groups
Interval or ratio      One-way ANOVA

ANOVA


SAMPLE RESULTS OF PACKAGE SALES

Sample observation   Package 1   Package 2   Package 3   Package 4
1                        3           7           3           5
2                        3           5           2           3
3                        4           2           4           5
4                        5           9           6           2
5                        3           7           2           3
6                        3           6           3           3
7                        4           6           1           2
Totals                  25          42          21          23
Means                  3.6         6.0         3.0         3.3
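The totals and means in the table can be reproduced with a few lines of Python. Storing the data as a dictionary with one list of seven observations per package is an assumption of this sketch, not something the slide prescribes.

```python
# One list of seven observations per package (storage layout assumed).
sales = {
    "Package 1": [3, 3, 4, 5, 3, 3, 4],
    "Package 2": [7, 5, 2, 9, 7, 6, 6],
    "Package 3": [3, 2, 4, 6, 2, 3, 1],
    "Package 4": [5, 3, 5, 2, 3, 3, 2],
}

totals = {name: sum(values) for name, values in sales.items()}
means = {name: sum(values) / len(values) for name, values in sales.items()}

for name in sales:
    # The slide reports means to one decimal place.
    print(name, totals[name], round(means[name], 1))
```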


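ONE WAY ANOVA

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{GM})^2, \qquad df = nk - 1$$

$$SSB = \sum_{i=1}^{k} n(\bar{X}_i - \bar{X}_{GM})^2, \qquad df = k - 1$$

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2, \qquad df = k(n - 1)$$

where k is the number of groups, n the number of observations per group, X_ij the j-th observation in group i, X̄_i the mean of group i and X̄_GM the grand mean.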
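The three sums of squares can be computed directly from these definitions. This Python sketch (function name hypothetical) uses N - k for the within-groups degrees of freedom and N - 1 for the total, which equal k(n - 1) and nk - 1 when every group has n observations. It does not round the grand mean, so it reproduces the unrounded SPSS figures rather than the hand-rounded 40, 57 and 97.

```python
def one_way_anova_ss(groups):
    """Return SST, SSB, SSW and their degrees of freedom for a list of groups."""
    values = [x for g in groups for x in g]
    n_total = len(values)
    k = len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]

    sst = sum((x - grand_mean) ** 2 for x in values)
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # With equal group sizes n: N - 1 = nk - 1 and N - k = k(n - 1).
    df = {"total": n_total - 1, "between": k - 1, "within": n_total - k}
    return sst, ssb, ssw, df


packages = [
    [3, 3, 4, 5, 3, 3, 4],
    [7, 5, 2, 9, 7, 6, 6],
    [3, 2, 4, 6, 2, 3, 1],
    [5, 3, 5, 2, 3, 3, 2],
]
sst, ssb, ssw, df = one_way_anova_ss(packages)
print(f"SST={sst:.3f} SSB={ssb:.3f} SSW={ssw:.3f} df={df}")
```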


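Using the package sales data (k = 4 packages, n = 7 observations each, grand mean rounded to 4):

$$SST = (3 - 4)^2 + \dots + (2 - 4)^2 = 97$$

$$SSB = 7\left[(3.6 - 4)^2 + (6 - 4)^2 + (3 - 4)^2 + (3.3 - 4)^2\right] \approx 40$$

$$SSW = SST - SSB = 97 - 40 = 57$$

ANOVA SUMMARY TABLE

Source of variation   Sum of squares   df   MS      F
Between groups              40          3   13.33   5.61
Within groups               57         24    2.38
Total                       97         27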
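The remaining columns of the summary table follow from MS = SS/df and F = MSB/MSW. A minimal Python sketch, using the rounded sums of squares from the hand calculation; the function name is hypothetical.

```python
def anova_summary(ss_between, ss_within, df_between, df_within):
    """Mean squares and the F ratio for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Rounded sums of squares from the hand calculation
msb, msw, f_ratio = anova_summary(40, 57, 3, 24)
print(f"MSB={msb:.2f} MSW={msw:.3f} F={f_ratio:.2f}")  # MSB=13.33 MSW=2.375 F=5.61
```

Feeding in the unrounded sums of squares from the SPSS output (39.821 and 57.143) instead gives F ≈ 5.575, the value SPSS reports.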


PACKAGE SALES: DESCRIPTIVE STATISTICS

Descriptives (dependent variable: SALES)

                                                   95% Confidence Interval for Mean
Package   N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1.00       7   3.5714   .7868            .2974        2.8438        4.2991        3.00      5.00
2.00       7   6.0000   2.1602           .8165        4.0021        7.9979        2.00      9.00
3.00       7   3.0000   1.6330           .6172        1.4897        4.5103        1.00      6.00
4.00       7   3.2857   1.2536           .4738        2.1264        4.4451        2.00      5.00
Total     28   3.9643   1.8951           .3581        3.2295        4.6991        1.00      9.00
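The descriptive statistics can be reproduced with Python's standard `statistics` module. The 95% interval uses mean ± t × standard error, with the t critical value for 6 degrees of freedom hard-coded from a t table (about 2.4469). That constant is an assumption of this sketch, since the standard library has no t quantile function; the helper name `describe` is likewise hypothetical.

```python
import math
import statistics

# Assumption: two-sided 95% t critical value for 6 degrees of freedom,
# taken from a t table (t_{0.025, 6} is about 2.4469); not computed here.
T_CRIT_DF6 = 2.4469


def describe(values, t_crit):
    """N, mean, sample SD, standard error, confidence bounds, min and max."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    se = sd / math.sqrt(n)
    return {
        "N": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "lower": mean - t_crit * se,
        "upper": mean + t_crit * se,
        "min": min(values),
        "max": max(values),
    }


package1 = [3, 3, 4, 5, 3, 3, 4]
d = describe(package1, T_CRIT_DF6)
print({key: round(value, 4) for key, value in d.items()})
```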


PACKAGE SALES: ANOVA SUMMARY TABLE

ANOVA (dependent variable: SALES)

                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    39.821            3   13.274        5.575   .005
Within Groups     57.143           24    2.381
Total             96.964           27


THANKS