DATA ANALYSIS
Dr. Ajay Pandit
September 14, 2010
Data Analysis
Data Analysis involves three stages:
1. Testing association between variables
2. Determining the degree of association between the variables
3. Estimating the values of the variables
Identifying the technique
The choice of technique depends largely on the scale of measurement of the variables, i.e. nominal, ordinal, interval or ratio.
Bi-variate Analysis
                        Independent variable
Dependent variable      Nominal            Interval/Ratio
Nominal                 Chi-square (χ2)    Discriminant analysis
Interval/Ratio          ANOVA              Regression/Correlation
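The selection table can also be read as a simple lookup keyed on the two measurement scales. The sketch below is only an illustration of that reading; the names `TECHNIQUES` and `choose_technique` are hypothetical and not part of the slides.

```python
# Lookup keyed on (dependent scale, independent scale), mirroring the table.
# The dictionary and function names are illustrative assumptions.
TECHNIQUES = {
    ("nominal", "nominal"): "Chi-square",
    ("nominal", "interval/ratio"): "Discriminant analysis",
    ("interval/ratio", "nominal"): "ANOVA",
    ("interval/ratio", "interval/ratio"): "Regression/Correlation",
}


def choose_technique(dependent: str, independent: str) -> str:
    """Return the bivariate technique for the given measurement scales."""
    return TECHNIQUES[(dependent.lower(), independent.lower())]


print(choose_technique("Nominal", "Nominal"))
print(choose_technique("Interval/Ratio", "Nominal"))
```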
CHI-SQUARE (χ2 )
The technique uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent.
This test can also be interpreted as a comparison of two or more populations.
Example
The demand for an MBA program’s optional courses and majors is quite variable year over year.
The research hypothesis is that the academic background of the students (i.e. their undergrad degrees) affects their choice of major.
A random sample of data on last year’s MBA students was collected and summarized in a contingency table…
Example: The Data
                  MBA Major
UG Degree    Accounting   Finance   Marketing   Total
BA               31          13        16         60
BEng              8          16         7         31
BBA              12          10        17         39
Other            10           5         7         22
Total            61          44        47        152
Example
We are interested in determining whether or not the academic background of the students affects their choice of MBA major. Thus our research hypothesis is:
H1: The two variables are dependent
Our null hypothesis then, is:
H0: The two variables are independent.
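Example
In this case, our test statistic is:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
(where k is the number of cells in the contingency table, i.e. rows x columns, and f_i and e_i are the observed and expected frequencies of cell i)
Our rejection region is:
$$\chi^2 > \chi^2_{\alpha,\nu}$$
where the number of degrees of freedom is ν = (r–1)(c–1)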
Example
In order to calculate our χ2 test statistic, we need to calculate the expected frequencies for each cell…
The expected frequency of the cell in row i and column j is:
$$e_{ij} = \frac{\text{Row } i \text{ total} \times \text{Column } j \text{ total}}{\text{Sample size}}$$
Contingency Table Set-up…
Example: computing expected frequencies
                        MBA Major
Undergrad Degree   Accounting   Finance   Marketing   Total
BA                     31          13        16         60
BEng                    8          16         7         31
BBA                    12          10        17         39
Other                  10           5         7         22
Total                  61          44        47        152
For the cell in row 2 (BEng) and column 3 (Marketing):
e23 = (31)(47)/152 = 9.59; compare this to the observed frequency f23 = 7.
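As a rough illustration of the expected-frequency rule (row total x column total / sample size), the following Python sketch applies it to every cell of the MBA table. The variable names are illustrative choices, not something taken from the slides.

```python
# Observed counts from the example: rows are BA, BEng, BBA, Other;
# columns are Accounting, Finance, Marketing.
degrees = ["BA", "BEng", "BBA", "Other"]
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # sample size

# e_ij = (row i total * column j total) / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

for degree, row in zip(degrees, expected):
    print(degree, [round(e, 2) for e in row])
```

The BEng/Marketing entry reproduces the e23 value worked out by hand in the example.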
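Example
We can now compare observed (f) with expected (e) frequencies…
                        MBA Major
Undergrad Degree   Accounting       Finance          Marketing
                   f      e         f      e         f      e
BA                 31   24.08       13   17.37       16   18.55
BEng                8   12.44       16    8.97        7    9.59
BBA                12   15.65       10   11.29       17   12.06
Other              10    8.83        5    6.37        7    6.80
and calculate our test statistic:
$$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \frac{(13 - 17.37)^2}{17.37} + \dots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$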
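The comparison of observed and expected frequencies can be sketched in a few lines of Python. This is a minimal illustration that sums (f − e)²/e over all cells of the example table; the function name `chi_square_statistic` is a hypothetical helper, not a library call.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (f - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Rows: BA, BEng, BBA, Other; columns: Accounting, Finance, Marketing.
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

stat, df = chi_square_statistic(observed)
print(f"chi-square = {stat:.2f}, df = {df}")
```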
Example
We compare χ2 = 14.70 with the critical value:
$$\chi^2_{\alpha,\nu} = \chi^2_{.05,(4-1)(3-1)} = \chi^2_{.05,6} = 12.5916$$
Since our test statistic falls into the rejection region (14.70 > 12.5916), we reject
H0: The two variables are independent.
in favor of
H1: The two variables are dependent.
That is, there is evidence of a relationship between undergrad degree and MBA major.
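The same decision can be checked through a p-value instead of a table lookup. The sketch below relies on the fact that, for an even number of degrees of freedom, the upper-tail chi-square probability has a closed form, so no statistics library is needed. The helper name `chi2_sf_even_df` is an assumption made for this illustration, and the approach does not cover odd degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


alpha = 0.05
p_value = chi2_sf_even_df(14.70, 6)
print(f"p-value = {p_value:.4f}")  # roughly 0.023
print("reject H0" if p_value < alpha else "do not reject H0")

# Sanity check: the tabulated critical value should leave about 5% in the tail.
print(round(chi2_sf_even_df(12.5916, 6), 4))
```

A p-value below α leads to the same conclusion as comparing the statistic with the critical value.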
Required Condition – Rule of Five…
In a contingency table where one or more cells have expected values of less than 5, we need to combine rows or columns to satisfy the rule of five.
Note: combining rows or columns changes r or c, so the degrees of freedom (r–1)(c–1) must be recalculated as well.
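A minimal sketch of how the rule of five might be checked, and how the degrees of freedom change once rows are combined. The helper names are hypothetical, and merging BEng with Other is done purely to illustrate the change in degrees of freedom: the example table does not actually need it.

```python
def expected_frequencies(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below_five(observed):
    """List (row, column, expected) for every cell whose expected value is below 5."""
    expected = expected_frequencies(observed)
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row) if e < 5]


def combine_rows(observed, i, j):
    """Return a new table with rows i and j added together cell by cell."""
    merged = [a + b for a, b in zip(observed[i], observed[j])]
    rest = [row for idx, row in enumerate(observed) if idx not in (i, j)]
    return rest + [merged]


def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)


# Rows: BA, BEng, BBA, Other; columns: Accounting, Finance, Marketing.
observed = [[31, 13, 16], [8, 16, 7], [12, 10, 17], [10, 5, 7]]

print(cells_below_five(observed))         # no cell violates the rule here
combined = combine_rows(observed, 1, 3)   # illustrative merge of BEng and Other
print(degrees_of_freedom(observed), degrees_of_freedom(combined))
```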
Type of Measurement    Differences between three or more independent groups
Interval or ratio      One-way ANOVA
ANOVA
SAMPLE RESULTS OF PACKAGE SALES
SAMPLE OBSERVATION   PACKAGE 1   PACKAGE 2   PACKAGE 3   PACKAGE 4
1                        3           7           3           5
2                        3           5           2           3
3                        4           2           4           5
4                        5           9           6           2
5                        3           7           2           3
6                        3           6           3           3
7                        4           6           1           2
TOTALS                  25          42          21          23
MEANS                  3.6         6.0         3.0         3.3
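ONE WAY ANOVA
(k = number of groups, n = observations per group, GM = grand mean)
$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{GM})^2, \quad df = nk - 1$$
$$SSB = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X}_{GM})^2, \quad df = k - 1$$
$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2, \quad df = k(n - 1)$$
For the package sales data, using the grand mean rounded to 4:
$$SST = (3 - 4)^2 + (5 - 4)^2 + \dots + (2 - 4)^2 = 97$$
$$SSB = 7\left[(3.6 - 4)^2 + (6 - 4)^2 + (3 - 4)^2 + (3.3 - 4)^2\right] \approx 40$$
$$SSW = SST - SSB = 97 - 40 = 57$$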
ANOVA SUMMARY TABLE
SOURCE OF VARIATION   SUM OF SQUARES   Df    MS      F
BETWEEN GROUPS              40          3   13.33   5.52
WITHIN GROUPS               57         24    2.38
TOTAL                       97         27
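The sums of squares can also be computed without rounding the means. The sketch below is one way to do this in plain Python for the package sales data; the function name `one_way_anova` and the dictionary keys are illustrative assumptions. Because no intermediate value is rounded, the results differ slightly from the rounded hand calculation.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares and F."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    msb, msw = ssb / df_between, ssw / df_within
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "df_between": df_between, "df_within": df_within,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Sales for packages 1 to 4, seven observations each.
packages = [
    [3, 3, 4, 5, 3, 3, 4],
    [7, 5, 2, 9, 7, 6, 6],
    [3, 2, 4, 6, 2, 3, 1],
    [5, 3, 5, 2, 3, 3, 2],
]

result = one_way_anova(packages)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```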
PACKAGE SALES: DESCRIPTIVE STATISTICS
Descriptives (SALES)
                                                      95% Confidence Interval for Mean
Package   N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1.00       7   3.5714       .7868           .2974        2.8438        4.2991       3.00      5.00
2.00       7   6.0000      2.1602           .8165        4.0021        7.9979       2.00      9.00
3.00       7   3.0000      1.6330           .6172        1.4897        4.5103       1.00      6.00
4.00       7   3.2857      1.2536           .4738        2.1264        4.4451       2.00      5.00
Total     28   3.9643      1.8951           .3581        3.2295        4.6991       1.00      9.00
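The mean, standard deviation and standard error columns can be reproduced from the raw sales data with Python's standard library. This is a sketch under the assumption that the standard deviation is the sample (n − 1) version and that the standard error is the standard deviation divided by √n; the confidence bounds are left out because they additionally need a t-table value.

```python
import math
import statistics

# Sales for packages 1 to 4, seven observations each.
packages = {
    "1.00": [3, 3, 4, 5, 3, 3, 4],
    "2.00": [7, 5, 2, 9, 7, 6, 6],
    "3.00": [3, 2, 4, 6, 2, 3, 1],
    "4.00": [5, 3, 5, 2, 3, 3, 2],
}
packages["Total"] = [x for values in list(packages.values()) for x in values]


def describe(values):
    """Return N, mean, sample standard deviation, standard error, min and max."""
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "sd": sd,
        "se": sd / math.sqrt(len(values)),
        "min": min(values),
        "max": max(values),
    }


summary = {name: describe(values) for name, values in packages.items()}
for name, row in summary.items():
    print(name, row["N"], f"{row['mean']:.4f}", f"{row['sd']:.4f}",
          f"{row['se']:.4f}", row["min"], row["max"])
```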
PACKAGE SALES: ANOVA SUMMARY TABLE
ANOVA (SALES)
                  Sum of Squares   df   Mean Square     F      Sig.
Between Groups        39.821        3     13.274      5.575    .005
Within Groups         57.143       24      2.381
Total                 96.964       27
THANKS