
Univariate, bivariate analysis, hypothesis testing, chi square


Page 1: Univariate, bivariate analysis, hypothesis testing, chi square

Introduction to Data Analysis

• Data Measurement: Measurement of the data is the first step in the process that ultimately guides the final analysis.

• Consideration of sampling, controls, errors (random and systematic) and the required precision all influence the final analysis.

• Validation: Instruments and methods used to measure the data must be validated for accuracy.

• Precision and accuracy: determination of error

• Social vs. Physical Sciences

1

Page 2: Univariate, bivariate analysis, hypothesis testing, chi square

Introduction to Data Analysis

• Types of data

• Univariate/Multivariate
– Univariate: When we use one variable to describe a person, place, or thing (e.g. heights of individuals).
– Multivariate: When we use two or more variables to measure a person, place or thing. Variables may or may not be dependent on each other. (Bivariate e.g. name and marks; Multivariate e.g. name, caste and marks)

• Cross-sectional data/Time-ordered data (business, social sciences)
– Cross-Sectional: Measurements taken at one time period (e.g. caste and per capita income).
– Time-Ordered: Measurements taken over time in chronological sequence (e.g. years and per capita income).

The type of data will dictate (in part) the appropriate data-analysis method.

2

Page 3: Univariate, bivariate analysis, hypothesis testing, chi square

• Measurement Scales

• Nominal or Categorical Scale (e.g. fair, brown and black)
– Classification of people, places, or things into categories (e.g. age ranges, colors, etc.).
– Classifications must be mutually exclusive (every element should belong to one category with no ambiguity).
– Weakest of the four scales. No category is greater than or less (better or worse) than the others. They are just different.

• Ordinal or Ranking Scale
– Classification of people, places, or things into a ranking such that the data is arranged into a meaningful order (e.g. poor, fair, good, excellent).
– Qualitative classification only

Introduction to Data Analysis

3

Page 4: Univariate, bivariate analysis, hypothesis testing, chi square

Introduction to Data Analysis

• Measurement Scales (business, social sciences)

• Interval Scale
– Data classified by ranking.
– Quantitative classification (time, temperature, etc).
– Zero point of scale is arbitrary (differences are meaningful).

• Ratio Scale
– Data classified as the ratio of two numbers.
– Quantitative classification (height, weight, distance, etc).
– Zero point of scale is real (data can be added, subtracted, multiplied, and divided).

4

Page 5: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• Descriptive Statistics
– The Range
– Min/Max
– Average
– Median
– Mode
– Variance
– Standard Deviation
– Histograms and Normal Distributions
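The slides that follow work these out by hand and with Excel functions. As a minimal sketch only (Python and its standard statistics module are assumptions here, not part of the original slides, which use Excel and SPSS), the same quantities can be computed for the 20-score data set used throughout this deck:

import statistics

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

print("Min/Max:", min(scores), max(scores))                       # 45, 100
print("Range:", max(scores) - min(scores))                        # 55
print("Average:", statistics.mean(scores))                        # 68.6 (Excel AVERAGE)
print("Median:", statistics.median(scores))                       # 68.0 (Excel MEDIAN)
print("Mode:", statistics.mode(scores))                           # 74
print("Variance (population):", statistics.pvariance(scores))     # ~181 (Excel VARP)
print("Std deviation (population):", statistics.pstdev(scores))   # ~13.5 (Excel STDEVP)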

5

Page 6: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Histograms

• Distributions
– Descriptive statistics are easier to interpret when graphically illustrated.
– However, charting each data element can lead to very busy and confusing charts that do not help interpret the data.
– Grouping the data elements into categories and charting the frequency within these categories yields a graphical illustration of how the data is distributed throughout its range.

6

Page 7: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Histograms

[Column chart plotting each of the 20 data values individually: x-axis items 1-20, y-axis "Data Values" from 0 to 120]

With just a few columns this chart is difficult to interpret. It tells you very little about the data set. Even finding the Min and Max can be difficult.

The data can be presented such that more statistical parameters can be estimated from the chart (average, standard deviation).

7

Page 8: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Histograms

• Frequency Table
– The first step is to decide on the categories and group the data appropriately.

(45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100)

Category Labels Frequency

0-50 3

51-60 2

61-70 6

71-80 5

81-90 3

>90 1
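As a small illustration (assumed, not part of the original slides), the same grouping can be done in a few lines of Python using the slide's category boundaries:

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

bins = [(0, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, float("inf"))]
labels = ["0-50", "51-60", "61-70", "71-80", "81-90", ">90"]

for (lo, hi), label in zip(bins, labels):
    freq = sum(1 for s in scores if lo <= s <= hi)   # count scores falling in this category
    print(f"{label:>6}: {freq}")
# 0-50: 3, 51-60: 2, 61-70: 6, 71-80: 5, 81-90: 3, >90: 1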

8

Page 9: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Histograms

• Histogram
– A histogram is simply a column chart of the frequency table.

Category Labels Frequency

0-50 3

51-60 2

61-70 6

71-80 5

81-90 3

>90 1

[Histogram of the frequency table: x-axis "Scores" with categories 0-50, 51-60, 61-70, 71-80, 81-90, >90; y-axis "Frequency" from 0 to 7]

9

Page 10: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Histograms

• Histogram

[Same histogram, annotated with the Average (68.6) and Median (68), the Mode (74), and the −1 SD and +1 SD limits]

10

Page 11: Univariate, bivariate analysis, hypothesis testing, chi square


Univariate Analysis/Normal Distributions

• Distributions that can be described mathematically as Gaussian are also called Normal

• The Bell curve
– Symmetrical
– Mean ≈ Median
– Mean, Median, and Mode coincide at the center of the curve

11

Page 12: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Skewed Distributions

• When data are skewed, the mean and SD can be misleading

• Skewness: sk = 3(mean − median)/SD. If |sk| > 1, the distribution is non-symmetrical.
• Negatively skewed
– Mean < Median
– sk is negative
• Positively skewed
– Mean > Median
– sk is positive
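As an illustrative sketch (Python assumed, not part of the slides), the skewness coefficient defined above can be computed for the 20-score data set from the earlier slides; the small positive value indicates a roughly symmetrical distribution:

import statistics

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

mean = statistics.mean(scores)        # 68.6
median = statistics.median(scores)    # 68.0
sd = statistics.pstdev(scores)        # ~13.5

sk = 3 * (mean - median) / sd
print(round(sk, 2))   # ~0.13, so |sk| < 1: roughly symmetrical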

[Two example density plots of skewed distributions]

12

Page 13: Univariate, bivariate analysis, hypothesis testing, chi square

Central Limit Theorem

• Regardless of the shape of a distribution, the distribution of the sample mean based on samples of size N approaches a normal curve as N increases.
– N must be smaller than the population being sampled.

[Illustration: sampling distribution of the mean for samples of size N = 10]
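A quick simulation makes this concrete. The sketch below is an assumption (not from the slides): it draws repeated samples of size N = 10 from a strongly skewed population and summarizes the resulting sampling distribution of the mean, whose center matches the population mean and whose spread is roughly the population SD divided by the square root of N:

import random
import statistics

random.seed(1)
N = 10                                                             # sample size, as in the slide's figure
population = [random.expovariate(1.0) for _ in range(100_000)]    # heavily skewed population

# Draw many samples of size N and record each sample mean.
sample_means = [statistics.mean(random.sample(population, N)) for _ in range(5_000)]

print("population mean:     ", round(statistics.mean(population), 3))      # ~1.0
print("population SD:       ", round(statistics.pstdev(population), 3))    # ~1.0
print("mean of sample means:", round(statistics.mean(sample_means), 3))    # ~1.0
print("SD of sample means:  ", round(statistics.pstdev(sample_means), 3))  # ~1.0 / sqrt(10) ≈ 0.32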

13

Page 14: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• The Range
– Difference between minimum and maximum values in a data set
– Larger range usually (but not always) indicates a large spread or deviation in the values of the data set.

(73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)

14

Page 15: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• The Average (Mean)
– Sum of all values divided by the number of values in the data set.
– One measure of central location in the data set.

Average = (1/N) Σ m_i   (sum over i = 1 to N)

Average = (73+66+69+67+49+60+81+71+78+62+53+87+74+65+74+50+85+45+63+100)/20 = 68.6

Excel function: AVERAGE()

15

Page 16: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

[Two dot plots of values on a 0-10 scale, each with average 4.8: one set symmetrical around the average, the other not]

The data may or may not be symmetrical around its average value

16

Page 17: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• The Median
– The middle value in a sorted data set. Half the values are greater and half are less than the median.
– Another measure of central location in the data set.

(45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100)
Median: 68 (average of the two middle values, 67 and 69)

(1, 2, 4, 7, 8, 9, 9)
Median: 7

– Excel function: MEDIAN()

17

Page 18: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• The Median
– May or may not be close to the mean.
– The combination of mean and median is used to define the skewness of a distribution.

[Dot plot of values on a 0-10 scale with a marked value of 6.25]

18

Page 19: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• The Mode
– Most frequently occurring value.
– Another measure of central location in the data set.
– (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100)
– Mode: 74
– Generally not all that meaningful unless a large percentage of the values are the same number.

19

Page 20: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• Variance
– One measure of dispersion (deviation from the mean) of a data set. The larger the variance, the greater the average deviation of each datum from the average value.

Variance = (1/N) Σ (m_i − m̄)²,  where m̄ is the average value of the data set

Variance = [(45 − 68.6)² + (49 − 68.6)² + (50 − 68.6)² + (53 − 68.6)² + …]/20 = 181

Excel Functions: VARP(), VAR()

20

Page 21: Univariate, bivariate analysis, hypothesis testing, chi square

Univariate Analysis/Descriptive Statistics

• Standard Deviation
– Square root of the variance. Can be thought of as the average deviation from the mean of a data set.
– The magnitude of the number is more in line with the values in the data set.

Standard Deviation = ([(45 − 68.6)² + (49 − 68.6)² + (50 − 68.6)² + (53 − 68.6)² + …]/20)^(1/2) = 13.5

Excel Functions: STDEVP(), STDEV()

21

Page 22: Univariate, bivariate analysis, hypothesis testing, chi square

Bivariate Analysis

Cross-tabulation and chi-square

22

Page 23: Univariate, bivariate analysis, hypothesis testing, chi square

So far the statistical methods we have used only permit us to:

• Look at the frequency in which certain numbers or categories occur.

• Look at measures of central tendency such as means, modes, and medians for one variable.

• Look at measures of dispersion such as standard deviation and z scores for one interval or ratio level variable.

23

Page 24: Univariate, bivariate analysis, hypothesis testing, chi square

Bivariate analysis allows us to:

• Look at associations/relationships between two variables.

• Look at measures of the strength of the relationship between two variables.

• Test hypotheses about relationships between two nominal or ordinal level variables.

24

Page 25: Univariate, bivariate analysis, hypothesis testing, chi square

For example, what does this table tell us about opinions on welfare by gender?

Support cutting welfare benefits for immigrants:

         Male    Female
Yes      15      5
No       10      20
Total    25      25

25

Page 26: Univariate, bivariate analysis, hypothesis testing, chi square

Are frequencies sufficient to allow us to make comparisons about groups?

What other information do we need?

26

Page 27: Univariate, bivariate analysis, hypothesis testing, chi square

Is this table more helpful?

Benefits for Immigrants    Male          Female
Yes                        15 (60%)      5 (20%)
No                         10 (40%)      20 (80%)
Total                      25 (100%)     25 (100%)

27

Page 28: Univariate, bivariate analysis, hypothesis testing, chi square

How would you write a sentence or two to describe what is in this table?

28

Page 29: Univariate, bivariate analysis, hypothesis testing, chi square

Rules for cross-tabulation

• Calculate either column or row percents.

• Calculations are the number of frequencies in a cell of a table divided by the total number of frequencies in that column or row, for example 20/25 = 80.0%

• All percentages in a column or row should total 100%.
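These rules can be checked quickly in software. The sketch below uses pandas (an assumption; the slides themselves use SPSS) to rebuild the welfare-benefits-by-gender table from the earlier slide and compute column percentages:

import pandas as pd

# Reconstruct the 50 respondents implied by the earlier welfare table.
df = pd.DataFrame({
    "gender":  ["Male"] * 25 + ["Female"] * 25,
    "support": ["Yes"] * 15 + ["No"] * 10 + ["Yes"] * 5 + ["No"] * 20,
})

counts = pd.crosstab(df["support"], df["gender"], margins=True)                  # cell frequencies
col_pct = pd.crosstab(df["support"], df["gender"], normalize="columns") * 100    # column percents

print(counts)
print(col_pct.round(1))   # Male: Yes 60.0 / No 40.0; Female: Yes 20.0 / No 80.0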

29

Page 30: Univariate, bivariate analysis, hypothesis testing, chi square

Let’s look at another example – social work degrees by gender

Social Work Degree    Male           Female
BA                    20 (33.3%)     20 (    %)
MSW                   30 (      )    70 (70.0%)
Ph.D.                 10 (16.7%)     10 (10.0%)
Total                 60 (100.0%)    100 (100.0%)

30

Page 31: Univariate, bivariate analysis, hypothesis testing, chi square

Questions:

What group had the largest percentage of Ph.Ds?

What are the ways in which you could find the missing numbers?

Is it obvious why you would use percentages to make comparisons among two or more groups?

31

Page 32: Univariate, bivariate analysis, hypothesis testing, chi square

In the following table, were people with drug, alcohol, or a combination of both most likely to be referred for individual treatment?

Services               Alcohol      Drugs        Both
Individual Treatment   10 (25%)     30 (60%)     5 (50%)
Group Treatment        10 (25%)     10 (20%)     2 (20%)
AA                     20 (50%)     10 (20%)     3 (30%)
Total                  40 (100%)    50 (100%)    10 (100%)

32

Page 33: Univariate, bivariate analysis, hypothesis testing, chi square

Use the same table to answer the following question:

How much more likely are people with alcohol problems alone to be referred to AA than people with drug problems or a combination of drug and alcohol problems?

33

Page 34: Univariate, bivariate analysis, hypothesis testing, chi square

We use cross-tabulation when:

• We want to look at relationships among two or three variables.

• We want a descriptive statistical measure to tell us whether differences among groups are large enough to indicate some sort of relationship among variables.

34

Page 35: Univariate, bivariate analysis, hypothesis testing, chi square

Cross-tabs are not sufficient to:

• Tell us the strength or actual size of the relationships among two or three variables.

• Test a hypothesis about the relationship between two or three variables.

• Tell us the direction of the relationship among two or more variables.

• Look at relationships between one nominal or ordinal variable and one ratio or interval variable unless the range of possible values for the ratio or interval variable is small. What do you think a table with a large number of ratio values would look like?

35

Page 36: Univariate, bivariate analysis, hypothesis testing, chi square

We can use cross-tabs to visually assess whether independent and

dependent variables might be related. In addition, we also use

cross-tabs to find out if demographic variables such as gender and ethnicity are related

to the second variable. 36

Page 37: Univariate, bivariate analysis, hypothesis testing, chi square

For example, gender may determine if someone votes Democratic or Republican or if income is high, medium, or low. Ethnicity might be related to where someone lives or attitudes about whether undocumented workers should receive driver’s licenses.

37

Page 38: Univariate, bivariate analysis, hypothesis testing, chi square

Because we use tables in these ways, we can set up some decision rules about how to use tables.

• Independent variables should be column variables.
• If you are not looking at independent and dependent variable relationships, use the variable that can logically be said to influence the other as your column variable.

• Using this rule, always calculate column percentages rather than row percentages.

• Use the column percentages to interpret your results.

38

Page 39: Univariate, bivariate analysis, hypothesis testing, chi square

For example,

• If we were looking at the relationship between gender and income, gender would be the column variable and income would be the row variable. Logically gender can determine income. Income does not determine your gender.

• If we were looking at the relationship between ethnicity and location of a person’s home, ethnicity would be the column variable.

• However, if we were looking at the relationship between gender and ethnicity, one does not influence the other. Either variable could be the column variable.

39

Page 40: Univariate, bivariate analysis, hypothesis testing, chi square

SPSS will allow you to choose a column variable and row variable and whether or not your table will include column or row percents.

40

Page 41: Univariate, bivariate analysis, hypothesis testing, chi square

You must use an additional statistic, chi-square, if you want to:

• Test a hypothesis about two variables.
• Look at the strength of the relationship between an independent and dependent variable.
• Determine whether the relationship between the two variables is large enough to rule out random chance or sampling error as reasons that there appears to be a relationship between the two variables.

41

Page 42: Univariate, bivariate analysis, hypothesis testing, chi square

Chi-square is simply an extension of a cross-tabulation that gives you more information about the relationship.

However, it provides no information about the direction of the relationship (positive or negative) between the two variables.

42

Page 43: Univariate, bivariate analysis, hypothesis testing, chi square

Let’s use the following table to test a hypothesis:

                          Education
Income                    High    Low    Total
High (Above $40,000)      40             50
Low ($39,999 or less)                    50
Total                     50      50     100

43

Page 44: Univariate, bivariate analysis, hypothesis testing, chi square

I have not filled in all of the information because we need to talk about two concepts before we start calculations:

• Degrees of Freedom: In any table, there are a limited number of choices for the values in each cell.

• Marginals: Total frequencies in columns and rows.

44

Page 45: Univariate, bivariate analysis, hypothesis testing, chi square

Let’s look at the number of choices we have in the previous table:

                          Education
Income                    High    Low    Total
High (Above $40,000)      40             50
Low ($39,999 or less)                    50
Total                     50      50     100

45

Page 46: Univariate, bivariate analysis, hypothesis testing, chi square

So the table becomes:

                          Education
Income                    High    Low    Total
High (Above $40,000)      40      10     50
Low ($39,999 or less)     10      40     50
Total                     50      50     100

46

Page 47: Univariate, bivariate analysis, hypothesis testing, chi square

The rules for determining degrees of freedom in cross-tabulations or contingency tables:

• In any two by two tables (two columns, two rows, excluding marginals) DF = 1.

• For all other tables, calculate DF as:

(c -1 ) * (r-1) where c = columns and r = rows.

( So for a table with 3 columns and 4 rows, DF = ____. )

47

Page 48: Univariate, bivariate analysis, hypothesis testing, chi square

Importance of Degrees of Freedom

• You will see degrees of freedom on your SPSS print out.

• Most types of inferential statistics use DF in calculations.

• In chi-square, we need to know DF if we are calculating chi-square by hand. You must use the value of the chi-square and DF to determine if the chi-square value is large enough to be statistically significant (consult chi-square table in most statistics books).

48

Page 49: Univariate, bivariate analysis, hypothesis testing, chi square

Hypothesis Testing

• Goal: Make statement(s) regarding unknown population parameter values based on sample data

• Elements of a hypothesis test:
– Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality)

– Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality)

– Test statistic - Quantity based on sample data and null hypothesis used to test between null and alternative hypotheses

– Rejection region - Values of the test statistic for which we reject the null in favor of the alternative hypothesis

49

Page 50: Univariate, bivariate analysis, hypothesis testing, chi square

Hypothesis Testing

                    Test Result:          Test Result:
True State          Conclude H0 True      Conclude H0 False
H0 True             Correct Decision      Type I Error
H0 False            Type II Error         Correct Decision

α = P(Type I Error)      β = P(Type II Error)

• Goal: Keep α, β reasonably small

50

Page 51: Univariate, bivariate analysis, hypothesis testing, chi square

Example - Efficacy Test for New drug

• Drug company has new drug, wishes to compare it with current standard treatment

• Federal regulators tell company that they must demonstrate that new drug is better than current treatment to receive approval

• Firm runs clinical trial where some patients receive new drug, and others receive standard treatment

• Numeric response of therapeutic effect is obtained (higher scores are better).

• Parameter of interest: μNew − μStd

51

Page 52: Univariate, bivariate analysis, hypothesis testing, chi square

Example - Efficacy Test for New drug

• Null hypothesis - New drug is no better than standard trt

H0: μNew − μStd = 0

• Alternative hypothesis - New drug is better than standard trt

HA: μNew − μStd > 0

• Experimental (Sample) data:

ȳNew, ȳStd    (sample means)
sNew, sStd    (sample standard deviations)
nNew, nStd    (sample sizes)

52

Page 53: Univariate, bivariate analysis, hypothesis testing, chi square

Sampling Distribution of Difference in Means

• In large samples, the difference in two sample means is approximately normally distributed: N= Normal distribution, with a mean and SD

Ȳ1 − Ȳ2 ~ N( μ1 − μ2 , sqrt(σ1²/n1 + σ2²/n2) )

• Under the null hypothesis, μ1 − μ2 = 0 and:

Z = (Ȳ1 − Ȳ2) / sqrt(σ1²/n1 + σ2²/n2) ~ N(0, 1)

53

Page 54: Univariate, bivariate analysis, hypothesis testing, chi square

Example - Efficacy Test for New drug

• Type I error - Concluding that the new drug is better than the standard (HA) when in fact it is no better (H0). Ineffective drug is deemed better.

– Traditionally α = P(Type I error) = 0.05

• Type II error - Failing to conclude that the new drug is better (HA) when in fact it is. Effective drug is deemed to be no better.

– Traditionally a clinically important difference (Δ) is assigned and sample sizes chosen so that:

β = P(Type II error | μ1 − μ2 = Δ) ≤ 0.20

54

Page 55: Univariate, bivariate analysis, hypothesis testing, chi square

Elements of a Hypothesis Test

• Test Statistic - Difference between the Sample means, scaled to number of standard deviations (standard errors) from the null difference of 0 for the Population means:

T.S.:  z_obs = (ȳ1 − ȳ2) / sqrt(s1²/n1 + s2²/n2)

• Rejection Region - Set of values of the test statistic that are consistent with HA, such that the probability it falls in this region when H0 is true is α (we will always set α = 0.05)

R.R.:  z_obs ≥ z_α = z_0.05 = 1.645

55

Page 56: Univariate, bivariate analysis, hypothesis testing, chi square

P-value (aka Observed Significance Level)

• P-value - Measure of the strength of evidence the sample data provides against the null hypothesis:

P(evidence this strong or stronger against H0 | H0 is true)

P-val = P(Z ≥ z_obs)

56

Page 57: Univariate, bivariate analysis, hypothesis testing, chi square

Large-Sample Test, H0: μ1 − μ2 = 0 vs HA: μ1 − μ2 > 0

• H0: μ1 − μ2 = 0 (no difference in population means)

• HA: μ1 − μ2 > 0 (Population Mean 1 > Population Mean 2)

T.S.:  z_obs = (ȳ1 − ȳ2) / sqrt(s1²/n1 + s2²/n2)   [Test Statistic]

R.R.:  z_obs ≥ z_α   [Rejection Region]

P-value:  P(Z ≥ z_obs)   [probability value]

• Conclusion - Reject H0 if the test statistic falls in the rejection region, or equivalently if the P-value is ≤ α

57

Page 58: Univariate, bivariate analysis, hypothesis testing, chi square

Example - Botox for Cervical Dystonia

• Patients - Individuals suffering from cervical dystonia
• Response - Tsui score of severity of cervical dystonia (higher scores are more severe) at week 8 of Tx
• Research (alternative) hypothesis - Botox A decreases mean Tsui score more than placebo
• Groups - Placebo (Group 1) and Botox A (Group 2)
• Experimental (Sample) Results:

Group 1 (Placebo):  ȳ1 = 10.1,  s1 = 3.6,  n1 = 33
Group 2 (Botox A):  ȳ2 = 7.7,   s2 = 3.4,  n2 = 35

Source: Wissel, et al. (2001)

58

Page 59: Univariate, bivariate analysis, hypothesis testing, chi square

Example - Botox for Cervical Dystonia

Test whether Botox A produces lower mean Tsui scores than placebo (α = 0.05)

H0: μ1 − μ2 = 0    HA: μ1 − μ2 > 0

T.S.:  z_obs = (10.1 − 7.7) / sqrt((3.6)²/33 + (3.4)²/35) = 2.4 / 0.85 = 2.82

R.R.:  z_obs ≥ z_0.05 = 1.645

P-val:  P(Z ≥ 2.82) = 0.0024

Conclusion: Botox A produces lower mean Tsui scores than placebo (since 2.82 > 1.645 and P-value < 0.05).

If there were truly no difference, a difference this large would occur by chance only 0.24% of the time. Hence Botox A is better.
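As a sketch of the same calculation in software (SciPy is an assumption; the slides work the numbers by hand), the z statistic and one-sided P-value for the Botox data can be computed as:

from math import sqrt
from scipy.stats import norm

y1, s1, n1 = 10.1, 3.6, 33   # Placebo
y2, s2, n2 = 7.7, 3.4, 35    # Botox A

se = sqrt(s1**2 / n1 + s2**2 / n2)     # standard error, ~0.85
z_obs = (y1 - y2) / se                 # ~2.82
p_value = 1 - norm.cdf(z_obs)          # one-sided, ~0.0024

print(round(z_obs, 2), round(p_value, 4))
print("Reject H0" if z_obs >= norm.ppf(0.95) else "Do not reject H0")   # 1.645 cutoff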

59

Page 60: Univariate, bivariate analysis, hypothesis testing, chi square

2-Sided Tests

• Many studies don’t assume a direction with respect to the difference μ1 − μ2

• H0: μ1 − μ2 = 0    HA: μ1 − μ2 ≠ 0

• Test statistic is the same as before
• Decision Rule:
– Conclude μ1 − μ2 > 0 if z_obs ≥ z_{α/2} (for α = 0.05, z_{α/2} = 1.96)
– Conclude μ1 − μ2 < 0 if z_obs ≤ −z_{α/2} (= −1.96)
– Do not reject μ1 − μ2 = 0 if −z_{α/2} ≤ z_obs ≤ z_{α/2}

• P-value: 2·P(Z ≥ |z_obs|)

60

Page 61: Univariate, bivariate analysis, hypothesis testing, chi square

Power of a Test

• Power - Probability a test rejects H0 (depends on μ1 − μ2)
– H0 True: Power = P(Type I error) = α
– H0 False: Power = 1 − P(Type II error) = 1 − β

• Example:
– H0: μ1 − μ2 = 0    HA: μ1 − μ2 > 0
– σ1 = σ2 = 5,  n1 = n2 = 25

• Decision Rule: Reject H0 (at the α = 0.05 significance level) if:

z_obs = (ȳ1 − ȳ2) / sqrt(σ1²/n1 + σ2²/n2) = (ȳ1 − ȳ2) / 1.414 ≥ 1.645

i.e. if  ȳ1 − ȳ2 ≥ 1.414 × 1.645 = 2.326

61

Page 62: Univariate, bivariate analysis, hypothesis testing, chi square

Power of a Test

• Now suppose in reality that 1-2 = 3.0 (HA is true)

• Power now refers to the probability we (correctly) reject the null hypothesis. Note that the sampling distribution of the difference in sample means is approximately normal, with mean 3.0 and standard deviation (standard error) 1.414.

• Decision Rule (from last slide): Conclude population means differ if the sample mean for group 1 is at least 2.326 higher than the sample mean for group 2

• Power for this case can be computed as:

Power = P(Ȳ1 − Ȳ2 ≥ 2.326),  where  Ȳ1 − Ȳ2 ~ N(3.0, 1.414)
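As an illustrative check (SciPy assumed, not part of the slides), this probability can be evaluated directly from the normal distribution stated above:

from scipy.stats import norm

cutoff, true_diff, se = 2.326, 3.0, 1.414
power = 1 - norm.cdf(cutoff, loc=true_diff, scale=se)   # P(difference exceeds the cutoff)
print(round(power, 2))   # roughly 0.68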

62

Page 63: Univariate, bivariate analysis, hypothesis testing, chi square

Power of a Test

• All else being equal:

• As sample sizes increase, power increases

• As population variances decrease, power increases

• As the true mean difference increases, power increases

63

Page 64: Univariate, bivariate analysis, hypothesis testing, chi square

Power of a Test

[Figure: sampling distribution under H0 and sampling distribution under HA]

64

Page 65: Univariate, bivariate analysis, hypothesis testing, chi square

Power of a Test

Power curves for group sample sizes of 25, 50, 75, 100 and varying true values of μ1 − μ2, with σ1 = σ2 = 5.

• For a given μ1 − μ2, power increases with sample size

• For a given sample size, power increases with μ1 − μ2

65

Page 66: Univariate, bivariate analysis, hypothesis testing, chi square

Steps in testing a hypothesis:

• State the research hypothesis

• State the null hypothesis

• Choose a level of statistical significance (alpha level)

• Select and compute the test statistic

• Make a decision regarding whether to accept or reject the null hypothesis.

66

Page 67: Univariate, bivariate analysis, hypothesis testing, chi square

Calculating Chi-Square

• Formula is  χ² = Σ [ (O − E)² / E ]

Where O is the observed value in a cell

E is the expected value in the same cell that we would see if there was no association

67

Page 68: Univariate, bivariate analysis, hypothesis testing, chi square

First steps

Alternative hypothesis is: There is a relationship between income level and education for respondents in a survey of BA students.

Null hypothesis is: There is no relationship between income level and education for respondents in a survey of BA students

Significance level (alpha) set at 0.05

68

Page 69: Univariate, bivariate analysis, hypothesis testing, chi square

Rules for determining whether the chi-square statistic and probability are large enough to verify a relationship.

• For hand calculations, use the degree(s) of freedom and the confidence level you set to check the Chi-square table found in most statistics books. For the chi-square to be statistically significant, it must be the same size or larger than the number in the table.

• On an SPSS print out, the p. or significance value must be the same size or smaller than your significance level.

69

Page 70: Univariate, bivariate analysis, hypothesis testing, chi square

The formula for expected values is E = (R × C) / N, where R is the row total, C is the column total, and N is the total number of cases.

                          Education
Income                    High    Low    Total
High (Above $40,000)      25      25     50
Low ($39,999 or less)     25      25     50
Total                     50      50     100

70

Page 71: Univariate, bivariate analysis, hypothesis testing, chi square

Go back to our first table

                          Education
Income                    High    Low    Total
High (Above $40,000)      40      10     50
Low ($39,999 or less)     10      40     50
Total                     50      50     100

71

Page 72: Univariate, bivariate analysis, hypothesis testing, chi square

The chi-square calculation is:

         Expected Values        Chi-square
Cell 1   50 * 50/100 = 25       (40 − 25)²/25 = 9
Cell 2   50 * 50/100 = 25       (10 − 25)²/25 = 9
Cell 3   50 * 50/100 = 25       (10 − 25)²/25 = 9
Cell 4   50 * 50/100 = 25       (40 − 25)²/25 = 9

Chi-square = 9 + 9 + 9 + 9 = 36

At 0.05 with df = 1, the chi-square must be larger than 3.84 to be statistically significant.

72
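The same test can be run in software as a check on the hand calculation. The sketch below (SciPy is an assumption; the slides use hand calculation and SPSS) reproduces the chi-square of 36, the expected counts of 25, and the 3.84 critical value:

from scipy.stats import chi2, chi2_contingency

observed = [[40, 10],    # High income: High education, Low education
            [10, 40]]    # Low income

chi2_stat, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2_stat, dof, p)       # 36.0, 1, p well below 0.05
print(expected)                # all cells 25, matching the hand calculation
print(chi2.ppf(0.95, df=1))    # ~3.84 critical value used above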

Page 73: Univariate, bivariate analysis, hypothesis testing, chi square

Chi-Square Table

73

Page 74: Univariate, bivariate analysis, hypothesis testing, chi square

Let’s calculate another chi-square - service receipt by location of residence

Service    Urban    Rural    Total
Yes        20       40       60
No         30       10       40
Total      50       50       100

74

Page 75: Univariate, bivariate analysis, hypothesis testing, chi square

For this table,

• DF = 1

• Alternative hypothesis:

Receiving service is associated with location of residence.

Null hypothesis:

There is no association between receiving service and location of residence.

75

Page 76: Univariate, bivariate analysis, hypothesis testing, chi square

Calculations for chi-square are

         Expected Values        Chi-square
Cell 1   50 * 60/100 = 30       (20 − 30)²/30 = 3.33
Cell 2   50 * 40/100 = 20       (30 − 20)²/20 = 5.00
Cell 3   50 * 60/100 = 30       (40 − 30)²/30 = 3.33
Cell 4   50 * 40/100 = 20       (10 − 20)²/20 = 5.00

Chi-square = 16.67

At 1 DF and the 0.01 level, the chi-square must be greater than 6.64. Do we accept or reject the null hypothesis?

76

Page 77: Univariate, bivariate analysis, hypothesis testing, chi square

Running chi-square in SPSS

• Select descriptive statistics
• Select cross-tabulation
• Highlight your independent variable and click on the arrow.
• Highlight your dependent variable and click on the arrow.
• Select Cells
• Choose column percents
• Click continue
• Select statistics
• Select chi-square
• Click continue
• Click ok

77