
Chi-Square and Analysis of Variance (ANOVA) Lecture 9


Page 1: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Chi-Square and Analysis of Variance (ANOVA)

Lecture 9

Page 2: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

The Chi-Square Distribution and Test for Independence

Hypothesis testing between two or more categorical variables

Page 3: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Chi-square Test of Independence

Tests the association between two nominal (categorical) variables. Null hypothesis: the two variables are independent.

It's really just a comparison between expected frequencies and observed frequencies among the cells in a crosstabulation table.

Page 4: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

          Yes          No           Total
Males     46 (40.97)   71 (76.02)   117
Females   37 (42.03)   83 (77.97)   120
Total     83           154          237

Example crosstab: gender x binary question (expected frequencies in parentheses)
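
The expected frequencies in parentheses can be reproduced directly from the row and column totals. A minimal sketch in Python (numpy and scipy assumed; not part of the original slides) that rebuilds the expected counts and runs the chi-square test of independence on this table:

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[46, 71],    # Males:   Yes, No
                         [37, 83]])   # Females: Yes, No

    # Expected count for each cell = (row total * column total) / grand total
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / observed.sum()
    print(expected.round(2))          # matches the parenthesized values above

    # Pearson chi-square without Yates' continuity correction,
    # i.e., the plain sum of (O - E)^2 / E over the four cells
    chi2_stat, p_value, dof, _ = chi2_contingency(observed, correction=False)
    print(chi2_stat, p_value, dof)    # dof = (2 - 1)(2 - 1) = 1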

Page 5: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Degrees of freedom

Chi-square degrees of freedom

df = (r - 1)(c - 1), where r = # of rows, c = # of columns

Thus, in any 2x2 contingency table, the degrees of freedom = 1.

As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
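
As a quick check of that claim, a short sketch (scipy assumed; not part of the original slides) printing the upper-tail 0.05 critical value at a few degrees of freedom:

    from scipy.stats import chi2

    # The 0.05 critical value grows with the degrees of freedom
    for df in (1, 2, 5, 10):
        print(df, round(chi2.ppf(0.95, df), 2))
    # 1 -> 3.84, 2 -> 5.99, 5 -> 11.07, 10 -> 18.31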

Page 6: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Chi-Square Distribution

The chi-square distribution results when independent variables with standard normal distributions are squared and summed.
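
A brief simulation makes that definition concrete. This sketch (numpy and scipy assumed; k and the sample size are arbitrary illustrative choices, not from the lecture) sums k squared standard-normal draws and compares the result to the chi-square distribution with k degrees of freedom:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    k = 3                                    # number of squared normals = df
    sums = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)

    print(sums.mean(), sums.var())           # roughly k and 2k
    print(chi2.mean(k), chi2.var(k))         # theoretical chi-square moments: 3.0, 6.0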

Page 7: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Requirements for Chi-Square test

Must be a random sample from the population

Data must be in raw frequencies

Variables must be independent

Categories for each I.V. must be mutually exclusive and exhaustive

Page 8: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Using the Chi-Square Test

Often used with contingency tables (i.e., crosstabulations); e.g., gender x race

Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table. In this case, the null hypothesis is that there is no relationship between row and column frequencies.

Page 9: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Practical Example:

Expected frequencies versus observed frequencies

General Social Survey Example

Page 10: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

ANOVA and the F-distribution

Hypothesis testing between a 3+ category variable and a metric variable

Page 11: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Analysis of Variance

In its simplest form, it is used to compare means for three or more categories. Example:

Life Happiness scale and Marital Status (married, never married, divorced)

Relies on the F-distribution. Just like the t-distribution and chi-square distribution, there is a different sampling distribution for each possible value of df.

Page 12: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

What is ANOVA?

If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests. The problem is that the 3 tests would not be independent of each other (i.e., all of the information is known).

A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error).

Page 13: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

The F-ratio

F = MS_bg / MS_wg

where MS = mean square, bg = between groups, wg = within groups

Numerator is the “effect” and denominator is the “error”

Numerator df = # of categories - 1 (k - 1); denominator df = N - k
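
To make the ratio concrete, here is a small sketch (Python with scipy; the happiness scores are made up for illustration, not taken from the lecture) computing F for three marital-status groups:

    from scipy.stats import f_oneway

    # Made-up life-happiness scores for three marital-status groups
    married       = [7, 8, 6, 9, 7]
    never_married = [5, 6, 5, 7, 6]
    divorced      = [4, 5, 6, 5, 4]

    F, p = f_oneway(married, never_married, divorced)
    print(F, p)   # F = MS_bg / MS_wg with df = (k - 1, N - k) = (2, 12) here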

Page 14: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Between-Group Sum of Squares (Numerator)

Total variability – Residual variability

Total variability is quantified as the sum of the squares of the differences between each value and the grand mean. Also called the total sum-of-squares.

Variability within groups is quantified as the sum of squares of the differences between each value and its group mean. Also called the residual sum-of-squares.
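
A sketch of that decomposition in code (numpy assumed; same made-up groups as in the earlier F-ratio example), showing that the between-group sum of squares is just the total minus the residual:

    import numpy as np

    groups = [np.array([7, 8, 6, 9, 7]),     # married (made-up scores)
              np.array([5, 6, 5, 7, 6]),     # never married
              np.array([4, 5, 6, 5, 4])]     # divorced
    allvals = np.concatenate(groups)
    grand_mean = allvals.mean()

    ss_total  = ((allvals - grand_mean) ** 2).sum()               # total sum-of-squares
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)  # residual sum-of-squares
    ss_between = ss_total - ss_within                             # between-group (numerator)

    k, N = len(groups), allvals.size
    F = (ss_between / (k - 1)) / (ss_within / (N - k))            # MS_bg / MS_wg
    print(F)   # same F that scipy.stats.f_oneway gives for these groups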

Page 15: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Null Hypothesis in ANOVA

If there is no difference between the means, then the between-group mean square should be about the same as the within-group mean square, so F should be close to 1.

F = MS_bg / MS_wg

Page 16: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

F-distribution

F-test is always a one-tailed test. Why?
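
One way to see why (an explanatory note, not from the slides): F is a ratio of two non-negative mean squares, so it can never be negative, and only unusually large values signal a treatment effect. A short sketch pulling the upper-tail critical value with scipy:

    from scipy.stats import f

    dfn, dfd = 2, 12                        # (k - 1, N - k) from the earlier example
    print(round(f.ppf(0.95, dfn, dfd), 2))  # one-tailed 0.05 critical value, about 3.89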

Page 17: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Logic of the ANOVA

Conceptual Intro to ANOVA

Page 18: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Bringing it all together:

Choosing the appropriate bivariate statistic

Page 19: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Reminder About Causality

Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show that a relationship exists.

Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

Page 20: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Choosing the Appropriate Statistical Test

General rules for choosing a bivariate test (see the sketch after this list):

Two categorical variables: Chi-square (crosstabulations)

Two metric variables: Correlation

One categorical variable with 3+ categories, one metric variable: ANOVA

One binary categorical variable, one metric variable: T-test
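
A rough sketch mapping those rules to common scipy calls (the toy data and the specific functions are illustrative choices, not prescribed by the lecture):

    import numpy as np
    from scipy.stats import chi2_contingency, pearsonr, f_oneway, ttest_ind

    rng = np.random.default_rng(1)

    # Two categorical variables -> chi-square on a crosstab
    print(chi2_contingency([[46, 71], [37, 83]])[1])                   # p-value

    # Two metric variables -> correlation
    x, y = rng.normal(size=50), rng.normal(size=50)
    print(pearsonr(x, y)[1])                                           # p-value

    # One 3+ category variable, one metric variable -> ANOVA
    print(f_oneway(rng.normal(0, 1, 20),
                   rng.normal(0.5, 1, 20),
                   rng.normal(1, 1, 20))[1])                           # p-value

    # One binary categorical variable, one metric variable -> t-test
    print(ttest_ind(rng.normal(0, 1, 20), rng.normal(0.5, 1, 20))[1])  # p-value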

Page 21: Chi-Square and Analysis of Variance (ANOVA) Lecture 9

Assignment #2

Online (course website)

Due next Monday in class (April 10th)