4 normal probability plots at once

par(mfrow=c(2,2))    # 2 x 2 grid of plots
for(i in 1:4) {
    # column 1 holds the data, column 2 the group number (1 to 4)
    qqnorm(dataframe[,1][dataframe[,2]==i], ylab="Data quantiles")
    title(paste("yourchoice", i, sep=""))
}

These plots can be produced by going to “file”, then “new”, then “script file”. Paste the commands into the script file window, press “F10”, and the four plots are produced automatically.

4 histograms all at once

Same as above, but instead of qqnorm, use hist, and you only need one column rather than dataframe columns 1 and 2. Also, don’t forget to change your label.
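A minimal sketch of the histogram version. It assumes, purely for illustration, that the four variables sit in columns 1 to 4 of a made-up data frame. The simulated data, the column names, and the "Data values" label are not from the slides.

```r
# Hypothetical data: four numeric columns, one per histogram (an assumption).
set.seed(1)
dataframe <- data.frame(a = rnorm(50), b = rnorm(50),
                        c = rnorm(50), d = rnorm(50))

par(mfrow = c(2, 2))                      # 2 x 2 grid of plots
for (i in 1:4) {
  hist(dataframe[, i],
       xlab = "Data values",              # label changed from "Data quantiles"
       main = paste("yourchoice", i, sep = ""))
}
par(mfrow = c(1, 1))                      # back to one plot per page
```

If the data are instead stacked in one column with a group column, as in the qqnorm loop, the same subsetting used there can be passed to hist.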

Lab: Chi-Squared Test (X²) Lack of Fit

November 10, 2000

History

Invented in 1900
Oldest inference procedure still used in its original form
English statistician Karl Pearson

The X² Test

When you have data values for two categorical variables
Also called a two-way table
For example: men/women and NSOE track; regenerated seaweed (yes/no) and access level (limpet only/limpet and fish/etc).

Example: Why do Men and Women Participate in Sports?

Desire to win or do better than others: called social comparison
Desire to improve one’s skills or to do one’s best: called mastery

Data

Collected from 67 male and 67 female undergraduate students at a large university
Survey given asking about students’ sports goals.
Students were all categorized as either high or low with regard to both of the questions:
– high or low social comparison
– high or low mastery

Duda, Joan L., Leisure Sciences, 10 (1988), pp. 95-106

Groups

This leads to four groups:
– High social comparison, high mastery (HSC-HM)
– High social comparison, low mastery (HSC-LM)
– Low social comparison, high mastery (LSC-HM)
– Low social comparison, low mastery (LSC-LM)

We want to compare this for men and women.

Observed Counts for Sports Goals

Goal      Female   Male
HSC-HM        14     31
HSC-LM         7     18
LSC-HM        21      5
LSC-LM        25     13
Total         67     67

1. Add Totals

Observed Counts for Sports Goals

Goal      Female   Male   Total
HSC-HM        14     31      45
HSC-LM         7     18      25
LSC-HM        21      5      26
LSC-LM        25     13      38
Total         67     67     134

Column: in this case, which population the observation comes from.
Row: categorical response variable.
Grand total: 134, in the bottom-right corner.
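The table with totals can be rebuilt as follows. This is a sketch under the assumption that the counts are typed in by hand. The object names counts and with_totals are illustrative, and the second row is labelled HSC-LM to match the column-percent table.

```r
# Observed counts from the slide (rows = sports goal, columns = sex).
counts <- matrix(c(14, 31,
                    7, 18,
                   21,  5,
                   25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

# addmargins() appends a "Sum" row and a "Sum" column holding the totals.
with_totals <- addmargins(as.table(counts))
print(with_totals)
```

The "Sum" entries should match the Total row and column of the slide, with the grand total in the bottom-right corner.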

A Cell

A table with r rows and c columns contains r × c cells. The sports goals table has 4 rows and 2 columns, so it has 8 cells.

X² is really an analysis of 5 things in this table:

– Frequency (actual count)
– Percent of overall total
– Percent of row
– Percent of column
– Expected count

Frequency: Just the cell count

Overall Percent: Cell count divided by grand total

14/134 = 0.104. That is, 10.4% of all those studied were HSC-HM and female.

Row Percent: Cell count divided by row total

14/45 = 0.311. That is, of all those students reporting HSC-HM, 31% were female.

Column Percent: Cell count divided by column total

14/67 = 0.209. That is, of all female student participants, 21% were HSC-HM.

Expected Count

Coming later to a slide near you...

These percents are useful in graphical analysis.
Overall, row, and column percents can be calculated for each cell.
Then questions of interest can be asked.
We are interested in the effect of sex on sports goals.
In this case, we would examine the column percents.
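The three kinds of percent can be computed for every cell at once. The sketch below uses base R's prop.table on a hand-entered copy of the observed counts. The object names are illustrative, not taken from the lab.

```r
# Observed counts (rows = sports goal, columns = sex).
counts <- matrix(c(14, 31, 7, 18, 21, 5, 25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

overall_pct <- prop.table(counts)               # cell count / grand total
row_pct     <- prop.table(counts, margin = 1)   # cell count / row total
col_pct     <- prop.table(counts, margin = 2)   # cell count / column total

# The HSC-HM, Female cell worked through on the slides:
round(overall_pct["HSC-HM", "Female"], 3)   # 14/134
round(row_pct["HSC-HM", "Female"], 3)       # 14/45
round(col_pct["HSC-HM", "Female"], 3)       # 14/67

# Column percents as whole numbers, for comparing the sexes.
round(100 * col_pct)
```

Because sex defines the columns, margin = 2 gives the distribution of goals within each sex, which is the comparison of interest here.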

Column percents for sports goals

Goal      Female   Male
HSC-HM        21     46
HSC-LM        10     27
LSC-HM        31      7
LSC-LM        37     19
Total        100    100

[Bar chart: column percents (axis 0 to 50) for Female and Male; legend: HSC-HM, HSC-LM, LSC-HM, LSC-LM]
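A chart of this kind can be sketched with base R's barplot. The layout here (grouped bars, one group per sex) is an assumption about what the original figure looked like. The axis label and object names are illustrative.

```r
# Observed counts (rows = sports goal, columns = sex).
counts <- matrix(c(14, 31, 7, 18, 21, 5, 25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

# Column percents: distribution of goals within each sex.
col_pct <- 100 * prop.table(counts, margin = 2)

# Grouped bars: one group per sex, one bar per goal.
mids <- barplot(col_pct, beside = TRUE, ylim = c(0, 50),
                ylab = "Column percent",
                legend.text = rownames(col_pct))
```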

Surprise, surprise - we want to ask whether these apparently obvious differences are significant.
Can these differences be attributed to chance?
Calculate the chi-square and compare to a chi-square distribution
Determine the p-value
A low p-value means we reject our null hypothesis (sound familiar?)

The hypotheses: Null

No association exists between our row and our column variables
– No association exists between sex and sports goals
– The distributions of sports goals in the male and female populations are the same.

The hypotheses: Alternative

Alternative: An association exists between the row and column variables
– No particular direction (not one- or two-sided)
– The distributions of sports goals in the male and female populations are not all the same.
– Includes many kinds of possible associations
– “Men rate social comparison higher as a goal than do women”

OK: Now back to the Expected Count

If the null hypothesis were true, what would the count in each cell be?

For women in the HSC-HM cell, it would work like this:
– 33.6% of all respondents are HSC-HM
– We have 67 women
– So, if no sex difference exists (our null), we would expect that 33.6% of our 67 women would be HSC-HM --> 22.5 women.

Observed Counts for Sports Goals

Goal      Female   Male   Total
HSC-HM        14     31      45
HSC-LM         7     18      25
LSC-HM        21      5      26
LSC-LM        25     13      38
Total         67     67     134

Expected Count

1. 45/134 = 33.6% of all respondents are HSC-HM.
2. 33.6% of 67 women is 22.5.

In general, expected count = (row total × column total) / grand total.
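The same two steps can be applied to every cell at once. This sketch multiplies each row total by each column total and divides by the grand total. The object names are illustrative.

```r
# Observed counts (rows = sports goal, columns = sex).
counts <- matrix(c(14, 31, 7, 18, 21, 5, 25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

row_totals  <- rowSums(counts)   # 45, 25, 26, 38
col_totals  <- colSums(counts)   # 67, 67
grand_total <- sum(counts)       # 134

# Expected count under the null: row total * column total / grand total.
expected <- outer(row_totals, col_totals) / grand_total
print(expected)
```

Because there are equal numbers of men and women, the expected counts are the same in both columns: each is half of the row total.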

Finally: The Chi-Squared Statistic Itself

Compare the entire set of observed counts with the set of expected counts.
Take the difference in each cell between observed and expected
Square each difference
Normalize these (divide by the expected count)
Sum over all cells.

The Formula:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The sum is taken over all r × c cells.

Large values of X² provide evidence against the null hypothesis
A chi-square distribution is used to obtain the p-value
Degrees of freedom are (r-1)(c-1)
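The formula can be evaluated by hand for the sports goals table. This is a sketch with illustrative object names. It follows the four steps of difference, square, divide by expected, and sum, then looks up the upper tail of the chi-square distribution with pchisq.

```r
# Observed counts (rows = sports goal, columns = sex).
counts <- matrix(c(14, 31, 7, 18, 21, 5, 25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

# Expected counts under the null hypothesis.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# (observed - expected)^2 / expected, summed over all cells.
x2 <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (r - 1)(c - 1).
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# p-value: probability of a value at least this large under the null.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(x2, 3)
df
p_value
```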

In this case...

Chi-squared = 24.898 on 3 df.
The p-value is less than 0.0005.
If the null hypothesis were true, the chance of obtaining a chi-squared value greater than or equal to this would be very small
Clear evidence against the null hypothesis
Strong evidence that female and male students have different distributions of sports goals.
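These numbers can be checked with R's built-in chisq.test. This is a sketch: the slides do not say which software produced the reported values, and the object names are illustrative. For tables larger than 2 x 2, chisq.test applies no continuity correction, so its statistic is the plain X² defined earlier.

```r
# Observed counts (rows = sports goal, columns = sex).
counts <- matrix(c(14, 31, 7, 18, 21, 5, 25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

result <- chisq.test(counts)

result$statistic   # the X-squared statistic
result$parameter   # degrees of freedom
result$p.value     # upper-tail p-value
result$expected    # expected counts under the null
```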

Is that all you can say?

No, you can and should combine the test with a description that shows the relationship.
– Percents in our earlier table and our graph
– Summary comments: the percent of males in each of the HSC goal classes is more than twice the percent of females.
– The HSC-HM group contains 46% of the males, but only 21% of the females
– The HSC-LM group contains 27% of the males and only 10% of the females
– We conclude that males are more likely to be motivated by social comparison goals and females are more likely to be motivated by mastery goals.

Important to remember:

The chi-square distribution only approximates the distribution of the X² statistic; the approximation becomes more accurate as the cell counts increase.
For 2 × 2 tables, the expected count in each of the 4 cells must be 5 or higher.
For tables larger than 2 × 2, the average of the expected counts must be 5 or higher, and the smallest expected count must be 1 or more.
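The two rules of thumb can be written as a small check. The function name expected_ok is hypothetical, and the sketch assumes the input is a matrix of observed counts.

```r
# Returns TRUE if the expected counts satisfy the rule of thumb for this table.
# expected_ok is a hypothetical helper name, not part of base R.
expected_ok <- function(counts) {
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  if (nrow(counts) == 2 && ncol(counts) == 2) {
    all(expected >= 5)                          # 2 x 2: every expected count >= 5
  } else {
    mean(expected) >= 5 && min(expected) >= 1   # larger: average >= 5, smallest >= 1
  }
}

# Sports goals table (4 x 2): the smallest expected count is 12.5.
counts <- matrix(c(14, 31, 7, 18, 21, 5, 25, 13), nrow = 4, byrow = TRUE)
expected_ok(counts)

# A small made-up 2 x 2 table whose expected counts are all below 5.
small <- matrix(c(1, 2, 3, 4), nrow = 2)
expected_ok(small)
```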

Important to remember:

This is sometimes called the chi-squared test for homogeneity or the chi-squared test of independence.
Although this is one of the most widely used of statistical tools, it is also one of the least informative.
– The only thing you produce is a p-value, and there is no associated parameter to describe the degree of dependence
– The alternative hypothesis is very general (that rows and columns are not independent)