15
Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Embed Size (px)

Citation preview

Page 1: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Summarizing the Relationship Between Two Variables with Tables and a bit of a review

Chapters 6 and 7

Jan 31 and Feb 1, 2012

Page 2: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Looking at Tables

• Tables are useful for examining the relationship between: – variables measured at the nominal or ordinal

level,– or variables measured at the interval or ratio

level with a small number of discrete values.

Page 3: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Some Terminology and Conventions

• Two-way tables, Crosstabulations (Crosstabs)

• Column (explanatory or independent)

• Row (response or dependent)• Although the book does not always do

this, there is a soft convention of stating a table title as “Dependent by Independent”. E.g. Table 2: Election Needed Now by Province.

Page 4: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

• Cells: As in a spread sheet a table is divided into cells aligned along rows and columns.

• Marginal Distributions: These are the numbers that summarize the rows and the columns at the side and bottom of a table

• Conditional Distributions: The book gives a very complicated explanation for what this is. In reality it is just the percentage of the cases in a cell or cells. This can be the percentage of cases along the horizontal row or the vertical column.

Page 5: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Here is an example of a Two Way table made with the “crosstab” procedure in SPSS (see my tip sheet for doing them in Excel). Question. Is there a difference between the number of bathrooms that homes in urban and rural

Ontario have?• Look at row 1.

Reading Across: we see 94% of the homes with 1 bathroom are

Urban 6% are Rural • Look at Column 1.

Reading Down:we see 52.2% of urban homes have 1 bath-room, 36.8% have 2 bathrooms 11.0% have three or more bathrooms.

Page 6: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

• The percentages give us a way to ‘eyeball’ the data and estimate if there is a difference, but to ask if these difference are meaningful, we must go further and calculate some statistics.

Page 7: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

• The Chi Sq. Test is a common one to use in a table with nominal variables such as this.

• It measures the difference between the number of cases we expect to see in each cell and the number of cases we actually observe in each cell of our table.

• The Chi Sq. value itself has little meaning for us. What matters is whether or not the value is significant. In this case it is > .05.

• Therefore we reject the hypothesis that the results we see are meaningfully different from what we could expect through simple probability

• Therefore we also reject that there is any meaningful difference between the number of bathrooms in urban and rural homes.

Page 8: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Simpson’s ParadoxThat lurking variable thing again

• Example 6.4 in your book gives you a look at a problem called Simpson’s Paradox.

• An association or comparison that holds for all or several groups can reverse direction when the data are combined to form a single group (Moore pg. 169).

• As Moore further notes, this is usually the sign of a “lurking” variable.

Page 9: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012
Page 10: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012
Page 11: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

• In order to check for lurking variables we can subdivide tables by a further categorical variable (such as was done in the book where the data was divided into serious and less serious accidents).

Page 12: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

The book has two very nice “four step” graphics in Chapter 7

Page 13: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

• I would probably adjust this a bit. • In the final leg I would say:

– If the Y (response or dependent) variable is quantitative (measured at the interval or ratio level) a regression line is a good summary.

– However, if the Y (response or dependent) variable is ordinal or nominal, then a two way table is needed.

Page 14: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

And again the general four step approaches to any statistical problem

• State: What is the practical question, in the context of the real world setting

• Plan: What specific statistical operations does this problem call for

• Solve: Make the graphs and carry out the calculations needed for this problem

• Conclude: Give your practical conclusion in the setting of the real-world problem

Page 15: Summarizing the Relationship Between Two Variables with Tables and a bit of a review Chapters 6 and 7 Jan 31 and Feb 1, 2012

Also keep in mind…

• I would also remind you that we looked at a couple of things the book did not cover, such as “spearman’s rho” as an alternative measure of correlation when working with a table in which both variables are ordinal and Chi Sq.