Answering Questions Statistically

ENVS 407 – Prevention of Tobacco Addiction

Key statistical ideas

• Clarify your questions

• Construct contrasts

• Know the procedure

• Control as much as you can, then leave the rest to chance!

Clarify your questions

• Initial Questions:

▫ Why are people buying cigarettes?

▫ Where are people getting their cigarettes?

• Problems:

▫ “Why” is super hard to answer.

• Solution:

▫ Break question into smaller, easier questions

▫ Try to think of questions that can be written as “how much/many” or “is this more than that”

Clarify your questions (cont’d)

• Break the question into parts:

▫ Availability: what stores sell the most types of tobacco?

▫ Availability: what stores carry the most brands of cigarettes?

▫ Advertising: which modes of advertising are most popular?

▫ Advertising: do different locations have different size advertisements?

Construct contrasts

• Statistics is about comparing one set of things to another set of things… and figuring out if they’re different

Construct contrasts – hypothesis

• What do you want to disprove?

▫ Null hypothesis

• What do you want to prove beyond a reasonable doubt?

▫ Alternative hypothesis

Construct contrasts – EXAMPLE 1

• Null hypothesis: The average number of tobacco advertisements at grocery stores is the same as the average number at liquor stores.

• Alternative hypothesis: The average number of tobacco advertisements at grocery stores is less than the average number at liquor stores.

Construct contrasts – EXAMPLE 2

• Null hypothesis: The average percent of “sexy” ads at grocery stores is the same as at liquor stores.

• Alternative hypothesis: They are not the same.
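
In symbols, the two examples might be written as below. The subscripts G (grocery) and L (liquor) are labels introduced here for illustration only; μ stands for a true mean and p for a true percentage.

```latex
\begin{aligned}
\text{Example 1:}\quad & H_0: \mu_G = \mu_L, \qquad H_a: \mu_G < \mu_L \quad \text{(one-sided)} \\
\text{Example 2:}\quad & H_0: p_G = p_L, \qquad H_a: p_G \neq p_L \quad \text{(two-sided)}
\end{aligned}
```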

Know the procedure – collecting data

• Once you translate the questions into statistics…

▫ Design the study: means vs. percentages?

▫ Randomly select places to sample the data: careful of bias!

▫ Collect the data: harder than you think!!

▫ Analyze the data: easier than you think!!

▫ Interpret the data

Know the procedure – testing

• Things aren’t ever perfectly like the null…

• …but how different is too different?

Know the procedure – counts/means

• Greek vs. Roman

▫ μ = true mean (a.k.a. “average”)

▫ σ = true standard deviation

▫ x̄ = sample mean (i.e., comes from the data)

▫ s = sample standard deviation (i.e., from the data)

▫ n = sample size (sometimes have n1 and n2)
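
A minimal sketch of how the "Roman" quantities come out of data, using Python's standard library. The ad counts are made-up numbers for illustration, not course data.

```python
import statistics

# Hypothetical counts of tobacco ads seen at five stores (made-up numbers).
ads_per_store = [4, 7, 5, 9, 5]

n = len(ads_per_store)                  # sample size
x_bar = statistics.mean(ads_per_store)  # sample mean
s = statistics.stdev(ads_per_store)     # sample standard deviation (divides by n - 1)

print(n, x_bar, s)
```

The Greek quantities (μ and σ) never show up in code like this: they describe all stores, and the sample only estimates them.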

Know the procedure – counts/means

t-statistic

But is this “big”?
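
A minimal sketch of a two-sample t-statistic, assuming the "unequal sample sizes, unequal variance" version recommended under Websites and resources. The function name welch_t and the ad counts are hypothetical, chosen only for illustration.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t, df) for a two-sample t-test that allows unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)  # s squared
    se = math.sqrt(v1 / n1 + v2 / n2)   # standard error of the difference in means
    t = (x1 - x2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

# Hypothetical ad counts (made-up numbers, not course data).
grocery = [4, 7, 5, 9, 5]
liquor = [10, 12, 9, 14, 10]

t, df = welch_t(grocery, liquor)
print(round(t, 2), round(df, 1))
```

With these made-up counts, t is about -3.95 on about 8 degrees of freedom. Whether that counts as "big" depends on the t distribution with those degrees of freedom.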

Tables for the t distribution

• If we want a 100·C% confidence level for the test, we need to find the value t* so that we have a probability of C between -t* and t* in a t distribution with n-1 degrees of freedom

• Example: 95% confidence level when n = 15 (so df = 14) means that we need a tail probability of 0.025 in each tail, so t* ≈ 2.15

[Figure: t distribution with df = 14; area 0.95 between -t* and t*, area 0.025 in each tail]
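
If no table is handy, t* can be approximated numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and searches for t* by bisection. The helper names (t_pdf, central_prob, t_critical) are made up for this example; a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def central_prob(t_star, df, steps=2000):
    """P(-t_star < T < t_star): Simpson's rule on [0, t_star], then doubled."""
    h = t_star / steps
    total = t_pdf(0.0, df) + t_pdf(t_star, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(conf, df):
    """Find t* whose central probability equals conf, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < conf:
            lo = mid   # not enough area yet, so t* is larger
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 14), 3))
```

For 14 degrees of freedom and 95% confidence this gives roughly 2.145, which matches the table value to the precision tables usually print.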

Know the procedure – percentages

• The symbols

▫ p = true percentage

▫ Y = observed outcome (e.g. count of successes)

▫ n = sample size

▫ p̂ = sample percentage (i.e., Y/n)
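
A minimal sketch of a test comparing two percentages, assuming the standard pooled two-proportion z-test (the slides do not say which test was used). The function name two_proportion_z and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(y1, n1, y2, n2):
    """Return (z, two-sided p-value) for H0: the two true percentages are equal."""
    p1_hat, p2_hat = y1 / n1, y2 / n2        # sample percentages, Y/n
    pooled = (y1 + y2) / (n1 + n2)           # estimate of the common p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 12 of 40 grocery ads and 24 of 40 liquor store ads coded "sexy".
z, p_value = two_proportion_z(12, 40, 24, 40)
print(round(z, 2), round(p_value, 3))
```

The p-value is two-sided because the alternative in Example 2 is "not the same" rather than "less than" or "greater than".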

Control as much as you can…

• Make sure you don’t “stack the deck”

▫ Don’t pick all your grocery stores from Center City and all of your liquor stores from University City

• Standardize definitions of “size of advertisement” and “theme of ad”

▫ It’s surprising how much opinions differ

• Think carefully about all variables which are important… but aren’t the one you’re most interested in. CONTROL THEM!!!

…leave the rest to chance!

• Randomize once you’ve controlled for the important variables

▫ Get a list of well “controlled” stores, and then randomly pick which you’ll visit

▫ Picking the easiest to go to will introduce selection bias!!
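
One way to leave the choice to chance is to let a random number generator pick from the controlled list. This is a sketch only: the store labels, the seed, and the sample size of 3 per store type are all made up for illustration.

```python
import random

# Hypothetical list of stores that already meet the study's "controlled" criteria.
stores = {
    "grocery": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "liquor": ["L1", "L2", "L3", "L4", "L5", "L6"],
}

rng = random.Random(407)  # fixed seed so the draw can be repeated and checked

# Draw the same number of stores from each type, without replacement.
to_visit = {kind: rng.sample(names, k=3) for kind, names in stores.items()}
print(to_visit)
```

Sampling separately within each store type keeps the two groups the same size, and the random draw removes the temptation to pick whichever stores are closest.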

Key statistical ideas

• Clarify your questions

▫ Bigger -> smaller

▫ Intangible -> quantifiable

• Construct contrasts

▫ Compare two things: greater than? less than? merely different?

• Know the procedure

▫ Or know someone who knows the procedure…

• Control as much as you can, leave the rest to chance!

Websites and resources

• Quick reference

▫ http://en.wikipedia.org/wiki/Student's_t-test

▫ Use “unequal sample sizes, unequal variance”

• Simple t-test calculator

▫ http://www.graphpad.com/quickcalcs/ttest1.cfm

• Wharton StatLab

▫ http://www-stat.wharton.upenn.edu/~sivana/statlab.html

• My page

▫ http://stat.wharton.upenn.edu/~mbaiocch/

▫ The slides I used today

▫ Spreadsheet

▫ My contact info