Statistics: Unlocking the Power of Data STAT 101 Dr. Kari Lock Morgan Synthesis Big Picture Essential Synthesis Review Speed Dating

Synthesis


STAT 101
Dr. Kari Lock Morgan

Synthesis
  • Big Picture
  • Essential Synthesis Review
  • Speed Dating

Statistics: Unlocking the Power of Data Lock5

Final
  • Monday, April 28th, 2-5pm
  • No make-ups, no excuses
  • 30% of your course grade
  • Cumulative from the entire course
  • Open only to a calculator and 3 double-sided pages of notes prepared only by you

Help Before Final
  • Wednesday, 4/23:
      3-4pm, Prof Morgan, Old Chem 216
      4-9pm, Stat Ed Help, Old Chem 211A
  • Thursday, 4/24:
      5-7pm, Yating, Old Chem 211A
      4-9pm, Stat Ed Help, Old Chem 211A
  • Friday, 4/25:
      1-3pm, Prof Morgan, Old Chem 216
      3-4pm, REVIEW SESSION, room tbd
  • Sunday, 4/27:
      4-6pm, Tori, Old Chem 211A
      6-7pm, Stat Ed Help, Old Chem 211A
      7-9pm, David, Old Chem 211A
  • Monday, 4/28:
      12:30-1:30, Prof Morgan, Old Chem 216

Review
What is Bayes' Rule?
  a) A way of getting from P(A if B) to P(B if A)
  b) A way of calculating P(A and B)
  c) A way of calculating P(A or B)
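As a small illustration of reversing a conditional probability, the sketch below applies Bayes' rule to the yes/no counts from the reciprocity table that appears later in these slides (63 dates where both said yes, 146 male yeses, 127 female yeses, 276 dates). The function name `bayes_rule` is an assumption made for this example, not something from the course.

```python
def bayes_rule(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A | B) computed from P(B | A), P(A) and P(B)."""
    return p_b_given_a * p_a / p_b

# Counts taken from the reciprocity table later in the slides.
n_dates = 276
male_yes = 146      # dates where the male said yes
female_yes = 127    # dates where the female said yes
both_yes = 63       # dates where both said yes

# P(female says yes | male says yes), read directly from the table.
p_f_given_m = both_yes / male_yes

# Reverse the conditioning: P(male says yes | female says yes).
p_m_given_f = bayes_rule(p_f_given_m, male_yes / n_dates, female_yes / n_dates)

print(f"P(female yes | male yes) = {p_f_given_m:.3f}")
print(f"P(male yes | female yes) = {p_m_given_f:.3f}")
```

The reversed probability agrees with reading 63/127 straight off the table, which is a quick sanity check on the formula.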

Data Collection
  • The way the data are/were collected determines the scope of inference
  • For generalizing to the population: was it a random sample? Was there sampling bias?
  • For assessing causality: was it a randomized experiment?
  • Collecting good data is crucial to making good inferences based on the data

Exploratory Data Analysis
  • Before doing inference, always explore your data with descriptive statistics
  • Always visualize your data! Visualize your variables and relationships between variables
  • Calculate summary statistics for variables and relationships between variables; these will be key for later inference
  • The type of visualization and summary statistics depends on whether the variable(s) are categorical or quantitative

Estimation
  • For good estimation, provide not just a point estimate, but an interval estimate which takes into account the uncertainty of the statistic
  • Confidence intervals are designed to capture the true parameter for a specified proportion of all samples
  • A P% confidence interval can be created by bootstrapping (sampling with replacement from the sample) and using the middle P% of bootstrap statistics
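The bootstrap recipe in the Estimation slide can be sketched in a few lines. This is a minimal illustration, not the course's own software: the function name, the number of resamples, and the fixed seed are all assumptions, and the data are the 63-of-276 successful speed dates quoted later in the deck, coded as 1s and 0s.

```python
import random

def mean(values):
    """Plain arithmetic mean; for 0/1 data this is the sample proportion."""
    return sum(values) / len(values)

def bootstrap_percentile_ci(sample, statistic=mean, level=0.95, n_boot=5000, seed=0):
    """Percentile bootstrap interval: the middle `level` share of bootstrap statistics."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is reproducible
    n = len(sample)
    boot_stats = sorted(
        statistic(rng.choices(sample, k=n))  # resample with replacement, same size as the sample
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = boot_stats[int(tail * n_boot)]
    upper = boot_stats[int((1 - tail) * n_boot) - 1]
    return lower, upper

# 63 successful dates out of 276, coded 1 = success, 0 = not successful.
dates = [1] * 63 + [0] * 213
lower, upper = bootstrap_percentile_ci(dates)
print(f"95% bootstrap CI for the proportion: ({lower:.3f}, {upper:.3f})")
```

Because the interval is built from random resamples, its endpoints shift slightly with the seed and with `n_boot`.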

Hypothesis Testing
  • A p-value is the probability of getting a statistic at least as extreme as that observed, if H0 is true
  • The p-value measures the strength of the evidence the data provide against H0
  • If the p-value is low, the H0 must go
  • If the p-value is not low, then you cannot reject H0 and have an inconclusive test

p-value
A p-value can be calculated by
  • A randomization test: simulate statistics assuming H0 is true, and see what proportion of simulated statistics are at least as extreme as that observed
  • Calculating a test statistic and comparing that to a theoretical reference distribution (normal, t, χ², F)
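The randomization approach can be sketched as follows for a difference in two proportions. The counts are the male and female "yes" totals from the pickiness table later in the slides (146 of 276 and 127 of 276); the function name, the two-sided definition of "extreme", the number of simulations, and the seed are assumptions made for this example.

```python
import random

def randomization_p_value(yes_a, n_a, yes_b, n_b, n_sim=5000, seed=0):
    """Two-sided randomization p-value for a difference in proportions."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    observed = yes_a / n_a - yes_b / n_b
    total_yes = yes_a + yes_b
    # Under H0 the group labels are exchangeable, so pool all responses.
    pooled = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)              # reassign responses to groups at random
        sim_a = sum(pooled[:n_a])        # simulated yes count in group A
        sim_b = total_yes - sim_a        # the remaining yeses fall in group B
        diff = sim_a / n_a - sim_b / n_b
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance guards against float ties
            extreme += 1
    return extreme / n_sim

p_value = randomization_p_value(146, 276, 127, 276)
print(f"randomization p-value: {p_value:.3f}")
```

The result is a simulation estimate, so it varies a little from run to run unless the seed is fixed.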

Hypothesis Tests

Variables                           Appropriate Test
One Quantitative                    Single mean (t)
One Categorical                     Single proportion (normal)
                                    Chi-square Goodness of Fit
Two Categorical                     Difference in proportions (normal)
                                    Chi-square Test for Association
One Quantitative, One Categorical   Difference in means (t)
                                    Matched pairs (t)
                                    ANOVA (F)
Two Quantitative                    Correlation (t)
                                    Slope in Simple Linear Regression (t)
More than two                       Multiple Regression (t, F)

Regression
  • Regression is a way to predict one response variable with multiple explanatory variables
  • Regression fits the coefficients of the model
  • The model can be used to
      Analyze relationships between the explanatory variables and the response
      Predict Y based on the explanatory variables
      Adjust for confounding variables

Probability

Romance
  • What variables help to predict romantic interest?
  • Do these variables differ for males and females?
  • All we need to figure this out is DATA!

(For all of you, being almost done with STAT 101, this is the case for many interesting questions!)

Speed Dating
  • We will use data from speed dating conducted at Columbia University, 2002-2004
  • 276 males and 276 females from Columbia's various graduate and professional schools
  • Each person met with 10-20 people of the opposite sex for 4 minutes each
  • After each encounter each person said either "yes" (they would like to be put in touch with that partner) or "no"

Speed Dating Data

What are the cases?
  a) Students participating in speed dating
  b) Speed dates
  c) Ratings of each student

Speed Dating
What is the population?
  • Ideal population?
  • More realistic population?

Speed Dating
It is randomly determined who the students will be paired with for the speed dates. We find that people are significantly more likely to say yes to people they think are more intelligent. Can we infer causality between perceived intelligence and wanting a second date?
  a) Yes
  b) No

Successful Speed Date?
What is the probability that a speed date is successful (results in both people wanting a second date)?
To best answer this question, we should use
  a) Descriptive statistics
  b) Confidence Interval
  c) Hypothesis Test
  d) Regression
  e) Bayes' Rule

Successful Speed Date?
63 of the 276 speed dates were deemed successful (both male and female said yes).
A 95% confidence interval for the true proportion of successful speed dates is
  a) (0.2, 0.3)
  b) (0.18, 0.28)
  c) (0.21, 0.25)
  d) (0.13, 0.33)
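One way to check the options is the usual normal-based interval for a single proportion, statistic ± z* × SE. The sketch below assumes that formula with z* = 1.96; the slide does not say which method the course intended here, and a bootstrap interval would be a reasonable alternative.

```python
import math

successes, n = 63, 276
p_hat = successes / n                      # sample proportion of successful dates
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of a sample proportion
z_star = 1.96                              # normal critical value for 95% confidence

lower = p_hat - z_star * se
upper = p_hat + z_star * se
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")   # rounds to (0.18, 0.28)
```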

Pickiness and Gender
Are males or females more picky when it comes to saying yes?
Guesses?
  a) Males
  b) Females

Pickiness and Gender
Are males or females more picky when it comes to saying yes? How could you answer this?
  a) Test for a single proportion
  b) Test for a difference in proportions
  c) Chi-square test for association
  d) ANOVA
  e) Either (b) or (c)

            Yes    No
Males       146    130
Females     127    149

Pickiness and Gender
Do males and females differ in their pickiness? Using α = 0.05, how would you answer this?
  a) Yes
  b) No
  c) Not enough information
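A normal-based test for a difference in proportions on the Yes/No table above can be sketched as follows. The pooled standard error and the two-sided alternative are assumptions about how the test would be set up; the slides do not show the calculation.

```python
import math

# Counts from the pickiness table: "yes" responses out of 276 dates each.
yes_m, n_m = 146, 276
yes_f, n_f = 127, 276

p_m, p_f = yes_m / n_m, yes_f / n_f
pooled = (yes_m + yes_f) / (n_m + n_f)     # common proportion under H0: p_m = p_f
se = math.sqrt(pooled * (1 - pooled) * (1 / n_m + 1 / n_f))

z = (p_m - p_f) / se
# Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

With these counts the p-value comes out a little above 0.10, so at α = 0.05 the sketch does not reject H0.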

Reciprocity
Are people more likely to say yes to someone who says yes back? How would you best answer this?
  a) Descriptive statistics
  b) Confidence Interval
  c) Hypothesis Test
  d) Regression
  e) Bayes' Rule

                   Male says Yes   Male says No
Female says Yes         63              64
Female says No          83              66

Reciprocity
Are people more likely to say yes to someone who says yes back? How could you answer this?
  a) Test for a single proportion
  b) Test for a difference in proportions
  c) Chi-square test for association
  d) ANOVA
  e) Either (b) or (c)
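A chi-square test for association on the 2×2 reciprocity table can be sketched with only the standard library. The slides do not say what software or options produced the reported p-value of 0.3731; as an assumption, the sketch computes the statistic both with and without the Yates continuity correction, and the corrected version appears to match the reported figure. The helper name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(table, yates=False):
    """Chi-square statistic and p-value (df = 1) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)                # Yates continuity correction
            stat += diff ** 2 / expected
    # With df = 1 the statistic is a squared standard normal, so
    # P(chi-square >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Rows: female says yes / no; columns: male says yes / no.
reciprocity = [[63, 64], [83, 66]]

stat_plain, p_plain = chi_square_2x2(reciprocity)
stat_yates, p_yates = chi_square_2x2(reciprocity, yates=True)
print(f"uncorrected: chi2 = {stat_plain:.3f}, p = {p_plain:.4f}")
print(f"Yates:       chi2 = {stat_yates:.3f}, p = {p_yates:.4f}")
```

Either way the p-value is well above 0.05, so the conclusion does not depend on the correction.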

Reciprocity
Are people more likely to say yes to someone who says yes back?
p-value = 0.3731
Based on this data, we cannot determine whether people are more likely to say yes to someone who says yes back.

Race and Response: Females
Does the chance of females saying yes to males differ by race?

How could you answer this question?
  a) Test for a single proportion
  b) Test for a difference in proportions
  c) Chi-square goodness of fit
  d) Chi-square test for association
  e) ANOVA

Asian   Black   Caucasian   Latino   Other
0.50    0.57    0.42        0.48     0.53

Race and Response: Males
Each person rated their date on a scale of 1-10 based on how much they liked them overall. Does how much males like females differ by race? How would you test this?
  a) Chi-square test
  b) t-test for a difference in means
  c) Matched pairs test
  d) ANOVA
  e) Either (b) or (d)

Physical Attractiveness
Each person also rated their date from 1-10 on the physical attractiveness. Do males rate females higher, or do females rate males higher?
Which tool would you use to answer this question?
  a) Two-sample difference in means
  b) Matched pair difference in means
  c) Chi-Square
  d) ANOVA
  e) Correlation

Physical Attractiveness
The histogram shown is of the
  a) data
  b) bootstrap distribution
  c) randomization distribution
  d) sampling distribution

Other Ratings
Each person also rated their date from 1-10 on the following attributes:
  • Attractiveness
  • Sincerity
  • Intelligence
  • How fun the person seems
  • Ambition
  • Shared interests

Which of these best predict how much someone will like their date?

Multiple Regression
MALES RATING FEMALES:
FEMALES RATING MALES:

Ambition and Liking
Do people prefer their dates to be less ambitious???
How does the perceived ambition of a date relate to how much the date is liked?
How would you answer this question?
  a) Inference for difference in means
  b) ANOVA
  c) Inference for correlation
  d) Inference for simple linear regression
  e) Either (b), (c) or (d)

Simple Linear Regression
MALES RATING FEMALES:
FEMALES RATING MALES:

Ambition and Liking

r = 0.44, SE = 0.05

Find a 95% CI for ρ.
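A short sketch of the arithmetic, assuming the sampling distribution of r is roughly normal so that the generic statistic ± z* × SE interval applies, with z* = 1.96 (using 2 in place of 1.96 gives nearly the same answer). The slide gives only r and SE, so the choice of method is an assumption.

```python
r = 0.44        # sample correlation between perceived ambition and liking
se = 0.05       # standard error given on the slide
z_star = 1.96   # normal critical value for 95% confidence (assumed)

margin = z_star * se
lower, upper = r - margin, r + margin
print(f"95% CI for rho: ({lower:.3f}, {upper:.3f})")   # about (0.342, 0.538)
```

Since the whole interval lies above 0, this sketch points to a positive association between perceived ambition and liking.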

After taking STAT 101:
If you have a question that needs answering
ALL YOU NEED IS DATA!!!

Thank You!!!