Class 1: Sept. 9
• About instructor: Dylan Small, Assistant Professor, Department of Statistics.
• How I got interested in statistics?



My Current Research
• Statistical methods for comparing treatments/policies when a perfectly controlled randomized experiment cannot be done, using the method of “instrumental variables.” Applications to:
  – Treatment of depression among the elderly in primary care practices
  – Food policy in developing countries
• Statistical methods for panel studies, i.e., studies that survey the same people repeatedly over time.
  – Prediction of child morbidity/mortality in Pakistan using previous height and weight measurements.


Course Objectives

• To learn how to use two important statistical tools to analyze data: Regression and Analysis of Variance

• To get hands-on experience analyzing data and computing with data (using JMP)

• To gain experience in interpreting the results of a statistical analysis and communicating the results to others


Course Requirements
• You are responsible for both the material covered in lecture and the reading associated with each lecture.
• Weekly homework, typically handed out on Thursday and due the following Thursday at the beginning of class. Late homework will receive at most half credit.
• Project: Analysis of a data set of interest to you using regression. Work in groups of 2-3 people. Final report and class presentation. More details in October.
• Midterm: Tuesday, October 21, 3:00-4:20 pm
• Final: Tuesday, December 21, 8:30-10:30 am


Grading

• Grades will be based on:
  – 20% Homework
  – 30% Project
  – 20% Midterm
  – 30% Final
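The weighted total implied by these percentages can be sketched in Python. This is a minimal illustration, assuming each component is scored on a 0-100 scale; the function name `course_grade` and the example scores are hypothetical, not part of the course materials.

```python
# Weights from the Grading slide; they sum to 1.0.
WEIGHTS = {"homework": 0.20, "project": 0.30, "midterm": 0.20, "final": 0.30}


def course_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores (each assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "project": 85, "midterm": 70, "final": 80}
print(round(course_grade(example), 2))  # prints 81.5
```

Because the project and final each carry 30%, a change of one point on either moves the total by 0.3 points, versus 0.2 points for homework or the midterm.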


Web Site/Textbooks
• Web site: http://www-stat.wharton.upenn.edu/~dsmall/stat112-f04
  – It can also be reached by going to http://www-stat.wharton.upenn.edu, clicking on Courses, and then clicking on Stat 112.
• Textbooks:
  – Moore and McCabe, Introduction to the Practice of Statistics, 4th edition (required). We will be covering Chapter 2, part of Chapter 3, and Chapters 10-13.
  – JMP version 5 with handbook (highly recommended). If you do not own it, you need to sign up for a Wharton account and use JMP in the Wharton labs.
  – JMP manual for Introduction to the Practice of Statistics (recommended).


Instructor Accessibility

• E-mail address: [email protected]
• My office hours (office: 464 Huntsman Hall):
  – Tuesdays and Thursdays after class, 4:30-5:30.
  – By appointment. I will be happy to meet with you if you send me an e-mail to arrange a time.
• I encourage you to come see me at least once during the semester to chat about your background, interests, concerns about the class and future plans.
• TA: Lie Wang, office hours TBA
• Stat Lab: Monday-Thursday, 9-3; Friday, 11-5


Class 1
• Reading: Introduction to Chapter 2 and Section 2.1.
• Topic: Relationships between variables measured on the same unit.
  – A unit could be an individual, a state, a company, a year, etc.
• Data set: Penn Alcohol data set (pennalcohol.JMP under datasets on the website). Survey given to 123 Penn undergraduates.
• Alcohol use: Number of days per month on which the person drinks.

[Figure: Distributions: Alcohol Use (axis from 0 to 25)]


[Figure: Bivariate Fit of Alcohol Use By Sensation Seeking (Alcohol Use, 0 to 25, vs. Sensation Seeking, about -0.1 to 1.1)]

[Figure: Bivariate Fit of Alcohol Use By Life Change (Alcohol Use, 0 to 25, vs. Life Change, about 500 to 3000)]


Association

• Two variables measured on the same unit are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.

• Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.

• Two variables are negatively associated when above-average values of one accompany below-average values of the other, and vice versa.
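A rough way to check the direction of an association, following the definitions above, is to count the units where both variables fall on the same side of their averages versus opposite sides. The Python sketch below is an informal illustration, not a formal statistic; the function name `association_direction` and the toy values are hypothetical (they are not the Penn data).

```python
from statistics import mean


def association_direction(xs: list[float], ys: list[float]) -> str:
    """Informal direction check based on the slide's definitions: tally units
    where x and y fall on the same side of their averages vs. opposite sides."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty lists")
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same += 1       # both above average, or both below average
        elif product < 0:
            opposite += 1   # one above average, the other below
        # units sitting exactly at an average are ignored
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"


# Hypothetical toy values, for illustration only.
print(association_direction([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))  # positive
print(association_direction([1, 2, 3, 4, 5], [9, 7, 6, 4, 1]))  # negative
```

Since associations are overall tendencies, a "positive" result only means same-side pairs outnumber opposite-side pairs; individual units can still go against the pattern.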


Strength of association

• Strength of the association: A measure of how strong the positive or negative association is. Statistical associations are overall tendencies, not ironclad rules.

• If there is a strong association between two variables, then knowing one helps a lot in predicting the other. But when there is a weak association, information about one variable does not help much in guessing the other.
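One informal way to make "knowing one helps in predicting the other" concrete is to compare the typical error in guessing Y when X is ignored with the error when we know only whether X is above or below its average. The Python sketch below is a hypothetical illustration, not a standard measure and not from the textbook; the function name `prediction_gain` and the data are made up.

```python
from statistics import mean


def prediction_gain(xs, ys):
    """Share of the mean absolute error in guessing y that disappears once we
    know whether x is above or below its average. Informal gauge only."""
    x_bar = mean(xs)
    high = [y for x, y in zip(xs, ys) if x > x_bar]
    low = [y for x, y in zip(xs, ys) if x <= x_bar]
    if not high or not low:
        raise ValueError("x must take values on both sides of its average")
    y_bar, y_high, y_low = mean(ys), mean(high), mean(low)
    # Guess without x: always predict the overall average of y.
    err_without_x = mean(abs(y - y_bar) for y in ys)
    if err_without_x == 0:
        raise ValueError("y is constant; there is nothing to predict")
    # Guess with x: predict the average y of the units on the same side of x's average.
    err_with_x = mean(abs(y - (y_high if x > x_bar else y_low)) for x, y in zip(xs, ys))
    return 1 - err_with_x / err_without_x


xs = [1, 2, 3, 4, 5, 6]
strong_ys = [1, 2, 3, 4, 5, 6]  # hypothetical: y rises steadily with x
weak_ys = [3, 5, 2, 4, 6, 1]    # hypothetical: y barely tracks x
print(round(prediction_gain(xs, strong_ys), 2))  # 0.56
print(round(prediction_gain(xs, weak_ys), 2))    # 0.04
```

With the strong association, knowing which side of average X is on removes more than half of the typical guessing error; with the weak one, it removes almost none.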


Association does not have to be linear or unidirectional

• Relationship between gas mileage (miles per gallon) and the speed at which a car is driven:

[Figure: Bivariate Fit of Mileage By Speed (Mileage, 5 to 40, vs. Speed, 0 to 110)]
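A same-side/opposite-side tally around the averages can miss a strong but curved relationship like this one. In the hypothetical Python sketch below, mileage is made to peak at 55 mph (an assumed, illustrative curve, not the plotted data). Over all speeds the tally shows no clear direction, while each half of the speed range shows a clear one. The helper name `direction` is hypothetical.

```python
from statistics import mean


def direction(xs, ys):
    """Informal tally: units on the same side of both averages vs. opposite sides."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    same = sum(p > 0 for p in products)      # both above or both below average
    opposite = sum(p < 0 for p in products)  # one above, one below
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"


# Hypothetical curve for illustration only: mileage peaks at 55 mph.
speeds = list(range(10, 101, 10))
mileage = [35 - (s - 55) ** 2 / 100 for s in speeds]

slow = [(s, m) for s, m in zip(speeds, mileage) if s < 55]
fast = [(s, m) for s, m in zip(speeds, mileage) if s > 55]
slow_speeds, slow_mileage = zip(*slow)
fast_speeds, fast_mileage = zip(*fast)

print("all speeds:  ", direction(speeds, mileage))            # no clear direction
print("below 55 mph:", direction(slow_speeds, slow_mileage))  # positive
print("above 55 mph:", direction(fast_speeds, fast_mileage))  # negative
```

Mileage here is determined exactly by speed, so the association is as strong as it can be; it just changes direction partway along the speed axis.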

                                                          

                                                                                            


Response and Explanatory Variable

• Response variable (Y): measures the outcome of a study.

• Explanatory variable (X): explains or causes changes in the response variable.

• Example: Y = gas mileage (miles per gallon), X = speed at which the car is driven.

• What are the response and explanatory variables in the alcohol study?


Scatterplots

• A scatterplot shows the relationship between two quantitative variables measured on the same units. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each unit in the data appears as the point in the plot fixed by the values of both variables for that unit.

• Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis of the scatterplot).
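A crude text scatterplot makes the axis convention concrete: the explanatory variable goes on the horizontal axis and the response on the vertical axis. This Python sketch is only an illustration (in this course, scatterplots are made in JMP); the function name `text_scatterplot` and the data are hypothetical.

```python
def text_scatterplot(xs, ys, width=40, height=12, mark="*"):
    """Render a crude text scatterplot with the explanatory variable (xs) on the
    horizontal axis and the response variable (ys) on the vertical axis."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty lists")
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    x_span = (x_hi - x_lo) or 1  # avoid dividing by zero for constant data
    y_span = (y_hi - y_lo) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / x_span * (width - 1))
        row = round((y - y_lo) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # row 0 is the top line, so flip y
    lines = ["|" + "".join(r) for r in grid]  # "|" marks the vertical (y) axis
    lines.append("+" + "-" * width)           # bottom line is the horizontal (x) axis
    return "\n".join(lines)


# Hypothetical explanatory/response values, for illustration only.
explanatory = [1, 2, 3, 4, 5, 6]
response = [2, 4, 5, 4, 6, 7]
print(text_scatterplot(explanatory, response))
```

Each unit becomes one mark placed by both of its values, which is exactly the "point fixed by the values of both variables" described above.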


Scatterplots in JMP

• Click Analyze, then Fit Y by X. Left-click the response variable (so that it is highlighted) and then left-click the Y, Response button (so that it appears in the Y, Response box). Similarly, left-click the explanatory variable and then left-click the X, Factor button. Click OK.


Examining a scatterplot

• Look for the overall pattern of the data and for striking deviations from that pattern.

• The overall pattern of a scatterplot can be described by the form, direction and strength of the relationship.

• An important kind of deviation is an outlier with respect to the relationship: a point that falls outside the overall pattern of the scatterplot.



Brain size and body size in 96 mammals (mammalstudy.JMP)

[Figure: Bivariate Fit of BRAIN By BODY (BRAIN, 0 to 4000, vs. BODY, 0 to 3000; labeled points: Human being, Dolphin, African elephant, Hippopotamus)]


Labeling points in JMP

• To label a point in a scatterplot in JMP, put the cursor in the column that you want to use to name the point (species in the mammal study), then click Cols and then Label. Then put the cursor on the row you want to label, click Rows and then Label.


Association is not causation

• An association between what we call the response variable and what we call the explanatory variable does not prove that changes in the explanatory variable cause changes in the response variable.

• The relationship between two variables can be strongly influenced by other variables that are lurking in the background (lurking variables).
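To see how a lurking variable can create an association on its own, here is a hypothetical Python simulation loosely modeled on the rum-and-salary example that follows. It is not an analysis of those data: the 3% yearly price level, the wobble patterns and all numbers are assumptions made up for illustration. Both series follow the same rising price level, neither is computed from the other, and dividing out the price level removes the association.

```python
from statistics import mean


def direction(xs, ys):
    """Informal tally: units on the same side of both averages vs. opposite sides."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    same = sum(p > 0 for p in products)
    opposite = sum(p < 0 for p in products)
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"


years = list(range(1880, 1991, 10))  # 12 hypothetical years
# Lurking variable (hypothetical): a general price level rising 3% per year.
price_level = [1.03 ** (yr - 1880) for yr in years]
# Year-to-year wobbles that are unrelated between the two series (hypothetical).
rum_wobble = [1.1 if i % 2 == 0 else 0.9 for i in range(len(years))]
salary_wobble = [1.1 if (i // 2) % 2 == 0 else 0.9 for i in range(len(years))]

# Neither series is computed from the other; both just follow the price level.
rum_price = [0.5 * p * w for p, w in zip(price_level, rum_wobble)]
salary = [600 * p * w for p, w in zip(price_level, salary_wobble)]
print(direction(rum_price, salary))  # positive

# Divide out the lurking variable and the association disappears.
rum_adj = [r / p for r, p in zip(rum_price, price_level)]
salary_adj = [s / p for s, p in zip(salary, price_level)]
print(direction(rum_adj, salary_adj))  # no clear direction
```

The strong positive association in the raw series comes entirely from the shared dependence on the price level, which is why an association alone cannot show that one variable causes the other.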


[Figure: Bivariate Fit of Salary of Presbyterian Ministers in MA By Price of Rum (Salary, 0 to 50,000, vs. Price of Rum, 0 to 12.5; points labeled by year: 1886, 1926, 1954, 1982, 1998)]

Are the Presbyterian ministers benefiting from the rum trade or supporting it?


Key Points from Lecture

• Association: Definition.

• Scatterplots:
  – How to examine them
  – How to make them in JMP

• Association is not causation.

• Next class: Section 2.2 (correlation); begin Section 2.3 (least-squares regression).