8/28/2012
Statistics: Unlocking the Power of Data Lock5
STAT 101: Data Analysis and
Statistical Inference
Professor Kari Lock Morgan
Course Website: http://stat.duke.edu/courses/Fall12/sta101.002/
Sakai: https://sakai.duke.edu/portal/site/STAT101_Fall12
Course Website
Syllabus
Lecture Slides
Lecture slides will be posted on the course website
I’ll try to post them the night before, so you can print them out if you want
The slides posted the night before will NOT be the complete version (I want you to think during class, so won’t give you all the answers). Complete slides will be posted after class.
Textbook
Statistics: Unlocking the Power of Data
by Lock, Lock, Lock Morgan, Lock, and Lock
(Not published yet – you get an advance custom version)
Other Course Materials
Clicker: i>clicker
Available at the bookstore, Amazon, or from
previous students at this google doc
Clicker grading will begin on 9/10
Register your clicker at
http://www.iclicker.com/support/registeryourclicker/
for Student ID use your NetID.
Calculator
must support normal, t, χ², and F distributions
Support
My Office Hours (in Old Chemistry 216): 2 – 4 pm Monday; 1 – 2:30 pm Thursday
Statistics Education Center (in Old Chem 211A): 4 – 9 pm Sunday – Thursday
Email: Email your TA or [email protected]
NOTE: I will only reliably reply to email 4:30 – 5:30 pm on weekdays.
Grade Breakdown
Labs 5%
Clicker Questions 10%
Homework 10%
Projects (2 × 10%) 20%
Midterm Exams (2 × 15%) 30%
Final Exam 25%
Grades ≥ 90 are guaranteed at least an A-
Grades ≥ 80 are guaranteed at least a B-
Grades ≥ 70 are guaranteed at least a C-
Grades ≥ 60 are guaranteed at least a D-
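The breakdown above is a weighted average, which can be sketched in a few lines of Python. The weights and cutoffs are taken from the slide; the function names and the example student are made up for illustration, and the sketch assumes every component is scored on a 0-100 scale.

```python
# Weights from the Grade Breakdown slide (fractions of the course grade).
WEIGHTS = {
    "labs": 0.05,
    "clickers": 0.10,
    "homework": 0.10,
    "projects": 0.20,
    "midterms": 0.30,
    "final": 0.25,
}

# Guaranteed minimum letter grades from the slide, highest cutoff first.
CUTOFFS = [(90, "A-"), (80, "B-"), (70, "C-"), (60, "D-")]


def course_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def guaranteed_minimum(score):
    """Return the lowest letter grade guaranteed for a score, or None below 60."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return None


# Hypothetical student, invented for this example.
example = {"labs": 100, "clickers": 90, "homework": 80,
           "projects": 85, "midterms": 80, "final": 90}
total = course_score(example)
print(round(total, 2), guaranteed_minimum(total))
```

The cutoffs only set a floor ("at least"), so the actual letter grade could be higher than the one returned here.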
Labs
Labs are on Monday in Old Chem 01
The goal of labs is to familiarize you with statistical software, and to give you guided experience analyzing data
Labs will be group based
If you missed lab yesterday, make sure to work through it on your own by Thursday (it’s on the course website)
Clickers
Clicker grade will be split equally between:
Review “Quiz” Questions: Credit only for answering correctly
Goal: motivate you to keep up with the material
Review questions will usually happen at the beginning of class – it is your responsibility to arrive on time.
New Questions: Credit simply for clicking in
Goal: motivate you to think actively about new material as it is being presented
Class Year
What is your class year?
(a) First-year
(b) Sophomore
(c) Junior
(d) Senior
Major
Your primary major (or potential future major) best falls under the category…
(a) Natural Sciences
(b) Arts and Humanities
(c) Social Sciences
(d) Math/Statistics/CS
(e) Other
Homework
Weekly homework due, usually on Tuesdays
Point of homework: to LEARN!
to make sure you are keeping up with the material
to prepare you for projects and exams
Graded problems and practice problems
Grading
Graded on a 10 point scale
Lowest homework grade dropped
Penalties for late homework
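As a rough illustration of the "lowest homework grade dropped" rule, here is a small Python sketch. The slide does not say how the remaining grades are combined, so the plain average used here is an assumption, late penalties are ignored, and the grades are invented.

```python
def homework_average(grades):
    """Mean of homework grades (10-point scale) after dropping the single lowest one.

    Averaging the remaining grades is an assumption; the slide only says
    that the lowest grade is dropped.
    """
    if len(grades) < 2:
        raise ValueError("need at least two grades to drop one")
    kept = sorted(grades)[1:]  # discard the single lowest grade
    return sum(kept) / len(kept)


# Hypothetical grades, invented for this example.
hypothetical_grades = [9, 10, 6, 8, 10]
avg = homework_average(hypothetical_grades)
print(avg)
```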
Projects
Project 1 individual
confidence intervals, hypothesis tests
written report up to 5 pages in length
Project 2 with your lab group
regression
10 minute presentation
written report up to 10 pages in length
Exams
Midterm Exams
Thursday, October 11, in class
Thursday, November 15, in class
Final
Tuesday, December 11th, 2 – 5 pm
Exams are mandatory and cannot be made up
Keys to Success
Come to class
Come to lab
Do the homework
Read the textbook
Do lots of practice problems
Stay on top of the material
8/28/12
Introduction to Data
SECTION 1.1 • Data • Cases and variables • Categorical and quantitative variables • Using data to answer a question
Why Statistics?
Statistics is all about DATA
Collecting DATA
Describing DATA – summarizing, visualizing
Analyzing DATA
Data are everywhere! Regardless of your field, interests, lifestyle, etc., you will almost definitely have to make decisions based on data, or evaluate decisions someone else has made based on data
Data
Data are a set of measurements taken on a set of individual units
Usually data are stored and presented in a dataset, comprised of variables measured on cases
Cases and Variables
We obtain information about cases or units.
A variable is any characteristic that is recorded for each case.
Generally each case makes up a row in a dataset, and each variable makes up a column
Countries of the World
Country  Land Area  Population  Rural  Health  Internet  Birth Rate  Life Expectancy  HIV
Afghanistan 652230 29021099 76 3.7 1.7 46.5 43.9
Albania 27400 3143291 53.3 8.2 23.9 14.6 76.6
Algeria 2381740 34373426 34.8 10.6 10.2 20.8 72.4 0.1
American Samoa 200 66107 7.7
Andorra 470 83810 11.1 21.3 70.5 10.4
Angola 1246700 18020668 43.3 6.8 3.1 42.9 47 2
Antigua and Barbuda 440 86634 69.5 11 75
Argentina 2736690 39882980 8 13.7 28.1 17.3 75.3 0.5
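To make the row/column idea concrete, the sketch below stores three of the countries above as cases and reads one variable off all of them. Only rows with a value in every column are used, because the positions of the missing values in the other rows cannot be recovered from the slide. The Python variable names are shortened versions of the column headers and are otherwise arbitrary.

```python
# Column names follow the headers on the slide (spaces removed).
VARIABLES = ["Country", "LandArea", "Population", "Rural", "Health",
             "Internet", "BirthRate", "LifeExpectancy", "HIV"]

# Three cases (rows) copied from the slide; only rows with every value present.
rows = [
    ("Algeria", 2381740, 34373426, 34.8, 10.6, 10.2, 20.8, 72.4, 0.1),
    ("Angola", 1246700, 18020668, 43.3, 6.8, 3.1, 42.9, 47, 2),
    ("Argentina", 2736690, 39882980, 8, 13.7, 28.1, 17.3, 75.3, 0.5),
]

# Each case becomes a dict mapping column name -> recorded value.
cases = [dict(zip(VARIABLES, row)) for row in rows]

# One variable is one column: the same characteristic read off every case.
life_expectancy = [case["LifeExpectancy"] for case in cases]

print(len(cases), "cases,", len(VARIABLES), "columns")
print(life_expectancy)
```

Here each country is a case, and every column after the country name is a variable recorded for that case.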
Intro Statistics Survey Data
Diet Coke and Calcium
Drink  Calcium Excreted
Diet cola 50
Diet cola 62
Diet cola 48
Diet cola 55
Diet cola 58
Diet cola 61
Diet cola 58
Diet cola 56
Water 48
Water 46
Water 54
Water 45
Water 53
Water 46
Water 53
Water 48
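One simple way to start describing this dataset is to compare the average calcium excreted in each group. The Python sketch below uses the 16 values from the slide; the units are not stated there, so none are assumed here.

```python
from statistics import mean

# Calcium excreted for each case, grouped by the categorical variable Drink.
calcium = {
    "Diet cola": [50, 62, 48, 55, 58, 61, 58, 56],
    "Water": [48, 46, 54, 45, 53, 46, 53, 48],
}

# Mean of the quantitative variable within each group.
group_means = {drink: mean(values) for drink, values in calcium.items()}
difference = group_means["Diet cola"] - group_means["Water"]

for drink, m in group_means.items():
    print(drink, m)
print("difference in means:", difference)
```

A difference in sample means is only a summary of these 16 cases; deciding whether it reflects a real effect is a question for the inference methods covered later in the course.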
Data
US News and World Report National University Rankings
Stock Market
Duke Basketball
Obama versus Romney
Unemployment Rate
Hybrid Cars
Antidepressants and Alzheimer’s
Data Applicable to You
Think of a potential dataset (it doesn’t have to actually exist) that you would be interested in analyzing
What are the cases?
What are the variables?
What interesting questions could it help you answer?
Kidney Cancer
Source: Gelman et al., Bayesian Data Analysis, CRC Press, 2004.
Counties with the highest kidney cancer death rates
Kidney Cancer
Counties with the lowest kidney cancer death rates
Source: Gelman et al., Bayesian Data Analysis, CRC Press, 2004.
Kidney Cancer
If the values in the kidney cancer dataset are rates of kidney cancer deaths, then what are the cases?
(a) The people living in the US
(b) The counties of the US
A person either has kidney cancer or doesn’t… a rate must apply to a group of people, such as a county
Kidney Cancer
If the values in the kidney cancer dataset are yes/no, then what are the cases?
(a) The people living in the US
(b) The counties of the US
A person either has kidney cancer or doesn’t. Yes/no doesn’t make sense for a county.
Categorical versus Quantitative
Variables are classified as either categorical or quantitative:
• A categorical variable divides the cases into groups
• A quantitative variable measures a numerical quantity for each case
Categorical Quantitative
Kidney Cancer
If the cases in the kidney cancer dataset are counties, then the measured variable is…
(a) Categorical
(b) Quantitative
Rates are numbers (quantitative).
Kidney Cancer
If the cases in the kidney cancer dataset are people, then the measured variable is…
(a) Categorical
(b) Quantitative
Either having kidney cancer or not is categorical.
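The two kidney cancer questions describe the same information at two levels. A toy Python sketch, with entirely made-up records and county names, shows how a categorical yes/no variable on people turns into a quantitative rate once the cases are counties.

```python
from collections import defaultdict

# Made-up person-level records: each case is a person,
# and the variable (yes/no) is categorical.
people = [
    ("County A", "no"), ("County A", "yes"), ("County A", "no"), ("County A", "no"),
    ("County B", "no"), ("County B", "no"), ("County B", "no"), ("County B", "no"),
    ("County B", "no"),
]

# Count people and "yes" values within each county.
counts = defaultdict(lambda: {"yes": 0, "total": 0})
for county, status in people:
    counts[county]["total"] += 1
    if status == "yes":
        counts[county]["yes"] += 1

# County-level dataset: each case is a county,
# and the variable (a rate) is quantitative.
county_rates = {c: v["yes"] / v["total"] for c, v in counts.items()}
print(county_rates)
```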
Variables
For each of the following situations:
What are the variables?
Is each variable categorical or quantitative?
1. Can eating a yogurt a day cause you to lose weight?
2. Do males find females more attractive if they wear red?
3. Does louder music cause people to drink more beer?
4. Are lions more likely to attack after a full moon?
(the answer to all of these questions is yes!)
Let’s Collect Some Data!
QUESTION: If you are romantically interested in someone, should you be obvious about it, or should you play hard to get?
Using Data to Answer a Question
Romance
What type of person are you generally more romantically interested in?
(a) Someone who is obviously into you
(b) Someone who plays hard to get
Romance
MALES ONLY: What type of person are you generally more romantically interested in?
(a) Someone who is obviously into you
(b) Someone who plays hard to get
Romance
FEMALES ONLY: What type of person are you generally more romantically interested in?
(a) Someone who is obviously into you
(b) Someone who plays hard to get
One or Two Variables
Sometimes we are interested in one variable, as in whether people prefer obvious romantic interest or hard to get
Other times we are interested in the relationship between two variables, such as
1) prefer obvious interest or hard to get?
2) gender
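The difference between looking at one variable and looking at two can be sketched with a few lines of Python. The responses below are invented for illustration and are not the class results.

```python
from collections import Counter

# Made-up survey responses: one (gender, preference) pair per case.
responses = [
    ("male", "obvious"), ("male", "hard to get"), ("male", "obvious"),
    ("female", "obvious"), ("female", "hard to get"), ("female", "hard to get"),
    ("female", "obvious"), ("male", "obvious"),
]

# One variable: tally preference alone.
one_variable = Counter(pref for _, pref in responses)

# Two variables: tally each (gender, preference) combination.
two_way = Counter(responses)

print(one_variable)
for (gender, pref), n in sorted(two_way.items()):
    print(gender, pref, n)
```

The first tally answers a one-variable question (which option is preferred overall), while the second is the raw material for asking whether preference is related to gender.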
What do you want to know?
We’ll do a class survey, collecting data you are interested in.
What do you want to know about your peers?
Is this a question about one variable or two variables?
What are the variables?
Are they categorical or quantitative?
What do you want to know?
Write a question to measure each variable of interest. Write questions so the resulting data will be accurate and easy to analyze.
Quantitative variable? Give units.
Categorical variable? Give the possible categories (no more than 5).
Be clear and specific.
Summary
Data are everywhere, and pertain to a wide variety of topics
A dataset is usually comprised of variables measured on cases
Variables are either categorical or quantitative
Data can be used to provide information about essentially anything we are interested in and want to collect data on!
To Do
Read Section 1.1
If you haven’t already…
Get the textbook
Get a clicker and register it
Do Lab 0 by Thursday