Lecture 0: Introduction

Statistics 101

Mine Çetinkaya-Rundel

January 10, 2013

Syllabus & policies Logistics

General Info

Professor: Dr. Mine Çetinkaya-Rundel - [email protected]
Old Chemistry 213

Teaching Assistants:
• Gary Larson - [email protected]
• Yingbo Li - [email protected]
• Shaan Qamar - [email protected]
• Anthony Weishampel - [email protected]

Lecture: Tuesdays and Thursdays, 1:25 - 2:40 at Soc Sci 136

Lab: Mondays at Old Chem 101
• 08:30am - 09:45am - Anthony
• 10:05am - 11:20am - Gary
• 11:45am - 01:00pm - Anthony
• 01:25pm - 02:40pm - Gary
• 03:05pm - 04:20pm - Gary

Statistics 101 (Mine Cetinkaya-Rundel) Lecture 0: Introduction January 10, 2013 1 / 39

Required materials

Textbook: OpenIntro Statistics. Diez, Barr, Çetinkaya-Rundel. CreateSpace, 2nd Edition, 2012. ISBN: 978-1478217206

Clicker: i>clicker2. ISBN: 1429280476, available at the Duke textbook store, i>clicker website, or Amazon; used clickers from former students (see Google doc).

Calculator: (Optional) You might need a four function calculator that can do square roots for this class. No limitation on the type of calculator you can use.

Clicker registration

http://iclicker.com/support/registeryourclicker

Webpage

http://stat.duke.edu/courses/Spring13/sta101.001

All announcements and assignments will be posted on this website under the schedule tab.

Grading

- Clicker questions: 5%

- Problem sets: 7.5%

- Labs: 7.5%

- Readiness assessments: 15% (2/3 individual, 1/3 team)

- Project 1: 10%

- Project 2: 10%

- Midterm: 15%

- Final: 25%

- Peer evaluations: 5%

Grades curved at the end of the course after overall averages have been calculated.

Average of 90-100 guaranteed A-.
Average of 80-90 guaranteed B-.
Average of 70-80 guaranteed C-.

The more evidence there is that the class has mastered the material, the more generous the curve will be.
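The weights above can be turned into an overall average with a short calculation. The sketch below is illustrative only: the component names, the 0-100 scale for each component, and the treatment of a score exactly on a boundary (90 counts as A-) are assumptions, and the end-of-course curve is not modeled.

```python
# Weights from the grading slide, in percent (they sum to 100).
WEIGHTS = {
    "clicker": 5.0,
    "problem_sets": 7.5,
    "labs": 7.5,
    "readiness": 15.0,
    "project_1": 10.0,
    "project_2": 10.0,
    "midterm": 15.0,
    "final": 25.0,
    "peer_evaluations": 5.0,
}


def overall_average(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def guaranteed_grade(average):
    """Minimum letter grade before any curve; boundary handling is an assumption."""
    if average >= 90:
        return "A-"
    if average >= 80:
        return "B-"
    if average >= 70:
        return "C-"
    return None  # the slide states no guarantee below 70


# Hypothetical student: 85 on every component except a 95 on the final.
example_scores = {name: 85.0 for name in WEIGHTS}
example_scores["final"] = 95.0
example_average = overall_average(example_scores)
print(example_average, guaranteed_grade(example_average))
```

Because the final carries 25% of the weight, a 10-point improvement on it moves the overall average by 2.5 points.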

Syllabus & policies Details

Course goals & objectives

1 Recognize the importance of data collection, identify limitations in data collection methods, and determine how they affect the scope of inference.

2 Use statistical software to summarize data numerically and visually, and to perform data analysis.

3 Have a conceptual understanding of the unified nature of statistical inference.

4 Apply estimation and testing methods to analyze single variables or the relationship between two variables in order to understand natural phenomena and make data-based decisions.

5 Model numerical response variables using a single explanatory variable or multiple explanatory variables in order to investigate relationships between variables.

6 Interpret results correctly, effectively, and in context without relying on statistical jargon.

7 Critique data-based claims and evaluate data-based decisions.

8 Complete two research projects: one that employs simple statistical inference and another that employs more advanced modeling techniques.

Units and major topics

Unit 1 Introduction to data: Observational studies and non-causal inference, principles of experimental design and causal inference, exploratory data analysis: description, summary and visualization, introduction to statistical inference.

Unit 2 Probability and distributions: The basics of probability and chance processes, Bayesian perspective in statistical inference, the normal distribution.

Unit 3 Framework for inference: Central Limit Theorem and sampling distributions

Unit 4 Statistical inference for numerical variables

Unit 5 Statistical inference for categorical variables

Unit 6 Simple linear regression: Bivariate correlation and causality, introduction to modeling

Unit 7 Multiple linear regression: More advanced modeling

Course structure

Seven learning units.

Set of learning objectives and required and suggested readings, videos, etc. for each unit.

Prior to beginning the unit, complete the readings and familiarize yourselves with the learning objectives.

Begin a new unit with a readiness assessment: individual, then team.

Tuesdays and Thursdays: Split rest of the class time between lecture (supplemented with active participation and peer instruction via clickers) and team application exercises.

Mondays: Complete lab assignments in teams.

Teams

Assigned to teams of 4-5 students based on data from the survey and the pre-test.

Teams are heterogeneous with respect to stats exposure and homogeneous with respect to majors and/or interests, to the extent that it’s possible.

Once team assignments have been made there is no option for changing teams, other than under extraordinary circumstances.

Six peer evaluations throughout the semester as well as other measures to ensure the functionality of the teams and to make sure all team members contribute to the team work.

Lectures

Lecture slides will be posted on the course webpage (under schedule) by noon the day of the course.

In order to be able to keep up with the pace of the course and not fall behind you must attend the lectures.

Introduction of concepts as well as hands-on activities and exercises to complement them.

Clicker questions

Objective: Make you an active participant and help me pace the class.

On new material introduced in class that day.

Credit for clicking in, regardless of whether you have the correct answer (must answer at least 75% of the questions that day).

Up to two unexcused late arrivals or absences will not affect your clicker grade.

If one person is simultaneously using two or more clickers, all students involved will receive a 0 for an overall clicker grade.

Grading will start on January 22.

Problem sets

Objective: Help you develop a more in-depth understanding of the material and help you prepare for exams and projects.

Questions from the textbook.

Due at the beginning of class on the due date.

Show all your work to receive credit.

Welcomed and encouraged to work with others, but turn in your own work.

Lowest score will be dropped.

No make-ups.

Excused absences do not excuse homework.

Labs

Objective: Give you hands-on experience with data analysis using statistical software and provide you with tools for the projects.

http://beta.rstudio.org

Add your Gmail address to Google doc by 5pm today to create an RStudio account.

Complete in teams.

Lowest lab score will be dropped.

If you do not attend a lab section, you are not eligible for credit on that lab.

Projects

Objective: Give you independent applied research experience using real data and statistical methods.

Project 1:
• individual
• statistical inference exploring the distributional characteristics of one variable or relationship between two variables
• choose a research question, find data, analyze it, write up your results

Project 2: in teams, presentation, multiple linear regression, more info later

Readiness assessments

Objective: Encourage you to complete the reading assignment prior to coming to class and evaluate your conceptual understanding of the learning objectives.

10 multiple choice questions, at the beginning of a unit.

Conceptual questions addressing the learning objectives of the new unit, assessing familiarity and reasoning, not mastery.

Take the individual readiness assessment using your clickers, and then re-take the same assessment in teams.

Your performance on both assessments factors into your final grade: score for each assessment is a weighted average of the individual (2/3) and team (1/3) scores.

First readiness assessment next Tuesday, for practice, not graded.

6 graded readiness assessments, lowest score will be dropped.
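As a rough illustration of the weighting described above, the sketch below combines individual and team scores and then drops the lowest of the graded assessments. The function names are hypothetical, and averaging the remaining scores equally is an assumption; the slides only state the 2/3 and 1/3 weights and that the lowest score is dropped.

```python
def assessment_score(individual, team):
    """Weighted average: 2/3 individual score plus 1/3 team score."""
    return (2 * individual + team) / 3


def readiness_grade(pairs):
    """Drop the lowest combined score, then average the rest (assumed equal weights).

    `pairs` is a list of (individual, team) scores, e.g. number correct out of 10.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two assessments to drop the lowest")
    combined = sorted(assessment_score(ind, team) for ind, team in pairs)
    kept = combined[1:]  # combined[0] is the lowest score
    return sum(kept) / len(kept)


# Hypothetical results for the 6 graded assessments, each out of 10.
example_pairs = [(6, 9), (9, 9), (3, 6), (9, 9), (9, 9), (9, 9)]
print(assessment_score(6, 9))          # one assessment: 6 alone, 9 with the team
print(readiness_grade(example_pairs))  # overall, with the lowest score dropped
```

Note that the team score can raise a weak individual result, but only by a third of the gap between the two.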

Exams

Midterm: Thursday, February 21

Final: Saturday, May 4, 2-5pm (Cumulative)

Exam dates cannot be changed. No make-up exams will be given. If you cannot take the exams on these dates you should drop this class.

You must bring a calculator to the exams (no cell phones, iPods, etc.) and you are also allowed to bring one sheet of notes (“cheat sheet”). This sheet must be no larger than 8½” × 11” and must be prepared by you (no photocopies). You may use both sides of the sheet.

Work load

You are expected to put in 4-6 hours of work outside of class. Some of you will do well with less time than this, and some of you will need more.

Syllabus & policies Support

Email

I will regularly send announcements by email, so make sure to check your email daily.

While email is the quickest way to reach me outside of class, it is much more efficient to answer most statistical questions in person.

Discussion Forum on Sakai

Any non-personal questions related to the material covered in class, problem sets, labs, projects, etc. should be posted on the Discussion Forum on Sakai.

Before posting a new question please make sure to check if your question has already been answered.

The TAs and I will be answering questions on the forum daily and all students are expected to answer questions as well.

Office hours

Professor: Mondays and Wednesdays 2pm - 4pm

TAs at the SECC: Sunday - Thursday 4pm - 9pm (Old Chemistry 211A). The statistics education center has upper level statistics students available to help you. For more information and a schedule see http://stat.duke.edu/courses/resources-students.

You are highly encouraged to stop by with any questions or comments about the class, or just to say hi and introduce yourself.

Most problem sets due on Thursday. Recommend attempting all problems by Wednesday to make the most of OH.

Specific TA office hours TBA.

Other learning resources

Aside from the TAs and the professor’s office hours, you can also make use of the Academic Resource Center. For more information, see http://web.duke.edu/arc.

Students with disabilities

Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.

Syllabus & policies Policies

Policies I

Late work policy for problem sets and lab reports:
• late but during class: lose 10% of points
• after class on due date: lose 20% of points
• next day: lose 30% of points
• later than next day: lose all points

Late work policy for projects: 10% off for each day late.

There will be no make-up for clicker questions, individual and team readiness assessments, labs, problem sets, projects, or exams.

If a readiness assessment or the midterm exam must be missed, absence must be officially excused in advance, in which case the missing exam score will be imputed using the final exam score.

Missed assessments not excused in advance will receive a grade of 0.

The final exam must be taken at the stated time.
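The late-work penalties can be sketched as a small lookup. This is only an illustration: the status labels and function names are hypothetical, and reading "10% off for each day late" as 10% of the raw project score per day, floored at zero, is an assumption.

```python
# Percent of points lost on problem sets and lab reports, by submission status.
LATE_PENALTY_PERCENT = {
    "on_time": 0,
    "late_during_class": 10,
    "after_class_on_due_date": 20,
    "next_day": 30,
    "later_than_next_day": 100,
}


def late_adjusted_score(raw_score, status):
    """Score for a problem set or lab report after the late penalty."""
    return raw_score * (100 - LATE_PENALTY_PERCENT[status]) / 100


def project_score(raw_score, days_late):
    """Project score after 10% off per day late (assumed floor of zero)."""
    percent_kept = max(0, 100 - 10 * days_late)
    return raw_score * percent_kept / 100


print(late_adjusted_score(50, "next_day"))  # a 50-point problem set turned in the next day
print(project_score(80, 2))                 # an 80-point project turned in two days late
```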

Policies II

You must take the final exam and turn in the two projects in order to pass this course.

Regrade requests must be made within one week of when the assignment is returned, and must be submitted in writing.
• These will be honored if points were tallied incorrectly, or if you feel your answer is correct but it was marked wrong.
• No regrade will be made to alter the number of points deducted for a mistake.
• There will be no grade changes after the final exam.

Clickers may not be shared, and the clicker registered to a person may only be used by that person. Failure to abide by this will result in a 0 clicker grade for everyone involved.

Academic Dishonesty

Any form of academic dishonesty will result in an immediate 0 on the given assignment and will be reported to the Office of Student Conduct. Additional penalties may also be assessed if deemed appropriate. If you have any questions about whether something is or is not allowed, ask me beforehand.

Some examples:

Use of disallowed materials (including any form of communication with classmates or looking at a classmate’s work) during exams.

Plagiarism of any kind.

Use of outside answer keys or solution manuals for the homework.

Syllabus & policies Tips

Tips for success

1 Complete the reading before a new unit begins, and then review again after the unit is over.

2 Be an active participant during lectures and labs.

3 Ask questions during class or office hours, or by email. Ask me, the TAs, and your classmates.

4 Do the problem sets: start early and make sure you attempt and understand all questions.

5 Start your projects early and allow adequate time to complete them.

6 Give yourself plenty of time to prepare a good cheat sheet for exams. This requires going through the material and taking the time to review the concepts that you’re not comfortable with.

7 Do not procrastinate: don’t let a week go by with unanswered questions as it will just make the following week’s material even more difficult to follow.

To do

Get an i>clicker2 and register it. If you have previously bought an i>clicker1 for this course and cannot return it, see me after class or at OH tomorrow.

Download or purchase the textbook.

If you missed lab yesterday:
• Complete the getting to know you survey on Sakai.
• Complete the pretest.
• Add your Gmail address to the Google Doc.

Read the syllabus and let me know if you have any questions.

Start reviewing the resources for Unit 1: readiness assessment next Tuesday.

Statistics and the Scientific Method

From Universe Today - http://www.universetoday.com/74036/what-are-the-steps-of-the-scientific-method/

Examples Baby names

Baby names in the US

Each year the Social Security Administration collects and releases data on how many babies are given a certain name.

They released these data for years 1880 to 2011 for each gender.

For privacy reasons they restrict the list of names to those with at least 5 occurrences.

Top 10 Baby Names For 2011

http://www.ssa.gov/oact/babynames

Jac...

http://www.babynamewizard.com

Clicker question

Which of the below is the most common name in this class?

(a) Andrew

(b) William

(c) Kevin

(d) Rachel

(e) Grace

Examples Geotagged data

Clicker question

Do you geotag your posts on social networking sites, like Facebook, Twitter, Instagram, etc.?

(a) yes

(b) no

Map based on Flickr tags

http://aaronstraupcope.com

Red: Tourists

Blue: Locals

Yellow: Either

http://www.flickr.com/photos/walkingsf/4671594023/in/set-72157624209158632/

Examples 538

The most famous statistician in the world

Source: http://fivethirtyeight.blogs.nytimes.com

Examples Links to blogs

Links to blogs

http://stat.duke.edu/courses/Spring13/sta101.001/links.html

Why study statistics

[...], the study also warns there is a significant shortage of qualified workers to analyze these data sets adequately. According to the report, a shortfall of about 140,000 to 190,000 individuals with analytical expertise is projected by 2018. The study also predicts a need for an additional 1.5 million managers and analysts by that same date to fully engage the true potential of the currently available data.

http://jobs.aol.com/articles/2011/08/10/data-scientist-the-hottest-job-you-havent-heard-of

http://www.dailymail.co.uk/news/article-2023514

Data collection

What do you want to know?

We’ll do a class survey, collecting data you are interested in. What do you want to know about your peers?
• Is this a question about one variable or two variables?
• What are the variables?
• Are they categorical or numerical?

Work in groups to write a question to measure variable(s) of interest. Write questions so the resulting data will be accurate and easy to analyze.
• Numerical variable? Give units.
• Categorical variable? Give the possible categories (at most 5).
• Be clear and specific.

I will email with instructions to fill out an anonymous online survey.
