Chris Morgan, MATH G160, csmorgan@purdue.edu

April 6, 2012, Lecture 27

Chapter 1 (and 7.8 for some reason): Statistical Applications and Types of Data



C. Morgan, STAT 225, Fall 2011


A few definitions:

Data – measurements from which information and knowledge are derived, facts and figures collected, analyzed, and summarized

Data Set – a collection of data, usually put in table form

Element – a single cell in a dataset

Observation – a subject on which data is being collected, makes up the rows of a dataset

Variable – any characteristic of an observation, makes up the columns of a dataset


An example of a data set:

University        In-State Tuition
Iowa              $14,828
Indiana           $16,298
Wisconsin         $16,667
Purdue            $18,190
Michigan State    $19,542
Ohio State        $19,584
Minnesota         $19,864
Michigan          $21,029
Illinois          $23,372
Penn State        $24,096

University        Out-of-State Tuition
Minnesota         $24,245
Iowa              $30,202
Wisconsin         $31,927
Ohio State        $33,768
Indiana           $34,958
Purdue            $35,742
Penn State        $35,960
Michigan State    $37,442
Illinois          $37,514
Michigan          $45,193
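As a rough illustration of the definitions, the in-state tuition figures above can be held as a list of rows. This is only a minimal Python sketch: the names `in_state`, `variables`, and `element` are made up for the example, and only the first three universities are included.

```python
# Each dict is one observation (a row); each key is a variable (a column).
in_state = [
    {"university": "Iowa", "tuition": 14828},
    {"university": "Indiana", "tuition": 16298},
    {"university": "Wisconsin", "tuition": 16667},
]

variables = list(in_state[0].keys())  # the column names
n_observations = len(in_state)        # the number of rows
element = in_state[1]["tuition"]      # a single cell: Indiana's in-state tuition

print(variables, n_observations, element)
```

Reading one key across all rows gives a variable; reading one row gives an observation.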


Types of Data (part I):

Quantitative (continuous)

- can be measured (length, volume, weight, cost, etc.)

- intervals, ratios, percentages

- interval data do not have a “natural” zero

- ratio data do have a “natural” zero

Qualitative (categorical)

- is observed, not measured (beauty, taste, texture, smell, color, etc.)

- labels or names used to identify an attribute of each element

- nominal: order does not matter (gender, religion, race)

- ordinal: order does matter (class year, pain rating, salsa hotness)
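One way to see the nominal/ordinal difference is that ordinal labels can be sorted, while nominal labels can only be counted. The sketch below assumes a four-level ordering for class year and uses made-up sample values; `CLASS_YEAR_ORDER` and `class_year_rank` are hypothetical names, not something from the slides.

```python
# Assumed ordering for the ordinal variable "class year" (an illustration only).
CLASS_YEAR_ORDER = ["freshman", "sophomore", "junior", "senior"]


def class_year_rank(year):
    """Return the position of a class year in the assumed ordering."""
    return CLASS_YEAR_ORDER.index(year)


# Made-up observations of an ordinal variable.
students = ["senior", "freshman", "junior", "freshman"]

# Ordinal data can be sorted meaningfully by rank.
sorted_years = sorted(students, key=class_year_rank)

# Nominal data (here, made-up colors) has no order; we can only count labels.
colors = ["red", "blue", "red"]
color_counts = {label: colors.count(label) for label in set(colors)}

print(sorted_years)
print(color_counts)
```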


Quantitative

- Height of wheat thin: 1’ ¼’’

- Weight of wheat thin: 1.06 oz

- 22 servings per container

- 11 wheat thins = 1 serving size

Qualitative

- Yellow Box and brown wheat thins

- texture is smooth and yet slightly bumpy

- Chris is obsessed with them

- incredibly delicious!


What type of variable is… (qualitative or quantitative)

- GPA

- The amount of goodness in every wheat thin

- Time it takes to run a mile

- How many wheat thins I can stuff in my mouth at once

- Smoking status

- Income

- The number of places you’d rather be than here


Types of Data (part II):

Cross-sectional data

- observes many objects at one time

- e.g. How many wheat thins each of you can eat at once

- e.g. Number of people who fall asleep today in class

- e.g. The class’s opinion on the best ice cream flavor

- e.g. Your height today

Time series

- observes one subject or many subjects over time

- e.g. Average number of wheat thins each of you can eat every week

- e.g. Number of students who fall asleep at least once this semester

- e.g. Students’ test scores over the semester

- e.g. Your height from age 7 to 22


Data Collection

• Existing Sources

• Surveys

• Observational Studies

• Experiments


Existing Sources

- Look at what others have already collected

- many people and companies already have existing databases:

• www.census.gov

• www.swivel.com

• www.who.org

• www.cdc.gov

Surveys

- go out and ask people for their opinion

- ask people for information


Observational Studies

- Watch subjects over time and record results

- Comparing sales of different grocery stores in West Lafayette (we simply observe their sales records and do not apply a treatment to any group)

- Look up past data and analyze outcomes

Experiments

- design a study to answer specific questions

- set up specific treatment to see if there are any outcomes

- have a control group

- random samples


Statistical Inferences

• Population: the set of all elements of interest in a particular study

• Sample: a subset of the population

• Census: the process of conducting a survey to collect data for the entire population

• Sample Survey: the process of conducting a survey to collect data for a sample

Why sample? Logistics, cost, limitations, etc…

Statistical Inference: Using data from a sample to estimate a characteristic of a population


Statistical Inference Example:

• Take a census by counting the number of “e”s in the given paragraph.

• Take a sample by randomly selecting a line, counting the number of “e”s in it, and then multiplying by the number of lines in the paragraph.

• How close are we?


Elegant, extravagant elephants entertain every evening at seven. They serve escargot and eggs benedict. Eight elderly elegant elephants elevate themselves to the expensive entrance with elevators exceeding expectations. Eating everything edible, elephants expand exponentially. “Excellent!” the entertained elephants express after the entertaining entrees were served. Everything was expedited by the energetic efforts of the executive elephant empress. Everyone was entertained to excess and enjoyed the edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea.


• Total “e” count: 126

• I randomly chose line #3 with an “e” count of 12

– 12 x 12 = 144

• I randomly chose line #10 with an “e” count of 11

– 11 x 12 = 132
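The census-versus-sample exercise can be sketched in a few lines of Python. The slide's original line breaks are lost in this text, so the sketch re-wraps the paragraph at an assumed width of 50 characters; the line count and per-line counts will therefore not necessarily match the numbers quoted on the slide. The helper `count_e` and the seed are arbitrary choices for the illustration.

```python
import random
import textwrap

paragraph = (
    "Elegant, extravagant elephants entertain every evening at seven. "
    "They serve escargot and eggs benedict. Eight elderly elegant elephants "
    "elevate themselves to the expensive entrance with elevators exceeding "
    "expectations. Eating everything edible, elephants expand exponentially. "
    "“Excellent!” the entertained elephants express after the entertaining "
    "entrees were served. Everything was expedited by the energetic efforts "
    "of the executive elephant empress. Everyone was entertained to excess "
    "and enjoyed the edible endeavors immensely. The evening ended "
    "enchantedly with Echinacea herbal tea."
)

# Assumption: a width of 50 stands in for the slide's unknown line layout.
lines = textwrap.wrap(paragraph, width=50)


def count_e(text):
    """Count the letter e, ignoring case."""
    return text.lower().count("e")


# Census: count every "e" in the whole paragraph.
census = count_e(paragraph)

# Sample: pick one line at random and scale its count by the number of lines.
rng = random.Random(0)  # arbitrary seed so the run is repeatable
line = rng.choice(lines)
estimate = count_e(line) * len(lines)

print("census:", census)
print("estimate from one line:", estimate)
```

Changing the seed picks a different line and gives a different estimate, which is the point of the exercise: a single-line sample can land above or below the census count.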


Sampling Methods

• Stratified Sampling

• Cluster Sampling

• Systematic Sampling

• Convenience Sampling

• Judgment Sampling


Sampling Methods – Simple Random Sampling (SRS)

• Finite population: A sample of size n from a finite population of size N is selected such that each possible sample of size n has the same probability of being selected.

• Infinite population: A sample is selected from a population in such a way that each element has the same probability of being selected.

• Sampling With Replacement: Elements are put back in the population after being selected, so an element can be chosen more than once.

• Sampling Without Replacement: Elements are not replaced after being selected and are therefore only chosen once to be in a sample.


Sampling Methods – SRS example

Say I want to take a sample of NFL football teams

1. make a list of all the teams

2. randomly select 8 teams

without replacement: select one team at a time and then remove the chosen team from the list

with replacement: select one team at a time, but do not remove the chosen team from the list
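The two variants can be sketched with Python's standard `random` module. Placeholder labels stand in for the real team names (the NFL has 32 teams), and the seed is arbitrary; this is an illustration under those assumptions rather than a prescribed procedure.

```python
import random

# Step 1: make a list of all the teams (placeholder labels, not real names).
teams = [f"Team {i}" for i in range(1, 33)]

rng = random.Random(2012)  # arbitrary seed so the run is repeatable

# Step 2a: without replacement, so no team can appear twice.
without_replacement = rng.sample(teams, 8)

# Step 2b: with replacement, so the same team may be drawn more than once.
with_replacement = rng.choices(teams, k=8)

print(without_replacement)
print(with_replacement)
```

`random.sample` draws distinct elements, while `random.choices` draws independently each time, which matches the "remove the chosen team" versus "do not remove" distinction.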


Stratified Sampling

- Divides population into groups called strata

- Takes a simple random sample (SRS) from each stratum

- Divide students into class year and take a random sample from each

Cluster Sampling

- divides population into groups called clusters

- takes a SRS of clusters

- each element in the selected clusters is part of the sample

Systematic Sampling

- number the units in the population from 1 to N

- decide on the n (sample size) that you want or need

- set k = N/n

- first, one of the first k elements is selected, and then every kth element thereafter is selected


Convenience Sampling

- Easiest sampling method, usually the cheapest to implement

- Fliers on campus for people to participate in surveys or other studies

- choosing a random box of wheat thins to determine quality instead of sampling from twenty boxes

- Not a probability sample

Judgment Sampling

- not scientific at all

- based on one sampler’s opinion

- does this one sample (one observation in this case) represent the whole of the population? Why or why not?


Sampling Methods: example

• Convenience Sample: select subjects 1-4

• Stratified Random Sample: divide the 20 subjects into 4 non-overlapping groups of 5 subjects each, then randomly choose 1 subject from every group

• Cluster Sample: divide the 20 subjects into 10 non-overlapping groups of 2 subjects each, then randomly choose 2 of these groups; the subjects in the 2 chosen groups make up the sample

• Systematic Sample: randomly choose 1 of the first 5 subjects, for example subject 4; the sample is then subjects 4, 9, 14, 19
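The four methods in this example can be sketched for 20 numbered subjects as follows. The sketch assumes that strata and clusters are consecutive blocks of subject numbers, which the slide does not specify; the helper name `systematic` and the seed are also made up for the illustration.

```python
import random

subjects = list(range(1, 21))  # subjects numbered 1 to 20
rng = random.Random(0)         # arbitrary seed so the run is repeatable

# Convenience sample: simply take subjects 1-4.
convenience = subjects[:4]

# Stratified random sample: 4 groups of 5, then 1 random subject per group.
# Assumption: each stratum is a consecutive block of 5 subject numbers.
strata = [subjects[i:i + 5] for i in range(0, 20, 5)]
stratified = [rng.choice(stratum) for stratum in strata]

# Cluster sample: 10 groups of 2, then 2 random groups taken whole.
# Assumption: each cluster is a consecutive pair of subject numbers.
clusters = [subjects[i:i + 2] for i in range(0, 20, 2)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [s for cluster in chosen_clusters for s in cluster]


def systematic(population, k, start):
    """Take the element numbered `start` (1-based), then every k-th after it."""
    return population[start - 1::k]


# Systematic sample with k = 20 / 4 = 5 and the slide's example start of 4.
systematic_sample = systematic(subjects, k=5, start=4)

print(convenience, stratified, cluster_sample, systematic_sample)
```

In practice the systematic start would itself be drawn at random from 1 to k, for instance with `rng.randint(1, 5)`; it is fixed at 4 here to reproduce the slide's example.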


Bias

Bias is any deviation of the expected result of your survey from the true population value

Sources of bias include:

- poorly worded questions

- bad communication

- sensitive questions that some may not want to answer, or answer incorrectly

- entry error (human error)


Avoiding Bias

• Confusing wording?

– If you have to read it more than once to understand what it’s saying

• Asking something no one would remember?

– What were you doing between 8 and 8:15 on Tuesday November 5th 2005?

• Leading the question to a certain answer

– Would you advocate a recycling plan that would help reduce landfill mass?

– Would you pass a bill outlawing the shipment of oil from Alaska to Russia due to the large death rate of the baby seals?

• Something really embarrassing that they wouldn’t answer honestly

– Do you always wash your hands after using the restroom?

– Have you ever cheated on a test?

– Have you ever done drugs?

• Date sensitive question

– How safe do you feel at Purdue University? (What if this was asked right after the Virginia Tech shootings?)
