Chapter 7 An Overview of Statistical Inference – Learning from Data Created by Kathy Fritz


Statistical Inference

What You Can Learn from Data


With the increasing popularity of online dating services, the truthfulness of the information users provide in their personal profiles is a topic of interest.

A study was designed to investigate misrepresentation of personal characteristics. The researchers hoped to answer three questions:

1. What proportion of online daters believe they have misrepresented themselves in an online profile?

2. What proportion of online daters believe that others frequently misrepresent themselves?

3. Are people who place a greater importance on developing a long-term, face-to-face relationship more honest in their online profiles?

The first two of these questions are estimation problems because they involve using sample data to learn something about a population characteristic.

The third question is a hypothesis testing problem because it involves determining if sample data support a claim about the population of online daters.


Learning from Sample Data

When you obtain information from a sample selected from some population, it is usually because

• you want to learn something about characteristics of the population.

OR

• you want to use sample data to decide whether there is support for some claim or statement about the population.

An estimation problem involves using sample data to estimate the value of a population characteristic.

A hypothesis testing problem involves using sample data to test a claim about a population.

Methods for estimation and hypothesis testing are called statistical inference methods because they involve generalizing (making an inference) from a sample to the population from which the sample was selected.


Learning from Data When There Are Two or More Populations

Sometimes sample data are obtained from two or more populations of interest, and the goal is to learn about differences between the populations. Consider the following example:

College students spend a lot of time online, but do members of Facebook spend more time online than non-members? Data were collected from two samples of college students: one consisting of Facebook members and the other consisting of non-members. One of the variables studied was the amount of time spent on the Internet in a typical day. Based on the resulting data, it was concluded that there was no support for the claim that the mean time spent online for Facebook members was greater than the mean time for non-members.

This study involves generalizing from samples, and it is a hypothesis testing problem because it involves testing a claim about the difference between the two groups.


Learning from Experimental Data

Statistical inference methods are also used to learn from experiment data. When data are obtained from an experiment, it is usually because

• you want to learn about the effect of the different experimental conditions (treatments) on the measured response.

This is an estimation problem because it involves using sample data to estimate a characteristic of the treatments, such as the mean response for a treatment.

OR

• you want to determine if experiment data provide support for a claim about how the effects of two or more treatments differ.

This is a hypothesis testing problem because it involves testing a claim (hypothesis) about treatment effects.


Do U Smoke After Txt?

Researchers in New Zealand investigated whether mobile phone text messaging could be used to help people stop smoking. An experiment was designed to compare two treatments.

Subjects for the experiment were 1705 smokers who were older than 15 years and owned a mobile phone and who wanted to quit smoking.

People in the first group received personalized text messages providing support and advice on stopping smoking.

The second group was a control group, and people in this group did not receive any of these text messages.

After 6 weeks, each person participating in the study was contacted and asked if he or she had smoked during the previous week.

Data from the experiment were used to estimate the difference in the proportion who had quit for those who received the text messages and those who did not.

Researchers estimated that the proportion of those who successfully quit smoking was greater by 0.15 for those who received text messages.
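As a rough illustration of the kind of estimate described above, the difference in quit proportions can be computed from the number of quitters and the group size in each treatment. The slides do not report the group counts, so the numbers below are hypothetical and are not the study's data; the function name is also an assumption made for this sketch.

```python
def difference_in_proportions(quit_text, n_text, quit_control, n_control):
    """Estimated difference in quit proportions (text-message group minus control)."""
    p_text = quit_text / n_text            # sample proportion who quit, text-message group
    p_control = quit_control / n_control   # sample proportion who quit, control group
    return p_text - p_control


# HYPOTHETICAL counts for illustration only; the real group counts are not given here.
estimate = difference_in_proportions(quit_text=50, n_text=200,
                                     quit_control=30, n_control=200)
print(f"estimated difference in proportions: {estimate:.2f}")
```

A positive estimate indicates a larger quit proportion in the text-message group; how far such an estimate might be from the true difference is the risk discussed on the next slides.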


Statistical Inference Involves Risk

The risks associated with statistical inference arise because you are attempting to draw conclusions on the basis of data that provide partial rather than complete information.

In estimation problems . . .

RISK – these estimates may be inaccurate

Understand that the method used to produce the estimates and accompanying measures of accuracy might mislead.


In hypothesis testing situations . . .

RISK – an inaccurate conclusion

Understand how likely it is that the method used to decide whether or not a claim is supported might lead to an incorrect decision.


Variability in Data

Suppose we wanted to estimate the mean length of fish in a large lake. We could catch a sample of 20 fish from the lake.

One sample may have a symmetric distribution like this.

Another sample may have a skewed distribution like this . . .

. . . or like this.

When there is variability in the population, you need to consider whether this partial picture (the sample) is representative of the population.

This sample-to-sample variability should be considered when you assess the risk associated with drawing conclusions about the population from sample data.
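Sample-to-sample variability can be seen in a small simulation. The sketch below assumes a hypothetical population of fish lengths (normally distributed, mean 30 cm, standard deviation 6 cm; these values are invented for illustration) and draws several samples of 20 fish, as in the lake example.

```python
import random
import statistics

random.seed(7)  # fixed seed so the illustration is reproducible

# HYPOTHETICAL population of fish lengths in cm (values invented for illustration).
population = [random.gauss(30, 6) for _ in range(10_000)]


def sample_mean(population, n=20):
    """Mean length of one random sample of n fish drawn without replacement."""
    return statistics.mean(random.sample(population, n))


# Five different samples of 20 fish give five different estimates of the mean.
sample_means = [sample_mean(population) for _ in range(5)]
for i, mean_length in enumerate(sample_means, start=1):
    print(f"sample {i}: mean length = {mean_length:.1f} cm")
print(f"population mean = {statistics.mean(population):.1f} cm")
```

Each sample gives a somewhat different mean, and none is guaranteed to equal the population mean, which is why the risk of an inaccurate estimate has to be assessed.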


Variability in Data

An experiment might be designed to determine if noise level has an effect on the time required to perform a task requiring concentration.

There are 20 individuals available to serve as subjects in this experiment with two treatment conditions (quiet environment and noisy environment).

The response variable is the time required to complete the task.

If noise level has NO effect on completion time, the time observed for each of the 20 subjects would be the same whether they are in the quiet group or the noisy group.

Any observed differences in the completion times for the two treatments would NOT be due to noise level, but to person-to-person variability and the random assignment of subjects to treatments.

You must understand how differences might result from variability in the response and the random assignment to treatment groups in order to distinguish them from differences created by a treatment effect.

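The role of random assignment in the noise experiment can be sketched with a simulation. The code assumes noise has no effect, so each of the 20 subjects has one fixed completion time regardless of group; the times themselves are hypothetical values generated for illustration.

```python
import random
import statistics

random.seed(11)  # fixed seed so the illustration is reproducible

# HYPOTHETICAL completion times in seconds for the 20 subjects.
# Under "no effect", a subject's time is the same in either environment.
times = [random.gauss(60, 8) for _ in range(20)]


def random_assignment_difference(times):
    """Randomly split the subjects 10/10 and return mean(noisy) - mean(quiet)."""
    shuffled = times[:]
    random.shuffle(shuffled)
    quiet, noisy = shuffled[:10], shuffled[10:]
    return statistics.mean(noisy) - statistics.mean(quiet)


# Repeat the random assignment many times to see what chance alone can produce.
differences = [random_assignment_difference(times) for _ in range(1000)]
print(f"smallest difference: {min(differences):.1f} s")
print(f"largest difference:  {max(differences):.1f} s")
```

Even with no treatment effect, the difference in group means is rarely exactly zero. An observed difference is evidence of a noise effect only if it is larger than what this kind of chance variation usually produces.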


Selecting an Appropriate Method

Four Key Questions


Four Key Questions

In the following chapters, you will encounter different types of inference problems. The answers to the following questions will lead you to a suggested method to use.

Question Type (Q): Is the question you are trying to answer an estimation problem or a hypothesis testing problem?

You will choose different methods depending on the answer to this question.

Study Type (S): Does the situation involve generalizing from a sample to learn about the population (an observational study or survey) OR does it involve generalizing from an experiment to learn about treatment effects?

The answer to this question affects the choice of the method as well as the type of conclusion that can be drawn.


Four Key Questions Continued . . .

Type of Data (T): What type of data will be used to answer the question?

Is the data set univariate (one variable) or bivariate (two variables)?

Are the data categorical or numerical?

Univariate versus Bivariate

A study was performed to learn how the proportion with a TV in the bedroom differed for children in two age groups.

The study of deception in online dating profiles investigated whether people who place a greater importance on developing a long-term face-to-face relationship are more honest in their online profiles.

Identify whether these examples involve univariate or bivariate data. Explain your choice.


Four Key Questions Continued . . .

Type of Data (T): What type of data will be used to answer the question?

Is the data set univariate (one variable) or bivariate (two variables)?

Are the data categorical or numerical?

Categorical versus Numerical

If you have a single variable and the data are categorical, the question of interest is probably about a population proportion.

If the data are numerical, the question of interest is probably about a population mean.


Four Key Questions Continued . . .

Number of Samples or Treatments (N): How many samples are there? Or, if the data are from an experiment, how many treatments are being compared?

For situations that involve sample data, different methods are used depending on whether there are one, two, or more than two samples.

Also, you may choose a different method to analyze data from an experiment with only two treatments than you would for an experiment with more than two treatments.


QSTN

Think of this as the word QUESTION without the vowels.

Question Type (Q): Estimation or hypothesis testing?

Study Type (S): Sample data or experiment data?

Type of Data (T): Univariate or bivariate? Categorical or numerical?

Number of Samples or Treatments (N): How many samples or treatments?


Answering Four Key Questions to Identify An Appropriate Method

Q: Question Type | S: Study Type | T: Type of Data        | N: Number   | Method to Consider                                                | Chapter
Estimation       | Sample        | Univariate Categorical | 1           | One Sample z Confidence Interval for a Proportion                 | 9
Hypothesis Test  | Sample        | Univariate Categorical | 1           | One Sample z Test for a Proportion                                | 10
Estimation       | Sample        | Univariate Categorical | 2           | Two Sample z Confidence Interval for a Difference in Proportions  | 11
Hypothesis Test  | Sample        | Univariate Categorical | 2           | Two Sample z Test for a Difference in Proportions                 | 11
Estimation       | Sample        | Univariate Numerical   | 1           | One Sample t Confidence Interval for a Mean                       | 12
Hypothesis Test  | Sample        | Univariate Numerical   | 1           | One Sample t Test for a Mean                                      | 12
Estimation       | Sample        | Univariate Numerical   | 2           | Two Sample t Confidence Interval for a Difference in Means        | 13
Hypothesis Test  | Sample        | Univariate Numerical   | 2           | Two Sample t Test for a Difference in Means                       | 13
Hypothesis Test  | Sample        | Univariate Numerical   | More than 2 | ANOVA F Test                                                      | 17 online
Estimation       | Sample        | Univariate Numerical   | More than 2 | Multiple Comparisons                                              | 17 online

You will be able to refer to this table in the following chapters to identify an appropriate method to use.
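One way to picture how the QSTN answers lead to a method is as a lookup keyed by the four answers. The sketch below encodes only the rows of the table above; the dictionary layout and the function name `suggest_method` are assumptions made for illustration, not part of the course materials.

```python
# Keys are (Q, S, T, N) answers; values are (method to consider, chapter).
# The entries are copied from the table above; the structure is illustrative.
METHODS = {
    ("estimation", "sample", "univariate categorical", "1"):
        ("One Sample z Confidence Interval for a Proportion", "9"),
    ("hypothesis test", "sample", "univariate categorical", "1"):
        ("One Sample z Test for a Proportion", "10"),
    ("estimation", "sample", "univariate categorical", "2"):
        ("Two Sample z Confidence Interval for a Difference in Proportions", "11"),
    ("hypothesis test", "sample", "univariate categorical", "2"):
        ("Two Sample z Test for a Difference in Proportions", "11"),
    ("estimation", "sample", "univariate numerical", "1"):
        ("One Sample t Confidence Interval for a Mean", "12"),
    ("hypothesis test", "sample", "univariate numerical", "1"):
        ("One Sample t Test for a Mean", "12"),
    ("estimation", "sample", "univariate numerical", "2"):
        ("Two Sample t Confidence Interval for a Difference in Means", "13"),
    ("hypothesis test", "sample", "univariate numerical", "2"):
        ("Two Sample t Test for a Difference in Means", "13"),
    ("hypothesis test", "sample", "univariate numerical", "more than 2"):
        ("ANOVA F Test", "17 online"),
    ("estimation", "sample", "univariate numerical", "more than 2"):
        ("Multiple Comparisons", "17 online"),
}


def suggest_method(question, study, data, number):
    """Return (method, chapter) for the QSTN answers, or None if not in the table."""
    key = (question.lower(), study.lower(), data.lower(), str(number).lower())
    return METHODS.get(key)


print(suggest_method("Estimation", "Sample", "Univariate Categorical", 1))
```

Combinations that are not listed in the table, such as experiment data, return `None` here rather than a guessed method.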


A Five-Step Process for Statistical Inference

Estimation Problems

Hypothesis Testing Problems


A Five-Step Process for Estimation Problems (EMC3)

Step | What is this step?
E    | Estimate: Explain what population characteristic you plan to estimate.
M    | Method: Select a potential method using QSTN.
C    | Check: Check to make sure that the method is appropriate. It is important to verify that any conditions are met before proceeding.
C    | Calculate: Sample data are used to perform any necessary calculations.
C    | Communicate Results: This is a critical step in the process. You will answer the questions of interest, explain what you have learned from the data, and acknowledge potential risk.
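The five estimation steps can be sketched in code for one row of the QSTN table, the one sample z confidence interval for a proportion. The counts are hypothetical, and the check used here (at least 10 successes and at least 10 failures in the sample) is a commonly used large-sample rule that is assumed for this sketch; the slides do not state the conditions, and the check that the sample was randomly selected cannot be done in code.

```python
import math
from statistics import NormalDist


def one_sample_z_interval(successes, n, confidence=0.95):
    """Large-sample z confidence interval for a population proportion."""
    # Check: assumed rule of at least 10 successes and 10 failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("sample too small for the large-sample z interval")
    # Calculate: estimate plus or minus (z critical value) * (standard error).
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Estimate: a population proportion. Method: one sample z interval (from QSTN).
# HYPOTHETICAL data: 120 of 400 sampled individuals have the characteristic.
low, high = one_sample_z_interval(successes=120, n=400)

# Communicate: report the interval and acknowledge the risk.
print(f"95% confidence interval for the proportion: ({low:.3f}, {high:.3f})")
```

The communicate step would also explain that the method behind a 95% interval captures the true proportion in about 95% of samples, so some risk of a misleading interval remains.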


A Five-Step Process for Hypothesis Testing Problems (HMC3)

Step | What is this step?
H    | Hypotheses: Define the hypotheses that will be tested.
M    | Method: Select a potential method using QSTN.
C    | Check: Check to make sure that the method is appropriate. It is important to verify that any conditions are met before proceeding.
C    | Calculate: Sample data are used to perform any necessary calculations.
C    | Communicate Results: This is a critical step in the process. You will answer the questions of interest, explain what you have learned from the data, and acknowledge potential risk.
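The five hypothesis testing steps can be sketched the same way for the one sample z test for a proportion from the QSTN table. The hypothesized value, the counts, and the large-sample check (expected successes and failures both at least 10 under the hypothesized proportion) are assumptions made for this illustration; they are not taken from the slides.

```python
import math
from statistics import NormalDist


def one_sample_z_test(successes, n, p0, alternative="greater"):
    """Large-sample z test of the null hypothesis p = p0 for a population proportion."""
    # Check: assumed rule that n*p0 and n*(1 - p0) are both at least 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the large-sample z test")
    # Calculate: test statistic and P-value.
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypotheses (HYPOTHETICAL): null p = 0.25 versus alternative p > 0.25.
# Method: one sample z test (from QSTN). HYPOTHETICAL data: 120 of 400.
z, p_value = one_sample_z_test(successes=120, n=400, p0=0.25)

# Communicate: a small P-value means the data support the alternative claim,
# with the acknowledged risk that this decision could be incorrect.
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The P-value measures how surprising the sample result would be if the null hypothesis were true, which is how the risk of an incorrect decision enters the conclusion.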