INTRODUCTION TO STATISTICS & PROBABILITY Chapter 3: Producing Data (Part 3) Dr. Nahid Sultana

Chapter 3, Part 3: Toward Statistical Inference


Chapter 3: Producing Data

Introduction

3.1 Design of Experiments

3.2 Sampling Design

3.3 Toward Statistical Inference

3.4 Ethics


3.3 Toward Statistical Inference


Parameters and Statistics

Sampling Variability

Sampling Distribution

Bias and Variability

Sampling from Large Populations


Parameters and Statistics

Using samples to talk about populations

A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population.

Name          Symbol   Example
Mean          µ        In a nationwide test, what is the average score?
Proportion    p        What proportion of people choose chocolate as their favorite ice cream flavor?

We answer such questions by studying a sample. A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data.

Name                Symbol   Example
Sample mean         x̄        Sample mean of 100 test scores
Sample proportion   p̂        Sample proportion of 100 people who choose chocolate as their favorite ice cream flavor

Parameters and Statistics

Examples:

Proportion of all students who attended the last home football game: parameter, p

Proportion of registered voters who voted in November: parameter, p

Mean height of a sample of NBA basketball players: statistic, x̄

Mean SAT score of entering freshmen: parameter, µ

Proportion of people who prefer Coke over Pepsi in a sample of mall shoppers: statistic, p̂

Mean number of pepperoni slices on a 12-inch pizza from a sample of a certain brand of pepperoni pizzas: statistic, x̄

Statistical Estimation

The process of statistical inference involves using information from a sample to draw conclusions about a wider population.

Your estimate of the population is only as good as your sampling design.

Work hard to eliminate biases.

The result from your sample is only an estimate; if you randomly sampled again, you would probably get a somewhat different result.

A bigger sample is better.


Sampling Variability

Each time we take a random sample from a population, we are likely to get a different set of individuals and therefore a different value of the statistic. This is called sampling variability.

We ask, “What would happen if we took many samples?”

Take a large number of samples from the same population.

Calculate the sample mean or sample proportion for each sample.

Make a histogram of these values.

Examine the distribution displayed in the histogram for shape, center, and spread, as well as outliers or other deviations.
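
The procedure above can be sketched as a small simulation. This is a minimal illustration and not part of the original slides: the population (10,000 hypothetical test scores), the sample size of 50, and the helper names `simulate_sample_means` and `text_histogram` are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, rng):
    """Draw many SRSs (without replacement) and return each sample's mean."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


def text_histogram(values, num_bins=10, width=50):
    """Return the lines of a crude text histogram of the values."""
    low, high = min(values), max(values)
    bin_width = (high - low) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - low) / bin_width), num_bins - 1)
        counts[index] += 1
    scale = width / max(counts)
    return [
        f"{low + i * bin_width:7.2f} | {'#' * round(c * scale)}"
        for i, c in enumerate(counts)
    ]


rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 test scores (an assumption for illustration).
population = [rng.gauss(70, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Take many samples and compute the statistic for each one.
sample_means = simulate_sample_means(
    population, sample_size=50, num_samples=1000, rng=rng
)

# Make a histogram of the 1000 sample means.
for line in text_histogram(sample_means):
    print(line)

# Examine center and spread.
print("population mean:          ", round(population_mean, 2))
print("mean of sample means:     ", round(statistics.mean(sample_means), 2))
print("std. dev. of sample means:", round(statistics.stdev(sample_means), 2))
```

In a run of this sketch, the histogram should look roughly symmetric and be centered near the population mean, with a spread much smaller than the spread of the individual scores.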


Sampling Variability (Cont…)

The sampling distribution of a statistic is the distribution of that statistic for samples of a given size n taken from the same population. The variability of a statistic is described by the spread of its sampling distribution. This spread depends on the sampling design and the sample size n, with larger sample sizes leading to lower variability.


The results of many SRSs have a regular pattern. Here, we draw 1000 SRSs of size 100 from the same population. The population proportion is p = 0.60. The histogram shows the distribution of the 1000 sample proportions.

The distribution of sample proportions for 1000 SRSs of size 2500 drawn from the same population as in the first figure. The two histograms have the same scale. The statistic from the larger sample is less variable.
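
The comparison described in the two captions can be reproduced approximately with a short simulation. This is a sketch under stated assumptions: each individual is treated as an independent draw with success probability p = 0.60, which approximates an SRS from a population much larger than the sample, and the function name `sample_proportions` is hypothetical.

```python
import random
import statistics


def sample_proportions(p, n, num_samples, rng):
    """Simulate num_samples sample proportions, each from n independent draws.

    Each draw is a "success" with probability p, which approximates an SRS
    from a population much larger than the sample (an assumption).
    """
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


rng = random.Random(2)  # fixed seed so the run is repeatable
p = 0.60  # population proportion from the slide

small = sample_proportions(p, n=100, num_samples=1000, rng=rng)
large = sample_proportions(p, n=2500, num_samples=1000, rng=rng)

for label, props in (("n = 100 ", small), ("n = 2500", large)):
    print(
        label,
        "center:", round(statistics.mean(props), 3),
        "spread (std. dev.):", round(statistics.stdev(props), 4),
        "range:", (min(props), max(props)),
    )
```

Both sets of sample proportions should center near 0.60, while the spread for n = 2500 should be roughly one fifth of the spread for n = 100. Under the independence assumption, the theoretical standard deviations are about 0.049 and 0.0098.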


Bias and Variability

Both bias and variability describe what happens when we take many shots at the target.

Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design and the sample size n. Statistics from larger probability samples have smaller spreads.
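
The distinction between bias (where the sampling distribution is centered) and variability (how spread out it is) can be illustrated with a simulation. The sketch below is not from the slides: the population, the "reachable" subgroup used to mimic a biased design, and the helper name `summarize` are all hypothetical choices made for illustration.

```python
import random
import statistics


def summarize(estimates, true_value):
    """Return (bias, spread) for a list of simulated estimates."""
    bias = statistics.mean(estimates) - true_value
    spread = statistics.stdev(estimates)
    return bias, spread


rng = random.Random(3)  # fixed seed so the run is repeatable
# Hypothetical population of 10,000 values (an assumption for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# A biased sampling frame: only individuals with values above 45 can be
# reached, standing in for a design that systematically misses part of
# the population.
reachable = [x for x in population if x > 45]

designs = {
    "SRS, n = 10": (population, 10),
    "SRS, n = 400": (population, 400),
    "biased frame, n = 400": (reachable, 400),
}

results = {}
for name, (frame, n) in designs.items():
    # Sampling distribution of the sample mean under this design.
    means = [statistics.mean(rng.sample(frame, n)) for _ in range(1000)]
    results[name] = summarize(means, true_mean)
    bias, spread = results[name]
    print(f"{name:>22}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

The small SRS should show little bias but a large spread, the large SRS little bias and a small spread, and the biased frame a small spread around the wrong center. A larger sample does not repair a biased design.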


Managing Bias and Variability

A good sampling scheme must have both small bias and small variability.

To reduce bias, use random sampling. To reduce variability of a statistic from an SRS, use a larger sample.

POPULATION SIZE DOESN’T MATTER

The variability of a statistic from a random sample does not depend on the size of the population, as long as the population is at least 100 times larger than the sample.
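
A quick simulation can make this claim concrete. The sketch below draws SRSs of the same size (n = 100) from populations of three different sizes, each at least 100 times larger than the sample, and compares the spread of the sample proportion. The population sizes, the proportion p = 0.60, and the function name `spread_of_sample_proportion` are assumptions chosen for illustration.

```python
import random
import statistics


def spread_of_sample_proportion(population_size, n, p, num_samples, rng):
    """Std. dev. of the sample proportion over many SRSs of size n."""
    successes = round(population_size * p)
    # Population of 1s (has the characteristic) and 0s (does not).
    population = [1] * successes + [0] * (population_size - successes)
    proportions = [
        sum(rng.sample(population, n)) / n  # SRS without replacement
        for _ in range(num_samples)
    ]
    return statistics.stdev(proportions)


rng = random.Random(4)  # fixed seed so the run is repeatable
n, p = 100, 0.60
spreads = {
    size: spread_of_sample_proportion(size, n, p, num_samples=2000, rng=rng)
    for size in (10_000, 100_000, 1_000_000)
}
for size, spread in spreads.items():
    print(f"population size {size:>9,}: spread of sample proportion = {spread:.4f}")
```

The three spreads should be nearly identical, even though the largest population is 100 times the size of the smallest. What drives the variability is the sample size, not the population size.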


3.4 Ethics

Institutional Review Boards

Informed Consent

Confidentiality

Clinical Trials

Behavioral and Social Science Experiments


Institutional Review Boards

The organization that carries out the study must have an institutional review board that reviews all planned studies in advance in order to protect the subjects from possible harm.

The institutional review board:

reviews the plan of study

can require changes

reviews the consent form

monitors progress at least once a year


Informed Consent

All subjects must give their informed consent before data are collected.

Subjects must be informed in advance about the nature of a study and any risk of harm it might bring.

Subjects must then consent in writing.

Who can’t give informed consent?

prison inmates

very young children

people with mental disorders


Confidentiality

All individual data must be kept confidential. Only statistical summaries may be made public.

Confidentiality is not the same as anonymity. Anonymity means that subjects are anonymous—their names are not known even to the director of the study. Anonymity prevents follow-ups to improve non-response or inform subjects of results.

Any breach of confidentiality is a serious violation of data ethics.

The best practice is to separate the identity of the subjects from the rest of the data immediately!


Clinical Trials

Clinical trials study the effectiveness of medical treatments on actual patients. These treatments can harm as well as heal.

Points for a discussion:

Randomized comparative experiments are the only way to see the true effects of new treatments.

Most benefits of clinical trials go to future patients. We must balance future benefits against present risks.


Behavioral and Social Science Experiments

Many behavioral experiments rely on hiding the true purpose of the study.

Subjects would change their behavior if told in advance what investigators were looking for.

The “Ethical Principles” of the American Psychological Association require consent unless a study only observes behavior in a public space.