Why sample? Diversity in populations. Practicality and cost.


Page 1

Why sample?

Diversity in populations
Practicality and cost

Page 2

Terms

Population = large group about which conclusions are drawn. Real, but unknown.

Sample = small group that represents the population. Real, known.

[Figure: one population with several samples drawn from it]

Page 3

Element = individual member of a population.

Sampling unit = element or group of elements selected in a sample.

Unit of analysis = element or group of elements compared in the analysis.

The above units can be the same or different.

Page 4

Element, Sampling Unit, Unit of Analysis: Examples

Opinion survey of UMD students
Element = individual student
Sampling unit = individual student
Unit of analysis = individual student (student opinions measured)

Survey of family incomes
Element = adult household member
Sampling unit = household or address
Unit of analysis = family (total family income measured)

Page 5

Element, Sampling Unit, Unit of Analysis: More Examples

Voter polls
Element = individual voter
Sampling unit = telephone number
Unit of analysis = individual voter (voter opinions measured)

U.S. Census of housing
Element = household or address
Sampling unit = household or address
Unit of analysis = household or address (# of rooms measured)

Page 6

Sampling frame = list of all the sampling units in the population. Needed for probability sampling.

Probability sample = researcher knows and controls the probability of selection.

Main advantage: Only probability samples permit accurate estimation of sampling error.

Page 7

Simple Random Sample

Every element in the population has an equal and constant chance of selection.
1. Physical sampling with replacement
2. Table of random numbers
3. Random selection by computer

Probability of selection = sample size / population size

Requires a list (frame) of all elements in the population.
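Option 3 above (random selection by computer) can be sketched in a few lines of Python. This is only an illustration: the frame of ID numbers 1 to 5000 and the helper name `simple_random_sample` are assumptions, not part of the slides. Note also that `random.sample` draws without replacement, so no element can appear twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame; every element is equally likely."""
    rng = random.Random(seed)      # seeded generator so the draw can be repeated
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical frame: ID numbers standing in for a complete list of elements.
frame = list(range(1, 5001))
sample = simple_random_sample(frame, 100, seed=1)

# Probability of selection = sample size / population size
p_select = len(sample) / len(frame)
print(len(sample), p_select)
```

With 100 elements drawn from a frame of 5000, the probability of selection works out to 100/5000 = 0.02 for every element.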

Page 8

Systematic Random Sample

Every “kth” element is drawn from a list (e.g. every 50th name).

1. K = sampling interval = population size / sample size (e.g. 5000/100 = 50).
2. Random starting point between 1 and K (e.g. between 1 and 50).
3. Statistically equivalent to a simple random sample.
4. List must be randomly ordered.
5. Convenient, since lists are available for many populations.
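Steps 1 and 2 above can be sketched as follows. The frame of 5000 ID numbers and the function name `systematic_sample` are hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as in the 5000/100 example.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th element after a random start, with k = len(frame) // n."""
    k = len(frame) // n                  # sampling interval K
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start between 1 and K, inclusive
    # Positions are counted from 1 as on the slide, so subtract 1 for indexing.
    return [frame[pos - 1] for pos in range(start, len(frame) + 1, k)][:n]

# Hypothetical frame of 5000 names, represented here by ID numbers.
frame = list(range(1, 5001))
sample = systematic_sample(frame, 100, seed=7)
print(sample[:3], len(sample))
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed, which is why the ordering of the list matters.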

Page 9

Stratified Random Sample

Population is first divided into groups (strata).

Simple random sample is taken from within each stratum.

Separate random samples are combined into a single total sample.
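The three steps above might look like this in code. The student records, the `stratified_sample` helper, and the choice of an equal number of draws per stratum are all assumptions made for illustration; the slides do not say how the sample is allocated across strata.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, draw an SRS in each, combine the results."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:               # step 1: divide into strata
        strata[stratum_of(element)].append(element)
    combined = []
    for name in sorted(strata):              # step 2: SRS within each stratum
        combined.extend(rng.sample(strata[name], n_per_stratum))
    return combined                          # step 3: single total sample

# Hypothetical students as (id, year in school) pairs.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
students = [(i, years[i % 4]) for i in range(2000)]
sample = stratified_sample(students, lambda s: s[1], 25, seed=3)
print(len(sample))
```

Because each stratum is sampled separately, every year in school is guaranteed to appear in the final sample.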

Page 10

Example of stratified sample

[Figure: the UMD population is divided into four strata (seniors, juniors, sophomores, freshmen); Samples 1 to 4 are drawn from the strata and combined into the stratified sample]

Page 11

Considerations in Stratified Sampling

Requires knowledge of the stratifying variable.
Best used when there is much variation between strata in the variable being measured (example: stratify by year in school if measuring opinions of advising).
Lowest sampling error.
Most costly.

Page 12

Sampling error = estimated difference between sample value and actual population value (e.g. ± 3%)

Page 13

Cluster Sample

Elements in the population are naturally grouped together (“clusters”).
Simple random sample of clusters is taken.
Every element in selected clusters is studied.

[Figure: a population and the sample drawn from it]
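A minimal sketch of the cluster procedure described on this slide, assuming a made-up population of 20 residence halls with 30 residents each (the data and the `cluster_sample` helper are hypothetical):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Draw an SRS of clusters, then keep every element of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # SRS of cluster names
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

# Hypothetical clusters: residence halls and their residents.
clusters = {
    f"hall_{h:02d}": [f"hall_{h:02d}_resident_{r}" for r in range(30)]
    for h in range(20)
}
chosen, sample = cluster_sample(clusters, 4, seed=5)
print(chosen, len(sample))
```

Only a list of clusters is needed to make the draw; the individual elements are enumerated after the clusters have been selected.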

Page 14

Considerations in Cluster Sampling

Best when there is little variation between clusters in the variable being measured.

Does not require a list of individual elements (only clusters).

May be used to cover a large geographic area (smaller areas = clusters).

May be less expensive.
Highest sampling error.

Page 15

Multistage Designs

Combines two or more sampling designs.

Example: sampling voters in MN
Stage 1: Stratify by geographic area (e.g. county).
Stage 2: Sample census tracts (clusters) in selected counties.
Stage 3: Take SRS of households in each tract.

Commonly used in large, diverse populations.
Design is best left to experts!
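The three stages of the MN example can be sketched roughly as below. Everything concrete here is an assumption: the nested dictionary of counties, tracts and households is invented, every county is treated as a stratum, and the numbers of tracts and households drawn are arbitrary. A real design would also need selection weights, which this sketch leaves out.

```python
import random

def multistage_sample(counties, tracts_per_county, households_per_tract, seed=None):
    """Stratify by county, sample tracts within each, then take an SRS of households."""
    rng = random.Random(seed)
    sample = []
    for county in sorted(counties):                        # stage 1: county = stratum
        tracts = counties[county]
        picked = rng.sample(sorted(tracts), tracts_per_county)   # stage 2: tracts
        for tract in picked:                               # stage 3: SRS of households
            sample.extend(rng.sample(tracts[tract], households_per_tract))
    return sample

# Hypothetical frame: 3 counties x 10 tracts x 50 households.
counties = {
    f"county_{c}": {
        f"tract_{c}_{t}": [f"hh_{c}_{t}_{h}" for h in range(50)]
        for t in range(10)
    }
    for c in range(3)
}
sample = multistage_sample(counties, 2, 5, seed=11)
print(len(sample))
```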

Page 16

Sampling

Why use sampling?
Terms and definitions
Probability sampling designs
  Simple random
  Systematic
  Stratified
  Cluster
  Multistage designs
Estimation from samples

Page 17

Estimation from Samples

Find a likely range of values for a population parameter (e.g. average, %).

Parameter = characteristic of a population
Statistic = characteristic of a sample
Statistical inference = drawing conclusions about a population based on sample data. Usually connected with a probability of error.

Page 18

Sampling Distribution

Distribution of results of all possible samples of size N taken from the same population.

Theoretical, not actually done in practice.
Properties of sampling distributions are known to statisticians.
Used as basis for inferring from samples to populations.

Page 19

Example: estimating proportion of homes with internet access

Suppose population proportion = .62
Take 1 sample of size 200 homes. 120 have internet access. Sample p = 120/200 = .60
Can we conclude that the population proportion is .60?
A different sample might produce a different answer.

Page 20

What if we took all possible samples?

Most sample proportions would be close to the population value.
A few would be much higher or lower.
Average of sample proportions would be the true population proportion.
Distribution would be a bell-shaped curve.

[Figure: bell-shaped curve centered at .62; horizontal axis: all possible sample proportions (0 to 1.0); vertical axis: % of samples]
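Taking all possible samples is not feasible, but the idea can be approximated by simulation. The sketch below is an assumption-laden stand-in: it draws 2000 independent samples of 200 homes from a population in which each home has internet access with probability .62, rather than enumerating every possible sample. The function name and the ±.07 cutoff are illustrative choices.

```python
import random
import statistics

def simulate_sample_proportions(true_p, n, n_samples, seed=None):
    """Return the sample proportion from each of n_samples simulated samples of size n."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_p for _ in range(n)) / n
        for _ in range(n_samples)
    ]

props = simulate_sample_proportions(0.62, 200, 2000, seed=42)
mean_p = statistics.mean(props)                       # should sit near .62
share_close = sum(abs(p - 0.62) <= 0.07 for p in props) / len(props)
print(round(mean_p, 3), round(share_close, 3))
```

The average of the simulated proportions lands close to .62, and the large majority of them fall within a few points of it, which is the behavior the slide describes.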

Page 21

What we know from sampling distribution:

We DON’T know the true population proportion.

We DO know how many sample proportions fall within a given distance of the true proportion.

Sampling error = estimated difference between sample value and actual population value (example: 95% of sample proportions fall within ± 3% of true proportion)
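The slides do not say where a figure like ± 3% comes from. One common way to obtain it, shown here as an assumption rather than as the deck's own method, is the normal-approximation formula for a proportion: margin = 1.96 × sqrt(p(1 − p) / n) at the 95% level.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion (z = 1.96 for 95%)."""
    return z * math.sqrt(p * (1 - p) / n)

# Using the population proportion .62 from the internet-access example.
moe_200 = margin_of_error(0.62, 200)     # sample of 200 homes
moe_1000 = margin_of_error(0.62, 1000)   # hypothetical larger sample
print(round(moe_200, 3), round(moe_1000, 3))
```

Under this formula a sample of 200 gives a margin of roughly ± 7 percentage points, while a sample of about 1000 is needed to reach roughly ± 3.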

Page 22

How we make an estimate

Find sample proportion.
Add sampling error (margin of error) on either side.
True proportion probably falls within this interval.

[Figure: two copies of the bell-shaped curve of all possible sample proportions (horizontal axis 0 to 1.0, centered at .62; vertical axis: % of samples), each marked with sample proportions p]

Page 23

Examples of estimates

If 95% of sample proportions (p) fall within ± 3% of the true proportion, then 95% of all intervals p ± .03 will contain the true population proportion.

If p = .6, we estimate the true proportion is .6 ± .03 = .57 to .63
If p = .62, we estimate the true proportion is .62 ± .03 = .59 to .65
If p = .57, we estimate the true proportion is .57 ± .03 = .54 to .60
If p = .7, we estimate the true proportion is .7 ± .03 = .67 to .73

95% of the time this procedure yields a correct estimate.
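The claim that the procedure is right about 95% of the time can be checked by simulation. The sketch below assumes a true proportion of .62, a sample size of 1000 (chosen because it gives a margin close to ± .03 under the usual normal approximation), and 2000 simulated samples; none of these numbers other than .62 and .03 come from the slides.

```python
import random

def coverage(true_p, n, margin, n_samples, seed=None):
    """Share of simulated intervals p +/- margin that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        p = sum(rng.random() < true_p for _ in range(n)) / n   # one sample proportion
        if p - margin <= true_p <= p + margin:
            hits += 1
    return hits / n_samples

rate = coverage(0.62, 1000, 0.03, 2000, seed=0)
print(round(rate, 3))

# The four intervals listed on the slide, rebuilt as (low, high) pairs.
examples = {p: (round(p - 0.03, 2), round(p + 0.03, 2)) for p in (0.60, 0.62, 0.57, 0.70)}
print(examples)
```

Of the four example intervals, the one built from p = .7 (.67 to .73) would miss a true proportion of .62; it illustrates the roughly 5% of samples for which the procedure gives a wrong estimate.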