STAT 4060 Design and Analysis of Surveys

23/4/19 www.uic.edu.hk/~xlpeng 1

STAT 4060 Design and Analysis of Surveys

Exam: 60% Mid Test: 20% Mini Project: 10% Continuous assessment: 10%


What we have learned:

1. Simple random sampling, confidence interval and choice of sample size.

2. Ratio and regression estimators, systematic sampling.

3. Stratified random sampling, allocation of stratum weights.

4. Cluster sampling.

Population Parameter

Sample Statistics

Simple random sampling

We shall consider the use of simple random samples for estimating the three population characteristics:

the population mean, denoted $\bar{Y}$,

$$\bar{Y} = \frac{1}{N}\sum_{j=1}^{N} Y_j;$$

the population total, denoted $Y_T$,

$$Y_T = \sum_{j=1}^{N} Y_j;$$

and the proportion P.

We shall discuss how any such estimators behave in terms of their sampling distributions. The variance is often a crucial measure.

Proof of (1.9)

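$$\mathrm{Var}(\bar{y}) = E(\bar{y}^2) - [E(\bar{y})]^2 = E\left[\left(\sum_{i=1}^{n} y_i / n\right)^2\right] - \bar{Y}^2$$

$$= \frac{1}{n^2} E\left(\sum_{i} y_i^2 + \sum_{i \ne j} y_i y_j\right) - \bar{Y}^2$$

$$= \frac{1}{n^2}\left\{ n E(y_i^2) + n(n-1) E(y_i y_j) - n^2 \bar{Y}^2 \right\}$$

$$= \frac{1}{n^2}\left\{ n\left(\mathrm{Var}(y_i) + \bar{Y}^2\right) + n(n-1)\left(\mathrm{cov}(y_i, y_j) + \bar{Y}^2\right) - n^2 \bar{Y}^2 \right\}$$

$$= \frac{1}{n^2}\left( n\,\mathrm{Var}(y_i) + n(n-1)\,\mathrm{cov}(y_i, y_j) \right)$$

$$= \frac{1}{n}\cdot\frac{N-1}{N} S^2 - \frac{n-1}{nN} S^2 \quad \text{(using } \mathrm{Var}(y_i) = (N-1)S^2/N,\ \mathrm{cov}(y_i, y_j) = -S^2/N\text{)}$$

$$= \frac{N-n}{Nn} S^2 = (1-f)\frac{S^2}{n}$$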
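As a numerical check of (1.9), the following minimal sketch enumerates every simple random sample of a given size from a tiny population and compares the exact variance of the sample mean with $(1-f)S^2/n$. The population values and the function names are illustrative assumptions, not part of the course material.

```python
from itertools import combinations
from statistics import mean, variance


def srs_mean_variance_exact(population, n):
    """Variance of the sample mean over all equally likely samples of size n."""
    means = [mean(sample) for sample in combinations(population, n)]
    centre = mean(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


def srs_mean_variance_formula(population, n):
    """(1 - f) S^2 / n, with f = n / N and S^2 computed with divisor N - 1."""
    N = len(population)
    return (1 - n / N) * variance(population) / n


population = [3, 7, 8, 12, 15, 21]  # hypothetical population values
exact = srs_mean_variance_exact(population, 3)
formula = srs_mean_variance_formula(population, 3)
print(exact, formula)
```

The two numbers should agree up to floating-point error, because sampling without replacement makes every subset of size n equally likely.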

Confidence interval for the population mean
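A minimal sketch of an approximate confidence interval for the population mean under simple random sampling, assuming the usual large-sample normal approximation $\bar{y} \pm z\sqrt{(1-f)s^2/n}$. The function name, the sample values, and the choice of a normal (rather than t) quantile are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def srs_confidence_interval(sample, N, level=0.95):
    """Normal-approximation interval for the population mean from an SRS."""
    n = len(sample)
    f = n / N                                       # sampling fraction
    y_bar = mean(sample)
    se = math.sqrt((1 - f) * variance(sample) / n)  # estimated s.e. of y_bar
    z = NormalDist().inv_cdf(0.5 + level / 2)       # two-sided normal quantile
    return y_bar - z * se, y_bar + z * se


sample = [10, 12, 9, 11, 13]  # hypothetical sample from a population of N = 50
lower, upper = srs_confidence_interval(sample, N=50)
print(lower, upper)
```

For small n, a t quantile with n - 1 degrees of freedom is often used instead of z.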


Ratio Estimation and Regression Estimation (Chapter 4, Textbook, Barnett, V., 1991)

2.1 Estimation of a population ratio: The ratio estimator

In some situations it is useful to estimate a (positive) ratio of two population characteristics: the totals, or means, of two (positive) variables X and Y,

$$R = \frac{Y_T}{X_T} = \frac{\bar{Y}}{\bar{X}}.$$

Two obvious estimators of R are:

The sample average of ratios,

$$\bar{r} = \frac{1}{n}\sum_{i=1}^{n} (y_i / x_i) = \frac{1}{n}\sum_{i=1}^{n} r_i,$$

which is unbiased for estimating the population mean of the ratios,

$$\bar{R} = \frac{1}{N}\sum_{j=1}^{N} R_j = \frac{1}{N}\sum_{j=1}^{N} (Y_j / X_j),$$

but biased for estimating R.

The ratio of the sample averages,

$$r = \bar{y} / \bar{x} = y_T / x_T,$$

which is widely used.

The bias in estimating R by r

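The bias in estimating R by r is the expectation of the following difference:

$$r - R = \frac{\bar{y} - R\bar{x}}{\bar{x}} = \frac{\bar{y} - R\bar{x}}{\bar{X}}\left(1 + \frac{\bar{x} - \bar{X}}{\bar{X}}\right)^{-1} = \frac{\bar{y} - R\bar{x}}{\bar{X}}\left[1 - \frac{\bar{x} - \bar{X}}{\bar{X}} + \left(\frac{\bar{x} - \bar{X}}{\bar{X}}\right)^2 - \cdots\right].$$

Keeping the first two terms,

$$E(r - R) \approx E\left(\frac{\bar{y} - R\bar{x}}{\bar{X}}\right) - \frac{E[(\bar{y} - R\bar{x})(\bar{x} - \bar{X})]}{\bar{X}^2}. \qquad (2.3)$$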

Discussion about the bias

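The variance of r is, to first order,

$$\mathrm{Var}(r) \approx \frac{1-f}{n\bar{X}^2}\cdot\frac{\sum_{j=1}^{N} (Y_j - RX_j)^2}{N-1} = \frac{1-f}{n\bar{X}^2}\left(S_Y^2 - 2RS_{YX} + R^2 S_X^2\right), \qquad (2.5)$$

since, with $\bar{Y} = R\bar{X}$,

$$Z_j = Y_j - RX_j = (Y_j - \bar{Y}) - (RX_j - R\bar{X}).$$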

2.2 Ratio estimation of a population mean or total

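The ratio estimator of the population mean is

$$\bar{y}_R = r\bar{X} = (\bar{X}/\bar{x})\,\bar{y},$$

and the ratio estimator of the population total is

$$y_{TR} = rX_T = (N\bar{X}/\bar{x})\,\bar{y} = N\bar{y}_R.$$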

Variance of ratio estimator
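A minimal sketch of the ratio estimators and a first-order variance estimate, assuming the population mean $\bar{X}$ of the auxiliary variable is known and that the sample analogue of (2.5) uses $s_r^2 = \sum_i (y_i - r x_i)^2/(n-1)$. The function name, the dictionary keys, and the data are hypothetical.

```python
from statistics import mean


def ratio_estimates(x, y, X_bar, N):
    """Ratio estimates of R, the population mean and total, with estimated variances."""
    n = len(x)
    f = n / N                              # sampling fraction
    r = mean(y) / mean(x)                  # ratio of the sample averages
    y_bar_R = r * X_bar                    # ratio estimator of the mean
    y_T_R = N * y_bar_R                    # ratio estimator of the total
    # Assumed sample analogue of the residual variance in (2.5).
    s_r2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    var_r = (1 - f) / (n * X_bar ** 2) * s_r2
    var_y_bar_R = X_bar ** 2 * var_r       # because y_bar_R = r * X_bar
    return {"r": r, "y_bar_R": y_bar_R, "y_T_R": y_T_R,
            "var_r": var_r, "var_y_bar_R": var_y_bar_R}


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical auxiliary values
y = [2.1, 3.9, 6.2, 7.8]  # hypothetical survey values
result = ratio_estimates(x, y, X_bar=2.6, N=40)
print(result)
```

When y is exactly proportional to x, the residuals $y_i - r x_i$ vanish and the estimated variance is zero, which is the situation in which the ratio estimator performs best.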


The estimate of the ratio R of the present weight to prestudy weight for the herd is:

Solution:

$$\mathrm{Var}(r) = \frac{1-f}{n}\cdot\frac{1}{\bar{X}^2}\,S_r^2 = \left(1 - \frac{12}{500}\right)\frac{1}{880^2}\cdot\frac{8{,}848.646}{12} = 0.000929$$

$$se(r) = \sqrt{0.000929} = 0.030485$$
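The arithmetic in this solution can be reproduced with a few lines. The sketch below takes the figures from the solution (n = 12, N = 500, $\bar{X}$ = 880, $S_r^2$ = 8,848.646); the variable names are illustrative assumptions.

```python
import math

n, N = 12, 500      # sample size and herd size used in the solution
X_bar = 880.0       # mean prestudy weight used in the solution
S_r2 = 8848.646     # residual variance S_r^2 used in the solution

f = n / N                                   # sampling fraction
var_r = (1 - f) / n * S_r2 / X_bar ** 2     # estimated variance of r
se_r = math.sqrt(var_r)                     # standard error of r
print(round(var_r, 6), round(se_r, 6))
```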

This examines when the variance of (2.10) could be less or greater than that of (1.9)


2.3 Regression estimation

Condition (2.15.1) demands that X and Y be linearly related. If the linear relationship does not pass through the origin, however, this suggests considering an alternative estimator, known as the regression estimator.

An ideal (perfect) linear relationship is

$$Y_j = \bar{Y} + b(X_j - \bar{X}). \qquad (2.16)$$

A practicable simple linear regression model is

$$Y_j = \bar{Y} + b(X_j - \bar{X}) + E_j. \qquad (2.17)$$

Consider the average (mean) of either (2.16) or (2.17), which suggests the estimator

$$\bar{y}_L = \bar{y} + b(\bar{X} - \bar{x}). \qquad (2.19)$$

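$$\mathrm{Var}(\bar{y}_L) = E[(\bar{y}_L - \bar{Y})^2] = E\{[(\bar{y} - \bar{Y}) - b(\bar{x} - \bar{X})]^2\} = \frac{1-f}{n}\left(S_Y^2 - 2bS_{YX} + b^2 S_X^2\right) \qquad (2.20)$$

$$\ge \frac{1-f}{n} S_Y^2 (1 - \rho_{YX}^2),$$

where $\rho_{YX} = S_{YX}/(S_Y S_X)$. This lower bound is to be compared with

$$\frac{1-f}{n} S_Y^2 = \mathrm{Var}(\bar{y}).$$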

From (2.20),

$$\min_b \{\mathrm{Var}(\bar{y}_L)\} = \min_b \frac{1-f}{n}\left(S_Y^2 - 2bS_{YX} + b^2 S_X^2\right) = \frac{1-f}{n} S_Y^2 (1 - \rho_{YX}^2).$$

The minimum is obtained with

$$b = b_{\min} = S_{YX}/S_X^2 = \rho_{YX} S_Y / S_X.$$

Thus the most efficient regression estimator of $\bar{Y}$ is

$$\bar{y}_L = \bar{y} + \rho_{YX}(S_Y/S_X)(\bar{X} - \bar{x}). \qquad (2.22)$$

The optimal value of b in (2.22) suggests the obvious estimate:

$$\hat{b} = \hat{b}_{\min} = \frac{s_{yx}}{s_x^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (2.24)$$

$$\bar{y}_L = \bar{y} + \hat{b}(\bar{X} - \bar{x}), \qquad (2.25)$$

which enjoys the following asymptotic properties:

$$E(\bar{y}_L) = \bar{Y} + O(n^{-1}).$$

Asymptotic properties:

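$$\mathrm{Var}(\bar{y}_L) = \frac{1-f}{n}\left(S_Y^2 - S_{YX}^2/S_X^2\right) + O(n^{-3/2}) = \frac{1-f}{n} S_Y^2 (1 - \rho_{YX}^2) + O(n^{-3/2}), \qquad (2.26)$$

which is estimated by

$$V(\bar{y}_L) = \frac{1-f}{n}\left(s_y^2 - \hat{b}\, s_{yx}\right), \qquad (2.27)$$

where $\hat{b} = s_{yx}/s_x^2$ is the sample estimate of b from (2.24).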
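A minimal sketch of the regression estimator (2.25) with the variance estimate (2.27), assuming the population mean $\bar{X}$ is known and that $s_{yx}$, $s_x^2$ and $s_y^2$ all use the divisor n - 1. The function name and the data are hypothetical.

```python
from statistics import mean, variance


def regression_estimate(x, y, X_bar, N):
    """Return (y_bar_L, b_hat, estimated variance of y_bar_L)."""
    n = len(x)
    f = n / N                                          # sampling fraction
    x_bar, y_bar = mean(x), mean(y)
    s_yx = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    b_hat = s_yx / variance(x)                         # (2.24)
    y_bar_L = y_bar + b_hat * (X_bar - x_bar)          # (2.25)
    v = (1 - f) / n * (variance(y) - b_hat * s_yx)     # (2.27)
    return y_bar_L, b_hat, v


x = [1.0, 2.0, 3.0, 4.0]   # hypothetical auxiliary values
y = [3.2, 4.9, 7.1, 8.8]   # hypothetical survey values
y_bar_L, b_hat, v = regression_estimate(x, y, X_bar=3.0, N=20)
print(y_bar_L, b_hat, v)
```

Unlike the ratio estimator, this adjustment does not require the fitted line to pass through the origin.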

2.4 Comparison of ratio and regression estimators

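$$V(\bar{y}_R) - \mathrm{Var}(\bar{y}_L) = \frac{1-f}{n}\left(R^2 S_X^2 - 2R\rho_{YX} S_Y S_X + \rho_{YX}^2 S_Y^2\right) = \frac{1-f}{n}\left(RS_X - \rho_{YX} S_Y\right)^2$$

This difference is never negative, so to this order of approximation the regression estimator is at least as efficient as the ratio estimator, with equality when $R = \rho_{YX} S_Y / S_X$.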

Stratified Simple Random Sampling (Chapter 5, Textbook, Barnett, V., 1991)

Consider another sampling method:

Some Notations

To estimate the population mean of a finite population, we assume that the population is stratified, that is to say, it has been divided into k non-overlapping groups, or strata, of sizes $N_1, N_2, \ldots, N_k$, where $\sum_{i=1}^{k} N_i = N$.

The stratum means and variances are denoted by

and

Estimation of Population Characteristics in Stratified Populations

Estimating $\bar{Y}$

The stratified sample mean is defined as

$$\bar{y}_{st} = \sum_{i=1}^{k} W_i \bar{y}_i,$$

where $\bar{y}_i$ is the sample mean in the ith stratum. Here we assume the weights $W_i = N_i/N$ are given (known).
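A minimal sketch of the stratified sample mean with known weights $W_i = N_i/N$, together with the usual variance estimate $\sum_i W_i^2 (1-f_i) s_i^2 / n_i$, which relies on independent simple random samples within strata. The function name and the data are hypothetical.

```python
from statistics import mean, variance


def stratified_mean(samples, stratum_sizes):
    """Return (stratified sample mean, estimated variance of that mean)."""
    N = sum(stratum_sizes)
    estimate = 0.0
    var_estimate = 0.0
    for sample, N_i in zip(samples, stratum_sizes):
        W_i = N_i / N                  # known stratum weight
        n_i = len(sample)
        f_i = n_i / N_i                # sampling fraction within the stratum
        estimate += W_i * mean(sample)
        var_estimate += W_i ** 2 * (1 - f_i) * variance(sample) / n_i
    return estimate, var_estimate


samples = [[1, 2, 3], [10, 12, 14, 16]]  # hypothetical samples from two strata
stratum_sizes = [30, 70]                 # hypothetical stratum sizes N_i
y_st, v_st = stratified_mean(samples, stratum_sizes)
print(y_st, v_st)
```

Each stratum needs at least two sampled units so that its sample variance is defined.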

The mean and variance of


Note that

Since

Because it is assumed that “sampling in different strata is independent”, that is

Simple random sampling

Stratified sampling with proportional allocation


(a) When stratum size is large enough:

N

N i

(b) When stratum size is not large enough:

The stratified sample mean will be more efficient than the s.r. sample mean if and only if the variation between the stratum means is sufficiently large compared with the within-strata variation!

Optimum Choice of Sample Size

To achieve required precision of estimation

Some cost limitation

The simplest form assumes that there is some overhead cost, $c_0$, of administering the survey, and that individual observations from the ith stratum each cost an amount $c_i$. Thus the total cost is:

$$C = c_0 + \sum_{i=1}^{k} c_i n_i.$$


I. Minimum variance for fixed cost (Cont.)
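A minimal sketch of the allocation that minimises the variance of the stratified mean for a fixed total cost $C = c_0 + \sum_i c_i n_i$, assuming the standard result that $n_i$ is proportional to $W_i S_i / \sqrt{c_i}$. The function name and the inputs are hypothetical, and the returned sizes are real-valued (rounding to integers is left to the user).

```python
import math


def optimum_allocation_fixed_cost(W, S, c, C, c0=0.0):
    """Stratum sample sizes n_i proportional to W_i S_i / sqrt(c_i), spending C - c0."""
    budget = C - c0                        # money left after the overhead cost
    denom = sum(Wi * Si * math.sqrt(ci) for Wi, Si, ci in zip(W, S, c))
    return [budget * Wi * Si / math.sqrt(ci) / denom
            for Wi, Si, ci in zip(W, S, c)]


W = [0.5, 0.5]   # hypothetical stratum weights
S = [2.0, 2.0]   # hypothetical stratum standard deviations
c = [1.0, 4.0]   # hypothetical unit costs per stratum
n_opt = optimum_allocation_fixed_cost(W, S, c, C=100.0, c0=10.0)
print(n_opt)
```

With equal unit costs this reduces to Neyman allocation, $n_i \propto W_i S_i$.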


Then

II. Minimum cost for fixed variance

Consider choosing the $n_i$ to satisfy the fixed variance requirement for the minimum possible total cost.

Given $n_i = n w_i$,

Comparison of proportional allocation and optimum allocation

Thus the extent of the potential gain from optimum (Neyman) allocation compared with proportional allocation depends on the variability of the stratum variances: the larger this is, the greater the relative advantage of optimum allocation.
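A minimal sketch comparing the variance of the stratified mean under proportional and Neyman allocation for the same total sample size n, assuming the standard expressions $\frac{1-f}{n}\sum_i W_i S_i^2$ and $\frac{1}{n}\left(\sum_i W_i S_i\right)^2 - \frac{1}{N}\sum_i W_i S_i^2$. The function names and the inputs are hypothetical, and integer rounding of the $n_i$ is ignored.

```python
def var_proportional(W, S, n, N):
    """Variance of the stratified mean with n_i = n * W_i."""
    return (1 - n / N) / n * sum(Wi * Si ** 2 for Wi, Si in zip(W, S))


def var_neyman(W, S, n, N):
    """Variance of the stratified mean with n_i proportional to W_i * S_i."""
    s_bar = sum(Wi * Si for Wi, Si in zip(W, S))
    return s_bar ** 2 / n - sum(Wi * Si ** 2 for Wi, Si in zip(W, S)) / N


W = [0.5, 0.5]   # hypothetical stratum weights
S = [1.0, 3.0]   # hypothetical stratum standard deviations
gain = var_proportional(W, S, n=20, N=1000) - var_neyman(W, S, n=20, N=1000)
print(gain)
```

The gain equals $\frac{1}{n}\sum_i W_i (S_i - \bar{S})^2$ with $\bar{S} = \sum_i W_i S_i$, so it vanishes when all stratum standard deviations are equal.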

Cluster Sampling (Chapter 6, Textbook, Barnett, V., 1991)


Comparison of s.r. sampling with cluster sampling

Systematic Sampling

A systematic sample can be viewed as a cluster sample of size m = 1!

Systematic sample mean
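A minimal sketch of 1-in-k systematic sampling, assuming the population is held in a list whose length is a multiple of k. The k possible starting points define k clusters, and exactly one cluster is selected, which is the cluster-sampling view with m = 1. The function names and the data are hypothetical.

```python
import random


def systematic_sample(population, k, start=None):
    """Take every k-th unit, beginning at a random start in 0..k-1."""
    if start is None:
        start = random.randrange(k)
    return population[start::k]


def systematic_mean(population, k, start=None):
    """Mean of the selected systematic sample."""
    sample = systematic_sample(population, k, start)
    return sum(sample) / len(sample)


population = list(range(1, 13))  # hypothetical ordered population, N = 12
all_means = [systematic_mean(population, 4, start) for start in range(4)]
print(all_means)
```

Averaging the k possible systematic means recovers the population mean when N is a multiple of k, so the systematic sample mean is unbiased in that case.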

Comparison of s.r. sampling with systematic sampling

Two ways of estimating ---
