
Fitting a Mixture Model for Incomplete Data

Xiangyuanchai Guo
Prof. Alan Welsh

The Australian National University

February 27, 2016


Abstract

Informative non-response is very difficult to model. Recently, in Section 7.2 of “Maximum Likelihood Estimation for Sample Surveys” (Chambers et al., 2012), a simple problem with binary data was considered. In this project, we review the presentation in Section 7.2, explore the issue of identifiability and carry out some simulation work to explore the methods.


1 Introduction

Innovation surveys are designed to collect information on the uptake and development of new technology by businesses. It is thought that businesses that are not innovative are more inclined to be non-responsive because they see no value in innovation surveys. Thus, the non-response process is a function of the survey variable of interest. In Section 7.2 of “Maximum Likelihood Estimation for Sample Surveys”, a simple, abstracted version of an innovation survey was considered and used to illustrate and explore some of the issues which arise with non-response in sample surveys. We consider this same problem in more detail.

In this project, we write down a mixture model for informative non-response, showing that it is not identifiable. We then consider obtaining extra data from a follow up survey. We extend the model to describe the second stage sample and reconsider the problem of non-identifiability. We explore the use of additional constraints to impose identifiability. In particular, we carry out simulations to investigate the coverage and length of confidence intervals for the parameters of interest. Our conclusion is that the problem of handling informative non-response is quite difficult.

2 Non-response in innovation surveys

The original abstracted version of the survey in Chambers et al. (2012, Section 7.2) considered a stratified population of businesses, where the businesses are classified into H strata. We consider a simplified version in which the population is not stratified, or to put it another way, the businesses in the population are all in the same stratum. The key quantities describing the survey are as follows:

• N is the population size;

  • Ii is the sample indicator, which equals 1 when the corresponding business is selected in the sample;

  • s = {i : Ii = 1} denotes the selected sample of n = ∑U Ii businesses;

  • ri is the response indicator, which equals 1 when the corresponding business has responded, and is only observed for the in sample businesses;

  • s1 = {i : Ii = 1, ri = 1} denotes the set of n1 = ∑U Ii ri in sample businesses that respond;

  • s0 = {i : Ii = 1, ri = 0} denotes the set of n0 = ∑U Ii(1 − ri) in sample businesses that do not respond;

  • yi denotes the survey variable, which takes the value one if the business is innovative and zero otherwise, and is observed only for the units in s1.

Let iU be the N-vector of sample inclusion indicators, rs be the n-vector of the response indicators and ys1 be the n1-vector of observed values of the survey variable. Then the observed data from the survey are Bs = (ys1, rs, iU). The structure of the data Bs and the notation are summarised in Table 1.


Table 1: The structure of data.

Group label | Number of units | Sample indicator | Response indicator | Survey variable
s1          | n1              | Ii = 1           | ri = 1             | yi
s0          | n0              | Ii = 1           | ri = 0             | missing
r           | N − n1 − n0     | Ii = 0           | missing            | missing
Total       | N               | N                | n1 + n0            | n1

2.1 The model

We model the distribution of ri and then of yi given ri as Bernoulli, resulting in the mixture model:

\[
\begin{aligned}
y_i \mid r_i = 1 &\sim \text{independent Bernoulli}(\delta_1),\\
y_i \mid r_i = 0 &\sim \text{independent Bernoulli}(\delta_0), \qquad (1)\\
r_i &\sim \text{independent Bernoulli}(\gamma).
\end{aligned}
\]

Under the mixture model (1), we have

\[
\begin{aligned}
f(y_i, r_i) &= \delta_{r_i}^{y_i}(1-\delta_{r_i})^{1-y_i}\,\gamma^{r_i}(1-\gamma)^{1-r_i},\\
f(y_i) &= \delta_0^{y_i}(1-\delta_0)^{1-y_i}(1-\gamma) + \delta_1^{y_i}(1-\delta_1)^{1-y_i}\gamma,\\
f(r_i \mid y_i) &= \frac{\delta_{r_i}^{y_i}(1-\delta_{r_i})^{1-y_i}\,\gamma^{r_i}(1-\gamma)^{1-r_i}}{\delta_0^{y_i}(1-\delta_0)^{1-y_i}(1-\gamma) + \delta_1^{y_i}(1-\delta_1)^{1-y_i}\gamma}.
\end{aligned}
\]

From (1), the non-response is non-informative if δ1 = δ0 and informative if δ1 ≠ δ0. The mixture model (1) is deficient as a model for data with informative non-response, because there is information from the respondents (s1) about their distribution yi|ri = 1 but, by definition, no information from the non-respondents (s0) about their distribution yi|ri = 0.

We used maximum likelihood estimation (MLE) to estimate the unknown parameter vector θ = (δ0, δ1, γ)^T. The population log-likelihood is

\[
\begin{aligned}
\log\{L(\theta)\} = {}& \sum_U r_i\{y_i\log(\delta_1) + (1-y_i)\log(1-\delta_1)\}\\
&+ \sum_U (1-r_i)\{y_i\log(\delta_0) + (1-y_i)\log(1-\delta_0)\}\\
&+ \sum_U \{r_i\log(\gamma) + (1-r_i)\log(1-\gamma)\}.
\end{aligned}
\]

In Chambers et al. (2012, pp. 226-227), the authors differentiated the population log-likelihood with respect to the parameters to obtain the population score function. They then took the conditional expectation given Bs of the unknown part of the population score function to obtain the sample score function. Here, to check the result in another way, and because we need the sample log-likelihood function later in this project, we instead integrated the population likelihood over the unobserved variables and obtained the sample log-likelihood function

\[
l_s = \sum_s \bigl(r_i\{y_i\log(\delta_1) + (1-y_i)\log(1-\delta_1)\} + \{r_i\log(\gamma) + (1-r_i)\log(1-\gamma)\}\bigr).
\]
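To see why the δ0 terms disappear, note that for a sampled non-respondent (i ∈ s0) the survey variable yi is unobserved, so its contribution to the likelihood is summed over the two possible values of yi,

\[
\sum_{y=0}^{1} \delta_0^{\,y}(1-\delta_0)^{1-y}(1-\gamma) = 1-\gamma,
\]

which does not involve δ0; only the γ term survives for these businesses.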

We then differentiated the sample log-likelihood with respect to the parameters to obtain the sample score function

\[
\begin{aligned}
sc_s(\delta_1) &= \frac{1}{\delta_1(1-\delta_1)}\sum_{s_1}(y_i - \delta_1),\\
sc_s(\delta_0) &= 0,\\
sc_s(\gamma) &= \frac{n_1}{\gamma} - \frac{n_0}{1-\gamma}.
\end{aligned}
\]

This is the same as the result in Chambers et al. (2012, p. 227), as expected. We found that δ0 is not identifiable since the δ0 component of the sample score function is identically zero, and hence we cannot estimate δ0. The maximum likelihood estimators of the other parameters are however meaningful:

  • δ̂1 = ∑s1 yi / n1 is the proportion of responding businesses that are innovative (yi = 1);

  • γ̂ = n1/n is the proportion of businesses that respond.

When non-response is non-informative, so that δ0 = δ1 = δ, we can discard the δ0 component of the sample score function and δ̂1 is the maximum likelihood estimate of the common δ. Thus, as we claimed, the model (1) is applicable for non-informative non-response but deficient for informative non-response.
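As a concrete illustration, here is a minimal R sketch of the first stage fit, using the parameter settings later adopted in Section 3 (the variable names are illustrative rather than the report's own code):

set.seed(1)
n <- 5000; gamma <- 0.4; delta1 <- 0.7; delta0 <- 0.3

r <- rbinom(n, 1, gamma)                           # response indicators
y <- rbinom(n, 1, ifelse(r == 1, delta1, delta0))  # survey variable under the mixture model (1)
y[r == 0] <- NA                                    # y is not observed for non-respondents

gammahat  <- mean(r)           # gamma-hat = n1/n
delta1hat <- mean(y[r == 1])   # delta1-hat = proportion innovative among respondents
c(gammahat, delta1hat)         # delta0 does not appear anywhere in the fit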

2.2 The mixture approach with a follow up survey

The problem with model (1) is that we have no data on the non-responding businesses. One way to obtain data on the non-respondents is to carry out a second stage follow up survey in which we sample and then try to get information from the non-respondents. For simplicity, we consider only sampling non-respondents in the second stage. It is also possible for there to be non-response in the second stage, and this may differ from the non-response process in the first stage; the distribution of the second stage response indicators ri^(2) can depend on yi. For the second stage survey we require additional notation:

  • Ii^(2) denotes the second stage sample inclusion indicator;

  • ri^(2) denotes the second stage response indicator;

  • s1^(2) = {i : Ii^(2) = 1, ri^(2) = 1} denotes the set of n1^(2) = ∑U Ii^(2) ri^(2) (second stage) sample businesses that respond;

  • s0^(2) = {i : Ii^(2) = 1, ri^(2) = 0} denotes the set of n0^(2) = ∑U Ii^(2)(1 − ri^(2)) (second stage) sample businesses that do not respond.


The second stage survey effectively decomposes s0 in the same way that the first stage sample decomposes the whole population. The sample data after the second stage survey are Bs = (ys1, rs, iU) and Bs^(2) = (y^(2)_{s1^(2)}, r^(2)_{s^(2)}, i^(2)_{s0}), where i^(2)_{s0} is the n0-vector of second stage sample inclusion indicators, r^(2)_{s^(2)} is the (n0^(2) + n1^(2))-vector of second stage response indicators and y^(2)_{s1^(2)} is the n1^(2)-vector of observed values of the second stage survey variable. This is shown in Table 2.

Table 2: The structure of the observed first and second stage data.

Group label | Number of units       | Sample indicator | Response indicator | Survey variable
s1          | n1                    | Ii = 1           | ri = 1             | yi
s1^(2)      | n1^(2)                | Ii^(2) = 1       | ri^(2) = 1         | yi
s0^(2)      | n0^(2)                | Ii^(2) = 1       | ri^(2) = 0         | missing
r^(2)       | n0 − n1^(2) − n0^(2)  | Ii^(2) = 0       | missing            | missing
s0          | n0                    | Ii = 1           | ri = 0             | missing
r           | N − n − n0            | Ii = 0           | missing            | missing
Total       | N                     | N                | n + n0             | n1

We expanded the basic model (1) by adding “on top” the additional components

\[
\begin{aligned}
r_i^{(2)} \mid r_i = 0,\, y_i = 1 &\sim \text{independent Bernoulli}(\zeta_1),\\
r_i^{(2)} \mid r_i = 0,\, y_i = 0 &\sim \text{independent Bernoulli}(\zeta_0). \qquad (2)
\end{aligned}
\]

The new population log-likelihood including the second-stage survey is

\[
\begin{aligned}
\log\{L(\theta)\} = {}& \sum_U r_i\{y_i\log(\delta_1) + (1-y_i)\log(1-\delta_1)\}\\
&+ \sum_U (1-r_i)\{y_i\log(\delta_0) + (1-y_i)\log(1-\delta_0)\}\\
&+ \sum_U \{r_i\log(\gamma) + (1-r_i)\log(1-\gamma)\}\\
&+ \sum_U (1-r_i)y_i\{r_i^{(2)}\log(\zeta_1) + (1-r_i^{(2)})\log(1-\zeta_1)\}\\
&+ \sum_U (1-r_i)(1-y_i)\{r_i^{(2)}\log(\zeta_0) + (1-r_i^{(2)})\log(1-\zeta_0)\}.
\end{aligned}
\]

After integrating over the unobserved variables and rearranging, we obtained the sample log-likelihood function

\[
\begin{aligned}
l_s = \sum_s \bigl(\, & r_i\{y_i\log(\delta_1) + (1-y_i)\log(1-\delta_1)\}\\
&+ r_i\log(\gamma) + (1-r_i)\log(1-\gamma)\\
&+ (1-r_i)r_i^{(2)}\{y_i\log(\delta_0\zeta_1) + (1-y_i)\log((1-\delta_0)\zeta_0)\}\\
&+ (1-r_i)(1-r_i^{(2)})\log\{\delta_0(1-\zeta_1) + (1-\delta_0)(1-\zeta_0)\}\bigr).
\end{aligned}
\]
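The last term arises because, for a business in s0^(2) (sampled at the second stage but still not responding), yi is again unobserved, and summing its contribution over the two possible values of yi gives the mixture probability

\[
P(r_i^{(2)} = 0 \mid r_i = 0) = \delta_0(1-\zeta_1) + (1-\delta_0)(1-\zeta_0).
\]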

We then differentiated the sample log-likelihood with respect to the parameters to get the sample score function

\[
\begin{aligned}
sc_s(\delta_1) &= \frac{1}{\delta_1(1-\delta_1)}\sum_{s_1}(y_i-\delta_1),\\
sc_s(\delta_0) &= \frac{1}{\delta_0(1-\delta_0)}\sum_{s_1^{(2)}}(y_i-\delta_0) + \frac{\zeta_0-\zeta_1}{\delta_0(1-\zeta_1)+(1-\delta_0)(1-\zeta_0)}\,n_0^{(2)},\\
sc_s(\gamma) &= \frac{n_1}{\gamma}-\frac{n_0}{1-\gamma},\\
sc_s(\zeta_1) &= \frac{1}{\zeta_1}\sum_{s_1^{(2)}} y_i - \frac{\delta_0}{\delta_0(1-\zeta_1)+(1-\delta_0)(1-\zeta_0)}\,n_0^{(2)},\\
sc_s(\zeta_0) &= \frac{1}{\zeta_0}\sum_{s_1^{(2)}}(1-y_i) - \frac{1-\delta_0}{\delta_0(1-\zeta_1)+(1-\delta_0)(1-\zeta_0)}\,n_0^{(2)}.
\end{aligned}
\]

The second stage survey provides data for estimating the previously non-identifiable δ0; it does not affect the estimation of δ1 or γ in Subsection 2.1, and we now also have three equations to solve for δ̂0, ζ̂0 and ζ̂1.


3 Simulation of the 5-parameter model in R

We generated data from the assumed model (1)-(2) using the rbinom function in R, after setting the sample sizes n and m and the true values of the parameters γ, δ1, δ0, ζ1 and ζ0, where m = n1^(2) + n0^(2) is the sample size of the second stage survey. The values we used are:

• n=5000

• m=2000

• γ=0.4

• δ1=0.7

• δ0=0.3

• ζ1=0.6

• ζ0=0.5

Setting the sample score functions to zero, we have a system of three equations to solve for δ̂0, ζ̂0 and ζ̂1. As the equations are nonlinear, we used the R function nleqslv, which implements Broyden's method (Broyden, 1965), to solve the equations numerically.

We expected to get a unique solution to the system of equations, and thus find the MLEs of δ0, ζ1 and ζ0. However, we found on simulated data that the solution changes as the initial value changes, even when the relative steplength tolerance and the function value tolerance are small, indicating that there are multiple solutions to the system of equations.
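A minimal sketch of this check is given below (the variable names are illustrative; the report's own code is in Appendix 8.2):

library(nleqslv)
set.seed(1)
n <- 5000; m <- 2000
gamma <- 0.4; delta1 <- 0.7; delta0 <- 0.3; zeta1 <- 0.6; zeta0 <- 0.5

r  <- sort(rbinom(n, 1, gamma))                    # non-respondents first (assumes m <= n0)
y  <- c(rbinom(n - sum(r), 1, delta0), rbinom(sum(r), 1, delta1))
y0 <- y[1:m]                                       # y for the m followed-up non-respondents
r2 <- rbinom(m, 1, ifelse(y0 == 1, zeta1, zeta0))  # second stage response indicators

s2y1 <- sum(r2[y0 == 1])   # second stage respondents with y = 1
s2y0 <- sum(r2[y0 == 0])   # second stage respondents with y = 0
n02  <- m - sum(r2)        # second stage non-respondents

score <- function(x) {     # sample score functions for x = (delta0, zeta1, zeta0)
  denom <- x[1] * (1 - x[2]) + (1 - x[1]) * (1 - x[3])
  c((s2y1 - (s2y1 + s2y0) * x[1]) / (x[1] * (1 - x[1])) + (x[3] - x[2]) * n02 / denom,
    s2y1 / x[2] - x[1] * n02 / denom,
    s2y0 / x[3] - (1 - x[1]) * n02 / denom)
}

for (start in list(c(0.8, 0.8, 0.8), c(0.5, 0.5, 0.5), c(0.2, 0.2, 0.2)))
  print(round(nleqslv(start, score)$x, 3))         # different starts typically give different roots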

As critical points need not be maximum points, multiple solutions of the sample score equations do not by themselves imply multiple MLEs. We therefore used the function constrOptim in R to find maximum points of the sample log-likelihood function directly. Unsurprisingly, we again obtained multiple solutions: the sample log-likelihood function has infinitely many maximum points, all with the same maximum value. Table 3 shows the solutions obtained from three different initial values. The full set of maximising points, computed from a grid of 729 starting values by the code in Appendix 8.2, traces out smooth curves in each pair of coordinates rather than clustering at a single point.

Table 3: Multiple solutions to the 5-parameter model.

Initial value (δ0, ζ1, ζ0) | Maximum point (δ̂0, ζ̂1, ζ̂0) | Maximum value
(0.8, 0.8, 0.8)            | (0.345, 0.406, 0.825)        | -48.6
(0.5, 0.5, 0.5)            | (0.243, 0.577, 0.713)        | -48.6
(0.2, 0.2, 0.2)            | (0.288, 0.487, 0.758)        | -48.6


From these results, we conclude that the model is non-identifiable. We then came up with an intuitive explanation for the non-identifiability.

We can divide the model into two parts. The first part is the basic model (1) for the first stage survey given in Section 2.1:

\[
\begin{aligned}
y_i \mid r_i = 1 &\sim \text{independent Bernoulli}(\delta_1),\\
y_i \mid r_i = 0 &\sim \text{independent Bernoulli}(\delta_0),\\
r_i &\sim \text{independent Bernoulli}(\gamma).
\end{aligned}
\]

The second part includes the components with respect to the second stage survey:

\[
\begin{aligned}
r_i^{(2)} \mid r_i = 0,\, I_i^{(2)} = 1,\, y_i = 1 &\sim \text{independent Bernoulli}(\zeta_1),\\
r_i^{(2)} \mid r_i = 0,\, I_i^{(2)} = 1,\, y_i = 0 &\sim \text{independent Bernoulli}(\zeta_0),\\
y_i \mid r_i = 0 &\sim \text{independent Bernoulli}(\delta_0).
\end{aligned}
\]

We observe that the two parts of the model have similar structures, although the conditioning in the first part is Ii = 1 while in the second part it is ri = 0 and Ii^(2) = 1.

We can also divide the 5 parameters into two groups: the first group is γ, δ1 and δ0, and the second group is δ0, ζ1 and ζ0, with δ0 appearing in both. We found that γ and δ1 can be estimated explicitly using the first part of the model, while the parameters in the second group are estimated using only the second part of the model, because the system of equations to solve contains only the three sample score functions for δ0, ζ1 and ζ0. In short, the two parts of the model are similar but separate; the two groups of parameters are estimated separately, using different parts of the model.

Just as we failed to estimate all three parameters in the first part of the model using the data from the first stage survey, we will still fail to estimate all three parameters in the second part of the model using the data from the second stage survey.

The main problem is that we aimed to use additional parameters to help estimate the unknown δ0, but we failed to link them with the basic model properly. To address this, we then considered adding constraints to the model.


4 Adding constraints to the 5-parameter model

To make connections between the two parts of the model, we considered two possible constraints. We then checked the applicability of these assumptions using R.

4.1 Constraint 1

The first constraint is that ζ1 = ζ0, so the non-response in the second stage survey is assumed to be non-informative and the model becomes a 4-parameter model. When solving the model in R, we simply replace both ζ1 and ζ0 by a single parameter ζ, which gives the new sample log-likelihood function

\[
\begin{aligned}
l_s = \sum_s \bigl(\, & r_i\{y_i\log(\delta_1) + (1-y_i)\log(1-\delta_1)\}\\
&+ r_i\log(\gamma) + (1-r_i)\log(1-\gamma)\\
&+ (1-r_i)r_i^{(2)}\{y_i\log(\delta_0\zeta) + (1-y_i)\log((1-\delta_0)\zeta)\}\\
&+ (1-r_i)(1-r_i^{(2)})\log(1-\zeta)\bigr).
\end{aligned}
\]

The unique, explicit maximum likelihood estimators of δ0 and the additional parameter ζ are:

\[
\hat\delta_0 = \frac{\sum_{s_1^{(2)}} y_i}{\sum_{s_1^{(2)}} y_i + \sum_{s_1^{(2)}} (1-y_i)},
\qquad
\hat\zeta = \frac{\sum_{s_1^{(2)}} y_i + \sum_{s_1^{(2)}} (1-y_i)}{\sum_{s_1^{(2)}} y_i + \sum_{s_1^{(2)}} (1-y_i) + n_0^{(2)}}.
\]

Since the two sums in each expression add to n1^(2), these are simply the proportion of second stage respondents that are innovative and the second stage response rate ζ̂ = n1^(2)/(n1^(2) + n0^(2)) = n1^(2)/m.
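In R, these estimators are computed exactly as in Appendix 8.3, using the second stage counts sum2, sum3 and n02 defined in Appendices 8.1 and 8.2:

delta0hat <- sum2/(sum2 + sum3)                  # proportion of second stage respondents with y = 1
zetahat   <- (sum2 + sum3)/(sum2 + sum3 + n02)   # second stage response rate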

4.2 Constraint 2

Before setting the second constraint, we first investigated the distribution of ri|yi, Ii = 1. Given that

\[
\begin{aligned}
y_i \mid r_i = 1 &\sim \text{independent Bernoulli}(\delta_1),\\
y_i \mid r_i = 0 &\sim \text{independent Bernoulli}(\delta_0),\\
r_i &\sim \text{independent Bernoulli}(\gamma),
\end{aligned}
\]

we found that ri|yi, Ii = 1 also follows a Bernoulli distribution, with parameter ζ_{yi}, where

\[
\zeta_1 = \frac{\delta_1\gamma}{\delta_0(1-\gamma)+\delta_1\gamma},
\qquad
\zeta_0 = \frac{(1-\delta_1)\gamma}{(1-\delta_0)(1-\gamma)+(1-\delta_1)\gamma}.
\]
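These expressions follow from Bayes' rule under model (1); for example,

\[
\zeta_1 = P(r_i = 1 \mid y_i = 1) = \frac{P(y_i = 1 \mid r_i = 1)\,P(r_i = 1)}{P(y_i = 1)} = \frac{\delta_1\gamma}{\delta_0(1-\gamma) + \delta_1\gamma},
\]

and ζ0 is obtained in the same way with yi = 0.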

The second constraint is that ri^(2)|yi, ri = 0, Ii^(2) = 1 has the same distribution as ri|yi, Ii = 1, by which we imposed

\[
r_i^{(2)} \mid y_i,\, r_i = 0,\, I_i^{(2)} = 1 \sim \text{independent Bernoulli}(\zeta_{y_i}).
\]


By substituting

\[
\zeta_1 = \frac{\delta_1\gamma}{\delta_0(1-\gamma)+\delta_1\gamma},
\qquad
\zeta_0 = \frac{(1-\delta_1)\gamma}{(1-\delta_0)(1-\gamma)+(1-\delta_1)\gamma},
\]

we obtained a new sample log-likelihood function under constraint 2:

\[
\begin{aligned}
l_s = \sum_s \Bigl(\, & r_i\{y_i\log(\delta_1) + (1-y_i)\log(1-\delta_1)\}\\
&+ r_i\log(\gamma) + (1-r_i)\log(1-\gamma)\\
&+ (1-r_i)r_i^{(2)}\Bigl\{y_i\log\Bigl(\frac{\delta_0\delta_1\gamma}{\delta_0(1-\gamma)+\delta_1\gamma}\Bigr)
 + (1-y_i)\log\Bigl(\frac{(1-\delta_0)(1-\delta_1)\gamma}{(1-\delta_0)(1-\gamma)+(1-\delta_1)\gamma}\Bigr)\Bigr\}\\
&+ (1-r_i)(1-r_i^{(2)})\log\Bigl\{\frac{\delta_0^2(1-\gamma)}{\delta_0(1-\gamma)+\delta_1\gamma}
 + \frac{(1-\delta_0)^2(1-\gamma)}{(1-\delta_0)(1-\gamma)+(1-\delta_1)\gamma}\Bigr\}\Bigr).
\end{aligned}
\]

However, multiple solutions still exist, and δ̂1 is always 1.000, which is not reasonable. This indicates that constraint 2 does not make the model identifiable, and that the link we set up to connect the two parts of the model is not a useful one.

5 Simulation of the 4-parameter model in R

As the 4-parameter model (the 5-parameter model with constraint 1) is identifiable, we did some further investigation and simulation on this model to learn about its properties.

First, for a simulated dataset, we examined a 3D plot of the surface of the sample log-likelihood function as a function of δ0 and ζ.

The surface appears to be smooth, with a single maximum. It is quite flat near the maximum, so the variance of the maximum likelihood estimator may be large.
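A minimal sketch of how such a surface can be drawn is shown below. The second stage counts are illustrative values chosen close to their expected values under the settings of Section 3, and only the terms of the sample log-likelihood that involve (δ0, ζ) are plotted, since the remaining terms are constant in these parameters; this is an assumed reconstruction, not the report's own code.

s2y1 <- 360   # second stage respondents with y = 1 (illustrative)
s2y0 <- 700   # second stage respondents with y = 0 (illustrative)
n02  <- 940   # second stage non-respondents (illustrative)
d0 <- seq(0.05, 0.95, length.out = 60)
z  <- seq(0.05, 0.95, length.out = 60)
ll <- outer(d0, z, function(a, b) s2y1*log(a*b) + s2y0*log((1 - a)*b) + n02*log(1 - b))
persp(d0, z, ll, theta = 30, phi = 25,
      xlab = "delta0", ylab = "zeta", zlab = "log-likelihood")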


We then checked the eigenvalues of the Hessian of the sample log-likelihood function and found that the largest eigenvalue is clearly below zero, so the Hessian is negative definite and non-singular at the maximum, which is a good sign.
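A sketch of the corresponding numerical check, under the same illustrative counts as above (optimHess computes a numerical Hessian at the supplied point):

s2y1 <- 360; s2y0 <- 700; n02 <- 940            # illustrative counts as in the sketch above
loglik2 <- function(x) {                        # terms of l_s involving x = (delta0, zeta)
  s2y1*log(x[1]*x[2]) + s2y0*log((1 - x[1])*x[2]) + n02*log(1 - x[2])
}
mle <- c(s2y1/(s2y1 + s2y0), (s2y1 + s2y0)/(s2y1 + s2y0 + n02))   # closed-form MLEs
H <- optimHess(mle, loglik2)                    # numerical Hessian at the maximum
eigen(H)$values                                 # both eigenvalues should be strictly negative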

Finally, we did some simulations to check the confidence intervals for the parameters under different parameter settings.

5.1 Confidence intervals and coverages

Using the same set of parameter values as before, we generated 5000 sets of data in R. For each set of data, we calculated confidence intervals for δ1 and δ0, and then computed the coverages and the lengths of these intervals. The full R code for the loop can be found in the appendix. We found that the coverage for δ1 is high (0.946), while the coverage for δ0 is relatively low (0.227). However, when we change the parameter setting, the results change: with m = 200 instead of 2000, the coverage for δ0 is instead 0.863. We will see below that this is due to the larger variance for smaller m obscuring the bias in the estimation of δ0.
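The intervals are the usual Wald intervals computed in Appendix 8.3,

\[
\hat\delta_1 \pm 1.96\sqrt{\hat\delta_1(1-\hat\delta_1)/n_1},
\qquad
\hat\delta_0 \pm 1.96\sqrt{\hat\delta_0(1-\hat\delta_0)/n_1^{(2)}}.
\]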

We also checked the lengths of the confidence intervals (for m = 2000) and found that the confidence interval for δ1 is much shorter than that for δ0; this is shown in the boxplot of confidence interval lengths produced by the code in Appendix 8.3. The near-nominal coverage for δ1, compared with the low coverage for δ0, suggests that δ̂1 is essentially unbiased while δ̂0 is biased under these settings.


5.2 Changing parameter settings

As in the last subsection, we generated 5000 datasets. The different settings we used here are:

• Original setting: large n (5000), large m (2000) and ζ1 (0.6) larger than ζ0 (0.5);

• Large n (5000) and small m (200);

• Small n (500) and small m (200);

• ζ0 (0.6) larger than ζ1 (0.5).

We use boxplots to present the results in each case, with red points marking the true values of γ, δ1 and δ0, and the average of ζ1 and ζ0 for ζ. From the boxplots, we found that:

  • The sample size of the first stage survey n mainly influences the dispersions of γ̂ and δ̂1, with smaller dispersions when n is larger;

  • The sample size of the second stage survey m mainly influences the dispersions of ζ̂ and δ̂0, with smaller dispersions when m is larger;

  • ζ̂ and δ̂0 are less accurate than the other two estimators when ζ1 and ζ0 have different values; however, they appear to be accurate when ζ1 = ζ0;

  • Reversing the ordering of ζ0 and ζ1 reverses the direction of the over- and under-estimation of δ0 and ζ.

In conclusion, although the 4-parameter model is simpler and solves the problem of non-identifiability, it still has some problems with estimating δ0 when the assumed constraint does not actually hold.

[Figure: boxplots of γ̂, δ̂1, δ̂0 and ζ̂ under the four parameter settings, with the true values (and the average of ζ1 and ζ0 for ζ̂) marked in red.]

6 Summary and Conclusion

We wrote down a mixture model for informative non-response, and showed that δ0 is not identifiable. We obtained extra data from a follow up survey, extended the model, and showed that δ0, ζ0 and ζ1 are still not identifiable. We tried two constraints, one of which (that ri^(2)|yi, ri = 0, Ii^(2) = 1 has the same distribution as ri|yi, Ii = 1) did not work, while the other (ζ1 = ζ0) gave identifiability and explicit maximum likelihood estimators. We showed that imposing the constraint ζ1 = ζ0 does not produce good estimators of δ0 when ζ1 ≠ ζ0.

We confirmed that informative non-response is difficult to handle, and showed that follow-up surveys do not resolve the problem on their own. Imposing a particular constraint on the model for follow-up non-response is needed for identifiability, but this leads to poor estimators of δ0 when the constraint does not hold. The problem is that we never know whether our assumptions about non-response are correct, and when they are incorrect they can lead to poor results (bias, low coverage, etc.).

7 References

  • Chambers, R. L. et al. (2012). Maximum Likelihood Estimation for Sample Surveys. CRC Press, Section 7.2.

  • Broyden, C. G. (1965). A Class of Methods for Solving Nonlinear Simultaneous Equations. Mathematics of Computation, 19(92), 577-593.


8 Appendix

8.1 Data generation

r <- rbinom(n,1,gamma)        # first stage response indicators
r <- sort(r)                  # sort so that the n0 non-respondents come first
y <- c(rbinom(n-sum(r),1,delta0), rbinom(sum(r),1,delta1))   # y for non-respondents, then respondents
# second stage response indicators for the first m non-respondents: the first
# m-sum(y[1:m]) entries are drawn with probability zeta0 (the y = 0 group) and the
# remaining sum(y[1:m]) with probability zeta1 (the y = 1 group); only the group
# counts are used below
r2 <- c(rbinom(m-sum(y[1:m]),1,zeta0), rbinom(sum(y[1:m]),1,zeta1))

#calculate1
n1 = sum(r)                   # number of first stage respondents
n0 = n - n1                   # number of first stage non-respondents
n12 = sum(r2)                 # n1^(2): second stage respondents
n02 = m-sum(r2)               # n0^(2): second stage non-respondents
gammahat = n1/n               # MLE of gamma
delta1hat = sum(y[((n-sum(r))+1):n])/n1   # MLE of delta1: mean of y over first stage respondents

8.2 Sample score functions

#calculate2
sum2 = sum(r2[(m-sum(y[1:m])+1):m])   # second stage respondents with y = 1
sum3 = sum(r2[1:(m-sum(y[1:m]))])     # second stage respondents with y = 0

install.packages("nleqslv")   # install once
library(nleqslv)              # load the package that provides nleqslv()

# fn returns the three sample score functions evaluated at x = (delta0, zeta1, zeta0)

fn <- function(x) {

f <- numeric(length(x))

f[1] <- (sum2-n12*x[1])/(x[1]*(1-x[1])) +

(x[3]-x[2])*n02/(x[1]*(1-x[2])+(1-x[1])*(1-x[3]))

f[2] <- sum2/x[2] - x[1]*n02/(x[1]*(1-x[2])+(1-x[1])*(1-x[3]))

f[3] <- sum3/x[3] - (1-x[1])*n02/(x[1]*(1-x[2])+(1-x[1])*(1-x[3]))

f

}

x.start <- c(0.8,0.8,0.8)

#problem: solution changes as x.start changes

nleqslv(x.start,fn)

result <- nleqslv(x.start,fn)$x

result

delta0hat=result[1]

zeta1hat=result[2]

zeta0hat=result[3]

help(nleqslv)
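# res: a 729 x 3 grid of starting values, covering all combinations of
# 0.1, 0.2, ..., 0.9 in each of the three coordinates (delta0, zeta1, zeta0)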

res<- matrix(0,729,3)

res[,1]=c(rep(0.1,times=81),rep(0.2,times=81),rep(0.3,times=81),

rep(0.4,times=81),rep(0.5,times=81),rep(0.6,times=81),

rep(0.7,times=81),rep(0.8,times=81),rep(0.9,times=81))

res[,2]=c(rep(c(rep(0.1,times=9),rep(0.2,times=9),

rep(0.3,times=9),rep(0.4,times=9),rep(0.5,times=9),


rep(0.6,times=9),rep(0.7,times=9),rep(0.8,times=9),

rep(0.9,times=9)),times=9))

res[,3]=rep(seq(from=0.1, to=0.9, by=0.1),times=81)

#res <- read.csv(........res.csv)   # alternative: read the grid of starting values from a csv file
#View(res)
#attach(res)
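# fn2, A and B are used below but are not defined elsewhere in this appendix.
# The following is an assumed reconstruction: fn2 is the part of the sample
# log-likelihood from Section 2.2 that involves x = (delta0, zeta1, zeta0), and
# A, B encode the constraints 0 < x < 1 (constrOptim requires A %*% x - B >= 0).
fn2 <- function(x) {
  sum2*log(x[1]*x[2]) + sum3*log((1 - x[1])*x[3]) +
    n02*log(x[1]*(1 - x[2]) + (1 - x[1])*(1 - x[3]))
}
A <- rbind(diag(3), -diag(3))
B <- c(rep(0, 3), rep(-1, 3))
# The loop below maximises fn2 from each starting value in res and stores the maximisers.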

solutions<- matrix(0,729,3)

options(digits=3)

for(i in 1:729){

solutions[i,]=constrOptim(res[i,],fn2,ui=A,ci=B,grad=NULL,

control = list(fnscale = -1))$par

}

plot(solutions[,1],solutions[,2])

plot(solutions[,1],solutions[,2],xlim=c(0,1),ylim=c(0,2.5))

# other points form an upward convex curve

plot(solutions[,1],solutions[,3],xlim=c(0,1),ylim=c(0,2.5))

# similar upward convex curve

plot(solutions[,2],solutions[,3],xlim=c(0,2.5),ylim=c(0,2.5))

#downward convex curve

8.3 Simulation: CI for 4-parameter model

Delta1_3<-matrix(0,5000,6)

#[,1]is deltahat; [,2]is lower bound of CI;

#[,3]is upper bound of CI;[,4]&[,5]are indicators;[,6]is CI length

Delta0_3<-matrix(0,5000,6)

n=5000

m=2000

gamma=0.4

delta1=0.7

delta0=0.3

zeta1=0.6

zeta0=0.5

Delta1<-rep(0,5000)

Gamma<-rep(0,5000)

Delta0<-rep(0,5000)

Zeta0<-rep(0,5000)

for (i in 1:5000) {

r <- rbinom(n,1,gamma)

r <- sort(r)

y <- c(rbinom(n-sum(r),1,delta0), rbinom(sum(r),1,delta1))

r2 <- c(rbinom(m-sum(y[1:m]),1,zeta0), rbinom(sum(y[1:m]),1,zeta1))

n1 = sum(r)

n0 = n - n1


n12 = sum(r2)

n02 = m-sum(r2)

Gamma[i] <- n1/n

Delta1[i]<-sum(y[((n-sum(r))+1):n])/n1

sum2 = sum(r2[(m-sum(y[1:m])+1):m])

sum3 = sum(r2[1:(m-sum(y[1:m]))])

Delta0[i]<-sum2/(sum2+sum3)

Zeta0[i]<-(sum2+sum3)/(sum2+sum3+n02)

SE1=sqrt((Delta1[i]*(1-Delta1[i]))/n1)

SE0=sqrt((Delta0[i]*(1-Delta0[i]))/n12)

CI0l=Delta0[i]-1.96*SE0

CI0r=Delta0[i]+1.96*SE0

CI1l=Delta1[i]-1.96*SE1

CI1r=Delta1[i]+1.96*SE1

Delta0_3[i,1]=Delta0[i]

Delta0_3[i,3]=CI0r

Delta0_3[i,2]=CI0l

if(CI0l <= delta0){Delta0_3[i,4]=1}

if(CI0r >= delta0){Delta0_3[i,5]=1}

Delta0_3[i,6]=CI0r-CI0l

Delta1_3[i,1]=Delta1[i]

Delta1_3[i,3]=CI1r

Delta1_3[i,2]=CI1l

Delta1_3[i,6]=CI1r-CI1l

if(CI1l <= delta1){Delta1_3[i,4]=1}

if(CI1r >= delta1){Delta1_3[i,5]=1}

}

cov0<-mean(Delta0_3[,4]*Delta0_3[,5])#around 0.2

cov1<-mean(Delta1_3[,4]*Delta1_3[,5])#around 0.9

cov0

cov1

boxplot(split(c(Delta0_3[,6],Delta1_3[,6]),c(rep(1,5000),

rep(2,5000))),

main="Boxplot of CI lengths",names=

c(expression(delta[0]),expression(delta[1])))

8.4 Simulation: changing settings of parameters

#1. large n & large m: estimators are near outliers

n=5000

m=2000

gamma=0.4

delta1=0.7

delta0=0.3

zeta1=0.6


zeta0=0.5

Delta1<-rep(0,5000)

Gamma<-rep(0,5000)

Delta0<-rep(0,5000)

Zeta0<-rep(0,5000)

for (i in 1:5000) {

r <- rbinom(n,1,gamma)

r <- sort(r)

y <- c(rbinom(n-sum(r),1,delta0), rbinom(sum(r),1,delta1))

r2 <- c(rbinom(m-sum(y[1:m]),1,zeta0), rbinom(sum(y[1:m]),1,zeta1))

n1 = sum(r)

n0 = n - n1

n12 = sum(r2)

n02 = m-sum(r2)

Gamma[i] <- n1/n

Delta1[i]<-sum(y[((n-sum(r))+1):n])/n1

sum2 = sum(r2[(m-sum(y[1:m])+1):m])

sum3 = sum(r2[1:(m-sum(y[1:m]))])

Delta0[i]<-sum2/(sum2+sum3)

Zeta0[i]<-(sum2+sum3)/(sum2+sum3+n02)

}

#par(mfrow=c(1,4))

boxplot(split(c(Gamma,Delta1,Delta0,Zeta0),c(rep(1,5000),

rep(2,5000),rep(3,5000),rep(4,5000))),

main="Original setting",names=c(expression(hat(gamma)),

expression(hat(delta)[1]),expression(hat(delta)[0]),

expression(hat(zeta))))

zeta=(zeta1+zeta0)/2

points(c(1:4),c(gamma,delta1,delta0,zeta),col="red")

#2. large n & small m: more dispersed; estimators are near quartiles

n=5000

m=200

gamma=0.4

delta1=0.7

delta0=0.3

zeta1=0.6

zeta0=0.5

Delta1<-rep(0,5000)

Gamma<-rep(0,5000)

Delta0<-rep(0,5000)

Zeta0<-rep(0,5000)

for (i in 1:5000) {

r <- rbinom(n,1,gamma)


r <- sort(r)

y <- c(rbinom(n-sum(r),1,delta0), rbinom(sum(r),1,delta1))

r2 <- c(rbinom(m-sum(y[1:m]),1,zeta0), rbinom(sum(y[1:m]),1,zeta1))

n1 = sum(r)

n0 = n - n1

n12 = sum(r2)

n02 = m-sum(r2)

Gamma[i] <- n1/n

Delta1[i]<-sum(y[((n-sum(r))+1):n])/n1

sum2 = sum(r2[(m-sum(y[1:m])+1):m])

sum3 = sum(r2[1:(m-sum(y[1:m]))])

Delta0[i]<-sum2/(sum2+sum3)

Zeta0[i]<-(sum2+sum3)/(sum2+sum3+n02)

}

boxplot(split(c(Gamma,Delta1,Delta0,Zeta0),c(rep(1,5000),

rep(2,5000),rep(3,5000),rep(4,5000))),

main="Large n & small m",

names=c(expression(hat(gamma)),expression(hat(delta)[1]),

expression(hat(delta)[0]),expression(hat(zeta))))

zeta=(zeta1+zeta0)/2

points(c(1:4),c(gamma,delta1,delta0,zeta),col="red")

#3.small n & small m: similar to case#2"large n & small m"

n=500

m=200

gamma=0.4

delta1=0.7

delta0=0.3

zeta1=0.6

zeta0=0.5

Delta1<-rep(0,5000)

Gamma<-rep(0,5000)

Delta0<-rep(0,5000)

Zeta0<-rep(0,5000)

for (i in 1:5000) {

r <- rbinom(n,1,gamma)

r <- sort(r)

y <- c(rbinom(n-sum(r),1,delta0), rbinom(sum(r),1,delta1))

r2 <- c(rbinom(m-sum(y[1:m]),1,zeta0), rbinom(sum(y[1:m]),1,zeta1))

n1 = sum(r)

n0 = n - n1

n12 = sum(r2)

n02 = m-sum(r2)

Gamma[i] <- n1/n

Delta1[i]<-sum(y[((n-sum(r))+1):n])/n1


sum2 = sum(r2[(m-sum(y[1:m])+1):m])

sum3 = sum(r2[1:(m-sum(y[1:m]))])

Delta0[i]<-sum2/(sum2+sum3)

Zeta0[i]<-(sum2+sum3)/(sum2+sum3+n02)

}

boxplot(split(c(Gamma,Delta1,Delta0,Zeta0),c(rep(1,5000),rep(2,5000),

rep(3,5000),rep(4,5000))),

main="Small n & small m",names=c(expression(hat(gamma)),

expression(hat(delta)[1]),

expression(hat(delta)[0]),expression(hat(zeta))))

zeta=(zeta1+zeta0)/2

points(c(1:4),c(gamma,delta1,delta0,zeta),col="red")

#4. zeta0 > zeta1: average estimators are now lower than true values

n=5000

m=2000

gamma=0.4

delta1=0.7

delta0=0.3

zeta1=0.5

zeta0=0.6

Delta1<-rep(0,5000)

Gamma<-rep(0,5000)

Delta0<-rep(0,5000)

Zeta0<-rep(0,5000)

for (i in 1:5000) {

r <- rbinom(n,1,gamma)

r <- sort(r)

y <- c(rbinom(n-sum(r),1,delta0), rbinom(sum(r),1,delta1))

r2 <- c(rbinom(m-sum(y[1:m]),1,zeta0), rbinom(sum(y[1:m]),1,zeta1))

n1 = sum(r)

n0 = n - n1

n12 = sum(r2)

n02 = m-sum(r2)

Gamma[i] <- n1/n

Delta1[i]<-sum(y[((n-sum(r))+1):n])/n1

sum2 = sum(r2[(m-sum(y[1:m])+1):m])

sum3 = sum(r2[1:(m-sum(y[1:m]))])

Delta0[i]<-sum2/(sum2+sum3)

Zeta0[i]<-(sum2+sum3)/(sum2+sum3+n02)

}

boxplot(split(c(Gamma,Delta1,Delta0,Zeta0),c(rep(1,5000),rep(2,5000),

rep(3,5000),rep(4,5000))),


main=c(expression(zeta[0] > zeta[1])),

names=c(expression(hat(gamma)),

expression(hat(delta)[1]),expression(hat(delta)[0]),

expression(hat(zeta))))

zeta=(zeta1+zeta0)/2

points(c(1:4),c(gamma,delta1,delta0,zeta),col="red")
