

J. R. Statist. Soc. A (1983), 146, Part 4, pp. 394-403

On the Validity of Inferences from Non-random Samples

By T. M. F. SMITH

University of Southampton, UK

SUMMARY

Random sampling schemes satisfy the conditions for ignoring the selection mechanism in a model-based approach to inference in an observational study, such as a sample survey. In many studies non-random sampling is employed. Conditions for ignoring non-random selection mechanisms are examined. Particular attention is paid to post-stratification and to quota sampling.

Keywords: RANDOMIZATION; NON-RANDOM SAMPLING; OBSERVATIONAL STUDY; QUOTA SAMPLING; SELECTION; MODEL-BASED INFERENCE

1. INTRODUCTION

Randomization, whether in the application of treatments to experimental material or in the selection of units to be observed in a sample survey, is one of the most important contributions of statistics to science. The arguments for randomization are twofold. The first, and most important for science, is that randomization eliminates personal choice and hence eliminates the possibility of subjective selection-bias. The second is that the randomization distribution provides a basis for statistical inference. The question for scientists is whether such statistical inferences are relevant for scientific inference.

In most of applied science any uncertainty in "nature" is represented by a stochastic model of the phenomenon under study. The problems of statistical inference are then the problems of testing the fit of alternative models and of making inferences about the parameters of a given model. Even when randomization is employed the randomization distribution plays no direct role in this type of statistical inference. However, the selection of the units to be studied can affect the inferences, see, for example, Pearson (1903), and Birnbaum et al. (1950). Also of statistical importance is the decision whether or not to report the results of the study. Dawid and Dickey (1977) show that the reporting process can have a considerable impact on the scientific validity of statistical analysis as, for example, when only results which are significant at the 5 per cent level are published.

Observational studies, including most sample surveys, do not require the allocation of units to treatment groups, and when randomization is employed it is solely to determine which units should be observed. In many observational studies the units for study are selected for convenience without any randomization. For example, Doll and Hill (1964) in a prospective study of smoking and disease selected doctors as their units for study. They examined the relationships in their data and reported their conclusions. Any generalization of their conclusions to other populations, for example, to all adults, would have to be based on assumptions about the relationship between the selected units and the other population. Thus the selection of a particular sub-group for study has restricted the range of the inference. Repeating the study with other specially selected groups and obtaining similar results adds weight to one's belief about the general validity of the results but, of course, such studies can never prove that the results are scientifically valid.

Present address: Faculty of Mathematical Studies, The University, Southampton SO9 5NH.

© 1983 Royal Statistical Society 0035-9238/83/146394 $2.00


If a population of interest is well defined, as in a study of public opinion, then it is usual to design the sample so that the selected units are in some sense representative of the whole population. It is here that random sampling is important because the absence of selection bias adds strength to the belief that a sample is representative. Other forms of sampling, such as balanced sampling, systematic sampling and quota sampling, also claim to give representative samples. This raises the question: when can such non-random samples give useful statistical results?

In this paper we examine the role of the sample selection mechanism in statistical inference when samples are selected from well-defined finite populations. We follow the scientific approach to probability modelling and assume that the values under study, $X$ say, are generated by a parametric probability model, for example, $X$ is $N(\mu, \sigma^2)$. It will be shown that random sampling has important properties that are not necessarily shared by non-random sampling schemes, but that under certain conditions non-random sampling, such as quota sampling, can be justified. Whether the conditions are satisfied in any particular case is for the statistician and the other users of the data to consider.

In Section 2 we discuss the types of inference proposed for sample surveys, distinguishing between randomization inference and model-based inference, and between descriptive inference and analytic inference. In Section 3 we adopt the approach in Rubin (1976) and Little (1982) for modelling samples selected from finite populations and examine some consequences of that approach. In Section 4 we study quota sampling and some other forms of non-random sampling, and conditions for the validity of inferences from such samples are established and discussed.

2. INFERENCE FOR FINITE POPULATIONS

2.1. Basics

A well-defined finite population is one in which the $N$ units in the population are listed in a frame and are identified by labels, which for convenience we can represent by $i = 1, 2, \ldots, N$. This definition excludes most animal populations, when neither $N$ nor the labels are known, but includes all populations to which strict random sampling can be applied. Associated with the $i$th unit is a vector of unknown values, $Y_i$, which are to be measured in the survey. In addition some prior knowledge $Z_i$ is available for each unit, and this information can be employed to help design a representative sample. The prior information may include quantitative variables such as measures of size and qualitative variables such as membership of a cluster or a stratum. We let $Z$ represent the matrix of prior information for all the units, and $Y$ the matrix of values of the measurement variables.

A sample, $s$, is a subset of the labels $i = 1, \ldots, N$, which identifies the units to be observed. It is convenient to introduce a selection indicator variable, $A_i$, such that

$$A_i = 1, \quad i \in s,$$

$$A_i = 0, \quad i \in \bar{s},$$

where $\bar{s}$ is the complement of $s$.

The vector $A_s = (A_1, A_2, \ldots, A_N)^T$ determines which units are selected in the given sample, $s$. A sampling scheme, or selection mechanism, is a rule for evaluating $A_s$. In general this rule could depend on the prior information $Z$, the measurement values $Y$ and an unknown parameter vector $\phi$, and so we write it as

$$f(A_s \mid Z, Y; \phi). \tag{2.1}$$

Quota sampling is an example of a selection scheme which depends on some of the values in $Y$ as well as on the prior information $Z$. Non-response is another example, the selection here being of unknown form and depending on $Y$, $Z$ and possibly on further unknown variables.
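As a minimal sketch of this notation, the following Python fragment builds the indicator vector $A_s$ for a small hypothetical population of six units. The function names and all data values are illustrative assumptions, not taken from the paper. One rule looks only at the prior values $Z$; the other looks at the measurement values $Y$.

```python
# Illustrative population: labels 1..N, prior values Z known in advance,
# measurement values Y unknown until a unit is observed. All values are made up.
labels = [1, 2, 3, 4, 5, 6]
Z = {1: 10, 2: 40, 3: 25, 4: 5, 5: 60, 6: 30}
Y = {1: 3.0, 2: 9.5, 3: 6.1, 4: 8.8, 5: 12.4, 6: 2.2}


def indicator(sample, labels):
    """Return A_s: A_i = 1 if label i is in the sample s, else 0."""
    chosen = set(sample)
    return [1 if i in chosen else 0 for i in labels]


def select_on_z(labels, Z, n):
    """Hypothetical purposive rule depending only on Z: the n largest Z-values."""
    return sorted(sorted(labels, key=lambda i: Z[i], reverse=True)[:n])


def select_on_y(labels, Y, cutoff):
    """Hypothetical rule depending on Y: a unit is kept only if Y_i > cutoff."""
    return [i for i in labels if Y[i] > cutoff]


s_z = select_on_z(labels, Z, 3)
s_y = select_on_y(labels, Y, 8.0)
A_z = indicator(s_z, labels)   # selection of the form f(A_s | Z)
A_y = indicator(s_y, labels)   # selection of the form f(A_s | Z, Y)
```

The two indicator vectors have the same form, so nothing in the observed sample itself reveals which kind of rule produced it; that information has to come from knowledge of the design.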

2.2. Randomization Inference

For standard random sampling schemes $A_s$ is a random variable which depends only on $Z$, so that

$$f(A_s \mid Z) \tag{2.2}$$

is a probability mass function defined on all the subsets $s$. Furthermore, since $Z$ is known the values of $f(A_s \mid Z)$ are known to the person making the selection. The distribution $f(A_s \mid Z)$ is the randomization distribution which forms the basis of randomization inference. In randomization inference the measurement values $Y$ are assumed to be unknown constants, that is, they play the role of unknown parameters, whereas in model-based inference they are assumed to have been generated as a sample from a super-population.

A descriptive inference is a statistical inference about any known function of $Y$, for example, about the finite population total or mean of one of the variables in $Y$. A descriptive inference has the property that if all $N$ units are observed without error then there would be no uncertainty in the inference. It is convenient to call any other inference analytic inference, although the term is often reserved for parametric inferences about wider populations, and includes tests of significance.

Smith (1976) reviews the inferential problems relating to the randomization distribution. Concepts such as sufficiency, likelihood and minimum variance unbiased estimation all lead to major problems within the randomization framework. Basically the sufficient statistic is the observed data together with the associated labels, which we can write as

$$d_s = \{(i, Y_i) : i \in s\}. \tag{2.3}$$

By including the labels all we can say is that the units in the sample take their observed values and that the unobserved units can take any set of values in the appropriate parameter space. This is a rather limited inference.

Classical randomization inference chooses to ignore these problems and to work with the distribution $f(A_s \mid Z)$ under repeated sampling, using ideas such as unbiasedness, or asymptotic unbiasedness, in order to choose estimators. Having selected an estimator, its randomization variance is calculated and inferences are made by an appeal to a form of the central limit theorem, see, for example, Madow (1948) and Hajek (1960). It is easy to construct populations for which the central limit theorem does not apply (Smith, 1979), and so the validity of this form of inference requires a restriction on the parameter space of possible values of $Y$. This restriction needs to be specified for each population and so it is not true that randomization inference is free of assumptions. Finally, randomization inference does not address the question of efficiency directly. In fact one of the most famous results in this area is that of Godambe (1955), who shows that in the case of descriptive inference there can be no uniformly minimum variance unbiased estimator.
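The repeated-sampling argument can be made concrete by enumerating the randomization distribution for a tiny population. The sketch below assumes simple random sampling without replacement and five made-up values of $Y$ held fixed; it checks that the sample mean is unbiased over all possible samples and that its randomization variance agrees with the usual textbook formula $(1 - n/N) S^2 / n$. Exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical fixed population values Y_1..Y_5 (treated as unknown constants).
Y = [2, 5, 7, 11, 20]
N, n = len(Y), 2

# Under simple random sampling every subset of size n has probability 1 / C(N, n).
samples = list(combinations(range(N), n))
p = Fraction(1, len(samples))

pop_mean = Fraction(sum(Y), N)
sample_means = [Fraction(sum(Y[i] for i in s), n) for s in samples]

# Expectation and variance of the sample mean over the randomization distribution.
expected = sum(p * m for m in sample_means)
rand_var = sum(p * (m - expected) ** 2 for m in sample_means)

# Textbook formula, with S^2 the population variance using divisor N - 1.
S2 = sum((Fraction(y) - pop_mean) ** 2 for y in Y) / (N - 1)
formula_var = (1 - Fraction(n, N)) * S2 / n
```

Only the labels vary in this calculation; the $Y$ values never do. That is the sense in which the averaging is over samples that might have been drawn rather than over values the population might have taken.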

There are two main responses to these rather strong theoretical criticisms. The first is that inference from a sample survey, or more generally inference about a finite population, is different from other forms of inference and that the theoretical principles used in parametric statistical inference are not applicable. This isolationist viewpoint has some justification but we attempt to show later that standard parametric statistical inference based on models can be usefully applied to sample survey data. The second is more pragmatic and states simply that randomization inference "works". Millions of inferences have been made from thousands of surveys and there has been no public outcry against the results, so what is the problem? One problem is to define "works". Since it is rare for true values to be available, it is usually impossible to say whether an inference has worked or not. Wildly inaccurate estimates may give rise to considerable suspicion, but how can one verify whether a 95 per cent confidence interval has worked or not?

Public opinion polls prior to an election provide a well-known opportunity of assessing the accuracy of survey inferences. In the UK, National Opinion Polls have consistently employed random samples while the Gallup Poll have used quota samples. In Table 1 the survey results for the elections from 1959 to 1979 are listed together with the actual outcomes. Clearly if random sampling "works" then so does quota sampling. So the pragmatic argument is not conclusive.

The final argument against randomization inference is that the randomization distribution is defined on a given sampling frame for a fixed set of measurement variables, $Y$. Thus only descriptive inferences can be made to this frame and analytic inference to some wider population requires additional model-type assumptions. In social surveys non-response and missing values add a non-random selection to even the best random sampling schemes and this again destroys the strict basis for randomization inference.

TABLE 1
Results of final opinion polls prior to general elections, NOP and Gallup
Lead = Conservative percentage - Labour percentage

Year of          Actual    NOP lead    Gallup lead
Election         lead      (R)         (Q)            Winner

1959             -4.2       -3.9        -2.0          R
1964              1.9        3.1         2.0          Q
1966              7.3        9.0        11.0          R
1970             -2.4        4.1         7.0          R
1974 (Feb.)       0.6        4.0         2.0          Q
1974 (Oct.)      -3.5      -14.5        -5.5          Q
1979              7.2        7.0         2.0          R
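The "Winner" column of Table 1 can be reproduced from the other three columns. The sketch below takes the leads exactly as printed in this text and, as an assumption about how the column was derived, calls a poll the winner when its lead is closer in absolute terms to the actual lead. The function name is illustrative.

```python
# Rows of Table 1 as printed in this text:
# (election, actual lead, NOP lead [random, R], Gallup lead [quota, Q]).
table1 = [
    ("1959", -4.2, -3.9, -2.0),
    ("1964", 1.9, 3.1, 2.0),
    ("1966", 7.3, 9.0, 11.0),
    ("1970", -2.4, 4.1, 7.0),
    ("1974 (Feb.)", 0.6, 4.0, 2.0),
    ("1974 (Oct.)", -3.5, -14.5, -5.5),
    ("1979", 7.2, 7.0, 2.0),
]


def closer_poll(actual, nop, gallup):
    """Return 'R' if the random-sample poll is closer to the outcome, else 'Q'."""
    return "R" if abs(nop - actual) < abs(gallup - actual) else "Q"


winners = [closer_poll(a, r, q) for _, a, r, q in table1]
wins_random = winners.count("R")
wins_quota = winners.count("Q")

# Mean absolute error of each poll's lead over the seven elections.
mae_random = sum(abs(r - a) for _, a, r, _ in table1) / len(table1)
mae_quota = sum(abs(q - a) for _, a, _, q in table1) / len(table1)
```

On these figures the random-sample poll is closer in four elections and the quota poll in three, which is consistent with the remark that the pragmatic argument does not separate the two methods.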

The above discussion shows that randomization inference is fraught with conceptual difficulties. Thus the case for randomization inference is now usually made on the grounds of robustness, not on those of statistical efficiency. This argument has a strong intuitive appeal but there is a need to define in what senses randomization inference is robust.

The case for randomization in design is more convincing. The practical requirement that the units to be studied should be seen to be free from selection biases is of far greater scientific importance than philosophical arguments about alternative forms of inference. Randomized selection provides a publicly acceptable method for avoiding selection biases. It is also possible to justify randomization on the grounds of robustness against alternative populations using minimax arguments, see, for example, Blackwell and Girshick (1954) and Scott and Smith (1975). We will see later that robustness arguments for randomization apply also within a model-based approach to inference.

2.3. Model-based Inference

As has been said, in a model-based approach to either descriptive or analytic inference for a finite population the measurement variables $Y$ are assumed to have been generated as a sample from a super-population $f(Y; \theta)$. The model provides a relationship between the values $Y$ and its parameters $\theta$. Thus the observed values, $Y_i$, $i \in s$, can provide both estimates of $\theta$ for analytic inferences and predictions of $Y_i$, $i \notin s$, for descriptive inference. The specification of the distributional form for a multivariate, multipurpose survey is a non-trivial task. Ericson (1969) and Sugden (1979) have proposed models based on exchangeability assumptions, whereas Scott and Smith (1969) make classical normal distribution assumptions and they also employ the closely related linear model assumptions. Royall (1970, 1971) and Royall and Herson (1973) pursue the linear model approach. Smith (1976) and Cassel et al. (1977) review these approaches.
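The double use of the observed values, estimating $\theta$ and predicting the unobserved $Y_i$, can be sketched under the simplest possible model. The assumption here, made purely for illustration, is that every $Y_i$ has a common mean $\theta$; the function name and data are hypothetical.

```python
def predict_total(sample_values, N):
    """Predicted population total under a common-mean model.

    Illustrative assumption: all Y_i share one mean theta, estimated by the
    sample mean; each unobserved Y_i is predicted by that estimate.
    """
    n = len(sample_values)
    theta_hat = sum(sample_values) / n      # analytic inference: estimate of theta
    observed_part = sum(sample_values)      # known exactly once observed
    predicted_part = (N - n) * theta_hat    # prediction for the N - n unobserved units
    return theta_hat, observed_part + predicted_part


theta_hat, total_hat = predict_total([4.0, 6.0, 5.0, 9.0], N=10)
```

Under this model the predicted total is just $N$ times the sample mean, which is also the usual randomization-based estimator under simple random sampling. Richer models, such as the linear models cited above, would change the prediction step but not the overall pattern.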

The model-based approach is attractive conceptually because it embraces both descriptive and analytic inference. It can also be extended to accommodate missing values and unit non-response (Little, 1982), albeit with strong assumptions.

The arguments against model-based inference for surveys are also very powerful. How can one write down a parametric model for surveys in which hundreds of variables are measured and the population has a large number of identifiable sub-groups such as domains of study, strata and clusters? How robust is a model to departures from its underlying assumptions? Hansen et al. (1982) show that a linear model can be highly susceptible to changes in its mean structure. One answer to these criticisms from a model-based viewpoint is that for most randomization-based point estimators there is an equivalent model-based estimator (Särndal, 1978; Smith, 1978). The difference between the two approaches lies in variance estimation rather than point estimation. The model-based approach takes the sample units (the labels $i$) as fixed and averages over the different values of $Y_i$ that the model might have generated. The randomization approach fixes the $Y_i$ values and averages over the different values of $i$ that might have been realized by the random sampling scheme. Model-based averaging is over samples like the one actually drawn and seems to have inferential benefits (Holt and Smith, 1979; Royall and Cumberland, 1981). However, Royall and Cumberland show that the linear model variances also have shortcomings and they advocate the use of robust model-based variance estimators based on either regression residuals or on jackknife methods.

3. THE EFFECT OF SELECTION ON MODEL-BASED INFERENCE

If sample selection is non-random, whether by design or due to missing values or non-response, no valid statistical inference can be made using a randomization approach to inference. Practical experience shows that data from social surveys are always subject to non-response and so the analysis of social survey data always requires the statistician to make assumptions beyond those of randomization. A dogmatic statistician who wished to adhere strictly to randomization inference would have to reject social survey data for analysis and retire behind a veil of statistical self-righteousness. As Finney (1974) argues, this will not prevent the scientist from drawing inferences from the data, so would it not be better for the statistician to help the scientist to extract the maximum information from his data? Most statisticians would agree that they should help analyse non-random samples, but that in so doing they should make quite clear the limitations of their conclusions. A model-based approach to inference allows the statistician to analyse non-random samples in a formal way while at the same time making explicit the underlying assumptions.

Any without-replacement sample selection can be represented by an indicator function $A_s$, as defined in Section 2.1. The general form of the selection function is then $f(A_s \mid Z, Y; \phi)$. The question to answer is what role does this selection function play in a model-based approach to inference? For randomization inference there is no problem since $f(A_s \mid Z)$ is the only probability distribution defined. A model-based approach on the other hand requires that the statistician should produce a probability model to represent the finite population values $Y$ and the known prior values $Z$. In general this requires the specification of the conditional distribution

$$f(Y \mid Z; \theta) \tag{3.1}$$

of the population data matrix $Y$, given the prior matrix $Z$. $Y$ is an $N \times p$ matrix and $Z$ is an $N \times r$ matrix, where $N$ and $p$ are usually large in a social survey. Hence the earlier statement that the specification of $f(Y \mid Z; \theta)$ is a non-trivial task. Since (3.1) is a conditional distribution, linear regression models, in which $Y$ is a linear function of $Z$, have an immediate appeal, and the multivariate normal distribution can be employed as a vehicle for generating the appropriate linear model structure. For discrete variables hypergeometric or multinomial distributions may be used (Hartley and Rao, 1968; Royall, 1968; Ericson, 1969).

Combining (3.1) with (2.1) gives the general form of the joint model for $Y$ and $A_s$ as

$$f(Y, A_s \mid Z; \theta, \phi) = f(Y \mid Z; \theta)\, f(A_s \mid Z, Y; \phi). \tag{3.2}$$

It should be noted that $A_s$ is the indicator value of the observed sample $s$ and this partitions $Y$ into $(Y_s, Y_{\bar{s}})$, so that $Y_s$ can only be observed when $A_s$ has been observed. Writing the data $d_s$ from (2.3) as $d_s = (Y_s, A_s)$ we have formally that

$$f(d_s \mid Z; \theta, \phi) = \int f(Y_s, Y_{\bar{s}} \mid Z; \theta)\, f(A_s \mid Y_s, Y_{\bar{s}}, Z; \phi)\, dY_{\bar{s}}, \tag{3.3}$$

as in Little (1982).

Rubin (1976) examines the question of when the selection scheme $f(A_s \mid Y, Z; \phi)$ can be ignored for inferences about $\theta$. If selection is ignored then this is equivalent to working solely with $Y_s$ as data, that is, with

$$f(Y_s \mid Z; \theta) = \int f(Y_s, Y_{\bar{s}} \mid Z; \theta)\, dY_{\bar{s}}. \tag{3.4}$$

Rubin establishes the conditions under which inferences from (3.3) and (3.4) will be identical from sampling theory, Bayesian and likelihood viewpoints. Sampling theory here refers to repeated sampling from the model distribution (3.4) and not to repeated sampling from the finite population. From a sampling theory viewpoint (3.3) and (3.4) will be identical only if $A_s$ is fixed, since otherwise the labels in the integral (3.4) are not well defined. If

$$f(A_s \mid Y, Z; \phi) = f(A_s \mid Z; \phi), \tag{3.5}$$

which implies that the selection of units does not depend on the measurement variables $Y$ but only on the prior variables $Z$, then

$$f(Y_s, A_s \mid Z; \theta, \phi) = f(A_s \mid Z; \phi) \int f(Y_s, Y_{\bar{s}} \mid Z; \theta)\, dY_{\bar{s}} = f(A_s \mid Z; \phi)\, f(Y_s \mid Z; \theta). \tag{3.6}$$

This suggests that the selection variable $A_s$ and the measurement variable $Y_s$ satisfy a conditional independence type of factorization, as in Dawid (1979). However, as Rubin (1976) points out, since the distributions are only defined for the given $A_s$ the variables are conditionally independent in a probabilistic sense only if (3.6) holds for all possible $A_s$, and not just for the given $A_s$.

Provided that the parameters $\phi$ are not functions of the parameters $\theta$, that is, $\phi$ and $\theta$ are distinct, the sampling distributions generated from (3.4) for given $A_s$ will be identical to those generated by (3.6), and so selection can be ignored. Bayesian inferences require in addition that $\theta$ and $\phi$ should be a priori independent, but need somewhat weaker conditions on the relationship between $Y$ and $A_s$.

The most important result from our point of view is (3.5). This states that the sample selection should not depend on the values $Y$ if the selection is to be ignored for any model-based inference. Random sampling guarantees that this condition is satisfied and so even in a full model-based approach randomization has a very important role. However, there are many other schemes that satisfy (3.5), including purposive sampling of the units with the largest $Z$-values (Royall, 1970) and balanced sampling (Royall, 1971). In fact any method of selection based solely on the prior $Z$-values will meet condition (3.5). The advantage of randomization is that if a randomized design has been employed no further justification is needed; the whole scientific community will accept the sample that has been selected. With other forms of sampling users would need to be convinced in each case that the sampling scheme could be ignored.
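The practical content of condition (3.5) can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a result from the paper: a hypothetical super-population in which $Y = 2 + 3Z + \text{error}$ with standard normal $Z$ and error, a purposive rule that keeps units with $Z > 0$ (depends on $Z$ only), and a rule that keeps units with $Y > 2$ (depends on $Y$). The seed and population size are arbitrary.

```python
import random


def ols_slope(pairs):
    """Least-squares slope of y on z for a list of (z, y) pairs."""
    n = len(pairs)
    z_bar = sum(z for z, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in pairs)
    sxx = sum((z - z_bar) ** 2 for z, _ in pairs)
    return sxy / sxx


rng = random.Random(1983)

# Hypothetical super-population: Y = 2 + 3 Z + error, so the true slope is 3.
population = []
for _ in range(20000):
    z = rng.gauss(0.0, 1.0)
    population.append((z, 2.0 + 3.0 * z + rng.gauss(0.0, 1.0)))

# Selection depending only on Z, so condition (3.5) holds.
selected_on_z = [(z, y) for z, y in population if z > 0.0]
# Selection depending on Y, so condition (3.5) fails.
selected_on_y = [(z, y) for z, y in population if y > 2.0]

slope_z = ols_slope(selected_on_z)
slope_y = ols_slope(selected_on_y)
```

In this setup `slope_z` should land close to the true value of 3 even though the sample is far from random, while `slope_y` is noticeably attenuated. Ignoring the selection is harmless in the first case and misleading in the second.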

4. INFERENCES FROM NON-RANDOM SAMPLES

4.1. General Considerations

Any selection mechanism of the form $f(A_s \mid Z; \phi)$ can be ignored for model-based inference. This has led to the suggestion that all sampling schemes can be ignored, but this is clearly untrue because selection schemes which depend on the measurement variables $Y$ are not ignorable. Thus we must examine various classes of non-random sampling schemes to establish the conditions under which the selection can be ignored.

In Section 3 we saw that purposive sampling schemes based only on the $Z$-values are ignorable. The reason is that the $Z$-values are assumed known and so any selection based on the $Z$-values contains no new information. If the analyst does not know the $Z$-values, however, such as in a secondary analysis of survey data, then the design may carry useful inferential information and cannot be ignored (Scott, 1977). So a design may be ignorable to one statistician who knows the $Z$-values but not to another who does not know the $Z$-values. The only design which contains no useful information, and so is always ignorable, is simple random sampling.

4.2. Observational Studies

In an observational study the sample is often chosen for convenience. If we let $Z$ be an indicator variable which defines the convenience sample, for example the set of doctors, then the selection is of the form $f(A_s \mid Z)$. The selection can then be ignored for inferences about $\theta$ based on the conditional distribution $f(Y_s \mid Z; \theta)$ for given $s$. When $Z$ is an indicator variable this means that inferences can be made within the given sub-group, say doctors, but not necessarily to any other groups. In order to make inferences to non-doctors the conditional distribution $f(Y_{\bar{s}} \mid Z; \theta)$ must be known. Specifying this distribution is the problem facing the statistician.

A common solution for descriptive inferences about $Y$ is to use post-stratification. Let $Y = (Y^q, Y^m)$, where $Y^q$ are stratification variables and $Y^m$ are the measurement variables. Let $(Y^q_s, Y^m_s)$ be the values in the sample, $s$. Post-stratification requires us to examine the conditional distribution

$$f(Y^m_s \mid Y^q_s, Z; \zeta). \tag{4.1}$$

If selection on $Z$, say being a doctor, is to be ignored then we require that

$$f(Y^m_s \mid Y^q_s, Z; \zeta) = f(Y^m_s \mid Y^q_s; \zeta), \tag{4.2}$$

that is, the variable $Z$ contains no information beyond that in the stratification variables $Y^q_s$. Analytic inferences about $\zeta$ can be made directly from (4.2) but descriptive inferences about $Y$, involving $Y^m_{\bar{s}}$, also require knowledge of $Y^q_{\bar{s}}$. Often only functions of $Y^q_{\bar{s}}$, such as totals or means, are needed and these are the cases usually covered by post-stratification. Condition (4.2) would appear to be the key assumption in generalizing results from a purposively selected observational study to a wider population, and amounts to a statement that the post-stratifying variables $Y^q$ contain all the inferential information available in the design variables $Z$. For example, it asserts that age, sex and social class contain all the information in the label that a person is a doctor, for the variables under study.
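A minimal sketch of post-stratification for a population mean follows. It assumes that population counts are known for each post-stratum and that, in line with condition (4.2), units within a post-stratum behave alike whether or not they were selected. The function name, the two strata and the data are hypothetical.

```python
def post_stratified_mean(sample, stratum_counts):
    """Post-stratified estimate of the population mean of the measurement variable.

    sample: list of (stratum, y) pairs observed in the study.
    stratum_counts: known population counts N_h for each post-stratum h.
    Every post-stratum must contain at least one sampled unit.
    """
    sums, counts = {}, {}
    for h, y in sample:
        sums[h] = sums.get(h, 0.0) + y
        counts[h] = counts.get(h, 0) + 1
    missing = set(stratum_counts) - set(counts)
    if missing:
        raise ValueError(f"no sample units in strata: {sorted(missing)}")
    N = sum(stratum_counts.values())
    # Weight each stratum's sample mean by its known population share.
    return sum(stratum_counts[h] * sums[h] / counts[h] for h in stratum_counts) / N


# Hypothetical data: the sample over-represents the "old" stratum.
sample = [("young", 2.0), ("old", 6.0), ("old", 8.0), ("old", 7.0)]
stratum_counts = {"young": 700, "old": 300}
estimate = post_stratified_mean(sample, stratum_counts)
naive = sum(y for _, y in sample) / len(sample)
```

Here the unweighted sample mean is 5.75 while the post-stratified estimate is 3.5, because the known counts restore the population balance between the strata. The adjustment is only as good as the assumption that the stratifying variables carry all the information in the selection.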

4.3. Quota Sampling

All forms of quota sampling require the interviewer to fill a quota of selections determined by certain classifying variables such as age, sex and social class. These variables are not known prior to selection and so can only be determined after an initial selection based on the prior values $Z$. Let $Y^q$ be the quota variables and $Y^m$ the measurement variables, so that $Y = (Y^q, Y^m)$. We assume that population means or totals are known for $Y^q$. We now have three classes of variables: prior values $Z$ known for all units, quota variables $Y^q$ for which some information is known, and measurement variables $Y^m$ which are the objects of inference. The decision to measure $Y^m$ for unit $i$ now depends not only on whether $i$ is selected but also on whether it satisfies the required quota conditions.
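The two-step nature of the decision to measure a unit can be sketched as a simple quota-filling loop. This is a stylized illustration, not a description of any particular polling organization's procedure: the contact order, the quota cells (sex only, for brevity) and the function name are all assumptions.

```python
def fill_quotas(contacts, quotas):
    """Second-stage selection in a stylized quota sample.

    contacts: units in the order the interviewer reaches them, as
              (label, quota_cell) pairs; the cell is learned only on contact.
    quotas:   required number of interviews per quota cell.
    Returns the labels interviewed (whose measurement variables are recorded)
    and the labels contacted but turned away because their cell was full.
    """
    remaining = dict(quotas)
    interviewed, turned_away = [], []
    for label, cell in contacts:
        if remaining.get(cell, 0) > 0:
            remaining[cell] -= 1
            interviewed.append(label)
        else:
            turned_away.append(label)
        if all(v == 0 for v in remaining.values()):
            break  # every quota is filled, so interviewing stops
    return interviewed, turned_away


# Hypothetical contact sequence and quotas.
contacts = [(1, "F"), (2, "F"), (3, "M"), (4, "F"), (5, "M"), (6, "M"), (7, "F")]
interviewed, turned_away = fill_quotas(contacts, {"F": 2, "M": 2})
```

In this sketch acceptance depends only on the quota cell, never on the measurement variables. The concern with loosely controlled quota sampling is that the contact order itself, which the loop simply takes as given, may depend on characteristics related to what is being measured.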

Quota sampling takes many different forms, but the differences are mostly at the first stage of selection before $Y^q$ is measured. Let $A_s$ be the vector of indicator variables of all units selected regardless of their quota values. In general we have the selection scheme

$$f(A_s \mid Y, Z; \phi) \tag{4.3}$$

and differences in this scheme determine the differences between the various forms of quota sampling. In some cases the initial selection is tightly specified, procedures being laid down in a central office and involving random selection of clusters, such as constituencies or wards, and a random walk method for selecting individual units. In other cases the control over the selection of final units may be relaxed but will still be based on household interviews. The most criticized forms of quota sampling are those based on "street corner" interviews. Here the interviewer chooses which people to contact prior to measuring the quota variables. In our framework the selection bias will manifest itself in a selection scheme which depends in some unknown way not only on $Z$ but also on $Y$.

Quota sampling involves a second stage of selection beyond that determined by $A_s$. This second stage depends on both $A_s$ and the values $Y^q_s$ of the quota variables. If the interviewer has freedom of choice it may also depend on $Y^m_s$, or on related variables. Let $B_{s^*}$, $s^* \subset s$, be the indicator vector for the second stage of sampling, and let

$$f(B_{s^*} \mid A_s, Y^q_s, Y^m_s; \xi) \tag{4.4}$$

denote the selection scheme. The values measured in the final sample are $Y^m_{s^*}$. The complete set of observed data is

Proceeding as before we have

where $\eta = (\phi, \xi, \theta)$. Now suppose that for given $s$, $s^*$,

(i) $f(A_s \mid Y, Z; \phi) = f(A_s \mid Y^q_s, Z; \phi)$,

and the parameters are distinct. Then

From this factorization we see that the above conditions would allow sampling theory inferences about $\theta_1$ to be made directly from $f(Y^m_{s^*} \mid Y^q_s, Z; \theta_1)$, ignoring the method of selection of the units in $s^*$.

Descriptive inferences involving $Y^m$ require knowledge of both $Z$ and $Y^q$. As for observational studies, if a condition analogous to (4.2) holds, so that $Z$ contains no information beyond that in the quota variables, then we can ignore $Z$, which will often be unknown in quota sampling. It may then be possible to make inferences about certain functions of $Y^m$ using post-stratification if marginal means or totals of $Y^q$ are known.

4.4. Non-response and Missing Values

Non-response and missing values can be viewed as a second stage selection given the initial selection, $s$. Little (1982) has given a thorough treatment of both problems and his approach has formed the basis of the analysis of other forms of non-random sampling in this paper. One standard method of dealing with non-response when the appropriate information is available is post-stratification. The analysis follows that in Section 4.2. Little points out that in many cases the mechanism producing missing values and non-response will depend on the measurement variables $Y^m$ and, therefore, cannot be ignored. He proposes some models to represent the relationship between the missing values and the other values and shows that they can provide improvements over other methods. Greenlees et al. (1982) also consider a model-based approach to imputing missing values when selection cannot be ignored. These and related papers from a U.S. National Academy of Sciences Symposium on Incomplete Data (1980) give an excellent summary of the present state of the art of dealing with missing values and non-response.

4.5. Discussion

Non-random selection methods can be ignored in a model-based approach to inference if certain conditions are satisfied. The main condition that guarantees ignorability of the initial sample selection is (3.5). Many forms of sampling meet this condition including well-controlled purposive sampling methods and balanced sampling methods, but the most generally accepted method would be any form of random sampling. The condition (3.5) can be seen as a justification for random sampling whenever wide scientific acceptance is required.

Not all random samples are acceptable if prior information, Z, is available. For if a random sample gave a sample estimate of the mean of Z which differed greatly from the known population value then most people would say that that particular random sample was unrepresentative. In such cases further samples could be selected until approximate equality between the sample mean and the known population mean was achieved. Repeated random sampling could be used with a stopping rule that defined when a sample was acceptable. Alternatively purposive sampling could be used to achieve the necessary balance. Royall and Herson (1973) show that balanced sampling has desirable robustness properties within a model-based approach to inference.
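The repeated-sampling idea can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `balanced_random_sample`, the tolerance `tol`, the cap `max_draws` and the simulated values of Z are assumptions, not part of the paper. Simple random samples are drawn until the sample mean of Z lies within a stated tolerance of the known population mean.

```python
import random
import statistics


def balanced_random_sample(z, n, tol, rng, max_draws=10_000):
    """Draw simple random samples of size n until the sample mean of z
    is within tol of the known population mean (the stopping rule)."""
    pop_mean = statistics.fmean(z)
    for draw in range(1, max_draws + 1):
        idx = rng.sample(range(len(z)), n)  # sampling without replacement
        sample_mean = statistics.fmean([z[i] for i in idx])
        if abs(sample_mean - pop_mean) <= tol:
            return idx, draw
    raise RuntimeError("no acceptable sample found within max_draws")


rng = random.Random(0)
z = [rng.gauss(50.0, 10.0) for _ in range(1000)]  # hypothetical prior information Z
idx, draws = balanced_random_sample(z, n=30, tol=0.5, rng=rng)
print(f"accepted a sample of {len(idx)} units after {draws} draw(s)")
```

The accepted sample is a simple random sample conditioned on the balance criterion, so a tighter `tol` gives closer balance on Z at the cost of more rejected draws.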

For many non-random samples post-stratification is employed as a method for making descriptive inferences to a specific population. This applies to observational studies and to non-response. If selection is to be ignored then condition (4.2) must hold and the selection variable must contain no information beyond that in the post-stratifying variables. The verification of this assumption is the key to valid inferences in observational studies.
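As a concrete illustration of the descriptive use of post-stratification, the following Python sketch computes a post-stratified mean from known population shares. The function name `post_stratified_mean`, the stratum labels and the numbers are hypothetical. The estimate is only meaningful under the assumption stated above, that selection carries no information beyond the post-stratifying variables.

```python
from collections import defaultdict


def post_stratified_mean(sample, pop_shares):
    """Weight each stratum's sample mean by its known population share.

    sample     : iterable of (stratum, y) pairs from the achieved sample
    pop_shares : dict mapping stratum -> known population proportion
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, y in sample:
        totals[stratum] += y
        counts[stratum] += 1
    missing = set(pop_shares) - set(counts)
    if missing:
        # An empty post-stratum cannot be estimated without further modelling.
        raise ValueError(f"no sample units in strata: {sorted(missing)}")
    return sum(share * totals[h] / counts[h] for h, share in pop_shares.items())


# Hypothetical data: the sample over-represents the "young" stratum.
sample = [("young", 10.0), ("young", 12.0), ("old", 20.0)]
pop_shares = {"young": 0.4, "old": 0.6}

unweighted = sum(y for _, y in sample) / len(sample)   # 14.0
adjusted = post_stratified_mean(sample, pop_shares)    # about 16.4
print(unweighted, adjusted)
```

Raising an error for an empty stratum, rather than silently dropping it, is a design choice of this sketch: it makes visible the cases where the known population shares cannot be matched by the achieved sample.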

Quota sampling is widely used and appears to give satisfactory results in market research. For valid inferences conditions (4.7) and (4.9) should be satisfied and it is interesting to consider the circumstances in which the various forms of quota sampling are employed in terms of these conditions.

Quota sampling which uses approximately random methods for selecting households is most commonly used for surveys in the public sector. Here there is no simple well-defined user and it is reasonable to ask that the sampling method employed should have wide acceptability. Random sampling methods provide this wide acceptability. Those cases where the interviewer is allowed more choice in the selection of units often occur when there is only one major client. In a survey carried out for one client it may be possible for the client and the statistician to agree that the conditions for ignoring sample selection are satisfied, at least approximately. There is no need to satisfy the whole scientific community because the results will not be published. If the results are to be published, as for public opinion polls, then those that publish the results have to be satisfied that the ignorability conditions are met. For voting intentions it may well be that the quota variables age, sex and social class contain all the information that might have been available to the interviewer at the time when the person was selected for interview.

The least controlled "street corner" type of quota sampling is often used for product testing. One particular type of product test is a taste test and again it may be reasonable to conjecture that the physical properties of taste are not related to any variables that an interviewer may use to select a sampling unit, a person, beyond age, sex and social class. So for these physical measurements it may be possible to ignore the effects of sample selection. Where the measurements are not physical, such as opinions, it may be less acceptable to give the interviewer so much choice.

By introducing the selection mechanism explicitly into the model-based approach the conditions for ignoring selection can be established. This throws light on many procedures which are currently in use and in particular on the reasons for the wide variety of quota sampling methods. However, if wide acceptability is required random sampling provides the most immediately acceptable sampling method.

All sampling methods may be subject to non-response or missing values and here it is less likely that the conditions for ignoring this special form of selection will be met. Non-response is a form of self-selection and hence can depend on the measurement variables in an unknown way. This is one of the most difficult problems both for model-based inference and for randomization inference. One major practical problem with quota sampling methods is that the extent and nature of non-response may be concealed. It is this practical consideration, rather than theoretical problems of ignorability, that creates the weightiest doubts about quota sampling methods.

ACKNOWLEDGEMENT

This research was supported by grant number HR7152 from the Social Science Research Council.

REFERENCES

Birnbaum, Z. W., Paulson, E. and Andrews, F. C. (1950) On the effect of selection performed on some co-ordinates of a multi-dimensional population. Psychometrika, 15, 2, 191-204.
Blackwell, D. and Girshick, M. A. (1954) Theory of Games and Statistical Decisions. New York: Wiley.
Cassel, C.-M., Särndal, C.-E. and Wretman, J. H. (1977) Foundations of Inference in Survey Sampling. New York: Wiley.
Dawid, A. P. (1979) Conditional independence in statistical theory (with Discussion). J. R. Statist. Soc. B, 41, 1-31.
Dawid, A. P. and Dickey, J. M. (1977) Likelihood and Bayesian inference from selectively reported data. J. Amer. Statist. Ass., 72, 845-850.
Doll, R. and Hill, A. Bradford (1964) Mortality in relation to smoking: ten years' observations of British doctors. British Med. J., 1964 (1), 1399-1410.
Ericson, W. A. (1969) Subjective Bayesian models in sampling finite populations. J. R. Statist. Soc. B, 31, 195-233.
Finney, D. J. (1974) Problems, data and inference: the address of the President (with Proceedings). J. R. Statist. Soc. A, 137, 1-23.
Godambe, V. P. (1955) A unified theory of sampling from finite populations. J. R. Statist. Soc. B, 17, 269-278.
Greenlees, J. S., Reece, W. S. and Zieschang, K. D. (1982) Imputation of missing values when the probability of response depends on the variable being imputed. J. Amer. Statist. Ass., 77, 251-261.
Hájek, J. (1960) Limiting distributions in simple random sampling from a finite population. Pub. Math. Inst. Hungarian Acad. Sci., 5, 361-374.
Hansen, M. H., Madow, W. G. and Tepping, B. J. (1982) An evaluation of model-dependent and probability-sampling inferences in sample surveys. J. Amer. Statist. Ass. (to appear).
Hartley, H. O. and Rao, J. N. K. (1968) A new estimation theory for sample surveys. Biometrika, 55, 547-557.
Holt, D. and Smith, T. M. F. (1979) Post stratification. J. R. Statist. Soc. A, 142, 33-46.
Little, R. J. A. (1982) Models for nonresponse in sample surveys. J. Amer. Statist. Ass., 77, 237-250.
Madow, W. G. (1948) On the limiting distribution of estimates based on samples from finite universes. Ann. Math. Statist., 19, 535-545.
Pearson, K. (1903) On the influence of natural selection on the variability and correlation of organs. Phil. Trans. Roy. Soc. A, 200, 1-66.
Royall, R. M. (1968) An old approach to finite population sampling theory. J. Amer. Statist. Ass., 63, 1269-1279.
Royall, R. M. (1970) On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.
Royall, R. M. (1971) Linear regression models in finite population sampling theory. In Foundations of Statistical Inference (V. P. Godambe and D. A. Sprott, eds.), Toronto: Holt, Rinehart and Winston.
Royall, R. M. and Cumberland, W. G. (1981) An empirical study of the ratio estimator and estimators of its variance. J. Amer. Statist. Ass., 76, 66-88.
Royall, R. M. and Herson, J. (1973) Robust estimation in finite populations I. J. Amer. Statist. Ass., 68, 880-889.
Rubin, D. B. (1976) Inference and missing data. Biometrika, 63, 581-592.
Särndal, C.-E. (1978) Design-based and model-based inference in survey sampling. Scand. J. Statist., 5, 27-52.
Scott, A. J. (1977) On the problem of randomisation in survey sampling. Sankhyā, C, 39, 1-9.
Scott, A. J. and Smith, T. M. F. (1969) Estimation in multistage surveys. J. Amer. Statist. Ass., 64, 830-840.
Scott, A. J. and Smith, T. M. F. (1975) Minimax designs for sample surveys. Biometrika, 62, 353-357.
Smith, T. M. F. (1976) The foundations of survey sampling: a review. J. R. Statist. Soc. A, 139, 183-204.
Smith, T. M. F. (1978) A model building approach to survey analysis. European Meeting of Statisticians, Oslo, August 1978.
Smith, T. M. F. (1979) Statistical sampling in auditing: a statistician's viewpoint. The Statistician, 28, 4, 267-280.
Sugden, R. A. (1979) Inference on symmetric functions of exchangeable populations. J. R. Statist. Soc. B, 41, 269-273.
U.S. National Academy of Sciences (1980) Panel on incomplete data. National Academy of Sciences, Washington, D.C.