Muthen and Shedden - Mixture Modeling With Mixture Outcomes

  • Upload
    puer-ma

  • View
    221

  • Download
    0

Embed Size (px)

Citation preview

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    1/7

    BIOMETRICS5, 463-469June 1999

    Finite Mixture Modeling with MixtureOutcomes Using the EM AlgorithmBengt Muth6n

    Graduate School of Education and Information Studies and Department of Statistics,University of California, Moore Hall, Box 951521, Los Angeles, California 90095-1521, U.S.A.email: [email protected]

    andKerby Shedden

    Department of Statistics, University of California, Los Angeles

    SUMMARY.This paper discusses the analysis of an ex tended finite mixture model where the laten t classescorresponding to t he mixture components for one set of observed variables influence a second se t of observedvariables. The research is motivated by a repeated memurement study using a random coefficient modelto assess the influence of latent growth trajectory class membership on the probability of a binary diseaseoutcome. More generally, this model can be seen as a combination of latent class modeling and conventionalmixture modeling. The EM algorithm is used for estimation. As an illustration, a random-coefficient growthmodel for the prediction of alcohol dependence from three latent classes of heavy alcohol use trajectoriesamong young adults is analyzed.KE Y WORDS: Growth modeling; Latent class analysis; Latent variables; Maximum likelihood; Trajectoryclasses

    1. IntroductionThis paper proposes an extended finite mixture model thatcombines features of Gaussian mixture models and la tent classmodels. Analysis of this model is carried out using maximum-likelihood estimation with the EM algorithm and bootstrapstandard errors. T he research is motivated by an alcohol stud yconcerned with the longitudinal development of heavy drink-ing and its relation to alcohol dependence.Motzvatang ExampleA conventional random-coefficient repeated measurementmodel was initially used to describe heavy drinkingdevelopment by a quadrati c growth model. For individual z ofage at at time t , using centering of age, and letting 77 denotea random coefficient,Yz t = 771%+ ( a t - )7722 + (a t- q2773%+ t t , t = L 2 , . ,T ,

    (1)where yz t is a measure of heavy drinking for individual i atage a t , 771 is the intercept, 772 is the linear rate, 773 is thequadratic growth rate, and t is a residual. Using the estimatedmeans of th e 7 coefficients, this analysis showed that heavydrinking among young adults typically accelerates from 18 t o2 1 years of age and decelerates thereafter , which is in line withnormative development found by alcohol researchers. Using77 values different from the means, other classes of trajectoryshapes recognized in the alcohol litera ture were also seen, such

    as trajectories th at are high at age 18 and either stay high ordecrease with age and trajectories that increase from age 18and show no downturn.

    Of primary interest is how to model the relation betweenthe shape of the heavy drinking trajectory for an individualin the 18-25-year age range and the probability of alcoholdependence at age 30. This could be done using logisticregression of dependence on the three 7 coefficients in (1).This is problematic, however, because a given 77 coefficientassumes different meanings depending on the value of theother 77 coefficients. A better approach, one that more clearlyreflects that the curve shape has predictive value, is providedby an extended finite mixture model. This model allows thejoint estimation of (i) a conventional finite mixture growthmodel where different curve shapes are captured by class-varying random coefficient means and (ii) a logistic regressionof alcohol dependence on the classes.

    In contrast t o this approach, a three-step procedure isneeded using conventional modeling techniques: (i) estimatingthe conventional finite mixture growth model, (ii) estimatingeach individuals most likely class membership based onthe posterior probabilities for the classes derived from theestimated model in step (i), and (iii) regressing alcoholdependence on the estimated class membership. This three-step procedure introduces estimation errors in step (ii) byforcing each individual to be classified into a single class,

    463

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    2/7

    464 Biornetrics, June 1999

    whereas each individual typically has a nonzero posteriorprobability for each class.

    The extended finite mixture model allows three generali-zations. First, the class membership probability is allowedto vary as a function of covariates. This is useful in alcoholstudies, where it is known that white males are more likelyto have a high level of heavy drinking at age 18. Second, foreach class, the values of the q coefficients are allowed to beinfluenced by covariates. Third, the latent class variable isallowed to predict more than one binary outcome variable.

    The case of multiple binary outcome variables predictedby the latent class variable warrants special attention. Thismodel feature relates to latent class analysis (cf., Clogg,1995). The alcohol study provides a n interesting application ofmultiple binary outcomes. Drawing on the alcohol literature,the age 18-25-year growth curve classes can be viewedas representing different developmental pathways, some ofwhich manifest themselves as deviant behavior alreadyin adolescence. Using adolescent observations of deviantbehavior, such as early onset of regular drinking and droppingout of high school, to form multiple binary outcomes, theseoutcomes can therefore be viewed as early indicators ofthe latent classes, where alcohol dependence is a laterindicator. Together with the heavy drinking ys of equation(1) and the background variables, these binary outcomescontain information about the laten t class membership. Givenan estimated model, an individuals observations on theseoutcomes can be used to compute posterior probabilityestimates for different classes before the individual reaches age25 . This suggests for which individuals an intervention maybe beneficial in order to avoid heavy drinking developmenttypical of the nonnormative classes and to reduce the risk ofdeveloping alcohol dependence problems.2. The Extended Finite Mixture ModelThe following model incorporate s th e ideas presented in t heIntroduc tion. Consider a pdimensional vector y of continuousvariables and an r-dimensional vector u of binary variables,which are related to each other and to a q-dimensional vectorx of covariates. The three se ts of observed variables are re latedto each other via two vectors of unobserved variables, an m-dimensional vector 1) of latent continuous variables, and aK-dimensional vector c of latent categorical variables. Here,ci = ( c i l , . . , ~ K ) as a multinomial distribution, whereC i k = 1 f individual z falls in class k and is zero otherwise.

    Consider t he set of continuous observed variables y relatedto the continuous latent variables q for individual i,

    ~i = Ayqi + ~ i , ( 2 )where A y is a p x m matrix of parameters and ei is aresidual vector that is uncorrelated with othe r variables inthe model and is normally distributed with mean zero and adiagonal covariance matr ix 0.The continuous lat ent variables1) are related t o th e categorical latent variables c and to theobserved covariate vector x by the relations

    qi = Aci + rqxi +

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    3/7

    Finite Mixture Modeling 465

    3.1 The Complete-Data LikelihoodMaximization of (9) can be simplified by using the EMalgorithm (cf., McLachlan and Krishnan, 1997). Here, thecontinuous latent variable observations 171,772, . . . ,v n , an dthe categorical latent variable observations c1, c 2 , . . , n ar eviewed as missing da ta. Given the model, th e complete-datalog likelihood can then be written as

    n1% L C =cc 1 4 C i xi1 + lOd77i I cz,xi1 + log[Y-, 1 7 2 1

    i= 1 + w u i C i l L (10)where

    n n K

    i= l i= l k = landn n r K

    i= 1 z = 1 j=1 k=l + (1- U i j ) l o d l - T i j k l l ,(12)

    where 7 i j k is given in connection with (5). The remainingterms correspond to normal densities given by the model.3.2 The E-StepThe EM algorithm maximizes th e expected complete-data loglikelihood (10) given the d ata on y , x ,u. In t he E-step, we findthe conditional expectations of cik, s,,, see, svs ,s,,, so,and S using the notation Szw = l/nCT=l ziwi. We notethat

    [ C i , 7 7 i I YZ, Xi , Ui l = 1% I YiIXi,Ui1[77i I C i , Y i , X i , U i I . (13)Corresponding to the first term, t he posterior t erm for ci, weneed the expectation E(c2k I y i , x i , u i ) ,

    p(czk = 1 I Y z , x i , u i)= 7rikNp[hy(fXkf Tv Xi ) , Ay*flL -t01

    x [ui I cik = l ] / [Yi , Ui I xi], (14)where [yi,ui I xi] is given in (9). Let pik denote the posteriorprobability of (14). The resulting E-step quantities are givenin the Appendix.3.3 The M-StepThe M-step of the EM algorithm is defined as follows.Inserting the posterior probabilities p2k of (14) in ( l l ) , heM-step maximizes

    with respect t o the parameters of a c r ,.This may be seen asa multinomial logistic regression with fractional observationspik. Similarly, inserting pi,+ in (12), the M-step maximizes

    n r K

    i= l j= 1 k = l

    with respect to the parameters of A,. This may be seenas a type of multivariate-response logistic regression withobservations uz3 and weights p,k. We have found t ha t th e EMalgorithm converges nicely even if only one or two Newton-Raphson steps are taken in the two logistic regressions (seealso generalized EM as discussed in McLachlan and Krishnan,1997). The M-step estimates for the param eter arrays !P, r,,A y , and 0 re obtained in closed form and are given in theAppendix.3.4 CommentsThe model is not identified without some parameterrestrictions, which have to be determined for a givenapplication. As a well-known example drawing on factoranalysis (cf., Lawley and Maxwell, 1971), at least m2restrictions need to be imposed on the elements of A yand/or !P. It is difficult to give rules for identification of thegeneral model (cf., Titte rington, Smi th, and Makov, 1985). Aheuristic approach to understanding the identification sta tusof a particular case of the model is to divide the model intoits parts. As an example, for the growth model in ( l ) , llelements of Ay are fixed so that more than m2 restrictionsare imposed. Here, the y , 77,c par t of the model is the mixed-effect mixture model of Verbeke and Lesaffre (1996). Thecovariance matrices 9 nd 0 re class invariant, while therandom coefficient 77 means of A vary across classes with classprobabilities determined by a,. The identification thereforeconcerns a multivariate normal mixture for y with class-invariant covariance matrix and with means tha t are functionsof the class-varying means of the reduced dimension m ofthe underlying 7. rowth curve dat a with clearly separatedgrowth forms are likely to be able to identify such meanmixtures. Adding x to the model introduces the parameterarrays rc and r,, for which the joint distribution of yan d x carries information. The elements of I', affect theprobabilities of c, where c class membership alters the meansA of 77 and ultimately the means of y . For a certain xvalue, a certain mixture of the class-specific means A of 77is obtained, while at a different x value, a different mixture ofth e means is obtained. T he model is identifiable as long as th eresulting y means are different and there are more distinct xcombinations than corresponding parameters. The elementsof r,,concern effects of x on 17 given c. r, and r, affect themodel differently in th at the elements of r,,affect the 77 meansan equal amount for each class, while the elements of r, affectthe 17 means through a mixture. When the y , c , x part of themodel is identified, the identification of the u, c , x part of themodel concerns only the parameters of A,. Information onthese parameters is obtained from the marginal distributionof u and from the joint distribution of u and x. The marginaldistribution of u may not be sufficient to identify A , unlessthere are many u variables relative to the number of classes(cf., Clogg, 1995). Each row of A, is, however, identifiablefrom the joint dis tribution of each u nd x as in finite mixturelogistic regression with class-varying intercepts (see Follmanand Lambert, 1989, 1991).

    In some applications, i t is of inte rest t o generalize the modelin ( 2 ) , ( 3 ) , (4), (5), and (6 ) by allowing the inclusion of directeffects from x to y and from x to u. The former corresponds tousing time-varying covariates in a growth model setting, whilethe la tter will be shown useful in our application. To includedirect effects from x to u, we extend the logit expression in

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    4/7

    466 Biometrics, June 1999

    (5) tologit(r,) = A,c, + KUxZ, (17)

    where, more generally, K, can also be allowed to vary acrossk , with k = 1,2, . . ,K .

    We may generalize the model to allow for class-specificeffects r, in (3), e.g., by extending (3) to

    77, = Ac, + r;c:x, + Cz, (18)where r; = (rq1. . I rqK)nd c: = c, @cIp.Class-specificcovariance matrices rk and 0 may also be of interest.

    Standard errors of the estimates are obtained using abootstrap procedure, where sampling an observation withreplacement n times from the sample of size n is replicated200 times. Reasonably stable standard error estimates areobtained already after 50 replications. Here, th e entire vector( u ~ , y ~ , x ~ )s bootstrapped rather than holding the 5 partfixed. While x, is a covariate vector, in the applicationsconsidered here, it is a random vector, for which no modelstructure is imposed. A comparison of model fit across modelswith different numbers of classes is achieved by using BIC(Bayesian Information Criterion; Schwarz, 1978).4. A Growth Curve Appl icat ionThis section returns to the alcohol research question discussedin the Introduction, i.e., what the influence of membership indifferent growth curve classes for heavy drinking from ages18 to 25 is on alcohol dependence at age 30. Alcohol datafrom the National Longitudinal Survey of Youth (NLSY), anationally representative household survey of young adultsliving in the U.S. in 1979, are used. The heavy drinkingvariable is obtained from the question How often have youhad 6 or more drinks on one occasion during the last 30 days?The variable is scored 0 (never), 1 (once), 2 (2 or 3 times),

    3 (4 or 5 times), 4 (6 or 7 times), 5 (8 or 9 times), and 6 (10or more times). Th e vector y in equation (2) consists of heavydrinking measured at 18,L9,20,24 , and 25 years for the NLSYcohort born in 1964. The growth model of (1) is expressed infinite mixture form by equations (2) and (3))arranging thevector yz s yz= ( ~ ~ 1 ,z2,. . . , ~ 5 ) nd defining 7)as th e 3x 1vector of intercept, linear, and quadratic coefficients and the5 x 3 matrix Ay as having columns of constants 1, ( a t - b),and (u t - i )2 .The time scale is centered so tha t the interceptrefers to t he approximate peak for the normative trajectory,ti = 21.2. As a starting point, a two-class model ( K = 2)is postulated. In equation (3) for 17, the vector x consistsof four covariates scored zero or one: male, Black, Hispanic,and FH123. Here, FH123 is scored one if the respondent hasa first-degree relative and a second- or third-degree relativewith alcohol problems. In equation (5) for u, three binary uvariables are included. The first u , DEP, captures a diagnosisof alcohol dependence at age 30. This diagnosis is basedon alcohol-dependent behaviors such as giving up importantsocial or work-related functions in favor of or as a consequenceof drinking. The second u, S, is scored one if the respondentstarted drinking regularly at or before age 1 4 The third u ,HS, is scored onc if the respondent did not complete highschool by age 22. In equation (6) for c , the same set of 2variables is used as in (3) for 77 . The sample size is 935. Th e yvariables are not normally distributed in this application. Toinvestigate t he sensitivity of the modeling to this assumpt ion,an analysis of variables transformed as log(1 + y) was alsocarried out and gave very similar results.

    The two-class mixture model results in two local solutionswhen using different starting points for the E M algorithm.The estimates for the majority class showing the normativecurve is very similar in the two solutions. However, the twosolutions find different minority classes, one with a curve that

    1 I I I18 20 22 24

    AgeFigure 1. Estimated three-class curves.

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    5/7

    Finite Mixture Modeling 467

    Table 1Marginal 3 t of the three-class modelES HS DEP Observed Fitted Frequency0 00 00 10 11 01 01 11 1

    0.6520.0700.1130.0270.0880.0200.0250.005

    0.6370.0810.1260.0190.0950.0170.0210.005

    61065

    106258219235

    Table 2Es t imates fo r the growth fac tor s ofq (standard errors in parentheses)

    Growth factorParamete r Intercept Linear rate Quadratic rate

    A HighNormMaleBlackHispFH123InterceptLinearQuadratic

    UPrv

    P

    2.263 (0.456) -0.268 (0.053)2.655 (0.556) 0.456 (0.062)0.829 (0.102) 0.007 (0.012)0.779 (0.133) 0.048 (0.015)

    -0.762 (0.124) 0.012 (0.014)-0.675 (0.137) -0.003 (0.018)0.076 (0.169) 0.019 (0.024)

    1.191 (0.206)0.025 (0.020) 0.062 (0.003)-0.076 (0.014) -0.001 (0.002)

    0.027 (0.043)-0.012 (0.045)-0.041 (0.008)-0.043 (0.010)0.043 (0.010)

    0.041 (0.011)0.002 (0.014)

    0.005 (0.001)

    is high at age 18 with a subsequent decrease and one witha curve that accelerates from age 18. A three-class solutioncapture s the three classes represented in both two-class solu-tions and is therefore preferred. The three estimated curvesfor the three-class solution are shown in Figure 1 using theestimated mean of 7 or the subgroup of z = (0000), whitefemales with no family history of alcohol problems. The loglikelihood value for the three-class solution is -4934.91. Incomparison, th e log likelihood value for th e two-class solutionthat includes a curve that goes down is -5034.27, while thelog likelihood value for the two-class solution t ha t includes acurve that goes up is -5121.72. The corresponding three BICvalues are 10,218.69, 10,376.36, and 10,551.26, suggesting achoice of three classes over two classes.A relatively strong assumption in the model is tha t th e setof u ariables depends on the zs only through c as indicatedin (5) and (6). A plausible alternative is that there are di-rect effects from some zs to some us as in (17), i.e., someus differ in their probabilities not only as a function of classmembership but also as a function of their covariate charac-teristics for given class membership. In th is application , thereare 12 such direct effects th at could be included. Explorationsof these effects show that only five of them are significant us-ing the improvement in the log likelihood as criterion. Thedirect effects are for male to DEP, for FH123 to DEP, forBlack to ES and to HS, and for Hisp to HS and are all ofexpected signs. For the final three-class model, th e log likeli-hood value is -4909.81, where the improvement in fit relative

    Table 3Estima tes for the latent class memb ership

    of c (standard errors an parentheses)Parameter High versus norm Up versus norm

    -2.510 (0.303) -3.282 (0.444)r,Male 1.293 (0.250) 1.419 (0.393)

    BlackFH123 0.760 (0.353) 0.651 (1.202)

    -1.416 (1.054) -0.539 (0.594)Hisp 0.015 (0.307) -0.476 (0.538)

    to the three-class model without direct effects is significant,corresponding to a likelihood ratio chi-square value of 50.2with 5 d.f.A simple empirical approach t o checking the local identifi-ability of th e estimated model is to sta rt from a model th at isknown to be identified an d check whether adding a parameterchanges the observed-data log likelihood. For the five directeffects, this check was just accomplished by the chi-squaredifference test. Setting the 12 parameters of r, to zero givesa chi-square difference value of 127.16; setting the eight pa-rameters of r, to zero gives a chi-square difference value of71.54; and setting the nine parameters of A, to zero gives achi-square difference test of 1090.5.

    For the final three-class model, Table 1 shows that themarginal table for ES, HS , and DEP is well fitted. The ymeans also fit well. The three curves are almost exactly thesame as in Figure 1.The notation High, Up, and Norm will beused for the classes corresponding to these three curve shapes,where High refers to those who are high at age 18, Up refersto those who accelerate thei r use, and Norm refers to the nor-mative curve. The three classes have estimated proportions0.120, 0.063, and 0.817.

    Table 2 shows th e estimates for the growth factors of q cor-responding to (3). Table 3 shows the estimates for the latentclass membership corresponding t o (6). Table 4 shows the es-timates for y an d u corresponding to (2) an d (5) with directeffects K, corresponding to (17). In Table 5, these estimateshave been translated into conditional probability estimates forthe three u ariables DEP, ES, and HS given class member-ship, evaluated at z = (0000). The DEP probabilities varygreatly as a function of latent class membership, and ES andHS are indicative of membership in the nonnormative classes.The normative class constitutes about 89% of all individualswith z = (0000) , while the High and Up classes constitute 7and 4%, respectively. In contr ast, for z = (10 0 l ) ,white maleswith family history, there are only 51% in th e normative classand 33 and 16% in the High and Up classes, respectively.For this group, th e D EP probabilities still vary greatly acrossthe latent classes: 0.378, 0.560, an d 0.194, for the High, Up,and Norm classes, respectively. This analysis illustrates theexplanatory power of the latent class variable.5. ConclusionTh e extended finite mixture model proposed here offers a flex-ible analysis framework. On the one hand, the model may beseen as a generalization of latent class analysis, so that theemphasis of the analysis is not only on the indicators of the

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    6/7

    468 Biometrics, June 1999

    Table 4Estamates for heavy drankang y and bznary outco mes u (standard errors zn parentheses)Parameter

    ~4 Fixeddiag(0) 0.598 (0.130) 1.006 (0.109) 0.980 (0.109) 1.147 (0.138) 0.525 (0.130)

    High UP NormA, DEPESHS

    -1.903 (0.338) -1.159 (0.427) -2.820 (0.202)-0.436 (0.266) -1.243 (0.457) -2.015 (0.137)-1.584 (0.274) -1.440 (0.383) -2.205 (0.182)

    Male Black Hisp FH123KU DEP 0.719 (0.251) 0.689 (0.296)

    HS 0.658 (0.240) 1.153 (0.237)ES -0.738 (0.274)

    Table 5Estimated probabilities for ua

    High UP Norm~ ~ ~DEP 0.131 0.240 0.056

    ES 0.389 0.227 0.117HS 0.169 0.188 0.099Class probabilities 0.073 0.035 0.892

    a Evaluated at z = (0000)

    latent classes but also on incorporating other model partsand outcomes. On the other hand, th e model may be seen asa generalization of conventional Gaussian mixture modelingwhere mixture indicators have been added. The EM algorithmwas found to be a practical tool for estimation of the type ofmodels considered. As an illustration of the broader analysispotentia l, the extended finite mixture model was found usefulfor random-coefficient growth modeling in the presence of sev-eral growth classes. Here, a mixture outcome was measured ata later time point and the latent growth classes were used aspredictors of this outcome. Repeated measurement randomcoefficient modeling with mixtures avoids the normality as-sumption typically used for the random coefficients (see alsoVerbeke and Lesaffre, 1996). A drawback is that multiple so-lutions are often found so that multiple starting points arenecessary.

    ACKNOWLEDGEMENTSThis research was supported by grants 1 KO2 AA 00230-01 and 1 R21 AA10948-01A1 from NIAAA and 40859 fromNIMH. The work has benefitted from discussions in Hen-dricks Browns Prevention Science Methodology group andMuthkns Methods Mentoring Meeting. We thank Tom Har-ford for suggestions regarding the NLSY application, JasonLiao and Linda Muthdn for useful comments, and Noah Yangfor helpful research assistance.

    RESUMBDans cet article, nous discutons dun modkle de mklange finidans lequel les classes latentes correspondant aux composantes

    du mklange pour un ensemble des variables observkes influen-cent un second ensemble de variables observkes. La motivationde cette recherche provient dune ktude de mesures rkp6tdesutilisant un modkle B coefficient alkatoire pour determinerlinfluence de lappartenance B une classe de trajectoire decroissance latente sur la probabilit6 de lissue dune maladiebinaire. P lus gknkralement, on peut voir ce modkle comme unecombinaison de modklisation de classe latente et de modklisa-tion dun mklange conventionnel. Pour lestimation, nous uti-lisons lalgorithme EM. Comme illustration, nous analysonsun modkle de croissance B coefficient alkatoire pour prkdirela ddpendance alcoolique B partir de trois classes latentes deforte consommation alcoolique chez de jeunes adultes.

    REFERENCESClogg, C. C . (1995). Latent class models. In Handbook of Sta-tistical Modeling fo r the Social and Beha vioral Sciences,

    G. Arminger, C. . Clogg, and M. E. Sobel (eds), 311-359. New York: Plenum.

    Follman, D. A. and Lambert, D. (1989). Generalizing logis-tic regression by nonparametric mixing. Journal of theAmerican Stat is t ical Associatzon 84, 294-300.

    Follman, D. A . and Lambert, D. (1991). Identifiability of fi-nite mixtures of logistic regression models. Journal ofStatistical Planning and Inference 27, 75-381.

    Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis asa Stat is t ical Method. London: Butterworths.

    McLachlan, G . J. and Krishnan, T. (1997). The E M A l g or it h mand Extensions . New York: John Wiley and Sons.Schwarz, G. (1978). Estimating t he dimension of a model. T heA nna l s of Statistics 6 , 461-464.

    Titterington, D. M., Smith, A . F. M., and Makov, U . E.(1985). Statistzcal Analysis of Finite Mixture Distribu-t ions . Chichester, U.K.: John Wiley and Sons.

    Verbeke, G. and Lesaffre, E. (1996). A linear mixed-effectsmodel with heterogeneity in the random-effects popula-tion. Journal of the Am erica n Stat is t ical Assoc iat ion 91,2 17-22 1 .

    Received October 1997. Revised May 1998.Accepted July 1998.

  • 8/6/2019 Muthen and Shedden - Mixture Modeling With Mixture Outcomes

    7/7

    Finite Mixture Modeling 469

    APPENDIXThe E-StepLet pi = (p i l , . . pi^)' and let S denote th e conditional ex-pectation of S. This defines the E-step quantities for the miss-ing data involving c ,

    nScc = l / n diag(pi) (19)

    i= land

    n

    i= 1The average posterior probability for class k is

    i = IFurthermore, for the missing data involving 17,

    where

    c = VA&O-'S,,O-~A,V, (24)D = v~-lr,s,,o-ln, VS-~AS, ,O-~A, , (25)

    sy,= S,,O-~A, + (sy,r;+ S ~ , A ' ~ - ~ V ,26)s,, = v(!j-l(r,s,, AS,,) +A ~ O - ~ S ~ , ) ,27)

    and

    SVc= V(9-'(rvSxp AScc)+ L I ~ O - ~ S ~ ~ ) .28 )The M-StepThe M-step for 9 and r, is obtained using the S matricesfrom the E-step. Maximizing with respect to the regressioncoefficientsr, and A gives

    @,.A) = (s ,zS,c)s& (29)

    9 = s,, f rvs,,r; $. ASCCA'- s,,r; - r ,s , , - SVCA'where S,,c is the joint covariance matrix for (x, ) .Maximiz-ing with respect to the covariance matrix 9 gives

    A -

    -ASc, + r , S x cA ' +Ascxr ; . (30)Th e M-step for Ay and 0 s obtained asA, = s,,s;;, (31)

    while, noting tha t 0 s assumed to be diagonal;6 = diag(S,, - Sy,li&- A y S V y+AgS,vxy). 32)