Tilburg University

Power analysis methods for tests in latent class and latent Markov models

Gudicha, Dereje

Document version: Publisher's PDF, also known as Version of record

Publication date: 2015

Citation for published version (APA): Gudicha, D. (2015). Power analysis methods for tests in latent class and latent Markov models. Ridderkerk: Ridderprint.


https://pure.uvt.nl/portal/en/publications/power-analysis-methods-for-tests-in-latent-class-and-latent-markov-models(fec32acf-48c7-4745-8577-e0ef2ff1349d).html

POWER ANALYSIS METHODS FOR TESTS IN LATENT CLASS AND LATENT MARKOV MODELS

© 2015 Dereje W. Gudicha. All Rights Reserved.

Neither this thesis nor any part may be reproduced or transmitted in any form or by any

means, electronic or mechanical, including photocopying, microfilming, and recording,

or by any information storage and retrieval system, without written permission of the

author.

The research presented in this thesis was supported by a grant from The Netherlands

Organization for Scientific Research (NWO, grant number 406-11-039).

Printing was financially supported by Tilburg University.

ISBN: 978-94-6299-150-7

Printed by: Ridderprint BV, Ridderkerk, The Netherlands

Cover design: StudioLIN

DISSERTATION

to obtain the degree of doctor at Tilburg University, on the authority of the rector magnificus, prof. dr. E.H.L. Aarts, to be defended in public before a committee appointed by the doctorate board, in the auditorium of the University on Wednesday, 7 October 2015, at 10:15

by

Dereje Waktola Gudicha

born on 18 March 1982 in Yaya Gulele, Ethiopia

Supervisor: prof. dr. J.K. Vermunt

Co-supervisors: dr. V.D. Schmittmann

dr. F.B. Tekle

Other members of the doctoral committee: prof. dr. J. de Vries

prof. dr. M.J. de Rooij

prof. dr. C.V. Dolan

dr. D.L. Oberski

dr. M. Moerbeek

Contents

List of Tables

List of Figures

1 Introduction
  1.1 Introduction
  1.2 The most important parameters and hypotheses
    1.2.1 Measurement parameters
    1.2.2 Structural parameters
    1.2.3 Transition parameters
    1.2.4 Number of classes
  1.3 More on power analysis
  1.4 Outline of the dissertation

2 Power and Sample Size Computation for Wald Tests in Latent Class Models
  2.1 Introduction
  2.2 The LC model
  2.3 Wald based power analysis for LC models
    2.3.1 The Wald statistic and its asymptotic properties
    2.3.2 Power and sample size computation
    2.3.3 Design factors affecting the power or the required sample size
  2.4 Numerical study
    2.4.1 Numerical study set up
    2.4.2 Results
    2.4.3 Performance of the power computation procedure
  2.5 Discussion and conclusions
  2.6 Appendix
    2.6.1 Elements of the information matrix in an LC model for binary responses
    2.6.2 An example of the Latent GOLD setup for Wald based power computation

3 Statistical Power of Likelihood-Ratio and Wald Tests in Latent Class Models with Covariates
  3.1 Introduction
  3.2 The LC model with covariates
  3.3 Power and sample size computations
    3.3.1 Calculating the non-centrality parameter
    3.3.2 Power computation
    3.3.3 Sample size computation
  3.4 Numerical study
    3.4.1 Study set up
    3.4.2 Results

4 Power Computation for Likelihood-Ratio Tests for the Transition Parameters in Latent Markov Models
  4.1 Introduction
  4.2 The LM model
    4.2.1 Hypotheses specified on transition parameters
    4.2.2 Parameter estimation
  4.3 The likelihood-ratio test
  4.4 Power computation
    4.4.1 The standard case
    4.4.2 The non-standard case
  4.5 Design factors
  4.6 Numerical study
    4.6.2 Results
  4.8 Appendix
    4.8.1 Latent GOLD syntax for power computation

5 Power Analysis for the Likelihood-Ratio Test in Latent Markov Models: Short-cutting the Bootstrap p-value Based Method
  5.1 Introduction
  5.2 The LM model
  5.3 Power analysis for the BLR test
    5.3.1 Power computation
    5.3.2 Sample size computation
  5.4 Numerical study
    5.4.2 Results

6 Summary and discussions
  6.1 Summary
  6.2 Direction for future research and study limitations
  6.3 Conclusion

References

Acknowledgments

List of Tables

1.1 Important parameters of latent class and latent Markov models
2.1 Entropy based R-square values for different combinations of latent class-specific design factors
2.2 Estimated power for different class separation levels and different sample sizes
2.3 Required sample size for different configurations of latent class-specific design factors and different power levels
2.4 Theoretical and simulated power of the Wald test
3.1 The computed entropy R-square for different design cells
3.2 The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 2-class model; the case of equal class proportions
3.3 The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 3-class model; the case of equal class proportions
3.4 The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership; the case of unequal class proportions and six indicator variables
3.5 Sample size requirements for the Wald test when testing the covariate effect on class memberships for different power levels, class-indicator associations, number of indicator variables, number of classes, class proportions, and effect sizes
3.6 Theoretical versus empirical (H1-simulated) power values for the Wald and likelihood-ratio tests to reject the null hypothesis that the covariate has no effect on class membership, given the design conditions of interest
4.1 Typical hypotheses formulated on the transition parameters of the latent Markov model
4.2 The power of the likelihood-ratio test to reject the null hypothesis that π_{r|s} = π_{s|r} in the 2-state latent Markov model
4.3 The power of the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on the transition probabilities
4.4 The power of the likelihood-ratio test to reject the null hypothesis π_{2|1} = 0
4.5 Evaluating the quality of the large data set method for likelihood-ratio power computation
5.1 Values of conditional response probabilities
5.2 Power of the BLR test for H0: S = 2 versus H1: S = 3
5.3 The power of the BLR test for testing H0: S = 3 versus H1: S = 4
5.4 Power of the BLR test according to the short-cut and the PBP method for several 3-state LM population models

List of Figures

4.1 Distribution of the likelihood-ratio statistic under the null and alternative hypotheses and the statistical power
5.1 Power by sample size for a 3-state LM population model with varying levels of the measurement parameters, equal initial state proportions, 6 response variables, and 3 time points
5.2 Power by sample size for a 3-state LM population model with varying levels of the transition parameters, equal initial state proportions, 6 response variables, and 3 time points

CHAPTER 1

Introduction

1.1 Introduction

Statistical models for studying the presence of subgroups within an overall population have

a history that goes back to 1894 when Karl Pearson proposed using a mixture of normal

distributions to demonstrate that a crab population consisted of two subspecies (Pearson,

1894). Much later, Lazarsfeld (1950) and Wiggins (1973) showed the utility of this

approach for social and behavioral sciences by proposing mixture models for categorical

responses which are currently known as latent class and latent Markov models. The

latent Markov model, also referred to as hidden Markov model (Rabiner, 1989) or latent

transition model (Collins & Wugalter, 1992), represents the longitudinal data variant of

the latent class model, where the focus is typically on describing the transitions between

classes at successive measurement occasions (Bartolucci, Farcomeni, & Pennoni, 2010;

Van de Pol & De Leeuw, 1986). It was not until the mid-1990s that these models

began attracting the attention of both statisticians and applied researchers.

In recent years, interest in latent class and latent Markov models has increased

greatly, among other reasons because of the various useful extensions of the basic models and

the general availability of statistical packages implementing these models. Important

extensions include models for variables of different scale types (e.g., nominal, continuous,

ordinal, counts) (Vermunt & Magidson, 2002), models containing time-constant and/or

time-varying covariates (Bartolucci & Farcomeni, 2009; Dayton & Macready, 1988;

Vermunt, Langeheine, & Bockenholt, 1999), models that relax the local-independence

assumption (Hagenaars, 1988), and models with constraints on the parameters of interest

(Bartolucci, 2006; Mooijaart & Van der Heijden, 1992). These extensions, together with

the widespread availability of software packages for mixture modeling, such as Mplus

(L. Muthen & Muthen, 1998-2007), Latent GOLD (Vermunt & Magidson, 2013a), and

routines written for SAS (Lanza & Collins, 2008), R (e.g., FlexMix (Leisch, 2004), poLCA

(Linzer & Lewis, 2011), and depmixS4 (Visser & Speekenbrink, 2010)), and Stata (Rabe-

Hesketh, Skrondal, & Pickles, 2004), make it possible to successfully apply these models

in both cross-sectional and longitudinal studies.

Despite the increasingly widespread use of latent class and latent Markov models, little is

still known about the study design requirements to successfully apply these techniques.

More specifically, statisticians have difficulty answering questions concerning the required

sample size, number and quality of response variables, and/or number of measurement

occasions to achieve sufficient statistical power of the tests used when applying these

methods. Since methods for assessing the statistical power of the performed tests are

currently lacking for mixture models in general and for latent class and latent Markov

models in particular, these techniques are often applied in a suboptimal manner.

The aim of this dissertation is to fill this important gap in the literature by developing

power analysis methods for the most important tests applied when using mixture models

for categorical response variables. Methods are described for a) determining the data

requirements to achieve a certain (acceptable) power level – for example, for determining

the necessary sample size or number of measurement occasions to achieve a power of

.8 or larger – and b) performing power calculations to evaluate whether a specific study

design yields an appropriate power level for the statistical tests of interest. An additional

objective is to learn more about the design factors affecting the power of statistical tests

in latent class and latent Markov analysis, which will make it easier to design studies with

sufficient power, and thus to use available resources more efficiently.

1.2 The most important parameters and hypotheses

Various types of parameters for which one would like to perform statistical tests can be

distinguished in latent class and latent Markov models. Table 1.1 presents a classification

of the important types of parameters and the typical aim of hypotheses about these

parameters. The parameters of main interest are the measurement parameters, the

structural parameters in models with covariates, the transition parameters in latent Markov

models, and the number of latent classes or latent states. Researchers using latent class

or latent Markov models will usually perform statistical tests for some of these parameters.

The most common tests are shown in the last column of Table 1.1. Below we discuss

these tests in more detail.

Table 1.1: Important parameters of latent class and latent Markov models

Parameters of interest     Typical researcher's aims when testing hypotheses concerning these parameters
measurement parameters     determining the structure of classes
structural parameters      assessing covariate-class associations
transition parameters      describing transitions between states
number of classes          determining the number of classes

1.2.1 Measurement parameters

The measurement parameters of latent class and latent Markov models define the class-

specific conditional response probabilities and describe the association between the latent

classes and the (observed) indicator variables. They aid in the interpretation of the

classes. Hypotheses about these parameters often concern the structure of classes; that

is, the differences in response probabilities between or within latent classes. Examples

include testing the null hypothesis that response probabilities are equal across latent

classes, equal across indicator variables, or equal to specific values. The first hypothesis

finds applications in assessing statistical dependence between two qualitative variables,

here the latent variable X with categories x = 1, 2, ..., C and the indicator variable Y_j (for

j = 1, 2, ..., P). The second hypothesis is useful, for example, when testing whether

indicator variables have equal error rates (Goodman, 1974; McCutcheon, 2002). Other

substantive applications include hypotheses concerning the equality of sensitivities and

specificities across tests, or the equality of sensitivities and specificities to certain values

(I. Yang & Becker, 1997).

As in standard logistic regression analysis (Agresti, 2007), null hypothesis significance

testing can be performed using Wald, likelihood-ratio, or score tests. Under certain

regularity conditions, these three test statistics are asymptotically equivalent, each

following a central chi-square distribution under the null hypothesis and a non-central

chi-square under the alternative hypothesis (Buse, 1982). For tests of

hypotheses about the measurement parameters, this thesis focuses on the Wald

test. We discuss how to use its asymptotic distribution under the null and the alternative

hypotheses to compute the power or the sample size.
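
As a minimal sketch of how these asymptotic distributions translate into power, the Python code below computes the power of a chi-square-distributed test (such as the Wald test) from a given non-centrality parameter, and searches for the smallest sample size that reaches a target power. It relies only on the standard library. The function names (`chi2_cdf`, `ncx2_cdf`, `wald_power`, `required_n`) are illustrative and are not taken from the thesis or from any software package. The assumption that the non-centrality parameter grows linearly with the sample size is the usual large-sample approximation and is stated here as an assumption, not as a result derived in this section.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """CDF of the central chi-square distribution (adequate for moderate x)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_{n >= 0} z^n / ((a + 1)...(a + n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    log_front = a * math.log(z) - z - math.lgamma(a + 1.0)
    return min(1.0, math.exp(log_front) * total)


def chi2_ppf(p: float, df: float) -> float:
    """Quantile of the central chi-square distribution, found by bisection."""
    lo, hi = 0.0, max(1.0, df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_cdf(x: float, df: float, ncp: float) -> float:
    """CDF of the non-central chi-square as a Poisson mixture of central ones."""
    if ncp <= 0.0:
        return chi2_cdf(x, df)
    half = ncp / 2.0
    total, weight_sum, j = 0.0, 0.0, 0
    log_w = -half  # log of the Poisson(half) probability at j = 0
    while weight_sum < 1.0 - 1e-12 and j < 100000:
        w = math.exp(log_w)
        total += w * chi2_cdf(x, df + 2 * j)
        weight_sum += w
        j += 1
        log_w += math.log(half) - math.log(j)
    return total


def wald_power(ncp: float, df: float, alpha: float = 0.05) -> float:
    """Power = P(statistic > critical value) under the non-central chi-square."""
    critical = chi2_ppf(1.0 - alpha, df)
    return 1.0 - ncx2_cdf(critical, df, ncp)


def required_n(ncp_per_obs: float, df: float, target: float = 0.8,
               alpha: float = 0.05) -> int:
    """Smallest n reaching the target power, ASSUMING ncp = n * ncp_per_obs."""
    n = 1
    while wald_power(n * ncp_per_obs, df, alpha) < target:
        n *= 2
    lo, hi = n // 2, n  # power at lo is below the target, power at hi reaches it
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if wald_power(mid * ncp_per_obs, df, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

For instance, `wald_power(7.849, 1)` evaluates the power of a one-degree-of-freedom test at the 5% level for a non-centrality parameter of 7.849, and `required_n(0.01, 1, 0.80)` searches for the sample size at which a hypothetical per-observation non-centrality contribution of 0.01 yields a power of .8.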

1.2.2 Structural parameters

In latent class models, the structural parameters refer to parameters describing how

the encountered latent classes are related to covariates, also referred to as explanatory

variables, predictors, external variables, independent variables, or concomitant variables

(Dayton & Macready, 1988). In latent Markov models, covariates may affect the latent

states at the different measurement occasions. Typically, these covariate effects are

modelled using multinomial logistic regression equations. Null hypothesis significance testing for

the corresponding logistic parameters is usually done using either likelihood-ratio or

Wald tests. Examples of applications include assessing the significance of the effect of

maternal education on latent class memberships of positive health behaviors (Collins &

Lanza, 2010), the effect of education on latent class memberships of political orientations

(Hagenaars & McCutcheon, 2002), and the effect of age on latent class memberships

of crime delinquencies (Van der Heijden, Dessens, & Bockenholt, 1996). See also

Reboussin, Reboussin, Liang, and Anthony (1998) and Vermunt et al. (1999), who

presented applications of latent Markov models with covariates.

1.2.3 Transition parameters

In latent Markov models without covariates, not only the measurement parameters but

also the transition parameters describing the change in latent state membership over time

are important. Applications of hypotheses on the transition parameters include studies on

patients’ health status change over time (Bartolucci et al., 2010), youngsters’ substance

use behavior development over time (Jackson & Schulenberg, 2013), women’s dietary

pattern change over time (Sotres-Alvarez, Herring, & Siega-Riz, 2013), and smokers’

movement through a series of stages in their efforts to quit smoking (Martin, Velicer,

& Fava, 1996). One may also be interested in testing whether these transitions differ

between groups in the population; for example, whether substance use transitions differ

between males and females or whether health status transitions differ between a treatment

and a control group.

Hypotheses about the transition parameters are generally tested by using likelihood-

ratio tests, for which the asymptotic distribution under the null and alternative hypothesis

can be derived. However, when the null hypothesis involves setting one or more transition

probabilities to zero, i.e., to their boundary value, the asymptotic results for the likelihood-

ratio test no longer hold (Bartolucci, 2006). This implies that in this non-standard

situation, asymptotic distributions cannot be used for null hypothesis significance testing or power

computation. Instead, simulation methods need to be used.


1.2.4 Number of classes

Thus far, we assumed that the number of latent classes or states is known. However, in

most applications this number is unknown, in which case the most important statistical

tests concern the number of classes. In principle, hypotheses about the number of classes

can be tested using likelihood-ratio tests. However, the usual asymptotic chi-square

distribution of the likelihood-ratio statistic does not hold when testing a model with C

classes against a model with C + 1 classes. As an alternative, bootstrap-based likelihood-

ratio tests have been suggested (McLachlan, 1987). Another option is to guide the

selection of the number of classes by making use of information criteria (IC) such as the

Akaike IC (Akaike, 1974), the Bayesian IC (Schwarz, 1978), and adjusted forms

of these ICs (e.g., the penalized Akaike IC (Bozdogan, 1994), the consistent Akaike IC

(Bozdogan, 1987), and the sample size adjusted Bayesian IC (Sclove, 1987)). Because

these ICs lack the logic of null hypothesis significance testing, here we focus on power

computation for the bootstrap likelihood-ratio test.

1.3 More on power analysis

For the development of power analysis methods for tests used in latent class and latent

Markov modeling, we can use input from two fields. The first is the field of mixture

modeling itself in which numerous simulation studies have been published on factors

affecting the correct estimation of the number of classes (Bacci, Pandolfi, & Pennoni,

2014; Bartolucci & Farcomeni, 2009; Collins & Wugalter, 1992; Dias & Goncalves, 2004;

Tofighi & Enders, 2008; Fonseca & Cardoso, 2007; Lukociene, Varriale, & Vermunt, 2010;

McLachlan & Peel, 2000; Nylund, Asparouhov, & Muthen, 2007; C. Yang, 2006). The

aim of most of these simulation studies was to determine which statistic – information

criteria (e.g., Akaike IC, Bayesian IC, etc.) or likelihood-ratio tests – is best able to select

the model with the correct number of classes under a variety of conditions. From these

studies we also know that the ability to find the correct number of classes is not only

affected by the type of statistic that is used, but also by the sample size, the differences

between the classes (or effect sizes), the number of observed response variables, the scale

types of these variables, the number of measurement occasions, the number of classes, and

the class sizes. Some of these factors, such as sample size and effect size, are relevant

for the power of any statistical test, whereas others are specific for mixture modeling.

However, based on these results it is still not clear how to compute the likelihood of

finding the correct number of classes by manipulating particular factors conditional on

other ones, which is what is needed to set up a study with a certain power level.

The second relevant field is the field of power calculation methods for other types

of analyses, such as log-linear analysis (O’Brien, 1986; Shieh, 2000), logistic regression

analysis (Demidenko, 2007; Whittemore, 1981), and structural equation modeling

(R. MacCallum, Lee, & Browne, 2010; Satorra & Saris, 1985). It should be noted that

latent class and latent Markov models with categorical indicator variables are similar to

log-linear models and, when covariates are included, to logistic regression models, with the

“only” difference that the class membership is not directly observable. Because of these

similarities, for tests concerning the measurement, transition, and structural parameters,

we adapt the power analysis methods developed for log-linear and (multinomial) logistic

regression models. For tests concerning the number of classes, the likelihood-ratio power

analysis methods which are also used in structural equation modeling will be adapted (see

for example, R. MacCallum et al. (2010) and Satorra and Saris (1985)).

Specific aspects that should be addressed when implementing the existing power

analysis methods for mixture models are the following: a) In latent class and latent Markov

models, class membership is not directly observable. Factors affecting uncertainty about

the individuals’ class memberships are expected to affect the statistical power of the

tests, and therefore the power analysis method should take this into account. b) In some

applications of latent class and latent Markov models, the null hypotheses of interest are

specified by setting probabilities to zero, e.g. when testing the absence of transitions to

a particular state (Bartolucci, 2006). In such non-standard situations, testing and power


analysis methods which are based on the theoretical distributions of the test statistic

concerned do not apply. Whereas in other situations one may rely on the theoretical

distribution of the test statistic, calculating power using these theoretical distributions

requires us to specify the non-centrality parameter, which is generally not known. c) The

gold standard for significance testing of the null hypothesis of a C-class model against

the alternative of a (C + 1)-class model is the bootstrap likelihood-ratio test (McLachlan,

1987). Null hypothesis significance testing using the bootstrap method involves using the parameter

estimates of the C-class model to generate simulated (also called bootstrap) data sets.

Both the C-class and (C + 1)-class models are then fitted to these bootstrap data sets and the

likelihood-ratio value is obtained as twice the log-likelihood difference between the

two models, yielding the empirical distribution of the likelihood-ratio statistic under the

null hypothesis from which one can read the p-value (Nylund et al., 2007). For power

computation, not only the distribution under the null hypothesis but also the distribution

under the alternative hypothesis is required, which can also be constructed by simulation.

In practice, this means one should repeat the full bootstrap procedure for multiple samples

taking the (C + 1)-class model as a population model. Given the fact that the bootstrap

procedure itself is already computationally demanding, computing the power or required

sample size by performing the full bootstrap for multiple samples from the model under

the alternative hypothesis is generally not feasible.
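
The nested structure of this procedure can be sketched as follows. The Python code is only an illustration under strong simplifying assumptions: instead of latent class models, it uses a toy pair of nested normal models (mean fixed at zero versus free mean, with unknown variance), for which the likelihood-ratio statistic has a closed form. The function names and the toy models are hypothetical and not part of the thesis. The point is the cost structure: every sample drawn under the alternative requires its own full bootstrap, so the number of model fits is the product of the number of samples and the number of bootstrap replications.

```python
import math
import random


def lr_statistic(x):
    """Likelihood-ratio statistic for the toy test H0: mean = 0 vs H1: mean free
    (normal data, unknown variance). Stands in for fitting both nested models."""
    n = len(x)
    mean = sum(x) / n
    var0 = sum(v * v for v in x) / n              # ML variance with mean fixed at 0
    var1 = sum((v - mean) ** 2 for v in x) / n    # ML variance with free mean
    return n * math.log(var0 / var1)


def bootstrap_p_value(x, n_boot, rng):
    """Parametric bootstrap p-value: simulate from the fitted H0 model and
    compare the observed statistic with the simulated null distribution."""
    n = len(x)
    observed = lr_statistic(x)
    sigma0 = math.sqrt(sum(v * v for v in x) / n)  # H0 estimate used for simulation
    exceed = 0
    for _ in range(n_boot):
        boot = [rng.gauss(0.0, sigma0) for _ in range(n)]
        if lr_statistic(boot) >= observed:
            exceed += 1
    return exceed / n_boot


def pbp_power(mu, sigma, n, n_samples, n_boot, alpha, rng):
    """Power as the proportion of bootstrap p-values below alpha, with samples
    drawn from the H1 population (mean mu). Cost: n_samples * n_boot fits."""
    rejections = 0
    for _ in range(n_samples):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        if bootstrap_p_value(x, n_boot, rng) < alpha:
            rejections += 1
    return rejections / n_samples
```

Even in this toy setting, `pbp_power(1.0, 1.0, 30, 20, 50, 0.05, random.Random(1))` evaluates the statistic 20 × 50 times on simulated data. With mixture models, each of those evaluations would involve iterative estimation of two models, which is what makes the full procedure costly.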

1.4 Outline of the dissertation

This dissertation consists of four journal articles dealing with power and sample size

computation for tests concerning parameters of latent class and latent Markov models.

Although the chapters can be read independently, this format also creates some overlap and

occasional slight inconsistencies in notation.

In Chapter 2, we study power analysis for tests concerning the measurement parameters

of latent class models. This chapter provides sample size and power computation methods

for the Wald test. Furthermore, we study design factors affecting the power of, and the

required sample size for, the Wald test. As always, it can be expected that power is

affected by the level of significance, the sample size, and the effect size (Cohen, 1988).

Other relevant factors are the number of classes, the class proportions, and the number of

indicator variables. In this chapter we also examine how to achieve a design with a certain

power level by manipulating these factors. A numerical study is presented in which we

assess the performance of the proposed method and illustrate the power and sample size

computation method, considering different scenarios for the study design.

In Chapter 3, we extend the Wald based power analysis method for measurement

parameters from Chapter 2 to be applicable to the structural parameters in latent class

models with covariates. When testing hypotheses about the structural parameters,

the likelihood-ratio test is sometimes used instead of the Wald test. In this chapter,

we therefore also present power analysis methods for the likelihood-ratio test, as well

as compare the statistical power of the likelihood-ratio and Wald tests for hypotheses

concerning the logit parameters in latent class models with covariates. The study design

and population characteristics affecting the power of these two tests are addressed as well.

In Chapter 4, we study power analysis methods for testing hypotheses about the

transition parameters in latent Markov models. Two types of situations are considered.

The first concerns the standard situation where the test statistic follows a known

theoretical distribution (i.e., chi-square distribution for the likelihood-ratio statistic),

implying that power computation can also be based on this theoretical distribution. The

second situation concerns power computation for non-standard tests, which arise

when probabilities are fixed to zero. For the former case, we propose the exemplary

data set and large simulated data set methods for obtaining the non-centrality parameter.

For the non-standard case, we discuss power computation by Monte Carlo simulation.

Factors affecting the power of the tests are identified, and the power analysis methods

are illustrated with numerical experiments.

In Chapter 5, we present power analysis methods for the bootstrap likelihood-ratio

test, with a special emphasis on the number of states in latent Markov models. As


always, power can be computed as the proportion of the bootstrap p-values (PBP) for

which the null hypothesis is rejected. Such a method is computationally very demanding

as it requires performing the full bootstrap for multiple samples of the model under the

alternative hypothesis. We propose solving this computational time problem using a short-

cut method, in which the distributions of the test statistic under the null and alternative

hypotheses are constructed by simulation. A numerical study is conducted to (a) illustrate

the proposed power analysis methods and (b) compare the power estimates of the short-cut

method with those of the PBP method.
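
The idea of constructing both distributions by simulation can be sketched as below. This is a hedged illustration only, not the procedure as specified in Chapter 5: it uses a toy pair of nested normal models in place of latent Markov models, it assumes the population parameters under both hypotheses are given, and the function names are hypothetical. The null distribution is simulated once to obtain a critical value, and power is then the proportion of statistics simulated under the alternative that exceed this value, so no bootstrap is run per sample.

```python
import math
import random


def lr_statistic(x):
    """Likelihood-ratio statistic for the toy test H0: mean = 0 vs H1: mean free
    (normal data, unknown variance). Stands in for fitting both nested models."""
    n = len(x)
    mean = sum(x) / n
    var0 = sum(v * v for v in x) / n              # ML variance with mean fixed at 0
    var1 = sum((v - mean) ** 2 for v in x) / n    # ML variance with free mean
    return n * math.log(var0 / var1)


def shortcut_power(mu1, sigma, n, n_sim, alpha, rng):
    """Simulation-based power with one shared null distribution.

    Assumes the H0 population has mean 0 and the H1 population has mean mu1,
    both with standard deviation sigma (illustrative values)."""
    # Step 1: empirical distribution of the statistic under H0.
    null_stats = sorted(
        lr_statistic([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(n_sim)
    )
    # Empirical (1 - alpha) quantile used as the critical value.
    index = min(n_sim - 1, math.ceil((1.0 - alpha) * n_sim) - 1)
    critical = null_stats[index]
    # Step 2: empirical distribution under H1 and proportion of rejections.
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(mu1, sigma) for _ in range(n)]
        if lr_statistic(x) > critical:
            rejections += 1
    return rejections / n_sim
```

With `n_sim` simulated data sets per hypothesis, this sketch needs on the order of `2 * n_sim` evaluations of the statistic, compared with the product of the number of samples and the number of bootstrap replications for the p-value based approach.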

In Chapter 6, we provide a concluding discussion, describe directions for future

research, and discuss limitations with respect to the specific contribution of this

dissertation.

CHAPTER 2

Power and Sample Size Computation for Wald Tests in Latent

Class Models

Abstract

Latent class (LC) analysis is used by social, behavioral, and medical science researchers

among others as a tool for clustering (or unsupervised classification) with categorical

response variables, for analyzing the agreement between multiple raters, for evaluating

the sensitivity and specificity of diagnostic tests in the absence of a gold standard, and for

modeling heterogeneity in developmental trajectories. Despite the increased popularity

of LC analysis, little is known about statistical power and required sample size in LC

modeling. This chapter shows how to perform power and sample size computations in LC models using Wald tests for the parameters describing the association between the categorical latent variable and the response variables. Moreover, the design factors affecting the statistical power of these Wald tests are studied. More specifically, we show how design factors which are specific for LC analysis, such as the number of classes, the class sizes, and the number of response variables, affect the information matrix. The proposed power computation approach is illustrated using different scenarios of the relevant design factors. A simulation study conducted to assess the performance of the proposed power analysis procedure shows good performance in all situations that may be encountered in practice.

This chapter has been accepted for publication as: Gudicha, D. W., Tekle, F. B., & Vermunt, J. K. (in press). Power and Sample Size Computation for Wald Tests in Latent Class Models. Journal of Classification.


2.1 Introduction

Latent class (LC) analysis was initially introduced in the 1950s by Lazarsfeld (1950) as a

tool for identifying subgroups of individuals giving similar responses to sets of dichotomous

attitude questions. It took another two decades before LC analysis started attracting the

attention of other statisticians. Since then, various important extensions of the original

LC model have been proposed, such as models for polytomous responses, models with

covariates, models with multiple latent variables, and models with parameter constraints

(Dayton & Macready, 1976, 1988; Formann, 1982, 1992; Goodman, 1974; Magidson &

Vermunt, 2004; McCutcheon, 1987; Vermunt, 1996). More recently, statistical software

for LC analysis has become generally available – e.g., Latent GOLD (Vermunt & Magidson,

2013b), Mplus (L. Muthen & Muthen, 1998-2007), LEM (Vermunt, 1997), the SAS

routine PROC LCA (Lanza, Collins, Lemmon, & Schafer, 2007), and the R package

poLCA (Linzer & Lewis, 2011) – which has contributed to the increased popularity of this

model among applied researchers. Applications of LC analysis include building typologies

of respondents based on social survey data (McCutcheon, 1987), identifying subgroups

based on health risk behaviors (Collins & Lanza, 2010), identifying phenotypes of stalking

victimization (Hirtenlehner, Starzer, & Weber, 2012), and finding symptom subtypes

of clinically diagnosed disorders (Keel et al., 2004). Applications which are specific for

medical research include the estimation of the sensitivity and specificity of diagnostic tests

in the absence of a gold standard (Rindskopf & Rindskopf, 1986; I. Yang & Becker, 1997)

and the analysis of the agreement between raters (Uebersax & Grove, 1990).

Despite the increased popularity of LC analysis in a broad range of research areas,

no specific attention has been paid to power analysis for LC models. However, as in the

application of other statistical methods, users of LC models wish to confirm the validity

of their research hypotheses. This requires that a study has sufficient statistical power;

that is, that it is able to confirm a research hypothesis when it is true. Also reviewers

of journal publications and research grant proposals often request sample size and power

computations (Nakagawa & Foster, 2004). However, the literature on LC analysis lacks both methods for sample size and power computation and a thorough study of the design factors affecting the power of statistical tests used in LC analysis.

In this chapter, we present a method for assessing the power of tests related to

the class-specific response probabilities, which are the parameters of main interest in

confirmatory LC analysis. Relevant tests include tests for whether response probabilities

are equal across latent classes, whether response probabilities are equal to specific

values, whether response probabilities are equal across response variables (indicators),

and whether sensitivities or specificities are equal across indicators (Goodman, 1974; Holt

& Macready, 1989; Vermunt, 2010b). Since the class-specific response probabilities are

typically parameterized using logit equations (Formann, 1992; Vermunt, 1997), as in

logistic regression analysis, hypotheses about these LC model parameters can be tested

using Wald tests (Agresti, 2007). The proposed power analysis method is therefore

referred to as a Wald based power analysis.

For logistic regression models, Demidenko (2007, 2008) and Whittemore (1981)

described the large-sample approximation for the power of the Wald test. In this chapter,

we show how to use this procedure in the context of LC analysis. An important difference

compared to standard logistic regression analysis is that in an LC analysis the predictor

in the logistic models for the responses, the latent class variable, is unobserved. This

implies that the uncertainty about the individuals’ class memberships should be taken into

account in the power and sample size computation. As will be shown, factors affecting this

uncertainty include the number of classes, the class sizes (or proportions), the strength

of the association between classes and indicator variables, and the number of indicator

variables (Collins & Lanza, 2010; Vermunt, 2010a).

The remainder of this chapter is organized as follows. First, we present the LC model

for dichotomous responses and discuss the relevant hypotheses for the parameters of the

LC model. Second, we discuss power computation for Wald tests in LC analysis and,

moreover, show how the LC specific design factors affect the power via the information

matrix. Third, we present a numerical study in which we assess the performance of the


proposed method and illustrate power/sample size computation for different scenarios of

the relevant design factors. Finally, we provide a brief discussion of the main results of

our study.

2.2 The LC model

The LC model is a probabilistic clustering or unsupervised classification model for

dichotomous or categorical response variables (Goodman, 1974; Hagenaars, 1988;

Magidson & Vermunt, 2004; McCutcheon, 1987; Vermunt, 2010b). Taking the

dichotomous case as an example, let yij be the value of response pattern i for the binary

variable Yj , for j = 1, 2, 3, ..., P , where yij = 1 represents a positive response and 0 a

negative response. We denote the full-response vector by yi. For example, for P = 3, yi

takes on one of the following eight triplets of 0’s and/or 1’s:

{(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}.

The three response variables could, for example, represent the answers to the following

questions: “Do you support gay marriage?”, “Do you support a raise of minimum wages?”,

and “Do you support the initiative for health care reform?” In a sample of n persons,

a particular person could answer these questions with ‘no’, ‘yes’, and ‘yes’, respectively, in

which case the response pattern for this subject becomes (0, 1, 1). In such an application,

the aim of the analysis would be to determine whether one can identify two latent classes

with different response tendencies (say republicans and democrats), and subsequently to

classify subjects into one of these classes based on their observed responses, or to compare

the probability of positive responses to a given response variable between the republican

and the democrat classes.

In general, for P dichotomous response variables, we have $2^P$ tuples of 0’s and/or 1’s. We denote the number of individuals with response pattern yi by ni, where the total sample size $n = \sum_{i=1}^{2^P} n_i$. The LC model assumes that the response probabilities depend


on a discrete latent variable, which we denote by X with categories t = 1, 2, 3, ..., C.

The probability of having response pattern yi is modeled as a mixture of C class-specific

probability functions (Dayton & Macready, 1976; Goodman, 1974; McCutcheon, 1987;

McLachlan & Peel, 2000; Vermunt, 2010b). That is,

$$p(y_i, \Psi) = \sum_{t=1}^{C} p(X = t)\, p(Y = y_i \mid X = t), \tag{2.1}$$

where p(X = t), which we also denote by πt, represents the relative size of class t,

and p(Y = yi|X = t) is the corresponding class-specific joint response probability.

The class-specific probability for binary variable Yj is usually modeled using a logistic parameterization; that is, $\theta_{jt} = p(Y_j = 1 \mid X = t) = \exp(\beta_{jt}) / (1 + \exp(\beta_{jt}))$, where βjt is the log-odds of giving a positive response on item j in class t. Moreover, assuming that the response

variables are independent within classes – which is referred to as the local independence

assumption – the LC model represented by equation (2.1) can be rewritten as follows:

$$p(y_i, \Psi) = \sum_{t=1}^{C} \pi_t \prod_{j=1}^{P} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}, \tag{2.2}$$

where πt is such that 0 < πt < 1 and $\sum_{t=1}^{C} \pi_t = 1$. The vector of parameters Ψ

consists of the sub-vector π, the class sizes, and the sub-vector β, the class-specific logits for the indicator variables (also referred to as the measurement parameters). For example, for C = 2 and P = 3, the parameter vector is $\Psi' = (\pi', \beta') = (\pi_1, \beta_{11}, \beta_{21}, \beta_{31}, \beta_{12}, \beta_{22}, \beta_{32})$. In the application presented above, these parameters would correspond to the proportion of ‘republicans’, the log-odds that a republican responds ‘yes’ instead of ‘no’ to questions Y1, Y2, and Y3, and the log-odds that a democrat responds ‘yes’ instead of ‘no’ to questions Y1, Y2, and Y3.
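Equation (2.2) can be evaluated directly for small models. The following is a minimal sketch, not the software used in this chapter: the function names and the example values (two equally sized classes, three items, response probabilities of 0.8 and 0.2) are assumptions chosen for illustration.

```python
from itertools import product
from math import exp, log


def logistic(beta):
    """Convert a log-odds value beta_jt into a response probability theta_jt."""
    return 1.0 / (1.0 + exp(-beta))


def pattern_probabilities(class_sizes, betas):
    """Return p(y_i) for every response pattern under equation (2.2).

    class_sizes[t] is pi_t; betas[t][j] is the logit of item j in class t.
    """
    n_items = len(betas[0])
    probs = {}
    for y in product((0, 1), repeat=n_items):      # all 2^P patterns
        total = 0.0
        for pi_t, beta_t in zip(class_sizes, betas):
            joint = pi_t
            for y_j, b in zip(y, beta_t):          # local independence
                theta = logistic(b)
                joint *= theta if y_j == 1 else 1.0 - theta
            total += joint
        probs[y] = total
    return probs


# Illustrative (assumed) population: C = 2, P = 3, theta = 0.8 vs 0.2.
high, low = log(0.8 / 0.2), log(0.2 / 0.8)
example = pattern_probabilities([0.5, 0.5], [[high] * 3, [low] * 3])
print(round(example[(1, 1, 1)], 4))  # 0.5 * 0.8**3 + 0.5 * 0.2**3 = 0.26
```

Multiplying these probabilities by n gives the expected pattern frequencies for the assumed population.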

In general, for an LC model having C classes and P binary indicator variables, we have m = C − 1 + C · P free model parameters. These parameters are usually estimated by maximum likelihood (ML) (Dayton & Macready, 1976; Goodman, 1974; McLachlan & Peel, 2000; Vermunt, 2010b), which involves seeking the values of Ψ, say $\hat{\Psi}$, which maximize the log-likelihood function:

$$l(\Psi) = \sum_{i=1}^{2^P} n_i \log p(y_i, \Psi). \tag{2.3}$$

Maximizing the log-likelihood function in equation (2.3) produces a unique estimate

for Ψ, provided that the LC model in equation (2.1) is identifiable. As indicated by Goodman (1974), a necessary condition for an LC model to be identified is that the number of independent response patterns is at least as large as the number of free model parameters. That is, $2^P - 1 \ge m = C - 1 + C \cdot P$. A sufficient condition for local

identification is that the Jacobian is full rank (McHugh, 1956). Because the analytic

evaluation of the rank of the Jacobian is very difficult, Forcina (2008) proposed checking

identification of LC models by evaluating the rank of the Jacobian for a large number

of random parameter values. For the scenarios considered in this chapter we applied

Forcina’s method, which showed that the models were identified.
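The counting condition is easy to check before any model is fitted. The sketch below (function names are our own) implements only the necessary condition; it does not replace the Jacobian rank check described above.

```python
def n_free_parameters(n_classes, n_items):
    """m = C - 1 + C * P free parameters for binary indicators."""
    return (n_classes - 1) + n_classes * n_items


def meets_counting_condition(n_classes, n_items):
    """Necessary (not sufficient) condition: 2^P - 1 >= m."""
    return 2 ** n_items - 1 >= n_free_parameters(n_classes, n_items)


for c, p in [(2, 3), (3, 3), (4, 6)]:
    print(c, p, n_free_parameters(c, p), meets_counting_condition(c, p))
```

For example, a 3-class model with three binary items has 11 free parameters but only 7 independent response patterns, so it fails the condition.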

Typically, researchers using LC models do not only wish to obtain point estimates

for the Ψ parameters, but are also interested in tests concerning these parameters.

For simplicity we will focus on a single type of test, which in most applications is the test of main interest: the test of whether there is a significant association between the latent classes and a particular indicator variable.

Inference regarding this association involves testing the null hypothesis that the response

logit does not differ across latent classes for the indicator variable concerned. This null

hypothesis can be formulated as H0 : βj1 = βj2 = ... = βjC, for j = 1, 2, 3, ..., P. An equivalent formulation of this hypothesis is

$$\begin{aligned} H_0:\ & \beta_{j1} - \beta_{j2} = 0 \\ & \beta_{j1} - \beta_{j3} = 0 \\ & \qquad \vdots \\ & \beta_{j1} - \beta_{jC} = 0 \end{aligned}$$

or, using matrix notation, H0 : Hβj = 0, where H is a C − 1 by C design matrix with linear contrasts and βj is a C by 1 column vector with the parameters for Yj, i.e., $\beta_j' = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jC})$. Under the null hypothesis of no association, any estimated difference between βj1 and βjt occurs by chance alone, implying that the indicator does not contribute to the definition of classes in a statistically significant way.
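As a concrete instance of the matrix formulation, the contrasts for C = 3 can be written out explicitly. Class 1 is used as the reference, following the hypothesis as stated above; any other contrast coding spanning the same differences would give an equivalent test.

```latex
H = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix},
\qquad
H\beta_j = \begin{pmatrix} \beta_{j1} - \beta_{j2} \\ \beta_{j1} - \beta_{j3} \end{pmatrix}
         = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad \text{under } H_0 .
```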

As already indicated in the introduction section, various other types of hypotheses

concerning the class-specific logit parameters may be of interest. Examples include tests

for whether βjt is equal to a particular value (e.g., β11 = 1), whether the βjt parameters

are equal across two or more items (e.g., β1t − β2t = 0), and whether the value is the

opposite of the value for another class (e.g., β11 + β12 = 0) (Goodman, 1974). In

medical research, we may be interested in comparing the sensitivity and specificity of

diagnostic tests (see, for example, I. Yang & Becker, 1997), yielding hypotheses such

as β11 − β21 = 0 and β12 − β22 = 0, respectively. Note that all these hypotheses can be

expressed in the general form Hβ = 0.

2.3 Wald based power analysis for LC models

2.3.1 The Wald statistic and its asymptotic properties

One of the properties of the ML estimator is that, under certain regularity conditions

(McHugh, 1956; White, 1982), the estimator $\hat{\Psi}$ converges in probability to Ψ as the sample size tends to infinity. That is, for any sequence $\hat{\Psi}_n$ we have $\hat{\Psi}_n \xrightarrow{a.s.} \Psi$. The other interesting property of the ML estimator is that it has a limiting normal distribution. More specifically, for large sample size n,

$$\sqrt{n}\,(\hat{\Psi}_n - \Psi) \longrightarrow N(0, V), \tag{2.4}$$

where ⟶ denotes convergence in distribution, $V = I^{-1}(\Psi)$ is the asymptotic covariance of $\sqrt{n}\,\hat{\Psi}_n$, and I(Ψ) is the m by m information matrix (McHugh, 1956; Redner, 1981;

Rencher, 2000; Wald, 1943; Wolfe, 1970). The latter has the following block structure:

$$I(\Psi) = \begin{pmatrix} I_1 = \{(\pi_t, \pi_s)\} & I_2 = \{(\pi_t, \beta_{jl})\} \\ I_3 = \{(\beta_{jq}, \pi_s)\} & I_4 = \{(\beta_{jq}, \beta_{kl})\} \end{pmatrix},$$

for t, s = 1, 2, 3, ..., C − 1, l, q = 1, 2, 3, ..., C and k, j = 1, 2, 3, ..., P. The sub-matrices

I1, I2, I3, and I4 are of dimensions C − 1 by C − 1, C − 1 by C ·P , C ·P by C − 1, and

C · P by C · P , respectively. The terms between braces indicate the parameters involved

in the sub-matrix concerned.

Using the algebraic properties of block matrices, it follows that

$$V = I^{-1}(\Psi) = \begin{pmatrix} A^{-1} & -I_1^{-1} I_2 B^{-1} \\ -I_4^{-1} I_3 A^{-1} & B^{-1} \end{pmatrix}, \tag{2.5}$$

where $A = I_1 - I_2 I_4^{-1} I_3$ and $B = I_4 - I_3 I_1^{-1} I_2$. A necessary condition for A to be

invertible, which is a requirement to obtain the covariance matrix of Ψn, is that both I1

and I4 are non-singular matrices (Rencher, 2000). In the appendix, we provide details on

the expressions for I1, I2, I3, and I4.

The consistency and multivariate normality discussed above apply to the estimators

of the component parameters as well. That is, using the property of multivariate normal

random variables which states that the sub-vectors of a multivariate normal are also

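normal, the limiting distributions of $\hat{\pi}$ and $\hat{\beta}$ become

$$\sqrt{n}\,(\hat{\pi}_n - \pi) \longrightarrow N(0, A^{-1}) \tag{2.6}$$

$$\sqrt{n}\,(\hat{\beta}_n - \beta) \longrightarrow N(0, B^{-1}). \tag{2.7}$$

Also the sub-vector $\hat{\beta}_j$ of $\hat{\beta}$ is asymptotically normally distributed, with mean βj and with covariance Vj, being a C by C sub-matrix of $B^{-1}$. In the remaining part of the chapter, we focus on this βj.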

Using the Continuous Mapping Theorem (Mann & Wald, 1943), for a design matrix H that defines the contrasts of the null hypothesis, one can show that $\sqrt{n}\,(H\hat{\beta}_j - H\beta_j) \longrightarrow N(0, H V_j H')$. The quadratic form of the test for the hypothesis H0 : Hβj = 0 yields the well-known Wald statistic:

$$W = n\,(H\hat{\beta}_j)'(H V_j H')^{-1}(H\hat{\beta}_j). \tag{2.8}$$

Under the null hypothesis, that is, if H0 : Hβj = 0 holds, the Wald statistic W has

an asymptotic (central) chi-square distribution with C − 1 degrees of freedom (Rencher,

2000; Wald, 1943). That is,

$$W = n\,(H\hat{\beta}_j)'(H V_j H')^{-1}(H\hat{\beta}_j) \longrightarrow \chi^2_{(C-1)}. \tag{2.9}$$

Under the alternative hypothesis, W follows a non-central chi-square distribution with

C − 1 degrees of freedom and non-centrality parameter λ. That is,

$$W = n\,(H\hat{\beta}_j)'(H V_j H')^{-1}(H\hat{\beta}_j) \longrightarrow \chi^2_{(C-1,\,\lambda)}, \tag{2.10}$$

where $\lambda = n\,(H\beta_j)'(H V_j H')^{-1}(H\beta_j)$.
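For intuition, the two-class case reduces the quadratic form to a scalar. In the sketch below, $v_{11}$, $v_{22}$, and $v_{12}$ are our own labels (not notation from the text) for the two variances and the covariance in $V_j$.

```latex
% C = 2: H = (1, -1), so the test has C - 1 = 1 degree of freedom
H V_j H' = v_{11} + v_{22} - 2 v_{12},
\qquad
\lambda = \frac{n\,(\beta_{j1} - \beta_{j2})^2}{v_{11} + v_{22} - 2 v_{12}} .
```

Written this way, λ is seen to grow linearly in n and quadratically in the logit difference, and to shrink as the variance of the estimated difference grows.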

2.3.2 Power and sample size computation

With the establishment of the distribution of the test statistic under the null and

alternative hypotheses and the availability of a closed form expression for the non-centrality

parameter λ, it becomes possible to compute the power of the test for a given sample size

or the sample size for a given power. As in any power analysis, we first have to define the

population model. In our case, this involves defining the number of classes and the number

of response variables, and, moreover, specifying the values for the class sizes π and the

class-specific logits β. For the assumed population model, we can compute the inverse

information matrix V which appears in the formula of the non-centrality parameter.

Once the population parameters are set and V is computed, power computation for a

given sample size and required sample size computation for a given power proceeds along


the steps described below.

Steps for power computation

Power computation proceeds as follows:

1. Compute the non-centrality parameter λ for the specified sample size n (use the

expression in equation (2.10)).

2. For a given value of type I error α, read the 100(1 − α) percentile value from the (central) chi-square distribution. That is, find $\chi^2_{(1-\alpha)}(C-1)$ such that under the null hypothesis, $p\left(W > \chi^2_{(1-\alpha)}(C-1)\right) = \alpha$. This value is referred to as the critical value of the test.

3. Compute the power as the probability that a random variable W from the non-

central chi-square distribution (with non-centrality parameter λ given in step 1) will

assume a value greater than the critical value obtained under step 2.

Steps for sample size computation

Sample size computation proceeds as follows:

1. For a given value of α, read the 100(1 − α) percentile value from the (central)

chi-square distribution (see the second step for power computation).

2. For a given power and the critical value obtained in step 1, find the non-centrality parameter λ such that, under the alternative hypothesis, the condition that power is equal to $p\left(W > \chi^2_{(1-\alpha)}(C-1)\right)$ is satisfied.

3. From the expression for λ, solve for the sample size as $n = \lambda \left( (H\beta_j)'(H V_j H')^{-1}(H\beta_j) \right)^{-1}$.
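Both sets of steps can be sketched with the Python standard library alone. This is an illustrative implementation under stated assumptions, not the routine used in Latent GOLD: the function names are ours, the non-central chi-square tail is computed as a Poisson mixture of central chi-squares, and `unit_lambda` stands for the quantity $(H\beta_j)'(H V_j H')^{-1}(H\beta_j)$, which must come from the inverse information matrix of the assumed population model. For very large λ the Poisson weights underflow and the sketch simply returns a power of 1.

```python
from math import ceil, exp, lgamma, log


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * exp(a * log(x) - x - lgamma(a + 1.0))


def chi2_cdf(x, df):
    """CDF of the central chi-square distribution."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_ppf(p, df):
    """Quantile of the central chi-square distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2.0


def ncx2_sf(x, df, lam):
    """P(W > x) for a non-central chi-square W, as a Poisson mixture."""
    half = lam / 2.0
    weight = exp(-half)                  # Poisson weight for i = 0
    cdf, i = 0.0, 0
    while i <= half or weight > 1e-16:
        cdf += weight * chi2_cdf(x, df + 2 * i)
        i += 1
        weight *= half / i
    return 1.0 - cdf


def wald_power(lam, df, alpha=0.05):
    """Power computation steps 2 and 3: critical value, then tail area."""
    return ncx2_sf(chi2_ppf(1.0 - alpha, df), df, lam)


def lambda_for_power(power, df, alpha=0.05):
    """Sample size step 2: non-centrality giving the target power."""
    crit = chi2_ppf(1.0 - alpha, df)
    lo, hi = 0.0, 1.0
    while ncx2_sf(crit, df, hi) < power:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if ncx2_sf(crit, df, mid) < power else (lo, mid)
    return (lo + hi) / 2.0


def required_sample_size(power, df, unit_lambda, alpha=0.05):
    """Sample size step 3: n = lambda / unit_lambda, rounded up."""
    return ceil(lambda_for_power(power, df, alpha) / unit_lambda)


# C = 3 classes gives df = 2; unit_lambda = 0.1 is a made-up value.
print(round(lambda_for_power(0.80, df=2), 2))
print(required_sample_size(0.80, df=2, unit_lambda=0.1))
```

Because n = λ divided by a quantity that does not depend on the target power, the required sample sizes for different power levels scale in proportion to the corresponding λ values.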

Software implementation

The above procedure for power computation can be applied using existing software for

LC analysis that allows defining starting values or fixed values for the logit parameters


and that provides the (inverse) information matrix as output, for example, using LEM

(Vermunt, 1997), Mplus (L. Muthen & Muthen, 1998-2007), or Latent GOLD (Vermunt

& Magidson, 2013b). More specifically, with a LC analysis software package, one can

obtain the inverse information matrix V. This will typically require the following two

steps:

A. Create a data set containing all possible data patterns and with the expected

frequencies according to the LC model of interest as weights. This can be achieved

by running the LC software with the population parameters specified as fixed values

and with the estimated frequencies as requested output. The created output is, in

fact, a data set which is exactly in agreement with the population model. Such a

data set is sometimes referred to as an ‘exemplary’ data set (O’Brien, 1986).

B. Analyze the (exemplary) data set created in step A with the LC model of interest and

request the variance-covariance matrix of the parameters (the inverse information

matrix) as output. Note that when analyzing a data set which is exactly in

agreement with the model, the observed information matrix is identical to the

expected information matrix. The same applies to the approximate observed

information matrix based on the outer-product of the gradient contributions of

the data patterns.

The above two steps provide us with the inverse information matrix V. The actual

power or sample size computations using the steps described above can subsequently

be performed using software that allows performing matrix computations and that has

functions for obtaining the critical value from the chi-squared distribution and the non-

centrality value from the non-central chi-squared distribution. For this purpose, one can

use R.

The procedure described above is fully automated in version 5.0 of the Latent GOLD

program (Vermunt & Magidson, 2013b). Users define the population model and specify

either the sample size or the required power. The program computes the power or the

required sample size for the Wald tests it reports by default, as well as for other Wald


tests defined by the user. In the appendix, we give an example of the Latent GOLD syntax

for power computation.

2.3.3 Design factors affecting the power or the required sample

size

Now let us look in more detail at the factors affecting the power of the Wald test in LC

models. The power is determined by the type I error rate α and the non-centrality parameter λ: the larger α and the larger λ, the larger the power. Increasing the level of significance α lowers the critical value, which makes the null hypothesis more likely to be rejected. As can be

observed from equation (2.10), λ is a function of the sample size n, the precision of the

estimator (Vj), and the effect size Hβj . Note that in our case the effect size is the

difference between the class-specific β parameters or, equivalently, the strength of the

association between the classes and the response variable concerned.

Specific for LC models is that the precision of the estimator is affected by the fact

that class membership is unobserved; that is, that we are uncertain about a person’s class

membership. Recall from equation (2.5) that the block of V concerning the β parameters

is obtained as the inverse of $B = I_4 - I_3 I_1^{-1} I_2$. This means that B becomes larger when I4

and I1 become larger and when I2 and I3 become smaller. To show how the uncertainty

about the class membership affects B, let us have a closer look at I4, which is the most

important term in B. Its elements are obtained as follows:

$$I_4(\beta_{jq}, \beta_{kl}) = \sum_{i=1}^{2^P} p(X = q \mid y_i)\, p(X = l \mid y_i)\,(y_{ij} - \theta_{jq})(y_{ik} - \theta_{kl})\, p(y_i), \tag{2.11}$$

where θjq = exp(βjq)/(1 + exp(βjq)) (see the appendix for further details on its derivation). As can be seen, specific for an LC analysis, the elements of the information

matrix are not only a function of the model parameters, but also of the posterior class

membership probabilities p(X = q|yi). For example, the contribution of response pattern


i to the information on parameter βjq equals $p(X = q \mid y_i)^2 (y_{ij} - \theta_{jq})^2 p(y_i)$. In other words, response pattern i contributes with “weight” $p(X = q \mid y_i)^2$ to the information on a parameter of class q. The total contribution to the parameters of all C classes equals $\sum_{t=1}^{C} p(X = t \mid y_i)^2$. This shows that the information is maximal when p(X = q|yi) equals 1 for one class and 0 for the other classes, in which case the total contribution equals

1. This occurs when the classes are perfectly separated or when the class membership is

observed rather than latent.
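A small numerical illustration of this total weight for C = 2 may help; the posterior values below are made up for the purpose of the example.

```latex
\sum_{t=1}^{2} p(X = t \mid y_i)^2 =
\begin{cases}
0.5^2 + 0.5^2 = 0.50 & \text{for posteriors } (0.5,\ 0.5), \\
0.9^2 + 0.1^2 = 0.82 & \text{for posteriors } (0.9,\ 0.1), \\
1^2 + 0^2 = 1.00 & \text{for posteriors } (1,\ 0).
\end{cases}
```

A response pattern that leaves class membership completely ambiguous therefore carries half the weight of a pattern that identifies the class with certainty.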

Also the entries of I1 become larger when the posterior class membership probabilities

get closer to either 0 or 1. The matrices I2 and I3 capture the overlap in information

between the class sizes and the β parameters. The elements of these matrices are 0 when

separation is perfect and become larger with lower class separation.

The implication of the above is that the power can be increased by increasing the

separation between the classes; that is, by influencing the factors affecting the posterior

class membership probabilities. The posterior class membership probabilities depend on

the number of classes, the class sizes, the class-specific conditional response probabilities,

and the number of response variables (Collins & Lanza, 2010; Vermunt, 2010a). More

specifically, class separation is better with fewer latent classes, a more uniform (or balanced)

class distribution, response variables which are more strongly related to the classes, and

a larger number of response variables.

Note that the conditional response probabilities have a dual role. The more the

conditional response probabilities θjq or the logit parameters βjq differ across latent

classes, the larger the effect size and thus also the higher the power of the test for

the parameters of indicator variable Yj . However, a larger difference between classes in

the response on Yj also increases the class separation, and thus the power of all tests, including those for the other response variables.


2.4 Numerical study

In this section, we present a numerical study that illustrates the Wald based power

analysis for different configurations of design factors. As was shown in section 2.3.3,

in addition to the usual factors (i.e., sample size, level of significance, and effect size),

power computation in LC models involves the specification of design factors such as the

number of classes, the number of observed response variables, the class sizes, and the

class-specific probabilities (or logits) for the response variables, which we refer to as LC-

specific design factors.

As already indicated in section 2.3.3, LC-specific design configurations yielding better

separated classes, or posterior class membership probabilities which are closer to either

0 or 1, yield more precise estimators, and as a result larger power of the Wald tests.

Therefore, in order to be able to compare different design configurations, it is important

to have a measure for class separation. For this purpose, we use the entropy based R-square. The entropy of the posterior class membership probabilities for data pattern i, denoted by Ei, equals $\sum_{t=1}^{C} -p(X = t \mid y_i) \log p(X = t \mid y_i)$. Note that Ei gets closer to 0 when the posteriors are closer to 0 and 1. The average entropy across data patterns, denoted by E, equals $\sum_{i=1}^{2^P} E_i\, p(y_i)$. The entropy based R-square can now be obtained as follows: $R^2_{entropy} = 1 - E/E(0)$. Here, E(0) is the maximum entropy given the class sizes; that is, $E(0) = \sum_{t=1}^{C} -p(X = t) \log p(X = t)$. The entropy based R-square takes on values between 0 and 1, where larger $R^2_{entropy}$ values indicate larger separation between classes. Values lower than .5, between .5 and .75, and larger than .75 correspond to LC models with small, medium, and large class separation, respectively. Closer inspection of the expression $R^2_{entropy} = 1 - E/E(0)$ shows that the largest entropy based R-square is obtained when E equals 0. This occurs when p(X = t|yi) is either 0 or 1 for each response pattern yi; that is, when class separation is perfect.
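The entropy based R-square can be computed by enumerating all response patterns. The sketch below is our own illustration of the definition above (the function name and argument layout are assumptions); it takes the class sizes and the class-specific response probabilities as input.

```python
from itertools import product
from math import log


def entropy_r2(class_sizes, thetas):
    """Entropy based R-square, 1 - E / E(0).

    class_sizes[t] is p(X = t); thetas[t][j] is p(Y_j = 1 | X = t).
    """
    n_items = len(thetas[0])
    avg_entropy = 0.0
    for y in product((0, 1), repeat=n_items):       # all 2^P patterns
        joint = []                                   # p(X = t, y_i) per class
        for pi_t, theta_t in zip(class_sizes, thetas):
            p = pi_t
            for y_j, th in zip(y, theta_t):
                p *= th if y_j == 1 else 1.0 - th
            joint.append(p)
        p_y = sum(joint)                             # p(y_i)
        if p_y == 0.0:
            continue
        # E_i: entropy of the posterior class membership probabilities
        e_i = -sum((p / p_y) * log(p / p_y) for p in joint if p > 0.0)
        avg_entropy += e_i * p_y
    max_entropy = -sum(p * log(p) for p in class_sizes if p > 0.0)
    return 1.0 - avg_entropy / max_entropy


# C = 2, P = 6, theta = 0.8 in class 1 and 0.2 in class 2, equal class sizes.
r2 = entropy_r2([0.5, 0.5], [[0.8] * 6, [0.2] * 6])
print(round(r2, 3))  # about 0.818, the value Table 2.1 reports for this setup
```

The function enumerates $2^P$ patterns, so it is practical only for a moderate number of indicators.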


2.4.1 Numerical study set up

The LC-specific design factors that were varied are the number of classes, the number

of indicator variables, the class-specific conditional probabilities, and the class sizes. The

number of classes varied from 2 to 4 (i.e., C = 2, 3, 4). The number of indicator variables

was set to P = 6 and P = 10. In line with Vermunt (2010a), the class-specific conditional

probabilities θjt were 0.7, 0.8, and 0.9 (or, depending on the class, 1-0.7, 1-0.8, and 1-0.9),

corresponding to a weak, medium, and strong association between classes and indicator

variables. The θjt were high for class 1, say 0.8, and low for class C, say 1-0.8; with

C = 3, class 2 had high θjt values for the first half of the items and low values for the

other items; with C = 4, class 2 had low θjt values for the first half of the items and

high values for the other items, and class 3 had high θjt values for the first half of the

items and low values for the other items. The class sizes were equal or unequal, where for

the unequal conditions we used class sizes of (0.75, 0.25), (0.5, 0.3, 0.2), and (0.4, 0.3,

0.2, 0.1), for 2-, 3-, and 4-class LC models, respectively. For a 3-class LC model, unequal

class sizes of (0.6, 0.3, 0.1) were also considered.

In addition to the four LC-specific design factors, we varied the sample size, power,

and effect size (Cohen, 1988). For power computation, the sample size was set to 75,

100, 200, 300, 500, 700, 1000, and 1500, whereas for sample size determination, the

power was set to .8, .9, and .95. The effect size is already specified via the response

probabilities θjt, where it should be noted that the logit coefficients βjt for which the

Wald tests are performed equal βjt = log[θjt/(1 − θjt)]. The other factor considered is

the level of significance α which, in line with the common research practice where the

type I error rate is often fixed in advance, was fixed to 0.05.
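To make the implied effect sizes concrete, the logits corresponding to the three response probabilities used in this study work out as follows (natural logarithms, rounded to three decimals):

```latex
\log\frac{0.7}{0.3} \approx 0.847, \qquad
\log\frac{0.8}{0.2} \approx 1.386, \qquad
\log\frac{0.9}{0.1} \approx 2.197 .
```

Since a class with response probability $1 - \theta$ has the logit of opposite sign, the difference in logits between a class with θ and a class with 1 − θ is twice these values, that is, approximately 1.695, 2.773, and 4.394.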

2.4.2 Results

Table 2.1 presents the entropy based R-square for several combinations of the LC-specific design factors. It shows how the value of this R-square measure is affected by the number of classes, the class sizes, the number of indicators, and the strength of the class-indicator associations, given specific values of the other design factors. As can be seen, the smaller the number of classes, the larger the number of indicator variables, or the stronger the class-indicator associations, the larger the value of the entropy based R-square. Moreover, the more equal the class sizes, the larger the entropy based R-square. It can also be seen that the entropy based R-square may become very low when all conditions are less favorable.

Table 2.1: Entropy based R-square values for different combinations of latent class-specific design factors

| Design factor | Level | Equal class sizes | Unequal class sizes | More unequal class sizes |
|---|---|---|---|---|
| Number of classes (for P = 6 and θj1 = 0.8) | C = 2 | .818 | .811 | |
| | C = 3 | .627 | .624 | |
| | C = 4 | .594 | .589 | |
| Number of indicators (for C = 3 and θj1 = 0.8) | P = 6 | .627 | .624 | |
| | P = 10 | .790 | .788 | |
| Class-indicator associations (for C = 3 and P = 6) | θj1 = 0.7 | .332 | .330 | .314 |
| | θj1 = 0.8 | .627 | .624 | .607 |
| | θj1 = 0.9 | .880 | .879 | .871 |

Note: the ‘unequal’ and ‘more unequal’ class size conditions refer to the level of deviation from a uniform class distribution. For example, for C = 3, we used (0.5, 0.3, 0.2) and (0.6, 0.3, 0.1) to represent a smaller and larger deviation from a uniform class distribution, respectively.

Table 2.2: Estimated power for different class separation levels and different sample sizes

| Sample size | R² = .314 | R² = .330 | R² = .607 | R² = .624 | R² = .790 |
|---|---|---|---|---|---|
| 75 | .069 | .115 | .221 | .515 | .941 |
| 100 | .075 | .139 | .283 | .645 | .984 |
| 200 | .102 | .239 | .520 | .922 | 1.000 |
| 300 | .130 | .342 | .706 | .987 | 1.000 |
| 500 | .190 | .533 | .908 | 1.000 | 1.000 |
| 700 | .252 | .687 | .976 | 1.000 | 1.000 |
| 1000 | .345 | .842 | .997 | 1.000 | 1.000 |
| 1500 | .492 | .957 | 1.000 | 1.000 | 1.000 |

Note: Column headings give the entropy based R-square. H0 : βj1 = βj2 = ... = βjC for which j = 1 and C = 3.

To investigate the effect of class separation on the power of the Wald test for the

significance of a class-indicator association, the power is computed for five of the design

configurations that were presented in Table 2.1 under different sample sizes. The results


are presented in Table 2.2. From this table, we can see that the power of a Wald test

for a class-indicator association strongly depends on the class separation. When classes

are well separated, a sample size of 100 can be large enough to achieve a power of .8 or

more. With a class separation of .330, .607, and .624, sample sizes of 900, 370, and 140, respectively, are required to achieve such a power. With very badly separated classes

as in the worst condition, even a sample size of 1500 is not large enough to achieve a

power of .8.

Table 2.3: Required sample size for different configurations of latent class-specific design factors and different power levels

| Power | Classes: C = 2 | Classes: C = 3 | Classes: C = 4 | Indicators: P = 6 | Indicators: P = 10 |
|---|---|---|---|---|---|
| .8 | 33 | 82 | 83 | 82 | 49 |
| .9 | 45 | 108 | 108 | 108 | 64 |
| .95 | 55 | 131 | 130 | 131 | 78 |

| Power | Associations: Low | Associations: Medium | Associations: High | Class sizes: Equal | Class sizes: Unequal | Class sizes: More unequal |
|---|---|---|---|---|---|---|
| .8 | 419 | 82 | 34 | 82 | 141 | 371 |
| .9 | 550 | 108 | 45 | 108 | 185 | 487 |
| .95 | 671 | 131 | 55 | 131 | 226 | 594 |

Note: The baseline model is the model with C = 3, P = 6, equal size classes, and medium association between classes and indicators. One design factor is varied to get the other conditions reported in the table.

Table 2.3 reports the required sample size for a specified power for various combinations

of LC-specific design factors. We use the condition with C = 3, P = 6, equal class sizes,

and medium class-indicator associations as the baseline. This condition requires sample

sizes of 82, 108, and 131, respectively, to achieve the three reported power levels. The

other conditions are obtained by varying one design factor at a time.

The results in Table 2.3 show that, as expected, the required sample size depends

on the number of classes, the number of indicators, the strength of the class-indicator

associations, and the class sizes. More specifically, keeping the other LC-specific design

factors constant, the larger the number of classes and the fewer the number of indicators,

the larger the required sample size to achieve the specified power level. The strength


of the class-indicator associations turns out to be one of the key factors affecting the

power; for example, to obtain a power of .80, we need at least 419 observations when

these associations are weak, but only 34 observations when these are strong. Moreover,

many more observations are required when the class sizes are unequal than when they are

equal; for example, to achieve a power of .95, we need approximately 130, 225, and 600

observations for the (0.334, 0.333, 0.333), (0.5, 0.3, 0.2), and (0.6, 0.3, 0.1) conditions,

respectively.

In summary, these results show that the strength of the class-indicator associations

and the class distribution have a much stronger impact on the power than the number

of classes and the number of indicator variables. The strength of the class-indicator
association is so important because it affects both the class

separation and the effect size. For example, for P = 6, C = 3, and equal class sizes,

when the θjt value changes from .9 to .7, the class separation drops from .880 to .332

and the difference between classes in their conditional response probabilities drops from

.8 to .4. Thus, a θjt value of .9 yields not only a much larger R-square value but also a

much larger effect size than a θjt value of .7. The class sizes are important because the

power of a test regarding differences between groups depends strongly on the size of the

smallest group.

2.4.3 Performance of the power computation procedure

An important question is whether the theoretical power computed using the formulae

presented in this paper agrees with the actual power when using the Wald test with empirical

data. To answer this question, we conducted a simulation study in which the theoretical

power is compared with the actual power in data sets generated from the assumed

population model. Note that the actual power equals the proportion of simulated data

sets in which the null hypothesis is rejected.

The population model is a 3-class LC model with six indicators and equal class sizes.

We varied the strength of the class-indicator associations (same three levels as above)


and the sample size (75, 100, 200, 300, 500, 700, and 1000). The actual power was

computed using 500 samples from the population under the alternative hypothesis. For

each of these samples, the LC model is estimated and it is checked whether the Wald

value for the test of interest exceeds the critical value.

Table 2.4: Theoretical and simulated power of the Wald test

Class-indicator                                  Sample size
associations    Method          75      100     200     300     500     700     1000
Weak            Theoretical    .200    .254    .470    .649    .869    .958    .994
                Simulated      .145    .234    .444    .628    .838    .920    .960
Medium          Theoretical    .762    .877    .995   1.000   1.000   1.000   1.000
                Simulated      .714    .848    .944    .992   1.000   1.000   1.000
Strong          Theoretical    .989    .999   1.000   1.000   1.000   1.000   1.000
                Simulated      .986   1.000   1.000   1.000   1.000   1.000   1.000

Note: The power presented here is for the null hypothesis H0: βj1 = βj2 = ... = βjC, with j = 1. Moreover, C = 3, P = 6, and class sizes are equal.

Table 2.4 presents the theoretical and actual power of the Wald test under the

investigated simulation conditions. As can be seen, both measures show the same overall

trend, namely that the power increases with increasing sample size and increasing effect

sizes (and class separation). However, the actual power of the Wald test is generally slightly

lower than its theoretical value, where the differences are larger for the smaller sample

size and the weaker class separation conditions. An explanation for these differences is

that the estimated asymptotic variance-covariance matrix used in the simulated power

computations overestimates the variability of the βj parameters. On the other hand,

substantive conclusions are the same for the simulated and theoretical power levels

reported in Table 2.4. With the small effect size and the corresponding weak class

separation condition, a sample size of 500 is needed to achieve a power of .8; with

the medium class separation, a sample size of 100 suffices; and with the strong class

separation, fewer than 75 observations are needed.


2.5 Discussion and conclusions

In LC analysis, the association between class membership and the response variables is

usually modeled using a logistic parametrization. This chapter dealt with power analysis

for Wald tests for these logit coefficients, for example, for the hypothesis of no association

between class membership and the response provided on one of the indicators. We showed

that, in addition to the usual design factors – that is, effect size, sample size, and level

of significance – the power of Wald tests in LC models depends on factors affecting the

amount of uncertainty about the subjects’ class memberships. More specifically, factors

affecting the class separation also affect the power. The most important of these LC-

specific design factors are the number of classes, the class sizes, the strength of the

class-indicator associations, and the number of indicator variables.

A numerical study was conducted to illustrate the proposed power and sample size

computation procedures. More precisely, it was shown how class separation – quantified

using the entropy-based R-square – is affected by the number of classes, the class sizes,

the strength of the class-indicator associations, and the number of indicator variables,

and, moreover, how class separation affects the power. It turned out that under the

most favorable conditions a sample size of 100 suffices to achieve a power of .8 or .9.

For the situation where the entropy-based R-square is small, a considerably larger sample

size is required. It was shown that under the least favorable conditions, even a sample

size of 2000 did not suffice to achieve an acceptable power level. This demonstrates the

importance of performing a power analysis prior to conducting a study that will make use

of LC analysis.

If power turns out to be too low given the planned sample size, instead of increasing

the sample size, one may try to increase the class separation, for example, by using a larger

number of indicators or, if possible, also by using indicators of a better quality. Note that

improving the quality of indicators has a dual effect on the power of the Wald test for

class-indicator associations: It increases both the effect size and the class separation. This

dual effect could be seen in our numerical study where we saw a dramatic reduction of


the required sample size when the θjt value increased from .7 to .9. In practice, improving

the quality of the indicators will not be easy, even in the type of confirmatory LC analyses

we were dealing with.

A simulation study was conducted to evaluate whether the theoretical power

corresponds with the actual power of the Wald test. It turns out that the estimated

power obtained with the formulae provided in this chapter is slightly larger than the

actual power, where we see a larger overestimation for smaller sample sizes and lower

power levels. This implies that to be on the safe side, to achieve the specified power, a

slightly larger sample size may be used than the estimated sample size.

In this paper, we restricted ourselves to power computations for Wald tests. However,

likelihood-ratio tests are often used in LC models as well, either for testing the same kinds
of hypotheses as discussed here or for comparing models with different numbers of latent

classes. Future research will focus on power computation for likelihood-ratio tests in LC

models.

Another limitation of the current work is that we restricted ourselves to simple LC

models. In future work, we will investigate whether the methods discussed in this paper

can be extended to more complex LC models, such as LC models with covariates, latent

Markov models, mixture growth models, and mixture regression models.

Most of the simulation studies on LC and mixture modelling show that larger sample

sizes may be needed than those found with the power computation method described in

the current paper (see for example, Nylund et al. (2007), Tofighi and Enders (2008), and

C. Yang (2006)). Those studies are, however, about deciding on the number of classes,

whereas here we focus on the class-indicator association for a single response variable

assuming that the number of classes is known. Note also that these studies typically do

not look at significance testing, but at the performance of measures like the Bayesian

information criterion (BIC), which may have less power because of their penalty for model

complexity. Further research is needed on the power of statistical tests for deciding about

the number of classes, for example, of the bootstrapped likelihood-ratio test.


2.6 Appendix

2.6.1 Elements of the information matrix in an LC model for binary

responses

The elements of the information matrix I(Ψ), with Ψ′ = (π′, β′), are equal to minus the
expected value of the second-order partial derivatives of the log-likelihood function defined
in equation (2.3) with respect to the free parameters, divided by the sample size.
In an LC model, these have the following form:

$$
I(\psi_l, \psi_q) = -E\left(\frac{\partial^2 l(\Psi)}{\partial\psi_l\,\partial\psi_q}\right)\Big/ n
= \sum_{i=1}^{2^P} \frac{\partial \log p(y_i,\Psi)}{\partial\psi_l}\,
\frac{\partial \log p(y_i,\Psi)}{\partial\psi_q}\, p(y_i,\Psi).
$$

This shows that the computation of the information matrix requires evaluating the
first-order partial derivatives $\partial \log p(y_i,\Psi)/\partial\psi_l$. For a class proportion πt and a
class-specific response logit βjt, these take on the following form:

$$
\frac{\partial \log p(y_i,\Psi)}{\partial\pi_t} = \frac{p(X = t|y_i)}{\pi_t} - \frac{p(X = C|y_i)}{\pi_C},
\qquad
\frac{\partial \log p(y_i,\Psi)}{\partial\beta_{jt}} = p(X = t|y_i)\,(y_{ij} - \theta_{jt}).
$$

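This yields the following forms for the entries of the sub-matrices I1, I2, I3, and I4:

$$
\begin{aligned}
I_1(\pi_t, \pi_s) &= \sum_{i=1}^{2^P}
\left(\frac{p(X = t|y_i)}{\pi_t} - \frac{p(X = C|y_i)}{\pi_C}\right)
\left(\frac{p(X = s|y_i)}{\pi_s} - \frac{p(X = C|y_i)}{\pi_C}\right) p(y_i,\Psi), \\
I_2(\pi_t, \beta_{jl}) &= \sum_{i=1}^{2^P}
\left(\frac{p(X = t|y_i)}{\pi_t} - \frac{p(X = C|y_i)}{\pi_C}\right)
p(X = l|y_i)\,(y_{ij} - \theta_{jl})\, p(y_i,\Psi), \\
I_3(\beta_{jq}, \pi_s) &= \sum_{i=1}^{2^P}
p(X = q|y_i)\,(y_{ij} - \theta_{jq})
\left(\frac{p(X = s|y_i)}{\pi_s} - \frac{p(X = C|y_i)}{\pi_C}\right) p(y_i,\Psi), \\
I_4(\beta_{jq}, \beta_{kl}) &= \sum_{i=1}^{2^P}
p(X = q|y_i)\,(y_{ij} - \theta_{jq})\, p(X = l|y_i)\,(y_{ik} - \theta_{kl})\, p(y_i,\Psi).
\end{aligned}
$$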

Note that $p(y_i,\Psi) = \sum_{t=1}^{C} \pi_t \prod_{j=1}^{P} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}$ is the probability of response
pattern $y_i$. Moreover, $p(X = t|y_i) = \pi_t\, p(Y = y_i|X = t) / p(y_i,\Psi)$ is the posterior class membership
probability, where $p(Y = y_i|X = t) = \prod_{j=1}^{P} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}$ is the joint class-specific
probability.
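The formulae above can be illustrated with a minimal Python sketch that enumerates all 2^P response patterns and accumulates the outer product of the score vector, weighted by the pattern probability. The function name `information_matrix`, the parameter ordering (first the C − 1 free class proportions, then the logits βjt), and the example values are our own assumptions for illustration; this is not the Latent GOLD implementation.

```python
import itertools
import math


def information_matrix(class_sizes, theta):
    """Per-observation expected information of a binary LC model (sketch).

    class_sizes: list of C class proportions summing to 1.
    theta[j][t]: P(Y_j = 1 | X = t) for indicator j and class t.
    Assumed parameter order: pi_1..pi_{C-1}, then beta_jt (j outer, t inner).
    """
    C, P = len(class_sizes), len(theta)
    n_par = (C - 1) + P * C
    info = [[0.0] * n_par for _ in range(n_par)]
    for y in itertools.product((0, 1), repeat=P):  # all 2^P response patterns
        # joint[t] = pi_t * prod_j theta_jt^y_j * (1 - theta_jt)^(1 - y_j)
        joint = [
            class_sizes[t]
            * math.prod(theta[j][t] if y[j] else 1.0 - theta[j][t] for j in range(P))
            for t in range(C)
        ]
        p_y = sum(joint)                   # p(y_i, Psi)
        post = [v / p_y for v in joint]    # posterior p(X = t | y_i)
        # First-order derivatives of log p(y_i, Psi) for each free parameter.
        score = [post[t] / class_sizes[t] - post[C - 1] / class_sizes[C - 1]
                 for t in range(C - 1)]
        score += [post[t] * (y[j] - theta[j][t]) for j in range(P) for t in range(C)]
        for a in range(n_par):
            for b in range(n_par):
                info[a][b] += score[a] * score[b] * p_y
    return info


# Hypothetical example: 2 equal-sized classes, 6 indicators, theta = .8 / .2.
info = information_matrix([0.5, 0.5], [[0.8, 0.2]] * 6)
print(len(info))  # number of free parameters: 1 + 6 * 2 = 13
```

Inverting this matrix gives the single-observation covariance matrix of the parameter estimates. The inversion is left out of the sketch because it is not needed to show the structure of the sub-matrices I1 to I4.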

2.6.2 An example of the Latent GOLD setup for Wald based power

computation

The Latent GOLD 5.0 (Vermunt & Magidson, 2013b) Syntax system implements the

power computation procedure described in this paper. In order to perform such a Wald

power computation, one should first create a small “example” data set; that is, a data

set with the structure of the data one is interested in. With six binary response variables

(y1 through y6), this file could be of the form:

y1 y2 y3 y4 y5 y6

0 0 0 0 0 0

which is basically a data set with a single observation with a response of 0 on all six

variables.

For this small data set, one defines the model of interest and requests the power or the

required sample size using the output options. This is done as follows using the Latent

GOLD “options”, “variables”, and “equations” sections:

    options
       output parameters standarderrors
          WaldPower=<number> WaldTest='fileName';
    variables
       dependent y1 2, y2 2, y3 2, y4 2, y5 2, y6 2;
       latent x nominal 2;
    equations
       x <- 1;
       y1 - y6 <- 1 | x;
       {0.0000000000
        1.386294361 -1.386294361
        1.386294361 -1.386294361
        1.386294361 -1.386294361
        1.386294361 -1.386294361
        1.386294361 -1.386294361
        1.386294361 -1.386294361}

In the “variables” section, we define the variables which are in the model and also

their number of categories. These are the six response variables and the latent variable

“x”. The “equations” section specifies the logit equations defining the model of interest,

as well as the values of the population parameters. Note that the value 1.386294361 for

a logit coefficient corresponds to a conditional response probability of .80.
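As a quick check of the correspondence between logit coefficients and conditional response probabilities, the conversion can be sketched in a few lines of Python. The helper names `logit` and `inv_logit` are ours and are not Latent GOLD keywords.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(b):
    """Probability corresponding to a logit coefficient b."""
    return 1.0 / (1.0 + math.exp(-b))


print(round(logit(0.80), 9))               # 1.386294361
print(round(inv_logit(-1.386294361), 2))   # 0.2
```

The negative logit in the second column of the parameter block thus corresponds to a conditional response probability of .20 in the other class.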

The “output” line in the “options” section lists the output requested. With

WaldPower=<number>, one requests a power or sample size computation. When using

a “number” between 0 and 1, the program reports the required sample size for that

power, and when using a value larger than 1, the program reports the power obtained

with that sample size. The optional statement WaldTest='fileName' can be used to define
user-specified Wald tests in addition to the tests that are provided by default. The linear

contrasts for the user-defined hypotheses of interest are defined in a text file.

CHAPTER 3

Statistical Power of Likelihood-Ratio and Wald Tests in Latent

Class Models with Covariates

Abstract

This chapter discusses power and sample size computation for the likelihood-ratio and

Wald tests used to test the significance of covariate effects in latent class models with

covariates. For both tests asymptotic distributions can be used; that is, the test statistic

can be assumed to follow a central chi-square under the null hypothesis and a non-central

chi-square under the alternative hypothesis. Power or sample size computation using these

asymptotic distributions requires specification of the non-centrality parameter which in

practice is rarely known. We show how to calculate this non-centrality parameter using a

large simulated data set, a data set generated according to the model under the alternative
hypothesis. Simulations are conducted to evaluate the adequacy of the proposed power
analysis methods, determine study design requirements for achieving a certain power level,
and compare the power of the likelihood-ratio and the Wald test. The proposed power
analysis methods turn out to perform very well over a broad range of conditions. Moreover,
an important factor affecting the power is the class separation, implying that when class
separation is low, rather large sample sizes are needed to achieve a reasonable power level.

This chapter is in preparation for submission to a journal.


3.1 Introduction

In recent years, latent class (LC) analysis has become part of the standard statistical

toolbox of researchers in the social, behavioral, and health sciences. A considerable

number of articles have been published in which LC models are used (a) to identify

subgroups of subjects with similar behaviors, attitudes, or preferences, and (b) to

investigate whether the respondents’ class memberships can be explained by explanatory

variables such as age, gender, educational status, and treatment. This latter type of use

is often referred to as LC analysis with covariates or concomitant variables. Example

applications include the assessment of the effect of maternal education on latent classes

differing in health behavior (Collins & Lanza, 2010), of education and age on latent classes

with different political orientations (Hagenaars & McCutcheon, 2002), of age on latent

classes of crime delinquencies (Van der Heijden et al., 1996), and of paternal occupation

on latent classes with different gender-role attitudes (Yamaguchi, 2000). Methodological

aspects of LC analysis with covariates were addressed by, among others, Bandeen-Roche,
Miglioretti, Zeger, and Rathouz (1997), Dayton and Macready (1988), Formann

(1992), and Vermunt (1996).

As in standard logistic regression analysis, hypotheses about the effects of covariates

on the individuals’ latent class memberships are tested using either likelihood-ratio (LR)

or Wald tests (Agresti, 2007). Researchers planning to perform such tests often ask

questions such as: “What sample size do I need to detect a covariate effect of a certain

size?”, “If I want to test the effect of a covariate, should I worry about the number

and/or quality of the indicators used in the LC model?”, and “Should I use an LR or a Wald
test?” These questions can be answered by assessing the statistical power of the planned

tests; that is, by investigating the probability of correctly rejecting a null hypothesis when

the alternative is true. The aim of the current paper is to present power analysis methods

for the LR and the Wald test in LC models with covariates, as well as to assess the

data requirements for achieving an acceptable power level (say of .8 or larger). We also

compare the power of the LR and the Wald test for a range of design and population


characteristics.

Recently, power and sample size determination in LC and related models have received

increased attention in the literature. Gudicha et al. (in press) studied the power of the

Wald test for hypotheses on the association between the latent classes and the observed

indicator variable(s), and showed that power is strongly dependent on class separation.

Tein, Coxe, and Cham (2013) and Dziak, Lanza, and Tan (2014) studied statistical power

of tests used for determining the number of latent classes in latent profile and LC analysis,

respectively. To the best of our knowledge, no previous study has yet investigated power

analysis for LC analysis with covariates, nor compared the power of the LR and the Wald

test in LC analysis in general.

Hypotheses concerning covariate effects on latent classes may be tested using either

LR or Wald tests, but it is unknown which of these two types of tests is superior in this

context. While the LR test is generally considered to be superior (see, for example, Agresti

(2007) and Williamson, Lin, Lyles, and Hightower (2007)), the computational cost of the

LR test will typically be larger because it requires fitting both the null hypothesis and

the alternative hypothesis model, while the Wald test requires fitting only the alternative

hypothesis model. Note that when using LR tests, a null hypothesis model should be

estimated for each of the covariates, which can become rather time consuming given the

iterative nature of the parameter estimation in LC models and the need to use multiple sets

of starting values to prevent local maxima. A question of interest though is whether the

superiority of the LR test is substantial enough to outweigh the computational advantages

of the Wald test in the context of LC modeling with covariates.

For standard logistic regression analysis, various studies are available on power and

sample size determination for LR and Wald tests (Demidenko, 2007; Faul, Erdfelder,

Buchner, & Lang, 2009; Hsieh, Bloch, & Larsen, 1998; Schoenfeld & Borenstein, 2005;

Whittemore, 1981; Williamson et al., 2007). Here we not only build upon these studies,

but also investigate design aspects requiring special consideration when applying these

tests in the context of LC analysis. A logistic regression predicting latent classes differs


from a standard logistic regression in that the outcome variable, the individual’s class

membership, is unobserved, but determined indirectly using the responses on a set of the

indicator variables. This implies that factors affecting the uncertainty about the class

memberships, such as the number of indicators, the quality of indicators, and the number

of latent classes, will also affect the power and/or the required sample size (Gudicha et
al., in press).

In the following sections, we introduce the LC model with covariates and discuss the LR
and Wald statistics for testing hypotheses about the logit parameters of interest, present
power computation methods for the LR and the Wald tests, and provide a numerical study
illustrating the proposed power analysis methods. The chapter ends with a discussion
and conclusions.

3.2 The LC model with covariates

Let X be the latent class variable, C the number of latent classes, and x = 1, 2, 3, ..., C

the class labels. We denote the vector of P indicator variables by Y = (Y1, Y2, Y3, ..., YP ),

and the response of subject i (for i = 1, 2, 3, ..., n) to a particular indicator variable by yij

and to all the P indicator variables by yi. Denoting the value of subject i for covariate

Zk (for k = 1, 2, 3, ..., K) by zik, we define the LC model with covariates as follows:

$$
p(y_i|z_i) = \sum_{x=1}^{C} p(X = x|Z = z_i) \prod_{j=1}^{P} p(Y_j = y_{ij}|X = x), \tag{3.1}
$$

where zi is the vector containing the scores of subject i on the K covariates. The term

p(X = x|Z = zi) represents the probability of belonging to class x given the covariate

values zi, and p(Yj = yij |X = x) is the conditional probability of choosing response yij

given membership of class x.

The LC model defined in equation (3.1) is based on the following assumptions. Firstly,

we assume that the latent classes are mutually exclusive and exhaustive; that is, each

individual is a member of one and only one of the C latent classes. The second assumption


is the local independence assumption, which specifies that the responses to the indicator

variables are independent given the class membership. For simplicity, we also assume that,

given the class membership, the covariates have no effect on the indicator variables.

The term p(X = x|Z = zi) in equation (3.1) is typically modeled by a multinomial

logistic regression equation (Magidson & Vermunt, 2004). Using the first class as the

reference category, we obtain:

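$$
p(X = x|Z = z_i) = \frac{\exp\left(\gamma_{0x} + \sum_{k=1}^{K} \gamma_{kx} z_{ik}\right)}
{1 + \sum_{s=2}^{C} \exp\left(\gamma_{0s} + \sum_{k=1}^{K} \gamma_{ks} z_{ik}\right)},
$$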

where γ0x represents an intercept parameter and γkx a covariate effect. For each covariate,

we have C − 1 effect parameters. Assuming that the responses Yj are binary, the logistic

model for p(Yj = 1|X = x) may take on the following form:

$$
p(Y_j = 1|X = x) = \frac{\exp(\beta_{jx})}{1 + \exp(\beta_{jx})}.
$$

The γ parameters are sometimes referred to as the structural parameters, and the β
parameters as the measurement parameters. We denote the full set of model parameters
by Φ, which with binary responses is a column vector containing (K + 1)(C − 1) + C · P
non-redundant parameters.
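To make the model definition concrete, the following Python sketch evaluates equation (3.1) for binary indicators, using the multinomial logit for the class memberships and the binary logit for the responses. The function names and all parameter values are hypothetical and chosen for illustration only; the first class is the reference category, so its γ values are zero.

```python
import itertools
import math


def class_probs(gamma0, gamma, z):
    """p(X = x | Z = z) under a multinomial logit with class 1 as reference.

    gamma0[x] is the intercept and gamma[x][k] the effect of covariate k for
    class x; the entries for the reference class (index 0) must be zero.
    """
    eta = [g0 + sum(g * zk for g, zk in zip(gs, z)) for g0, gs in zip(gamma0, gamma)]
    denom = sum(math.exp(e) for e in eta)  # equals 1 + sum over s >= 2
    return [math.exp(e) / denom for e in eta]


def pattern_prob(y, z, gamma0, gamma, beta):
    """p(y | z) of equation (3.1); beta[j][x] is the logit of P(Y_j = 1 | X = x)."""
    total = 0.0
    for x, p_x in enumerate(class_probs(gamma0, gamma, z)):
        lik = 1.0
        for j, y_j in enumerate(y):
            theta = 1.0 / (1.0 + math.exp(-beta[j][x]))
            lik *= theta if y_j == 1 else 1.0 - theta
        total += p_x * lik
    return total


# Hypothetical population: C = 2, P = 6, one covariate with effect 0.5.
beta = [[1.386294361, -1.386294361]] * 6
gamma0, gamma = [0.0, 0.0], [[0.0], [0.5]]
total = sum(pattern_prob(y, [1.0], gamma0, gamma, beta)
            for y in itertools.product((0, 1), repeat=6))
print(abs(total - 1.0) < 1e-12)  # the 64 pattern probabilities sum to one
```

Summing the logarithm of `pattern_prob` over the observed records gives the log-likelihood that is maximized in estimation.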

The parameters of the LC model with covariates are typically estimated by means of

maximum likelihood (ML) estimation, in which the log-likelihood function

$$
l(\Phi) = \sum_{i=1}^{n} \log p(y_i|z_i) \tag{3.2}
$$

is maximized using, for instance, the expectation maximization (EM) algorithm. Inference

concerning the Φ parameters is based on the ML estimates $\hat{\Phi}$, which can be used for

hypotheses testing or confidence interval estimation. In the current work, we focus on

testing hypotheses about the γ parameters, the most common of which is testing the

statistical significance for the effect of covariate k on the latent class memberships. The

corresponding null hypothesis can be formulated as


$$
H_0: \gamma_k = 0,
$$

which specifies that the γkx values in γ′k = (γk1, γk2, γk3, ..., γk(C−1)) are simultaneously
zero.² Using either the LR or the Wald test, this null hypothesis is
tested against the alternative hypothesis:

$$
H_1: \gamma_k \neq 0.
$$

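Following Agresti (2007) and Buse (1982), we define the LR and the Wald statistic for
this test as follows:

$$
\begin{aligned}
LR &= 2l(\hat{\Phi}_1) - 2l(\hat{\Phi}_0), \\
W &= \hat{\gamma}_k'\, \widehat{\mathrm{var}}(\hat{\gamma}_k)^{-1}\, \hat{\gamma}_k,
\end{aligned} \tag{3.3}
$$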

where l(·) is the log-likelihood function as defined in equation (3.2), $\hat{\Phi}_1$ and $\hat{\Phi}_0$ are the ML
estimates of Φ under the unconstrained alternative and constrained null model, respectively,
$\hat{\gamma}_k$ are the ML estimates of the logit coefficients of covariate Zk, and $\widehat{\mathrm{var}}(\hat{\gamma}_k)$ is the estimated C − 1
by C − 1 covariance matrix of $\hat{\gamma}_k$.

Probability theory for large samples suggests that, under certain regularity conditions,

if the null hypothesis holds, both the LR and W statistics asymptotically follow a central

chi-square distribution with C − 1 degrees of freedom (see for example Agresti (2007),

Buse (1982), and Wald (1943)). From this theoretical distribution, the p-value can

be obtained. Whether the null hypothesis should be rejected or retained is decided by

comparing the obtained p-value with the nominal type I error α. The decision rule is

to reject the null hypothesis if the p-value is smaller than α, or if the value of the test

statistic computed using equation (3.3) exceeds the critical value of the central chi-square

distribution that we obtain given C − 1 degrees of freedom and a type I error of α.
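The decision rule can be illustrated for the case C = 3, where the test has two degrees of freedom and the central chi-square distribution has the closed-form upper-tail probability exp(−x/2). The Python sketch below computes the Wald statistic of equation (3.3) for a two-element coefficient vector. The estimates, the covariance matrix, and the function names are hypothetical and serve only as an example.

```python
import math


def wald_statistic_2d(gamma, cov):
    """W = gamma' cov^{-1} gamma for a 2-vector and a 2 x 2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    g1, g2 = gamma
    return (g1 * (inv[0][0] * g1 + inv[0][1] * g2)
            + g2 * (inv[1][0] * g1 + inv[1][1] * g2))


def chi2_df2_critical(alpha):
    """(1 - alpha) quantile of the central chi-square with 2 df: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


def chi2_df2_pvalue(stat):
    """Upper-tail probability of the central chi-square with 2 df."""
    return math.exp(-stat / 2.0)


# Hypothetical estimates for one covariate in a 3-class model.
gamma_hat = (0.5, -0.3)
cov_hat = [[0.04, 0.01], [0.01, 0.09]]
W = wald_statistic_2d(gamma_hat, cov_hat)
reject = W > chi2_df2_critical(0.05)   # equivalent to p-value < alpha
print(round(W, 3), round(chi2_df2_pvalue(W), 4), reject)
```

For other values of C the closed form no longer applies, and the critical value has to be taken from a chi-square table or a numerical routine.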

² For parameter identification, the logit parameter associated with the reference category is set to
zero, resulting in C − 1 non-redundant γ effect parameters. Note also that γ′ denotes the transpose of
a column vector γ.


3.3 Power and sample size computations

For power or sample size computation, not only the distribution of the test statistic under

the null hypothesis needs to be obtained, but the distribution under the alternative hypothesis

as well. Under certain regularity conditions, if the alternative hypothesis holds, both the

LR and the Wald statistics follow a non-central chi-square distribution with C−1 degrees

of freedom and non-centrality parameter λ:

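$$
\begin{aligned}
\lambda_{LR_n} &= n \left(2E[l(\Phi_1)] - 2E[l(\Phi_0)]\right), \\
\lambda_{W_n} &= n \left(\gamma_k'\, \mathrm{var}(\gamma_k)^{-1}\, \gamma_k\right).
\end{aligned} \tag{3.4}
$$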

Here, E[l(Φ1)] and E[l(Φ0)] denote the expected value of the log-likelihood for a

single observation under the alternative and null model, respectively, assuming that the

alternative model holds. In the definition of $\lambda_{W_n}$, $\mathrm{var}(\gamma_k)$ is the matrix of parameter
covariances based on the expected information matrix for a single observation. For the
Wald test, this large-sample asymptotic approximation requires multivariate normality of
the ML estimates of the logit parameters, and that $\mathrm{var}(\gamma_k)$ is consistently estimated
by $\widehat{\mathrm{var}}(\hat{\gamma}_k)$ (Redner, 1981; Satorra & Saris, 1985; Wald, 1943).

The power of a test is defined as the probability that the null hypothesis is rejected

when the alternative hypothesis is true. Using the theoretical distribution of the LR and

Wald tests under the alternative hypothesis, we calculate this probability as

$$
\begin{aligned}
\mathrm{power}_{LR} &= p\left(LR > \chi^2_{(1-\alpha)}(C - 1)\right), \\
\mathrm{power}_{W} &= p\left(W > \chi^2_{(1-\alpha)}(C - 1)\right),
\end{aligned} \tag{3.5}
$$

where $\chi^2_{(1-\alpha)}(C - 1)$ is the (1 − α) quantile value of the central chi-square distribution
with C − 1 degrees of freedom, and LR and W are random variates of the corresponding
non-central chi-square distribution. That is, $LR, W \sim \chi^2(C - 1, \lambda)$, where λ is as defined
in equation (3.4).

Computing the asymptotic power (also called the theoretical power) using equation
(3.5) requires us to specify the non-centrality parameter. However, in practice, this
non-centrality parameter is rarely known. Below, we show how to obtain the non-centrality

parameter using a large simulated data set, that is, a data set generated from the model

under the alternative hypothesis.

3.3.1 Calculating the non-centrality parameter

O’Brien (1986) and Self, Mauritsen, and Ohara (1992) showed how to obtain the non-

centrality parameter for the LR statistic in log-linear analysis and generalized linear analysis

using a so-called “exemplary” data set representing the population under the alternative

model. In LC analysis with covariates, such an exemplary data set would contain one

record for each possible combination of indicator variable responses and covariate values,

with a weight equal to the likelihood of occurrence of the pattern concerned. Creating

such an exemplary data set becomes impractical with more than a few indicator variables,

with indicator variables with larger numbers of categories, and/or when one or more

continuous covariates are involved. As an alternative, we propose using a large simulated

data set from the population under the alternative hypothesis. Though such a simulated

data set will typically not include all possible response patterns, if it is large enough, it

will serve as a good approximation of the population under H1.

By analyzing the large simulated data set using the H0 and H1 models, we obtain

the values of the log-likelihood function under the null and alternative hypotheses. The

large data set can also be used to get the covariance matrix of the parameters based

on the expected information matrix. These can be used to calculate the non-centrality

parameters for the LR and Wald statistics as shown in equation (3.4). More specifically,

the non-centrality parameter is calculated, using this large simulated data set, via the

following simple steps:

1. Create a large data set by generating say N = 1000000 observations from the model

defined by the alternative hypothesis.

2. Using this large simulated data set, compute the maximum value of the log-


likelihood for both the constrained null model and the unconstrained alternative

model. These log-likelihood values are denoted by l(Φ0) and l(Φ1), respectively. For

the Wald test, use the large simulated data to approximate the expected information

matrix under the alternative model. This yields var(γk), the approximate covariance

matrix of γk.

3. The non-centrality parameter corresponding to a sample of size 1 is then computed

as follows:

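$$
\lambda_{LR_1} = \frac{2l(\Phi_1) - 2l(\Phi_0)}{N}
\quad \text{and} \quad
\lambda_{W_1} = \frac{\gamma_k'\, \mathrm{var}(\gamma_k)^{-1}\, \gamma_k}{N}
$$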

for the LR and Wald test, respectively. As can be seen, this involves computing

the LR and the Wald statistics using the information from step 2, and subsequently

rescaling the resulting values to a sample size of 1.

4. Using the proportionality relation between sample size and non-centrality parameter

as shown in equation (3.4), the non-centrality parameter associated with a sample

of size n is then computed as $\lambda_{LR_n} = n\lambda_{LR_1}$ and $\lambda_{W_n} = n\lambda_{W_1}$ (Brown, Lovato,
& Russell, 1999; McDonald & Marsh, 1990; Satorra & Saris, 1985).
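Steps 3 and 4 amount to simple rescaling, as the following Python sketch shows for the LR test. The two log-likelihood values are invented for illustration; in practice they would come from fitting the null and alternative models to the large simulated data set. The function names are ours.

```python
def lr_statistic(loglik_h1, loglik_h0):
    """LR statistic from the maximized log-likelihoods of the two models."""
    return 2.0 * loglik_h1 - 2.0 * loglik_h0


def per_observation(statistic, n_large):
    """Step 3: rescale a statistic from the large data set to a sample of size 1."""
    return statistic / n_large


def rescale(lambda_1, n):
    """Step 4: non-centrality parameter for a sample of size n."""
    return n * lambda_1


# Hypothetical log-likelihoods obtained with N = 1,000,000 simulated records.
N = 1_000_000
lr_large = lr_statistic(-3_912_345.6, -3_932_345.6)
lambda_1 = per_observation(lr_large, N)
lambda_200 = rescale(lambda_1, 200)
print(round(lambda_1, 6), round(lambda_200, 3))
```

The same two rescaling functions apply to the Wald statistic computed from the large data set.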

3.3.2 Power computation

The power computation itself proceeds as follows:

1. Given the assumed population values under the alternative hypothesis, compute the

non-centrality parameter λ using the large simulated data set as discussed in section

3.3.1. Make sure that the non-centrality parameter is rescaled to the sample size

under consideration as shown in step 4 in section 3.3.1.

2. For a given type I error α, read the (1 − α) quantile value from the (central) chi-
square distribution with C − 1 degrees of freedom. That is, find $\chi^2_{(1-\alpha)}(C - 1)$ such
that, under the null hypothesis, $p(LR > \chi^2_{(1-\alpha)}(C - 1)) = \alpha$ and
$p(W > \chi^2_{(1-\alpha)}(C - 1)) = \alpha$ for the LR and Wald test statistics, respectively.
This quantile, also called the critical value, can be read from the (central) chi-square
distribution table, which is available in most statistics textbooks. For example, for
α = .05 and C = 2, we have $\chi^2_{(.95)}(1) = 3.84$ (Agresti, 2007).

3. Using the non-centrality parameter value obtained in step 1 and the critical value

obtained in step 2, evaluate equation (3.5) to obtain the power of the LR or Wald

test of interest. This involves reading the probability concerned from a non-central

chi-square distribution with degrees of freedom C − 1 and non-centrality parameter

λ.
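The three steps can be sketched in Python with the standard library only. The sketch assumes that the non-centrality parameter λ for the sample size of interest is already available. It evaluates the central chi-square distribution through the series expansion of the regularized incomplete gamma function, and the non-central chi-square distribution as a Poisson-weighted mixture of central ones. All function names are ours, and a statistical library routine would normally be preferred.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), series expansion."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, df):
    """CDF of the central chi-square distribution."""
    return gamma_p(df / 2.0, x / 2.0)


def chi2_quantile(prob, df):
    """Step 2: critical value of the central chi-square, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_cdf(x, df, lam):
    """Non-central chi-square CDF as a Poisson(lam / 2) mixture of central CDFs."""
    if lam == 0.0:
        return chi2_cdf(x, df)
    half = lam / 2.0
    total = 0.0
    for j in range(int(half + 40.0 * math.sqrt(half) + 50.0)):
        log_weight = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_weight) * chi2_cdf(x, df + 2 * j)
    return total


def power(lam, df, alpha=0.05):
    """Step 3: probability that the test statistic exceeds the critical value."""
    return 1.0 - ncx2_cdf(chi2_quantile(1.0 - alpha, df), df, lam)


# Hypothetical input: lambda = 8 for a test with C - 1 = 1 degree of freedom.
print(round(chi2_quantile(0.95, 1), 2), round(power(8.0, 1), 3))
```

With λ = 0 the function returns the nominal type I error α, which is a convenient sanity check.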

3.3.3 Sample size computation

The expression for sample size computation can be derived from the relation in equation

(3.4):

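$$n_{LR} = \lambda \left\{2E[l(\Phi_1)] - 2E[l(\Phi_0)]\right\}^{-1}, \qquad n_W = \lambda \left[\gamma_k' \, \mathrm{var}(\gamma_k)^{-1} \gamma_k\right]^{-1}, \tag{3.6}$$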

where nLR and nW are the LR and Wald sample size, respectively.

Using equation (3.6), the sample size required to achieve a specified level of power is

computed as follows:

1. For a given value of α, read the (1− α) quantile value from the central chi-square

distribution table.

2. For a given power and the critical value obtained in step 1, find the non-centrality parameter λ such that, under the alternative hypothesis, the condition that the power is equal to $p(LR > \chi^2_{(1-\alpha)}(C-1))$ for the LR statistic and $p(W > \chi^2_{(1-\alpha)}(C-1))$ for the Wald statistic is satisfied.

3. Given the parameter values of the model under the alternative hypothesis and the

λ value obtained in step 2, use equation (3.6) to compute the required sample size.


Note that for sample size computation a large simulated data set is used as well to

approximate E[l(Φ0)], E[l(Φ1)], and var(γ).
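The sample size steps can be sketched as follows. This is a hedged illustration rather than the implementation used in this study: the non-centrality parameter that yields the requested power is found by bisection, the chi-square routines are simple standard-library stand-ins for a statistical library, and lambda_1 = 0.005 is a hypothetical per-observation non-centrality (the inverse of the bracketed terms in equation (3.6)).

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-16:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def ncx2_sf(x, df, lam):
    """P(X > x) for a non-central chi-square, as a Poisson(lam / 2) mixture."""
    if lam <= 0:
        return 1.0 - chi2_cdf(x, df)
    half = lam / 2.0
    cdf, mass, j = 0.0, 0.0, 0
    while mass < 1.0 - 1e-12 and j < 10000:
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        cdf += w * chi2_cdf(x, df + 2 * j)
        mass += w
        j += 1
    return 1.0 - cdf


def solve_increasing(f, target, lo, hi, iters=100):
    """Bisection for f(x) = target, assuming f is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def required_sample_size(target_power, lambda_1, df, alpha=0.05):
    # Step 1: critical value of the central chi-square distribution.
    crit = solve_increasing(lambda x: chi2_cdf(x, df), 1.0 - alpha, 0.0, 200.0)
    # Step 2: non-centrality parameter that gives the requested power.
    lam = solve_increasing(lambda l: ncx2_sf(crit, df, l), target_power, 0.0, 200.0)
    # Step 3: equation (3.6), n = lambda / lambda_1, rounded up to a whole case.
    return math.ceil(lam / lambda_1), lam


n_needed, lam_needed = required_sample_size(0.80, lambda_1=0.005, df=1)
```

Rounding up ensures that the achieved power is at least the requested level; the search interval of 0 to 200 for λ is an assumption that comfortably covers the power levels considered here.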

3.4 Numerical study

The purpose of this numerical study is to 1) compare the power of the Wald test with the

power of the LR test, 2) investigate the effect of factors influencing the uncertainty about

the individuals' class memberships (mainly the measurement parameters) on the power of

the Wald and LR tests concerning the structural parameters, 3) evaluate the quality of

the power estimation using the non-centrality parameter value obtained from the large

simulated data set, and 4) give an overview of the sample sizes required to achieve a power

level of .8 or higher, .9 or higher, or .95 or higher in several typical study designs. In the

current numerical study, we consider models with one covariate only, but the proposed

methods are also applicable with multiple covariates. We assume asymptotic distributions

for both tests, and estimate the non-centrality parameter of the non-central chi-square

distribution using the large data set method described earlier. All analyses were done using

the syntax module of the Latent GOLD 5.0 program (Vermunt & Magidson, 2013a).

3.4.1 Study set up

The power of a test concerning the structural parameters is expected to depend on three

key factors: the population structure and the parameter values for the other parts of

the model, the effect sizes for the structural parameters to be tested, and the sample

size. Important elements of the first factor include the number of classes, the number

of indicator variables, the class-specific conditional response probabilities, and the class

proportions (Gudicha et al., in press). In this numerical study, we varied the number of

classes (C = 2 or 3) and the number of indicator variables (P = 6 or 10). Moreover, the

class-specific conditional response probabilities were set to 0.7, 0.8, or 0.9 (or, depending

on the class, to 1-0.7, 1-0.8, and 1-0.9), corresponding to the conditions with weak,

medium, and strong class-indicator associations. The conditional response probabilities


were assumed to be high for class 1, say 0.8, and low for class C, say 1-0.8, for all

indicators. In class 2 of the three-class model, the conditional response probabilities are

high for the first half and low for the second half of the indicators.

The effect size was varied for the structural parameters to be tested, that is, for the

logit coefficients that specify the effect of a continuous covariate Z on the latent class

memberships (see equation (3.2) above). Using the first class as the reference category,

the logit coefficients were set to 0.15, 0.25, and 0.5, representing the three conditions of

small, medium, and large effect sizes. Two conditions were used for the intercept terms:

in the zero intercept condition, the intercepts were set to zero for both C = 2 and C = 3,

while in the non-zero intercept condition the intercepts equaled -1.10 for C = 2, and -1.10

and -2.20 for C = 3. Note that the zero intercept condition yields equal class proportions

(i.e., .5 each for C = 2 and .33 each for C = 3), whereas the non-zero intercept condition

yields unequal class proportions (i.e., .75 and .25 for C = 2, and .69, .23, and .08 for

C = 3).
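The class proportions quoted for the two intercept conditions can be checked with a short calculation. The sketch below assumes the usual multinomial logit form with the first class as reference and evaluates it at a covariate value of zero; the function name and its arguments are illustrative, not part of the original analysis.

```python
import math


def class_proportions(intercepts, slopes=None, z=0.0):
    """Class membership probabilities from a multinomial logit model.

    intercepts, slopes: values for classes 2..C; class 1 is the reference
    category, so its intercept and slope are fixed at zero.
    """
    slopes = slopes if slopes is not None else [0.0] * len(intercepts)
    logits = [0.0] + [a + b * z for a, b in zip(intercepts, slopes)]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]


equal_two = class_proportions([0.0])              # zero intercept, C = 2
unequal_two = class_proportions([-1.10])          # non-zero intercept, C = 2
unequal_three = class_proportions([-1.10, -2.20])  # non-zero intercepts, C = 3
```

Passing non-zero `slopes` and a value of `z` shows how a covariate shifts these proportions for an individual, which is the effect the tests in this chapter are designed to detect.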

In addition to the above mentioned population characteristics, we varied the sample

size (n = 200, 500, or 1000) for the power computations. Likewise, for the sample size

computations, we varied the power values (power = .8, .9, or .95). The type I error was

fixed to .05 in all conditions.

Gudicha et al. (in press) showed that a study design with low separation between

classes leads to low statistical power of tests concerning the measurement parameters

in a LC model. Therefore, Table 3.1 shows the entropy R-square, which measures the

separation between classes for the design conditions of interest.

Table 3.1: The computed entropy R-square for different design cells

                   equal class proportions         unequal class proportions
                   class-indicator associations    class-indicator associations
                   weak    medium  strong          weak    medium  strong
C = 2   P = 6      .574    .855    .981            .534    .838    .978
C = 2   P = 10     .732    .935    .997            .704    .944    .998
C = 3   P = 6      .354    .650    .900            .314    .618    .878
C = 3   P = 10     .502    .805    .969            .462    .782    .963

Note. C = the number of classes; P = number of indicator variables. The entropy R-square values reported in this table pertain to the model with small effect sizes for the covariate effects, and these entropy R-square values slightly increase for the case when we have larger effect sizes.

3.4.2 Results

Tables 3.2, 3.3, and 3.4 present the power of the Wald and LR tests for different sample

sizes, class-indicator associations, number of indicator variables, class proportions, and

effect sizes. Several important points can be noted from these tables. Firstly, the power

of the Wald and LR tests increases with sample size and effect size, which is also the

case for standard statistical models (e.g., logistic regression for an observed outcome

variable). Secondly, specific to LC models, the power of these tests is larger with stronger

class-indicator associations, a larger number of indicator variables, and more balanced

class proportions. These LC specific factors affect the class separations as well, as can be

seen from Table 3.1. Comparing the power values in Tables 3.2 and 3.3, we also observe

that the statistical power of the tests depends on the number of classes as well. Thirdly,

the power of the LR test is generally larger than that of the Wald test, though in most cases

differences are rather small.

The results in Tables 3.2, 3.3, and 3.4 suggest that, for a given effect size, a desired

power level of say .8 or higher can be achieved by using a larger sample, more indicator

variables, or, if possible, indicator variables that have a stronger association with the

respective latent classes. Given a set of often unchangeable population characteristics

(e.g., the class proportions, the class conditional response probabilities, and the effect

sizes of the covariate effects on latent class memberships), the most common research

practice to increase the power of a test is to increase the sample size. Table 3.5 presents

the required sample size for the Wald test to achieve a power of .8, .9, and .95 under

the investigated conditions. As can be seen from Table 3.5, for the situation where the

class proportions are equal, the number of response variables is equal to 6, the number

of classes is equal to 2, and the class-indicator associations are strong, a power of 0.80 or

higher is achieved 1) for a small effect size, using a sample of size 1434, 2) for a medium


effect size, using a sample of size 527, and 3) for a large effect size, using a sample of size

143. When the class-indicator associations are weak, the class proportions are unequal,

or the requested power is .9, the required samples become even larger. We also observe

from the same table that in 3-class LC models with 6 indicator variables and strong class-

indicator associations, a power of .80 or higher is achieved by using sample sizes of 2120,

777, and 210 for small, medium and large effect sizes, respectively.

To assess the accuracy of the proposed power analysis method, we also calculated the

empirical power by Monte Carlo simulation. Using the critical value from the theoretical

central chi-square distribution, we computed the empirical power as the proportion of the

5000 samples generated from the population under the alternative hypothesis in which

the null hypothesis was rejected. In Table 3.6, we refer to this empirical power as 'LR empirical' and 'Wald

empirical', indicating the power values computed from the empirical distribution of the

LR and Wald statistics under the alternative hypothesis. We report results for the study

conditions with a small effect size and equal class proportions, but similar results were

obtained for the other conditions. Comparison of the theoretical with the corresponding

empirical power values shows that these are very close in most cases, meaning that the

approximation of the non-centrality parameter using the large simulated data set works

well. Overall, the differences between the theoretical and empirical power values are small,

with a few exceptions, which are situations in which the power is very low anyhow. The

exceptions occur when the class-indicator associations are weak in 2-class LC models with

6 indicator variables and in 3-class LC models with 6 as well as 10 indicator variables,

which in Table 3.1 correspond to the design conditions with entropy R-square values of

.574, .345, and .502, respectively.
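The empirical power calculation itself is a simple proportion, and its Monte Carlo error follows from the binomial variance. The sketch below illustrates this under a strong simplifying assumption: instead of fitting the LC model to each generated sample, as was done in this study, the test statistics are drawn directly from a non-central chi-square distribution with 1 degree of freedom, generated as (Z + delta)^2. The seed and the value of delta are hypothetical.

```python
import math
import random


def empirical_power(statistics, critical_value):
    """Proportion of simulated test statistics that exceed the critical value."""
    return sum(s > critical_value for s in statistics) / len(statistics)


def mc_standard_error(p_hat, replications):
    """Binomial standard error of a proportion estimated from independent replications."""
    return math.sqrt(p_hat * (1.0 - p_hat) / replications)


# Stand-in for the LR or Wald statistics of 5000 samples generated under H1
# (assumption): non-central chi-square draws with 1 df and lambda = delta ** 2.
rng = random.Random(2015)
delta = math.sqrt(7.849)  # theoretical power of about .80 at alpha = .05
stats = [(rng.gauss(0.0, 1.0) + delta) ** 2 for _ in range(5000)]

power_hat = empirical_power(stats, critical_value=3.84)
se_hat = mc_standard_error(power_hat, 5000)
```

With 5000 replications the standard error of an empirical power near .80 is below .006, so differences between theoretical and empirical power of that order are within simulation noise.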


Table 3.2: The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 2-class model; the case of equal class proportions

                   n = 200                 n = 500                 n = 1000
                   class-indicator         class-indicator         class-indicator
                   associations            associations            associations
effect size  test  weak    medium  strong  weak    medium  strong  weak    medium  strong

Six indicator variables
small        Wald  .125    .164    .181    .242    .338    .379    .429    .587    .645
small        LR    .126    .166    .180    .245    .343    .377    .434    .594    .645
medium       Wald  .269    .363    .408    .546    .721    .779    .835    .945    .971
medium       LR    .260    .369    .411    .548    .729    .784    .836    .953    .973
large        Wald  .702    .868    .913    .976    .998    1       1       1       1
large        LR    .743    .885    .923    .985    .998    1       1       1       1

Ten indicator variables
small        Wald  .147    .177    .184    .297    .369    .385    .523    .633    .655
small        LR    .151    .176    .181    .307    .367    .380    .539    .63     .647
medium       Wald  .319    .397    .412    .653    .766    .786    .914    .967    .974
medium       LR    .315    .402    .422    .647    .773    .796    .91     .969    .976
large        Wald  .812    .903    .917    .994    .999    .999    1       1       1
large        LR    .837    .918    .930    .996    .999    .999    1       1       1

Note. The power values reported in this table are obtained by assuming theoretical chi-square distributions for both the Wald and the likelihood-ratio test statistics, for which the non-centrality parameter of the non-central chi-square is approximated using a large simulated data set.


Table 3.3: The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 3-class model; the case of equal class proportions

                   n = 200                 n = 500                 n = 1000
                   class-indicator         class-indicator         class-indicator
                   associations            associations            associations
effect size  test  weak    medium  strong  weak    medium  strong  weak    medium  strong

Six indicator variables
small        Wald  .081    .106    .125    .131    .200    .252    .222    .365    .464
small        LR    .080    .108    .126    .130    .206    .255    .221    .377    .471
medium       Wald  .135    .214    .272    .281    .478    .599    .517    .789    .894
medium       LR    .140    .215    .272    .295    .48     .600    .540    .792    .894
large        Wald  .365    .642    .779    .752    .967    .994    .968    1       1
large        LR    .436    .686    .810    .837    .978    .996    .989    1       1

Ten indicator variables
small        Wald  .089    .118    .130    .155    .233    .265    .272    .430    .49
small        LR    .092    .119    .133    .163    .236    .274    .289    .436    .504
medium       Wald  .163    .252    .287    .353    .559    .628    .632    .864    .913
medium       LR    .178    .263    .290    .391    .583    .632    .686    .882    .915
large        Wald  .471    .738    .807    .871    .989    .996    .994    1       1
large        LR    .571    .772    .823    .938    .993    .997    .999    1       1

Note. The power values reported in this table are obtained by assuming theoretical chi-square distributions for both the Wald and the likelihood-ratio test statistics, for which the non-centrality parameter of the non-central chi-square is approximated using a large simulated data set.


Table 3.4: The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership; the case of unequal class proportions and six indicator variables

                   n = 200                 n = 500                 n = 1000
                   class-indicator         class-indicator         class-indicator
                   associations            associations            associations
effect size  test  weak    medium  strong  weak    medium  strong  weak    medium  strong

2-class model
small        Wald  .102    .133    .148    .183    .263    .299    .319    .465    .525
small        LR    .103    .136    .153    .185    .268    .312    .322    .475    .547
medium       Wald  .195    .283    .322    .411    .590    .658    .688    .872    .918
medium       LR    .197    .282    .331    .414    .590    .674    .693    .871    .926
large        Wald  .549    .761    .826    .909    .988    .996    .995    1       1
large        LR    .590    .783    .844    .933    .991    .997    .998    1       1

3-class model
small        Wald  .077    .100    .120    .120    .185    .238    .198    .334    .439
small        LR    .076    .101    .121    .119    .188    .242    .197    .34     .447
medium       Wald  .125    .197    .257    .253    .439    .570    .467    .746    .873
medium       LR    .127    .208    .267    .257    .465    .593    .474    .775    .889
large        Wald  .337    .600    .751    .712    .951    .990    .945    .999    1
large        LR    .387    .641    .785    .782    .966    .994    .977    1       1

Note. The power values reported in this table are obtained by assuming theoretical chi-square distributions for both the Wald and the likelihood-ratio test statistics, for which the non-centrality parameter of the non-central chi-square is approximated using a large simulated data set.


Table 3.5: Sample size requirements for the Wald test when testing the covariate effect on class memberships for different power levels, class-indicator associations, number of indicator variables, number of classes, class proportions, and effect sizes.

             power = .8              power = .9              power = .95
             class-indicator         class-indicator         class-indicator
             associations            associations            associations
effect size  weak    medium  strong  weak    medium  strong  weak    medium  strong

2-class model with equal class proportions and six indicator variables
small        2473    1652    1434    3312    2210    1925    4097    2734    2380
medium       916     606     527     1210    811     705     1509    1003    872
large        253     165     143     338     221     191     418     273     236

2-class model with equal class proportions and ten indicator variables
small        1929    1485    1412    2582    1988    1891    3193    2458    2338
medium       709     544     518     949     729     693     1173    901     857
large        194     148     140     260     198     188     321     245     232

2-class model with unequal class proportions and six indicator variables
small        3544    2241    1916    4745    3000    2566    5868    3710    3173
medium       1306    811     700     1749    1098    937     2163    1357    1159
large        362     221     187     484     295     250     599     365     310

3-class model with equal class proportions and six indicator variables
small        4922    2785    2120    6464    3657    2786    7888    4463    3400
medium       1869    1025    777     2454    1347    1020    2995    1644    1245
large        558     283     210     733     372     276     895     454     337


Table 3.6: Theoretical versus empirical (H1-simulated) power values for the Wald and likelihood-ratio tests to reject the null hypothesis that the covariate has no effect on class membership, given the design conditions of interest

                   n = 200                 n = 1000
                   class-indicator         class-indicator
                   associations            associations
                   weak    medium  strong  weak    medium  strong

2-class model with six indicator variables
Wald theoretical   .125    .164    .181    .429    .587    .645
Wald empirical     .131    .156    .176    .429    .584    .648
LR theoretical     .126    .166    .180    .434    .594    .645
LR empirical       .138    .177    .182    .432    .58     .648

2-class model with ten indicator variables
Wald theoretical   .147    .177    .184    .523    .633    .655
Wald empirical     .138    .175    .196    .513    .632    .652
LR theoretical     .151    .176    .181    .539    .63     .647
LR empirical       .150    .179    .189    .537    .638    .665

3-class model with six indicator variables
Wald theoretical   .081    .106    .125    .222    .365    .464
Wald empirical     .187    .134    .123    .223    .368    .454
LR theoretical     .08     .108    .126    .221    .377    .471
LR empirical       .238    .146    .134    .267    .374    .456

3-class model with ten indicator variables
Wald theoretical   .089    .118    .130    .272    .430    .490
Wald empirical     .169    .118    .118    .283    .426    .508
LR theoretical     .092    .119    .133    .289    .436    .504
LR empirical       .161    .133    .134    .286    .443    .493

Note. The power values reported in this table are for the study design conditions with small effect size and equal class proportions.



Hypotheses concerning the covariate effects on latent class membership are tested using

a LR test or a Wald test. In the current study, we presented and evaluated a power

analysis procedure for the LR and the Wald test in latent class analysis with covariates.

We discussed how the non-centrality parameter involved in the asymptotic distributions

of the test statistics can be calculated using a large simulated data set, and how the value

of the obtained non-centrality parameter can subsequently be used in the computation of

the asymptotic power or the sample size. The proposed method requires us to specify the

population values under the alternative hypothesis, as is typical in power computation.

A numerical study was conducted to study how data and population characteristics

affect the power of the LR test and the Wald test, to compare the power of the two

tests, and to evaluate the adequacy of the proposed power analysis method. The results

of this numerical study showed that, as in any other statistical model, the power of both

tests depends on sample size and effect size. In addition to these standard factors, the

power of the investigated tests depends on factors specific to latent class models, such as

the number of indicator variables, the number of classes, the class proportions, and the

strength of the class-indicator associations. These latent class specific factors affect the

separation between the classes, which we assessed using the entropy R-square value.

We saw that the sample size required to achieve a certain level of power depends

strongly on the latent class specific factors. The stronger the class-indicator variable

associations, the more indicator variables, the more balanced the class proportions, and

the smaller the number of latent classes, the smaller the sample size

needed to detect a certain effect size with a power of, say, .8 or higher. We can describe

the same finding in terms of the entropy R-square, that is, the larger the entropy R-

square, the smaller the sample size needed to detect a certain effect size with a power of

say .8 or higher. A more detailed finding is that for a given effect size, the improvement in

power obtained through adding indicator variables is more pronounced when class-indicator

associations are weak or medium than when they are strong.


In line with previous studies (see, for example, Williamson et al., 2007), the

power for the LR test is larger than for the Wald test, though the difference between

the two tests is rather small. An advantage of the Wald test is, however, that it is

computationally cheaper. Given the population values under the alternative hypothesis

and the corresponding non-centrality parameter, the sample size for the Wald test can be

computed using equation (3.6) directly. When using the LR test, the log-likelihood values

under both the null hypothesis and the alternative hypothesis must be computed, which

can be somewhat cumbersome when a model contains multiple covariates.

The adequacy of the proposed power analysis method was evaluated by comparing

the asymptotic power values with the empirical ones. The results indicated that the

performance of the proposed method is generally good. In the study design condition for

which the entropy R-square is low (this occurs when few indicator variables with weak

associations with the latent classes are used) and the sample size is small, the empirical

power seemed to be larger than the asymptotic power. But these were situations in which

the power turned out to be very low anyhow.

We presented the large data set power analysis method for a simple LC model with

cross-sectional data, but the same method may be applied with LC models for longitudinal

and multilevel data. Moreover, although the simulations in the current paper were

performed with a single covariate, it is expected that increasing the number of covariates

to two or more would improve the entropy R-square and therefore also the power. The

method may also be generalized to the so-called three-step approach for the analysis

of covariate effects on LC memberships (Bakk, Tekle, & Vermunt, 2013; Gudicha &

Vermunt, 2013; Vermunt, 2010a).

This research has several practical implications. Firstly, it provides an overview of

the design requirements for achieving a certain acceptable level of power in LC analysis

with a covariate affecting class memberships. Secondly, it presents a tool for determining

the required sample size given the specific research design that a researcher has in mind

instead of relying on a rule of thumb. Based on the literature and on the results of our


study, we can conclude that easy rules of thumb, such as "a sample size of 500 suffices

when the number of indicator variables is 6", cannot be formulated for LC analysis.

CHAPTER 4

Power Computation for Likelihood-Ratio Tests for the

Transition Parameters in Latent Markov Models

Abstract

Latent Markov (LM) models are increasingly used in a wide range of research areas

including psychological, sociological, educational, and medical sciences. Methods to

perform power computations are lacking, however. This chapter presents methods to

perform power analysis in LM models. Two types of hypotheses about the transition

parameters in LM models are considered. The first concerns the situation where the

likelihood-ratio test statistic follows a chi-square distribution, implying that the power

computation can also be based on this theoretical distribution. In the second case, power needs

to be computed based on empirical distributions constructed via Monte Carlo methods.

Numerical studies are conducted to illustrate the proposed power computation methods

and to investigate design factors affecting the power of this test.

This chapter has been accepted for publication as: Gudicha, D. W., Schmittmann, V. D., & Vermunt, J. K. (2015). Power Computation for Likelihood-Ratio Tests for the Transition Parameters in Latent Markov Models. Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2015.1014040.


4.1 Introduction

Models involving latent classes are receiving increasing interest from applied researchers,

not only for the analysis of cross-sectional data but also in longitudinal studies, in which

respondents are assumed to switch between classes during the period of observation. The

occurrence of these transitions between latent classes (also called latent states) can be

studied by using latent Markov (LM) models, which are also referred to as hidden Markov

models or latent transition models (Collins & Wugalter, 1992; Poulsen, 1990; Rabiner,

1989; Van de Pol & De Leeuw, 1986; Visser, Raijmakers, & Molenaar, 2002).

This growing interest in LM models is fueled by both the progress that has been

achieved in extending the basic model (e.g., Wiggins (1973)) and the development of

various statistical packages for analyzing data using the LM models. Extensions to the

basic model include the use of time-constant and/or time-varying covariates (Chung,

Park, & Lanza, 2005; Reboussin et al., 1998; Vermunt et al., 1999), multiple response

variables (Bartolucci, 2006; Langeheine & Van de Pol, 1993; Wall & Li, 2009), and

grouping variable(s) (Collins & Lanza, 2010). These extensions, together with the growing

number of statistical packages (e.g., Latent GOLD (Vermunt & Magidson, 2013a), Mplus

(L. Muthen & Muthen, 1998-2007), the R package depmixS4 (Visser & Speekenbrink,

2010), and the SAS procedure PROC LTA (Lanza & Collins, 2008)) make it possible to

successfully apply LM models to many practical problems in longitudinal studies.

Despite these developments, methods to perform power computation in LM models

have received no attention in the methodological literature, as far as we know. In

many applications of LM models, hypotheses are typically tested using the likelihood-

ratio (LR) test without addressing power issues. Computing the power of tests (i.e., the

probability that the test rejects the null hypothesis when it is false) is, however, extremely

important for various reasons. When planning a study, power computation can help to

make an informed decision on the sample size or the number of measurement occasions

required to achieve a pre-specified power level for the tests of interest. When testing a

particular hypothesis, power calculation assesses the ability of a test to detect a statistically


meaningful effect when indeed there is such an effect in the population. This is of interest

when we wish to determine the usefulness of a test.

To perform a power calculation in LM models, we not only need to take into account

the sample size, effect size, and the level of significance, but also several other design

factors. For instance, in the latent class model, which can be conceived of as a special

case of the LM model, Gudicha et al. (in press) showed that a test can be underpowered

when associations between latent classes and response variables are weak, that is, if the

latent classes are poorly separated. See also Tein et al. (2013) who discussed statistical

power to detect the number of clusters in latent profile analysis. In LM models, also

the number of measurement occasions and the transition probabilities are expected to

affect the power. The objective of this chapter is twofold: to provide power computation

methods for hypotheses regarding the parameters of LM models and to identify design

factors that affect the power.

In general, two kinds of statistical tests are of interest when using LM models. The

first kind pertains to hypotheses about the number of latent states (e.g., the test of a

model with three latent states against a model with two latent states). The second kind

concerns hypotheses for the parameters of the LM model, for example, for the transition

probabilities. In this chapter we focus on the latter type of test. More specifically,

we assume that the number of states is known, and focus on equality and fixed value

hypotheses for the model parameters. These include hypotheses stating that transition

probabilities are constant across time points, that certain transition probabilities are equal

to zero, or that transition probabilities are equal across two groups.

As we explain in detail below, for certain hypotheses on model parameters the standard

asymptotic results for the LR hold, implying that power computation can be based on

asymptotic distributions. For other hypotheses or, more specifically, for hypotheses stating

that probabilities are equal to zero, these asymptotic results do not hold (Bartolucci,

2006). For this non-standard situation in which asymptotic distributions cannot be used

for power computation, we propose constructing the empirical distribution of the LR


statistic via Monte Carlo (MC) methods. Hereafter, we refer to the former and latter

situations as power computation under the standard and non-standard case, respectively.

The remainder of the chapter is organized as follows. We first introduce the LM model

and present examples of hypotheses that can be specified on the transition parameters of

this model. We then briefly explain the LR test and its asymptotic properties. Next, power

computation is presented for both the standard and the non-standard case. In addition,

we describe the design factors that affect the power of the LR test. We also present a

numerical study to illustrate the proposed power computation methods, and to examine

design configurations with acceptable power levels. In the final section, we provide a

discussion of the different power computation methods, as well as recommendations for

applied researchers and suggestions for future methodological studies.

4.2 The LM model

The LM model is a probability-based model in which the observed response

patterns at a given time point are related to the latent states producing these response

patterns, analogous to the latent class model. In addition, the probability of being in a

particular state at the current time point depends on the latent state of the previous time

point. The model has two sub-parts. The first sub-part, the measurement model, relates

the latent variables to the observed response variables. The second sub-part, the Markov

model, describes the probabilities of switching between latent states over time. The latter

model applies Markovian chains to account for the dependence between the latent states

at successive measurement occasions.

The LM model relies on two assumptions. The first is the local independence

assumption, which implies that the observed response patterns produced at time t depend

only on the current state. The second is the first-order Markov assumption, which implies

that the state occupied at time point t depends only on the state occupied at time point

t − 1 (Bartolucci, 2006; Vermunt et al., 1999). These two assumptions concern the measurement model and the Markov model, respectively. Below, we first introduce some notation and then present the LM model.

Let $y_{itj}$ be the response of subject $i$ to the $j$th response variable measured at occasion $t$, for $i = 1, 2, 3, \ldots, n$, $j = 1, 2, 3, \ldots, P$, and $t = 1, 2, 3, \ldots, T$. We denote the vector of responses for subject $i$ at occasion $t$ by $y_{it}$, and the vector of responses at all occasions by $y_i$. Let us denote a discrete latent state at time point $t$ by $X_t$ and its possible value by $x_t$, where $x_t = 1, 2, 3, \ldots, C$. Then the probability of observing the response pattern $y_i$ can be defined as

$$
p(y_i, \Phi) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_T}
\overbrace{p(X_1 = x_1)}^{\text{initial state probabilities}}
\prod_{t} \overbrace{p(X_t = x_t \mid X_{t-1} = x_{t-1})}^{\text{transition probabilities}}
\prod_{j} \underbrace{p(y_{itj} \mid X_t = x_t)}_{\text{conditional response probabilities}}
\tag{4.1}
$$

where Φ is the vector of model parameters.
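To make equation (4.1) concrete, the following sketch evaluates the probability of one response pattern for a small LM model with dichotomous items in two ways: by summing over all latent paths exactly as the equation is written, and with the forward recursion, which yields the same value without enumerating every path. It assumes that the transition product runs over t = 2, ..., T and that the response probabilities are constant over time; the function names and all parameter values are illustrative and not taken from the text.

```python
from itertools import product

def response_prob(y_t, state, resp):
    """p(y_it | X_t = state) under local independence; resp[state][j] = p(y_j = 1 | state)."""
    prob = 1.0
    for j, y in enumerate(y_t):
        prob *= resp[state][j] if y == 1 else 1.0 - resp[state][j]
    return prob

def pattern_prob_bruteforce(y, init, trans, resp):
    """Equation (4.1): sum over all latent paths (x_1, ..., x_T)."""
    C, T = len(init), len(y)
    total = 0.0
    for path in product(range(C), repeat=T):
        prob = init[path[0]] * response_prob(y[0], path[0], resp)
        for t in range(1, T):
            # trans[s][r] = p(X_t = r | X_{t-1} = s)
            prob *= trans[path[t - 1]][path[t]] * response_prob(y[t], path[t], resp)
        total += prob
    return total

def pattern_prob_forward(y, init, trans, resp):
    """Same probability via the forward recursion, avoiding the C**T path enumeration."""
    C = len(init)
    alpha = [init[x] * response_prob(y[0], x, resp) for x in range(C)]
    for y_t in y[1:]:
        alpha = [
            sum(alpha[s] * trans[s][r] for s in range(C)) * response_prob(y_t, r, resp)
            for r in range(C)
        ]
    return sum(alpha)

# Illustrative two-state model with P = 3 dichotomous items and T = 3 occasions.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
resp = [[0.8, 0.8, 0.8], [0.2, 0.2, 0.2]]
y_i = [(1, 1, 0), (1, 0, 0), (0, 0, 0)]

p_brute = pattern_prob_bruteforce(y_i, init, trans, resp)
p_forward = pattern_prob_forward(y_i, init, trans, resp)
```

Both functions return the same probability for the same pattern; the forward version is the one that remains feasible when T grows, because its cost is linear rather than exponential in T.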

As can be seen from equation (4.1), the LM model has three fundamental sets of

parameters: The initial state probabilities, p(X1 = x1), the transition probabilities,

p(Xt = xt|Xt−1 = xt−1), and the conditional response probabilities, p(yitj |Xt = xt).

The initial state probabilities show the state proportions (or sizes) at the first measurement

occasion. The transition probabilities, conveniently collected in the so-called transition

matrix A(t) as shown below, provide the probabilities of switching between the states from

one measurement occasion to the next. The conditional response probabilities provide

information on the association between states and the response variables.

If we set the number of states to 3, for example, the transition between latent states

at time point (t− 1) and t can be expressed using a matrix of transition probabilities as


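$$
A(t) =
\begin{pmatrix}
\pi_{1|1} & \pi_{2|1} & \pi_{3|1} \\
\pi_{1|2} & \pi_{2|2} & \pi_{3|2} \\
\pi_{1|3} & \pi_{2|3} & \pi_{3|3}
\end{pmatrix}
\quad \text{(rows: state at } t-1\text{; columns: state at } t\text{)},
\tag{4.2}
$$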

where the principal diagonal elements of matrix A(t) represent the probability of staying

in the same state between consecutive measurement occasions, and the off-diagonal

elements are the probabilities for switching from a particular state at time t − 1 to

another particular state at time t. For instance, π1|1 = p(Xt = 1|Xt−1 = 1)

represents the probability of remaining in state 1 at the current measurement occasion,

and π3|2 = p(Xt = 3|Xt−1 = 2) represents the probability of switching from state 2 at

the previous measurement occasion to state 3 at the current measurement occasion.

In certain applications, the effect of one or more covariates Z on the transition probabilities may also be of interest, for instance the effect of a dichotomous grouping variable (e.g., Z = 0 for the control group, and Z = 1 for the treatment group). Such an effect can be modeled by including the covariate Z in equation (4.1) as follows:

$$
p(y_i, \Phi \mid z_i) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_T}
p(X_1 = x_1)
\prod_{t} p(X_t = x_t \mid X_{t-1} = x_{t-1}, z_i)
\prod_{j} p(y_{itj} \mid X_t = x_t).
\tag{4.3}
$$

When covariates are included, the transition probabilities are generally re-parameterized

by specifying a multinomial logistic regression:

$$
p(X_t = r \mid X_{t-1} = s, z_i) = \pi_{r|s}
= \frac{\exp(\beta_{sr} + \gamma_{sr} z_i)}{\sum_{l=1}^{C} \exp(\beta_{sl} + \gamma_{sl} z_i)}.
\tag{4.4}
$$

That is, in this case, we estimate the logit coefficients β and γ as parameters, rather than

the probabilities π directly.
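As a small illustration of equation (4.4), the sketch below turns logit coefficients into a transition matrix for a given covariate value. The coefficient values are made up, and fixing the coefficients for staying in the same state (r = s) to zero is an identification choice assumed here, not one stated in the text.

```python
import math

def transition_matrix(beta, gamma, z):
    """Row s holds p(X_t = r | X_{t-1} = s, z) from equation (4.4).

    beta[s][r] and gamma[s][r] are the logit intercepts and covariate effects.
    """
    matrix = []
    for beta_s, gamma_s in zip(beta, gamma):
        logits = [b + g * z for b, g in zip(beta_s, gamma_s)]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in logits]
        norm = sum(weights)
        matrix.append([w / norm for w in weights])
    return matrix

# Hypothetical coefficients for a two-state model; staying is the reference category.
beta = [[0.0, -2.0], [-1.5, 0.0]]
gamma = [[0.0, 0.5], [0.0, 0.0]]

A_control = transition_matrix(beta, gamma, z=0)
A_treatment = transition_matrix(beta, gamma, z=1)
```

With these values the covariate changes only the transitions out of the first state, and the odds of switching rather than staying differ between the two groups by a factor exp(0.5).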


4.2.1 Hypotheses specified on transition parameters

We can distinguish several types of hypotheses on the transition parameters of LM models.

Table 4.1 contains a classification of the most common hypotheses. A first distinction

concerns whether the hypothesis implies an equality constraint (say, π1|2 = π2|1), or a

fixed value constraint (say, π1|2 = 0.3); this is shown in the first column of Table 4.1.

Fixed value constraints can be further distinguished into boundary constraints, where the

parameter is fixed on a value on the boundary of the parameter space (i.e., zero or one

for probabilities) and non-boundary constraints, where the parameter is fixed to a value

inside the parameter space. This distinction is important, because fixed value boundary

constraints require non-standard hypothesis testing and power calculation methods, as

we address in detail below. Which testing methods can be used is shown in the last

column of Table 4.1. Further distinctions concern whether the constraints are imposed

on a specific parameter or on the whole set of transition parameters, and whether the

constraints are imposed on the transition parameters of the basic LM model (i.e., on the

probabilities) or on the transition parameters of the LM model with covariates (i.e., on

the logit coefficients).

We will now describe the hypotheses in Table 4.1 in more detail. $H_0^{1}$ states that the probability of switching from state s to state r is equal to the probability of switching from state r to state s; $H_0^{2}$ states that, given state s, the probabilities of a transition to state r and to state k are equal; $H_0^{3}$ indicates that the probabilities in two cells of the transition matrix are equal (e.g., $\pi_{1|2} = \pi_{4|3}$); $H_0^{4}$ assumes that the transition matrix is symmetric (i.e., A equals its transpose); $H_0^{5}$ implies that the transition matrix is time homogeneous; $H_0^{6}$ sets the transition matrix of one group (e.g., the treatment group) equal to that of another group (e.g., the control group); $H_0^{7}$ fixes the probability of switching from state s to state r to v, where v ∈ (0, 1) can be any user-defined value; $H_0^{8}$ states that the covariate has no effect on the probability of switching to state r; $H_0^{9}$ sets the probability of switching from state s to state r to 0; and $H_0^{10}$ assumes that the transition matrix is diagonal, meaning that there are no changes in state over time.


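Table 4.1: Typical hypotheses formulated on the transition parameters of the latent Markov model

| Hypothesis type | Constraint on selected transition parameters | Constraint on whole transition matrix | Testing method |
|---|---|---|---|
| Equality | $H_0^{1}: \pi_{r \mid s} = \pi_{s \mid r}$ for some $r, s$ | $H_0^{4}: A = A'$ | standard |
| Equality | $H_0^{2}: \pi_{r \mid s} = \pi_{k \mid s}$ for some $s$ | $H_0^{5}: A(t) = A$ | standard |
| Equality | $H_0^{3}: \pi_{r \mid s} = \pi_{k \mid l}$ for some $r, s, k, l$ | $H_0^{6}: A_1 = A_2$ | standard |
| Fixed value, non-boundary | $H_0^{7}: \pi_{r \mid s} = v$, $v \in (0, 1)$ | | standard |
| Fixed value, non-boundary | $H_0^{8}: \gamma_r = 0$ | | standard |
| Fixed value, on boundary | $H_0^{9}: \pi_{r \mid s} = 0$ for some $r, s$ | $H_0^{10}: A = \mathrm{diag}\{\pi_{s \mid s}\}$, for $s = 1, 2, 3, \ldots, C$ | non-standard |

Note. A = a square matrix with entries equal to the transition probabilities $\pi_{r|s}$; A′ = transpose of matrix A; A(t) = probability matrix for transitions between states at time point t − 1 and t.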

It should be noted that in certain applications, hypotheses about the initial state and

the conditional response probabilities may be of interest as well (Visser et al., 2002).

As with the transition probabilities, one may define equality or fixed-value restrictions on the conditional response probabilities, as discussed for latent class models by Goodman (1974) and Mooijaart and Van der Heijden (1992). This means that the distinction

between different types of hypotheses in Table 4.1 (i.e., equality constraints, boundary

fixed value constraints, and non-boundary fixed value constraints) can analogously be

applied to hypotheses about the initial state and conditional response probabilities.

4.2.2 Parameter estimation

Hypothesis testing using the LR requires estimating both the restricted model defined by

the hypothesis of interest and the unrestricted model by means of maximum likelihood.

Assuming that the responses of the individuals are independently and identically distributed, the log-likelihood for the model defined in equation (4.1) (and likewise for equation (4.3)) can be specified as

$$
l(\Phi) = \sum_{i=1}^{n} \log p(y_i).
\tag{4.5}
$$

As in other latent class and mixture models, maximum likelihood estimates of the

parameters of LM models can be obtained using the expectation maximization (EM)


algorithm. This is an iterative method that alternates between two steps. In the E step, the expected value of the complete-data log-likelihood (the log-likelihood that would apply if the latent states were observed) is computed, conditional on the observed data and the current parameter estimates. In the M step, the parameters are updated by maximizing this expected complete-data log-likelihood (McLachlan & Krishnan, 2007).

Estimating the parameters using the above mentioned procedure is, however, not

always straightforward. Firstly, the log-likelihood function in equation (4.5) may contain

local maxima to which the optimization algorithm may converge (Visser et al., 2002).

Inference based on such a local maximum may result in erroneous conclusions about the

parameters and the fit of the LM model of interest. To reduce the risk of reporting a local maximum, one should therefore re-estimate the model with multiple sets of start values.

Secondly, in LM models, initial state, transition, and measurement model probabilities

are mutually dependent. Because of this contingency, misspecification in one part of the

model (e.g., the transition part) affects the estimate of parameters for the other part

(e.g., measurement model).

4.3 The likelihood-ratio test

Once the parameter estimates and the corresponding log-likelihood values are obtained

for the null (restricted) and the alternative (unrestricted) model, hypotheses such as those

presented in Table 4.1 can be tested using the LR. We define the LR statistic to compare

the null and alternative models as

LR = −2(l(Φ0)− l(Φ1)),

where l(.) is the log-likelihood function as shown in equation (4.5), and Φ1 and Φ0

are the parameters of the model under the alternative and null hypotheses, respectively.

Alternatively, the LR can be obtained by taking the difference between the goodness-of-fit

test statistics of the null and the alternative model, that is, LR = LR0 − LR1, where

LR0 and LR1 compare the model concerned with the saturated model.

Under certain regularity conditions, under the null hypothesis, the LR follows

a (central) chi-square distribution with df degrees of freedom (Giudici, Ryden, &

Vandekerkhove, 2000). The number of degrees of freedom of the test is determined

by subtracting the number of parameters under the null from the number of parameters

under the alternative hypothesis. The general principle of this test is to reject the null

hypothesis if the observed value of the LR exceeds the (1− α) quantile value, also called

the critical value, of the central chi-square distribution with df degrees of freedom. Such

a testing procedure can be classified under what we referred to above as the standard

case. However, there are hypotheses for which the LR statistic does not follow a chi-square distribution, for example hypotheses of the type $H_0^{9}: \pi_{r|s} = 0$ and $H_0^{10}: A = \mathrm{diag}\{\pi_{s|s}\}$ (see Table 4.1).
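The decision rule of the standard LR test can be sketched as follows. The log-likelihood values and parameter counts are hypothetical; in practice they come from fitting the null and the alternative model. To stay within the Python standard library, the upper-tail probability of the central chi-square distribution is implemented only for df = 1 and df = 2, where closed forms exist.

```python
import math

def lr_statistic(loglik_null, loglik_alt):
    """LR = -2 (l(Phi_0) - l(Phi_1)); non-negative when the models are nested."""
    return -2.0 * (loglik_null - loglik_alt)

def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square; closed forms for df = 1, 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch covers df = 1 or 2 only")

# Hypothetical log-likelihoods and parameter counts, for illustration only.
loglik_null, n_par_null = -2051.3, 13
loglik_alt, n_par_alt = -2048.1, 14

lr = lr_statistic(loglik_null, loglik_alt)
df = n_par_alt - n_par_null          # difference in the number of free parameters
p_value = chi2_sf(lr, df)
reject_null = p_value < 0.05         # equivalent to LR exceeding the (1 - alpha) quantile
```

This rule applies to the standard case only; for boundary hypotheses the chi-square reference distribution used in `chi2_sf` is not appropriate.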

4.4 Power computation

To compute the power, we should know or estimate the distribution of the test statistic

under both the null and alternative hypothesis. The distribution under the null hypothesis,

which is indicated with H0 in Figure 4.1, is required to compute the critical value, Q1−α,

corresponding to the pre-defined type 1 error α. The distribution under the alternative

hypothesis, indicated with H1 in Figure 4.1, is required to compute the power, that is,

the probability that the test statistic exceeds this critical value given that the alternative

hypothesis is true. In Figure 4.1, this probability corresponds to the shaded area; that is,

the area below the H1 curve to the right of the vertical dashed line at the critical value.

The next sub-sections describe various procedures for computing this probability under

the standard and non-standard testing cases.

4.4.1 The standard case

As already mentioned above, in the standard case, the LR statistic follows a central chi-square distribution when the null hypothesis holds. When instead the alternative hypothesis holds, which is what we assume in power computation, the distribution of the LR becomes a non-central chi-square. One approach to power computation involves computing the non-centrality parameter λ, which quantifies the extent to which the distribution of the LR under the alternative hypothesis deviates from its distribution under the null hypothesis.

[Figure 4.1: Distribution of the likelihood-ratio statistic under the null (H0) and alternative (H1) hypotheses and the statistical power. The critical value Q1−α is marked by a vertical dashed line; the shaded area under the H1 curve to its right is the power.]

First let us describe a general power computation approach which does not require the

computation of the non-centrality parameter. Instead, the empirical distribution of the

LR under the alternative hypothesis is constructed using a Monte Carlo (MC) procedure.

This procedure, which we refer to as MC-based power computation, works as follows.

Step 1. A sample of a specified size, n, is repeatedly simulated (say M times) from the population under the alternative hypothesis, and for each of these samples, the LR value is computed by estimating both the null model and the alternative model. We denote the LR value obtained with the mth sample by $LR_m$.

Step 2. The actual power associated with a sample of size n is computed as the proportion of the simulated data sets in which the null hypothesis is rejected given the critical value $Q_{1-\alpha}$, which can be obtained from the central chi-square distribution with df degrees of freedom. More formally,

$$
\text{power}_{MC_1} = \frac{\sum_{m=1}^{M} I(LR_m > Q_{1-\alpha})}{M},
\tag{4.6}
$$

where $I(LR_m > Q_{1-\alpha})$ is an indicator function taking the value 1 when the LR value of the mth sample exceeds the critical value, and 0 otherwise.
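The data-generation part of Step 1 and the proportion in equation (4.6) can be sketched as below. Fitting the null and alternative models to each simulated sample is left to dedicated software, so the LR values in the example are made-up numbers; the function names, the seed, and the population values are likewise illustrative assumptions.

```python
import random

def simulate_lm(n, T, init, trans, resp, rng):
    """Draw n subjects from a latent Markov model with dichotomous items.

    Returns a list of n response patterns, each a list of T tuples of 0/1 responses.
    trans[s][r] = p(X_t = r | X_{t-1} = s); resp[x][j] = p(y_j = 1 | X_t = x).
    """
    states = range(len(init))
    data = []
    for _ in range(n):
        pattern = []
        x = rng.choices(states, weights=init)[0]           # initial state
        for t in range(T):
            if t > 0:
                x = rng.choices(states, weights=trans[x])[0]  # first-order Markov step
            pattern.append(tuple(int(rng.random() < p) for p in resp[x]))
        data.append(pattern)
    return data

def mc_power(lr_values, critical_value):
    """Equation (4.6): share of simulated LR values above the critical value."""
    return sum(lr > critical_value for lr in lr_values) / len(lr_values)

rng = random.Random(2015)
sample = simulate_lm(n=300, T=2, init=[0.5, 0.5],
                     trans=[[0.9, 0.1], [0.2, 0.8]],
                     resp=[[0.8] * 6, [0.2] * 6], rng=rng)

# Made-up LR values standing in for the M fitted samples; 3.841 is the .95
# quantile of the central chi-square distribution with df = 1.
hypothetical_lr = [0.4, 5.2, 3.9, 1.1, 7.8]
power_estimate = mc_power(hypothetical_lr, critical_value=3.841)
```

In a real application M would be in the hundreds or thousands, and each LR value would come from estimating both models on one call to `simulate_lm`.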

The second, more elegant and more standard way of power computation involves

obtaining an estimate of the non-centrality parameter and subsequently computing the

power for a given n using the non-central chi-square distribution concerned. We discuss

two methods to obtain the non-centrality parameter, which both require analyzing a

single constructed data set. The first method, which we refer to as the exemplary data

method, uses a data file which is exactly in agreement with the population model under

the alternative hypothesis (O’Brien, 1986; Self et al., 1992). Power computation is

implemented in four steps as follows.

Step 1. An 'exemplary' data set is created, which contains all possible response patterns with weights equal to the model-expected proportions under the alternative hypothesis.

Step 2. Using this data set, the log-likelihood is computed for both the constrained null

and the alternative model.

Step 3. The non-centrality parameter is approximated as

λ1 = −2(l(Φ0)− l(Φ1)), (4.7)

where l(Φ0) and l(Φ1) are the log-likelihood values under the null and the alternative

hypothesis, respectively, and λ1 represents the noncentrality corresponding to a sample

size of 1. Note that λ1 can also be computed as the difference between the goodness-of-fit

tests for the null and alternative models. Since the latter equals 0 (the alternative model

fits perfectly), λ1 equals the value of the likelihood-ratio goodness-of-fit statistic obtained

when estimating the null model.


Step 4. The non-centrality parameter obtained in Step 3 is rescaled to the sample size

n of interest. This is achieved using the proportionality between the sample size and the

non-centrality parameter: λn = n · λ1, where λn denotes the non-centrality parameter

for sample size n (Satorra & Saris, 1985; R. C. MacCallum, Browne, & Cai, 2006). The

power can now be computed as

$$
\text{power} = p(LR > Q_{1-\alpha}) = 1 - F_{\chi^2}(Q_{1-\alpha};\, df, \lambda_n),
\tag{4.8}
$$

where $F_{\chi^2}(\cdot\,;\, df, \lambda_n)$ is the cumulative distribution function of the non-central chi-square distribution with df degrees of freedom and non-centrality parameter $\lambda_n$, and $Q_{1-\alpha} = \chi^2_{(1-\alpha)}(df)$ is the (1 − α) quantile of the central chi-square distribution.
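Step 4 can be sketched in a few lines once the non-central chi-square distribution function is available. The version below uses only the Python standard library: the central chi-square CDF is computed from the series for the regularized incomplete gamma function, and the non-central CDF as a Poisson-weighted mixture of central ones. The value of `lambda_1` is hypothetical, and the helper names are illustrative.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """CDF of the central chi-square distribution."""
    return gamma_p(df / 2.0, x / 2.0)

def ncx2_cdf(x, df, ncp):
    """CDF of the non-central chi-square as a Poisson mixture of central chi-squares."""
    if ncp <= 0.0:
        return chi2_cdf(x, df)
    half = ncp / 2.0
    total, weight_sum, j = 0.0, 0.0, 0
    while weight_sum < 1.0 - 1e-10 and j < 10000:
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * chi2_cdf(x, df + 2 * j)
        weight_sum += weight
        j += 1
    return total

def chi2_quantile(prob, df):
    """Quantile of the central chi-square distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def lr_power(lambda_1, n, df, alpha=0.05):
    """Power of the LR test with lambda_n = n * lambda_1 (standard case only)."""
    critical = chi2_quantile(1.0 - alpha, df)
    return 1.0 - ncx2_cdf(critical, df, n * lambda_1)

# Hypothetical per-observation non-centrality; in practice it comes from Step 3.
lambda_1 = 0.01
powers = {n: lr_power(lambda_1, n, df=1) for n in (300, 500, 900)}
```

Because only `lambda_1` depends on the fitted models, the power for any other sample size follows from one more call to `lr_power` without re-estimation. Where SciPy is available, `scipy.stats.chi2.ppf` and `scipy.stats.ncx2.sf` provide the same quantities.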

The number of response patterns in the exemplary data set, which depends on the number of measurement occasions, the number of response variables, and the number of response categories, can quickly become very large. For instance, even in a relatively small problem with four time points (T = 4) and six response variables (P = 6) with two categories each, the number of possible response patterns is $2^{P \cdot T} = 2^{24} = 16{,}777{,}216$, that is, already larger than 16 million. This shows that the exemplary data method may quickly become impractical. We propose resolving this problem by using a large simulated data set from the population under the alternative hypothesis instead of an exemplary data set. We refer to this alternative to the exemplary data method as the 'large simulated data' method. The steps that need to be taken for power computation are the following:

Step 1. Generate a large data set, say of size N = 100000, according to the model under

the alternative hypothesis.

Step 2. Estimate the models under both the null and the alternative hypotheses based

on the data obtained in Step 1. This yields the log-likelihood values for both models.

Step 3. Compute the non-centrality parameter as

$$
\lambda_1 = \frac{-2\,(l(\Phi_0) - l(\Phi_1))}{N},
\tag{4.9}
$$

where $\lambda_1$ is again the non-centrality parameter for a sample size of 1. Note that the likelihood-ratio goodness-of-fit statistic of the alternative model is now not equal to 0, which means that we have to estimate both models.

Step 4. As in the exemplary data method, get λn = n · λ1 and obtain the power using

equation (4.8).

4.4.2 The non-standard case

In the non-standard case, the regularity conditions under which the LR follows an

asymptotic χ2-distribution are not satisfied. This happens, for instance, if parameters are fixed on the boundary of the parameter space, as in hypotheses $H_0^{9}$ and $H_0^{10}$. In this non-standard case, the cut-off value $Q_{1-\alpha}$ is generally not equal to $\chi^2_{(1-\alpha)}(df)$. Thus,

the critical value of the LR obtained from the central chi-square cannot be used in the

subsequent power computation under the alternative hypothesis. Neither can we use the

non-central chi-square to approximate the distribution of the LR under the alternative

hypothesis, implying that the theoretical distributions mentioned above cannot be used

here for power computation. However, with the advance in computing facilities, instead

of relying on these theoretical distributions, one may compute power by applying MC

simulations. Two MC simulations are needed: one simulation is performed to obtain the

value of Q1−α, and the other simulation is performed to compute the power given the

Q1−α.

More specifically, in order to compute the value of $Q_{1-\alpha}$, the empirical distribution of the LR under the null hypothesis should be constructed first. That is, generate M data sets according to the model under the null hypothesis, and compute the LR statistic for each of these samples. For sufficiently large M, the distribution of these LR values approximates the population distribution of the LR statistic under the null hypothesis. Next, for the specified α-level, the (1 − α) quantile is obtained as the value $LR_{(1-\alpha)}$ that splits the sorted LR values into two sets: the 100(1 − α) percent smallest LR values and the 100α percent largest LR values. That is,

$$
Q_{1-\alpha} = \{LR_{(1-\alpha)} : p(LR > LR_{(1-\alpha)} \mid H_0) = \alpha\}.
\tag{4.10}
$$

Similarly, the distribution of the LR under the alternative hypothesis is constructed using M samples, but now generated according to the model under the alternative hypothesis. Using this distribution, the power is computed as the proportion of these LR values that exceed the $Q_{1-\alpha}$ value obtained from equation (4.10). That is,

$$
\text{power}_{MC_{01}} = p(LR > Q_{1-\alpha} \mid H_1) = \frac{\sum_{m=1}^{M} I(LR_m > Q_{1-\alpha})}{M},
\tag{4.11}
$$

where $I(\cdot)$ is again an indicator function indicating whether the LR value (computed based on the $H_1$ sample) exceeds the $Q_{1-\alpha}$ value. Note that the subscript of $\text{power}_{MC_{01}}$ in equation (4.11) indicates that MC methods are applied under both the null and the alternative hypothesis, while $\text{power}_{MC_1}$ in equation (4.6) indicates that MC simulation is applied under the alternative hypothesis only.
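The two-simulation procedure of equations (4.10) and (4.11) can be sketched generically. Since fitting LM models is beyond a short example, the demonstration uses a deliberately simple stand-in that is not an LM model: a test of H0: μ = 0 against μ > 0 for a normal mean with unit variance, where the parameter lies on the boundary under the null and LR = n · max(0, mean)². The function names, the seed, and the values of n, M, and μ are illustrative assumptions.

```python
import math
import random

def empirical_quantile(values, prob):
    """Smallest sorted value with at least a share `prob` of the values at or below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, math.ceil(prob * len(ordered)) - 1)
    return ordered[max(index, 0)]

def mc_power_nonstandard(simulate_lr_h0, simulate_lr_h1, M, alpha=0.05):
    """Equations (4.10) and (4.11): critical value from H0 draws, power from H1 draws."""
    lr_h0 = [simulate_lr_h0() for _ in range(M)]
    critical = empirical_quantile(lr_h0, 1.0 - alpha)
    lr_h1 = [simulate_lr_h1() for _ in range(M)]
    power = sum(lr > critical for lr in lr_h1) / M
    return critical, power

# Stand-in boundary problem (NOT an LM model): LR = n * max(0, mean)**2.
def make_lr_simulator(mu, n, rng):
    def draw():
        mean = rng.gauss(mu, 1.0 / math.sqrt(n))  # sampling distribution of the mean
        return n * max(0.0, mean) ** 2
    return draw

rng = random.Random(1)
n = 100
critical, power = mc_power_nonstandard(
    make_lr_simulator(0.0, n, rng), make_lr_simulator(0.25, n, rng), M=20000
)
```

In this stand-in the simulated critical value settles near 2.71, the .95 quantile of an equal mixture of a point mass at zero and a chi-square with one degree of freedom, rather than near the 3.84 that the standard chi-square with df = 1 would give. For an LM model, the two simulator functions would be replaced by routines that generate a data set and return the LR from the fitted null and alternative models.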

4.5 Design factors

As in other standard statistical models, the power of a test in LM models depends on the

significance level α, the effect size (difference between the parameter values under the

null and alternative hypotheses), and the sample size. This can be explained using Figure

4.1. If the value of α becomes larger, Q1−α shifts to the left, and consequently the region

under the curve H1 to the right of Q1−α gets larger. This implies that the larger α, the

larger the power of a test. For a fixed α-level, if the effect size gets larger, the value of

the non-centrality parameter gets larger, meaning that the curve indicated by H1 shifts to

the right, and consequently the power becomes larger. From the non-centrality parameter

and sample size relationship, λn = n · λ1, if the sample size increases, the non-centrality

parameter increases, meaning that the overlap between the probability distributions under

H0 and H1 decreases and thus the power increases.

In addition to these standard factors, the power of a test in LM models is expected


to depend on aspects of the measurement part and the transition part of these models.

In LM analysis, state membership is not directly observable, but is inferred from the responses to a set of observed response variables. Among other things, uncertainty about the state membership depends on the number of states, the state proportions, the number of response variables, and the strength of the association between the latent states and the response variables (see, for example, Collins and Lanza (2010) and Gudicha et al. (in press)). The stronger the state-response associations, the better the separation between states will be. The better the states are separated, the less uncertain we are about a respondent's state membership given his or her responses to the observed variables. Also, each additional measurement occasion provides additional information regarding the way in which respondents change their state membership.

4.6 Numerical study

The purpose of this numerical study is to 1) illustrate the power computation methods

under the standard and non-standard case, 2) investigate how the study design factors

and the population model characteristics mentioned above may affect the power, and 3)

identify which design configurations yield an acceptable power level (power ≥ 0.8). We

focus on three of the hypotheses shown in Table 4.1. The first two hypotheses, $H_0^{1}: \pi_{r|s} = \pi_{s|r}$ for some r, s in the basic LM model, and $H_0^{8}: \gamma_r = 0$ in the LM model with covariates, are examples of the standard case. The third hypothesis, $H_0^{9}: \pi_{r|s} = 0$ for some s and r, which concerns testing for a zero entry in the transition probability matrix, is an example of the non-standard case. The Latent GOLD (Vermunt & Magidson, 2013a) syntax examples used to perform this numerical study are shown in the appendix.


In this numerical study, we restricted ourselves to LM models for dichotomous response

variables (say, with the categories negative and positive). The α level was fixed at .05 throughout, because its effect on the power of statistical tests is well understood and because this value is usually fixed in advance.

We varied the sample size (n = 300, 500, or 900), the number of measurement occasions (T = 2 or 4), the number of latent states (C = 2 or 3), the number of response variables (P = 6 or 10), the initial state proportions (uniform or non-uniform), the strength of the association between latent states and response variables (weak, medium, or strong), and the stability of the state membership (unstable, moderately stable, or stable). The non-uniform initial state proportions were set to (0.7, 0.3) for C = 2 and to (0.6, 0.3, 0.1) for C = 3. The settings for the association between states and responses were specified using response probabilities equal to 0.7, 0.8, and 0.9 (or 0.3, 0.2, and 0.1), respectively. For example, in the weak association condition, the probability of a positive response was set to 0.7 for all variables in latent state one, to 0.3 in latent state two, and to 0.7 for the first half of the items and to 0.3 for the remaining items in state three. The basic settings for the latent transitions were obtained by setting the main diagonal elements of the transition matrix to $\pi_{r|r}$ = 0.7, 0.8, or 0.9, and the other elements to $(1 - \pi_{r|r})/(C - 1)$, which corresponds to unstable, moderately stable, and highly stable state memberships, respectively.
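The basic transition matrices described above follow directly from the diagonal value and the number of states. A minimal sketch (the function name is illustrative):

```python
def stability_transition_matrix(stay_prob, n_states):
    """Diagonal entries stay_prob; remaining mass spread evenly over the other states."""
    move_prob = (1.0 - stay_prob) / (n_states - 1)
    return [
        [stay_prob if r == s else move_prob for r in range(n_states)]
        for s in range(n_states)
    ]

# Unstable, moderately stable, and highly stable settings for C = 3.
matrices = {p: stability_transition_matrix(p, 3) for p in (0.7, 0.8, 0.9)}
```

For C = 3 and a diagonal value of 0.7, for example, each off-diagonal entry equals 0.15.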

1) For hypothesis $H_0^{1}: \pi_{r|s} = \pi_{s|r}$, we arrived at the respective transition matrices under the alternative model (i.e., with differences in the off-diagonal elements) by specifying the transition odds-ratio comparing the transition from s to r with the transition from r to s, which is defined as $\frac{\pi_{r|s}/\pi_{s|s}}{\pi_{s|r}/\pi_{r|r}}$, to be equal to 1.3498, 1.8222, and 3.3201. These odds-ratios, which we hereafter refer to as small, medium, and large effect sizes, correspond to differences in the transition probabilities ranging from 0.01 to 0.25.

2) For hypothesis $H_0^{8}: \gamma_r = 0$, we added the effect of a dichotomous covariate on the transitions. The covariate effect was specified by setting its (effect coded) coefficient in the logistic regression model for the transitions to 0.25, 0.5, and 1 or, equivalently, by setting the transition odds-ratio to 1.648, 2.7182, and 7.389, which corresponds to a small, medium, and large effect size, respectively.

3) For hypothesis $H_0^{9}: \pi_{1|2} = 0$, we restricted ourselves to the situation with T = 2, P = 6, $\pi_{1|1}$ = 0.7 or 0.9, C = 2, and equal initial state probabilities. This setting gives transition matrices of the type

$$
\begin{pmatrix} 0.7 & 0.3 \\ \delta & 1-\delta \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 0.9 & 0.1 \\ \delta & 1-\delta \end{pmatrix},
$$

where δ is the value of $\pi_{r|s}$ (here $\pi_{1|2}$) under the alternative hypothesis. The value of δ, which defines the effect size for the hypothesis that $\pi_{r|s} = 0$, was set to 0.05, 0.1, or 0.2. The association between states and response variables was set to weak, medium, and strong as defined earlier.
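The effect sizes quoted for the first two hypotheses can be checked with a short calculation. The sketch assumes that effect coding means z = −1 for one group and z = +1 for the other, so that the two groups differ by 2γ on the logit scale; this reading is an assumption, although it reproduces the odds-ratios given in the text.

```python
import math

# Effect coding is assumed to mean z = -1 for one group and z = +1 for the other.
def odds_ratio_effect_coded(gamma):
    """Transition odds-ratio between the two groups for logit coefficient gamma."""
    return math.exp(gamma * 1 - gamma * (-1))  # = exp(2 * gamma)

covariate_odds_ratios = [odds_ratio_effect_coded(g) for g in (0.25, 0.5, 1.0)]

# Log odds-ratios implied by the effect sizes used for the symmetry hypothesis.
symmetry_log_odds = [math.log(v) for v in (1.3498, 1.8222, 3.3201)]
```

Under this reading, the coefficients 0.25, 0.5, and 1 correspond to odds-ratios of exp(0.5), exp(1), and exp(2), and the odds-ratios 1.3498, 1.8222, and 3.3201 used for the first hypothesis correspond to log odds-ratios of about 0.3, 0.6, and 1.2.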

4.6.2 Results

Table 4.2 presents the power to reject the null hypothesis πr|s = πs|r. As expected, the

power of the LR test depends on the association between the latent states and the response

variables, the number of measurement occasions, the sample size, the initial state proportions,

the number of response variables, the transition probabilities, and the effect sizes. More

specifically, the stronger the association between latent states and response variables, the

larger the power. For a given number of response variables and measurement occasions,

say P = 6 and T = 2, reasonable power levels are achieved by increasing the sample

size. Or, for a given sample size, say n = 300, reasonable power levels are achieved by

increasing the number of response variables or measurement occasions. The power gain

achieved by increasing the number of measurement occasions from 2 to 4 is larger than the

power gain achieved by increasing the sample size from 300 to 900. Also, with the current

design and population model characteristics, sampling from a population with equal initial

state probabilities increases the power.

Another interesting observation is that, keeping the other design factors constant, the

more unstable the state membership (or the larger the transition probabilities), the larger

the power to demonstrate differences between transition probabilities. One can also see

from Table 4.2 that for the situation where the initial state proportions/probabilities are


equal, the number of response variables is equal to 6, and the sample size is equal to 300,

a power of 0.80 or higher is achieved 1) for low transition probabilities, when the effect

size is large and the association between states and response variables is strong, 2) for

moderate transition probabilities, when the effect size is large and the association between

states and response variables is medium or strong, and 3) for high transition probabilities,

when the effect size is large. For the situation when we have weak associations between

states and response variables, highly stable transition probabilities, and a low or moderate

effect size, such a power level is achieved only at the expense of increasing the sample

size, or the number of measurement occasions.

For the 3-state LM model, we do not show the results of the power calculation, as

they provide similar information as the 2-state LM model shown in Table 4.2. We should

however note that the power to demonstrate differences in the transition probabilities for

the 3-state LM model is in general lower than its corresponding power value for the 2-state

LM model, implying that the power depends on the number of states as well.

The power of the LR test to reject the null hypothesis that the covariate has no

effect on the transition probabilities is shown in Table 4.3. With respect to the design

factors, the general trend found is similar to the results from Table 4.2. That is, power

increases with sample size and effect size. Also, for a fixed sample size and effect size,

one can achieve a desired level of power by improving the measurement part of the LM

model, for example, by using response variables which have a strong association with the

latent states, or by increasing the number of response variables. Increasing the number

of measurement occasions could also greatly help in obtaining a desired power level. One

can also see from this table that power for the effect of the covariate on the transition

probabilities becomes larger when the state membership is more unstable, say πr|r = 0.7,

than when we have highly stable states, say πr|r = 0.9.

Table 4.4 shows the power to reject the null hypothesis that the probability of switching

from one state at time point (t − 1) to another state at time point t equals zero. As

compared with the other two hypotheses, the transition probabilities play a small role in the power, whereas the state-response association plays a large role. For example, in a sample of 100 observations, if the state-response association is weak, the power to detect a small deviation of .05 from the null-hypothesis value of 0 is lower than 20%. In contrast, when the state-response association is strong, the power exceeds 90%. When the state-response associations are strong, that is, when the measurement model is strong, the states are well separated. In such a condition,

the possibility of observing the expected pattern from state 1 while in state 2 becomes

extremely small. Therefore, when the true underlying model generates some transitions

that are impossible under the restricted model, the likelihood of this restricted model will

decrease dramatically, because each impossible transition, say, a transition from state 1

to state 2, needs to be accommodated by assigning the observed pattern from state 1

to state 2, or vice versa. The decrease in the likelihood of the restricted model will be

accompanied by biased parameter estimates: the estimates of the conditional response

probabilities in the two states will be biased to be closer to each other, to increase the

likelihood of observing a state 1 pattern given state 2 or a typical state 2 pattern given

state 1.

The results presented in Tables 4.2 and 4.3 concern the standard case, and were thus

obtained using the large data method which assumes that the distributions under H0 and

H1 are known. That is, we assume a central chi-square under the null and non-central

chi-square under the alternative hypothesis, for which the value of the non-centrality

parameter is approximated based on a large data set. The results showing the quality of

the latter approximation as well as the asymptotic approximation of the chi-square itself

are presented in Table 4.5. As can be seen from this table, both the large data set and

the chi-square approximations are very good.


Table 4.2: The power of the likelihood-ratio test to reject the null hypothesis that $\pi_{r|s} = \pi_{s|r}$ in the 2-state latent Markov model

Column headers give the diagonal transition probability $\pi_{r|r}$ / the state-response association.

| Design | n | Effect size | 0.9 / weak | 0.9 / medium | 0.9 / strong | 0.8 / weak | 0.8 / medium | 0.8 / strong | 0.7 / weak | 0.7 / medium | 0.7 / strong |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Equal initial state, P = 6, and T = 2 | 300 | small | .075 | .094 | .117 | .096 | .142 | .176 | .113 | .199 | .224 |
| | 300 | medium | .166 | .279 | .306 | .221 | .444 | .532 | .280 | .581 | .622 |
| | 300 | large | .397 | .714 | .873 | .682 | .954 | .981 | .795 | .984 | .996 |
| | 500 | small | .093 | .125 | .164 | .128 | .206 | .262 | .156 | .299 | .341 |
| | 500 | medium | .162 | .425 | .466 | .335 | .651 | .750 | .427 | .798 | .835 |
| | 500 | large | .592 | .903 | .980 | .881 | .997 | .999 | .949 | 1.00 | 1.00 |
| | 900 | small | .127 | .187 | .256 | .192 | .331 | .426 | .243 | .485 | .547 |
| | 900 | medium | .254 | .661 | .711 | .539 | .883 | .942 | .663 | .976 | .976 |
| | 900 | large | .837 | .992 | 1.00 | .988 | 1.00 | 1.00 | .998 | 1.00 | 1.00 |
| Equal initial state, P = 10, and T = 2 | 300 | small | .075 | .108 | .127 | .131 | .151 | .181 | .153 | .203 | .238 |
| | 300 | medium | .195 | .285 | .335 | .388 | .505 | .544 | .455 | .622 | .645 |
| | 300 | large | .559 | .836 | .865 | .874 | .972 | .988 | .945 | .995 | .997 |
| Equal initial state, P = 6, and T = 4 | 300 | small | .193 | .232 | .267 | .264 | .403 | .447 | .247 | .486 | .556 |
| | 300 | medium | .520 | .686 | .779 | .757 | .927 | .944 | .764 | .970 | .984 |
| | 300 | large | .971 | .999 | 1.00 | .999 | 1.00 | 1.00 | .999 | 1.00 | 1.00 |
| Unequal initial state, P = 6, and T = 2 | 300 | small | .058 | .095 | .113 | .091 | .136 | .149 | .087 | .148 | .163 |
| | 300 | medium | .108 | .203 | .302 | .180 | .358 | .430 | .219 | .450 | .586 |
| | 300 | large | .269 | .606 | .787 | .575 | .898 | .952 | .683 | .954 | .984 |

Note. The power values reported in this table are computed using the large data set method. P and T denote the number of response variables and measurement occasions, respectively.


Table 4.3: The power of the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on the transition probabilities

                   πr|r = 0.9                 πr|r = 0.8                 πr|r = 0.7
                   state-response association state-response association state-response association
n     effect size  weak     medium   strong   weak     medium   strong   weak     medium   strong

Equal initial state, P = 6, and T = 2
300   small        .085     .127     .187     .113     .239     .316     .146     .307     .395
300   medium       .180     .428     .586     .352     .719     .874     .482     .860     .949
300   large        .587     .961     .996     .889     .999     1.00     .966     1.00     1.00
500   small        .109     .185     .289     .159     .376     .496     .217     .480     .606
500   medium       .276     .649     .817     .547     .916     .983     .712     .979     .997
500   large        .818     .998     1.00     .987     1.00     1.00     .999     1.00     1.00
900   small        .162     .304     .485     .257     .616     .762     .363     .746     .863
900   medium       .465     .894     .975     .813     .995     1.00     .932     1.00     1.00
900   large        .975     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00

Equal initial state, P = 6, and T = 4
300   small        .224     .421     .523     .328     .584     .751     .396     .748     .860
300   medium       .623     .942     .983     .893     .997     1.00     .951     1.00     1.00
300   large        .998     1.00     1.00     1.00     1.00     1.00     1.00     1.00     1.00

Equal initial state, P = 10, and T = 2
300   small        .139     .178     .193     .180     .277     .327     .223     .388     .418
300   medium       .287     .538     .631     .562     .850     .885     .685     .928     .949
300   large        .853     .992     .997     .990     1.00     1.00     .999     1.00     1.00

Unequal initial state, T = 2, and P = 6
300   small        .078     .135     .187     .124     .198     .311     .163     .286     .403
300   medium       .171     .446     .598     .380     .719     .865     .539     .870     .947
300   large        .605     .968     .998     .912     1.00     1.00     .983     1.00     1.00

Note. The power values reported in this table are computed using the large data set method.

Table 4.4: The power of the likelihood-ratio test to reject the null hypothesis π2|1 = 0

        n = 100                    n = 200                    n = 300
        state-response association state-response association state-response association
π2|1    weak     medium   strong   weak     medium   strong   weak     medium   strong

π2|2 = 0.7
.05     .182     .640     .906     .263     .810     .987     .344     .917     .999
.1      .381     .877     .991     .544     .978     1.00     .710     .997     1.00
.2      .533     .968     1.00     .795     .998     1.00     .999     1.00     1.00

π2|2 = 0.9
.05     .181     .597     .894     .262     .806     .987     .330     .910     .999
.1      .362     .872     .992     .554     .979     1.00     .685     .996     1.00
.2      .535     .957     .999     .758     .999     1.00     .887     1.00     1.00

Note. The power values reported in this table are for the simulation conditions involving 6 response variables, 2 measurement occasions, and equal initial state proportions in a 2-state latent Markov model; power computation for the non-standard case.

Table 4.5: Evaluating the quality of the large data set method for likelihood-ratio power computation

                          πr|r = 0.9                 πr|r = 0.8                 πr|r = 0.7
             theoretical  state-response association state-response association state-response association
effect size  vs empirical weak     medium   strong   weak     medium   strong   weak     medium   strong

small        theoretical  .093     .125     .164     .128     .206     .262     .156     .299     .341
small        empirical    .077     .134     .179     .126     .217     .256     .151     .293     .341
medium       theoretical  .162     .425     .466     .335     .651     .750     .427     .798     .835
medium       empirical    .175     .386     .474     .368     .658     .750     .438     .796     .846
large        theoretical  .592     .903     .980     .881     .997     .999     .949     1.00     1.00
large        empirical    .5446    .907     .975     .875     .997     .999     .939     1.00     1.00

Note. The theoretical power values are computed by assuming a central chi-square under the null and non-central chi-square under the alternative hypotheses, for which the non-centrality parameter is approximated by using a large data set. Whereas, for the empirical case, the power is computed by simulating the distribution of the test statistic under the alternative hypothesis. The power values reported in this table are for the design conditions involving 6 response variables, two measurement occasions, and equal initial state proportions in 2-state latent Markov model.



This chapter addressed power computation methods for testing hypotheses about

transition parameters of LM models, which are the transition probabilities themselves

in the basic LM model and the logistic regression coefficients in the LM model with

covariate(s). We showed how the hypotheses of main interest can be specified by imposing

equality constraints across parameters or fixing parameter(s) to some user-defined value(s).

We distinguished power computation for the standard case and power computation for

the non-standard case, where the latter arises when probabilities are fixed to zero.

For the standard case, in which the likelihood-ratio statistic follows an asymptotic chi-

square distribution, two power computation approaches were discussed. The first consists

of approximating the distribution under the alternative hypothesis for a given sample size

n using MC simulation (referred to as MC1). The second approach involves estimating

the non-centrality parameter using either an exemplary data set or a large simulated data

set and subsequently obtaining the power for any sample size n from the non-central chi-

square distribution. The advantage of the second approach is that it is computationally

cheaper. However, when we have doubts that the distribution of the LR test statistic under

the alternative is non-central chi-square, the MC1 simulation approach is the preferred

option. The MC1 simulation approach can also be applied when the distribution under

the null is known but the distribution under the alternative is unknown. We will come

back to this issue when discussing topics for future research.
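The second approach can be sketched in a few lines of Python. This is an illustration only, not the Latent GOLD implementation: it assumes that the non-centrality parameter for sample size n is obtained by scaling the LR value of the large data set proportionally (n times LR divided by the large sample size), and the helper names `chi2_cdf`, `ncx2_cdf`, `chi2_critical_value`, and `lr_power` are hypothetical. Only the standard library is used, so the chi-square functions are evaluated by series and are meant for moderate values of the test statistic and non-centrality parameter.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the power series of the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def ncx2_cdf(x, df, lam):
    """Non-central chi-square CDF as a Poisson(lam/2) mixture of
    central chi-square CDFs with df + 2j degrees of freedom."""
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    cum_weight, total, j = 0.0, 0.0, 0
    while cum_weight < 1.0 - 1e-12 and j < 10000:
        total += weight * chi2_cdf(x, df + 2 * j)
        cum_weight += weight
        j += 1
        weight *= half / j
    return total


def chi2_critical_value(df, alpha):
    """(1 - alpha) quantile of the central chi-square, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lr_power(lr_large, n_large, n, df, alpha=0.05):
    """Power for sample size n, given the LR value of a large data set."""
    lam = n * lr_large / n_large  # assumed proportional scaling
    return 1.0 - ncx2_cdf(chi2_critical_value(df, alpha), df, lam)


if __name__ == "__main__":
    # Made-up LR value for a large data set of 100000 cases; one constraint.
    print(lr_power(lr_large=2616.33, n_large=100000, n=300, df=1))
```

Because the non-centrality parameter is computed once, the power for any other n follows from a single further call to `lr_power`, which is what makes this approach cheap compared with repeating the simulation for every sample size.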

The non-standard case occurs when the likelihood-ratio statistic does not follow a standard

chi-square distribution. The most obvious example for this is when a parameter is fixed

to the boundary of the parameter space, which equals zero or one for probabilities. In

such situations, power computation by MC simulation is applicable (referred to as MC01).

We use the MC01 method to compute both the critical value under the null hypothesis

and the power under the alternative hypothesis given this critical value. Note that this

procedure is similar to the MC1 simulation approach discussed for the standard case, with

the only difference that the theoretical distribution under the null hypothesis is replaced


by its empirical counterpart.
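The two steps of the MC01 method can be sketched as follows. The sketch assumes that LR values simulated under H0 and under H1 are already available; because fitting LM models is outside its scope, it uses a toy stand-in for those values (the squared positive part of a normal draw, which mimics a single parameter on the boundary). The function names and the shift of 2.0 are hypothetical.

```python
import math
import random


def empirical_critical_value(lr_null, alpha=0.05):
    """Empirical (1 - alpha) quantile of LR values simulated under H0."""
    ordered = sorted(lr_null)
    index = min(len(ordered) - 1, math.ceil((1.0 - alpha) * len(ordered)) - 1)
    return ordered[index]


def empirical_power(lr_alt, critical_value):
    """Proportion of LR values simulated under H1 that exceed the critical value."""
    return sum(1 for lr in lr_alt if lr > critical_value) / len(lr_alt)


def toy_boundary_lr(shift, rng):
    """Toy stand-in for one simulated LR value (not an LM model fit)."""
    return max(0.0, rng.gauss(shift, 1.0)) ** 2


if __name__ == "__main__":
    rng = random.Random(2015)
    lr_null = [toy_boundary_lr(0.0, rng) for _ in range(5000)]  # step 1: H0
    lr_alt = [toy_boundary_lr(2.0, rng) for _ in range(5000)]   # step 2: H1
    cv = empirical_critical_value(lr_null, alpha=0.05)
    print(cv, empirical_power(lr_alt, cv))
```

In an actual application, each element of `lr_null` and `lr_alt` would be the LR value obtained by fitting both the H0 and the H1 model to one simulated data set.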

In our numerical study, we saw that the power to detect large effects can be small

even with a moderately large sample of, say, 500 observations. Based on the results of the

numerical study, we therefore strongly recommend researchers who apply LR tests in LM

models to perform a power analysis prior to data collection. Our findings indicate several

important issues that should be taken into account. Firstly, in addition to the usual design

factors (i.e., effect size, sample size, and significance level α), a power analysis for LM

models should also involve various other design factors, namely, the number of time points,

the number of response variables, the strength of association between latent states and

response variables, the number of states, the initial state probabilities, and the transition

probabilities. Secondly, for a given effect size, a desired level of power can be achieved by

increasing the number of measurement occasions, by increasing the number of response

variables, or by using response variables that have strong associations with the latent

states. Moreover, situations in which the transition probabilities are small need special

care, since power may be low in such situations. Thirdly, when the association between

states and response variables is weak or the effect size is small, a reasonable power level

can be achieved at the expense of gathering more data, that is, by increasing the sample

size or the number of measurement occasions. In the scenarios we studied, increasing

the number of measurement occasions was more efficient than increasing the sample size.

This is probably connected to the fact that we looked at hypotheses for the transition

probabilities; that is, with more measurement occasions one has more information on

the transition probabilities. When testing hypotheses on the initial state or the response

probabilities, increasing the sample size is probably more effective.

In the MC-based power computation, the accuracy of the estimated power depends

strongly on the number of replications used. This is especially the case in the MC01

method used in the non-standard case, in which not only the power but also the

critical value under H0 was estimated by MC simulation. In our study, we used 5000

MC replications, which seemed to be large enough for the hypotheses we investigated.


However, the required number of MC replications may depend on the type of hypothesis

and the model complexity; hence, further research might explore the required number of

MC replications for LR tests in LM models.
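As a rough guide to how the number of replications affects accuracy, the Monte Carlo standard error of an estimated power can be computed from the binomial variance. The sketch below assumes that the critical value is treated as fixed; when the critical value is itself simulated, as in the MC01 method, the actual uncertainty will be somewhat larger. The function names are hypothetical.

```python
import math


def mc_standard_error(power, replications):
    """Binomial standard error of a power estimate from MC replications."""
    return math.sqrt(power * (1.0 - power) / replications)


def replications_needed(power, target_se):
    """Smallest number of replications giving at most the target standard error."""
    return math.ceil(power * (1.0 - power) / target_se ** 2)


if __name__ == "__main__":
    # Standard error of an estimated power of .80 with 5000 replications.
    print(mc_standard_error(0.80, 5000))
    # Replications needed for a standard error of .01 in the worst case (.50).
    print(replications_needed(0.50, 0.01))
```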

While for the non-standard case we proposed approximating the distribution of the

LR statistic under H0 by simulation, its asymptotic distribution has been shown to be

chi-bar square (Bartolucci, 2006). This means one may also obtain the critical value

under H0 from the chi-bar square distribution, which for multiple parameter hypotheses

also requires performing some kind of MC simulation. However, power computation using

an asymptotic approach requires the distribution under the alternative as well. This is

a problem that has not yet been resolved. Another possible area for future research is

investigating whether this distribution is, for instance, a certain type of non-central chi-bar

square distribution (Shapiro, 1988).

Other possible areas for further research concern the application of the proposed power

computation methods with other types of hypotheses relevant for LM modeling. It seems

that both the standard and non-standard case methods can be directly transferred to

hypotheses about other parameters of the LM model, namely, the initial probabilities and

the conditional response probabilities. The MC method proposed for the non-standard

case may also be applicable in hypothesis tests concerning the number of latent states,

that is, when comparing models with C and C + 1 states.

For the power computation, we estimate the (incorrect) model under H0 for data sets

generated under H1. That is, the measurement parameters are not fixed, but estimated

under this incorrect model. Using such a procedure, when state separation is strong,

estimating the parameters of the model with the transition probability constrained to

zero can be problematic: the estimated state-response associations are biased downward. When the

bias in the measurement parameter cannot compensate (in terms of the log-likelihood

value) for the misspecification of the transition model, this may lead to overestimation of

power. Future research investigating parameter estimation with constraints on transition

parameters would be interesting.


For simulation conditions in which the latent states are highly separated, when the

true underlying model generates some fraction of cross-over observations, the likelihood of

the restricted model decreases, because each impossible transition, say, a transition from

state 1 to state 2, needs to be accommodated by assigning the observed pattern from

state 1 to state 2, or vice versa. The decrease in the likelihood of the restricted model will

be accompanied by biased parameter estimates: the estimates of the conditional response

probabilities in the 2 states will be biased to be closer to each other, to increase the

likelihood of observing a state 1 pattern given state 2 or a typical state 2 pattern given

state 1. This could also result in overestimation of the LR power for rejecting the null

hypothesis that the transition probability is zero.

4.8 Appendix

4.8.1 Latent GOLD syntax for power computation

This appendix illustrates the application of the proposed power computation methods

using the Latent GOLD program. As an example, we use a 2-state LM model with six

binary response variables (y1 through y6). The H1 population model contains unequal

transition probabilities, and we test the H0 model with equal transition probabilities

against the H1 model.

In order to perform a power computation, one should first define a data file indicating

the time structure and the variables in the model. With T = 4 and P = 6, this file could

be of the form

id time y1 y2 y3 y4 y5 y6 n100000 n300

1 1 0 0 0 0 0 0 100000 300

1 2 0 0 0 0 0 0 100000 300

1 3 0 0 0 0 0 0 100000 300

1 4 0 0 0 0 0 0 100000 300

This data file contains 4 records (one for each measurement occasion) which are connected


by an identifier variable, arbitrary values for the response variables, and variables indicating

sample sizes to be used later on.
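Such a template file can also be generated programmatically. The following Python sketch reproduces the layout shown above; the function name `template_data_file` and its arguments are hypothetical, and the space-separated layout is simply copied from the example rather than taken from the Latent GOLD documentation.

```python
def template_data_file(n_occasions=4, n_items=6, sizes=(100000, 300)):
    """Return the text of a one-case template file: one record per occasion,
    arbitrary (zero) responses, and one column per sample size."""
    header = ["id", "time"]
    header += [f"y{j}" for j in range(1, n_items + 1)]
    header += [f"n{size}" for size in sizes]
    lines = [" ".join(header)]
    for t in range(1, n_occasions + 1):
        record = [1, t] + [0] * n_items + list(sizes)
        lines.append(" ".join(str(value) for value in record))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(template_data_file(), end="")
```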

A Latent GOLD syntax model consists of three sections: “options”, “variables”, and

“equations”. The relevant LM model is defined as follows:

// basic model

options

output parameters=first standarderrors profile;

variables

caseid id;

dependent y1 nominal 2, y2 nominal 2, y3 nominal 2,

y4 nominal 2,y5 nominal 2,y6 nominal 2;

latent State nominal dynamic 2;

equations

State[=0] <- 1;

State <- (beta~tra) 1 | State[-1];

y1-y6 <- 1 | State;

The “output” option indicates that we wish to use dummy coding for the logit parameters

with the first category as the reference category. Subsequently, we define the variables

which are part of the model. Note that the latent variable “State” is specified to be

dynamic, which yields a latent variable which changes its value across measurement

occasions.

The three equations represent the logit equations for the initial state, the transitions,

and the measurement part of the model, respectively. Note that “1” indicates an intercept,

and “|” that the intercept depends on the variable concerned. A special type of coding

(called transition coding and indicated with “~tra”) is used for the logit parameters of

the transition model and, moreover, a label (beta) is specified for these parameters. This

label will be used below to impose restrictions.


a). Standard case

Option 1. Implementation of the Monte Carlo based power computation method (MC1)

involves defining the H0 and H1 model in a single input file. Denoting the parts which

remain the same as in the basic model defined above by “...”, the H1 model may be specified as:

// H1 model for MC-based method

...

equations

...

{0.000000000

-0.54729786 -1.14729786

1.386294361 -1.386294361

1.386294361 -1.386294361

1.386294361 -1.386294361

1.386294361 -1.386294361

1.386294361 -1.386294361

1.386294361 -1.386294361}

The numbers shown inside the curly brackets represent the values for the logit

parameters in the H1 model. The first row contains the initial state parameter(s),

the second row the transition parameters, and the remaining rows the measurement

parameters.
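To see which probabilities these logit values imply, one can apply the inverse logit transformation. The sketch below assumes a plain binary logit with the first category as reference; how Latent GOLD's transition coding assigns each transition logit to a cell of the transition matrix is not shown here, so the sketch only converts the numbers and does not label them. Under this assumption, the measurement logits of 1.386294361 and -1.386294361 correspond to probabilities of .8 and .2.

```python
import math


def inv_logit(x):
    """Probability of the non-reference category for a binary logit x."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for logit in (1.386294361, -1.386294361, -0.54729786, -1.14729786):
        print(logit, round(inv_logit(logit), 4))
```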

The H0 model in which the transitions are restricted to be equal is defined as follows:

// H0 model for MC-based method

options

...

montecarlo replicates=1000 power='H1' N=300 alpha=0.05;

variables

...


equations

...

beta[1,1] = beta[2,1];

As can be seen, the equality restriction on the transition logits is defined at the end

of the equations section. What is important to note is that the H0 model should

contain the “montecarlo” option indicating the number of Monte Carlo replications, the

“power” command with the name of the H1 model, the sample size “N”, and the level of

significance “alpha”. Running the H0 model will yield the power of the LR test comparing

the two models.

Option 2. When using the large data set method, one should first simulate a large data

set from the population defined by H1 and subsequently analyze this data set using both

the H0 and H1 model. Simulating the large data set is done as follows:

// H1 model for simulating a large data file

options

...

outfile 'sim.dat' simulation;

variables

...

caseweight n100000;

equations

...

{...}

Compared to the basic specification, we use the “outfile” option to indicate that a data

file should be simulated, use the “caseweight” to indicate the size of the large data set

(here 100000), and specify the parameter values of the population model.

To obtain the power we analyze the large data set with an input file containing both

the H0 and H1 model. The H1 model equals:


// H1 model for large data based power computation method

options

...

output LLdiff='H0' LLdiffPower=300;

...

That is, we indicate that a log-likelihood difference test should be performed (“LLdiff”)

and that the power of this test should be computed for the specified sample size

(“LLdiffPower”). We also define the H0 model itself, which again is the basic LM model

with the constraint “beta[1,1] = beta[2,1]”.

Option 3. Power computation using the exemplary data method is similar to the large

data method. First an exemplary data file which is exactly in agreement with the H1

model is created, and subsequently this data file is analyzed with both the null and the

alternative model. That is, first create an exemplary data file as

// H1 model for creating the exemplary data file

options

...

output WriteExemplaryData='exemplary.dat';

variables

...

equations

...

{...}

Next compute power using the created exemplary data file, by specifying the H0 and

H1 model in the same way as the power computation using the simulated large data file

method. The only difference, when compared with the simulated large data file method

discussed above, is that the case weight of the exemplary data file has to be specified in

both the H0 and H1 model. This requires adding the line “caseweight frequency;” to the


“variables” section.

b). Non-standard case

We will illustrate MC-based power computation for the non-standard case using an

example in which one of the transition probabilities is fixed to 0, implying that the

transition logit concerned is fixed to a large negative value (say, -100). Power computation

in the non-standard case proceeds in two steps. First, we obtain the critical value under

H0 by simulation, and subsequently we obtain the power given this critical value.

To obtain the critical value, we define the H0 and H1 model in the same input file.

In the H1 model, we use the “MCstudy” option and specify the number of Monte Carlo

replications, the H0 model, the sample size, and level of significance “alpha”, that is,

// H1 model for obtaining CV by simulation

options

....

montecarlo replicates=5000 MCstudy='H0' N=300 alpha=0.05;

...

The H0 model contains the population values for the free parameters as well as the

constraint. That is,

// H0 model for obtaining CV by simulation

...

equations

...

beta[1,1] = -100;

{...}

Running the H1 model gives us the critical value (CV).

In the final step, we obtain power by running the H1 and H0 models; that is, define

the H0 and H1 models in a single input file as


// H0 model for obtaining power by simulation

options

...

montecarlo replicates=5000 power='H1' N=300 CV=2.2344;

variables

...

equations

...

beta[1,1] = -100;

The H1 model is again equal to the basic model with the population values for the

parameters. Running the H0 model will give us the power for the specified sample size N

and the estimated critical value CV, which we set here to N = 300 and CV = 2.2344.

CHAPTER 5

Power Analysis for the Likelihood-Ratio Test in Latent Markov

Models: Short-cutting the Bootstrap p-value Based Method

Abstract

The latent Markov (LM) model is a popular method for identifying distinct unobserved

states and transitions between these states over time in longitudinally observed responses.

The bootstrap likelihood-ratio (BLR) test yields the most rigorous test for determining

the number of latent states, yet little is known about power analysis for this test. Power

could be computed as the proportion of the bootstrap p-values (PBP) for which the null

hypothesis is rejected. This requires performing the full bootstrap procedure for a large

number of samples generated from the model under the alternative hypothesis, which is

computationally infeasible in most situations. This chapter presents a computationally

feasible short-cut method for power computation for the BLR test. The short-cut method

involves the following simple steps: 1) obtaining the parameters of the model under the

null hypothesis, 2) constructing the empirical distributions of the likelihood-ratio under the

null and alternative hypotheses via Monte Carlo simulations, and 3) using these empirical

distributions to compute the power. We evaluate the performance of the short-cut method

by comparing it to the PBP method, and moreover show how the short-cut method can

be used for sample size determination.

This chapter has been submitted for publication.


5.1 Introduction

In recent years, the latent Markov (LM) model has proven useful to identify distinct

underlying states and the transitions over time between these states in longitudinally

observed responses. In LM models, as in latent class models, or more generally in finite

mixture models, the observed responses are governed by a set of discrete underlying

categories, which are named states, classes, or mixture components. Moreover, the LM

model allows transitions between these states from one time-point to another, that is,

the state membership of respondents can change during the period of observation. The

LM model finds its application, for example, in educational sciences to study how the

interests of students in certain subjects change over time (Vermunt et al., 1999), and in

medical sciences to study the change in health behavior of patients suffering from certain

diseases (Bartolucci et al., 2010). Various examples of applications in social, behavioral,

and health sciences are presented in the textbooks by Bartolucci, Farcomeni, and Pennoni

(2013) and Collins and Lanza (2010).

In most research situations, including those just mentioned, the number of states is

unknown and must be inferred from the data itself. The bootstrap likelihood-ratio (BLR)

test, proposed by McLachlan (1987) and extended by Feng and McCulloch (1996) and

Nylund et al. (2007), is often used to test hypotheses about the number of mixture

components. These previous studies focused on p-value computation, rather than on

power computation for the BLR test, which is the topic of the current study.

The assessment of the power of a test, that is, the probability that the test will

correctly reject the null hypothesis when indeed the alternative hypothesis is true, is

important at several stages of a research study. At the planning stage, an a priori power

analysis is useful for determining the data requirements of the study: e.g., the sample

size or number of time points at which measurement takes place. In general, the smaller

the sample size, the less power we have to reject the null hypothesis when it is false.

Therefore, too small a sample size may result in an under-extraction of the number of

states (see, for example, Nylund et al., 2007; Yang, 2006). This distorts not only

the conclusion about the number of states but also the interpretation of the state-specific

parameters. Moreover, when the sample size is too small, the parameter estimates tend

to be unstable and inaccurate (Marsh, Hau, Balla, & Grayson, 1998).

Performing an a priori power analysis helps to determine the smallest necessary sample

size required to achieve a certain power level, usually a power level of .8 or larger, thereby

allowing the researcher to avoid excessively large, uneconomical sample sizes. Therefore,

when applying for a research grant, the funding agency may ask to justify the number

of subjects to be enrolled for the study through a power analysis. At the analysis stage,

a post hoc assessment of the power achieved given the specific design scenario and the

parameter values obtained should aid the interpretation of the study results. Therefore,

in order to assure confidence in the study results (or conclusions), journal editors often

ask to report the power.

Power computation is straightforward if under certain regularity conditions the

theoretical distributions of the test statistic under the null and the alternative hypothesis

are known. This is not the case for the BLR test in LM models. The power of a statistical

test can be computed as the proportion of the p-values (stemming from multiple data-

sets that were simulated given the alternative hypothesis), for which the null hypothesis is

rejected. When using the BLR statistic to test for the number of states in LM models, such

a power calculation becomes computationally expensive, because it requires performing

the bootstrap p-value computation for multiple sets of data. As explained in detail below,

it requires generating M data sets from the model under the alternative hypothesis,

and for each data set, estimating the models under the null and alternative hypotheses

to obtain the LR value. Whether the null hypothesis will be rejected for a particular

generated data set is determined by computing the bootstrap p-value, which in turn

requires (a) generating B data sets from the model estimates under the null hypothesis

and (b) estimating the models under the null and alternative hypotheses using these

B data sets. Hereafter, we refer to this computationally demanding procedure, which

involves calculating the power as the proportion of the bootstrap p-values for which the


model under the null hypothesis is rejected, as the PBP method.
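The nested structure of the PBP method can be made concrete with a small Python sketch. To keep it runnable without an LM estimation routine, it uses a toy stand-in problem (a normal mean tested against zero with unknown variance) in place of the comparison of two LM models; only the looping structure carries over. The bootstrap p-value is taken here as the proportion of bootstrap LR values at least as large as the observed one, which is one common convention and is an assumption of this sketch. All function names are hypothetical.

```python
import math
import random


def lr_statistic(data):
    """Toy LR for H0: mean = 0 versus H1: mean free (normal data).
    Returns the LR value and the ML standard deviation under H0."""
    n = len(data)
    mean = sum(data) / n
    var0 = sum(x * x for x in data) / n             # ML variance under H0
    var1 = sum((x - mean) ** 2 for x in data) / n   # ML variance under H1
    return n * math.log(var0 / var1), math.sqrt(var0)


def bootstrap_p_value(data, B, rng):
    """Steps (a) and (b): generate B data sets from the H0 estimates and
    refit both models to each of them."""
    lr_observed, sd0 = lr_statistic(data)
    exceed = 0
    for _ in range(B):
        boot = [rng.gauss(0.0, sd0) for _ in range(len(data))]
        lr_boot, _ = lr_statistic(boot)
        if lr_boot >= lr_observed:
            exceed += 1
    return exceed / B


def pbp_power(mu, n, M, B, alpha, rng):
    """Proportion of M data sets generated under H1 whose bootstrap
    p-value leads to rejection of H0."""
    rejections = 0
    for _ in range(M):
        data = [rng.gauss(mu, 1.0) for _ in range(n)]
        if bootstrap_p_value(data, B, rng) <= alpha:
            rejections += 1
    return rejections / M


if __name__ == "__main__":
    print(pbp_power(mu=0.6, n=40, M=60, B=99, alpha=0.05, rng=random.Random(1)))
```

Each of the M outer replications triggers B + 1 evaluations of the LR statistic, which is the source of the computational burden when every evaluation requires fitting two LM models.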

Because using the PBP method is infeasible in most situations, we propose an

alternative method which we refer to as the short-cut method. Computing the power using

the short-cut method involves constructing the empirical distributions of the LR under

both the null and alternative hypotheses. We show how the “population” parameters of

the model under the null hypothesis can be obtained based on a certain large data set,

and these parameters will in turn be used in the process to obtain the distribution of the

LR statistic under the null hypothesis. As explained in detail below, the distribution of the

LR under the null hypothesis is used to obtain the critical value, given a predetermined

level of significance. Given this critical value, we compute the power by simulating the

distribution of the LR under the alternative hypothesis. Using numerical experiments, we

examine the data requirements (e.g., the sample size, the number of time points, and the

number of response variables) that yield reasonable levels of power for given population

characteristics.
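A simple count of the number of times both models must be fitted illustrates why the short-cut method is cheaper. The sketch assumes that the PBP method uses M data sets with B bootstrap samples each, and that the short-cut method uses M0 simulated data sets under the null and M1 under the alternative; the names M0 and M1, the function names, and the example numbers are illustrative assumptions, not values from the numerical study.

```python
def pbp_fit_pairs(M, B):
    """Pairs of (H0, H1) model fits needed by the PBP method:
    one for each generated data set plus one per bootstrap sample."""
    return M * (B + 1)


def shortcut_fit_pairs(M0, M1):
    """Pairs of (H0, H1) model fits needed by the short-cut method,
    not counting the single H0 fit on the large data set."""
    return M0 + M1


if __name__ == "__main__":
    print(pbp_fit_pairs(M=1000, B=500))         # illustrative numbers
    print(shortcut_fit_pairs(M0=1000, M1=1000))
```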

The remaining part of the paper is organized as follows. First, we describe the LM

model and the BLR test for determining the number of states. Second, we provide power

computation methods for the BLR test and discuss how these methods can be applied

to determine the required sample size. Third, numerical experiments that illustrate the

proposed methods of power and sample size computation are presented. Finally, we

provide a concluding discussion on the main results of our study.

5.2 The LM model

Let Yt = (Yt1, Yt2, Yt3, ..., YtP) for t = 1, 2, 3, ..., T be the P-dimensional response variable

of interest at time point t. Denoting the latent variable at time point t by Xt, in an LM

model the relationships among the latent and observed response variables at the different

time points can be represented by using the following simple path diagram.


X1 ---> X2 ---> ... ---> XT
|       |                |
v       v                v
Y1      Y2      ...      YT

An LM model is a probabilistic model defining the relationships between the time-

specific latent variables Xt (e.g., between X1, X2, and X3) and the relationships between

the latent variables Xt and the time-specific vectors of observed responses Yt (e.g., X1

with Y1). In the basic LM model, the latent variables are assumed to follow a first-order

Markov process (i.e., the state membership at t+1 depends only on the state occupied

at time point t), and the response variables are assumed to be locally independent

given the latent states. Based on these assumptions, we define the S-state LM model as

a mixture density of the form

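\[
p(y_i, \Phi) = \sum_{x_1=1}^{S} \sum_{x_2=1}^{S} \sum_{x_3=1}^{S} \cdots \sum_{x_T=1}^{S} p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}) \prod_{t=1}^{T} \prod_{j=1}^{P} p(y_{tji} \mid x_t),
\]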

where yi denotes the vector of responses for subject i over all the time points, ytji the

response of subject i to the j-th variable measured at time point t, xt a particular latent

state at time point t, and Φ the vector of model parameters (Bartolucci et al., 2013;

Vermunt et al., 1999).
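For a small example, the mixture density can be evaluated directly by summing over all state sequences, or more efficiently with a forward recursion that gives the same value. The Python sketch below does both for binary responses, taking the response part as a product over all time points and variables. The parameter values are made up for illustration, and the names `density_brute_force` and `density_forward` are hypothetical.

```python
import itertools


def response_prob(y_t, state, theta):
    """p(y_t | x_t = state) under local independence, where
    theta[state][j] is the probability that variable j equals 1."""
    prob = 1.0
    for j, y in enumerate(y_t):
        prob *= theta[state][j] if y == 1 else 1.0 - theta[state][j]
    return prob


def density_brute_force(y, init, trans, theta):
    """Sum over all S**T state sequences of the joint probability."""
    n_states, n_times = len(init), len(y)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=n_times):
        p = init[path[0]]
        for t in range(1, n_times):
            p *= trans[path[t - 1]][path[t]]
        for t in range(n_times):
            p *= response_prob(y[t], path[t], theta)
        total += p
    return total


def density_forward(y, init, trans, theta):
    """Same density via the forward recursion (cost linear in T)."""
    n_states = len(init)
    alpha = [init[s] * response_prob(y[0], s, theta) for s in range(n_states)]
    for t in range(1, len(y)):
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states))
            * response_prob(y[t], s, theta)
            for s in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    init = [0.5, 0.5]                     # made-up initial state probabilities
    trans = [[0.9, 0.1], [0.2, 0.8]]      # made-up transition matrix (rows sum to 1)
    theta = [[0.8, 0.8, 0.8], [0.2, 0.2, 0.2]]
    y = [[1, 1, 0], [0, 0, 1]]            # T = 2 occasions, P = 3 variables
    print(density_brute_force(y, init, trans, theta))
    print(density_forward(y, init, trans, theta))
```

The brute-force version mirrors the formula term by term but needs S to the power T terms, whereas the forward recursion needs on the order of T times S squared operations.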

The LM model has three sets of parameters:

1. The initial state probabilities (or proportions) $p(X_1 = s) = \pi_s$, satisfying $\sum_{s=1}^{S} \pi_s = 1$.

That is, $\pi_s$ is the probability of being in state s at the first time point;

2. The transition probabilities $p(X_t = s \mid X_{t-1} = r) = \pi^{t}_{s|r}$, satisfying $\sum_{s=1}^{S} \pi^{t}_{s|r} = 1$.

These transition probabilities indicate the probabilities of remaining in a state or

switching to another state, conditional on the state membership at the previous time


point. All transition probabilities are conveniently collected in a transition matrix,

in which the entry in row r and column s represents the probability of a transition

from state r at time point (t− 1) to state s at time point t;

3. The state-specific parameters of the density function p(ytji|xt), which govern the

association between the latent states and the observed response variables. The

choice of the specific density form for p(ytji|xt), which depends on the scale type

of the response variable, determines the state-specific parameters for this density

function. With continuous responses, one may, for example, define the state-specific

density to be a normal distribution, for which the parameters are the mean $\mu^{t}_{j|s}$

and the variance $(\sigma^{t}_{j|s})^{2}$. With dichotomous and nominal responses, the multinomial

distribution is assumed, for which the parameters become the conditional response

probabilities p(ytji|xt = s) = θtj|s. The state-specific parameters and the transition

probabilities may vary across time, hence the subscript t, but are assumed to be

time-homogeneous during the remainder of this chapter.
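The three parameter sets are enough to generate data from an LM model. The sketch below is one hedged way to do so for binary responses under time homogeneity; `simulate_lm` and its arguments are hypothetical names introduced for illustration only.

```python
import random


def simulate_lm(n, T, init, trans, theta, seed=None):
    """Draw n subjects from a time-homogeneous LM model with binary items.

    init  : init[s] = p(X_1 = s)
    trans : trans[r][s] = p(X_t = s | X_{t-1} = r)
    theta : theta[s][j] = p(Y_tj = 1 | X_t = s)
    Returns a list of n subjects, each a list of T response vectors.
    """
    rng = random.Random(seed)
    states = range(len(init))
    data = []
    for _ in range(n):
        # Initial state drawn from the initial state probabilities.
        state = rng.choices(states, weights=init)[0]
        subject = []
        for t in range(T):
            if t > 0:
                # First-order Markov step: depends only on the previous state.
                state = rng.choices(states, weights=trans[state])[0]
            # Local independence: items are drawn separately given the state.
            subject.append([1 if rng.random() < p else 0 for p in theta[state]])
        data.append(subject)
    return data


sample = simulate_lm(n=3, T=3,
                     init=[0.5, 0.5],
                     trans=[[0.9, 0.1], [0.1, 0.9]],
                     theta=[[0.8, 0.8], [0.2, 0.2]],
                     seed=1)
print(sample)
```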

Given a sample of size n, the parameters are typically estimated by maximizing the

log-likelihood function:

$$l(\Phi) = \sum_{i=1}^{n} \log p(y_i, \Phi). \tag{5.1}$$

The search for the values of Φ that maximize the log-likelihood function in equation (5.1)

can be carried out with the Expectation-Maximization (EM) algorithm (Dempster, Laird,

& Rubin, 1977; McLachlan & Krishnan, 2007), which alternates between computing

the expected complete data log-likelihood function (E step) and updating the unknown

parameters of interest by maximizing this function (M step). For LM models, a special

version of the EM algorithm with a computationally more efficient implementation of the

E step may be used. This algorithm is referred to as the Baum-Welch or forward-backward

algorithm (Bartolucci et al., 2010; Baum, Petrie, Soules, & Weiss, 1970; Vermunt, Tran,

& Magidson, 2008).
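The efficiency gain of the forward-backward approach comes from replacing the sum over all state paths by a recursion over time points. The sketch below shows only the forward pass, which is sufficient to evaluate the log-likelihood in equation (5.1); it is not a full EM implementation. Binary responses and time homogeneity are assumed, the recursion is left unscaled (adequate for a small T, whereas practical implementations usually rescale to avoid underflow), and the name `lm_loglik` is hypothetical.

```python
import math


def lm_loglik(data, init, trans, theta):
    """Log-likelihood of an LM model computed with the forward recursion.

    data  : list of subjects, each a list of T binary response vectors
    init  : init[s] = p(X_1 = s)
    trans : trans[r][s] = p(X_t = s | X_{t-1} = r)
    theta : theta[s][j] = p(Y_tj = 1 | X_t = s)
    """
    S = len(init)

    def emission(obs, s):
        # Product over items of p(y_tj | x_t = s) under local independence.
        prob = 1.0
        for j, resp in enumerate(obs):
            prob *= theta[s][j] if resp == 1 else 1.0 - theta[s][j]
        return prob

    loglik = 0.0
    for y in data:
        # alpha[s] = p(y_1, ..., y_t, X_t = s), updated one time point at a time.
        alpha = [init[s] * emission(y[0], s) for s in range(S)]
        for obs in y[1:]:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(S)) * emission(obs, s)
                for s in range(S)
            ]
        loglik += math.log(sum(alpha))
    return loglik
```

The cost per subject is of order T·S² operations, compared with S^T terms for direct summation over state paths.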

As already discussed in the introduction, identifying the number of latent


states is a common goal in LM modeling, and typically the first step in the analysis. Testing

hypotheses about the number of states involves estimating LM models with increasing

numbers of states and checking whether the model fit is significantly improved by adding

one or more states. More formally, the hypotheses about the number of states may be

specified as H0 : S = r versus H1 : S = s, where r < s. Usually, the r- and s-state models

differ by one state; an example is the test of H0 : 2-state LM model against H1 : 3-state

LM model. In principle, however, the comparison can also be between the 1-state and

the 3-state LM model. In this chapter, we restrict ourselves to the situation in which r =

s − 1.

The LR statistic for this type of test is defined as

$$LR = 2\left(l(\hat{\Phi}_s) - l(\hat{\Phi}_r)\right), \tag{5.2}$$

where l(·) is the log-likelihood function and $\hat{\Phi}_s$ and $\hat{\Phi}_r$ are the maximum likelihood

estimates under the alternative and null hypothesis, respectively. In the standard case,

under certain regularity conditions, it is generally assumed that the LR statistic in equation

(5.2) follows a central chi-square under the null hypothesis and a non-central chi-square

distribution under the alternative hypothesis (Steiger, Shapiro, & Browne, 1985). In such

a case, one may use the (theoretical) chi-square distribution with the appropriate number

of degrees of freedom to compute the p-value of the LR test given a predetermined level

of significance α or the power of the LR test given the population characteristics of the H1

model. These asymptotic distributions, however, do not apply when using the LR statistic

for testing the number of latent states (Aitkin, Anderson, & Hinde, 1981).

One may, however, apply the method of parametric bootstrapping to construct the

empirical distribution of the LR, and subsequently use the constructed empirical distribution

for p-value computation. Due to advances in computing facilities, this can be applied

readily. Using parametric bootstrapping, the empirical distribution of the LR statistic

under the null hypothesis is constructed by generating B independent (bootstrap) samples

according to a parametric (probability) model p(y, Φr), where Φr itself is an estimate

computed based on a sample of size n (Feng & McCulloch, 1996; McLachlan, 1987;

Nylund et al., 2007). Denoting the bootstrap samples by yb (for b = 1, 2, 3, ..., B),

equation (5.2) becomes

$$BLR_b = 2\left(l(\hat{\Phi}^b_s) - l(\hat{\Phi}^b_r)\right), \tag{5.3}$$

where BLRb denotes the BLR, computed for (bootstrap) sample yb.

So, sampling B data sets from the r-state LM model defined by p(y, Φr) and

computing the BLR statistic as shown in equation (5.3) for each of these data sets, yields

the BLR distribution under the null hypothesis. This distribution is then employed in the

bootstrap p-value computation. In short, the bootstrap p-value computation proceeds as

follows:

Step 1. Treating the ML parameter estimates as if they were the "true" parameter values

for the r-state LM model, generate B independent (bootstrap) samples from the r-state

LM model.

Step 2. Compute the BLRb values as shown in equation (5.3), which requires us to fit

the r- and s-state models using the bootstrap samples generated in Step 1.

Step 3. Compute the bootstrap p-value as $p = \frac{1}{B} \sum_{b=1}^{B} I(BLR_b > LR)$, where I(·) is

the indicator function which takes on the value 1 if the argument BLRb > LR holds and

0 otherwise. The decision concerning whether the r-state LM model should be retained

or rejected in favor of the s-state model is then determined by comparing this p-value

with the predetermined significance level α.
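Step 3 reduces to a simple proportion once the B bootstrap statistics are available. A minimal sketch, assuming the BLRb values have already been obtained by fitting both models to each bootstrap sample (the function name and the numbers in the usage lines are hypothetical):

```python
def bootstrap_p_value(lr_observed, blr_values):
    """Proportion of bootstrap LR values that exceed the observed LR."""
    exceed = sum(1 for blr in blr_values if blr > lr_observed)
    return exceed / len(blr_values)


# Hypothetical values for illustration only.
blr = [1.2, 0.4, 6.1, 2.3, 7.5, 0.9, 3.3, 1.8]
p = bootstrap_p_value(5.0, blr)
reject_h0 = p < 0.05  # compare with the significance level alpha
print(p, reject_h0)   # 0.25 False
```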

5.3 Power analysis for the BLR test

As mentioned, two common goals of power analysis are (a) to determine the post hoc

power of a study (i.e., given a certain sample size, number of time points, and number of

response variables) and (b) to a priori determine the sample size (or other design factors

like the number of time points or the number of response variables) required to achieve a


certain power level. In both cases, we assume that the population parameters are known

(in a priori analyses a range of expected parameter values may be used) and other factors

are fixed. In what follows, we first show how the bootstrapping procedure discussed

above can be used for power computation, and subsequently present the computationally

more efficient short-cut method for power and sample size computation in LM models.

5.3.1 Power computation

In this sub-section, we present two alternative methods for computing the power of the

BLR test. The first option, the PBP method, involves computing the power as the

proportion of the bootstrap p-values (PBP) for which H0 is rejected. More specifically,

the PBP method for power computation involves the following steps:

Step 1. Generate M independent samples, each of size n, from the true H1 model.

Step 2. For each sample m from Step 1, compute the likelihood-ratio statistic LRm as shown in

equation (5.2).

Step 3. Obtain the bootstrap p-value of each sample m as $p_m = \frac{1}{B} \sum_{b=1}^{B} I(BLR^b_m > LR_m)$,

where LRm is the LR of sample m from the H1 population, $BLR^b_m$ is the

corresponding BLR for bootstrap sample b, and I(·) is the indicator function as defined

above.

Step 4. The actual power associated with a sample of size n is computed as the proportion

of the H1 data sets in which H0 is rejected. That is,

$$PBP = \frac{1}{M} \sum_{m=1}^{M} I(p_m < \alpha), \tag{5.4}$$

where the indicator function I(·) and α are as defined above.
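Equation (5.4) can be sketched in a few lines, assuming the M bootstrap p-values from Step 3 are already available (the function name and the example p-values are hypothetical):

```python
def pbp_power(p_values, alpha=0.05):
    """Proportion of bootstrap p-values below alpha, as in equation (5.4)."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


# Hypothetical p-values for M = 5 samples drawn from the H1 model.
print(pbp_power([0.01, 0.20, 0.03, 0.50, 0.04]))  # 0.6
```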

As mentioned above, such a method of power computation is computationally

expensive and requires a considerable amount of computer memory. For example, setting

M = 500 and B = 99 requires us to store and analyze M(B + 1) = 50000 data sets.

Also, in order to achieve a good approximation to the sampling distribution, which, if not


well approximated, could affect the p-value (and subsequently the power), both M and

B should be large enough.

For LM models, for which model fitting requires iterative procedures, power computation

by using the PBP method is computationally too intensive in practice. We propose a

computationally more efficient method, which we call the shortcut method. It works

very much as the standard power computation (see for example, Brown et al. (1999),

Satorra and Saris (1985), and Self et al. (1992)), with the difference that we construct

the distributions under H0 and H1 by Monte Carlo simulation. As explained below, the

distribution under H0 is used to obtain the critical value (CV), and the distribution under

H1 is used to compute the power given the CV.

First, the H0 “population” parameters needed to compute the CV should be obtained.

This can be achieved by creating an exemplary data set, which is a data file with all possible

response patterns and the population proportion under H1 as weights (O’Brien, 1986;

Self et al., 1992). Because in LM models with more than a few indicators and/or time

points, the number of possible response patterns is very large, this method cannot always

be applied. Therefore, as an alternative, using the parameter values of the H1 model,

we generate a large data set (e.g., 100000 observations), which is assumed to represent

the hypothetical H1 population. Estimating the H0 model (i.e., the r-state LM model)

using this large data set yields the pseudo parameter values for the r-state model. These

H0 parameters are then employed to construct the distribution of the LR under the null

hypothesis. That is, given the estimated parameters of the H0 model, generate K data

sets (each of size n) and for each of these data sets, compute the LR as shown in equation

(5.2). Next, order the LR values in such a way that LR[1] ≤ LR[2] ≤ LR[3] ≤ ... ≤ LR[K].

Given the nominal level α, compute the CV as

$$CV_{(1-\alpha)} = \{LR_{[k]} : p(LR > LR_{[k]} \mid H_0) = \alpha\}. \tag{5.5}$$
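One way to read equation (5.5) is as the order statistic that leaves a proportion α of the K simulated H0 values above it. The sketch below takes k as the smallest integer not below K(1 − α); this indexing convention, the handling of ties, and the function name are assumptions made for illustration.

```python
import math


def critical_value(lr_h0, alpha=0.05):
    """Empirical critical value from LR values simulated under H0."""
    ordered = sorted(lr_h0)  # LR[1] <= LR[2] <= ... <= LR[K]
    K = len(ordered)
    # 1-based index k such that (K - k) / K values lie above LR[k];
    # rounding guards against floating-point noise in K * (1 - alpha).
    k = math.ceil(round(K * (1.0 - alpha), 10))
    return ordered[k - 1]


# With the values 1, 2, ..., 100 and alpha = .05, five values exceed the CV.
print(critical_value(list(range(1, 101)), alpha=0.05))  # 95
```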

Similarly, the distribution of the LR under the alternative hypothesis is constructed

using M samples of the H1 model. That is, given the parameters of the H1 model,


we generate M independent samples from the s-state LM model and for each of these

samples, compute the LR as shown in equation (5.2). For sufficiently large M, this approximates

the sampling distribution of the LR statistic under the alternative hypothesis.

The power is then computed from this empirical distribution as the probability that the

LR value exceeds the CV. That is,

$$\text{power} = p(LR > CV_{(1-\alpha)} \mid H_1) = \frac{\sum_{m=1}^{M} I(LR_m > CV_{(1-\alpha)})}{M}, \tag{5.6}$$

where I(·) is the indicator function, indicating whether the LR value (computed based on

the m-th sample of the H1 population) exceeds the CV(1−α) value.

So both the PBP and the short-cut method require M samples given H1 and the

calculation of the LR for each of these samples (i.e., steps 1 and 2 of the PBP power

calculation). The saving in computation time of the short-cut method lies in the omission

of the full bootstrap for each of the M samples from the H1 model. Rather, the LRs given

H1 are now evaluated against the approximated distribution of LRs given H0. Therefore,

compared to the PBP-based power computation, the number of data sets to be stored and

retrieved is much smaller when using the short-cut method. For example, for M = 500

and K = 500, we analyze M +K = 1000 data sets.
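The difference in computational burden follows directly from the two counts given in the text, M(B + 1) for the PBP method and M + K for the short-cut method. A small sketch (function names are illustrative):

```python
def n_datasets_pbp(M, B):
    """Data sets analyzed by the PBP method: each H1 sample plus its B bootstraps."""
    return M * (B + 1)


def n_datasets_shortcut(M, K):
    """Data sets analyzed by the short-cut method: M under H1 plus K under H0."""
    return M + K


print(n_datasets_pbp(M=500, B=99))        # 50000
print(n_datasets_shortcut(M=500, K=500))  # 1000
```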

The short-cut method of power computation presented above can easily be implemented

using statistical software for LM analysis as outlined below.

1. Obtain the H0 population parameters: Given the parameters of the H1 model,

generate a large data set (e.g., 100000 observations) from the H1 population. For this

purpose, any software that allows generating a sample from a LM model with fixed

parameter values can be used. For the numerical studies shown below, we used the

syntax module of the Latent GOLD 5.0 program (Vermunt & Magidson, 2013a).

Using this large data set, then estimate the parameters of the H0 model.

2. Compute the CV: Given the estimated parameters of the H0 model, generate K

data sets (each of size n) and for each of these data sets, compute the LR as shown


in equation (5.2). Note that this requires estimating both the r- and the s-state

model. For a sufficiently large K, the LR distribution approximates the population

distribution of the LR under the null hypothesis. We use this distribution to compute

the CV of the LR test as shown in equation (5.5).

3. Compute the power: Given the parameters of the H1 model, obtain the empirical

distribution of the LR. That is, generate M data sets from the H1 model, and, using

these data sets, compute the LR as shown in (5.2). Given the CV and the empirical

distribution of the LR under H1, compute the power as shown in equation (5.6).
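The three steps can be sketched as a generic routine that takes the model-specific operations as functions. Everything named here (`shortcut_power`, `simulate_h1`, `fit_h0`, `simulate_h0`, `lr_stat`) is hypothetical. To keep the example runnable without LM software, the demonstration uses a toy stand-in that is not an LM model: H0 is a normal distribution with mean zero and H1 a normal distribution with free mean. For an actual LM analysis, these functions would have to wrap a program that simulates and fits LM models.

```python
import math
import random


def shortcut_power(simulate_h1, fit_h0, simulate_h0, lr_stat,
                   n, K=500, M=500, alpha=0.05, n_large=100000):
    """Short-cut power computation with user-supplied model routines."""
    # Step 1: estimate the H0 pseudo-population parameters from one
    # large data set generated under H1.
    h0_params = fit_h0(simulate_h1(n_large))

    # Step 2: simulate the LR distribution under H0 and take the order
    # statistic that leaves a proportion alpha above it as the CV.
    lr_h0 = sorted(lr_stat(simulate_h0(h0_params, n)) for _ in range(K))
    k = math.ceil(round(K * (1.0 - alpha), 10))
    cv = lr_h0[k - 1]

    # Step 3: simulate the LR distribution under H1 and count exceedances.
    lr_h1 = [lr_stat(simulate_h1(n)) for _ in range(M)]
    power = sum(1 for lr in lr_h1 if lr > cv) / M
    return cv, power


# Toy stand-in (NOT an LM model): H0 is N(0, sigma^2), H1 is N(mu, sigma^2).
rng = random.Random(2015)


def simulate_h1(n):
    return [rng.gauss(0.3, 1.0) for _ in range(n)]


def fit_h0(data):
    # ML estimate of the variance when the mean is fixed at zero.
    return sum(y * y for y in data) / len(data)


def simulate_h0(variance, n):
    sd = math.sqrt(variance)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def lr_stat(data):
    n = len(data)
    mean = sum(data) / n
    var0 = sum(y * y for y in data) / n            # ML variance under H0
    var1 = sum((y - mean) ** 2 for y in data) / n  # ML variance under H1
    return n * math.log(var0 / var1)               # equals 2 * (l1 - l0)


cv, power = shortcut_power(simulate_h1, fit_h0, simulate_h0, lr_stat,
                           n=100, K=300, M=300, n_large=20000)
print(f"critical value = {cv:.2f}, power = {power:.2f}")
```

Passing the model routines as arguments keeps the three steps separate from the model, so the same skeleton applies whichever software performs the simulation and estimation.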

5.3.2 Sample size computation

In this section, we show how the procedure described above for power computation using

the short-cut method can be applied for sample size determination. For sample size

determination, step 1 of the power computation procedure (discussed under software

implementations) remains the same. The last two steps are however repeated for different

trial sample sizes. More specifically, suppose the investigator wishes to achieve a certain

pre-specified power level (say, power = .8 or larger) while preventing the sample size from

becoming unnecessarily large. Then, the LR power computation is performed as outlined

in steps 2 and 3, starting with a certain sample size n1. Below we provide power curves

that can be used as guidance to locate this starting sample size. If the power obtained

based on these n1 observations is lower than .8, repeat steps 2 and 3 by choosing n2

larger than n1. If the chosen n1 results in larger power instead (and we want to optimize

the sample size), choose n2 smaller than n1 and repeat steps 2 and 3. In this way, the

power computation procedure is repeated for different trial samples of varying sizes, and

from these trial samples, the one that best approximates the desired power level is used

as the sample size for the study concerned. In our numerical study, we repeat this power

computation procedure for different sample sizes, which resulted in a series of power

values. By plotting these power values against the corresponding sample size, we obtain a

power curve from which one can easily determine the minimum sample size that satisfies


the power requirements, for example that the power should be larger than .8.
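Reading the required sample size off a set of trial results can be sketched as follows. The power values are the short-cut values reported in Table 5.4 for δ = 1, P = 6, T = 3, and ρ = 0.1; the function name is hypothetical, and the search only considers the trial sizes supplied.

```python
def minimum_sample_size(power_by_n, target=0.8):
    """Return the smallest trial sample size whose power reaches the target."""
    for n in sorted(power_by_n):
        if power_by_n[n] >= target:
            return n
    return None  # no trial size was large enough; larger n must be tried


# Short-cut power values from Table 5.4 (delta = 1, P = 6, T = 3, rho = 0.1).
strong = {300: 0.568, 500: 0.869, 700: 0.978}
weak = {300: 0.188, 500: 0.398, 700: 0.642}
print(minimum_sample_size(strong))  # 500
print(minimum_sample_size(weak))    # None
```

A finer grid of trial sizes between 300 and 500 would be needed to locate the minimum more precisely in the strong condition.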

When designing a longitudinal study, it is also of interest to determine the number

of time points required to achieve a certain power level. For a fixed sample size, a fixed

number of response variables, and a priori specified H1 parameter values, the procedures

discussed above for sample size determination can also be applied to determine the number

of time points. More specifically, in steps 2 and 3 of the power computation

procedures, the number of time points T should be varied instead of the sample size n.

5.4 Numerical study

A numerical study is conducted to (a) illustrate the proposed power and sample size

computation methods, and (b) investigate whether the short-cut method and the PBP

method give similar results. This numerical study has an additional benefit for applied

researchers using the LM model: given the population characteristics, the resulting BLR

power tables and the power curves shown below may help to make an informed decision

about the data requirements in testing the number of states for the LM model. More

specifically, the results of this numerical study may be used as guidance by applied

researchers to locate the initial trial sample size when computing the required sample size

to achieve a desired power level, as discussed in section 5.3.2.


The power of the BLR test for the number of states in LM models depends on several

design factors and population characteristics. See, for example, Gudicha et al. (2015),

who studied factors affecting the power in LM models. The design factors include the

sample size, the number of time points, and the number of response variables. The

number of latent states, and the various model parameter values (i.e., parameter values

for the initial state proportions, for the state transition probabilities, and for the state

specific densities) define the population characteristics.


In this numerical study, we varied both the design factors and the population

characteristics. The design factors varied were the sample size (n = 300, 500, or 700),

the number of time points (T = 3 or 5), and the number of response variables (P = 6 or

10). The population characteristics under the alternative hypothesis (i.e., the s-state LM

model for S = 3 or 4) were specified to meet varying levels of a) initial state proportions

(balanced, moderately imbalanced, highly imbalanced), b) stability of state membership

(stable, moderately stable, unstable), and c) state-response associations (weak, moderate,

strong) as follows.

In line with Dias (2006), the initial state proportions were specified using

$$\pi_s = \frac{\delta^{s-1}}{\sum_{h=1}^{S} \delta^{h-1}}.$$

We set the values of δ to 1, 2, and 3, which correspond to balanced, moderately

imbalanced, and highly imbalanced initial state proportions, respectively. For the transition

matrix, we used the specification suggested by Bacci et al. (2014), which under the

assumption of time homogeneity gives

$$\pi_{s|r} = \frac{\rho^{|s-r|}}{\sum_{h=1}^{S} \rho^{|h-r|}}.$$

Setting the values of ρ to 0.1, 0.15, and 0.3 yields what we referred to above as stable, moderately stable,

and unstable state membership. In this numerical study, we restricted ourselves to the

situation that the response variables of interest are binary and that the state specific

conditional response probabilities are time-homogeneous. We set θj|1 to .75, .80, and .85,

θj|S to 1 − .75, 1 − .80, and 1 − .85, and for S = 3, θj|2 to .58, .65, and .70, which yields the

structure shown in Table 5.1. For S = 4, we used the same setting of conditional response

probabilities as for S = 3, but now defined the conditional response probabilities of the

remaining state as high (=θj|1) for half of the response variables and low (=θj|S) for the

other half.
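The two specifications can be sketched in code: the initial state proportion for state s is δ^(s−1) divided by the sum of δ^(h−1) over all states h, and the transition probability from state r to state s is ρ^|s−r| divided by the sum of ρ^|h−r| over all states h. The function names are illustrative.

```python
def initial_proportions(delta, S):
    """Initial state proportions pi_s proportional to delta**(s - 1)."""
    weights = [delta ** (s - 1) for s in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def transition_matrix(rho, S):
    """Time-homogeneous transition matrix with entries proportional to rho**|s - r|."""
    matrix = []
    for r in range(1, S + 1):
        weights = [rho ** abs(s - r) for s in range(1, S + 1)]
        total = sum(weights)
        matrix.append([w / total for w in weights])  # each row sums to 1
    return matrix


print(initial_proportions(delta=2, S=3))   # proportions 1/7, 2/7, 4/7
for row in transition_matrix(rho=0.1, S=3):
    print([round(p, 3) for p in row])
```

With ρ = 0.1 and S = 3, the probability of remaining in the first state is 1/1.11, or about .90, which illustrates why small values of ρ correspond to stable state membership.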

Table 5.1: Values of conditional response probabilities

State-response       S = 3                   S = 4
association level    s = 1   s = 2   s = 3   s = 1   s = 2   s = 3        s = 4
Weak                 .75     .58     .25     .75     .58     .75 or .25   .25
Moderate             .80     .65     .20     .80     .65     .80 or .20   .20
Strong               .85     .70     .15     .85     .70     .85 or .15   .15

The design factors and population characteristics were fully crossed, resulting in 3

(sample size) × 2 (number of time points) × 2 (number of response variables) × 2 (number

of states) × 3 (initial state proportions) × 3 (transition probability matrices) × 3 (state-

response variables association levels) = 648 simulation conditions. For each simulation

condition, a large data set (of 100000 observations) was generated according to the H1

model and the H0 parameters were estimated using this data set. Next, for each simulation

condition, K = 1000 samples were generated according to the H0 parameters and the CV

was computed, assuming α = .05. Given a specified sample size, number of time points,

and the parameter values under the alternative hypothesis, the power was then computed

based on M = 1000 samples generated according to the H1 model as discussed in section

5.3. To minimize the problem of local maxima, we ran all models using multiple starting

values.
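The fully crossed design can be enumerated directly, which also gives the number of simulation conditions as the product of the numbers of levels. The factor levels below are those listed in this section; the dictionary keys are labels chosen for this sketch.

```python
from itertools import product

design = {
    "sample_size": [300, 500, 700],
    "time_points": [3, 5],
    "response_variables": [6, 10],
    "states": [3, 4],
    "delta": [1, 2, 3],                               # initial state proportions
    "rho": [0.1, 0.15, 0.3],                          # state transition index
    "association": ["weak", "moderate", "strong"],    # state-response association
}

# One dictionary per simulation condition, covering every combination of levels.
conditions = [dict(zip(design, levels)) for levels in product(*design.values())]
print(len(conditions))
print(conditions[0])
```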

5.4.2 Results

The results obtained from the numerical study for power computation by the short-cut

method are shown in Tables 5.2 and 5.3. Table 5.2 presents the power values for various

combinations of data and population characteristics. As expected, the power of the BLR

test increases with sample size, the number of time points, and the number of response

variables. Also, the more uniform (or balanced) the initial state proportions the larger

the power. Keeping the other design factors constant, the power of the BLR test in

general increases with stronger measurement conditions (i.e., weak to moderate to strong

state-response variable associations) and with more stable state membership transition

probabilities. Comparing the results in Table 5.2 with those in Table 5.3, holding the

other factors constant, the power of the BLR test to reject H0 : S = 2 in favour of

H1 : S = 3 is in general larger than for H0 : S = 3 against H1 : S = 4.

In the weak measurement condition and/or the highly imbalanced initial state

proportion condition, the power of the BLR test is in general very low, indicating that

very large sample sizes may be required to achieve an acceptable power level in these

conditions. Although the quality of state-response association plays a dominant role, the


power computed for the weak measurement condition improved substantially by increasing

the number of response variables or time points. Also, situations in which the state

membership is unstable (e.g., ρ = 0.3 or larger) need special care, since the power is low

in such situations.


Table 5.2: Power of the BLR test for H0: S = 2 versus H1: S = 3

                                State-response associations and index of state transition
                                Weak                     Moderate                 Strong
Design condition   Sample size  ρ=0.1   ρ=0.15  ρ=0.3    ρ=0.1   ρ=0.15  ρ=0.3    ρ=0.1   ρ=0.15  ρ=0.3
δ=1, P=6, T=3      300          .188    .145    .104     .301    .26     .176     .568    .494    .339
                   500          .398    .301    .178     .581    .534    .294     .869    .809    .631
                   700          .642    .439    .238     .842    .704    .405     .978    .957    .796
δ=1, P=6, T=5      300          .698    .559    .228     .849    .727    .394     .972    .927    .687
                   500          .955    .868    .416     .990    .959    .726     1       .999    .942
                   700          .995    .97     .654     1       .999    .887     1       1       .992
δ=1, P=10, T=3     300          .791    .646    .402     .872    .786    .551     .987    .952    .885
                   500          .973    .941    .702     .993    .976    .866     1       1       .993
                   700          1       .994    .894     1       1       .974     1       1       1
δ=2, P=6, T=3      300          .147    .130    .080     .247    .197    .135     .346    .308    .249
                   500          .29     .210    .127     .357    .351    .244     .637    .559    .457
                   700          .445    .367    .193     .594    .517    .337     .801    .763    .574
δ=3, P=6, T=3      300          .114    .075    .073     .138    .099    .090     .171    .147    .155
                   500          .146    .112    .104     .196    .173    .131     .307    .281    .220
                   700          .231    .186    .124     .306    .245    .195     .515    .456    .378

Note. n = sample size, T = number of time points, P = number of response variables, δ = initial state proportion index, and ρ = state transition probability index.

Table 5.3: The power of the BLR test for testing H0: S = 3 versus H1: S = 4.

                                State-response associations and index of state transition
Number of                       Weak                     Moderate                 Strong
time points        Sample size  ρ=0.1   ρ=0.15  ρ=0.3    ρ=0.1   ρ=0.15  ρ=0.3    ρ=0.1   ρ=0.15  ρ=0.3
T=3                300          .121    .099    .074     .170    .120    .093     .377    .3007   .195
                   500          .199    .158    .122     .272    .230    .171     .643    .539    .341
                   700          .273    .218    .151     .464    .387    .233     .811    .717    .516
T=5                300          .387    .237    .147     .534    .483    .212     .872    .737    .401
                   500          .738    .551    .214     .882    .802    .361     .994    .962    .706
                   700          .919    .736    .356     .985    .918    .572     1       1       .886

Note. These power values are reported for the simulation condition P = 6 and δ = 1.


Figures 5.1 and 5.2 present power curves (as a function of sample size) for different

settings of the parameter values of the 3-state LM population model with equal initial

state proportions, 6 response variables, and 3 time points. Figure 5.1 shows that when the

state-response associations are weak, to achieve a power of .8 or larger, we may require a

sample of 1000 or more when state membership is stable, and a sample of 2000 or more

when state membership is unstable. We can also see from the same figure that when the

state-response associations are rather strong, the required sample sizes may drop to less

than 500 and 700 for the stable and unstable state membership conditions, respectively. As

can be seen from Figure 5.2, to achieve a power level of .8 when the state memberships

are moderately stable, sample sizes of at least 1200, 850, and 500 may be required in the

weak, moderate, and strong measurement conditions, respectively. When

the state memberships are unstable, such a power level is achieved by using a sample of

2000, 1300, and 700 for the weak, moderate, and strong measurement conditions, respectively.

[Figure 5.1: three panels (weak, moderate, and strong state-indicators association), each plotting power (vertical axis, 0.2 to 1.0) against sample size (horizontal axis, 200 to 2000), with separate curves for stable, moderately stable, and unstable state transitions.]

Figure 5.1: Power by sample size for a 3-state LM population model with varying levels of the measurement parameters, equal initial state proportions, 6 response variables, and 3 time points

Table 5.4 shows a comparison of the short-cut method of BLR power computation

with the PBP method. As shown, the power values of the two methods are in general

comparable. Although the power values obtained by the short-cut method seem to be

slightly larger for some of the simulation conditions, overall differences do not lead to

different conclusions regarding the hypotheses about the number of states.

[Figure 5.2: three panels (stable, moderately stable, and unstable transitions), each plotting power (vertical axis, 0.2 to 1.0) against sample size (horizontal axis, 200 to 2000), with separate curves for the weak, moderate, and strong measurement conditions.]

Figure 5.2: Power by sample size for a 3-state LM population model with varying levels of the transition parameters, equal initial state proportions, 6 response variables, and 3 time points

Table 5.4: Power of the BLR test according to the short-cut and the PBP method for several 3-state LM population models

                         State-response associations and index of state transition
                         Weak                        Strong
Sample size  Method      ρ = 0.1  ρ = 0.15  ρ = 0.3  ρ = 0.1  ρ = 0.15  ρ = 0.3
n = 300      PBP         .180     .148      .116     .550     .496      .320
             short-cut   .188     .145      .104     .568     .494      .339
n = 500      PBP         .394     .280      .150     .858     .804      .610
             short-cut   .398     .301      .178     .869     .809      .631
n = 700      PBP         .592     .442      .224     .968     .960      .800
             short-cut   .642     .439      .238     .978     .957      .796

Note: The values reported in this table are for the design condition δ = 1, P = 6, T = 3.


The current study addressed methods of power analysis for the BLR when testing

hypotheses on the number of states in LM models. Two alternative methods of power


computation were discussed: the proportion of significant bootstrap p-values (PBP) and

the short-cut method. Using the PBP method, power is computed by first generating

a number of independent data sets under the alternative hypothesis, and then, for each

of these data sets, computing the p-value by applying a parametric bootstrap procedure

(McLachlan, 1987). The PBP method is computationally very demanding as it requires

performing the full bootstrap for each of M samples from the H1 model. We proposed

solving this computation time problem using the short-cut method. The short-cut method

works very much as a standard power computation, with the difference that instead of

relying on the theoretical distributions (a central chi-square under the null hypothesis and

a non-central chi-square under the alternative hypothesis), the distributions under H0 and

H1 are constructed by Monte Carlo simulation.

A numerical study was conducted to (a) illustrate the proposed power analysis methods

and (b) compare the power obtained by the short-cut and the PBP methods. As expected,

the power of the BLR test in the LM models increased with sample size. Likewise, power

increased with more time points and more response variables. In addition to these design

factors, the power of the BLR test was shown to depend on the following population

characteristics: the initial state proportions, the state transition probabilities, and the

state-response associations. Holding the other design factors constant, power was larger

with more balanced initial state proportions, more stable state memberships, and stronger

state-response associations. Contrary to this, when initial state proportions are highly

imbalanced, state membership is unstable, and the state-response association is weak,

the power of the BLR test is low.

For the simulation conditions that we have considered in this study, the sample size

required to achieve a power level of .8 or larger ranged from a few hundred to thousands

of cases. Also, the required sample size depended on other design factors and population

characteristics, which are highly interdependent. In general, the more time points, the

more response variables, the more balanced the initial state proportions, the more stable

the state memberships, and the stronger the state-response associations, the smaller the


sample size needed to achieve a certain power level. Because of mutual dependencies

among the LM model parameters, and since the required sample size is also influenced by

the number of time points, response variables, and state-indicator variable associations,

a sample size of 300 or 500 will often not suffice in LM analysis. Therefore, we strongly

advise applied researchers to perform a power analysis for their specific research

situation instead of relying on rules of thumb about the sample size. The same

applies to questions about the minimum number of time points and/or response variables.

Limitations to the current numerical experiments need to be acknowledged. Firstly, in

the current study, we assumed time homogeneity for both state transition and conditional

response probabilities. Future research should assess the power of the BLR test if this

assumption is relaxed. Secondly, the conditional response probabilities of the binary

response variables were set to equal values, and for simplicity, we considered a specific

structure of the transition matrix: $\pi_{s|r} = \rho^{|s-r|} / \sum_{h=1}^{S} \rho^{|h-r|}$. However, in practice the conditional

response probabilities may differ across response variables, the response variables may be

nominal with more than two categories, continuous or of mixed type, and the structure

of the transition matrix can be completely unconstrained, or, for example, symmetric

or triangular (Bartolucci, 2006). Thus, more intensive simulations that address these

different scenarios in the H1 population may be needed to establish more knowledge and

guidelines about the power and sample size requirements of the BLR test for the number

of states in LM models.

CHAPTER 6

Summary and discussions

6.1 Summary

This dissertation aimed to study power analysis methods for latent class and latent Markov

models. The most important requirement when setting up a study using such a model

is that it should be possible to detect the relevant classes (or states). Other, more

specific, requirements concern particular model parameters: the measurement parameters,

which specify the associations between the latent classes and the indicator variables,

the transition parameters, which describe transitions between states across successive

measurement occasions, and for models with covariates, the structural parameters, which

describe relationships between classes and explanatory variables. For these four sets of

parameters, we identified the relevant null hypotheses, studied the requirements of the

study design to achieve enough power for the relevant statistical tests, and presented tools

which applied researchers may use for power and sample size computation.

More specifically, in Chapter 2 we studied power analysis for the Wald test for the


measurement parameters in latent class models. The objectives of this chapter were

twofold: one was presenting a method for power and sample size computation and the

other was identifying the design factors affecting the power of these Wald tests. We

presented a simple procedure for power or sample size computation for the Wald test,

which makes use of the asymptotic distribution of this statistic under the alternative

hypothesis. In order to compute the power or the sample size, the proposed power analysis

method requires obtaining the expected information matrix for the model parameters,

which can be computed in several ways, for instance by creating an "exemplary" data set; that is, a

data set which contains all possible response patterns with weights equal to the population

proportions according to the model under the alternative hypothesis. Using this exemplary

data set, one can obtain the expected information matrix with standard software for latent

class analysis. The expected information matrix is subsequently used to obtain the non-

centrality parameter.
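The construction of an exemplary data set can be sketched as follows for a latent class model with binary indicators. This is a minimal illustration under our own assumptions (local independence within classes, hypothetical class proportions and response probabilities); the function name is ours. The step in which latent class software uses these weights to produce the expected information matrix is not shown.

```python
from itertools import product


def exemplary_data(class_props, cond_probs):
    """List every binary response pattern with its population proportion.

    class_props[c]   : proportion of latent class c
    cond_probs[c][j] : P(item j = 1 | class c), items independent within class
    """
    n_items = len(cond_probs[0])
    rows = []
    for pattern in product((0, 1), repeat=n_items):
        weight = 0.0
        for pi_c, probs in zip(class_props, cond_probs):
            lik = pi_c
            for y, p in zip(pattern, probs):
                lik *= p if y == 1 else 1.0 - p
            weight += lik  # mixture probability of this pattern
        rows.append((pattern, weight))
    return rows


# Hypothetical H1 population: two classes, three binary indicators.
rows = exemplary_data([0.6, 0.4], [[0.8, 0.8, 0.8], [0.2, 0.2, 0.2]])
```

Each row would then be supplied to the estimation software as one record with its proportion as a case weight, so that the weights sum to one over all patterns.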

The power of the Wald test in latent class models is shown to depend on the effect

size, the sample size, the level of significance, the number of classes, the class proportions,

and the number of indicator variables. The first three factors may be considered as the

standard factors for statistical power analysis (Cohen, 1988), whereas the others are

specific to latent class models. Analytic derivations that address how these latent class

specific design factors affect the separation between classes, which is one of the key

elements of the study design in latent class modeling, were provided. Based on these

derivations, we discussed how the information matrix (and the power of the Wald test,

which indirectly involves this information matrix) is affected by the fact that latent class

membership is not observable.
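The role of the three standard factors can be illustrated for a Wald test on a single parameter (one degree of freedom). The sketch below is not the thesis software; it assumes that the non-centrality parameter grows proportionally with the sample size, and the per-observation non-centrality value used in the example is hypothetical. For one degree of freedom the non-central chi-square tail probability can be written with the normal distribution alone, so only the Python standard library is needed.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def wald_power_1df(noncentrality, alpha=0.05):
    """Power of a 1-df Wald test with a non-central chi-square statistic.

    With one degree of freedom the statistic is the square of a normal
    variable with mean sqrt(noncentrality) and unit variance.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = sqrt(noncentrality)
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def required_sample_size(noncentrality_per_case, target_power=0.80, alpha=0.05):
    """Smallest n for which n * noncentrality_per_case reaches the target power."""
    if noncentrality_per_case <= 0:
        raise ValueError("noncentrality_per_case must be positive")
    n = 1
    while wald_power_1df(n * noncentrality_per_case, alpha) < target_power:
        n += 1
    return n


# Hypothetical per-observation non-centrality of 0.02.
n_needed = required_sample_size(0.02)
```

The latent class specific design factors enter only through the per-observation non-centrality, which is smaller when class separation is weaker.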

Effect size, which in latent class models refers to the differences in responses between

classes, plays a double role in power analysis for tests concerning the measurement

parameters. As is always the case for standard statistical models (e.g., ANOVA, logistic

regression analysis), larger effects require a smaller sample to be detected with a power

of say .8 or larger. However, effect size also affects the separation between the classes,


and thus the certainty about respondents’ class memberships. That is, the larger the

effect sizes, the smaller the loss of power resulting from the fact that we are uncertain

about the subjects’ class memberships. Other factors affecting the class separation are

the number of classes, the class proportions, and the number of response variables. The

larger the number of classes, the more unequal the class proportions, and the smaller the

number of response variables, the more uncertain we are about the respondents’ class

memberships, and thus the lower the power. These results further support the idea of

Moerbeek (2014), who suggested for discrete-time survival analysis mixture models that

lower class separation requires a larger sample size.
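Class separation can be quantified in several ways. As one possible summary, the sketch below computes an entropy-based R-squared, defined here as one minus the ratio of the expected posterior entropy to the entropy of the class proportions. The definition, the function name, and the population values are assumptions made for illustration; the model again has binary indicators that are independent within classes.

```python
from itertools import product
from math import log


def entropy_r2(class_props, cond_probs):
    """Entropy-based R^2: 1 - E[H(class | responses)] / H(class)."""
    n_items = len(cond_probs[0])
    prior_entropy = -sum(p * log(p) for p in class_props if p > 0)
    expected_posterior_entropy = 0.0
    for pattern in product((0, 1), repeat=n_items):
        joint = []
        for pi_c, probs in zip(class_props, cond_probs):
            lik = pi_c
            for y, p in zip(pattern, probs):
                lik *= p if y == 1 else 1.0 - p
            joint.append(lik)
        marginal = sum(joint)
        if marginal == 0.0:
            continue
        for j in joint:
            if j > 0:
                # P(pattern) * posterior * log(posterior) = j * log(j / marginal)
                expected_posterior_entropy -= j * log(j / marginal)
    return 1.0 - expected_posterior_entropy / prior_entropy


# Hypothetical designs: same response probabilities, 4 versus 8 indicators.
r2_four = entropy_r2([0.5, 0.5], [[0.8] * 4, [0.2] * 4])
r2_eight = entropy_r2([0.5, 0.5], [[0.8] * 8, [0.2] * 8])
```

In this example the design with eight indicators gives the higher value, in line with the statement that fewer response variables leave more uncertainty about class membership.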

In Chapter 3 we studied the statistical power of the likelihood-ratio and Wald tests

for testing the structural parameters in latent class analysis. Asymptotic distributions,

a central chi-square under the null and a non-central chi-square under the alternative

hypotheses, were assumed for both tests. When using these asymptotic distributions

of the tests for power or sample size computation, the most difficult problem is estimating

the non-centrality parameters. For the likelihood-ratio test, the non-centrality parameter

is shown to be a function of the log-likelihood differences between the models under the

alternative and null hypotheses. For the Wald test, it is a function of the logit parameters

for covariate effects on latent classes and the expected information matrix.

We proposed estimating the non-centrality parameter by simulating a large data set

from the population under the alternative hypothesis. When using the likelihood-ratio

test, this amounts to fitting the models under the null and alternative hypotheses to a

large simulated data set obtained under the alternative hypothesis. When using the Wald

test, the large simulated data set was used to estimate the expected information matrix

(or the variance-covariances) for the parameters under the alternative hypotheses. As an

alternative to the large simulated data set method, the exemplary data method that we

discussed in Chapter 2 could also be used. However, when the covariates are continuous

instead of categorical, or when the number of indicator variables involved is large, the

exemplary data method is generally impractical.
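For the likelihood-ratio test, the large simulated data set method can be sketched as a simple rescaling. The code assumes that the non-centrality parameter is proportional to the sample size, so that the likelihood-ratio statistic obtained from a large data set can be scaled down to the planned sample size. The log-likelihood values in the example are invented, and the function name is ours.

```python
def lr_noncentrality(loglik_h1, loglik_h0, n_large, n_planned):
    """Rescale the LR statistic from a large H1 data set to the planned n.

    loglik_h1, loglik_h0 : maximised log-likelihoods of the H1 and H0
                           models fitted to the same large data set
    n_large              : size of the large simulated data set
    n_planned            : sample size for which power is wanted
    """
    if n_large <= 0 or n_planned <= 0:
        raise ValueError("sample sizes must be positive")
    lr_large = 2.0 * (loglik_h1 - loglik_h0)
    return lr_large * n_planned / n_large


# Hypothetical log-likelihoods from a simulated data set of 100,000 cases.
ncp = lr_noncentrality(-312450.0, -312650.0, n_large=100_000, n_planned=500)
```

The resulting value would then be used as the non-centrality parameter of the chi-square distribution under the alternative hypothesis, with degrees of freedom equal to the number of constrained parameters.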


A numerical study was conducted to illustrate the proposed power analysis methods,

as well as to compare the power of the two types of tests. The results of this numerical

study indicated that, for a given effect size of a covariate on the latent classes, a desired

level of power can not only be obtained by manipulating the sample size, but also by

varying the number or the quality of the indicator variables. The implication of this is

that the statistical power for tests concerning the structural parameters depends on the

population characteristic for the measurement parameters as well. Based on the reported

results of the numerical study, we also concluded that the likelihood-ratio test is slightly

more powerful than the Wald test, supporting the results of previous work on power comparison

between the Wald and likelihood-ratio tests (Williamson et al., 2007).

In Chapter 4 we presented power analysis methods for testing hypotheses about the

transition parameters in latent Markov models. We distinguished power computation for

the standard case and power computation for the non-standard case, where the latter

arises when probabilities are fixed to zero. For the former case, we presented a power

computation method that relies on the theoretical distribution of the likelihood-ratio

statistic; i.e., a central chi-square under the null and non-central under the alternative

hypothesis. A problem arising when using these theoretical distributions is that the

non-centrality parameter is generally unknown, which makes it difficult to use the non-

central chi-square distribution in the process of computing the power. We proposed

two alternative solutions for this problem. One is estimating the power by simulating

the distribution of the likelihood-ratio statistic under the alternative hypothesis. The other is

estimating the non-centrality parameter using the exemplary data set method that we

also described in Chapter 2. When the number of measurement occasions or the number

of response variables is large, the number of response patterns quickly becomes very large.

In such a case, the exemplary data method becomes impractical. We proposed resolving

this problem by using a large simulated data set as was also discussed in Chapter 3.
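How quickly the number of response patterns grows can be shown with a short calculation. The sketch assumes complete data and the same number of categories for every indicator; the function name and the example design are ours.

```python
def n_response_patterns(n_categories, n_indicators, n_occasions):
    """Number of distinct response patterns in a complete-data design."""
    return n_categories ** (n_indicators * n_occasions)


# Hypothetical design: six binary indicators at 2, 3 and 5 occasions.
counts = {t: n_response_patterns(2, 6, t) for t in (2, 3, 5)}
```

With six binary indicators, three measurement occasions already give 2^18 = 262,144 patterns, and five occasions give 2^30, which is more than a billion rows in an exemplary data set.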

For the tests considered in the non-standard case, the distribution of the likelihood-

ratio statistic is neither chi-square under the null hypothesis nor non-central chi-square under


the alternative hypothesis. Therefore, for the non-standard case, we discussed power

computation by Monte Carlo simulation. It requires setting up two Monte Carlo

simulations: one yielding the distribution of the likelihood-ratio statistic under the null

hypothesis and the other yielding its distribution under the alternative hypothesis.
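The two-simulation procedure can be sketched with a toy boundary problem that stands in for a latent Markov model: testing whether a normal mean equals zero when it is restricted to be non-negative, with known unit variance. This substitute is chosen only because its likelihood-ratio statistic has a closed form and a non-standard null distribution; the function names, the number of replications, and the effect size are assumptions.

```python
import random


def lr_statistic(sample_mean, n):
    """LR statistic for H0: mu = 0 against mu >= 0 (unit variance known).

    The parameter lies on the boundary under H0, so the statistic is
    not chi-square distributed.
    """
    return n * max(0.0, sample_mean) ** 2


def simulate_lr(mu, n, n_reps, rng):
    """Draw n_reps LR statistics for samples of size n from N(mu, 1)."""
    sd_mean = (1.0 / n) ** 0.5
    return [lr_statistic(rng.gauss(mu, sd_mean), n) for _ in range(n_reps)]


def monte_carlo_power(mu_h1, n, n_reps=5000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    # Simulation 1: the distribution under H0 gives the critical value.
    null_stats = sorted(simulate_lr(0.0, n, n_reps, rng))
    index = min(n_reps - 1, round((1.0 - alpha) * n_reps))
    critical_value = null_stats[index]
    # Simulation 2: the distribution under H1 gives the power.
    alt_stats = simulate_lr(mu_h1, n, n_reps, rng)
    power = sum(stat > critical_value for stat in alt_stats) / n_reps
    return critical_value, power


# Hypothetical design: true mean 0.25, sample size 100.
critical_value, power = monte_carlo_power(mu_h1=0.25, n=100)
```

In a latent Markov application, each call to the toy statistic would be replaced by simulating a data set and fitting both the null and the alternative model to it, which is where the computational cost arises.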

Design factors studied in Chapter 2 for testing the measurement parameters in latent class

models were extended in Chapter 4 for testing transition parameters in latent Markov

models. Latent class and latent Markov models share the measurement parameters, and

thus factors affecting the uncertainty about the individuals’ class memberships play an

important role in power analysis for tests in latent Markov models as well. Additionally,

specific to latent Markov models are the number of measurement occasions and the size of

the transition probabilities. The results of the numerical experiment indicated that when

the transition probabilities are large, either a much larger sample size or a larger number of

measurement occasions is required to achieve an acceptable level of power.

In Chapter 5 we studied power analysis for the bootstrap likelihood-ratio test for the

number of states in latent Markov models. The power of the bootstrap likelihood-ratio

test may be computed as the proportion of the bootstrap p-values (PBP) for which the null

hypothesis is rejected. Such a method of power computation is however computationally

very demanding as it requires performing the bootstrap p-value computation for multiple

data sets simulated according to the model under the alternative hypothesis. For example,

if we use 500 Monte Carlo samples and 500 bootstrap replications per Monte Carlo sample,

we need to estimate both the null and alternative models 250,000 times. It will be clear

that such an approach is infeasible, especially for more complex models.
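The arithmetic behind this computational burden can be written out as a small sketch. The count follows the example in the text (Monte Carlo samples times bootstrap replications); the time per pair of model fits is a hypothetical input, since it depends on the model and the software.

```python
def pbp_model_fits(n_mc_samples, n_bootstrap):
    """Bootstrap data sets on which both the H0 and the H1 model are fitted."""
    return n_mc_samples * n_bootstrap


def pbp_runtime_hours(n_mc_samples, n_bootstrap, seconds_per_fit_pair):
    """Total run time if each pair of fits takes seconds_per_fit_pair."""
    return pbp_model_fits(n_mc_samples, n_bootstrap) * seconds_per_fit_pair / 3600.0


total_fits = pbp_model_fits(500, 500)
hours_at_one_second = pbp_runtime_hours(500, 500, 1.0)
```

Even at an assumed one second per pair of fits, a single power value would take roughly 69 hours, and a sample size search requires many such values.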

We proposed a computationally more efficient method, which we referred to as the

short-cut method. It works very much like standard power computation (see, for example,

Satorra & Saris, 1985), with the difference that the distributions under the null and

alternative hypotheses are constructed by simulation. Based on the presented numerical

studies, we concluded that the short-cut method is generally superior to the PBP method.

Additional advantages of the short-cut method are that a) it is computationally cheaper


and b) it can easily be applied to determine the necessary sample size or number of

measurement occasions in a study design.

6.2 Direction for future research and study limitations

Whereas this dissertation focused on power analysis methods for tests in latent class and

latent Markov models, the same methods may be applied with other types of mixture

models, such as mixtures of normal distributions, mixture growth models, and multilevel

latent class models. Generally, it can be expected that the same design factors will affect

the statistical power of the tests in those models, though the distribution of response

variables within the classes and other population characteristics may differ from the

mixture models studied in this thesis. For example, in mixture models for continuous

responses, the class-specific densities will typically be assumed to be normal,

and in multilevel latent class models, the higher-level model parameters also

need to be specified.

Specific aspects that require further research when extending the proposed power

analysis methods to other mixture modeling techniques are the following: a) The

population characteristics we varied were the conditional response probabilities, class

proportions, initial probabilities, and transition probabilities. Different population

characteristics may be relevant for other mixture models. For example, class-specific

means and (co)variances in finite mixture models for continuous responses (McLachlan &

Peel, 2000), class-specific growth trajectories and variance components in growth mixture

models (Tofighi & Enders, 2008; B. Muthen & Muthen, 2000), and both higher- and

lower-level class distributions in multilevel mixture models (Vermunt, 2003). b) We

proposed approximating the non-centrality parameters of the test under the alternative

hypotheses by using either the exemplary data or the simulated large data set method.

Whether these methods can be applied with other types of mixture models requires further

research. For example, it seems that the exemplary data method is problematic with

continuous responses or with multilevel data since it is not possible to list all possible


response patterns. However, using a large simulated data set to obtain the non-centrality

parameter may still work.

The main limitations of the presented work concern the reported numerical experiments,

which could be expanded in future research. Firstly, the numerical experiments were

limited to binary response variables. However, in practice the response variables used in

latent class and latent Markov models are often nominal or ordinal variables with three or

more response alternatives. Secondly, the number of parameters defining a latent class or

latent Markov model can become rather large, especially with large numbers of classes or

response variables. We decided to simplify the numerical experiments by assuming that

conditional response probabilities were the same for each response variable (e.g., high in

one class and low in others for all the indicator variables), whereas in practice they may

take on different values. The same applies to the transition parameters, which were

assumed to be constant over time, whereas in practice these may be time varying.

6.3 Conclusion

Mixture models, which include techniques such as latent class, latent Markov, mixture

growth, and multilevel mixture models, are used in many research areas. These models

are not only used in fundamental research, but also in applied research for both profit

and non-profit sectors. Whereas power analysis methods have been developed for many

statistical techniques including logistic regression models (Demidenko, 2007; Whittemore,

1981), log-linear models (O’Brien, 1986; Shieh, 2000), linear multivariate models (Muller,

Lavange, Ramey, & Ramey, 1992), and structural equation models (Satorra & Saris, 1985;

R. MacCallum et al., 2010), these were lacking for mixture models. Moreover, previous

studies in mixture models did not address the requirements of the study design to achieve

enough power for the relevant statistical tests. Given the popularity of these models, we

argued that methods for performing power analysis in mixture models were needed. This

dissertation presented power analysis methods for tests in mixture models, with emphasis

on latent class and latent Markov models.


We discussed power analysis methods for different types of parameters in latent class

and latent Markov models, considering different specifications for the null hypothesis. For

some of these, the asymptotic distribution of the test statistic holds, while it does not

for others. For the situations in which the asymptotic distributions hold, we discussed

the estimation of the non-centrality parameter using the exemplary and large data set

methods. For non-standard testing situations, we presented computationally efficient

Monte Carlo simulation based power computation methods. Tests for the number of

classes may also be classified as non-standard testing situations, for which we discussed

power computation by the short-cut and PBP methods.

This dissertation contributes to the field of mixture modeling in various ways. Firstly,

the development of power computation methods for mixture models will contribute to

the validity of the results of applied research on which policy and business decisions are

based. Using the proposed power analysis methods enables assessment as to whether the

empirical studies are performed with an appropriate level of statistical power. Secondly,

the various numerical experiments conducted to illustrate the proposed power analysis

methods contribute to the understanding of the research design requirements to achieve

a certain (acceptable) level of power. Thirdly, we provide important tools to make sure

that resources for research are used as efficiently as possible.

References

Agresti, A. (2007). An introduction to categorical data analysis. New Jersey: John Wiley

& Sons.

Aitkin, M., Anderson, D., & Hinde, J. (1981). Statistical modelling of data on teaching

styles. Journal of the Royal Statistical Society. Series A (General), 144(4), 419–461.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions

on Automatic Control, 19(6), 716–723.

Bacci, S., Pandolfi, S., & Pennoni, F. (2014). A comparison of some criteria for states

selection in the latent Markov model for longitudinal data. Advances in Data Analysis

and Classification, 8(2), 125–145.

Bakk, Z., Tekle, F. B., & Vermunt, J. K. (2013). Estimating the association between latent

class membership and external variables using bias-adjusted three-step approaches.

Sociological Methodology, 43(1), 272–311.

Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., & Rathouz, P. J. (1997).

Latent variable regression for multiple discrete outcomes. Journal of the American

Statistical Association, 92(440), 1375–1386.

Bartolucci, F. (2006). Likelihood inference for a class of latent Markov models under


linear hypotheses on the transition probabilities. Journal of the Royal Statistical

Society: Series B (Statistical Methodology), 68(2), 155–178.

Bartolucci, F., & Farcomeni, A. (2009). A multivariate extension of the dynamic logit

model for longitudinal data based on a latent Markov heterogeneity structure.

Journal of the American Statistical Association, 104(486), 816–831.

Bartolucci, F., Farcomeni, A., & Pennoni, F. (2010). An overview of latent Markov

models for longitudinal categorical data. arXiv preprint arXiv:1003.2804.

Bartolucci, F., Farcomeni, A., & Pennoni, F. (2013). Latent Markov models for

longitudinal data. Boca Raton: Chapman and Hall/CRC Press.

Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique

occurring in the statistical analysis of probabilistic functions of Markov chains. The

Annals of Mathematical Statistics, 41(1), 164–171.

Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The

general theory and its analytical extensions. Psychometrika, 52(3), 345–370.

Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and

a new informational measure of complexity. In H. Bozdogan (ed.), Proceedings

of the First US/Japan Conference on the Frontiers of Statistical Modeling: An

informational approach (Vol. 2, pp. 69–113). Boston, MA: Kluwer Academic

Publishers.

Brown, B. W., Lovato, J., & Russell, K. (1999). Asymptotic power calculations:

description, examples, computer code. Statistics in Medicine, 18(22), 3137–3151.

Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An expository

note. The American Statistician, 36(3), 153–157.

Chung, H., Park, Y., & Lanza, S. T. (2005). Latent transition analysis with covariates:

pubertal timing and substance use behaviours in adolescent females. Statistics in

Medicine, 24(18), 2895–2910.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:

Erlbaum.


Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis: With

applications in the social, behavioral, and health sciences. New Jersey: John Wiley

& Sons.

Collins, L. M., & Wugalter, S. E. (1992). Latent class models for stage-sequential dynamic

latent variables. Multivariate Behavioral Research, 27(1), 131–157.

Dayton, C. M., & Macready, G. B. (1976). A probabilistic model for validation of

behavioral hierarchies. Psychometrika, 41(2), 189–204.

Dayton, C. M., & Macready, G. B. (1988). Concomitant-variable latent-class models.

Journal of the American Statistical Association, 83(401), 173–178.

Demidenko, E. (2007). Sample size determination for logistic regression revisited.

Statistics in Medicine, 26(18), 3385–3397.

Demidenko, E. (2008). Sample size and optimal design for logistic regression with binary

interaction. Statistics in Medicine, 27(1), 36–46.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from

incomplete data via the EM algorithm. Journal of the Royal Statistical Society.

Series B (Statistical Methodology), 39(1), 1–38.

Dias, J. (2006). Latent class analysis and model selection. In M. Spiliopoulou, R. Kruse,

C. Borgelt, A. Nurnberger, & W. Gaul (eds.), From Data and Information Analysis

to Knowledge Engineering (pp. 95–102). Berlin: Springer-Verlag.

Dias, J., & Goncalves, M. (2004). Finite mixture models: Review, applications,

and computer-intensive methods. Doctoral Dissertation. Research School Systems,

Organisation and Management, University of Groningen, The Netherlands.

Dziak, J. J., Lanza, S. T., & Tan, X. (2014). Effect size, statistical power, and sample

size requirements for the bootstrap likelihood ratio test in latent class analysis.

Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 534–552.

Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses

using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research

Methods, 41(4), 1149–1160.


Feng, Z. D., & McCulloch, C. E. (1996). Using bootstrap likelihood ratios in finite mixture

models. Journal of the Royal Statistical Society. Series B (Statistical Methodology),

58(3), 609–617.

Fonseca, J. R., & Cardoso, M. G. (2007). Mixture-model cluster analysis using information

theoretical criteria. Intelligent Data Analysis, 11(2), 155–173.

Forcina, A. (2008). Identifiability of extended latent class models with individual

covariates. Computational Statistics & Data Analysis, 52(12), 5263–5268.

Formann, A. K. (1982). Linear logistic latent class analysis. Biometrical Journal, 24(2),

171–190.

Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal

of the American Statistical Association, 87(418), 476–486.

Giudici, P., Ryden, T., & Vandekerkhove, P. (2000). Likelihood-ratio tests for hidden

Markov models. Biometrics, 56(3), 742–747.

Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and

unidentifiable models. Biometrika, 61(2), 215–231.

Gudicha, D. W., Schmittmann, V. D., & Vermunt, J. K. (2015). Power

computation for likelihood ratio tests for the transition parameters in latent

Markov models. Structural Equation Modeling: A Multidisciplinary Journal, DOI:

10.1080/10705511.2015.1014040.

Gudicha, D. W., Tekle, F. B., & Vermunt, J. K. (in press). Power and sample size

computation for Wald tests in latent class models. Journal of Classification.

Gudicha, D. W., & Vermunt, J. K. (2013). Mixture model clustering with covariates using

adjusted three-step approaches. In B. Lausen, D. van den Poel, & A. Ultsch (eds.),

Algorithms from and for Nature and Life; Studies in Classification, Data Analysis, and

Knowledge Organization (pp. 87–93). Heidelberg, Germany: Springer-Verlag.

Hagenaars, J. A. (1988). Latent structure models with direct effects between indicators:

Local dependence models. Sociological Methods & Research, 16(3), 379–405.

Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. New York:


Cambridge University Press.

Hirtenlehner, H., Starzer, B., & Weber, C. (2012). A differential phenomenology of

stalking using latent class analysis to identify different types of stalking victimization.

International Review of Victimology, 18(3), 207–227.

Holt, J. A., & Macready, G. B. (1989). A simulation study of the difference chi-square

statistic for comparing latent class models under violation of regularity conditions.

Applied Psychological Measurement, 13(3), 221–231.

Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size

calculation for linear and logistic regression. Statistics in Medicine, 17(14), 1623–

1634.

Jackson, K. M., & Schulenberg, J. E. (2013). Alcohol use during the transition from

middle school to high school: national panel data on prevalence and moderators.

Developmental Psychology, 49(11), 2147–2158.

Keel, P. K., Fichter, M., Quadflieg, N., Bulik, C. M., Baxter, M. G., Thornton, L., . . .

others (2004). Application of a latent class analysis to empirically define eating

disorder phenotypes. Archives of General Psychiatry, 61(2), 192–200.

Langeheine, R., & Van de Pol, F. (1993). Multiple indicator Markov models. In R. Steyer,

K. F. Wender, & K. F. Widaman (eds.), Proceedings of the 7th European Meeting

of the Psychometric Society in Trier (pp. 248–252). Stuttgart: Fischer.

Lanza, S. T., & Collins, L. M. (2008). A new SAS procedure for latent transition analysis:

transitions in dating and sexual risk behavior. Developmental Psychology, 44(2),

446–456.

Lanza, S. T., Collins, L. M., Lemmon, D. R., & Schafer, J. L. (2007). PROC LCA:

A SAS procedure for latent class analysis. Structural Equation Modeling: A

Multidisciplinary Journal, 14(4), 671–694.

Lazarsfeld, P. (1950). The logical and mathematical foundation of latent structure

analysis and the interpretation and mathematical foundation of latent structure

analysis. In S. A. Stouffer et al. (eds.), Measurement and prediction (Vol. 4, pp.


362–472). Princeton, NJ: Princeton University Press.

Leisch, F. (2004). Flexmix: A general framework for finite mixture models and latent

class regression in R. Journal of Statistical Software, 11(8), 1–18.

Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for polytomous variable

latent class analysis. Journal of Statistical Software, 42(10), 1–29.

Lukociene, O., Varriale, R., & Vermunt, J. K. (2010). The simultaneous decision(s)

about the number of lower- and higher-level classes in multilevel latent class analysis.

Sociological Methodology, 40(1), 247–283.

MacCallum, R., Lee, T., & Browne, M. W. (2010). The issue of isopower in power

analysis for tests of structural equation models. Structural Equation Modeling: A

Multidisciplinary Journal, 17(1), 23–41.

MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested

covariance structure models: Power analysis and null hypotheses. Psychological

Methods, 11(1), 19–35.

Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan (ed.), The

Sage Handbook of Quantitative Methodology for the Social Sciences (pp. 175–198).

Thousand Oaks: Sage Publications.

Mann, H. B., & Wald, A. (1943). On stochastic limit and order relationships. The Annals

of Mathematical Statistics, 14(3), 217–226.

Marsh, H. W., Hau, K.-T., Balla, J. R., & Grayson, D. (1998). Is more ever too much?

The number of indicators per factor in confirmatory factor analysis. Multivariate

Behavioral Research, 33(2), 181–220.

Martin, R. A., Velicer, W. F., & Fava, J. L. (1996). Latent transition analysis to the

stages of change for smoking cessation. Addictive Behaviors, 21(1), 67–80.

McCutcheon, A. L. (1987). Latent class analysis. Sage University Papers Series:

Quantitative Applications in the Social Sciences Number 07–064. Newbury Park,

CA: Sage publishers.

McCutcheon, A. L. (2002). Basic concepts and procedures in single- and multiple-group


latent class analysis. In J. A. Hagenaars & A. L. McCutcheon (eds.), Applied Latent

Class Analysis (pp. 56–85). Cambridge, UK: Cambridge University Press.

McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality

and goodness of fit. Psychological Bulletin, 107(2), 247–255.

McHugh, R. B. (1956). Efficient estimation and local identification in latent class analysis.

Psychometrika, 21(4), 331–347.

McLachlan, G. (1987). On bootstrapping the likelihood ratio test statistic for the number

of components in a normal mixture. Applied Statistics, 36(3), 318–324.

McLachlan, G., & Krishnan, T. (2007). The EM algorithm and extensions. New Jersey:

John Wiley & Sons.

McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: John Wiley &

Sons.

Moerbeek, M. (2014). Sufficient sample sizes for discrete-time survival analysis mixture

models. Structural Equation Modeling: A Multidisciplinary Journal, 21(1), 63–67.

Mooijaart, A., & Van der Heijden, P. G. (1992). The EM algorithm for latent class

analysis with equality constraints. Psychometrika, 57(2), 261–269.

Muller, K. E., Lavange, L. M., Ramey, S. L., & Ramey, C. T. (1992). Power calculations for

general linear multivariate models including repeated measures applications. Journal

of the American Statistical Association, 87(420), 1209–1226.

Muthen, B., & Muthen, L. (2000). Integrating person-centered and variable-centered

analyses: Growth mixture modeling with latent trajectory classes. Alcoholism:

Clinical and Experimental Research, 24(6), 882–891.

Muthen, L., & Muthen, B. (1998-2007). Mplus user’s guide (5th ed.). Los Angeles:

Muthen & Muthen.

Nakagawa, S., & Foster, T. M. (2004). The case against retrospective statistical power

analyses with an introduction to power analysis. Acta Ethologica, 7(2), 103–108.

Nylund, K. L., Asparouhov, T., & Muthen, B. O. (2007). Deciding on the number

of classes in latent class analysis and growth mixture modeling: A Monte Carlo


simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4),

535–569.

O’Brien, R. G. (1986). Using the SAS system to perform power analyses for log-linear

models. Proceedings of the Eleventh Annual SAS Users Group Conference, Cary,

NC: SAS Institute, 778–784.

Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical

Transactions of the Royal Society of London, 185(1), 71–110.

Poulsen, C. S. (1990). Mixed Markov and latent Markov modelling applied to brand

choice behaviour. International Journal of Research in Marketing, 7(1), 5–19.

Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural

equation modeling. Psychometrika, 69(2), 167–190.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in

speech recognition. Proceedings of the IEEE, 77(2), 257–286.

Reboussin, B. A., Reboussin, D. M., Liang, K.-Y., & Anthony, J. C. (1998). Latent

transition modeling of progression of health-risk behavior. Multivariate Behavioral

Research, 33(4), 457–478.

Redner, R. (1981). Note on the consistency of the maximum likelihood estimate for

nonidentifiable distributions. The Annals of Statistics, 9(1), 225–228.

Rencher, A. C. (2000). Linear models in statistics. New York: John Wiley & Sons.

Rindskopf, D., & Rindskopf, W. (1986). The value of latent class analysis in medical

diagnosis. Statistics in Medicine, 5(1), 21–27.

Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance

structure analysis. Psychometrika, 50(1), 83–90.

Schoenfeld, D. A., & Borenstein, M. (2005). Calculating the power or sample size for the

logistic and proportional hazards models. Journal of Statistical Computation and

Simulation, 75(10), 771–785.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics,

6(2), 461–464.


Sclove, S. L. (1987). Application of model-selection criteria to some problems in

multivariate analysis. Psychometrika, 52(3), 333–343.

Self, S. G., Mauritsen, R. H., & Ohara, J. (1992). Power calculations for likelihood ratio

tests in generalized linear models. Biometrics, 48(1), 31–39.

Shapiro, A. (1988). Towards theory of inequality. International Statistical Review, 56(1),

49–62.

Shieh, G. (2000). On power and sample size calculations for likelihood ratio tests in

generalized linear models. Biometrics, 56(4), 1192–1196.

Sotres-Alvarez, D., Herring, A. H., & Siega-Riz, A.-M. (2013). Latent transition models to

study women’s changing of dietary patterns from pregnancy to 1 year postpartum.

American Journal of Epidemiology, 177(8), 852–861.

Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic

distribution of sequential chi-square statistics. Psychometrika, 50(3), 253–263.

Tein, J.-Y., Coxe, S., & Cham, H. (2013). Statistical power to detect the correct number of

classes in latent profile analysis. Structural Equation Modeling: A Multidisciplinary

Journal, 20(4), 640–657.

Tofighi, D., & Enders, C. K. (2008). Identifying the correct number of classes in

growth mixture models. In G. R. Hancock (ed.), Mixture Models in Latent Variable

Research (pp. 317–341). Charlotte, NC: Information Age.

Uebersax, J. S., & Grove, W. M. (1990). Latent class analysis of diagnostic agreement.

Statistics in Medicine, 9(5), 559–572.

Van de Pol, F., & De Leeuw, J. (1986). A latent Markov model to correct for measurement

error. Sociological Methods & Research, 15(1-2), 118–141.

Van der Heijden, P. G., Dessens, J., & Böckenholt, U. (1996). Estimating the

concomitant-variable latent-class model with the EM algorithm. Journal of

Educational and Behavioral Statistics, 21(3), 215–229.

Vermunt, J. K. (1996). Log-linear event history analysis: A general approach with missing

data, latent variables, and unobserved heterogeneity. Tilburg: Tilburg University

Press.

Vermunt, J. K. (1997). LEM: A general program for the analysis of categorical data.

Tilburg University, The Netherlands.

Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33(1),

213–239.

Vermunt, J. K. (2010a). Latent class modeling with covariates: Two improved three-step

approaches. Political Analysis, 18(4), 450–469.

Vermunt, J. K. (2010b). Latent class models. In P. Peterson, E. Baker, & B. McGaw

(eds.), International Encyclopedia of Education (Vol. 7, pp. 238–244). Oxford:

Elsevier.

Vermunt, J. K., Langeheine, R., & Böckenholt, U. (1999). Discrete-time discrete-state

latent Markov models with time-constant and time-varying covariates. Journal of

Educational and Behavioral Statistics, 24(2), 179–207.

Vermunt, J. K., & Magidson, J. (2002). Latent class cluster analysis. In J. A. Hagenaars

& A. L. McCutcheon (eds.), Applied Latent Class Analysis (pp. 56–85). Cambridge,

UK: Cambridge University Press.

Vermunt, J. K., & Magidson, J. (2013a). LG-Syntax user’s guide: Manual for Latent GOLD

5.0 syntax module. Belmont, MA: Statistical Innovations Inc.

Vermunt, J. K., & Magidson, J. (2013b). Technical guide for Latent GOLD 5.0: Basic,

advanced, and syntax. Belmont, MA: Statistical Innovations Inc.

Vermunt, J. K., Tran, B., & Magidson, J. (2008). Latent class models in longitudinal

research. In S. Menard (ed.), Handbook of Longitudinal Research: Design,

Measurement, and Analysis (pp. 373–385). Burlington, MA: Elsevier.

Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov

models to psychological data. Scientific Programming, 10(3), 185–199.

Visser, I., & Speekenbrink, M. (2010). depmixS4: an R-package for hidden Markov

models. Journal of Statistical Software, 36(7), 1–21.

Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when

the number of observations is large. Transactions of the American Mathematical

Society, 54(3), 426–482.

Wall, M. M., & Li, R. (2009). Multiple indicator hidden Markov model with an application

to medical utilization data. Statistics in Medicine, 28(2), 293–310.

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica:

Journal of the Econometric Society, 50(1), 1–25.

Whittemore, A. S. (1981). Sample size for logistic regression with small response

probability. Journal of the American Statistical Association, 76(373), 27–32.

Wiggins, L. M. (1973). Panel analysis: Latent probability models for attitude and behavior

processes. San Francisco: Elsevier Scientific.

Williamson, J. M., Lin, H., Lyles, R. H., & Hightower, A. W. (2007). Power calculations

for ZIP and ZINB models. Journal of Data Science, 5(4), 519–534.

Wolfe, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivariate

Behavioral Research, 5(3), 329–350.

Yamaguchi, K. (2000). Multinomial logit latent-class regression models: An analysis of

the predictors of gender-role attitudes among Japanese women. American Journal

of Sociology, 105(6), 1702–1740.

Yang, C. (2006). Evaluating latent class analysis models in qualitative phenotype

identification. Computational Statistics & Data Analysis, 50(4), 1090–1104.

Yang, I., & Becker, M. (1997). Latent variable modeling of diagnostic accuracy.

Biometrics, 53(3), 948–958.

Acknowledgments

I would like to express my sincere gratitude and appreciation to the many people who have

offered me unwavering support, encouragement, and inspiration throughout this research.

I feel incredibly privileged to have had the opportunity to benefit from Prof. dr. Jeroen Vermunt's

exceptional scientific knowledge in the field of mixture modeling. Prof. dr. Jeroen

Vermunt, during the last five years, as my professor on categorical data analysis, as a

supervisor for my first-year paper and master's thesis, and as a supervisor (and promoter)

for my PhD thesis, I constantly benefited from your continuous support and guidance.

Back in 2011, when I applied for the NWO Research Talent grant, you believed that I could

write my PhD thesis in three years. Thank you for understanding my potential and for

supporting me to grow as a research scientist. Working with you, I have had a very

enjoyable and rewarding experience.

Special gratitude is extended to my co-supervisors, dr. Verena Schmittmann and dr.

Fetene Tekle, for the constructive suggestions that they contributed to the various chapters

in my thesis. Dr. Verena Schmittmann, I have learned a lot about how to structure

and write rigorous academic papers through my partnership with you. Dr. Fetene Tekle,

besides professional support, you helped me with many practical issues by sharing your

experience of staying in the Netherlands. I would also like to thank the members of my

thesis committee for their valuable time and encouraging comments. My thanks go to

VIC group members, colleagues, and the administrative staff at Tilburg University, who

directly or indirectly contributed to this thesis.

My sincere thanks also go to Lonneke van der Linde, former Oldendorff research

policy advisor, and dr. Andries van der Ark, former research master students coordinator

at Tilburg University, for creating a friendly and welcoming environment and for making

me feel at home while more than 3500 miles away from home. I would also like to express

my gratitude to Tilburg University, the Oldendorff scholarship, and NWO for financial support

during my research master study and PhD research. The Oldendorff scholarship means a lot to me;

if not for this scholarship, I wouldn’t have been here.

I am deeply indebted to my family, especially my wife Kelebet and my lovely daughter

Nenati. Kelebet, without your deep love and full understanding, I would never have

succeeded. You sacrificed your career to take care of our daughter and dedicated countless

efforts to make this proud moment in my life a reality. Mam, you are the most special:

you never went to school but sent me from a remote rural area with no schools to cities

in Ethiopia, and then to Europe for my education. I don’t have enough words of thanks

to express my gratitude to you, but I simply pray that God will bless you with many

more healthy and joyful years. Furthermore, I would like to thank the brothers and sisters at

Eindhoven church for an amazing fellowship.

Above all, I wholeheartedly thank my mighty God who gives us richly all things to

enjoy, and whose perfect love, patience, and gifts are the real strength behind the greatest

accomplishments in my life.
