Cross-sectional LCA Patterns of first response to cigarettes

Page 1: Cross-sectional LCA Patterns of first response to cigarettes

Cross-sectional LCA

Patterns of first response to cigarettes

Page 2: Cross-sectional LCA Patterns of first response to cigarettes

First smoking experience

• Have you ever tried a cigarette (including roll-ups), even a puff?
• How old were you when you first tried a cigarette?
• When you FIRST ever tried a cigarette can you remember how it made you feel? (tick as many as you want)
  – It made me cough
  – I felt ill
  – It tasted awful
  – I liked it
  – It made me feel dizzy

Page 3: Cross-sectional LCA Patterns of first response to cigarettes

Aim

• To categorise the subjects based on their pattern of responses

• To assess the relationship between first-response and current smoking behaviour

• To try not to think too much about the possibility of recall bias

Page 4: Cross-sectional LCA Patterns of first response to cigarettes

Step 1

Look at your data!!!

Page 5: Cross-sectional LCA Patterns of first response to cigarettes

Examine your data structure

• LCA converts a large number of response patterns into a small number of ‘homogeneous’ groups

• If the responses in your data are fairly mutually exclusive then there’s no point doing LCA

• Don’t just dive in

Page 6: Cross-sectional LCA Patterns of first response to cigarettes

How many items endorsed?

    numresp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         69        2.75        2.75
          1 |      1,597       63.70       66.45
          2 |        569       22.70       89.15
          3 |        202        8.06       97.21
          4 |         68        2.71       99.92
          5 |          2        0.08      100.00
------------+-----------------------------------
      Total |      2,507      100.00

Page 7: Cross-sectional LCA Patterns of first response to cigarettes

Frequency of each item (n ~ 2500)

[Bar chart: number of positive responses (y-axis, 0 to 1,400) for each item: cough, ill, taste, liked, dizzy]

Page 8: Cross-sectional LCA Patterns of first response to cigarettes

Examine pattern frequency

     +---------------------------------------------+
     | cough   ill   taste   liked   dizzy     num |
     |---------------------------------------------|
  1. |     0     0       1       0       0     468 |
  2. |     0     0       0       1       0     452 |
  3. |     1     0       0       0       0     449 |
  4. |     1     0       1       0       0     279 |
  5. |     0     0       0       0       1     194 |
     |---------------------------------------------|
  6. |     1     1       1       0       0      94 |
  7. |     1     0       0       1       0      87 |
  8. |     1     0       0       0       1      76 |
  9. |     0     0       0       0       0      69 |
 10. |     1     1       1       0       1      59 |
     |---------------------------------------------|
 11. |     0     0       0       1       1      56 |
 12. |     1     0       1       0       1      47 |
 13. |     1     0       0       1       1      35 |
 14. |     0     1       0       0       0      34 |
 15. |     0     0       1       0       1      27 |
     |---------------------------------------------|
 16. |     0     1       1       0       0      17 |
 17. |     0     0       1       1       0      13 |
 18. |     1     1       0       0       1       9 |
 19. |     1     1       0       0       0       8 |
 20. |     0     1       1       0       1       7 |
     |---------------------------------------------|
 21. |     1     0       1       1       1       7 |
 22. |     1     0       1       1       0       6 |
 23. |     0     1       0       0       1       5 |
 24. |     1     1       1       1       1       2 |
 25. |     0     1       0       1       1       2 |
     |---------------------------------------------|
 26. |     0     1       0       1       0       1 |
 27. |     1     1       1       1       0       1 |
 28. |     1     1       0       1       1       1 |
 29. |     0     0       1       1       1       1 |
 30. |     1     1       0       1       0       1 |
     +---------------------------------------------+
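The pattern frequencies carry all the information in the two earlier summaries (items endorsed, frequency of each item). As a rough cross-check, here is a minimal Python sketch (not part of the original Stata session; the names `PATTERNS`, `items_endorsed` and `item_frequencies` are illustrative) that collapses the 30 observed patterns back into those summaries:

```python
# Observed response patterns (cough, ill, taste, liked, dizzy) and their counts,
# copied from the pattern frequency table.
PATTERNS = {
    "00100": 468, "00010": 452, "10000": 449, "10100": 279, "00001": 194,
    "11100": 94, "10010": 87, "10001": 76, "00000": 69, "11101": 59,
    "00011": 56, "10101": 47, "10011": 35, "01000": 34, "00101": 27,
    "01100": 17, "00110": 13, "11001": 9, "11000": 8, "01101": 7,
    "10111": 7, "10110": 6, "01001": 5, "11111": 2, "01011": 2,
    "01010": 1, "11110": 1, "11011": 1, "00111": 1, "11010": 1,
}
ITEMS = ["cough", "ill", "taste", "liked", "dizzy"]


def items_endorsed(patterns):
    """Number of respondents by how many items they ticked."""
    out = {}
    for pattern, n in patterns.items():
        k = pattern.count("1")
        out[k] = out.get(k, 0) + n
    return out


def item_frequencies(patterns):
    """Number of respondents endorsing each item."""
    return {
        item: sum(n for pattern, n in patterns.items() if pattern[i] == "1")
        for i, item in enumerate(ITEMS)
    }


total = sum(PATTERNS.values())
numresp = items_endorsed(PATTERNS)
freqs = item_frequencies(PATTERNS)

for k in sorted(numresp):
    print(k, numresp[k], round(100 * numresp[k] / total, 2))
for item in ITEMS:
    print(item, freqs[item], round(freqs[item] / total, 3))
```

Only 30 of the 32 possible patterns are observed, and almost two thirds of respondents ticked a single item, which is worth knowing before any model is fitted.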

Page 9: Cross-sectional LCA Patterns of first response to cigarettes

Examine correlation structure

Polychoric correlation matrix

         cough      ill    taste    liked    dizzy
cough    1
ill      0.371    1
taste    0.049    0.468    1
liked   -0.510   -0.542   -0.786    1
dizzy   -0.030    0.246   -0.241   -0.158    1

Page 10: Cross-sectional LCA Patterns of first response to cigarettes

Step 2

Now you can fit a latent class model

Page 11: Cross-sectional LCA Patterns of first response to cigarettes

Latent Class models

• Work with observations at the pattern level rather than the individual (person) level

+---------------------------------------+ | cough ill taste liked dizzy num | |---------------------------------------| 1. | 0 0 1 0 0 468 | 2. | 0 0 0 1 0 452 | 3. | 1 0 0 0 0 449 | 4. | 1 0 1 0 0 279 | 5. | 0 0 0 0 1 194 | |---------------------------------------|

Page 12: Cross-sectional LCA Patterns of first response to cigarettes

Latent Class models

• For a given number of latent classes, Bayes’ rule plus an assumption of conditional independence lets one calculate the probability that each pattern falls into each class

• Derive the likelihood of the obtained data under each model (i.e. assuming different numbers of classes) and use this plus other fit statistics to determine optimal model i.e. optimal number of classes

Page 13: Cross-sectional LCA Patterns of first response to cigarettes

Latent Class models

• Bayes’ rule: P(class = i | pattern) = P(pattern | class = i) * P(class = i) / P(pattern)

• Conditional independence: P(pattern = ‘01’ | class = i) = P(pat(1) = ‘0’ | class = i) * P(pat(2) = ‘1’ | class = i)
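The two ingredients combine as follows. This is a minimal Python sketch, assuming a made-up two-class, two-item model (the class sizes and item probabilities are invented for illustration and are not estimates from the smoking data):

```python
from math import prod


def pattern_prob_given_class(pattern, item_probs):
    """P(pattern | class): under conditional independence, a product over items."""
    return prod(p if x == "1" else 1.0 - p for x, p in zip(pattern, item_probs))


def posterior_class_probs(pattern, class_sizes, class_item_probs):
    """P(class | pattern) for every class, by Bayes' rule."""
    joint = [
        size * pattern_prob_given_class(pattern, probs)
        for size, probs in zip(class_sizes, class_item_probs)
    ]
    p_pattern = sum(joint)  # P(pattern), the denominator in Bayes' rule
    return [j / p_pattern for j in joint]


# Hypothetical model: P(class) and P(item endorsed | class) for two items.
CLASS_SIZES = [0.6, 0.4]
CLASS_ITEM_PROBS = [[0.9, 0.2], [0.1, 0.7]]

post = posterior_class_probs("01", CLASS_SIZES, CLASS_ITEM_PROBS)
print(post)
```

With these invented values the pattern ‘01’ is assigned almost entirely to the second class, because the first class rarely leaves item 1 unendorsed.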

Page 14: Cross-sectional LCA Patterns of first response to cigarettes

How many classes can I have? ~ degrees of freedom

• 32 possible patterns, so 31 d.f. available (the pattern probabilities must sum to one)
• Each additional class requires
  – 5 d.f. to estimate the prevalence of each of the 5 items within that class (i.e. 5 thresholds)
  – 1 d.f. for an additional cut of the latent variable defining the class distribution
• Hence a 5-class model uses up 5*5 + 4 = 29 degrees of freedom, leaving only 2 d.f. to test the model
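The parameter count can be written down directly. A small Python sketch (the function names are illustrative), assuming binary items and an unrestricted latent class model:

```python
def free_parameters(n_items, n_classes):
    """One threshold per item per class, plus (classes - 1) class-size parameters."""
    return n_items * n_classes + (n_classes - 1)


def residual_df(n_items, n_classes):
    """Degrees of freedom left over from the 2^items - 1 free pattern probabilities."""
    return (2 ** n_items - 1) - free_parameters(n_items, n_classes)


for k in range(1, 6):
    print(k, free_parameters(5, k), residual_df(5, k))
```

For five items this gives 5, 11, 17, 23 and 29 parameters for 1 to 5 classes, and 14 residual d.f. for the 3-class model, which is the figure Mplus reports for its chi-square tests of model fit.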

Page 15: Cross-sectional LCA Patterns of first response to cigarettes

Standard thresholds

• Mplus thinks of a binary variable as a dichotomised continuous latent variable

• The point at which a continuous N(0,1) variable must be cut to create a binary variable is called a threshold

• A binary variable with 50% of cases corresponds to a threshold of zero

• A binary variable with 2.5% of cases corresponds to a threshold of 1.96
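The two examples can be reproduced with the inverse normal distribution function. A minimal Python sketch (`probit_threshold` is an illustrative name), assuming the N(0,1) scale described on this slide; note that the LCA output later in the deck reports thresholds on the logit scale instead:

```python
from statistics import NormalDist


def probit_threshold(prevalence):
    """Cut-point on an N(0,1) variable that leaves `prevalence` of cases above it."""
    return NormalDist().inv_cdf(1.0 - prevalence)


print(round(probit_threshold(0.50), 2))
print(round(probit_threshold(0.025), 2))
```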

Page 16: Cross-sectional LCA Patterns of first response to cigarettes

Standard thresholds

Figure from Uebersax webpage

Page 17: Cross-sectional LCA Patterns of first response to cigarettes

Data:
  File is "..\smoking_experience.dta.dat";
  listwise is on;

Variable:
  Names are sex cough ill taste liked dizzy
            numresp less_12 less_13;
  categorical are cough ill taste liked dizzy;
  usevariables are cough ill taste liked dizzy;
  Missing are all (-9999);
  classes = c(3);

Analysis:
  proc = 2 (starts);
  type = mixture;
  starts = 1000 500;
  stiterations = 20;

Output:
  tech10;

Page 18: Cross-sectional LCA Patterns of first response to cigarettes

What you’re actually doing

model:
  %OVERALL%
  [c#1 c#2];       ! defines the latent class variable

  %c#1%            ! defines the within-class thresholds,
  [cough$1];       ! i.e. the prevalence of the endorsement of each item
  [ill$1];
  [taste$1];
  [liked$1];
  [dizzy$1];

  ! + five more threshold parameters for each of %c#2% and %c#3%

Page 19: Cross-sectional LCA Patterns of first response to cigarettes

SUMMARY OF CATEGORICAL DATA PROPORTIONS

  COUGH
    Category 1    0.537
    Category 2    0.463
  ILL
    Category 1    0.904
    Category 2    0.096
  TASTE
    Category 1    0.590
    Category 2    0.410
  LIKED
    Category 1    0.735
    Category 2    0.265
  DIZZY
    Category 1    0.789
    Category 2    0.211

Page 20: Cross-sectional LCA Patterns of first response to cigarettes

RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES

Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:

  -6343.937  685561  9973
  -6343.937  172907  9395
  -6343.937  497824  9464
  -6343.937  770684  7725
  -6343.937  584663  5193
  -6343.937  872295  2899
  -6343.937  116150  3570
  -6343.937  271339  4768
  -6343.937  472383  9650
  -6343.937  707126  3683
  Etc.

Page 21: Cross-sectional LCA Patterns of first response to cigarettes

How many random starts?

• Depends on
  – Sample size
  – Complexity of model
    • Number of manifest variables
    • Number of classes

• Aim to find the solution with the best (highest) loglikelihood consistently, within each run

Page 22: Cross-sectional LCA Patterns of first response to cigarettes

Success

Loglikelihood values at local maxima, seeds, and initial stage start numbers:

  -10148.718  987174  1689
  -10148.718  777300  2522
  -10148.718  406118  3827
  -10148.718   51296  3485
  -10148.718  997836  1208
  -10148.718  119680  4434
  -10148.718  338892  1432
  -10148.718  765744  4617
  -10148.718  636396   168
  -10148.718  189568  3651
  -10148.718  469158  1145
  -10148.718   90078  4008
  -10148.718  373592  4396
  -10148.718   73484  4058
  -10148.718  154192  3972
  -10148.718  203018  3813
  -10148.718  785278  1603
  -10148.718  235356  2878
  -10148.718  681680  3557
  -10148.718   92764  2064

Not there yet

Loglikelihood values at local maxima, seeds, and initial stage start numbers:

  -10153.627   23688  4596
  -10153.678  150818  1050
  -10154.388  584226  4481
  -10155.122  735928   916
  -10155.373  309852  2802
  -10155.437  925994  1386
  -10155.482  370560  3292
  -10155.482  662718   460
  -10155.630  320864  2078
  -10155.833  873488  2965
  -10156.017  212934   568
  -10156.231   98352  3636
  -10156.339   12814  4104
  -10156.497  557806  4321
  -10156.644  134830   780
  -10156.741   80226  3041
  -10156.793  276392  2927
  -10156.819  304762  4712
  -10156.950  468300  4176
  -10157.011   83306  2432
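The check being made by eye here is simply "how many of the final-stage starts land on the best loglikelihood?". A small Python sketch of that check (the function name and the tolerance are assumptions for illustration, not Mplus settings), using the first five values from each listing:

```python
def replication_count(logliks, tol=1e-3):
    """Return the best loglikelihood and how many starts reached it (within tol)."""
    best = max(logliks)
    hits = sum(1 for ll in logliks if best - ll <= tol)
    return best, hits


success = [-10148.718, -10148.718, -10148.718, -10148.718, -10148.718]
not_there_yet = [-10153.627, -10153.678, -10154.388, -10155.122, -10155.373]

print(replication_count(success))
print(replication_count(not_there_yet))
```

When the best value is reached only once, as in the second listing, the usual remedy is to increase the number of random starts and rerun.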

Page 23: Cross-sectional LCA Patterns of first response to cigarettes

Scary “warnings”

IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET AT THE EXTREME VALUES. EXTREME VALUES ARE -15.000 AND 15.000.

THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:

  * THRESHOLD 1 OF CLASS INDICATOR TASTE FOR CLASS 3 AT ITERATION 11
  * THRESHOLD 1 OF CLASS INDICATOR DIZZY FOR CLASS 3 AT ITERATION 12
  * THRESHOLD 1 OF CLASS INDICATOR ILL FOR CLASS 3 AT ITERATION 16
  * THRESHOLD 1 OF CLASS INDICATOR LIKED FOR CLASS 1 AT ITERATION 34
  * THRESHOLD 1 OF CLASS INDICATOR TASTE FOR CLASS 1 AT ITERATION 93

WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.

Page 24: Cross-sectional LCA Patterns of first response to cigarettes

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood
    H0 Value                         -6343.937
    H0 Scaling Correction Factor         1.006
      for MLR

Information Criteria
    Number of Free Parameters               17
    Akaike (AIC)                     12721.873
    Bayesian (BIC)                   12820.930
    Sample-Size Adjusted BIC         12766.916
      (n* = (n + 2) / 24)
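The information criteria follow directly from the loglikelihood, the number of free parameters and the sample size. A minimal Python sketch (the function name is illustrative) that reproduces the reported values for the 3-class model with n = 2,507, up to rounding of the printed loglikelihood:

```python
from math import log


def information_criteria(loglik, n_params, n):
    """AIC, BIC and sample-size adjusted BIC, with n* = (n + 2) / 24 for the latter."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * log(n),
        "aBIC": deviance + n_params * log((n + 2) / 24),
    }


ic = information_criteria(loglik=-6343.937, n_params=17, n=2507)
for name, value in ic.items():
    print(name, round(value, 3))
```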

Page 25: Cross-sectional LCA Patterns of first response to cigarettes

Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes

    Pearson Chi-Square
        Value                  623.040
        Degrees of Freedom          14
        P-Value                 0.0000

    Likelihood Ratio Chi-Square
        Value                  563.869
        Degrees of Freedom          14
        P-Value                 0.0000

Page 26: Cross-sectional LCA Patterns of first response to cigarettes

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL

    Latent Classes
        1      600.41143    0.23949
        2     1517.83320    0.60544
        3      388.75538    0.15507

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Latent Classes
        1       630    0.25130
        2      1396    0.55684
        3       481    0.19186

Page 27: Cross-sectional LCA Patterns of first response to cigarettes

Entropy (fuzziness)

CLASSIFICATION QUALITY

    Entropy    0.832

Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)

           1        2        3
    1    0.952    0.048    0.000
    2    0.000    0.979    0.021
    3    0.000    0.252    0.748
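The entropy figure summarises how sharply individuals are assigned to classes: values near 1 mean the posterior probabilities are close to 0 or 1, values near 0 mean they are spread evenly. A Python sketch of the usual relative-entropy formula, 1 minus the summed classification uncertainty divided by n*ln(K), assuming this is the statistic being reported; the posterior probabilities below are invented to show the two extremes:

```python
from math import log


def relative_entropy(posteriors):
    """1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K), for n individuals and K classes."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * log(k))


# Hypothetical posterior probabilities for a 3-class model.
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # everyone assigned with certainty
fuzzy = [[1 / 3, 1 / 3, 1 / 3]] * 4         # nobody distinguishable
mixed = [[0.95, 0.05, 0.0], [0.0, 0.25, 0.75]]

print(relative_entropy(crisp), relative_entropy(fuzzy), relative_entropy(mixed))
```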

Page 28: Cross-sectional LCA Patterns of first response to cigarettes

Model results

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Latent Class 1

 Thresholds
    COUGH$1            1.604      0.133     12.103      0.000
    ILL$1              7.371      4.945      1.490      0.136
    TASTE$1           15.000      0.000    999.000    999.000
    LIKED$1          -15.000      0.000    999.000    999.000
    DIZZY$1            1.890      0.139     13.604      0.000

Page 29: Cross-sectional LCA Patterns of first response to cigarettes

Categorical Latent Variables

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 Means
    C#1                0.435      0.124      3.500      0.000
    C#2                1.362      0.135     10.058      0.000
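These means are on a multinomial logit scale with the last class as the reference (its logit fixed at zero), so they map back to the estimated class proportions. A short Python sketch of that conversion (the function name is illustrative), which recovers the model-based proportions reported for the 3-class model to within rounding:

```python
from math import exp


def class_proportions(means):
    """Convert latent class means (logits relative to the last class) to proportions."""
    logits = list(means) + [0.0]  # reference class has logit 0
    weights = [exp(x) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


props = class_proportions([0.435, 1.362])
print([round(p, 3) for p in props])
```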

Page 30: Cross-sectional LCA Patterns of first response to cigarettes

RESULTS IN PROBABILITY SCALE

Latent Class 1

                    Estimate       S.E.  Est./S.E.    P-Value

 COUGH
    Category 1         0.833      0.018     45.072      0.000
    Category 2         0.167      0.018      9.059      0.000
 ILL
    Category 1         0.999      0.003    321.448      0.000
    Category 2         0.001      0.003      0.202      0.840
 TASTE
    Category 1         1.000      0.000      0.000      1.000
    Category 2         0.000      0.000      0.000      1.000
 LIKED
    Category 1         0.000      0.000      0.000      1.000
    Category 2         1.000      0.000      0.000      1.000
 DIZZY
    Category 1         0.869      0.016     54.848      0.000
    Category 2         0.131      0.016      8.284      0.000
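The probability-scale results are the class 1 logit thresholds transformed back: P(Category 2) = 1 / (1 + exp(threshold)). A minimal Python sketch (the function name is illustrative) using the threshold estimates for latent class 1; it also shows why thresholds fixed at the extreme values of +15 and -15 correspond to probabilities of effectively 0 and 1:

```python
from math import exp


def prob_endorsed(threshold):
    """P(item endorsed, i.e. Category 2) from a logit threshold."""
    return 1.0 / (1.0 + exp(threshold))


THRESHOLDS_CLASS_1 = {
    "COUGH": 1.604, "ILL": 7.371, "TASTE": 15.0, "LIKED": -15.0, "DIZZY": 1.890,
}

for item, tau in THRESHOLDS_CLASS_1.items():
    print(item, round(prob_endorsed(tau), 3))
```

So the "scary" warning about extreme thresholds simply says that, within this class, nobody reports a bad taste and everybody reports liking it.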

Page 31: Cross-sectional LCA Patterns of first response to cigarettes

Class 1 from 3-class model

[Bar chart: probability of endorsing each item (COUGH, ILL, TASTE, LIKED, DIZZY) within class 1; y-axis 0 to 1]

Page 32: Cross-sectional LCA Patterns of first response to cigarettes

Conditional independence

• The latent class variable accounts for the covariance structure in your dataset

• Conditional on C, any pair of manifest variables should be uncorrelated

• Harder to achieve for a cross-sectional LCA

• With a longitudinal LCA there tends to be a more ordered pattern of correlations based on proximity in time

Page 33: Cross-sectional LCA Patterns of first response to cigarettes

Tech10 – response patterns

MODEL FIT INFORMATION FOR THE LATENT CLASS INDICATOR MODEL PART

RESPONSE PATTERNS

    No.  Pattern    No.  Pattern    No.  Pattern    No.  Pattern
      1  10000        2  00100        3  00010        4  11100
      5  11101        6  00001        7  10101        8  10010
      9  10100       10  00101       11  10001       12  00000
     13  00011       14  01101       15  10011       16  00110
     17  11000       18  10111       19  11011       20  01100
     21  10110       22  01000       23  01001       24  11111
     25  01010       26  11001       27  01011       28  11010
     29  00111       30  11110

Page 34: Cross-sectional LCA Patterns of first response to cigarettes

Tech10 – Bivariate model fit

• 5 manifest variables → number of pairs = $\binom{5}{2} = \frac{5!}{3!\,2!} = \frac{5 \times 4}{2} = 10$

Overall Bivariate Pearson Chi-Square             215.353
Overall Bivariate Log-Likelihood Chi-Square      214.695

Compare with the 5% critical value of χ² (10 df) = 18.307

Page 35: Cross-sectional LCA Patterns of first response to cigarettes

Tech10 – Bivariate model fit

Not bad:

                                      Estimated Probabilities   Standardized
    Variable      Variable                H1         H0       Residual (z-score)
    COUGH         ILL
      Category 1    Category 1          0.511      0.506          0.457
      Category 1    Category 2          0.026      0.031         -1.321
      Category 2    Category 1          0.393      0.398         -0.467
      Category 2    Category 2          0.070      0.065          0.925

    Bivariate Pearson Chi-Square                                  2.726
    Bivariate Log-Likelihood Chi-Square                           2.798

Page 36: Cross-sectional LCA Patterns of first response to cigarettes

Tech10 – Bivariate model fit

Terrible:

                                      Estimated Probabilities   Standardized
    Variable      Variable                H1         H0       Residual (z-score)
    COUGH         ILL
      Category 1    Category 1          0.566      0.534          3.149
      Category 1    Category 2          0.338      0.370         -3.255
      Category 2    Category 1          0.024      0.056         -6.850
      Category 2    Category 2          0.072      0.040          7.977

    Bivariate Pearson Chi-Square                                116.657
    Bivariate Log-Likelihood Chi-Square                         117.162

Page 37: Cross-sectional LCA Patterns of first response to cigarettes

Conditional Independence violated

Need more classes

Page 38: Cross-sectional LCA Patterns of first response to cigarettes

Obtain the ‘optimal’ model

Assess the following for models with increasing classes
• aBIC
• Entropy
• BLRT (Bootstrap LRT)
• Conditional Independence (Tech10)
• Ease of interpretation
• Consistency with previous work / theory

Page 39: Cross-sectional LCA Patterns of first response to cigarettes

Model fit stats

                     1 class    2 class    3 class    4 class    5 class
Estimated params           5         11         17         23         29
H0 Likelihood        -6962.1    -6458.7    -6343.9    -6200.1    -6100.8
aBIC                 13947.4    12968.5    12766.9    12507.1    12336.5
Entropy                    -      0.944      0.832      0.894      0.844
Tech 10                625.2      228.1      214.7      135.9       17.6
BLRT statistic             -     1006.8      229.5      287.8      198.4
BLRT p-value               -   < 0.0001   < 0.0001   < 0.0001   < 0.0001
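In this table the BLRT statistic for a k-class model is twice the gain in loglikelihood over the (k-1)-class model. A Python sketch that recomputes it from the H0 likelihood row (names are illustrative; small differences from the table come from the likelihoods being printed to one decimal place). The p-values cannot be recomputed this way, since they come from the parametric bootstrap rather than a chi-square reference distribution:

```python
# H0 loglikelihoods for the 1- to 5-class models, from the fit table.
LOGLIK = {1: -6962.1, 2: -6458.7, 3: -6343.9, 4: -6200.1, 5: -6100.8}


def lr_statistic(ll_fewer_classes, ll_more_classes):
    """Likelihood ratio statistic comparing k-1 classes against k classes."""
    return 2.0 * (ll_more_classes - ll_fewer_classes)


lr = {k: lr_statistic(LOGLIK[k - 1], LOGLIK[k]) for k in range(2, 6)}
for k, value in lr.items():
    print(f"{k} vs {k - 1} classes: {value:.1f}")
```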

Page 40: Cross-sectional LCA Patterns of first response to cigarettes

5-class model

• aBIC values are still decreasing
• Tech 10 is still quite high: residual correlations between ill and both liked and dizzy
• BLRT rejects 4-class model
• Not enough df to fit 6-class model so we cannot assess fit of 5-class
• It seems unlikely that 5 classes are enough, as BLRT values are decreasing slowly

Page 41: Cross-sectional LCA Patterns of first response to cigarettes

Cross-sectional LCA

Patterns of first response to cigarettes

Attempt 2

Page 42: Cross-sectional LCA Patterns of first response to cigarettes

What to do?

• We need more degrees of freedom
• There were only 5 questions on response to smoking

• Add something else:
  – How old were you when you first tried a cigarette?
  – Split into pre-teen / teen

• 6 binary variables means 64 possible patterns (63 d.f.) to play with

Page 43: Cross-sectional LCA Patterns of first response to cigarettes

Model fit stats – attempt 2

                     3 class    4 class    5 class    6 class    7 class
Estimated params          20         27         34         41         48
H0 Likelihood        -7866.3    -7720.2    -7616.0    -7582.4    -7576.2
aBIC                 15825.6    15565.7    15389.9    15355.1    15375.2
Entropy                0.823      0.893      0.812      0.876      0.850
Tech 10                228.9      144.6       16.8        1.2       0.29
BLRT statistic         123.3      146.1      104.2       67.3       12.4
BLRT p-value        < 0.0001   < 0.0001   < 0.0001   < 0.0001     0.2100

Page 45: Cross-sectional LCA Patterns of first response to cigarettes

6-class model results

CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL

    Latent classes
        1      53.23894     2.1%
        2     541.96140    21.7%
        3     396.04196    15.9%
        4     454.89294    18.2%
        5     750.87470    30.1%
        6     295.99007    11.9%

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Latent classes
        1       34     1.4%
        2      540    21.7%
        3      403    16.2%
        4      447    17.9%
        5      840    33.7%
        6      229     9.2%

Page 46: Cross-sectional LCA Patterns of first response to cigarettes

Examine entropy in more detail

• Model-level entropy = 0.876

• Class level entropy:

           1        2        3        4        5        6
    1    0.953    0.000    0.000    0.000    0.026    0.020
    2    0.000    0.997    0.000    0.000    0.002    0.001
    3    0.000    0.000    0.958    0.000    0.017    0.025
    4    0.000    0.000    0.000    0.949    0.041    0.011
    5    0.025    0.005    0.000    0.036    0.851    0.083
    6    0.000    0.000    0.043    0.003    0.036    0.918

Page 47: Cross-sectional LCA Patterns of first response to cigarettes

Pattern level entropy

• Save out the model-based probabilities
• Open in another stats package
• Collapse over response patterns

Page 48: Cross-sectional LCA Patterns of first response to cigarettes

Save out the model-based probabilities

savedata:
  file is "6-class-results.dat";
  save cprobabilities;

Page 49: Cross-sectional LCA Patterns of first response to cigarettes

Varnames shown at end of output

SAVEDATA INFORMATION

  Order and format of variables

    COUGH     F10.3
    ILL       F10.3
    TASTE     F10.3
    LIKED     F10.3
    DIZZY     F10.3
    LESS_13   F10.3
    ALN       F10.3
    QLET      F10.3
    SEX       F10.3
    CPROB1    F10.3
    CPROB2    F10.3
    CPROB3    F10.3
    CPROB4    F10.3
    CPROB5    F10.3
    CPROB6    F10.3
    C         F10.3

Page 50: Cross-sectional LCA Patterns of first response to cigarettes

Open / process in Stata

Remove excess spaces from data file, then:

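* Read the space-delimited file saved by Mplus (variables arrive as v1, v2, ...)
insheet using "6-class-results.dat", delim(" ")

* Rename v1-v16 using the variable order listed at the end of the Mplus output
local i = 1
local varnames "COUGH ILL TASTE LIKED DIZZY LESS_13 ALN QLET SEX CPROB1 CPROB2 CPROB3 CPROB4 CPROB5 CPROB6 C"
foreach x of local varnames {
    rename v`i' `x'
    local i = `i' + 1
}

* Collapse to one row per response pattern, keeping the pattern count
gen num = 1
collapse (mean) CPROB* C (count) num, by(COUGH ILL TASTE LIKED DIZZY LESS_13)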

Page 51: Cross-sectional LCA Patterns of first response to cigarettes

Check the assignment probabilities for each class

cough  ill  taste  liked  dizzy  <13    P_c1   P_c2   P_c3   P_c4   P_c5   P_c6   Modclass    n
    1    1      1      0      0    0       0      0      0      0  0.052  0.948          6   64
    1    1      1      0      1    0       0      0  0.003      0  0.001  0.996          6   34
    1    1      1      0      0    1       0      0      0      0  0.027  0.973          6   30
    1    0      1      0      1    0       0      0  0.135      0  0.062  0.803          6   29
    1    1      1      0      1    1       0      0  0.003      0  0.001  0.996          6   25
    1    0      1      0      1    1       0      0  0.154      0  0.032  0.815          6   18
    1    1      0      0      0    0       0      0      0  0.071  0.054  0.874          6    6
    0    1      1      0      1    1       0      0  0.073      0  0.012  0.915          6    4
    1    1      0      0      1    0       0      0  0.303      0  0.001  0.696          6    4
    1    1      0      0      1    1       0      0  0.329      0      0  0.671          6    4
    0    1      1      0      0    1       0      0      0      0  0.411  0.589          6    3
    0    1      1      0      1    0       0      0  0.065      0  0.024  0.912          6    3
    1    1      0      0      0    1       0      0      0  0.055  0.029  0.917          6    2
    1    1      1      1      0    1       0  0.001      0      0  0.023  0.977          6    1
    1    1      1      1      1    0       0      0  0.039      0  0.001   0.96          6    1
    1    1      1      1      1    1       0      0  0.044      0      0  0.955          6    1

Page 54: Cross-sectional LCA Patterns of first response to cigarettes

Bad taste (30.1%)

[Bar chart: within-class probability of endorsing COUGH, ILL, TASTE, LIKED, DIZZY and LESS_13; y-axis 0 to 1]

Page 55: Cross-sectional LCA Patterns of first response to cigarettes

Positive experience (21.7%)

[Bar chart: within-class probability of endorsing COUGH, ILL, TASTE, LIKED, DIZZY and LESS_13; y-axis 0 to 1]

Page 56: Cross-sectional LCA Patterns of first response to cigarettes

Coughed (18.2%)

[Bar chart: within-class probability of endorsing COUGH, ILL, TASTE, LIKED, DIZZY and LESS_13; y-axis 0 to 1]

Page 57: Cross-sectional LCA Patterns of first response to cigarettes

Dizziness (15.9%)

[Bar chart: within-class probability of endorsing COUGH, ILL, TASTE, LIKED, DIZZY and LESS_13; y-axis 0 to 1]

Page 58: Cross-sectional LCA Patterns of first response to cigarettes

V negative experience (11.9%)

[Bar chart: within-class probability of endorsing COUGH, ILL, TASTE, LIKED, DIZZY and LESS_13; y-axis 0 to 1]

Page 59: Cross-sectional LCA Patterns of first response to cigarettes

Felt ill (2.1%)

[Bar chart: within-class probability of endorsing COUGH, ILL, TASTE, LIKED, DIZZY and LESS_13; y-axis 0 to 1]

Page 60: Cross-sectional LCA Patterns of first response to cigarettes

Well that was a complete waste of time!

• You might think that those resulting classes could have been derived just by looking at the response patterns and making some arbitrary decisions, e.g.
  – Group all of those who had >1 negative experience
  – Keep separate each group who had 1 experience

• You would have ended up with a bunch of weird patterns with no clue of what to do with them

• Strange patterns likely to be measurement error?
• LCA incorporates ALL patterns and deals with uncertainty through the posterior probabilities

Page 61: Cross-sectional LCA Patterns of first response to cigarettes

Conclusions / warning

• Like EFA, LCA is an exploratory tool with the aim of summarising the variability in the dataset in a simple/interpretable way

• These results do not prove that there are 6 groups of young people in real life.

• LCA will find groupings in the data even if there is no reason to think such groups might exist. It’s just mathematics and it knows no better

Page 62: Cross-sectional LCA Patterns of first response to cigarettes

Remember, we are dealing with probabilities

                   Model-based         “Modal assignment”
    Ill             53.24    2.1%         34     1.4%
    Positive       541.96   21.7%        540    21.7%
    Dizzy          396.04   15.9%        403    16.2%
    Coughed        454.89   18.2%        447    17.9%
    Bad taste      750.87   30.1%        840    33.7%
    V negative     295.99   11.9%        229     9.2%

• Working with modal assignment is easy: chuck each pattern into its most likely class and pretend everything is OK
  – Equivalent to doing a single imputation for missing data (shudder!)

• Unless entropy is V high, stick with the probabilities

Page 63: Cross-sectional LCA Patterns of first response to cigarettes

Covariates and outcomes

Page 64: Cross-sectional LCA Patterns of first response to cigarettes

Merging the classes with other data

• In the “olden days”, you could pass your ID variable through Mplus so when you saved your class probabilities you could merge this with other data.

• Now you can pass other data through Mplus as well – hurrah!

Variable:
  <snip>
  auxiliary are ID sex;

Page 65: Cross-sectional LCA Patterns of first response to cigarettes

Reshaping the dataset

• To account for the uncertainty in our class variable we will need to weight by the posterior probabilities obtained from Mplus

• A weighted model requires reshaping the dataset so that each respondent has n rows (for an n-class model) rather than just 1

Page 66: Cross-sectional LCA Patterns of first response to cigarettes

Pre-shaped: first 20 kids

     +---------------------------------------------------------------------------------------------+
     |    ID    sex   dev_18   dev_42   pclass1   pclass2   pclass3   pclass4   pclass5   modclass |
     |---------------------------------------------------------------------------------------------|
     | 30004   male        3        .      .001         0      .803         0      .197          3 |
     | 30008   male        2        1      .908         0         0      .007      .085          1 |
     | 30010   male        2        2      .053      .001      .052         0      .894          5 |
     | 30023   male        1        3      .115         0      .596      .001      .288          3 |
     | 30031   male        3        4         0         0      .983         0      .016          3 |
     |---------------------------------------------------------------------------------------------|
     | 30033   male        4        4      .392         0      .397         0      .211          3 |
     | 30042   male        1        3         0         0      .983         0      .016          3 |
     | 30050   male        3        2         0         0      .983         0      .016          3 |
     | 30051   male        2        2         0         0         0         1         0          4 |
     | 30057   male        1        3      .135         0      .002         0      .864          5 |
     |---------------------------------------------------------------------------------------------|
     | 30058   male        1        4         0         0      .958         0      .041          3 |
     | 30064   male        2        4         0         0      .983         0      .016          3 |
     | 30068   male        4        3      .001         0      .803         0      .197          3 |
     | 30070   male        3        4         0         0      .983         0      .016          3 |
     | 30072   male        1        1         0         0      .983         0      .016          3 |
     |---------------------------------------------------------------------------------------------|
     | 30075   male        3        3         0         0      .982         0      .018          3 |
     | 30088   male        3        4       .03      .002      .889      .003      .076          3 |
     | 30095   male        3        .         0         0      .983         0      .016          3 |
     | 30098   male        3        .      .068      .158      .173      .018      .583          5 |
     | 30104   male        4        1      .008         0      .775         0      .217          3 |
     +---------------------------------------------------------------------------------------------+

Page 67: Cross-sectional LCA Patterns of first response to cigarettes

Column groups in the pre-shaped data: sex, dev_18 and dev_42 are covariates; pclass1 to pclass5 are the posterior probabilities; modclass is the modal class.

Page 68: Cross-sectional LCA Patterns of first response to cigarettes

The reshaping

. reshape long pclass, i(id) j(class)

(note: j = 1 2 3 4 5)

Data                               wide   ->   long
---------------------------------------------------------
Number of obs.                     5584   ->   27920
Number of variables                  66   ->      63
j variable (5 values)                     ->   class
xij variables:
          pclass1 pclass2 ... pclass5     ->   pclass
---------------------------------------------------------

Page 69: Cross-sectional LCA Patterns of first response to cigarettes

Re-shaped: first 3 kids

     +--------------------------------------------------+
     |    id    sex   dev_18   dev_42   pclass   class  |
     |--------------------------------------------------|
  1. | 30004   male        3        .     .001       1  |
  2. | 30004   male        3        .        0       2  |
  3. | 30004   male        3        .     .803       3  |
  4. | 30004   male        3        .        0       4  |
  5. | 30004   male        3        .     .197       5  |
     |--------------------------------------------------|
  6. | 30008   male        2        1     .908       1  |
  7. | 30008   male        2        1        0       2  |
  8. | 30008   male        2        1        0       3  |
  9. | 30008   male        2        1     .007       4  |
 10. | 30008   male        2        1     .085       5  |
     |--------------------------------------------------|
 11. | 30010   male        2        2     .053       1  |
 12. | 30010   male        2        2     .001       2  |
 13. | 30010   male        2        2     .052       3  |
 14. | 30010   male        2        2        0       4  |
 15. | 30010   male        2        2     .894       5  |
     +--------------------------------------------------+

Rows 1-5: first kid. Rows 6-10: second kid. Rows 11-15: third kid.

Within each child, pclass sums to 1 and the covariates (sex, dev_18, dev_42) are constant.
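For readers without Stata, the same wide-to-long step can be sketched in plain Python. This is an illustration only (the `WIDE` structure and `reshape_long` are invented names), using the posterior probabilities of the first three kids:

```python
# One record per child, with the five posterior probabilities held side by side.
WIDE = [
    {"id": 30004, "pclass": [0.001, 0.0, 0.803, 0.0, 0.197]},
    {"id": 30008, "pclass": [0.908, 0.0, 0.0, 0.007, 0.085]},
    {"id": 30010, "pclass": [0.053, 0.001, 0.052, 0.0, 0.894]},
]


def reshape_long(wide):
    """One output row per child per class, carrying that class's posterior probability."""
    return [
        {"id": rec["id"], "class": j, "pclass": p}
        for rec in wide
        for j, p in enumerate(rec["pclass"], start=1)
    ]


long_rows = reshape_long(WIDE)
for row in long_rows[:5]:
    print(row)

# Posterior probabilities should sum to 1 within child (up to rounding in the listing).
sums = {}
for row in long_rows:
    sums[row["id"]] = sums.get(row["id"], 0.0) + row["pclass"]
print(sums)
```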

Page 70: Cross-sectional LCA Patterns of first response to cigarettes

Similar with our data:

. list id SEX CPROB class C in 1/12

     +---------------------------------+
     |    id   SEX   CPROB   class   C |
     |---------------------------------|
  1. | 30012     2       0       1   4 |
  2. | 30012     2       0       2   4 |
  3. | 30012     2       0       3   4 |
  4. | 30012     2    .945       4   4 |
  5. | 30012     2    .045       5   4 |
  6. | 30012     2     .01       6   4 |
     |---------------------------------|
  7. | 30024     2       0       1   5 |
  8. | 30024     2       0       2   5 |
  9. | 30024     2       0       3   5 |
 10. | 30024     2       0       4   5 |
 11. | 30024     2    .991       5   5 |
 12. | 30024     2    .009       6   5 |
     |---------------------------------|

Rows 1-6: first respondent. Rows 7-12: second respondent.

Page 71: Cross-sectional LCA Patterns of first response to cigarettes

Simple crosstab

. tab class SEX, row nofreq

           |         SEX
     class |      1       2 |   Total
-----------+----------------+--------
       Ill |  40.87   59.13 |  100.00
  Positive |  40.87   59.13 |  100.00
     Dizzy |  40.87   59.13 |  100.00
   Coughed |  40.87   59.13 |  100.00
 Bad taste |  40.87   59.13 |  100.00
V negative |  40.87   59.13 |  100.00
-----------+----------------+--------
     Total |  40.87   59.13 |  100.00

• Oops!

Page 72: Cross-sectional LCA Patterns of first response to cigarettes

Simple crosstab – take 2

. tab class SEX [iw = CPROB], row nofreq

           |        SEX
     class |   Male   Female |  Total
-----------+-----------------+-------
       Ill |  52.9%    47.1% |   100%
  Positive |  32.9%    67.1% |   100%
     Dizzy |  43.2%    56.8% |   100%
   Coughed |  40.8%    59.2% |   100%
 Bad taste |  45.2%    54.8% |   100%
V negative |  39.3%    60.7% |   100%
-----------+-----------------+-------
     Total |  40.9%    59.1% |   100%
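Why the unweighted crosstab gives identical rows: in the long dataset every respondent appears once in every class, so each row just repeats the overall sex split. Weighting by the posterior probability fixes this. A small Python sketch with three invented respondents and two classes (all names and values are hypothetical):

```python
# Long-format rows: (id, sex, class, posterior probability). Invented data.
LONG = [
    ("A", "male", 1, 0.9), ("A", "male", 2, 0.1),
    ("B", "female", 1, 0.2), ("B", "female", 2, 0.8),
    ("C", "female", 1, 0.1), ("C", "female", 2, 0.9),
]


def row_percentages(rows, weighted=True):
    """Row percentages of sex within class, optionally weighted by posterior probability."""
    totals = {}
    for _id, sex, cls, prob in rows:
        weight = prob if weighted else 1.0
        cell = totals.setdefault(cls, {})
        cell[sex] = cell.get(sex, 0.0) + weight
    return {
        cls: {sex: 100.0 * w / sum(cell.values()) for sex, w in cell.items()}
        for cls, cell in totals.items()
    }


unweighted = row_percentages(LONG, weighted=False)
weighted = row_percentages(LONG, weighted=True)
print(unweighted)  # every class shows the same overall split
print(weighted)    # classes now differ
```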

Page 73: Cross-sectional LCA Patterns of first response to cigarettes

Compare with modal class assignment

Modal class assignment:

. tab C SEX if (class==1), row nofreq

           |        SEX
         C |   Male   Female |
-----------+-----------------+
       Ill |  50.0%    50.0% |
  Positive |  33.0%    67.0% |
     Dizzy |  43.4%    56.6% |
   Coughed |  40.7%    59.3% |
 Bad taste |  45.4%    54.6% |
V negative |  37.6%    62.4% |
-----------+-----------------+
     Total |  40.9%    59.1% |

Weighted by posterior probabilities:

. tab class SEX [iw = CPROB], row nofreq

           |        SEX
     class |   Male   Female |
-----------+-----------------+
       Ill |  52.9%    47.1% |
  Positive |  32.9%    67.1% |
     Dizzy |  43.2%    56.8% |
   Coughed |  40.8%    59.2% |
 Bad taste |  45.2%    54.8% |
V negative |  39.3%    60.7% |
-----------+-----------------+
     Total |  40.9%    59.1% |

Page 74: Cross-sectional LCA Patterns of first response to cigarettes

Multinomial logistic

. xi: mlogit class i.SEX [iw = CPROB], rrr

Multinomial logistic regression                   Number of obs   =       2493
                                                  LR chi2(5)      =      24.52
                                                  Prob > chi2     =     0.0002
Log likelihood = -4053.3746                       Pseudo R2       =     0.0030

------------------------------------------------------------------------------
       class |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Ill          |
     _ISEX_2 |   .7322787   .2081189    -1.10   0.273     .4195259    1.278186
-------------+----------------------------------------------------------------
Positive     |
     _ISEX_2 |   1.677364   .1965463     4.41   0.000     1.333175    2.110413
-------------+----------------------------------------------------------------
Dizzy        |
     _ISEX_2 |   1.082775   .1355213     0.64   0.525     .8472297    1.383807
-------------+----------------------------------------------------------------
Coughed      |
     _ISEX_2 |   1.194885   .1437877     1.48   0.139     .9438344    1.512712
-------------+----------------------------------------------------------------
V negative   |
     _ISEX_2 |   1.274734   .1782148     1.74   0.083     .9692081    1.676572
------------------------------------------------------------------------------
(class==Bad taste is the base outcome)

Page 75: Cross-sectional LCA Patterns of first response to cigarettes

Class predicts binary outcome

Outcome = weekly smoker at age of 15

. char class[omit] 5
. xi: logistic sm1100 i.class [iw = CPROB]

Logistic regression                               Number of obs   =       2493
                                                  LR chi2(5)      =     229.03
                                                  Prob > chi2     =     0.0000
Log likelihood = -1168.697                        Pseudo R2       =     0.0892

------------------------------------------------------------------------------
      sm1100 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ill |   2.132652   .9125838     1.77   0.077     .9218961    4.933531
    Positive |   7.190203   1.231216    11.52   0.000     5.140265    10.05766
       Dizzy |   7.899915   1.413907    11.55   0.000     5.562583    11.21937
     Coughed |   3.686492   .6831946     7.04   0.000     2.563689    5.301041
  V negative |   2.243034    .497619     3.64   0.000     1.452099     3.46478
------------------------------------------------------------------------------

Page 76: Cross-sectional LCA Patterns of first response to cigarettes

Compare with modal class

Posterior probabilities
------------------------------------------------------------------------------
      sm1100 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ill |   2.132652   .9125838     1.77   0.077     .9218961    4.933531
    Positive |   7.190203   1.231216    11.52   0.000     5.140265    10.05766
       Dizzy |   7.899915   1.413907    11.55   0.000     5.562583    11.21937
     Coughed |   3.686492   .6831946     7.04   0.000     2.563689    5.301041
  V negative |   2.243034    .497619     3.64   0.000     1.452099     3.46478
------------------------------------------------------------------------------

Modal assignment
------------------------------------------------------------------------------
      sm1100 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ill |   2.560182   1.291868     1.86   0.062     .9522577     6.88315
    Positive |   7.802047   1.313428    12.20   0.000     5.609367    10.85184
       Dizzy |     8.3454   1.467249    12.07   0.000     5.912796    11.77881
     Coughed |   4.224301   .7686958     7.92   0.000     2.957071    6.034592
  V negative |   2.861537   .6548723     4.59   0.000     1.827254    4.481255
------------------------------------------------------------------------------

Page 77: Cross-sectional LCA Patterns of first response to cigarettes

Conclusions

• Young people at 15yrs can report a variety of responses to their first cigarette

• Certain responses are associated with current regular smoking behaviour

• 15 year-old girls are more likely to retrospectively report a positive experience

• Recall bias is likely to play a part in these associations

Page 78: Cross-sectional LCA Patterns of first response to cigarettes

Conclusions

• LCA is an exploratory tool which can be used to simplify a set of binary responses

• Extension to ordinal responses is straightforward
• The use of ordinal data is an alternative way to boost degrees of freedom

• Resulting probabilities can be used to model the latent class variable as a risk factor or outcome

• A modal class variable should be used with caution