
Page 1

Lecture 15: Hierarchical Latent Class Models

Based on

N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, to appear.

COMP 538

Introduction to Bayesian Networks

Page 2

Outline

Motivation
  Application of LCA in medicine
  Model-based clustering and TCM diagnosis
  Need for more general models
Theoretical issues
Learning algorithm
Empirical results
Related work

Page 3

Motivations/LCA in Medicine

In medical diagnosis, a gold standard sometimes exists.

Example: lung cancer
  Symptoms: persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc.
  Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum.
  Gold standard: biopsy, the removal of a small sample of tissue for examination under a microscope by a pathologist.

Page 4

Sometimes no gold standard exists. Example: rheumatoid arthritis (RA)

Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc.

Information for diagnosis: symptoms, medical history, physical exam, lab tests including a test for rheumatoid factor. (Rheumatoid factor is an antibody found in the blood of about 80 percent of adults with RA.)

No gold standard:
  None of the symptoms, alone or in combination, is a clear-cut indicator of RA.
  The presence or absence of rheumatoid factor does not by itself establish whether one has RA.

Motivations/LCA in Medicine

Page 5

Questions:
  How many diagnostic categories should there be?
  What rules should be used when making a diagnosis?

Note: these questions cannot be answered using regression (supervised learning) because the true "disease type" is never directly observed. It is latent.

Ideas:
  Each "disease type" must correspond to a cluster of people.
  People in different clusters exhibit different symptom patterns (otherwise diagnosis is hopeless).

Possible solution: perform cluster analysis of symptom data to reveal the patterns.

Motivations/LCA in Medicine

Page 6

Latent class analysis (LCA): cluster analysis based on the latent class (LC) model.
  Observed variables Y_j: symptoms
  Latent variable X: "disease type"
  Assumption: the Y_j are mutually independent given X

Given: data on the Y_j
Determine:
  Number of states of X
  Prevalence: P(X)
  Class-specific probabilities: P(Y_j|X)

[Figure: LC model with latent variable X and observed children Y1, Y2, …, Yp]
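Written out explicitly (this equation is implied by the local-independence assumption above, though not printed on the slide), the LC model factorizes the joint distribution of the latent and observed variables as

$$P(X, Y_1, \ldots, Y_p) \;=\; P(X)\,\prod_{j=1}^{p} P(Y_j \mid X).$$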

Motivations/LCA in Medicine

Page 7

LC Analysis of Hannover Rheumatoid Arthritis Data

Class-specific probabilities (table omitted):
  Cluster 1: "disease-free"
  Cluster 2: "back-pain type"
  Cluster 3: "joint type"
  Cluster 4: "severe type"

Motivations/LCA in Medicine

Page 8

Diagnosis in traditional Chinese medicine (TCM)

Example: kidney deficiency (肾虚)
  Symptoms: soreness and pain in the loins (腰酸软而痛), tinnitus (耳鸣), dribbling urine (小便余沥不尽), etc.

Similar to rheumatoid arthritis:
  Diagnosis is based on symptoms
  No gold standard exists

Model-Based Clustering and TCM diagnosis

Page 9

Current status:
  Researchers have been searching for laboratory indices that could serve as gold standards. All such efforts have failed.
  In practice, diagnosis is quite subjective and differs considerably between doctors.
  This hinders practice and prevents international recognition.

Model-Based Clustering and TCM diagnosis

Page 10

How can TCM diagnosis be laid on a scientific foundation?

Model-based cluster analysis

Statistical methods might be the answer:
  TCM diagnosis is based on experience (of contemporary practitioners and of ancient doctors).
  Experience is a summary of patient cases.
  Summarizing patient cases with the human brain leads to subjectivity.
  Summarizing patient cases by computer avoids it.

Model-Based Clustering and TCM diagnosis

Page 11

Need for More General Models

Preliminary analysis of TCM data using LCA: could not find models that fit the data well.

Reason: latent class (LC) models are too simplistic.
  Local independence: the observed variables are mutually independent given the latent variable.

Need: more realistic models.

Page 12

Hierarchical latent class (HLC) models: tree-structured Bayesian networks where
  Leaf nodes are observed and the other nodes are not
  Manifest variables = observed variables

Maybe still too simplistic, but a good first step:
  More general than LC models
  Nice computational properties

Task: learn HLC models from data, i.e., learn latent structures from what we can observe (a minimal structural sketch follows).
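To make the definition concrete, here is a minimal structural sketch in Python. Everything in it (the Node class, the traversal helper, the example variables X1, X2, Y1, Y2, Y3) is illustrative and assumed, not taken from the paper.

```python
# Minimal sketch of an HLC model structure: a tree-structured Bayesian
# network whose leaf nodes are manifest (observed) and whose internal
# nodes are latent. Illustrative only, not the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cardinality: int                        # number of states
    children: list = field(default_factory=list)

    @property
    def is_manifest(self) -> bool:
        # Leaf nodes are observed; internal nodes are latent.
        return not self.children

def nodes(root):
    """Yield every node in the tree rooted at `root`."""
    yield root
    for child in root.children:
        yield from nodes(child)

# Example rooted HLC structure: latent X1 with manifest child Y1 and
# latent child X2, which in turn has manifest children Y2 and Y3.
root = Node("X1", 2, children=[
    Node("Y1", 2),
    Node("X2", 2, children=[Node("Y2", 2), Node("Y3", 2)]),
])
latent = [n.name for n in nodes(root) if not n.is_manifest]  # ['X1', 'X2']
```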

Need for More General Models

Page 13

Theoretical Issues

What latent structures can be learned from data?

An HLC model M is parsimonious if there does NOT exist another model M' that:
  Is marginally equivalent to M, i.e., P(manifest vars | M) = P(manifest vars | M'), and
  Has fewer independent parameters than M.

Occam's razor prefers parsimonious models over non-parsimonious ones.

Page 14

Theoretical Issues

Regular HLC models
  An HLC model is regular if, for any latent node Z with neighbors X1, X2, …, Xk,

    |Z| ≤ (|X1| · |X2| · … · |Xk|) / max_i |Xi|,

  where strict inequality holds when Z has only two neighbors.

Irregular models are not parsimonious. (This gives an operational characterization of parsimony.)

The set of all possible regular HLC models for a given set of manifest variables is finite. (Finite search space for the learning algorithm.)

Page 15

Theoretical Issues

Model equivalence: root walking (illustration omitted)
  M1: the root walks to X2
  M2: the root walks to X3

Root walking leads to equivalent models.

Page 16

Theoretical Issues

Unrooted HLC models
  The root of an HLC model can walk to any latent node.
  Unrooted model: an HLC model with undirected edges.
  We can only learn unrooted models.

Question: which latent node should be the class node?
Answer: any of them, depending on the semantics and purpose of the clustering. Learn one model, obtain multiple clusterings.

Page 17

Theoretical Issues

Measure of model complexity
  Without latent variables: number of free parameters (standard dimension).
  With latent variables: use the effective dimension instead.
    If unconstrained, P(Y1, Y2, …, Yn) over n binary variables ranges over a (2^n − 1)-dimensional space S.
    An HLC model imposes constraints on the joint, so it ranges over a subspace of S.
    Effective dimension of the model: the dimension of that subspace. HARD to compute.
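For comparison (my gloss, following the standard-dimension definition above), the standard dimension of an LC model with latent variable $X$ and manifest variables $Y_1, \ldots, Y_p$ simply counts its free parameters:

$$d_s \;=\; (|X| - 1) \;+\; |X| \sum_{j=1}^{p} \bigl(|Y_j| - 1\bigr).$$

The effective dimension can only be smaller or equal, as the example on page 19 (110 vs. 61) shows.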

Page 18

Theoretical Issues

Reduction theorem for regular HLC models (Kocka and Zhang 2002), where M decomposes into submodels M1 and M2 (figure omitted):

  D(M) = D(M1) + D(M2) − (number of common parameters)

The problem thus reduces to the effective dimension of LC models, for which a good approximation exists.

Page 19

Theoretical Issues

Example (model figure omitted):
  Standard dimension: 110
  Effective dimension: 61

Page 20

Learning HLC Models

Given: i.i.d. samples generated by some regular HLC model.
Task: reconstruct the HLC model from the data.

Hill-climbing algorithm
  Scoring metric: we experiment with AIC, BIC, CS, and holdout LS (experiments with the effective dimension are yet to be run).
  Search space: the set of all possible regular HLC models for the given manifest variables.

We structure the space into two levels, according to two subtasks:
  Given a model structure, estimate the cardinalities of the latent variables.
  Find an optimal model structure.
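For reference, the BIC score mentioned above has the standard form (this formula is an addition, not printed on the slide): for a model $M$ with maximum-likelihood parameter estimate $\hat\theta$, dimension $d(M)$, and an i.i.d. data set $\mathcal{D}$ of size $N$,

$$\mathrm{BIC}(M \mid \mathcal{D}) \;=\; \log P(\mathcal{D} \mid M, \hat\theta) \;-\; \frac{d(M)}{2} \log N.$$

For models with latent variables, the effective dimension is arguably the right choice of $d(M)$, which is what motivates the earlier discussion of model complexity.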

Page 21

Learning HLC Models

Estimating the cardinalities of latent variables given a model structure

Search space: all regular models with the given model structure.

Hill-climbing (a sketch follows below):
  Start: all latent variables have the minimum cardinality (usually 2).
  Search operator: increase the cardinality of one latent variable by one.
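A minimal sketch of this cardinality search, assuming a hypothetical scoring function score(structure, cards) (for example, BIC computed after fitting parameters with EM). It illustrates the loop described above; it is not the paper's implementation.

```python
# Hill-climbing over cardinalities of latent variables for a fixed model
# structure. `score` is a hypothetical callable, e.g. BIC after an EM fit.
def search_cardinalities(structure, latent_vars, score, min_card=2):
    # Start: every latent variable at minimum cardinality (usually 2).
    cards = {z: min_card for z in latent_vars}
    best = score(structure, cards)
    while True:
        improved = False
        # Search operator: increase the cardinality of one latent
        # variable by one; keep any change that improves the score.
        for z in latent_vars:
            trial = dict(cards)
            trial[z] += 1
            s = score(structure, trial)
            if s > best:
                best, cards, improved = s, trial, True
        if not improved:            # local maximum reached
            return cards, best

# Toy demo: a fake score that peaks when every latent variable has 3 states.
toy = lambda _, c: -sum((v - 3) ** 2 for v in c.values())
print(search_cardinalities("any-structure", ["X1", "X2"], toy))
# -> ({'X1': 3, 'X2': 3}, 0)
```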

Page 22

Learning HLC Models

Finding an optimal model structure

Search space: the set of all regular unrooted HLC model structures for the given manifest variables.

Hill-climbing:
  Start: the unrooted LC model structure.
  Search operators: node introduction, node elimination, neighbor relocation.
  Any two model structures can be reached from each other using these operators.

Page 23

Learning HLC Models

Motivations for the search operators (example models M1', M2', M3' from an omitted figure; a code sketch follows):
  Node introduction: M1' → M2'. Deals with local dependence. Opposite: node elimination.
  Neighbor relocation: M2' → M3'. Result of a tradeoff. It is its own opposite.
  Operators are not allowed to yield irregular model structures.
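To make the node-introduction operator concrete, here is a minimal sketch on a plain adjacency-list representation of an unrooted structure. The representation, function name, and example are assumptions for illustration, not the paper's code.

```python
# Node introduction on an unrooted HLC structure: insert a new latent node
# z between a latent node w and two of its neighbors n1, n2, so that the
# local dependence between n1 and n2 can be absorbed by z.
def introduce_node(adj, latent, w, n1, n2, z):
    assert w in latent and n1 in adj[w] and n2 in adj[w]
    new_adj = {v: list(nbrs) for v, nbrs in adj.items()}
    # Detach n1 and n2 from w...
    new_adj[w] = [u for u in new_adj[w] if u not in (n1, n2)]
    new_adj[n1] = [z if u == w else u for u in new_adj[n1]]
    new_adj[n2] = [z if u == w else u for u in new_adj[n2]]
    # ...and reattach them through the new latent node z.
    new_adj[z] = [n1, n2, w]
    new_adj[w].append(z)
    return new_adj, latent | {z}

# Example: start from the unrooted LC structure X -- Y1, Y2, Y3, Y4 and
# introduce Z between X and {Y3, Y4}.
adj = {"X": ["Y1", "Y2", "Y3", "Y4"],
       "Y1": ["X"], "Y2": ["X"], "Y3": ["X"], "Y4": ["X"]}
adj2, latent2 = introduce_node(adj, {"X"}, "X", "Y3", "Y4", "Z")
# adj2: X -- Y1, Y2, Z and Z -- Y3, Y4 (node elimination is the reverse).
```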

Page 24

Empirical Results

Synthetic data:
  Generative model, randomly parameterized.
  All latent variables have 3 states.
  Sample sizes: 5k, 10k, 50k, 100k.

Log scores on test data:
  Close to those of the generative model.
  Do not vary much across scoring metrics.

Page 25

Empirical Results

Learned structures: number of steps from the learned structure to the true structure (results table omitted).

Page 26

Empirical Results

Cardinalities of latent variables (results omitted): better results with more skewed parameters.

Page 27

Empirical Results

Hannover rheumatoid arthritis data:
  5 binary manifest variables: back pain, neck pain, joint swelling, …
  7,162 records.
  Analysis by Kohlmann and Formann (1997): a 4-class LC model.
  Our algorithm: exactly the same model.

Coleman data:
  4 binary manifest variables, 3,398 records.
  Analysis by Goodman (1974) and Hagenaars (1988): models M1, M2.
  Our algorithm: M3.

Page 28

Empirical Results

HIV data:
  4 binary manifest variables, 428 records.
  Analysis by Uebersax (2000) and by our algorithm (model figures omitted).

House building data:
  4 binary manifest variables, 1,185 records.
  Analysis by Hagenaars (1988): M2, M3, M4.
  Our algorithm: a 4-class LC model that fits the data poorly. A failure.
  Reason: a limitation of HLC models.

Page 29

Related Work

Phylogenetic trees:
  Represent the relationships among a set of species.
  Probabilistic model: taxa are aligned, sites evolve i.i.d.
  Conditional probabilities: given by a character-evolution model.
  Parameters: edge lengths, representing time.

Restricted to one site, a phylogenetic tree is an HLC model where:
  The tree structure is binary and all variables have the same state space.
  The conditional probabilities are parameterized by edge lengths.
  The model is the same for different sites.

[Figure: example phylogenetic tree over aligned DNA sequences such as AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT, AAGACTT, AGCACTT, AAGGCCT]

Page 30

Related Work

Tree reconstruction:
  Given: the current taxa. Find: the tree topology and edge lengths.

Methods:
  Hill-climbing:
    Stepwise addition of taxa.
    Star decomposition, similar to node introduction in HLC models.
    Branch swapping, similar to neighbor relocation in HLC models.
  Structural EM (Friedman et al. 2002): exploits the fact that all variables have the same state space.
  Neighbor joining (Saitou & Nei, 1987): exploits the facts that the parameters are edge lengths and that distances are additive.

Page 31

Related Work

Connolly (1993):
  Heuristic method for constructing HLC models.
  Mutual information is used to group variables; one latent variable is introduced for each group.
  Cardinalities of the latent variables are determined using conceptual clustering.

Martin and VanLehn (1994):
  Heuristic method for learning two-level Bayesian networks where the top level is latent.

Elidan et al. (2001):
  Learning latent variables for general Bayesian networks. Aim: simplification. Idea: structural signatures.

Model-based hierarchical clustering (Hansen et al. 1991):
  Hierarchically structures the state space of ONE cluster variable.

Page 32

Related Work

Diagnostics for local dependence in LC models:
  Hagenaars (1988): standardized residuals
  Espeland & Handelmann (1988): likelihood ratio statistic
  Garrett & Zeger (2000): log odds ratios

Modeling local dependence in LC models:
  Joint variable (M2), multiple indicators (M3), loglinear model (M4)