Weighted Chinese Restaurant Process for clustering barcodes Javier Cabrera John Lau Albert Lo DIMACS, Bristol U, and HKUST


Page 1

Weighted Chinese Restaurant Process

for clustering barcodes

Javier Cabrera John Lau Albert Lo

DIMACS, Bristol U, and HKUST

Page 2

Cluster Analysis:

Group the observations into k distinct natural groups.

Non-Bayesian Cluster Analysis:

Hierarchical clustering: Build a hierarchical tree

- Similarity (inter-point distance): Euclidean, Manhattan…
- Inter-cluster distance: Single Linkage, Complete, Average, Ward
- Build a hierarchical tree

Non-hierarchical clustering:

- K-means
- Divisive
- PAM
- Model Based
- Many Other Methods
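As an illustration of the hierarchical recipe above, here is a minimal sketch of agglomerative clustering with single linkage. The choice of Hamming distance as the inter-point distance and the toy sequences are assumptions made for this example; the slides do not say which distance was used for barcodes.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def single_linkage(seqs, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > k:
        # Single linkage: the distance between two clusters is the
        # smallest distance between any pair of their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                hamming(seqs[i], seqs[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy, made-up barcodes (not data from the talk).
toy = ["ACGTAC", "ACGTAA", "TTGCAC", "TTGCAA", "GGGGGG"]
print(single_linkage(toy, 3))
```

Recording the order of the merges, rather than stopping at k clusters, would give the hierarchical tree itself.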

Page 3

[Figure: Hierarchical Clustering of Specimens 1 to 7]

Page 4

[Figure: specimens labeled 1 to 7]

Weighted Chinese Restaurant Process

1. The restaurant is full of tables.

2. Customers are seated at tables by a seating rule.

3. Customers are allowed to move from one table to another or to a new empty one.

Partition: Each seating arrangement for all the customers in the restaurant.
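To make the restaurant metaphor concrete, the sketch below simulates the plain (unweighted) Chinese restaurant seating rule: a customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a parameter `alpha`. The slides do not spell out the seating rule, so this is the standard textbook version, and `alpha` and the function name are assumptions. In a weighted variant, each of these weights would additionally be multiplied by a term measuring how well the customer's data fits the table.

```python
import random

def crp_seating(n_customers, alpha=1.0, seed=0):
    """Seat customers one at a time; return tables as lists of customer indices."""
    rng = random.Random(seed)
    tables = []
    for customer in range(n_customers):
        # Existing table i has weight equal to its size;
        # the last entry is the weight of opening a new table.
        weights = [len(t) for t in tables] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append([customer])
        else:
            tables[choice].append(customer)
    return tables

# Each run yields one seating arrangement, i.e. one partition of the customers.
partition = crp_seating(7)
print(partition)
```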

Page 5

Partitions:

$p$: Partition of specimens into species.

$p \in P$: $P$ is the space of all possible partitions, i.e. all arrangements of specimens into species.

Bayes basics:

Prior distribution: $\pi(p)$

Likelihood: $f(x \mid p) = \prod_{1 \le i \le n(p)} k(x_j,\, j \in C_i)$

Posterior: $\pi(p \mid \text{data}) \propto f(x \mid p)\, \pi(p)$
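For a handful of specimens the posterior can be computed exactly by listing every partition. The sketch below does this under assumptions that are not in the slides: $n(p)$ is read as the number of clusters and $C_i$ as the $i$-th cluster, the prior is the Chinese restaurant prior $\pi(p) \propto \alpha^{n(p)} \prod_i (|C_i| - 1)!$, and the kernel $k$ is a Dirichlet-multinomial marginal over the bases A, C, G, T with sites treated as independent. The sequences are made up.

```python
from math import exp, lgamma, log

def log_marginal(cluster, beta=0.5):
    """Log k(x_j, j in C_i) under the assumed Dirichlet-multinomial kernel."""
    n, total = len(cluster), 0.0
    for site in zip(*cluster):  # one column of bases per site
        total += lgamma(4 * beta) - lgamma(4 * beta + n)
        total += sum(lgamma(beta + site.count(b)) - lgamma(beta) for b in "ACGT")
    return total

def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # first item alone in a new block
        for i in range(len(part)):  # or added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def posterior(seqs, alpha=1.0):
    """Return (partition, probability) pairs, most probable first."""
    scored = []
    for part in set_partitions(list(range(len(seqs)))):
        # lgamma(m) = log((m - 1)!), so this is the assumed CRP prior.
        log_prior = sum(log(alpha) + lgamma(len(c)) for c in part)
        log_like = sum(log_marginal([seqs[j] for j in c]) for c in part)
        scored.append((part, log_prior + log_like))
    top = max(s for _, s in scored)
    norm = sum(exp(s - top) for _, s in scored)
    return sorted(
        ((p, exp(s - top) / norm) for p, s in scored), key=lambda ps: -ps[1]
    )

toy = ["ACGT", "ACGA", "TTCA", "TTCA"]  # made-up barcodes
for part, prob in posterior(toy)[:3]:
    print(part, round(prob, 3))
```

Exhaustive enumeration is only feasible for very small data sets, since the number of partitions grows extremely fast with the number of specimens; this is what motivates approximating the posterior by sampling.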

Page 6

[Figure: specimens labeled 1 to 7]

Weighted Chinese Restaurant Process

Approximate Posterior distribution with WCRP

• Run the process for a while and obtain frequency table of partitions visited.

• Estimate final partition with posterior mode.

• Compare posterior probabilities of most probable partitions.

New Specimens:

- Placed in one existing table.
- Open a new table => New Species
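The steps on this slide can be sketched as follows. The slides do not give the reseating rule, so this example uses a common Gibbs-style choice as an assumption: a customer joins a table with weight (table size) × (predictive density of its data given the table), or opens a new table with weight `alpha` × (marginal density). The Dirichlet-multinomial kernel, `alpha`, `beta`, all function names, and the toy sequences are hypothetical.

```python
import random
from collections import Counter
from math import exp, lgamma, log

def log_marginal(cluster, beta=0.5):
    """Assumed kernel: Dirichlet-multinomial marginal over A, C, G, T per site."""
    n, total = len(cluster), 0.0
    for site in zip(*cluster):
        total += lgamma(4 * beta) - lgamma(4 * beta + n)
        total += sum(lgamma(beta + site.count(b)) - lgamma(beta) for b in "ACGT")
    return total

def seating_probs(x, tables, alpha=1.0):
    """Probabilities of seating x at each existing table, then at a new table."""
    log_w = [
        log(len(t)) + log_marginal(t + [x]) - log_marginal(t) for t in tables
    ]
    log_w.append(log(alpha) + log_marginal([x]))
    top = max(log_w)
    w = [exp(v - top) for v in log_w]
    return [v / sum(w) for v in w]

def run_wcrp(seqs, sweeps=1000, alpha=1.0, seed=0):
    """Reseat every customer in turn; count how often each partition is visited."""
    rng = random.Random(seed)
    tables = [[j] for j in range(len(seqs))]  # start with one customer per table
    visits = Counter()
    for _ in range(sweeps):
        for j in range(len(seqs)):
            # Customer j stands up; tables left empty are cleared away.
            tables = [[c for c in t if c != j] for t in tables]
            tables = [t for t in tables if t]
            seated = [[seqs[c] for c in t] for t in tables]
            probs = seating_probs(seqs[j], seated, alpha)
            choice = rng.choices(range(len(probs)), weights=probs)[0]
            if choice == len(tables):
                tables.append([j])
            else:
                tables[choice].append(j)
        visits[frozenset(frozenset(t) for t in tables)] += 1
    return visits

toy = ["ACGT", "ACGA", "TTCA", "TTCA"]  # made-up barcodes
visits = run_wcrp(toy)

# Frequency table of visited partitions; the most frequent one estimates the mode.
mode, count = visits.most_common(1)[0]
print(sorted(sorted(t) for t in mode), count / sum(visits.values()))

# A new specimen either joins a table of the modal partition or opens a new one.
mode_tables = [[toy[j] for j in sorted(t)] for t in mode]
print(seating_probs("TTCA", mode_tables))
```

The relative visit frequencies returned by `run_wcrp` play the role of approximate posterior probabilities, so the top few entries of `visits.most_common()` can be compared directly. The last entry of the list returned by `seating_probs` is the probability that the new specimen opens a new table, i.e. is assigned to a new species.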

Page 7

Future Work

WCRP Algorithm for Barcode data:

Data Visualization: Final partition => similarities => Euclidean Representation

- Multidimensional Scaling
- Multivariate Data Visualization (used in taxonomy)
- Projection Pursuit
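The slides do not say how similarities would be derived from the partition. One possible reading, offered only as an assumption, is a co-clustering similarity: the share of (weighted) partitions in which two specimens sit at the same table. The sketch below computes such a matrix and the matching dissimilarities, which could then be handed to a multidimensional scaling routine. The function name and the visit counts are made up for illustration.

```python
def coclustering_similarity(weighted_partitions, n):
    """s[i][j] = share of total weight of partitions placing i and j together."""
    total = sum(w for _, w in weighted_partitions)
    s = [[0.0] * n for _ in range(n)]
    for blocks, w in weighted_partitions:
        for block in blocks:
            for i in block:
                for j in block:
                    s[i][j] += w / total
    return s

# Hypothetical visit counts for three partitions of specimens 0..3.
visited = [
    ([[0, 1], [2, 3]], 6),
    ([[0], [1], [2, 3]], 3),
    ([[0, 1, 2, 3]], 1),
]
sim = coclustering_similarity(visited, 4)

# Dissimilarity in [0, 1]: 0 when two specimens are always clustered together.
dissim = [[1.0 - v for v in row] for row in sim]
for row in dissim:
    print([round(v, 2) for v in row])
```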

Entropy scanning

Lo (1984), Ishwaran and James (2003b), Cabrera, Lau, Lo (2006)

Javier Cabrera, John Lau, Albert Lo