Weighted Chinese Restaurant Process
for clustering barcodes
Javier Cabrera, John Lau, Albert Lo
DIMACS, Bristol U, and HKUST
Cluster Analysis:
Group the observations into k distinct natural groups.
Non-Bayesian Cluster Analysis:
Hierarchical clustering: Build a hierarchical tree
- Similarity (inter-point distance): Euclidean, Manhattan, …
- Inter-cluster distance: Single Linkage, Complete, Average, Ward
Non-hierarchical clustering:
- K-means
- Divisive
- PAM
- Model Based
- Many Other Methods
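As a rough illustration of the hierarchical approach, the sketch below merges the two closest clusters repeatedly, using Manhattan inter-point distance and single linkage. The function names, the seven toy points, and the choice to stop at k = 3 groups are all assumptions made for this example; they are not taken from the slides.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def single_linkage(points, k, dist=manhattan):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Single linkage: the distance between two clusters is the
        # smallest distance between any pair of their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Seven hypothetical specimens described by two numeric features.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
groups = single_linkage(points, k=3)
```

Recording the order of the merges, rather than only the final groups, would give the hierarchical tree itself.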
[Figure: Hierarchical Clustering of Specimens 1 to 7]
Weighted Chinese Restaurant Process
1. The restaurant is full of tables.
2. Customers are seated at tables by a seating rule.
3. Customers are allowed to move from one table to another or to a new empty one.
Partition: Each seating arrangement for all the customers in the restaurant.
Partitions:
p: Partition of specimens into species.
p ∈ P: Space of all possible partitions, i.e. all arrangements of specimens into species.
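The slides do not spell out the seating rule. As a point of reference, the sketch below simulates the standard (unweighted) Chinese Restaurant Process, in which a customer joins an existing table with probability proportional to its occupancy and opens a new table with probability proportional to a concentration parameter. The name `crp_seating` and the parameter `alpha` are assumptions for this example; the weighted version additionally multiplies each table's weight by a data-dependent term.

```python
import random


def crp_seating(n_customers, alpha=1.0, seed=0):
    """Seat customers one at a time by the standard CRP rule."""
    rng = random.Random(seed)
    tables = []  # tables[t] is the list of customers at table t
    for customer in range(n_customers):
        # An existing table has weight equal to its occupancy;
        # a new empty table has weight alpha.
        weights = [len(t) for t in tables] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append([customer])
        else:
            tables[choice].append(customer)
    return tables


# One seating arrangement (a partition) of seven customers.
partition = crp_seating(7, alpha=1.0, seed=1)
```

Each call returns one partition p of the customers; different seeds give different points of the space P.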
Bayes basics:
Prior distribution: $\pi(p)$
Likelihood: $f(x \mid p) = \prod_{1 \le i \le n(p)} k(x_j,\ j \in C_i)$
Posterior: $\pi(p \mid \text{data}) \propto f(x \mid p)\, \pi(p)$
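For a handful of specimens the posterior can be computed exactly by enumerating every partition. The sketch below does this under two assumptions that are not stated in the slides: the prior is taken proportional to alpha^n(p) times the product of (|C_i| - 1)! (the prior induced by a standard CRP), and the kernel k is a per-site Dirichlet-multinomial marginal likelihood over the four bases. The sequences are made-up toy barcodes.

```python
import math
from collections import Counter

BASES = "ACGT"


def log_kernel(seqs, a=0.5):
    """Assumed log k(x_j, j in C): per-site Dirichlet-multinomial marginal."""
    n, total = len(seqs), 0.0
    for site in zip(*seqs):
        counts = Counter(site)
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + counts[b]) - math.lgamma(a) for b in BASES)
    return total


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # first item alone in a new block
        for i in range(len(part)):  # or added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def log_prior(part, alpha=1.0):
    """Assumed prior: log of alpha^n(p) * prod (|C_i| - 1)!, unnormalized."""
    return sum(math.log(alpha) + math.lgamma(len(c)) for c in part)


def posterior(seqs, alpha=1.0):
    """Exact posterior over all partitions: prior times likelihood, normalized."""
    scores = {}
    for part in set_partitions(list(range(len(seqs)))):
        key = tuple(sorted(tuple(sorted(c)) for c in part))
        scores[key] = log_prior(part, alpha) + sum(
            log_kernel([seqs[j] for j in c]) for c in part
        )
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {p: math.exp(s - top) for p, s in scores.items()}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}


seqs = ["ACGTAC", "ACGTAC", "ACGTAA", "TTGCAT"]  # toy barcodes
post = posterior(seqs)
mode = max(post, key=post.get)
```

The number of partitions grows as the Bell numbers (15 for four specimens, 877 for seven), so enumeration is only feasible for very small data sets.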
Weighted Chinese Restaurant Process
Approximate Posterior Distribution with WCRP
• Run the process for a while and obtain a frequency table of the partitions visited.
• Estimate the final partition with the posterior mode.
• Compare the posterior probabilities of the most probable partitions.
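The three steps above can be sketched as a Gibbs-style sampler that reseats one customer at a time. This is a minimal sketch under assumptions, not the authors' implementation: each existing table is weighted by its size times the predictive density of the specimen given the table, a new table is weighted by `alpha` times the specimen's marginal density, and the kernel is the same assumed Dirichlet-multinomial form over bases. The sequences are toy data.

```python
import math
import random
from collections import Counter

BASES = "ACGT"


def log_kernel(seqs, a=0.5):
    """Assumed kernel: per-site Dirichlet-multinomial marginal likelihood."""
    n, total = len(seqs), 0.0
    for site in zip(*seqs):
        counts = Counter(site)
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + counts[b]) - math.lgamma(a) for b in BASES)
    return total


def reseat(j, tables, seqs, alpha, rng):
    """Move customer j to an existing table or to a new empty one."""
    for t in tables:
        if j in t:
            t.remove(j)
    tables[:] = [t for t in tables if t]  # drop a table left empty
    log_w = []
    for t in tables:
        members = [seqs[i] for i in t]
        # weight = table size * predictive density of x_j given the table
        log_w.append(
            math.log(len(t)) + log_kernel(members + [seqs[j]]) - log_kernel(members)
        )
    log_w.append(math.log(alpha) + log_kernel([seqs[j]]))  # new table
    top = max(log_w)
    weights = [math.exp(w - top) for w in log_w]
    choice = rng.choices(range(len(weights)), weights=weights)[0]
    if choice == len(tables):
        tables.append([j])
    else:
        tables[choice].append(j)


def run_wcrp(seqs, sweeps=500, alpha=1.0, seed=0):
    """Run the process and count how often each partition is visited."""
    rng = random.Random(seed)
    tables = [[j] for j in range(len(seqs))]  # start with one specimen per table
    visits = Counter()
    for _ in range(sweeps):
        for j in range(len(seqs)):
            reseat(j, tables, seqs, alpha, rng)
        visits[tuple(sorted(tuple(sorted(t)) for t in tables))] += 1
    return visits


seqs = ["ACGTAC", "ACGTAC", "ACGTAA", "TTGCAT", "TTGCAT"]  # toy barcodes
visits = run_wcrp(seqs)
mode, count = visits.most_common(1)[0]  # estimated posterior mode
```

Dividing each count in `visits` by the number of sweeps gives approximate posterior probabilities, and `visits.most_common(5)` lists the most probable partitions for comparison.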
New Specimens:
- Placed at an existing table.
- Open a new table => New Species
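Placing a new specimen can be illustrated with the same weights as a single reseating step. The sketch below returns the probability of each existing table and of a new table for one incoming sequence; the function name `placement_probs`, the kernel, the value of `alpha`, and the toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"


def log_kernel(seqs, a=0.5):
    """Assumed kernel: per-site Dirichlet-multinomial marginal likelihood."""
    n, total = len(seqs), 0.0
    for site in zip(*seqs):
        counts = Counter(site)
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + counts[b]) - math.lgamma(a) for b in BASES)
    return total


def placement_probs(new_seq, tables, alpha=1.0):
    """Probabilities for each existing table, then for a new table (last entry)."""
    log_w = [
        math.log(len(t)) + log_kernel(t + [new_seq]) - log_kernel(t)
        for t in tables
    ]
    log_w.append(math.log(alpha) + log_kernel([new_seq]))  # new species
    top = max(log_w)
    weights = [math.exp(w - top) for w in log_w]
    z = sum(weights)
    return [w / z for w in weights]


# Two hypothetical species already on the tables, as lists of sequences.
tables = [["ACGTAC", "ACGTAC", "ACGTAA"], ["TTGCAT", "TTGCAT"]]
probs = placement_probs("ACGTAC", tables)
```

A large last entry of `probs` would suggest that the specimen belongs to a species not yet represented at any table.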
Future Work
WCRP Algorithm for Barcode data:
Data Visualization: Final partition => similarities => Euclidean Representation
- Multidimensional Scaling
- Multivariate Data Visualization (used in taxonomy)
- Projection Pursuit
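One possible way to go from partitions to similarities is the co-clustering frequency: the share of visited partitions in which two specimens sit at the same table. This definition is an assumption, since the slides do not say how the similarities are obtained, and the frequency table below is made up for illustration. The resulting dissimilarity matrix is the kind of input a multidimensional scaling routine expects.

```python
def co_clustering(visits, n):
    """Similarity s[i][j]: fraction of visits in which i and j share a table."""
    total = sum(visits.values())
    sim = [[0.0] * n for _ in range(n)]
    for partition, count in visits.items():
        for table in partition:
            for i in table:
                for j in table:
                    sim[i][j] += count / total
    return sim


def dissimilarity(sim):
    """Turn similarities in [0, 1] into dissimilarities d = 1 - s."""
    return [[1.0 - s for s in row] for row in sim]


# Made-up frequency table: partition -> number of times visited.
visits = {((0, 1), (2,)): 3, ((0, 1, 2),): 1}
sim = co_clustering(visits, n=3)
dist = dissimilarity(sim)
```

Here specimens 0 and 1 always share a table, so their dissimilarity is 0, while specimen 2 joins them in only one of the four visits.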
Entropy scanning
Lo (1984); Ishwaran and James (2003b); Cabrera, Lau, Lo (2006)
Contact: Javier Cabrera, John Lau, Albert Lo