Upload
posy-lawrence
View
215
Download
0
Tags:
Embed Size (px)
Citation preview
Mining and Visualization of Flow Cytometry DataANGELA CHIN
UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES
JULY 3, 2013
1
Contents1. Introduction to Flow Cytometry2. The Problem3. Current Approaches & Results4. Future Work
2
Flow CytometryMEDICAL TECHNIQUE USED FOR CELL COUNTING AND CELL SORTING
3
How it Works
Picture from: Abcam http://www.abcam.com/index.html?pageconfig=resource&rid=11446 4
Flow Cytometry ApplicationDetermine whether a person has b-cell lymphomaBased on the number of clusters that result from flow cytometry• Two clusters : cancer patient
• Three clusters : healthy individual
5
Example: Flow Cytometry Results
6
Cancer PatientHealthy Patient
Problems with Current MethodsThe process for determining if there are two or three clusters is manualDoctors’ time could be better spent on other tasks
7
The ProblemCREATING AN AUTOMATED METHOD TO DETERMINING THE NUMBER OF CLUSTERS
8
Past ApproachesMany ways to determine number of clusters• Most need to know the number of clusters ahead of time
Most popular is k-means, but there are some problems• Need to give the algorithm the number of clusters beforehand
• Has difficulty when clusters are close, different sizes, etc.
9
Further Defining the ProblemWe want to be able to determine the number of clusters when:The distance between clusters is very smallThe ratio of cluster sizes is large (100:1 to 1000:1)
We decided to further constrain the problem such that we could determine:1 cluster vs 2 clusters when the size ratio was up to 1000:1
10
Current Approaches & Results
11
Two Approaches Approach #1: TransformationFind the center of the dataTake each point and find its angle from the horizontal line located at the center (new x-value) and distance from the center (new y-value)Use transformed data to determine number of clusters
Approach #2: Testing Normal FitProject 2D data onto line to create 1D dataApply normal distribution fitCompare the Bayesian Information Criterion (BIC) of the fit to a cut-off limitIf the BIC is above the limit, there are two clusters; otherwise, there is one
12
Approach #1: Transformation
13
Approach #1: Transformation
14
𝜋/2 3/2 2𝜋
Approach #1: Transformation Process
15
𝜋/2 3/2 2𝜋
/2
Approach #1: Transformation
16
𝜋/2 3/2 2𝜋
Approach #2: Testing Normal Fit
17
Approach #2: Testing Normal Fit3 standard deviations apart, ratio 1:99
ONE CLUSTER BEST FITS TWO CLUSTER BEST FITS
18
Approach #2: Testing Normal Fit Comparing BIC of the one
cluster versus two clusters
All data was generated using 100000 points and the same standard deviations
The ratios between clusters and distance between two clusters (if applicable) was varied• Ratios: 199:1 to 63:1• Distance: 1.5 to 5 Standard
Deviations apart
19
Approach #2: Testing Normal Fit
20
Comparing BIC of the one cluster versus two clusters
All data was generated using 100000 points and the same standard deviations
The ratios between clusters and distance between two clusters (if applicable) was varied• Ratios: 199:1 to 63:1• Distance: 1.5 to 5 Standard
Deviations apart
Future Work
21
Future WorkApproach #1:
Determine if there is a way to detect the second cluster in the transformation
Approach #2: Use real data to see if a cut-off can be determined
Overall: After figuring out how to distinguish one and two clusters, extend the method to two versus three clusters
22
LimitationsAssume the data will have Gaussian distributionNumber of clusters limited to two or three
23
AcknowledgementsI would like to thank my research advisor, Dr. Stephen Huang, and Mitch Shih for their guidance on this project. I would also like to thank the University of Houston Computer Science Department and the National Science Foundation for providing me with the opportunity to participate in the REU.
24