Mining and Visualization of Flow Cytometry Data ANGELA CHIN UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES JULY 3, 2013 1

Mining and Visualization of Flow Cytometry DataANGELA CHIN

UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES

JULY 3, 2013

1

Contents1. Introduction to Flow Cytometry2. The Problem3. Current Approaches & Results4. Future Work

2

Flow CytometryMEDICAL TECHNIQUE USED FOR CELL COUNTING AND CELL SORTING

3

How it Works

Picture from: Abcam http://www.abcam.com/index.html?pageconfig=resource&rid=11446 4

Flow Cytometry ApplicationDetermine whether a person has b-cell lymphomaBased on the number of clusters that result from flow cytometry• Two clusters : cancer patient

• Three clusters : healthy individual

5

Example: Flow Cytometry Results

6

Cancer PatientHealthy Patient

Problems with Current MethodsThe process for determining if there are two or three clusters is manualDoctors’ time could be better spent on other tasks

7

The ProblemCREATING AN AUTOMATED METHOD TO DETERMINING THE NUMBER OF CLUSTERS

8

Past ApproachesMany ways to determine number of clusters• Most need to know the number of clusters ahead of time

Most popular is k-means, but there are some problems• Need to give the algorithm the number of clusters beforehand

• Has difficulty when clusters are close, different sizes, etc.

9

Further Defining the ProblemWe want to be able to determine the number of clusters when:The distance between clusters is very smallThe ratio of cluster sizes is large (100:1 to 1000:1)

We decided to further constrain the problem such that we could determine:1 cluster vs 2 clusters when the size ratio was up to 1000:1

10

Current Approaches & Results

11

Two Approaches Approach #1: TransformationFind the center of the dataTake each point and find its angle from the horizontal line located at the center (new x-value) and distance from the center (new y-value)Use transformed data to determine number of clusters

Approach #2: Testing Normal FitProject 2D data onto line to create 1D dataApply normal distribution fitCompare the Bayesian Information Criterion (BIC) of the fit to a cut-off limitIf the BIC is above the limit, there are two clusters; otherwise, there is one

12

Approach #1: Transformation

13


14

𝜋/2 3/2 2𝜋

[email protected]

Add an slide to show how the transformation takes place-- "transformation process"

[email protected]

Use unit circle example

[email protected]

Make the angles in pi instead of numbers

[email protected]

Use ratio 500:1 - 200:1

Approach #1: Transformation Process

15

𝜋/2 3/2 2𝜋

/2


16

𝜋/2 3/2 2𝜋

Approach #2: Testing Normal Fit

17

Approach #2: Testing Normal Fit3 standard deviations apart, ratio 1:99

ONE CLUSTER BEST FITS TWO CLUSTER BEST FITS

18

Approach #2: Testing Normal Fit Comparing BIC of the one

cluster versus two clusters

All data was generated using 100000 points and the same standard deviations

The ratios between clusters and distance between two clusters (if applicable) was varied• Ratios: 199:1 to 63:1• Distance: 1.5 to 5 Standard

Deviations apart

19

[email protected]

Use log scale instead?

Approach #2: Testing Normal Fit

20

Comparing BIC of the one cluster versus two clusters

All data was generated using 100000 points and the same standard deviations

The ratios between clusters and distance between two clusters (if applicable) was varied• Ratios: 199:1 to 63:1• Distance: 1.5 to 5 Standard

Deviations apart

Future Work

21

Future WorkApproach #1:

Determine if there is a way to detect the second cluster in the transformation

Approach #2: Use real data to see if a cut-off can be determined

Overall: After figuring out how to distinguish one and two clusters, extend the method to two versus three clusters

22

LimitationsAssume the data will have Gaussian distributionNumber of clusters limited to two or three

23

AcknowledgementsI would like to thank my research advisor, Dr. Stephen Huang, and Mitch Shih for their guidance on this project. I would also like to thank the University of Houston Computer Science Department and the National Science Foundation for providing me with the opportunity to participate in the REU.

24

Documents

Mining and Visualization of Flow Cytometry Data ANGELA CHIN UNIVERSITY OF HOUSTON RESEARCH EXPERIENCE FOR UNDERGRADUATES JULY 3, 2013 1