Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data

Parisa Shooshtari

School of Computing Science, Simon Fraser University, Burnaby

Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver




Outline:

• Flow Cytometry (FCM) Data
• Clustering of FCM data
• Spectral Clustering
• Faithful Sampling for Spectral Clustering
• Result
• Summary


Basics of Flow Cytometry Technique

[Figure: flow cytometry schematic. Labels: Sample; Intensity vs. Wave Length plots; markers MHC-II and CD-11c; measured intensities Int-1 and Int-2]


Cell Population Identification in Flow Cytometry (FCM)

[Figure: gating plots with axes Parameter 1 to Parameter 4; a gated population is marked X%. Adapted from the Science Creative Quarterly (2)]


Importance of FCM Data Clustering

• Manual gating is
– Subjective
– Error-prone
– Time-consuming
– Blind to the multivariate nature of the data

• Analyzing large FCM data sets (up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques


Which Clustering Algorithm Is Suitable?

• Model-based algorithms such as flowClust, flowMerge and FLAME are not suitable for clusters with non-elliptical shapes.

[Figure: flowMerge result compared with a good clustering; axis label: GFP]


Our Motivation for Using Spectral Clustering

• Spectral clustering does not require any a priori assumption about cluster size, shape or distribution

• It is not sensitive to outliers, noise or the shape of clusters


Spectral Clustering in One Slide

Represent the data set by a similarity graph

Construct the graph:
• Vertices: data points p1, p2, …, pn
• Weights of edges: similarity values S_{i,j} between p_i and p_j

Clustering: find a cut through the graph
• Define a cut objective function
• Solve it
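The graph-then-cut recipe can be sketched in a few lines. This is only an illustration under stated assumptions: the slide gives neither the similarity formula nor the cut objective, so the Gaussian kernel, its width `sigma`, the normalized-cut relaxation, and the function name `spectral_bisect` are all choices made here, not details taken from the talk.

```python
import numpy as np

def spectral_bisect(points, sigma=1.0):
    """Split points into two groups with a relaxed normalized cut (assumed objective)."""
    # Similarity graph: assumed Gaussian kernel on squared Euclidean distance.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops

    # Symmetric normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(points)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest one gives the relaxed solution of the two-way cut.
    _, vecs = np.linalg.eigh(L)
    fiedler = d_inv_sqrt * vecs[:, 1]

    # The sign of each entry assigns the point to one side of the cut.
    return fiedler > 0

# Usage: two well-separated blobs should land on opposite sides of the cut.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
                   rng.normal(4.0, 0.3, size=(20, 2))])
labels = spectral_bisect(blobs, sigma=1.0)
```

Note that the sketch builds a dense n-by-n matrix and runs a full eigendecomposition, so it is only practical for small inputs.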


The Bottleneck of Spectral Clustering

• Serious practical barriers arise when applying this algorithm to large data sets

• Time complexity: O(n^3) → 2 years for 300,000 data points (cells)

• Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells)
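The growth rates above show why shrinking the input pays off so quickly. The sketch below only compares growth rates and makes no claim about absolute run time or memory; the sample size `m = 3_000` is a hypothetical value chosen for illustration, since the slides do not state one.

```python
def reduction_factors(n, m):
    """Return (memory_factor, time_factor) when n points shrink to m representatives."""
    memory_factor = (n / m) ** 2  # the similarity matrix has n^2 entries
    time_factor = (n / m) ** 3    # cubic-time step, as stated on the slide
    return memory_factor, time_factor

n = 300_000  # number of cells, as on the slide
m = 3_000    # hypothetical number of representatives (assumption)
mem, work = reduction_factors(n, m)
print(f"similarity entries: {n**2:.1e} -> {m**2:.1e} ({mem:.0e}x fewer)")
print(f"cubic-time work shrinks by a factor of {work:.0e}")
```

A 100-fold reduction in points therefore cuts the matrix size by four orders of magnitude and the cubic-time work by six.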


Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data

• Uniform Sampling: Low-density populations close to dense ones may not remain distinguishable

• Faithful Sampling: Tends to choose more samples from non-dense parts of the data


How Does Our Faithful Sampling Preserve Information?

1. Space Uniform Sampling: It preserves low-density parts of the data by selecting more samples from them than uniform sampling does.

2. Keeping the list of points in the neighbourhood of samples: This will be used to define similarities between communities.
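The two steps above can be sketched as follows. This is a minimal illustration and not the SamSPECTRAL implementation: the neighbourhood radius `h`, the Gaussian width `sigma`, the function names, and the rule that the similarity of two communities is the sum of pairwise similarities of their members are all assumptions introduced here.

```python
import numpy as np

def faithful_sample(points, h, seed=0):
    """Pick representatives so that every point lies within distance h of one.

    Returns (representative indices, list of member-index arrays).
    """
    rng = np.random.default_rng(seed)
    registered = np.zeros(len(points), dtype=bool)
    reps, communities = [], []
    while not registered.all():
        # Choose a random point that no community has claimed yet.
        rep = rng.choice(np.flatnonzero(~registered))
        # Claim every still-unclaimed point within distance h of it
        # (the representative itself is at distance 0, so it is included).
        dist = np.linalg.norm(points - points[rep], axis=1)
        members = np.flatnonzero((dist <= h) & ~registered)
        registered[members] = True
        reps.append(int(rep))
        communities.append(members)
    return reps, communities

def community_similarity(points, comm_a, comm_b, sigma=1.0):
    """Assumed rule: sum of Gaussian similarities over all member pairs."""
    diff = points[comm_a][:, None, :] - points[comm_b][None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)).sum())

# Usage: a dense blob next to a sparse one.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, size=(2000, 2)),   # dense population
                  rng.normal(3.0, 1.0, size=(100, 2))])   # sparse population
reps, communities = faithful_sample(data, h=0.3)
```

Because each representative claims a whole ball of radius `h`, a dense region is covered by a few large communities while a sparse region of similar extent still needs many representatives, which is the sense in which the sampling is uniform in space rather than in point count.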


Clustering Result

• Low-density populations surrounded by dense ones


Clustering Result

• Populations with non-elliptical shapes

• Subpopulations of a major population

[Figure panels: SamSPECTRAL, flowMerge, FLAME]


Summary

• Spectral clustering can now be applied to large data sets by means of our proposed Faithful (Information Preserving) sampling.

• This sampling method can be combined with other graph-based clustering algorithms that use different objective functions, to reduce the size of the data.

• We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying
– Cell populations with non-elliptical shapes
– Low-density populations surrounded by dense ones
– Sub-populations of a major population


Acknowledgement

• Committee:
– Dr. Arvind Gupta
– Dr. Ryan Brinkman
– Dr. Tobias Kollman

• Co-authors on SamSPECTRAL
– Habil Zare

• Data Providers
– Connie Eaves
– Peter Landsdrop
– Keith Humphries


Thanks for Your Attention!