ICSA, 6/2007

Pei Wang, pwang@fhcrc.org

Spatial Smoothing and Hot Spot Detection for CGH data using the Fused Lasso

Pei Wang

Cancer Prevention Research Program, PHS, FHCRC

Joint work with Robert Tibshirani,

Stanford University, CA

Outline

1. DNA copy number alterations and array CGH experiments.

2. Detect copy number alterations using fused lasso regression.

3. Simulation and real data examples.

4. Jointly model copy number alterations and disease outcomes using fused lasso regression.

DNA Copy Number

• In normal human cells: DNA copy number = 2

• Genome instability => copy number alterations.

Albertson and Pinkel, Hum. Mol. Gen., 2003

DNA Copy Number

In cancer research, knowledge of copy number aberrations helps to:

• Identify important cancer genes.

• Reveal different tumor subtypes with different mechanisms of initiation and/or progression.

• Predict tumor prognosis and improve clinical diagnosis.

Array CGH

• Array Comparative Genomic Hybridization.

The scanner reports the [...] for each spot on the chip, which corresponds to:

Array CGH

Array CGH has been implemented using a wide variety of techniques.

• BAC array : produced from bacterial artificial chromosomes;

• cDNA microarray: made from cDNAs;

• oligo array: made from oligonucleotides (Affy, Agilent, Illumina).

Output from an array CGH experiment:

$\log_2 \dfrac{\text{copy number in the test sample}}{\text{copy number in the reference sample}}$
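As a small worked illustration of the log2 ratio above, the sketch below assumes a normal reference sample with two copies; the function name `log2_ratio` is hypothetical and not part of any array CGH software.

```python
import math


def log2_ratio(test_copies, reference_copies=2):
    """Log2 ratio of test to reference copy number (reference assumed diploid)."""
    return math.log2(test_copies / reference_copies)


# No alteration gives 0, a single-copy gain is positive, a single-copy loss is negative.
for copies in (1, 2, 3, 4):
    print(copies, round(log2_ratio(copies), 3))
```

Under this assumption a one-copy gain (3 copies) gives about 0.585 and a one-copy loss gives exactly -1, so gains and losses are not symmetric on the log2 scale.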

Goal

• Identify genome regions with DNA copy number alterations.

Figure: an example segment of CGH data from a GBM primary tumor (Bredel et al. 2005).

Figure: raw CGH data, and the estimated copy number from fused lasso regression, which shows the copy number alteration regions.

Method

• Denote the log2 ratio measurements of a chromosome (or chromosome arm) as $y_i$, $i = 1, \dots, n$.

• Assume: $y_i = \log_2(\text{true copy number}_i / 2) + e_i = \beta_i + e_i$.

We are interested in recovering $\beta_i$.

• Properties of $\beta_i$:

(1) $\beta_i = 0$ for genome regions without alterations; $\beta_i > 0$ or $\beta_i < 0$ for regions of gain or loss.

(2) The profile $\{\beta_i\}$ has strong spatial correlation along the index $i$.

Method

• We are interested in finding coefficients satisfying

(1) Lasso constraint --- detect alteration regions;

(2) Fused constraint --- account for the spatial correlation.

Lasso & fused lasso

• Lasso regression (Tibshirani 1996)

• Fused lasso regression (Tibshirani et al. 2004)
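The slide's equations did not survive in this transcript. For reference, the two criteria can be written in their standard bound form as follows. The notation is an assumption made here for consistency with the rest of the talk: $y_i$ is the response, $x_{ij}$ the predictors, $\beta_j$ the coefficients, and $s_1$, $s_2$ the two tuning bounds.

```latex
% Lasso: bound on the L1 norm of the coefficients
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s_1 .

% Fused lasso: an additional bound on successive differences
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s_1 ,
\quad \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| \le s_2 .

% CGH special case: one coefficient per probe (identity design matrix)
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} ( y_i - \beta_i )^2
\quad \text{subject to} \quad \sum_{i=1}^{n} |\beta_i| \le s_1 ,
\quad \sum_{i=2}^{n} |\beta_i - \beta_{i-1}| \le s_2 .
```

The first bound encourages most coefficients to be exactly zero (no alteration), and the second encourages neighboring coefficients to be equal, which yields piecewise-constant profiles.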

Method

• Apply fused lasso on aCGH data:

(1) Solve the optimization.

(2) Choose the tuning parameters.

(3) Control the False Discovery Rate (FDR).

1. Solve the optimization

For the general fused lasso regression:

- Use SQOPT by Gill et al. to solve the quadratic programming problem with sparse linear constraints (Tibshirani et al., 2004).

1. Solve the optimization

For the special application to CGH arrays:

- Pathwise coordinate optimization (Friedman et al., technical report)

• A modification of the original coordinate-wise descent algorithm (shooting procedure) (Fu 1998, Daubechies et al. 2004).

• The running time is only 1/100 of that of quadratic programming.
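To make the coordinate-wise idea concrete, here is a minimal sketch of the original shooting procedure for the plain lasso, written in the penalty (Lagrangian) form with a hypothetical penalty weight `lam` in place of the bound s1. It is not the modified pathwise algorithm used for the fused lasso, which also has to handle the penalty on successive differences; the function names are illustrative only.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_shooting(X, y, lam, n_sweeps=100):
    """Coordinate descent for 0.5 * ||y - X beta||^2 + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new_value = soft_threshold(rho, lam) / col_sq[j]
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_value
    return beta


# With an identity design (one coefficient per probe, as in the CGH setting),
# the result is simply the soft-thresholded data.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_shooting(identity, [3.0, -0.5, 1.5], lam=1.0))
```

Each coordinate update has a closed form, which is what makes this family of algorithms cheap per iteration compared with a general quadratic programming solver.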

2. Choose the tuning parameter

Estimate s1 and s2 from pre-smoothed versions of the data:

• s1 controls the overall amount of copy number alteration on the target chromosome; it is estimated from heavily smoothed Y.

• s2 controls the frequency of copy number alterations on the target chromosome; it is estimated from moderately smoothed Y.
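One way to read this recipe is sketched below. The slide does not say which smoother or how much smoothing was used, so the centered moving average and the two window widths are assumptions chosen only for illustration, as are the function names.

```python
def moving_average(y, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        smoothed.append(sum(y[lo:hi]) / (hi - lo))
    return smoothed


def estimate_bounds(y, heavy_window=21, moderate_window=5):
    """Rough values for s1 and s2 from two smoothed versions of y.

    s1: total absolute level of a heavily smoothed profile.
    s2: total absolute jump between neighbors of a moderately smoothed profile.
    """
    heavy = moving_average(y, heavy_window)
    moderate = moving_average(y, moderate_window)
    s1 = sum(abs(v) for v in heavy)
    s2 = sum(abs(b - a) for a, b in zip(moderate, moderate[1:]))
    return s1, s2


# A flat profile at zero needs no budget for either constraint.
print(estimate_bounds([0.0] * 30))
```

Heavier smoothing suppresses noise when measuring the overall level, while lighter smoothing keeps enough local detail to count changes between neighbors.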

Other Methods

Lai et al. (2005) provide a thorough review of statistical methods for aCGH analysis.

- Simple smoothing with Lowess

- Hidden Markov Model (Fridlyand et al. 2004)

- Top-down: Circular Binary Segmentation (Olshen et al. 2004, Venkatraman et al. 2007)

- Bottom-up: Cluster along chromosomes (Wang et al. 2005)

- Dynamic programming: CGHseg (Picard et al. 2005)

- Denoising using wavelets (Hsu et al. 2005)

- And many others.

• General smoothing methods are not typically useful for analyzing CGH data, because their results can be difficult to interpret.

• Fused lasso regression can also be viewed as a smoothing approach, but it captures the structure of CGH data very well.

Comparison of the fused lasso with three segmentation methods:

• CGHseg (Picard et al. 2005)

• CLAC (Wang et al. 2005)

• CBS (Olshen et al. 2004)

Simulation Example

Further comparison of the fused lasso results with the three segmentation methods on simulated data sets from Lai et al. 2005.

• Total length of chromosome segment: 100

• Four different aberration widths: 5, 10, 20, 40.

• Signal-to-noise ratio is equal to 1. Normal region: x ~ N(0, 0.25); alteration region: x ~ N(0.25, 0.25).

• For each width, simulate 100 independent chromosomes.

Evaluation process:

1. Estimate copy number using the different methods.

2. Apply different thresholds to the estimated copy numbers, and calculate

TPR = # of correct calls / # of total aberration probes

FPR = # of false calls / # of total normal probes
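The simulation setup and the two rates above can be sketched as follows. Several details are assumptions made for illustration: the second parameter of N(·, 0.25) is read as a standard deviation (which is what makes the signal-to-noise ratio equal to 1), the aberration is placed in the middle of the segment, and a probe is called altered when its estimate exceeds the threshold. The function names are hypothetical.

```python
import random


def simulate_chromosome(width, length=100, shift=0.25, sd=0.25, rng=random):
    """One simulated segment: noise around 0, with a gain of `shift` in the middle."""
    start = (length - width) // 2  # assumption: aberration centered in the segment
    truth = [start <= i < start + width for i in range(length)]
    values = [rng.gauss(shift if altered else 0.0, sd) for altered in truth]
    return values, truth


def tpr_fpr(estimate, truth, threshold):
    """True and false positive rates when calling probes with estimate > threshold."""
    calls = [value > threshold for value in estimate]
    correct = sum(1 for c, t in zip(calls, truth) if c and t)
    false = sum(1 for c, t in zip(calls, truth) if c and not t)
    n_altered = sum(truth)
    n_normal = len(truth) - n_altered
    return correct / n_altered, false / n_normal


# Thresholding the raw data directly, without any smoothing, as a baseline.
values, truth = simulate_chromosome(width=20, rng=random.Random(0))
for threshold in (0.0, 0.125, 0.25):
    print(threshold, tpr_fpr(values, truth, threshold))
```

Sweeping the threshold over a grid and plotting TPR against FPR gives one curve per method; in the comparison each method's estimated copy numbers would replace the raw `values` used here.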

Figure: TPR-FPR curves for the four methods under the different aberration widths.

Real Data Example

Breast Cancer Cell line MDA157 (Pollack 2002)

Computation Time

Comparison of the speed of the four methods. CPU time in seconds, mean (sd):

Method                 P=100           P=500           P=1000          P=2000
CBS (DNAcopy 1.10.0)   0.151 (0.113)   1.243 (0.804)   3.669 (1.135)   8.455 (2.854)
CGHseg                 0.063 (0.008)   0.445 (0.016)   1.223 (0.041)   4.205 (0.104)
CLAC                   0.049 (0.003)   0.086 (0.013)   0.157 (0.037)   0.368 (0.073)
cghFLasso              0.025 (0.013)   0.140 (0.017)   0.334 (0.036)   0.840 (0.056)

Data simulation:

1. Pre-specify chromosome length p = 100, 500, 1000, 2000.

2. Randomly sample 50 genome segments of length p from 17 breast cancer CGH arrays.

3. Apply each method to the 50 segments, and record the CPU time.
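The timing protocol can be sketched in a few lines. This is only an illustration of the three steps: the array profiles are stand-ins, `method` is any callable taking one segment, and the function names are hypothetical. The original timings were presumably taken on the R implementations of each method.

```python
import random
import statistics
import time


def sample_segments(arrays, p, n_segments=50, rng=random):
    """Draw contiguous segments of length p from randomly chosen array profiles."""
    segments = []
    for _ in range(n_segments):
        profile = rng.choice(arrays)
        start = rng.randrange(len(profile) - p + 1)
        segments.append(profile[start:start + p])
    return segments


def time_method(method, segments):
    """Mean and standard deviation of CPU seconds spent by `method` per segment."""
    times = []
    for segment in segments:
        t0 = time.process_time()
        method(segment)
        times.append(time.process_time() - t0)
    return statistics.mean(times), statistics.stdev(times)


# Stand-in data: 17 random profiles, timed with a trivial placeholder method.
rng = random.Random(1)
arrays = [[rng.gauss(0.0, 0.25) for _ in range(3000)] for _ in range(17)]
for p in (100, 500, 1000, 2000):
    print(p, time_method(sorted, sample_segments(arrays, p, rng=rng)))
```

`time.process_time` counts CPU time rather than wall-clock time, which matches the quantity reported in the table.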

Applying the fused lasso to CGH data:

• provides an appropriate model for aCGH data.

• has favorable performance compared to other methods.

• is computationally efficient.

• provides a flexible framework for aCGH analysis in more complicated settings.

Joint Model

Study copy number alterations and disease outcomes.

• Model:

Interested in finding disease-associated genes.

Naïve method (two steps):

1. Call gains and losses for each individual array;

2. Use the estimated copy numbers to look for disease-associated genes.

Drawbacks:

1. Loss of information after the first round of data processing.

2. “Smoothing adds to already existing among neighboring values, thus causing the within-class covariance to be even more jagged… increase the computational cost with zero benefit in classification performance” (Hastie et al. 1995 Ann. of Stat.)

Joint Model

Joint modeling:

Compare different approaches on a simulated data set.

• Simulate a genome segment with p=50 genes for n=30 samples:

- true copy numbers

- noisy CGH measurements

• Generate a pseudo phenotype for each sample using two pre-selected non-adjacent genes.

• Look for disease-associated genes with the different methods. Vary the tuning parameter t to produce an ROC curve for each method.

• Repeat 200 times and plot the mean ROC curve.

Summary

• Fused lasso regression can be used to characterize the spatial structure of array CGH data.

- Tibshirani & Wang, Biostatistics (In press)

- google-> tibshirani -> click on cghFlasso under software

• The flexible framework of the regression model can be easily extended to solve other problems involving CGH data.

Acknowledgment

Stanford University, Department of Statistics

Robert Tibshirani, Jerry Friedman, Trevor Hastie.

Stanford University, Department of Pathology

Jonathan Pollack.