Comparative Assembly for Cancer Human Genome Gao Song 2010/02/03

Gao Song 2010/02/03. Background Knowledge Problem Description Framework of Solution Own Methods Results

Comparative Assembly for Cancer Human Genome

Gao Song

2010/02/03

Content

Background Knowledge
Problem Description
Framework of Solution
Own Methods
Results

Background Knowledge

Paired-End Tag (PET)

Background Knowledge

Concordant PET (CPET)

Discordant PET (DPET)
◦ Distance or orientation is incorrect
◦ Map to different chromosomes

DPET Cluster
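The CPET/DPET distinction can be sketched in code. This is a minimal illustration, not the author's implementation: the `PET` record, the forward/reverse orientation convention, and the `min_span`/`max_span` bounds are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class PET:
    """Hypothetical record for one mapped paired-end tag."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def is_concordant(pet, min_span, max_span):
    """Return True for a CPET, False for a DPET.

    Assumed convention: a concordant pair maps to one chromosome,
    with the left tag on "+" and the right tag on "-", and a span
    inside [min_span, max_span] (library dependent).
    """
    # Tags on different chromosomes are always discordant.
    if pet.chrom1 != pet.chrom2:
        return False
    # Order the two tags by position so orientation can be checked.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pet.pos1, pet.strand1), (pet.pos2, pet.strand2)]
    )
    # Incorrect orientation makes the pair discordant.
    if (left_strand, right_strand) != ("+", "-"):
        return False
    # Incorrect distance makes the pair discordant.
    return min_span <= right_pos - left_pos <= max_span
```

A pair failing any of the three checks would be counted as a DPET; grouping nearby DPETs into clusters is a separate step not shown here.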

Problem Description

Given:
◦ Frequency of DPET and CPET along the reference genome
◦ DPET clusters

Requirement:
◦ Find rearrangements of the cancer genome compared to the normal human genome
◦ Current focus: amplicons

Framework of Solution

Cut the reference genome wherever CPET coverage is 0 => some big contigs
Find the breakpoints according to DPET
Use CPET to check whether there is a connection between breakpoints
Convert DPET clusters into edges in the graph
Use high-copy edges to form subgraphs of amplicons
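The first step, cutting the reference where CPET coverage drops to 0, can be sketched as below. This is an illustrative sketch only; it assumes coverage is given as a list of counts per position (or per bin), and the function name `cut_into_contigs` is made up for the example.

```python
def cut_into_contigs(cpet_coverage):
    """Split the reference into contigs at zero-coverage positions.

    cpet_coverage: list of CPET counts per position or bin (assumed input).
    Returns half-open (start, end) index intervals with coverage > 0.
    """
    contigs = []
    start = None  # start index of the contig currently being extended
    for i, count in enumerate(cpet_coverage):
        if count > 0 and start is None:
            start = i  # a new contig begins
        elif count == 0 and start is not None:
            contigs.append((start, i))  # coverage gap ends the contig
            start = None
    if start is not None:
        contigs.append((start, len(cpet_coverage)))
    return contigs


print(cut_into_contigs([3, 2, 0, 0, 5, 1, 0, 4]))
# prints [(0, 2), (4, 6), (7, 8)]
```

The breakpoints found from DPET would then split these big contigs further into smaller ones.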

Framework of Solution

[Flow diagram. Labels: DPET start and end, Breakpoint, CPET, Filtered Breakpoints, Reference Genome, Original Contigs, Small Contigs, DPET, Edges, CPET, Nodes, Graph]
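The last stage, forming amplicon subgraphs from high-copy edges, might look like the sketch below. It assumes each edge is a `(contig_a, contig_b, copy_count)` tuple derived from a DPET cluster and that "high copy" means `copy_count >= min_copy`; both the tuple layout and the cutoff are assumptions, and connected components (via union-find) stand in for whatever grouping the author actually used.

```python
from collections import defaultdict


def amplicon_subgraphs(edges, min_copy):
    """Group high-copy edges into connected subgraphs.

    edges: iterable of (contig_a, contig_b, copy_count) tuples (assumed format).
    min_copy: hypothetical cutoff for calling an edge "high copy".
    Returns a list of subgraphs, each a list of (contig_a, contig_b) edges.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Keep only the high-copy edges.
    kept = [(a, b) for a, b, copies in edges if copies >= min_copy]
    # Merge the endpoints of every kept edge into one component.
    for a, b in kept:
        parent[find(a)] = find(b)
    # Collect edges by the component they belong to.
    groups = defaultdict(list)
    for a, b in kept:
        groups[find(a)].append((a, b))
    return list(groups.values())
```

For example, with edges `[("c1", "c2", 9), ("c2", "c3", 8), ("c4", "c5", 7), ("c5", "c6", 1)]` and `min_copy=5`, the low-copy edge is dropped and two subgraphs remain: one with two edges and one with a single edge.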

Own Methods - Naive (Chromosome 9)

DPET frequency curve
Use DPET directly: choose a threshold to select the breakpoints

Problem:
◦ How to choose the threshold
◦ Within an amplicon region it is hard to find the breakpoint, because the baseline frequency is too high
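A minimal sketch of the naive approach, with made-up numbers, shows why a single threshold struggles inside an amplicon. The function name and the example frequencies are hypothetical.

```python
def breakpoints_by_threshold(dpet_freq, threshold):
    """Return the indices whose DPET frequency reaches the threshold."""
    return [i for i, freq in enumerate(dpet_freq) if freq >= threshold]


# Made-up curve: indices 0-3 are a normal region, 4-7 an amplified region
# whose baseline frequency is already about 20.
freq = [1, 1, 6, 1, 20, 21, 27, 20]
print(breakpoints_by_threshold(freq, 5))
# prints [2, 4, 5, 6, 7]
```

In this toy input the peak at index 2 is picked out cleanly, but every position of the amplified region passes the threshold, so the peak at index 6 is not distinguishable from its neighbours.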

Own Methods - Slope (Chromosome 9)

Use the slope (differentiation) of the DPET frequency curve

Problem:
◦ How to define the threshold
◦ Too many false positives
◦ Also misses some DPET clusters
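The slope idea can be sketched with a first difference of the frequency curve. This is an assumption about how "differentiation" was computed; the function names, the `min_slope` cutoff, and the numbers are illustrative only.

```python
def slope(freq):
    """First difference of the curve: freq[i + 1] - freq[i]."""
    return [freq[i + 1] - freq[i] for i in range(len(freq) - 1)]


def breakpoints_by_slope(freq, min_slope):
    """Return the indices reached by a rise of at least min_slope."""
    return [i + 1 for i, rise in enumerate(slope(freq)) if rise >= min_slope]


# Made-up curve: indices 4-7 have a high baseline frequency.
freq = [1, 1, 6, 1, 20, 21, 27, 20]
print(breakpoints_by_slope(freq, 5))
# prints [2, 4, 6]
```

Because only the change matters, the rise at index 6 is detected despite the high baseline, but `min_slope` is still a threshold that has to be chosen.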

Own Method – Consider Ratio

At a breakpoint, DPET increases and CPET decreases
This can be used as another criterion

Problem:
◦ Another parameter!
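The slide does not define the ratio, so the sketch below assumes one plausible form, DPET / (DPET + CPET), which rises when DPET increases and CPET decreases. The function names and the `min_ratio` parameter are hypothetical.

```python
def dpet_ratio(dpet, cpet):
    """Assumed ratio: fraction of the PETs at a position that are discordant."""
    total = dpet + cpet
    return dpet / total if total else 0.0


def breakpoints_by_ratio(dpet_freq, cpet_freq, min_ratio):
    """Return the indices where the assumed DPET ratio reaches min_ratio."""
    return [
        i
        for i, (dpet, cpet) in enumerate(zip(dpet_freq, cpet_freq))
        if dpet_ratio(dpet, cpet) >= min_ratio
    ]
```

With `dpet_freq=[1, 6, 1]`, `cpet_freq=[30, 10, 30]` and `min_ratio=0.3`, only index 1 qualifies (6 / 16 = 0.375). As the slide notes, `min_ratio` is one more parameter to tune.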

New method of finding breakpoints

Use the slope to find the threshold
The previously missed points can be found

Own Method – Hypothesis Testing

Localized checking
Use two consecutive windows (window1, window2)
◦ Each window has: μ, σ
◦ Null hypothesis: σ2 is not significantly larger than σ1
◦ Using binomial testing
◦ Significance level: 0.05
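The slide does not give the test statistic, so the sketch below substitutes one common way to compare two windows with a binomial test; it is not necessarily the author's formulation. The assumption is that, under the null hypothesis, each DPET observed in the two windows is equally likely to fall in either one, so the count in window2 follows Binomial(n, 0.5) with n the combined count. All names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window2_is_larger(count1, count2, alpha=0.05):
    """One-sided test: is window2's count significantly larger than window1's?

    Assumed null model: each of the n = count1 + count2 observations
    falls in window2 with probability 0.5.
    """
    n = count1 + count2
    if n == 0:
        return False  # nothing observed, nothing to reject
    return binomial_upper_tail(count2, n) < alpha
```

For counts (0, 5) the p-value is 1/32 = 0.03125, which is below the 0.05 significance level, so the null hypothesis would be rejected; for counts (5, 5) the p-value is about 0.62 and it would not.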

Own Method – Hypothesis Testing

Some details:
◦ Check whether the cluster region is included in the window

Not finished yet
Calculating σ is time-consuming: it has to be recalculated after each step
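One possible way to reduce the cost of recomputing σ at every step, offered as a suggestion rather than something from the slides, is to keep a running sum and sum of squares and update them as the window slides. The sketch assumes a fixed window size and the population standard deviation.

```python
from math import sqrt


def rolling_std(values, window):
    """Population standard deviation of every length-`window` slice.

    Each step updates a running sum and sum of squares in O(1)
    instead of rescanning the whole window.
    """
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    total_sq = sum(v * v for v in values[:window])
    result = []
    for end in range(window, len(values) + 1):
        mean = total / window
        # Clamp at 0 to guard against tiny negative rounding errors.
        variance = max(total_sq / window - mean * mean, 0.0)
        result.append(sqrt(variance))
        if end < len(values):
            # Slide the window: add the entering value, drop the leaving one.
            total += values[end] - values[end - window]
            total_sq += values[end] ** 2 - values[end - window] ** 2
    return result
```

The sum-of-squares formula can lose precision when values are large relative to their spread; for counts of the size seen in frequency curves this is usually acceptable, but that is an assumption worth checking on real data.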

Results (slope)

                                       10k     20k
# of subgraphs                         72      35
Max chromosomes in one subgraph        4       4
Average chromosomes in one subgraph    1.18    1.23
Max edges in one subgraph              42      44
Average edges in one subgraph          5.47    5.77

One Special Case

10k Lib

20k Lib

Another example

[Figure panels: 10k lib, 20k lib]