Computational and Evolutionary Molecular Biology

Yun S. Song

UC Berkeley, CS Grad Visit Day
March 17, 2008


Introduction Short-read resequencing Coalescent Theory DE

Biosystems & Computational Biology

CS faculty members:

Ruzena Bajcsy, Brian A. Barsky, Jerome A. Feldman, Michael I. Jordan (coordinator), Richard Karp, Jitendra Malik, Elchanan Mossel, Christos Papadimitriou, Lior Pachter, Satish Rao, Stuart J. Russell, Yun S. Song, Bernd Sturmfels, Katherine Yelick

In what follows, I will give a brief overview of my group’s research interests.


Computational Biology
The use of computational and mathematical techniques to address questions arising from biology.
Examples: Forensic DNA analysis, Sequence alignment, Genome-wide disease gene mapping, Phylogenetics

Mathematical Population Genetics
The study of the evolutionary forces (such as population history, natural selection, and recombination) that produce and maintain genetic variation within species.
Tightly linked to many branches of mathematics.

Tools used
Algorithms, Graph theory, Combinatorics, Stochastic processes, Dynamical systems, Machine learning, Signal processing, Reconfigurable computing


High-throughput short-read resequencing

Genomic sequencing technology entered a revolutionary phase in 2007.
High-throughput short-read sequencing technology can deliver fast and cost-effective generation of immense amounts of sequence data.
For instance, we can now sequence a Drosophila genome in a few days and a human genome in a few weeks using the Solexa/Illumina platform.

The main and immediate challenge
The assembly and analyses of short-read resequencing data.

We want to develop efficient algorithms and computational infrastructure to meet that challenge, addressing the sheer volume and nature of short-read data.


Short-read sequencing

[Figure: sequencing workflow. Randomly fragment genomic DNA, sequence short reads (tens of millions to hundreds of millions of them), then assemble.]

In the Solexa/Illumina platform, each short read is between 30 and 45 base pairs long.
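To make the workflow above concrete, here is a minimal toy sketch of the fragment-and-sequence step. It assumes reads are drawn uniformly at random from a single strand with no sequencing errors, which is a simplification of any real platform. The function names, the toy genome and the read length of 36 (chosen from within the 30 to 45 range quoted above) are illustrative assumptions, not part of the original slides.

```python
import random


def simulate_short_reads(genome, num_reads, read_length=36, seed=0):
    """Sample fixed-length reads uniformly at random from a genome string.

    Returns a list of (start_position, read) pairs. Sequencing errors and
    strand orientation are ignored in this toy model (an assumption).
    """
    if not 0 < read_length <= len(genome):
        raise ValueError("read_length must be between 1 and len(genome)")
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append((start, genome[start:start + read_length]))
    return reads


def coverage(genome_length, reads):
    """Count how many reads cover each genome position."""
    depth = [0] * genome_length
    for start, read in reads:
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


# Hypothetical toy data: a random 2000-base genome and 500 reads.
genome_rng = random.Random(1)
toy_genome = "".join(genome_rng.choice("ACGT") for _ in range(2000))
toy_reads = simulate_short_reads(toy_genome, num_reads=500, read_length=36)
depth = coverage(len(toy_genome), toy_reads)

# Average depth equals num_reads * read_length / genome length.
mean_depth = sum(depth) / len(depth)
```

The average depth depends only on the number of reads, the read length and the genome length. This is why very short reads must be produced in such large numbers to cover a genome several times over.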


Our approach

1 Simultaneously map hundreds of short reads onto a reference genome using an FPGA.

[Figure: block diagram of one FPGA processing element (PE). Inputs: score_diag_in, score_up_in, query_char, ref_char, en, rst. Outputs: score_diag_out, score_up_out. The PE calculates the diagonal score, calculates the minimum of a = score_diag, b = score_left and c = score_up, and outputs the minimum score. Reset note: lastlast pulses once before this PE starts computing on valid data.]
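The signal names in the processing-element diagram (a diagonal, a left and an up score, a character comparison, constants equal to 1, and a three-way minimum) suggest a unit-cost edit-distance recurrence. The following is a software sketch under that assumption, not the hardware design itself. The cost scheme, the function names `pe_cell` and `map_read`, and the free-start convention for placing a read anywhere in the reference are all assumptions introduced for illustration. In hardware, many such cells would presumably operate in parallel, whereas this Python loop is sequential.

```python
def pe_cell(score_diag, score_left, score_up, query_char, ref_char):
    """One dynamic-programming cell, mirroring the labels in the diagram.

    Assumed costs: +1 on the diagonal only when the characters differ,
    +1 for a step from the left or from above (unit-cost edit distance).
    """
    a = score_diag + (1 if query_char != ref_char else 0)
    b = score_left + 1
    c = score_up + 1
    return min(a, b, c)


def map_read(read, reference):
    """Find the lowest-cost placement of `read` within `reference`.

    Returns (best_score, end_position), where end_position is the index in
    `reference` one past the last aligned character. The read may start
    anywhere in the reference at no cost.
    """
    # Column for the empty reference prefix: aligning i read characters
    # against nothing costs i.
    prev = list(range(len(read) + 1))
    best_score, best_end = prev[-1], 0
    for j, ref_char in enumerate(reference, start=1):
        curr = [0]  # free start: zero cost to begin the read at any column
        for i, query_char in enumerate(read, start=1):
            # prev[i - 1] is the diagonal neighbour, prev[i] the left one,
            # and curr[i - 1] the one above.
            curr.append(pe_cell(prev[i - 1], prev[i], curr[i - 1],
                                query_char, ref_char))
        if curr[-1] < best_score:
            best_score, best_end = curr[-1], j
        prev = curr
    return best_score, best_end
```

For example, `map_read("ACGT", "TTACGTTT")` finds the exact occurrence ending at reference position 6 with score 0. Replacing the `G` in the reference with `C` raises the best score to 1.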

2 Use graph-theory-based algorithms to assemble the short reads into a complete genome, while detecting genetic variation and correcting for sequencing errors.
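The slides do not say which graph-based algorithms are meant. As one common illustration of the idea, the sketch below builds a de Bruijn graph from error-free reads and walks an unambiguous path to recover a contig. The de Bruijn construction is swapped in here as an assumption, and it does not attempt variation detection or error correction. All names and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Follow edges from `start` while each node has exactly one successor.

    Stops at a branch, a dead end, or when a node repeats (a cycle).
    """
    sequence = start
    node = start
    visited = {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:
            break
        visited.add(node)
        sequence += node[-1]
    return sequence


# Hypothetical overlapping reads drawn from the toy sequence ATGGCGTGCAATC.
toy_reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA", "TGCAATC"]
toy_graph = build_de_bruijn(toy_reads, k=4)
contig = walk_unambiguous_path(toy_graph, "ATG")
```

In this toy case every node has a single successor, so the walk reconstructs the full sequence. Repeats and sequencing errors create branches in the graph, which is where the harder algorithmic work lies.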


1000 human genomes

The National Human Genome Research Institute (NHGRI) recently announced an international collaboration to resequence 1000 human genomes from around the world. (http://www.1000genomes.org/)

1000 Drosophila genomes
My group is closely involved in a project that proposes to resequence 1000 Drosophila genomes.

A well-studied model organism.
About 20 times shorter than the human genome.
Drosophila genome is less repetitive.
Can perform direct functional analysis.

Further Computational Challenge
How are we going to analyze 1000 genomes?


Coalescent Theory (CS294-26/STAT260)
Random rooted graph with directed, weighted, marked edges.
Related to random partitions, size-biased permutations, and other combinatorial structures.

Some questions that can be addressed using the coalescent
How many ancestors at time t back in time?
Who is related to whom?
Time to the common ancestor.
The age of a mutation.
Population history.
Targets and dynamics of natural selection.
Speciation.


The coalescent with recombination
A retrospective stochastic genealogical process with collisions and branchings of lineages.

[Figure: animated, step-by-step construction of a genealogy with recombination, going back in time through states H0 to H13 with mutations m1 to m4. The nodes carry 4-site binary haplotypes such as 0111, 1110, 1011, 1101, 1100 and 0011, along with partially specified sequences such as ***1, 110*, *111 and 0***.]
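The "collisions and branchings" description can be sketched as a birth-death process on the number of lineages. The rates used below, coalescence at k(k-1)/2 and recombination at k·rho/2 for k lineages, follow a commonly used parametrization in which rho is a population-scaled recombination rate. This is an assumption, not something stated on the slides. The sketch tracks only lineage counts, not haplotypes, mutations or which sites each lineage is ancestral to. All names and parameter values are hypothetical.

```python
import random


def simulate_lineage_counts(n, rho, rng, max_events=100000):
    """Track the number of lineages back in time, with recombination.

    Assumed rates with k lineages (coalescent time units):
      coalescence ("collision"):   k * (k - 1) / 2, giving k -> k - 1
      recombination ("branching"): k * rho / 2,     giving k -> k + 1
    Returns a list of (time, k, event) tuples, ending when k reaches 1.
    """
    k, t = n, 0.0
    history = [(t, k, "start")]
    events = 0
    while k > 1:
        if events >= max_events:
            raise RuntimeError("max_events reached before one lineage remained")
        coal_rate = k * (k - 1) / 2
        rec_rate = k * rho / 2
        total = coal_rate + rec_rate
        t += rng.expovariate(total)  # time to the next event of either kind
        if rng.random() < coal_rate / total:
            k -= 1
            history.append((t, k, "coalescence"))
        else:
            k += 1
            history.append((t, k, "recombination"))
        events += 1
    return history


rng = random.Random(7)
history = simulate_lineage_counts(n=5, rho=1.0, rng=rng)
num_recombinations = sum(1 for _, _, e in history if e == "recombination")
num_coalescences = sum(1 for _, _, e in history if e == "coalescence")
```

Each recombination adds a lineage that must later be removed by a coalescence, so the number of coalescences always exceeds the number of recombinations by n - 1. With `rho=0` the process reduces to exactly n - 1 coalescences.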

Computational biology is an active area of research at UC Berkeley.

Designated Emphasis in Computational and Genomic Biology
About 30 faculty members spanning 7 Departments:

Bioengineering, Biostatistics, Chemistry, EECS, Integrative Biology, Mathematics, Statistics

(http://computationalbiology.berkeley.edu/)

Faculty members in CS affiliated with the DE
Michael I. Jordan, Richard M. Karp, Lior Pachter, Yun S. Song, Bernd Sturmfels
