
GENOME EVOLUTION AND GENE DUPLICATIONS IN EUKARYOTES Shin-Han Shiu Plant Biology / QBMI Michigan State University


Page 1

GENOME EVOLUTION AND GENE DUPLICATIONS IN EUKARYOTES

Shin-Han Shiu

Plant Biology / QBMI

Michigan State University

Page 2

Genomes and gene contents

[Figure: gene counts per genome; organism labels were not recovered. Values shown: 30,000; 25,000; 10,000; 6,000; 45,000; 17,000]

Page 3

Duplicate genes in the genome

Arabidopsis gene families*

*: Clusters from Markov clustering using all-against-all BLAST E values as distance measures

Page 4

Gene function and duplication

What’s the consequence?

Page 5

Gene function and duplication

What’s the consequence?

Page 6

Focus I: Duplication Mechanism and Loss Rate

[Diagram: Gene duplications, linked to Mechanisms, Preferential retention, and Consequences]

Page 7

Duplication mechanisms

Whole genome duplication

Tandem duplication

Segmental duplication

Replicative transposition

Page 8

Lineage-specific gains in plants and animals

Organism      Lineage-specific gains   Normalized gain*   # of genes in families analyzed   % total
Rice          10115                    6743               28467                             35.5 (23.7)**
Arabidopsis   5984                     3990               21936                             27.3 (18.2)**
Human         811                      811                21954                             3.7
Mouse         1265                     1265               24041                             5.3

*: The gain counts are normalized against the ratio between the Arabidopsis-rice and human-mouse divergence times (150 and 100 Mya, respectively).

**: Numbers in parentheses refer to percentage total based on normalized gains.

Substantially more recent duplicates in plants than in animals
Mostly due to frequent whole genome duplications in plants
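
The normalization described in the footnote can be sketched as a short calculation. This is a minimal illustration, assuming the plant gain counts are simply scaled by the ratio of the two divergence times (100/150); the function names are hypothetical and the input numbers are copied from the table.

```python
# Values copied from the slide's table: (lineage-specific gains, genes in families analyzed).
TABLE = {
    "Rice": (10115, 28467),
    "Arabidopsis": (5984, 21936),
    "Human": (811, 21954),
    "Mouse": (1265, 24041),
}
PLANT_DIVERGENCE_MYA = 150   # Arabidopsis-rice divergence, from the footnote
ANIMAL_DIVERGENCE_MYA = 100  # human-mouse divergence, from the footnote


def normalized_gain(gain, divergence_mya, reference_mya=ANIMAL_DIVERGENCE_MYA):
    """Scale a gain count to the reference divergence time (assumed linear scaling)."""
    return gain * reference_mya / divergence_mya


def percent_total(gain, genes_analyzed):
    """Gain as a percentage of the genes in the families analyzed."""
    return 100 * gain / genes_analyzed


rice_gain, rice_genes = TABLE["Rice"]
rice_norm = normalized_gain(rice_gain, PLANT_DIVERGENCE_MYA)
print(round(rice_norm), round(percent_total(rice_gain, rice_genes), 1),
      round(percent_total(rice_norm, rice_genes), 1))
```

Under this assumption the rice row reproduces the table (6743, 35.5, 23.7). The same scaling gives about 3989 for Arabidopsis, while the slide lists 3990, so the original calculation may have rounded differently.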

Page 9

Gain vs. Loss

3 rounds of whole-genome duplications in the Arabidopsis lineage
~82% duplicates from the last round were lost in the past 40 million years

[Figure: gene number doubling over the three rounds: 15,000*, 30,000, 60,000, 120,000]

Genome duplications + tandem duplications - gene losses = Arabidopsis gene content: 21,000**

*: Number of orthologous groups in shared families between Arabidopsis and rice.
**: Number of genes in shared families.
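
The doubling series on this slide can be checked with a few lines of arithmetic. This is only a sketch: it ignores tandem duplications, and it computes the net loss relative to the no-loss expectation over all three rounds, which is not the same quantity as the per-round loss the slide attributes to the last duplication. The function names are hypothetical.

```python
ANCESTRAL_GROUPS = 15_000  # orthologous groups shared with rice (slide footnote *)
WGD_ROUNDS = 3             # whole-genome duplications in the Arabidopsis lineage
OBSERVED_GENES = 21_000    # genes in shared families today (slide footnote **)


def expected_without_loss(ancestral, rounds):
    """Gene count if every duplicate from every round had been retained."""
    return ancestral * 2 ** rounds


def net_loss_fraction(ancestral, rounds, observed):
    """Fraction of the no-loss expectation that is missing today."""
    expected = expected_without_loss(ancestral, rounds)
    return (expected - observed) / expected


print(expected_without_loss(ANCESTRAL_GROUPS, WGD_ROUNDS))
print(net_loss_fraction(ANCESTRAL_GROUPS, WGD_ROUNDS, OBSERVED_GENES))
```

With the slide's numbers the expectation is 120,000 genes and the net shortfall is 82.5%.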

Page 10

“Age” distribution of animal duplicates

Steady decay in the number of duplicates
Frequent TD, SD, and RT (tandem duplication, segmental duplication, and replicative transposition)

Ks: rate of nucleotide substitutions in codon sites that do not affect amino acid identity

Shiu et al., 2006

Page 11

Plant duplicate “age” distribution

Apparent peak at Ks ~0.18 instead of at zero
Frequent WGD, TD, SD (maybe), and RT (in some plants)

Shiu et al., 2004

Page 12

Genome remodeling in polyploids

Natural and synthetic polyploids

[Figure: genome sizes ~348 Mb, ~203 Mb, ~314 Mb, ~257 Mb; time scale 20,000 yr; species labels were not recovered]

Page 13

Experimental approaches

Genome-wide polymorphism monitored by tiling array

[Figure: probes tiled along the genome, with gap and resolution marked; array with ~6 million features; time scale 20,000 yr]

Page 14

Genome-wide Single Feature Polymorphism

Mid-parent (MP) vs. Arabidopsis suecica (As)

Polyploid   SFP
Natural     58,517
Synthetic   503

Page 15

Genome-wide Single Feature Polymorphism

Genome-wide polymorphism monitored by tiling array

[Figure: feature types shown: Gene, Pseudogene, Transposon]

Page 16

Genome-wide Single Feature Polymorphism

Duplication or deletion

MP duplication or As deletion

Page 17

Genome Survey Sequencing

Sequence ~40-60 Mb of the Arabidopsis suecica genome
0.15-0.2X coverage, will be done next week!

Ultra-high throughput sequencer (GS20) funded by the Strategic Partnership Grant

Ultra-high throughput
20-30 Mb per run, each run 5 hours
Will be 100 Mb per run early 2007

Cost efficient
~$0.3/kb

Read length rather limited
~100 bp per read now
Will be ~200 bp early 2007

For more information contact:
Andreas Weber ([email protected])
David DeWitt ([email protected])
Or Shin-Han Shiu ([email protected])

Seminar on instrumentation: 9/29, Friday, 1pm, 1415 BPS
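
The throughput and cost figures on this slide can be combined into a rough planning estimate. The sketch below assumes that coverage is sequenced bases divided by genome size, that 1 Mb equals 1000 kb, and that the quoted ~$0.3/kb applies to raw sequence; the function names are hypothetical.

```python
import math


def runs_needed(target_mb, mb_per_run):
    """Whole sequencing runs required to reach the target yield."""
    return math.ceil(target_mb / mb_per_run)


def cost_usd(target_mb, usd_per_kb=0.3):
    """Sequencing cost at a flat per-kb rate (1 Mb = 1000 kb)."""
    return target_mb * 1000 * usd_per_kb


def implied_genome_mb(sequenced_mb, coverage):
    """Genome size implied by coverage = sequenced bases / genome size."""
    return sequenced_mb / coverage


# Best case: 40 Mb target at 30 Mb per run; worst case: 60 Mb at 20 Mb per run.
print(runs_needed(40, 30), runs_needed(60, 20))
print(cost_usd(40), cost_usd(60))
print(implied_genome_mb(60, 0.2))
```

Under these assumptions the survey takes two to three runs and costs roughly $12,000 to $18,000.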

Page 18

Summary: Gene duplication and polyploidy

Gene duplication occurred frequently in eukaryotes, but most duplicates are lost.

In plants, whole genome duplication is common, but gene loss also occurred frequently.

After 4 generations, a very small number of SFPs are identified in synthetic polyploids.

After 20,000 generations, most coding genes do not have clustered sequence polymorphisms that are indicative of deletion.

Clustered polymorphisms are mostly located in pseudogenes and transposons.

Survey sequencing is necessary to determine if some coding genes have become pseudogenes without being deleted.

Page 19

Focus II: Differential Retention of Duplicates

[Diagram: Gene duplications, linked to Mechanisms, Preferential retention, and Consequences]

Page 20

Duplicate genes in the genome

Arabidopsis gene families*

*: Clusters from Markov clustering using all-against-all BLAST E values as distance measures

Page 21

Large gene families in plants

One of the largest gene families

Page 22

Normalized gain: % expanded OGs (orthologous groups)

Large family sizes do not necessarily indicate higher expansion rates

Page 23

Ancestral family sizes and gene gains

Large ancestral families tend to have more lineage-specific gains, but with many exceptions

Page 24

Differential expansion of functional categories

GO: Gene Ontology

Protein ubiquitination
Polysaccharide biosynthesis
Cell wall modification
Transcriptional regulation
Biotic stress response
Secondary metabolism

Page 25

Differences in Duplicability

[Table: columns Category, Arabidopsis, Human; cell values were not recovered. Categories listed:]
Defense response
Proteolysis
Transport
Ion channel activity
Metabolism
Development
Protein kinase activity
Transcription factor activity

Duplicability
The propensity for the retention of a duplicate gene
Computational analysis of genome-wide trend

Page 26

Kinase superfamily sizes among eukaryotes

Organism                      Number of genes   Kinase superfamily   Percent total gene
Arabidopsis thaliana          25,814            1041                 4.0
Oryza sativa subsp. indica    ~35,000           1607                 3.6
Chlamydomonas reinhardtii     ~12,200           414                  3.4
Plasmodium falciparum         5,334             94                   1.8
Plasmodium yoelii             7,681             70                   0.9
Caenorhabditis elegans        19,484            417                  2.1
Drosophila melanogaster       13,808            262                  1.9
Anopheles gambiae             15,088            216                  1.4
Ciona intestinalis            15,852            316                  2.0
Fugu rubripes                 33,609            632                  1.9
Mus musculus                  22,444            495                  2.2
Homo sapiens                  22,980            472                  2.1
Saccharomyces cerevisiae      6,449             113                  1.8
Candida albicans              6,164             95                   1.5
Neurospora crassa             10,082            104                  1.9
Schizosaccharomyces pombe     4,945             109                  2.2

Shiu & Bleecker, 2003

Page 27

Kinase families in rice and Arabidopsis

Gene count differences among families indicate differential expansion

Shiu et al., 2004

Page 28

Estimation of ancestral RLK family size

[Figure, panels A and B: kinase phylogeny of Arabidopsis and rice RLKs; 440 speciation points; labels include rice, Arabidopsis, WAK, and LRR VIII, X, XII]

Shiu et al., 2004

Page 29

Development vs. resistance/defense RLKs

Shiu et al., 2004

Page 30

Contradiction

Plant genes involved in development tend to have high duplicability

Developmental RLKs: Low duplicability
Resistance/Defense RLKs: High duplicability
Animal tyrosine kinases: Low duplicability
Transcription factors: High duplicability

Page 31

Selection for expansion

Depends on the level of variation in the signals

[Figure: two alternative scenarios joined by "OR"; labels were not recovered]

Page 32

Summary: differential retention

Longevity and duplicability of plant genes

Duplicability   Longevity   Examples
High            High        Transcription factors
High            Low         Resistance genes
Low             High        Enzymes in central metabolic pathways
Low             Low         ??

Page 33

Focus III: Functional Consequences

[Diagram: Gene duplications, linked to Mechanisms, Preferential retention, and Consequences]

Page 34

Functional Consequences of Duplication

Functional divergence and conservation
Is it because of changes in cis-regulatory elements or coding sequences?

How are duplicates retained: subfunctionalization or neofunctionalization?

Page 35

Divergence in gene expression

Develop pipelines for cis-element prediction and

[Diagram labels: Expression data; Clusters of genes with similar expression profiles; Over-represented sequence motifs in 5’ regions; Machine learning; Motif functional prediction; Cis-regulatory logic; Experimental validations]

Page 36

Divergence in post-translational modification

Conservation of phosphorylation site across species
SACE: budding yeast
CAGL: Candida glabrata
CAAL: Candida albicans
CATR: Candida tropicalis
NECR: Neurospora crassa
DEHA: Debaryomyces hansenii

Page 37

Detailed Functional Studies of Duplicate Genes

Functional analyses of DDF1 and DDF2 transcription factors
Derived from recent whole genome duplication in Arabidopsis
Related to the well-known CBF factors involved in cold and drought stress

[Diagram, shown for both Arabidopsis thaliana and Arabidopsis lyrata: DDFs studied through Promoter GFP, Knockouts, Over-expression studies, Interacting proteins, and Binding targets]

Page 38

Focus IV: Protein space

[Diagram: Gene duplications, linked to Mechanisms, Preferential retention, and Consequences]

Page 39

Tiling array analysis of transcriptome

Human Chr 21, 22

Kapranov et al., 2002

Page 40

Posterior probability p(F|coding)

Page 41

Performance of the CI measure

Known Arabidopsis exons and introns, 90-300 bp

Arabidopsis small proteins that are not annotated
Correctly predicted 19 out of 20 (95%).

Yeast sORFs with translation evidence
Correctly predicted 98 out of 114 (86%)

In “intergenic” sequences of the Arabidopsis genome
3,274 sORFs identified

Page 42

Coupling with tiling array expression

Hybridization intensities for feature types

Page 43

Summary: Novel coding genes

Many unannotated regions in the genomes are expressed.

Using the CI measure, many proteins from yeast and Arabidopsis that were not annotated but have evidence of expression are identified correctly.

Using the CI measure, we estimated that ~3000 novel coding regions are present in the unannotated regions of the Arabidopsis thaliana genome.

Using tiling array data, we found that many of these novel coding regions are expressed.

Page 44

Acknowledgement

Lab members
Kousuke Hanada
Melissa Lehti-Shiu
Cheng Zou
Emily Eckenrode

University of Chicago
Justin Borevitz
Xu Zhang

University of Wisconsin
Sara Patterson
Rick Vierstra

University of Missouri
Scott Peck

Michigan State University
Many…
Rong Jin, Comp Sci & Eng
Yue-Hua Cui, Stat & Prob
Startup fund

Page 45

Recent completion …

Page 46

Genome remodeling in polyploids

Genome duplication occurs frequently in plants
What is the fate of duplicates?

How fast do gene losses occur?
Is there any preference in genes retained?

[Diagram: ancestral genes A, B, C, D, E are duplicated into copies A1-E1 and A2-E2, which are then followed through time points t1 and t2. Gene number across the four stages: Ng = 5, 10, 8, 5]

Page 47

Comparing degrees of expansion

[Workflow diagram labels: Arabidopsis: ~25,000 proteins; Rice prediction: ~66,000 genes; Combined set; Pairwise distance; Gene/domain families (shared, unique); Putative orthologous groups]

Example shown for GO:0001: ui = 1, ei = 4

All orthologous groups

Total unexpanded = Σ ui

Total expanded = Σ ei
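
The bookkeeping behind ui and ei can be sketched as below. This is an illustration under stated assumptions, not the analysis used for the talk: an orthologous group is counted as "expanded" when it holds more than one gene in the lineage of interest, each group carries a single GO category, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def expansion_by_category(groups):
    """Tally unexpanded (u) and expanded (e) orthologous groups per category.

    groups: iterable of (category, n_genes_in_lineage) pairs, one per
    orthologous group. A group is treated as expanded when n_genes > 1.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [u, e]
    for category, n_genes in groups:
        counts[category][1 if n_genes > 1 else 0] += 1
    return {
        cat: {"u": u, "e": e, "pct_expanded": 100 * e / (u + e)}
        for cat, (u, e) in counts.items()
    }


# Toy input: five orthologous groups annotated with the same category.
toy = [("GO:0001", 1), ("GO:0001", 2), ("GO:0001", 3),
       ("GO:0001", 5), ("GO:0001", 2)]
result = expansion_by_category(toy)
print(result["GO:0001"])
```

Summing the per-category u and e values gives the totals named on the slide, against which each category's share of expanded groups can be compared.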

Page 48

Major questions on gene duplication

When: timing of gene duplications, e.g. N = 10

Page 49

Domain gains in rice and Arabidopsis

Gain in one lineage does not necessarily predict gain in the other

Page 50

Identify novel small coding genes

Determine base composition probabilities

Coding sequences: CDS parameters
Non-coding sequences: NCDS parameters

\[ P_c(\mathrm{AAA}) = \frac{\#\text{ of AAA}}{\#\text{ of all NNN}} \]

\[ P_c(\mathrm{T} \mid \mathrm{AAA}) = \frac{P_c(\mathrm{AAAT})}{P_c(\mathrm{AAA})} \]

Feature tables: c1, c2, c3, c4, c5, c6 (CDS parameters) and n (NCDS parameters)

Calculate posterior probability

\[ P(CDS \mid S) = \frac{P(S \mid CDS)\,P(CDS)}{P(S \mid CDS)\,P(CDS) + P(S \mid NCDS)\,P(NCDS)} \]

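Page 51

Setting up the Bayes’

Priors

\[ P(CDS_1) = P(CDS_2) = \dots = P(CDS_6) = \frac{1}{2} \cdot \frac{1}{6} \]

\[ P(CDS) = P(NCDS) = \frac{1}{2} \]

S = ATG TTC TAC TTT GG…

\[ P(S \mid CDS)\,P(CDS) = \sum_{m=1}^{6} P(S \mid CDS_m)\,P(CDS_m) \]

\[ P(S \mid CDS_1) = P_{c1}(\mathrm{ATG})\,P_{c1}(\mathrm{T} \mid \mathrm{ATG})\,P_{c2}(\mathrm{T} \mid \mathrm{TGT})\,P_{c3}(\mathrm{C} \mid \mathrm{GTT})\,P_{c1}(\mathrm{T} \mid \mathrm{TTC}) \dots \]

\[ P(S \mid CDS_2) = P_{c2}(\mathrm{ATG})\,P_{c2}(\mathrm{T} \mid \mathrm{ATG})\,P_{c3}(\mathrm{T} \mid \mathrm{TGT})\,P_{c1}(\mathrm{C} \mid \mathrm{GTT})\,P_{c2}(\mathrm{T} \mid \mathrm{TTC}) \dots \]

\[ P(S \mid CDS_6) = P_{c6}(\mathrm{ATG})\,P_{c6}(\mathrm{T} \mid \mathrm{ATG})\,P_{c4}(\mathrm{T} \mid \mathrm{TGT})\,P_{c5}(\mathrm{C} \mid \mathrm{GTT})\,P_{c6}(\mathrm{T} \mid \mathrm{TTC}) \dots \]

\[ P(S \mid NCDS) = P_{n}(\mathrm{ATG})\,P_{n}(\mathrm{T} \mid \mathrm{ATG})\,P_{n}(\mathrm{T} \mid \mathrm{TGT})\,P_{n}(\mathrm{C} \mid \mathrm{GTT})\,P_{n}(\mathrm{T} \mid \mathrm{TTC}) \dots \]

\[ P(CDS \mid S) = \frac{P(S \mid CDS)\,P(CDS)}{P(S \mid CDS)\,P(CDS) + P(S \mid NCDS)\,P(NCDS)} \]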
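
The Bayes setup on this slide can be sketched in Python as follows. This is a minimal illustration, not the original implementation. It assumes that the six coding feature tables are three codon-position tables for the forward strand plus three built from reverse-complemented coding sequences, that add-one pseudocounts are used, and that the conditional probability is estimated directly from 4-mer counts sharing a 3-mer context rather than as the ratio of two frequencies. All function names are hypothetical, and sequences are assumed to contain only A, C, G, and T.

```python
from collections import Counter
from math import exp, log

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seqs, periodic):
    """Count 3-mers and 4-mers by start position.

    periodic=True keeps one table per codon position (three phases);
    periodic=False pools everything into a single table (non-coding model).
    """
    n_phase = 3 if periodic else 1
    tri = [Counter() for _ in range(n_phase)]
    quad = [Counter() for _ in range(n_phase)]
    for s in seqs:
        for i in range(len(s) - 2):
            phase = i % n_phase
            tri[phase][s[i:i + 3]] += 1
            if i + 4 <= len(s):
                quad[phase][s[i:i + 4]] += 1
    return tri, quad


def log_likelihood(seq, tri, quad, start_phase):
    """log P(seq | model): initial 3-mer, then next-base-given-3-mer terms."""
    n_phase = len(tri)
    phase = start_phase % n_phase
    total = sum(tri[phase].values())
    ll = log((tri[phase][seq[:3]] + 1) / (total + 64))  # add-one over 64 triplets
    for i in range(len(seq) - 3):
        phase = (start_phase + i) % n_phase
        context, nxt = seq[i:i + 3], seq[i + 3]
        denom = sum(quad[phase][context + b] for b in BASES) + 4
        ll += log((quad[phase][context + nxt] + 1) / denom)
    return ll


def posterior_cds(seq, cds_seqs, ncds_seqs):
    """P(CDS | seq) with priors P(CDS_m) = 1/2 * 1/6 and P(NCDS) = 1/2."""
    fwd = count_kmers(cds_seqs, periodic=True)
    rev = count_kmers([revcomp(s) for s in cds_seqs], periodic=True)
    non = count_kmers(ncds_seqs, periodic=False)
    coding = [log_likelihood(seq, *fwd, m) + log(1 / 12) for m in range(3)]
    coding += [log_likelihood(seq, *rev, m) + log(1 / 12) for m in range(3)]
    noncoding = log_likelihood(seq, *non, 0) + log(1 / 2)
    top = max(coding + [noncoding])  # shift to avoid underflow in exp
    cds_sum = sum(exp(t - top) for t in coding)
    return cds_sum / (cds_sum + exp(noncoding - top))


# Toy training data, invented for illustration only.
cds_train = ["ATGGCTGCTGCTGCTGCTGCTGCTTAA"]
ncds_train = ["ATATATATATATATATATATATATAT"]
print(posterior_cds("GCTGCTGCTGCT", cds_train, ncds_train))
print(posterior_cds("ATATATATATAT", cds_train, ncds_train))
```

Working in log space and subtracting the largest term before exponentiating keeps the six-frame sum stable for longer windows. With real data the tables would be trained on annotated coding sequences and on introns, as the slides indicate.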

Page 52

Coding Likelihood (CL)

Sliding windows of a sequence: 1 2 3 4 … n

Simulation based on NCDS (introns)

[Equation not fully recoverable: CL is computed from the window posteriors P(CDS | S_n) and the number of windows n]

Page 53

Divergence in post-translational modification

Conservation of phosphorylation site across species