
Gene models and proteomes for Saccharomyces cerevisiae (Sc), Schizosaccharomyces pombe (Sp), Arabidopsis thaliana (At), Oryza sativa (Os), Drosophila melanogaster


Gene models and proteomes for Saccharomyces cerevisiae (Sc), Schizosaccharomyces pombe (Sp), Arabidopsis thaliana (At), Oryza sativa (Os), Drosophila melanogaster (Dm), Anopheles gambiae (Ag), Caenorhabditis elegans (Ce), Mus musculus (Mm), Rattus norvegicus (Rn), and Homo sapiens (Hs) were downloaded from the NCBI website (ftp.ncbi.nlm.nih.gov).

Our definition of genes with unknown function

Each proteome was searched with HMMPFAM against several major signature databases: Pfam, TIGRFAM, SMART, and SUPERFAMILY.

• “Known” (PDF): matches one or more of the models in any one of the databases.
• “Unknown” (POF): no match to any of the models in any database.
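The known/unknown rule reduces to a set test: a protein is “known” if any database reports at least one model match. The sketch below assumes the HMMPFAM results have already been parsed into a hypothetical per-database mapping of protein IDs to matched model IDs, and that any significance cutoff (not stated on the slide) was applied during parsing. `classify_proteins` and the input layout are illustrative, not the authors' code.

```python
from typing import Dict, Iterable, Set

# Hypothetical input layout: database name -> protein ID -> matched model IDs.
# Parsing HMMPFAM output and applying a significance threshold are assumed to
# have happened upstream; the slides do not state the threshold.
HitsByDb = Dict[str, Dict[str, Set[str]]]


def classify_proteins(protein_ids: Iterable[str], hits_by_db: HitsByDb) -> Dict[str, str]:
    """Label a protein "known" if it matched at least one model in at least
    one database, and "unknown" if it matched nothing in every database."""
    labels = {}
    for pid in protein_ids:
        # A missing or empty set of models counts as "no match" in that database.
        matched = any(db_hits.get(pid) for db_hits in hits_by_db.values())
        labels[pid] = "known" if matched else "unknown"
    return labels


if __name__ == "__main__":
    # Toy, made-up hits; "m1" and "m2" are placeholder model IDs.
    hits = {
        "Pfam": {"P1": {"m1"}},
        "TIGRFAM": {},
        "SMART": {"P2": {"m2"}},
        "SUPERFAMILY": {"P3": set()},
    }
    print(classify_proteins(["P1", "P2", "P3", "P4"], hits))
    # {'P1': 'known', 'P2': 'known', 'P3': 'unknown', 'P4': 'unknown'}
```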

What makes species different?

A study of Unique Genes


[Figure; y-axis: % of genome]

“Unknowns” account for about 25% of each genome.
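Given per-protein labels, the share of “unknowns” in a genome is a simple ratio. A minimal sketch with a hypothetical `percent_unknown` helper and made-up toy labels (not data from the study):

```python
from collections import Counter
from typing import Dict


def percent_unknown(labels: Dict[str, str]) -> float:
    """Percentage of proteins labeled "unknown" in one genome (0.0 if empty)."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return 100.0 * counts["unknown"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical per-genome labels, e.g. from a known/unknown classifier.
    genomes = {
        "genome_A": {"g1": "known", "g2": "unknown", "g3": "known", "g4": "known"},
        "genome_B": {"h1": "unknown", "h2": "known"},
    }
    for name, labels in genomes.items():
        print(f"{name}: {percent_unknown(labels):.1f}% unknown")
    # genome_A: 25.0% unknown
    # genome_B: 50.0% unknown
```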


[Figure (Yeast); y-axis: % hits at each e-value; x-axis: BLAST e-value]

“Unknowns” are not as conserved as “knowns”, even between related organisms!
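One way to build a “% hits at each e-value” profile is to bin each protein's best BLAST e-value and report each bin as a percentage, computed separately for “knowns” and “unknowns”. The slide does not give the bin edges or say whether only the best hit was counted; the edges, the best-hit choice, and the `percent_hits_by_evalue` name are assumptions for illustration.

```python
import bisect
from typing import Dict, Optional, Sequence


def percent_hits_by_evalue(
    best_evalues: Sequence[Optional[float]],
    edges: Sequence[float],
) -> Dict[str, float]:
    """Percentage of queries whose best-hit e-value falls in each bin.

    `edges` must be ascending (e.g. [1e-50, 1e-10, 1e-6]). Bin i holds e-values
    in (edges[i-1], edges[i]]; the last bin holds weaker hits and queries with
    no hit at all (None).
    """
    labels = [f"<={e:g}" for e in edges] + [f">{edges[-1]:g} or no hit"]
    counts = [0] * len(labels)
    for ev in best_evalues:
        # bisect_left returns the index of the first edge >= ev, i.e. the bin's upper bound.
        idx = len(edges) if ev is None else bisect.bisect_left(edges, ev)
        counts[idx] += 1
    n = len(best_evalues)
    return {lab: (100.0 * c / n if n else 0.0) for lab, c in zip(labels, counts)}


if __name__ == "__main__":
    # Hypothetical best-hit e-values for one class of proteins.
    evalues = [1e-80, 1e-20, 1e-7, 0.01, None]
    print(percent_hits_by_evalue(evalues, [1e-50, 1e-10, 1e-6]))
    # {'<=1e-50': 20.0, '<=1e-10': 20.0, '<=1e-06': 20.0, '>1e-06 or no hit': 40.0}
```

Running it once on the “known” set and once on the “unknown” set gives two profiles that can be plotted side by side.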


[Figure: relationship tree of Sc, Sp, At, Os, Dm, Ag, Mm, Rn, Hs, and Ce; panels: Known, Unknown]

The relationship tree among the 10 genomes reveals a high degree of evolutionary divergence among “unknowns” from different species.

Do “unknowns” evolve at a different rate? Are “unknowns” new genes?


[Figure; y-axis: % unique genes; legend: Known, Unknown; bar values: 836, 1197, 487, 882, 1908, 3384, 5832, 19157, 792, 2919, 2041, 5601, 133, 1440, 20528, 196, 3173, 5694, 5955]

“Unknowns” are mainly species-specific.

Representation of “unknowns” in the “unique-ome” of different species.

The “unique-ome” was defined using a BLAST e-value cutoff of 10⁻⁶ among the 10 genomes.
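The unique-ome definition can be sketched as a filter over BLAST hits, followed by the share of “unknowns” within it. This assumes hits are pre-parsed into a hypothetical gene-to-(subject genome, e-value) mapping, that hits within the gene's own genome are ignored, and that a hit counts when its e-value is at or below 10⁻⁶; the slide does not spell out these details, and `unique_ome` / `percent_unknown_among` are illustrative names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical layout: query gene -> list of (subject genome, e-value) hits.
Hits = Dict[str, List[Tuple[str, float]]]


def unique_ome(genes: Iterable[str], hits: Hits, own_genome: str, cutoff: float = 1e-6) -> Set[str]:
    """Genes with no BLAST hit at e-value <= cutoff in any *other* genome."""
    unique = set()
    for gene in genes:
        # Assumption: hits to the gene's own genome (paralogs, self-hits) are ignored.
        conserved = any(
            subject != own_genome and evalue <= cutoff
            for subject, evalue in hits.get(gene, [])
        )
        if not conserved:
            unique.add(gene)
    return unique


def percent_unknown_among(genes: Set[str], labels: Dict[str, str]) -> float:
    """Percentage of the given genes that are labeled "unknown"."""
    if not genes:
        return 0.0
    return 100.0 * sum(labels[g] == "unknown" for g in genes) / len(genes)


if __name__ == "__main__":
    # Toy, made-up data for a genome called "A".
    hits = {"g1": [("B", 1e-30)], "g2": [("A", 1e-50), ("B", 1e-3)]}
    uniq = unique_ome(["g1", "g2", "g3"], hits, own_genome="A")
    labels = {"g1": "known", "g2": "unknown", "g3": "known"}
    print(sorted(uniq), percent_unknown_among(uniq, labels))
    # ['g2', 'g3'] 50.0
```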


[Figure, three panels by species (Sc, Sp, At, Os, Dm, Ag, Mm, Rn, Hs, Ce), Known vs. Unknown: disorder/length; hydro index; average sequence length (aa)]

Compared to “knowns”, “unknowns” are more disordered, less hydrophobic, and shorter.
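Length and hydrophobicity can be summarized per class directly from the protein sequences. The slide does not say which hydrophobicity index was plotted, so this sketch substitutes the Kyte-Doolittle GRAVY score (mean per-residue hydropathy), which may differ from the index used in the figure. Disorder needs a dedicated predictor and is left out; `gravy` and `summarize` are hypothetical helper names.

```python
from statistics import mean
from typing import Dict, List

# Kyte-Doolittle hydropathy scale (used here as a stand-in "hydro index").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy; non-standard letters (X, *, ...) are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    return mean(values) if values else float("nan")


def summarize(seqs_by_class: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Average length (aa) and average GRAVY per class, e.g. "known" vs "unknown".

    Each class must contain at least one sequence (statistics.mean raises on empty input).
    """
    return {
        cls: {
            "avg_length": mean(len(s) for s in seqs),
            "avg_gravy": mean(gravy(s) for s in seqs),
        }
        for cls, seqs in seqs_by_class.items()
    }


if __name__ == "__main__":
    # Toy sequences only; real input would be each genome's known/unknown proteomes.
    print(summarize({"known": ["AAAA", "IIII"], "unknown": ["RR"]}))
```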


“Unknown” conclusions

• Unknown genes are typically species-specific and might provide some of the keys that define species-specific differences.
• Unraveling the function of “unknowns” would improve our understanding of species-specific functions.
• The functions of disordered proteins are thought to include the formation and regulation of large multi-molecular assemblies that participate in important regulatory processes. Disordered protein regions have been reported to evolve significantly more rapidly than ordered regions.
• “Unknowns” likely result from greater evolutionary divergence among species, leading to the establishment of new, species-specific regulatory networks.