Bioinformatics and Evolutionary Genomics

Page 1: Bioinformatics and Evolutionary Genomics

Bioinformatics and Evolutionary Genomics

Page 2: Bioinformatics and Evolutionary Genomics

Request

• We have a small group
• that is also heterogeneous with respect to previous knowledge
• PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, or when I make jumps/conclusions that seem obvious and 100% logical to me but erratic to you; please point out my implicit assumptions regarding what everybody knows

Page 3: Bioinformatics and Evolutionary Genomics

Lectures and computer exercises

• Homology, trees
• Genomic context, genome evolution, pathway evolution
• HTP data
• Eukaryotic genome evolution, tree of life
• Exercises … basic abilities, plus an impression of what is possible / how this type of research is done (albeit on a larger scale)

Page 4: Bioinformatics and Evolutionary Genomics

Literature Discussion

• Each (set of) article(s) will be introduced (= presentation) by 1 / 2 persons; the presentation should last approximately half an hour, followed by a discussion
• What to discuss
  – What are the articles actually saying? What have the authors done? (so that everybody knows)
  – What does this mean in a larger context? (e.g. a discussion of the discussion)

Page 5: Bioinformatics and Evolutionary Genomics

Homology and Domains

Page 6: Bioinformatics and Evolutionary Genomics

Gene / protein sequence evolution: what is homology

• Definition of homology (biology)
• Structures are said to be homologous if they are alike because of shared ancestry.
• Classic: arms ~ bird wings ~ bat wings
• Genes/proteins/stretches of DNA: sequence similarity because they are derived from the same ancestral sequence
• Instead of analogous: with sequences we do have convergence, but it is thought to be limited to specific cases (e.g. coiled-coil, regulatory motifs); with function, however, we have analogy, e.g. analogous enzymes

Page 7: Bioinformatics and Evolutionary Genomics

Why are we interested in homology

• Function prediction → Homologous proteins tend to have similar functions
• Evolutionary dynamics → Tracing the evolution of genes (duplication, gene trees, origin of new gene families)

Page 8: Bioinformatics and Evolutionary Genomics

How do we detect homology

• Similarity of:
  • 3D structure → the most conserved aspect, yet not all structures are available. Structures are compared and classified by “eye” and by software packages (Dali). (NB classical homology); criterion: shared “idiosyncratic” features that are not strictly necessary for function + sequence features
  • Sequence → less conserved, but many sequences are available. Homology determination is mainly based on models of sequence evolution and on the likelihood that, when you compare a sequence to a database, you will find by chance a sequence of at least that similarity.
• NB Manually curated databases of 3D structure similarity are used as a benchmark for the detection of homology by sequence similarity (SCOP, Blundels Bus).
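
The sequence criterion above (how likely is a chance hit of at least this similarity in the database?) is commonly expressed with the Karlin-Altschul expectation used by BLAST-style searches, E = K·m·n·e^(−λS). The following is a minimal sketch only: the function names are hypothetical, and the default λ and K are illustrative assumptions, since the real values depend on the scoring matrix and gap penalties.

```python
import math

def expected_hits(score, query_length, database_length, lam=0.267, k=0.041):
    """Expected number of chance alignments with at least this raw score.

    Karlin-Altschul form: E = K * m * n * exp(-lambda * S).
    lam and k are illustrative defaults (assumption), not values
    taken from the slides.
    """
    return k * query_length * database_length * math.exp(-lam * score)

def chance_probability(e_value):
    """Probability of seeing at least one chance hit, given E."""
    return 1.0 - math.exp(-e_value)

# Hypothetical search: 300-residue query against 1e8 database residues.
weak = expected_hits(40, 300, 1e8)     # low score: many chance hits expected
strong = expected_hits(120, 300, 1e8)  # high score: chance hits very unlikely
print(f"weak hit E = {weak:.3g}, strong hit E = {strong:.3g}")
```

Note that E grows linearly with database size, so the same alignment score becomes less convincing as the database grows.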

Page 9: Bioinformatics and Evolutionary Genomics

Gene / protein evolution: beyond blast, “distant homology”

• Not obvious by blast
• Substantial divergence, due to time and/or speed
• Use a “profile” (HMMer or PSI-BLAST)
• These in general work better because a profile captures which residues/positions characterize the family

[Figure: example sequence fragments ECGHR, ECNHN, TCQQL, SIGNL]
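
Why a profile can be more sensitive than a single-sequence comparison can be sketched with a toy position-specific scoring matrix. This is not how HMMer or PSI-BLAST are implemented: the uniform background, the pseudocount, and the function names are assumptions, and treating the two fragments from the slide as an aligned family is also an assumption.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column log-odds scores against a uniform background (assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def profile_score(profile, fragment):
    """Sum the column scores for an ungapped fragment of the same length."""
    return sum(column[aa] for column, aa in zip(profile, fragment))

family = ["ECGHR", "ECNHN"]            # fragments taken from the slide
profile = build_profile(family)
conserved_match = profile_score(profile, "ECAHK")  # keeps E, C and H
poor_match = profile_score(profile, "TCQQL")       # keeps only C
print(conserved_match > poor_match)
```

Positions that are invariant in the family (E, C, H here) carry most of the score, while variable positions contribute little, which is the information a single query sequence cannot provide.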

Page 10: Bioinformatics and Evolutionary Genomics

Gene / protein evolution: beyond blast, “distant homology”

• PSI-BLAST: a multiple sequence alignment is generated on the fly to detect which residues/positions characterize the family.
• OR use CDD, PFAM or SMART
  – Experts have collected representative and divergent members of a gene family; HMMer or RPS-BLAST is then used to see if your query sequence belongs to this gene family (i.e. is homologous to the members)
  – clearer/cleaner than PSI-BLAST or BLAST.

Page 11: Bioinformatics and Evolutionary Genomics

How to detect very distant homology / superfamilies

• When two protein families are homologous but the homology is not obvious, they are part of the same so-called superfamily

How to detect:

• In-depth PSI-BLAST
• Reciprocal
• Use of the right seed
• “Hopping” (homology is by definition transitive)
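
The “hopping” idea can be sketched as finding connected components in a graph of significant pairwise hits: if A hits an intermediate and the intermediate hits B, then A and B end up in the same group even without a direct hit. The protein names and hit list below are hypothetical, and the sketch assumes every hit covers the same region; with multidomain proteins, naive chaining can link families that share no domain.

```python
from collections import defaultdict, deque

def homology_clusters(hits):
    """Group sequences connected by chains of significant pairwise hits."""
    graph = defaultdict(set)
    for a, b in hits:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:                      # breadth-first walk over hits
            node = queue.popleft()
            members.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(members)
    return clusters

# Hypothetical significant hits; "bridge" links family A to family B.
hits = [("famA_1", "famA_2"), ("famA_2", "bridge"),
        ("bridge", "famB_1"), ("famC_1", "famC_2")]
clusters = homology_clusters(hits)
print(clusters)
```

Here famA_1 and famB_1 never hit each other directly, yet they are grouped together through the intermediate sequence.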

Page 12: Bioinformatics and Evolutionary Genomics

Gene / protein evolution: Distant homology

• Alignment-vs-alignment, profile-vs-profile, HMM-vs-HMM comparison (whereas HMMer and PSI-BLAST compare a profile to a single sequence)
• Unfortunately the statistics are still poor
• “Works” because:

[Figure: two sets of aligned sequence fragments compared with each other: ACRNG, ACGNR and TCQQL, TFQQI, TCILL]

Page 13: Bioinformatics and Evolutionary Genomics

Gene / protein evolution: Distant homology

• 3D structure comparison/alignment plus visual inspection of the multiple sequence alignment by Alexey Murzin
• The results of this are stored in the SCOP database
• (Blundel’s bus)

Page 14: Bioinformatics and Evolutionary Genomics

Structural alignment

Secondary structure elements
• Alpha-helices
• Beta strands (beta sheets)
• Loops

Fold vs superfamily?

Page 15: Bioinformatics and Evolutionary Genomics

An example of distant homology

• E.g. superfamily P-loop containing nucleoside triphosphate hydrolase
• In humans: AAA 130, ABC_tran 182, SMC_N 29
• Zot; UPF0079; TraG; SMC_N; SKI; Sigma54_activat; Rep_fac_C; Rad17; NACHT; Mg_chelatase; MCM; KTI12; IstB; GSPII_E; DUF853; DNA_pol3_delta; Bac_DnaA; APS_kinase; ABC_tran; AAA_PrkA; AAA_5; AAA_3; AAA_2; AAA

Page 16: Bioinformatics and Evolutionary Genomics

Apart from sequence and structural features: conservation of basic molecular function

Page 17: Bioinformatics and Evolutionary Genomics

Distant Homology: Applications to function prediction

• Bacterial protein of unknown function (DUF853)
• Member of the P-loop containing nucleoside triphosphate hydrolase superfamily
• Thus thought to be an ATPase

Page 18: Bioinformatics and Evolutionary Genomics
Page 19: Bioinformatics and Evolutionary Genomics
Page 20: Bioinformatics and Evolutionary Genomics
Page 21: Bioinformatics and Evolutionary Genomics

Relevance of homology for function prediction: “Similar function”. What is function?

• Various levels of description:
• Sequence similarity / homology has the largest relevance for molecular function. This is the aspect of protein function that is best conserved; protein sequence and structure can often be interpreted in terms of function.

Page 22: Bioinformatics and Evolutionary Genomics

Using distant homology for function prediction: example from (just) before PSI-BLAST & HMMer

Secreted Fringe-like Signaling Molecules May Be Glycosyltransferases.
Cell. 1997 Jan 10;88(1):9-11.
Y. Yuan, J. Schultz, M. Mlodzik, P. Bork

Page 23: Bioinformatics and Evolutionary Genomics

Distant Homology: Application to evolution

• Invention vs (duplication and) divergence
• First determine homology before putting sequences into multiple sequence alignment & tree building software
• Two (or more) protein families that are present in all three kingdoms of life and that can be determined to be homologous to each other: information from before the Last Universal Common Ancestor, i.e. information about very early evolution

Page 24: Bioinformatics and Evolutionary Genomics

Protein domains: structural definition: separate in structure

• A structural domain ("domain") is an element of overall structure that is self-stabilizing and often folds independently of the rest of the protein chain

Page 25: Bioinformatics and Evolutionary Genomics

Protein domains: sequence/evolutionary definition: separate in “evolution”

• Homologous parts of proteins that occur with different “partners”
• Mobile
• Modules
• Almost always the same as the structural definition

Page 26: Bioinformatics and Evolutionary Genomics

Implications of domains for homology:

• The shared ancestry is not a property of the whole gene but only of part of the gene.
• When studying the evolution of gene families, consider fusions / domain combinations (also when making trees etc.)

Page 27: Bioinformatics and Evolutionary Genomics

Domain repeats. Homology?

• Blast homology vs the “real” homology unit
• Q8TKV1 (Methanosarcina acetivorans)
• ?

Page 28: Bioinformatics and Evolutionary Genomics

Q8TKV1

Page 29: Bioinformatics and Evolutionary Genomics

Ramifications for function prediction & understanding of cellular processes: “one domain one (molecular) function” (in contrast to one gene one function)

• This bit does this and that bit does that
• E.g.
  – multidomain enzymes
  – Transcriptional regulators

Page 30: Bioinformatics and Evolutionary Genomics

Example multidomain enzyme: TrpG E. coli

Page 31: Bioinformatics and Evolutionary Genomics

Ramifications for function prediction when doing blast: mind the domains

Protein B is wrongly annotated as having the function of domain 1, based on homology with the multidomain protein A, even though B is not homologous to domain 1 of A

(multi-domain architecture problem for annotating proteins via blast)

[Figure: protein A with domains 1 and 2; protein B]

Page 32: Bioinformatics and Evolutionary Genomics

Ramifications for function prediction when doing blast: mind the domains

Protein B is incompletely annotated as having the function of domain 2, based on homology with the single-domain protein A; the second domain is missed in the annotation

[Figure: domains 2 and 1; protein A; protein B]
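
Both annotation pitfalls above can be guarded against by checking which part of each protein the alignment actually covers before transferring a function. The sketch below is one possible check, not an established tool: the function names, the coordinates, and the 0.8 overlap threshold are hypothetical.

```python
def transferable_domains(subject_domains, hit_start, hit_end, min_overlap=0.8):
    """Names of subject domains that the aligned subject region mostly covers.

    subject_domains: list of (name, start, end), 1-based inclusive coordinates.
    min_overlap: fraction of a domain that must be aligned (assumed threshold).
    """
    covered = []
    for name, start, end in subject_domains:
        overlap = min(end, hit_end) - max(start, hit_start) + 1
        if overlap / (end - start + 1) >= min_overlap:
            covered.append(name)
    return covered

def uncovered_fraction(query_length, query_start, query_end):
    """Fraction of the query not included in the alignment."""
    return 1.0 - (query_end - query_start + 1) / query_length

# Pitfall 1: protein A has two domains, but B only aligns to the second one.
protein_a = [("domain 1", 1, 150), ("domain 2", 160, 300)]
safe_to_transfer = transferable_domains(protein_a, hit_start=165, hit_end=298)

# Pitfall 2: a 300-residue protein B aligns to single-domain A over 10..140 only.
unexplained = uncovered_fraction(300, 10, 140)
print(safe_to_transfer, round(unexplained, 2))
```

In the first case only the function of domain 2 should be transferred; in the second, more than half of protein B remains unexplained, which suggests a further domain that the annotation would otherwise miss.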

Page 33: Bioinformatics and Evolutionary Genomics

Ramifications for function prediction: when doing blast, do psi-blast, cdd / pfam instead.

• Rather than discover the domain structure by blast yourself, use e.g. SMART / PFAM / CDD to do it for you
• NB CDD

Page 34: Bioinformatics and Evolutionary Genomics

Domains and distant homologies

• Promiscuous domains (i.e. domains that are present in many proteins) are often quite diverged and thus need sensitive homology detection tools in order to be recognized.

• Moreover, it is often only the most general functional property of the domain that is conserved over such long evolutionary distances.

• Over long evolutionary distances, genes are often only homologous in the sense that they share a domain, rather than being full-length homologous.

• We THUS use PFAM/SMART etc.:
1. For the domains
2. To improve upon BLAST / be cleaner than PSI-BLAST
3. Because most of the sequences are covered by these databases. No need to reinvent the wheel. The ones that are not are often "non-globular", recent inventions, or very fast evolving.
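The distinction between sharing a domain and being full-length homologous can be made concrete by comparing domain architectures. The sketch below is a minimal illustration in Python, assuming each protein has already been reduced to an ordered list of domain names (for example from a PFAM or SMART search). The function name and the placeholder domain names are hypothetical.

```python
def compare_architectures(arch_a, arch_b):
    """Classify two proteins by their domain architectures.

    arch_a, arch_b: ordered lists of domain names, N- to C-terminus
    (hypothetical input, e.g. derived from a domain database search).
    """
    shared = set(arch_a) & set(arch_b)
    if not shared:
        return "no shared domain"
    if list(arch_a) == list(arch_b):
        # Same domains in the same order: consistent with (but not proof
        # of) homology over the full length.
        return "same architecture"
    return "share domain(s) only: " + ", ".join(sorted(shared))


# Placeholder domain names, standing in for real domain identifiers.
protein_a = ["domain2"]
protein_b = ["domain1", "domain2"]
print(compare_architectures(protein_a, protein_b))
# share domain(s) only: domain2
```

Comparing names alone ignores how diverged the domain copies are, so this only separates "shares a domain" from "same architecture"; it does not replace a sensitive homology search.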

Page 35: Bioinformatics and Evolutionary Genomics

Disclaimer: non-globular regions

• Low complexity
• Unstructured, elongated (as opposed to globular)
• Many polar/charged residues; few hydrophobic residues
• Parts of proteins that do not possess a clear 3D structure
• Convergence
• Do not obey PAM or BLOSUM
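Low complexity can be illustrated with a simple compositional measure. The sketch below computes the Shannon entropy of the residue composition in a sliding window and reports windows that fall below a cutoff. It is a simplified stand-in written for illustration, not the algorithm used by any particular filtering tool; the window size, the entropy cutoff, and the example sequences are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition."""
    total = len(segment)
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(segment).values()
    )


def low_complexity_windows(sequence, window=12, max_entropy=2.2):
    """0-based start positions of windows with entropy <= max_entropy.

    window and max_entropy are illustrative defaults, not calibrated values.
    """
    starts = []
    for i in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[i:i + window]) <= max_entropy:
            starts.append(i)
    return starts


# A two-letter repeat has very low entropy; a window of 12 distinct
# residues has the maximum entropy possible for that window size.
print(shannon_entropy("QSQSQSQSQSQS"))           # 1.0
print(low_complexity_windows("ACDEFGHIKLMN"))    # []
print(low_complexity_windows("QSQSQSQSQSQSQS"))  # [0, 1, 2]
```

Regions flagged this way are the ones where substitution-matrix scores (PAM, BLOSUM) are least trustworthy, because similarity there can reflect biased composition rather than common ancestry.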

Page 36: Bioinformatics and Evolutionary Genomics

Disclaimer: Coiled coil

• All alpha: thought to arise independently (convergence)

• Hypothesis: reservoir for "new" folds: all alpha folds (Koonin EV)

• E.g. ras / rho / rab / ran / -GAPs

Page 37: Bioinformatics and Evolutionary Genomics

Disclaimer: Other protein motifs

• Signal peptides
• Lipid anchoring
• Convergence yet still important to predict
• Trans-membrane?

Page 38: Bioinformatics and Evolutionary Genomics

Interesting result on protein evolution regarding domains and duplications: neutral?

Black: observed
Blue: model of recombination & duplication separate
Red: also duplication of combinations b