25

Click here to load reader

Presented by Krishna Balasubramanian

  • Upload
    nessa

  • View
    48

  • Download
    5

Embed Size (px)

DESCRIPTION

Use of Logic Relationships to Decipher Protein Network Organization Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates. Presented by Krishna Balasubramanian. Contents. Introduction Background Method Used - LAPP Results Observations Conclusion Future Work. Introduction. - PowerPoint PPT Presentation

Citation preview

Page 1: Presented by Krishna Balasubramanian

Use of Logic RelationshipsUse of Logic Relationshipsto Decipher Proteinto Decipher Protein

Network OrganizationNetwork OrganizationPeter M. Bowers, Shawn J. Peter M. Bowers, Shawn J.

Cokus,Cokus,David Eisenberg, Todd O. YeatesDavid Eisenberg, Todd O. Yeates

Presented by Krishna Presented by Krishna BalasubramanianBalasubramanian

Page 2: Presented by Krishna Balasubramanian

22

ContentsContents

IntroductionIntroduction BackgroundBackground Method Used - LAPPMethod Used - LAPP ResultsResults ObservationsObservations ConclusionConclusion Future WorkFuture Work

Page 3: Presented by Krishna Balasubramanian

33

IntroductionIntroduction

Major focus of genome research:Major focus of genome research:– Deciphering networks of molecular Deciphering networks of molecular

interactions underlying cellular function.interactions underlying cellular function. Developed a Computational approach: Developed a Computational approach:

– Identify detailed relationships btw proteins Identify detailed relationships btw proteins based on genomic data.based on genomic data.

The method reveals many previously The method reveals many previously unidentified higher order relationshipsunidentified higher order relationships

Page 4: Presented by Krishna Balasubramanian

44

BackgroundBackground Patterns across multiple complete genomes Patterns across multiple complete genomes

have been used to infer biological have been used to infer biological interactions and functional linkages btw interactions and functional linkages btw proteins:proteins:– 2 distinct proteins from one organism 2 distinct proteins from one organism

genetically fused into a single protein in another genetically fused into a single protein in another organism.organism.

– Tendency of 2 proteins to occur in chromosomal Tendency of 2 proteins to occur in chromosomal proximity across multiple organisms.proximity across multiple organisms.

– Phylogenetic profile approach Phylogenetic profile approach Detects functional relationships btw proteins Detects functional relationships btw proteins

exhibiting statistically similar patterns of exhibiting statistically similar patterns of presence or absence.presence or absence.

Determine pattern describing a protein’s Determine pattern describing a protein’s presence or absence by searching for its presence or absence by searching for its homologs across N organisms.homologs across N organisms.

Page 5: Presented by Krishna Balasubramanian

55

BackgroundBackground Original implementations sought to infer “links” btw pairs of Original implementations sought to infer “links” btw pairs of

proteins with similar profiles.proteins with similar profiles.

A subsequent variation on that idea linked proteins if their profiles A subsequent variation on that idea linked proteins if their profiles represented the negation of each other.represented the negation of each other.

Simple notions - with the presence of one protein implying the Simple notions - with the presence of one protein implying the presence or absence of another.presence or absence of another.

Such simple relationships cannot adequately describe the full Such simple relationships cannot adequately describe the full complexity of cellular networks that involve branching, parallel, complexity of cellular networks that involve branching, parallel, and alternate pathways.and alternate pathways.

Higher order logic relationships involving a pattern of Higher order logic relationships involving a pattern of presence/absence of multiple proteins expected due to:presence/absence of multiple proteins expected due to:– Observed complexity of cellular networks.Observed complexity of cellular networks.– Evolutionary divergence, convergence, and horizontal transfer Evolutionary divergence, convergence, and horizontal transfer

events.events.

Page 6: Presented by Krishna Balasubramanian

66

Method - LAPPMethod - LAPP

Perform complete analysis of logic relations Perform complete analysis of logic relations possible btw triplets of phylogenetic profiles.possible btw triplets of phylogenetic profiles.

Demonstrate the power of the resulting logic Demonstrate the power of the resulting logic analysis of phylogenetic profiles (LAPP) to:analysis of phylogenetic profiles (LAPP) to:– Illuminate relationships among multiple proteins. Illuminate relationships among multiple proteins. – Infer the coarse function of large numbers of Infer the coarse function of large numbers of

uncharacterized protein families.uncharacterized protein families.

Page 7: Presented by Krishna Balasubramanian

77

Logical Relationships to determine presence/absence of ProteinsLogical Relationships to determine presence/absence of Proteins

Venn diagrams and logic statements show the 8 distinct kinds Venn diagrams and logic statements show the 8 distinct kinds of logic functions that describe the possible dependence of of logic functions that describe the possible dependence of the presence of on the presence of A and B, jointly.the presence of on the presence of A and B, jointly.

Logic functions are grouped together if they are related by a Logic functions are grouped together if they are related by a simple exchange of proteins A and B.simple exchange of proteins A and B.

Page 8: Presented by Krishna Balasubramanian

88

Logical Relationships to determine presence/absence Logical Relationships to determine presence/absence of Proteinsof Proteins

There are 8 possible logic relationships combining two There are 8 possible logic relationships combining two phylogenetic profiles to match a third profile.phylogenetic profiles to match a third profile.

E.g. 1: protein C might be present if and only if proteins A and B E.g. 1: protein C might be present if and only if proteins A and B are both present.are both present.– Function of protein C is necessary only when the functions of proteins A Function of protein C is necessary only when the functions of proteins A

and B are both present.and B are both present.

Gene C may be present if and only if either A or B is present.Gene C may be present if and only if either A or B is present.– Different organisms use two different protein families in combination Different organisms use two different protein families in combination

with a common third protein to accomplish some task.with a common third protein to accomplish some task.

Several of the eight possible logic relationships intuitively Several of the eight possible logic relationships intuitively understood to describe commonly observed biological scenarios.understood to describe commonly observed biological scenarios.

However, a few of the logic relationships are not easily related to However, a few of the logic relationships are not easily related to real biological situations.real biological situations.

Page 9: Presented by Krishna Balasubramanian

99

Examples of LAPP based on Phylogenetic Examples of LAPP based on Phylogenetic ProfilesProfiles Phylogenetic Profiles Biological examples of LAPPPhylogenetic Profiles Biological examples of LAPP

Page 10: Presented by Krishna Balasubramanian

1010

Examples of LAPP .. Cont’dExamples of LAPP .. Cont’d

Hypothetical phylogenetic profiles are Hypothetical phylogenetic profiles are used to illustrate the eight possible logic used to illustrate the eight possible logic functions.functions.

Real biological e.g. shown to illustrate the Real biological e.g. shown to illustrate the ternary relationships identified from actual ternary relationships identified from actual phylogenetic profiles for the 4 most phylogenetic profiles for the 4 most commonly observed logic types.commonly observed logic types.

Page 11: Presented by Krishna Balasubramanian

1111

Identifying Protein TripletsIdentifying Protein Triplets

Created a set of binary-valued vectors describing the Created a set of binary-valued vectors describing the presence or absence of each of the known protein presence or absence of each of the known protein families across 67 fully sequenced organisms.families across 67 fully sequenced organisms.

Categorized complete set of proteins into 4873 Categorized complete set of proteins into 4873 distinct families called clusters of orthologous groups distinct families called clusters of orthologous groups (COGs).(COGs).

Examined all triplet combinations of profiles and Examined all triplet combinations of profiles and rank-ordered them according to how well the logical rank-ordered them according to how well the logical combination f (a,b) of two profiles predicted a third combination f (a,b) of two profiles predicted a third profile, c.profile, c.

Neither profile a nor b alone was predictive of c.Neither profile a nor b alone was predictive of c.

Page 12: Presented by Krishna Balasubramanian

1212

Identifying Protein TripletsIdentifying Protein Triplets

Uncertainty Coefficients calculated for U(c|a), U(c|Uncertainty Coefficients calculated for U(c|a), U(c|b), and the logically combined profile U(c|f (a,b))b), and the logically combined profile U(c|f (a,b))– U(x|y) = [H(x) + H(y) – H(x, y)]/H(x)U(x|y) = [H(x) + H(y) – H(x, y)]/H(x)– H is the entropy of individual/joint distributionsH is the entropy of individual/joint distributions

U can range between 1.0, where x is a U can range between 1.0, where x is a deterministic function of y, and 0.0, where x is deterministic function of y, and 0.0, where x is completely independent of y.completely independent of y.

Selected triplets whose individual pairwise Selected triplets whose individual pairwise uncertainty scores described protein profile c uncertainty scores described protein profile c poorly [U(c|a) < 0.3 and U(c|b) < 0.3] but whose poorly [U(c|a) < 0.3 and U(c|b) < 0.3] but whose logically combined profile [U(c|f (a,b)) > 0.6] logically combined profile [U(c|f (a,b)) > 0.6] described c well.described c well.

Page 13: Presented by Krishna Balasubramanian

1313

ExampleExample Synthesis of aromatic Synthesis of aromatic

amino acids proceeds amino acids proceeds through the shikimate through the shikimate pathway.pathway.

Logic analysis of 5 Logic analysis of 5 participating proteins participating proteins show:show:– Shikimate can be Shikimate can be

converted to the end converted to the end product prephenate by product prephenate by one of two possible routes, one of two possible routes, leading to a type 7 logic leading to a type 7 logic relationship.relationship.

Example showing triplet andExample showing triplet andpairwise uncertaintypairwise uncertaintycoefficients, U.coefficients, U.

Page 14: Presented by Krishna Balasubramanian

1414

ResultsResults When either one shikimate kinase protein family (protein A, When either one shikimate kinase protein family (protein A,

COG1685) or an alternate shikimate kinase protein family (protein B, COG1685) or an alternate shikimate kinase protein family (protein B, COG0703) is present in an organism, then excitatory postsynaptic COG0703) is present in an organism, then excitatory postsynaptic potential (EPSP) synthase must also be present (protein C, COG0128) potential (EPSP) synthase must also be present (protein C, COG0128) (U 0 0.85) to carry out the subsequent enzymatic step.(U 0 0.85) to carry out the subsequent enzymatic step.

The same type 7 logic relationship is also observed between The same type 7 logic relationship is also observed between alternate shikimate kinase enzymes and the successive chorismate alternate shikimate kinase enzymes and the successive chorismate synthase (protein D, COG0082) and chorismate mutase (protein E, synthase (protein D, COG0082) and chorismate mutase (protein E, COG1605) enzymatic steps of the pathway.COG1605) enzymatic steps of the pathway.

The ordering of the metabolic steps that follow shikimate kinase is The ordering of the metabolic steps that follow shikimate kinase is predicted by the value of successive U coefficients, where EPSP predicted by the value of successive U coefficients, where EPSP synthase (second step, U 0 0.85) is most strongly linked to shikimate synthase (second step, U 0 0.85) is most strongly linked to shikimate kinase, followed directly by the chorismate synthase (third step, U 0 kinase, followed directly by the chorismate synthase (third step, U 0 0.66) and lastly by chorismate mutase (fourth step, U 0 0.56).0.66) and lastly by chorismate mutase (fourth step, U 0 0.56).

Page 15: Presented by Krishna Balasubramanian

1515

Results Cont’dResults Cont’d

Organisms synthesize chorismate and prephenate from Organisms synthesize chorismate and prephenate from shikimate with the use of only one of two possible alternate shikimate with the use of only one of two possible alternate routes: pathways consisting of either ordered enzymes A-C-routes: pathways consisting of either ordered enzymes A-C-D-E or enzymes B-C-D-E.D-E or enzymes B-C-D-E.

LAPP recovers 750,000 previously unknown relationships LAPP recovers 750,000 previously unknown relationships among protein families (U(c|(f(a,b)) > 0.60; U(c|b) < 0.30; among protein families (U(c|(f(a,b)) > 0.60; U(c|b) < 0.30; U(c|a) < 0.30).U(c|a) < 0.30).

Validity assessed by comparing known annotations of the Validity assessed by comparing known annotations of the linked proteins.linked proteins.

The ability to recover links between proteins annotated as The ability to recover links between proteins annotated as belonging to a major functional category has been used belonging to a major functional category has been used widely to corroborate computational inferences of protein widely to corroborate computational inferences of protein interactions.interactions.

Page 16: Presented by Krishna Balasubramanian

1616

ObservationsObservations One of the most frequently One of the most frequently

observed triplet relationships observed triplet relationships relates three proteins belonging to relates three proteins belonging to the cell motility category, the cell motility category, confirmation that the triplet confirmation that the triplet associations link proteins closely associations link proteins closely related in function.related in function.

Other triplets involve two proteins Other triplets involve two proteins from the motility category and a from the motility category and a third protein of another COG third protein of another COG category, producing recognizable category, producing recognizable horizontal and vertical bands in the horizontal and vertical bands in the histogram.histogram.

E.g. the category combinations E.g. the category combinations NNU (COG category U, intracellular NNU (COG category U, intracellular trafficking and secretion) and NNS trafficking and secretion) and NNS (COG category S, unknown (COG category S, unknown function) are also plentiful.function) are also plentiful.

Connections between these Connections between these categories make intuitive sense categories make intuitive sense and facilitate placement of and facilitate placement of unannotated proteins within the unannotated proteins within the context of specific cellular networks context of specific cellular networks of interacting proteins.of interacting proteins.

Section taken from a 3-D histogram that describes the frequency of observed logic relationships in which protein A of the triplet is annotated as belonging to the COG functional category N, cell motility.

Page 17: Presented by Krishna Balasubramanian

1717

ObservationsObservations LAPP leads to a set of LAPP leads to a set of

statistically significant statistically significant ternary relationships that are ternary relationships that are distinct from and more distinct from and more numerous than the ones numerous than the ones inferred using traditional inferred using traditional pairwise analysis.pairwise analysis.

Matrix of randomized Matrix of randomized phylogenetic profiles, phylogenetic profiles, containing the same containing the same individual and pairwise individual and pairwise distributions as the native distributions as the native profiles used to assess the profiles used to assess the probability of observing a probability of observing a given uncertainty coefficient given uncertainty coefficient score by chance.score by chance.

Triplets with U > 0.60 are Triplets with U > 0.60 are observed from the unshuffled observed from the unshuffled vectors ~10vectors ~1022 times more times more frequently than from shuffled frequently than from shuffled profiles and ~10profiles and ~1044 more more frequently when U > 0.80.frequently when U > 0.80.

Plot of the cumulative number Plot of the cumulative number of protein triplets recovered atof protein triplets recovered atan uncertainty coefficient an uncertainty coefficient score greater than a given score greater than a given threshold.threshold.

Page 18: Presented by Krishna Balasubramanian

1818

Observations Cont’dObservations Cont’d

P value calculated for each triplet relationship by P value calculated for each triplet relationship by enumerating all possible values of U that could be enumerating all possible values of U that could be obtained from shuffled profiles while maintaining obtained from shuffled profiles while maintaining the individual and pairwise distributions.the individual and pairwise distributions.

P = number of trials that exceed the observed P = number of trials that exceed the observed value of U divided by the total number of trials.value of U divided by the total number of trials.

More than 98% of the identified triplets (U > 0.6) More than 98% of the identified triplets (U > 0.6) have P < 0.05, and more than 75% of the have P < 0.05, and more than 75% of the identified triplets have P < 0.005.identified triplets have P < 0.005.

Page 19: Presented by Krishna Balasubramanian

1919

ObservationsObservations The 8 distinct logic types The 8 distinct logic types

occur with widely varying occur with widely varying frequencies within the set of frequencies within the set of significant ternary significant ternary relationships.relationships.

Consistent with our Consistent with our understanding of evolution & understanding of evolution & biological relationships.biological relationships.

Logic types 1, 3, 5, and 7 are Logic types 1, 3, 5, and 7 are observed frequently in the observed frequently in the biological data.biological data.

Logic types 2, 4, and 8 are Logic types 2, 4, and 8 are more difficult to relate to more difficult to relate to simple cellular logic and are simple cellular logic and are observed only rarely.observed only rarely.

Number of identified triplets (U Number of identified triplets (U >>0.6) for each of the eight logic 0.6) for each of the eight logic function types for randomizedfunction types for randomized(black) and real (gray) (black) and real (gray) phylogenetic profiles.phylogenetic profiles.

Page 20: Presented by Krishna Balasubramanian

2020

ObservationsObservations

50 highest scoring relationships (U > 0.75) involving proteins 50 highest scoring relationships (U > 0.75) involving proteins fromfromthe cell motility and intracellular trafficking and secretion the cell motility and intracellular trafficking and secretion functionalfunctionalcategories.categories.

Page 21: Presented by Krishna Balasubramanian

2121

Observations cont’dObservations cont’d

Cell motility proteins are colored light Cell motility proteins are colored light blue, intracellular trafficking and secretion blue, intracellular trafficking and secretion are colored magenta, and proteins are colored magenta, and proteins annotated as both are colored in orange.annotated as both are colored in orange.

Edges are shown between proteins A-C Edges are shown between proteins A-C and B-C of each logic triplet, with each and B-C of each logic triplet, with each edge labeled according to the logic edge labeled according to the logic function type used to associate the function type used to associate the proteins families.proteins families.

Page 22: Presented by Krishna Balasubramanian

2222

Observations cont’dObservations cont’d

The proteins linked include adhesin proteins The proteins linked include adhesin proteins necessary for bacterial pathogenesis, chemotaxis necessary for bacterial pathogenesis, chemotaxis proteins, and translocase proteins.proteins, and translocase proteins.

Network contains previously unknown Network contains previously unknown interactions that suggest mechanisms connecting interactions that suggest mechanisms connecting bacterial pathogenesis and chemotaxis.bacterial pathogenesis and chemotaxis.

CheZ, a chemotaxis dephosphorylase that CheZ, a chemotaxis dephosphorylase that regulates cell motility, is linked to the surface regulates cell motility, is linked to the surface receptor and virulence factors adhesin AidA and receptor and virulence factors adhesin AidA and Flp pilus-associated FimT.Flp pilus-associated FimT.

Page 23: Presented by Krishna Balasubramanian

2323

ConclusionConclusion

New higher order protein associations New higher order protein associations detected by LAPP provides a framework to detected by LAPP provides a framework to understand the complex logical dependencies understand the complex logical dependencies that relate proteins to one another in the cell.that relate proteins to one another in the cell.

Also useful in:Also useful in:– Modeling and engineering biological systemsModeling and engineering biological systems– Generating biological hypotheses for Generating biological hypotheses for

experimentationexperimentation– Investigating additional protein propertiesInvestigating additional protein properties

Page 24: Presented by Krishna Balasubramanian

2424

Future WorkFuture Work In all likelihood, logic relationships btw In all likelihood, logic relationships btw

proteins in the cell extend beyond ternary proteins in the cell extend beyond ternary relationships to include much larger sets of relationships to include much larger sets of proteins.proteins.

Ideas underlying the logical analysis of Ideas underlying the logical analysis of phylogenetic profiles can be extended to the phylogenetic profiles can be extended to the investigation of other kinds of genomic data:investigation of other kinds of genomic data:– Gene expression, Gene expression, – Nucleotide polymorphismNucleotide polymorphism– Phenotype dataPhenotype data

Page 25: Presented by Krishna Balasubramanian

2525

Questions?? Questions??