Page 1: Analysis of Gene Networks and Signaling Pathways Based on Gene Expression and Proteome Data

Analysis of Gene Networks and Signaling Pathways Based on Gene Expression and Proteome Data

Marek Kimmel

Rice University

Houston, TX, USA

[email protected]

Page 2

Outline

• Basics: gene expression vs. protein abundance.

• Perceptron analysis of gene networks

• Proteomic analysis of FGF-2 signaling in breast cancer

Page 3

Now that we have the sequence of the Human Genome – What Next?

Page 4

[Diagram labels: Clinical Sciences; Basic Sciences; Molecular Medicine; Structural Biology; Genomics; Proteomics; Bioinformatics]

Page 5

BCM-HGSC

Genes make up only 3% of the genome

30,000

Page 6

Measuring Gene Expression: Oligonucleotide Gene Microarrays

• A Probe Pair consists of a Perfect Match (PM) & a Mismatch (MM).

• There are typically 20 Probe Pairs in a Probe Set.

• A Probe Set usually corresponds to a single gene.

• The Affymetrix 95A human GeneChip contains 12,626 Probe Sets.

• Thus, there are about 500,000 Probe Cells on a GeneChip (12,626 Probe Sets × 20 Probe Pairs × 2 cells ≈ 505,000).

Affymetrix GeneChips™

Page 7

Oligonucleotide Gene Microarrays

Each probe is 25 nucleotides long

Affymetrix GeneChips™

Page 8

mRNA Preparation

GAATTCAGTAACCCAGGCATTATTTTATCCTCAAGTCTTAGGTTGGTTGGAGAAAGATAACAAAAAGAAACATGA TTGTGCAGAAACAGACAAACCTTTTTGGAAAGCATTTGAAAATGGCATTCCCCCTCCACAGTGTGTTCACAGTGT GGGCAAATTCACTGCTCTGTCGTACTTTCTGAAAATGAAGAACTGTTACACCAAGGTGAATTATTTATAAATTAT GTACTTGCCCAGAAGCGAACAGACTTTTACTATCATAAGAACCCTTCCTTGGTGTGCTCTTTATCTACAGAATCC AAGACCTTTCAAGAAAGGTCTTGGATTCTTTTCTTCAGGACACTAGGACATAAAGCCACCTTTTTATGATTTGTT GAAATTTCTCACTCCATCCCTTTTGCTGATGATCATGGGTCCTCAGAGGTCAGACTTGGTGTCCTTGGATAAAGA GCATGAAGCAACAGTGGCTGAACCAGAGTTGGAACCCAGATGCTCTTTCCACTAAGCATACAACTTTCCATTAGA TAACACCTCCCTCCCACCCCAACCAAGCAGCTCCAGTGCACCACTTTCTGGAGCATAAACATACCTTAACTTTAC AACTTGAGTGGCCTTGAATACTGTTCCTATCTGGAATGTGCTGTTCTCTT

5’

3’

DNA Sequence for IL-8

GAATTCAGTAACCCAGGCATTATTT|TATCCTCAAGTCTTAGGTTGGTTGG|AGAAAGATAACAAAAAGAAACATGA| TTGTGCAGAAACAGACAAACCTTTT|TGGAAAGCATTTGAAAATGGCATTC|CCCCTCCACAGTGTGTTCACAGTGT| GGGCAAATTCACTGCTCTGTCGTAC|TTTCTGAAAATGAAGAACTGTTACA|CCAAGGTGAATTATTTATAAATTAT| GTACTTGCCCAGAAGCGAACAGACT|TTTACTATCATAAGAACCCTTCCTT|GGTGTGCTCTTTATCTACAGAATCC| AAGACCTTTCAAGAAAGGTCTTGGA|TTCTTTTCTTCAGGACACTAGGACA|TAAAGCCACCTTTTTATGATTTGTT| GAAATTTCTCACTCCATCCCTTTTG|CTGATGATCATGGGTCCTCAGAGGT|CAGACTTGGTGTCCTTGGATAAAGA| GCATGAAGCAACAGTGGCTGAACCA|GAGTTGGAACCCAGATGCTCTTTCC|ACTAAGCATACAACTTTCCATTAGA| TAACACCTCCCTCCCACCCCAACCA|AGCAGCTCCAGTGCACCACTTTCTG|GAGCATAAACATACCTTAACTTTAC| AACTTGAGTGGCCTTGAATACTGTT|CCTATCTGGAATGTGCTGTTCTCTT

5’

3’

Chop into short pieces suitable for hybridizing to 25mers on GeneChip

Attach chromophore, then inject onto the GeneChip
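The chopping step above can be illustrated with a short Python sketch. It splits a sequence into consecutive 25-nucleotide pieces, matching the `|` boundaries drawn on the slide. The function name `chop_into_kmers` is hypothetical, and fixed boundaries are a simplification made for illustration; fragmentation in the laboratory is presumably not aligned this neatly.

```python
def chop_into_kmers(sequence, k=25):
    """Split a sequence into consecutive, non-overlapping pieces of length k.

    The last piece is shorter than k when len(sequence) is not a multiple of k.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence), k)]


# First 75 nucleotides of the IL-8 sequence shown on the slide.
il8_prefix = (
    "GAATTCAGTAACCCAGGCATTATTT"
    "TATCCTCAAGTCTTAGGTTGGTTGG"
    "AGAAAGATAACAAAAAGAAACATGA"
)

pieces = chop_into_kmers(il8_prefix)
for piece in pieces:
    print(piece)
```

Each printed line corresponds to one 25mer segment between two `|` marks in the chopped IL-8 sequence.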

Page 9

Affymetrix Hybridization

[Figure: two columns of probes, PM and MM. Each PM probe reads AGTCGGATTAAGCGCTATACGGTTC. Each MM probe differs from the PM probe only at the central (13th) base, e.g. AGTCGGATTAAGAGCTATACGGTTC, AGTCGGATTAAGTGCTATACGGTTC, AGTCGGATTAAGGGCTATACGGTTC. The target strand is TCAGCCTAATTCGCGATATGCCAAG.]

Page 10

Affymetrix Hybridization

[Figure: the same PM and MM probe columns as on the previous slide (PM probe AGTCGGATTAAGCGCTATACGGTTC, MM probes differing at the central 13th base, target strand TCAGCCTAATTCGCGATATGCCAAG), now annotated:]

Match: forms duplex with complementary strand

Mismatch!
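The PM/MM distinction in the figure can be checked with a small Python sketch. It compares a probe with the aligned target strand base by base and reports the positions that do not pair. The helper name `mismatch_positions` is hypothetical; the three sequences are the ones shown on the slide.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_positions(probe, target):
    """Return 1-based positions where the probe base does not pair with the
    aligned target base. An empty list means a perfect duplex."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return [
        i
        for i, (p, t) in enumerate(zip(probe, target), start=1)
        if COMPLEMENT[p] != t
    ]


pm_probe = "AGTCGGATTAAGCGCTATACGGTTC"   # Perfect Match probe from the slide
mm_probe = "AGTCGGATTAAGAGCTATACGGTTC"   # one of the Mismatch probes
target = "TCAGCCTAATTCGCGATATGCCAAG"     # target strand from the slide

print(mismatch_positions(pm_probe, target))  # []
print(mismatch_positions(mm_probe, target))  # [13]
```

The PM probe pairs at every position, while the MM probe fails only at the central base, which is what makes the MM cell usable as a control for non-specific hybridization.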

Page 11

Probe Cell Intensities

Average Difference = Σ(PM – MM) / (Pairs in Average)

1,662
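A minimal Python sketch of the Average Difference calculation follows. The intensity values are invented for illustration, and the function name `average_difference` is hypothetical. The phrase "Pairs in Average" suggests that only a subset of probe pairs may enter the average; this sketch simply uses every pair it is given.

```python
def average_difference(pm_values, mm_values):
    """Mean of (PM - MM) over the probe pairs supplied."""
    if len(pm_values) != len(mm_values) or not pm_values:
        raise ValueError("need equally long, non-empty PM and MM lists")
    total = sum(pm - mm for pm, mm in zip(pm_values, mm_values))
    return total / len(pm_values)


# Hypothetical intensities for four probe pairs of one probe set.
pm_intensities = [1200.0, 980.0, 1500.0, 1100.0]
mm_intensities = [300.0, 410.0, 520.0, 250.0]

avg_diff = average_difference(pm_intensities, mm_intensities)
print(avg_diff)  # 825.0
```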

Page 12

Measuring Gene Expression: “Spotted DNA Microarrays”

• Each spot is the cDNA for a specific gene.

• RNA from the experimental sample is labeled with Cy5 red fluorescent dye.

• RNA from the reference sample is labeled with Cy3 green fluorescent dye.

• Fluorescent intensity ratios (Cy5/Cy3) are measured.

http://www.microarrays.org/software.html

http://rana.lbl.gov/ http://www.bioinfo.utmb.edu/
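The Cy5/Cy3 ratio measurement on this slide can be sketched in Python as follows. Taking the base-2 logarithm of the ratio is a common convention rather than something stated on the slide, so it should be read as an assumption; the function name `log2_ratio` and the intensity values are likewise illustrative only.

```python
import math


def log2_ratio(cy5, cy3):
    """Base-2 log of the experimental (Cy5) to reference (Cy3) intensity ratio.

    Positive values mean higher signal in the experimental sample,
    negative values mean higher signal in the reference sample.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)


# Hypothetical spot intensities: (Cy5, Cy3).
spots = {"spot_1": (800.0, 200.0), "spot_2": (100.0, 400.0), "spot_3": (500.0, 500.0)}

ratios = {name: log2_ratio(cy5, cy3) for name, (cy5, cy3) in spots.items()}
print(ratios)
```

On the log scale a fourfold increase and a fourfold decrease are symmetric around zero, which is one reason the transform is widely used.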

Page 13

Where Do We Get the Data?

[Flow diagram, cDNA Gene Microarray:]

Disease, Pathogens, Drugs, etc…

mRNA expressed in response to stimulus

mRNA collected and hybridized onto microarray

Microarray analyzed for spot intensities

Gene co-expression patterns

Page 14

Method

• Get mRNA samples from multiple conditions.

• Hybridize to DNA microarrays.

• Measure intensities.

• Cluster.

• Analyze results.

• Design new experiment.
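The slide does not say which clustering method is used. As a rough illustration of the "Cluster" step, the Python sketch below computes Pearson correlations between expression profiles and lists the gene pairs whose correlation exceeds a cutoff. This is thresholded pairwise correlation, not a full clustering algorithm; the gene names, profile values, cutoff, and function names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    names = sorted(profiles)
    return [
        (g, h)
        for i, g in enumerate(names)
        for h in names[i + 1:]
        if pearson(profiles[g], profiles[h]) >= threshold
    ]


# Hypothetical expression levels of three genes across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

pairs = coexpressed_pairs(profiles)
print(pairs)  # [('geneA', 'geneB')]
```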

Page 15

Discrimination between samples

• Green is “down”.

• Red is “up”.

• We can differentiate clearly between tumor and normal tissue.

• Can we find differences between progressing and non-progressing tumors?

Page 16

Problematic quality of data

• Note the large dynamic range.

• And the very large number of data points.

• And the limited information content.

Page 17

Proteomics

• Is to protein expression what genomics is to gene expression.

• Due to variations like post-translational modifications, there are many more proteins than genes.

Page 18

Proteomics

• Holds new promise for the future understanding of complex biological systems.

• Post-translational modifications include:
  – Phosphorylation
  – Glycosylation
  – Oxidation

• Many challenges remain, e.g. isolating, identifying, characterizing, and quantifying small amounts of a very large number of varieties of proteins.

• Currently, we primarily use 2D gels and mass spectrometry.

Page 19

Protein Separation Using 2D Gel Electrophoresis

• Protein analysis uses a diseased or treated sample and a control sample. 2D gel electrophoresis is performed for each sample to separate proteins based on their molecular weight and charge.

• Black marks on the gel images indicate a protein or cluster of proteins and are referred to as "features."

• The x-axis is the isoelectric point (pI), i.e. the pH at which a protein carries no net charge, while the y-axis is molecular weight (Mw) or size.

http://www.incyte.com/proteomics/tour/separation.shtml

Page 20

Protein Separation

Page 21

Protein Analysis

• Gels are fixed and stained with a fluorescent dye, then scanned.

• Expression levels are measured based on the size of each feature on the gel.

• Provides information about those proteins which are up- and down-regulated, including how their abundance changed.

http://www.incyte.com/proteomics/tour/analysis.shtml

Page 22

Protein Analysis

http://www.incyte.com/proteomics/tour/analysis.shtml

Page 23

Protein Characterization

• Proteins are excised from the gel and treated with a succession of enzymes that cut amino acid chains into short polypeptides about 5-10 amino acids in length.

• The polypeptide fragments for each protein are then separated by capillary electrophoresis and analyzed using rapid-throughput mass spectrometry.

• At this point, we know the amino acid sequence of the polypeptide fragments, their mass, as well as post-translational modifications that occurred such as glycosylation and phosphorylation.

Page 24

Protein Characterization

Page 25

Systems Biology

• Consolidates genomics and proteomics differential expression data into a systematic description of pathways.
  – Signaling pathways.
  – Inflammatory response pathways.
  – Metabolic pathways.
  – Etc…

• Potential for understanding the interrelationships between genes, proteins, and disease and identifying potential therapeutic targets.

Page 26

Gene Expression vs. Protein Abundance

• What exactly are we measuring?

• What is the relationship between
  – “level of gene expression” and
  – “abundance of proteins”?

Page 27

Dogma of Molecular Biology

Page 28

Balance equations

$\dot{X}(t) = R(t) - T(t); \quad T(t) = \alpha X(t)$ [assembly line]

$\dot{P}(t) = T(t) - E(t); \quad E(t) = \beta P(t)$ [turnover rate]

$\dot{X}(t) = R(t) - \alpha X(t)$

$\dot{P}(t) = \alpha X(t) - \beta P(t)$

In the steady state, for a given gene $i$:

$R_i = \alpha_i X_i, \quad \alpha_i X_i = \beta_i P_i \;\Rightarrow\; P_i / R_i = 1/\beta_i$
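The steady-state claim can be explored numerically with a short Python sketch. It assumes the balance equations take the first-order form X'(t) = R − αX(t) and P'(t) = αX(t) − βP(t), with a constant input R. The names `alpha` and `beta` for the assembly-line and turnover rate constants, the parameter values, and the simple Euler integration scheme are all assumptions made for illustration.

```python
def simulate(R, alpha, beta, dt=0.01, steps=10_000):
    """Euler integration of X' = R - alpha*X and P' = alpha*X - beta*P,
    starting from X = P = 0. Returns the final (X, P)."""
    X, P = 0.0, 0.0
    for _ in range(steps):
        dX = R - alpha * X          # input minus flow into the next stage
        dP = alpha * X - beta * P   # production minus first-order elimination
        X += dt * dX
        P += dt * dP
    return X, P


# Hypothetical constants, in arbitrary units.
R, alpha, beta = 2.0, 0.5, 0.25

X_end, P_end = simulate(R, alpha, beta)
print(X_end, P_end, P_end / R)
```

Under these assumptions the simulated values settle near X = R/α and P = R/β, so the ratio P/R approaches 1/β: in this idealized model the steady-state product abundance is proportional to the input rate.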

Page 29

Complicating Factors

• For any gene, product (protein) abundance is not necessarily proportional to the relative expression level, even under “steady state”.

• Products do not follow first-order elimination kinetics. Instead they enter into complicated interactions with each other and with external factors.

Page 30

Application: Identification of Gene Networks

General ideas:

• Level of expression of a gene affects levels of expression of other genes

• Only three levels possible:
  – Normal (0)
  – Over-expression (1)
  – Under-expression (-1)

• Data: Arrays of perturbed expression levels in a set of genes

• Model: Perceptron (simplest neural net)
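The three-level coding above can be sketched in Python as follows. The slide does not say how the levels are assigned, so the rule used here (a symmetric cutoff on a log ratio) and the cutoff value are assumptions, and the function name `quantize` is hypothetical.

```python
def quantize(log_ratio, threshold=1.0):
    """Map a log expression ratio to 1 (over-expression),
    -1 (under-expression) or 0 (normal)."""
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


# Hypothetical log2 ratios for four genes in one array.
log_ratios = [2.3, -0.2, -1.7, 0.9]

levels = [quantize(v) for v in log_ratios]
print(levels)  # [1, 0, -1, 0]
```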

Page 31

Reference

Kim et al. (2000)

“General nonlinear framework for the analysis of gene interaction via multivariate expression arrays”

Journal of Biomedical Optics 5, 411–424

Page 32
Page 33

• Data table

• Perceptron function: g(.) is sigmoidal, X’s and Y quantized to 3 levels

$Y_{pred} = g(a_1 X_1 + a_2 X_2 + \cdots + a_m X_m + b)$
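A minimal Python sketch of such a perceptron is given below, assuming the prediction is a sigmoidal function g applied to a weighted sum of the inputs plus an offset b. The slide only says that g is sigmoidal; using the hyperbolic tangent, the cutoff that maps the output back to three levels, and the coefficient values are all assumptions made for illustration.

```python
import math


def perceptron_predict(x, a, b):
    """g(a_1*x_1 + ... + a_m*x_m + b) with g = tanh (an assumed sigmoid)."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    return math.tanh(sum(ai * xi for ai, xi in zip(a, x)) + b)


def to_level(y, cut=0.5):
    """Quantize a prediction in (-1, 1) to the levels -1, 0, 1."""
    if y >= cut:
        return 1
    if y <= -cut:
        return -1
    return 0


# Hypothetical coefficients for two predictor genes.
a, b = [1.5, -2.0], 0.0

for x in ([1, -1], [0, 0], [-1, 1]):
    print(x, to_level(perceptron_predict(x, a, b)))
```

With these coefficients the target gene is predicted to be over-expressed when the first predictor is up and the second is down, and under-expressed in the opposite case.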

Page 34

• Training: Estimating coefficients a so that a coefficient of determination ($\theta$) is maximized.

• Of all possible dependencies, only those with $\theta$ above threshold are retained.

$\theta = (\varepsilon_0 - \varepsilon_{opt}) / \varepsilon_0$
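The coefficient of determination can be illustrated with a small Python sketch. It assumes the form θ = (ε0 − ε_opt)/ε0, where ε0 is the error of the best prediction made without any predictor genes and ε_opt is the error of the trained predictor. Measuring error as mean squared error, taking the baseline as the best constant among the three levels, and the data values are assumptions made for illustration.

```python
def mse(predicted, observed):
    """Mean squared error between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def coefficient_of_determination(eps0, eps_opt):
    """Relative error reduction achieved by the predictor over the baseline."""
    if eps0 <= 0:
        raise ValueError("baseline error must be positive")
    return (eps0 - eps_opt) / eps0


# Hypothetical quantized target levels and perceptron predictions.
observed = [1, 0, -1, 1]
predicted = [1, 0, -1, 0]

# Baseline: the best single constant level, chosen without predictors.
eps0 = min(mse([c] * len(observed), observed) for c in (-1, 0, 1))
eps_opt = mse(predicted, observed)

theta = coefficient_of_determination(eps0, eps_opt)
print(eps0, eps_opt, theta)
```

A value of θ near 1 means the predictor genes remove most of the baseline error, while a value near 0 means they add little; a threshold on θ then decides which dependencies are kept.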

Page 35
Page 36

Application: FGF-2 Signaling Pathways and Breast Cancer

General ideas:

• Use 2-D protein gels and mass spectrometry to measure abundance changes of proteins in cancer cells, relative to normal cells.

• Use perturbed systems to draw conclusions on some specific signaling pathways.

• Example: Signaling pathways of one of the fibroblast growth factors (FGF-2) in breast cancer.

Page 37

Reference

Hondermarck et al. (2001)

“Proteomics of breast cancer for marker discovery and signal pathway profiling”

Proteomics 1, 1216–1232

Page 38

Figure 2.

Silver stained 2-DE profile of MCF-7 breast cancer cells. The major proteins were determined by MALDI-TOF and MS/MS after trypsin digestion.

Page 39

Figure 3

MALDI-TOF and MS/MS spectra obtained for HSP70. (A) MALDI-TOF and (B) MS/MS analysis of peak m/z 1488.5 was performed. The letters labeling the peaks are the single letter code for the amino acids identified by MS/MS. Database searching allowed the identification of HSP70.

Page 40

Figure 5

2-D patterns showing the down-regulation of 14-3-3 sigma (indicated by an arrow) in seven representative breast tumor samples (C–I)

Page 41

Design of experiments

• Previously depicted: “abundance proteomics”, no clues as to how things work.

• “Functional proteomics”
  – Use perturbations of the hypothetical causal factor.
  – Measure not simply abundance but characteristics indicating, e.g.,
    – Synthesis rates
    – Activation

Page 42
Page 43

Figure 7

Changes of protein synthesis induced by FGF-2 stimulation in MCF-7 breast cancer cells. 35S-labeled proteins from unstimulated (A, C) or stimulated (B, D) MCF-7 cells were separated by 2-DE and 2-D gels were subjected to autoradiography.

Page 44
Page 45

Credits

• Bruce Luxon (UTMB, Galveston, TX)

• George Weinstock (BCM, Houston, TX)

• Guy de Maupassant [“three major virtues of a French writer: clarity, clarity, and clarity”]