07/03/22 1 EECS 730 Introduction to Bioinformatics Introduction to Proteomics Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/


Page 1: EECS 730 Introduction to Bioinformatics Introduction to Proteomics

04/20/23 1

EECS 730
Introduction to Bioinformatics

Introduction to Proteomics

Luke Huan
Electrical Engineering and Computer Science

http://people.eecs.ku.edu/~jhuan/


Proteome: Protein complement of a genome

Time- and cell- specific protein complement of the genome.

Encompasses all proteins expressed in a cell at one time, including isoforms and post-translational modifications.


Proteome: contrast to genome

The genome is constant for one cell, essentially identical for all cells of an organism, and does not change very much within a species.

The proteome is very dynamic with time and in response to external factors, and differs substantially between cell types.

Variable
• In different cell and tissue types in the same organism
• In different growth and developmental stages of the organism

Dynamic
• Depends on the response of the genome to environmental factors
  • Disease state
  • Drug challenge
  • Growth conditions
  • Stress


Introduction to proteomics

Proteomics is the study of total protein complements (proteomes), e.g. from a given tissue or cell type.

Don’t forget that the proteome is dynamic, changing to reflect the environment that the cell is in.

Definitions
• Classical: restricted to large-scale analysis of gene products involving only proteins
• Inclusive: combination of protein studies with analyses that have genetic components, such as mRNA, genomics, and yeast two-hybrid

Examples of important proteomic questions:
1) What proteins are present?
2) What other proteins does a particular protein interact with (networks)?
3) What does a particular protein look like (structure)?


Genomics vs. proteomics

Genomics has provided spectacular amounts of data, but most of it remains uninterpretable at our current level of understanding.

In some ways, genomics raises more questions than it answers.

The emerging field of proteomics promises to answer some of those questions by systematically studying all of the proteins encoded by the genome.


1 gene = 1 protein?

One gene is no longer equal to one protein.
• In fact, the definition of a gene is debatable (ORF, promoter, pseudogene, gene product, etc.).

1 gene = how many proteins?
• There are only about 30,000 genes in the human genome, yet there are more than 100,000 proteins in the human proteome.
• Actually, cataloguing the human proteome requires much more than just 100K proteins.
• 30,000 genes x myriad of modifications >> 100K protein forms!
• Modifications include: alternative RNA splicing, chemical modifications, cleavage
• Chemical modifications include: phosphorylation, acetylation, glycosylation, and many more.
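The "genes x modifications >> 100K" argument can be made concrete with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that every gene yields a fixed number of splice isoforms and that each isoform has a few sites that are independently either modified or unmodified. The function name and the per-gene numbers are hypothetical, not measured values.

```python
def count_protein_forms(n_genes, isoforms_per_gene, sites_per_isoform):
    """Upper-bound count of distinct protein forms under a toy model.

    Assumes every site is independently either modified or unmodified,
    so each isoform contributes 2 ** sites_per_isoform forms.
    """
    forms_per_gene = isoforms_per_gene * 2 ** sites_per_isoform
    return n_genes * forms_per_gene


# Illustrative (hypothetical) parameters: 3 isoforms, 4 modifiable sites.
total = count_protein_forms(n_genes=30_000, isoforms_per_gene=3, sites_per_isoform=4)
print(f"{total:,} possible protein forms")
```

Even these modest assumed numbers push the count well past 100,000, which is the point of the slide.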


Why proteomics?

Annotation of genomes, i.e. functional annotation
• Genome + proteome = annotation

• Protein function
• Protein post-translational modification
• Protein localization and compartmentalization
• Protein-protein interactions
• Protein expression studies

Differential gene expression is not the answer


Microarray data doesn’t correlate perfectly with protein expression levels

Analysis of mRNA transcripts with microarrays has provided dynamic information about which genes are expressed in cells under a given set of experimental conditions, yielding clues as to which proteins are involved in certain pathways and disease states.

However, differences in the half-lives of RNA and proteins, as well as post-translational modifications important to protein function, prevent mRNA profiles from being perfectly correlated with the cells’ actual protein profiles.
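The half-life argument can be illustrated with a simple kinetic sketch. It assumes a first-order model, dP/dt = k_tl * mRNA - k_deg * P, which is a common textbook simplification and not something stated on the slide. The function name and all numbers are hypothetical.

```python
import math


def steady_state_protein(mrna_level, translation_rate, protein_half_life):
    """Steady-state protein level for dP/dt = k_tl * mRNA - k_deg * P."""
    k_deg = math.log(2) / protein_half_life  # first-order degradation rate
    return translation_rate * mrna_level / k_deg


# Two hypothetical genes with identical mRNA levels but different
# protein half-lives (hours); all numbers are illustrative only.
stable = steady_state_protein(mrna_level=10.0, translation_rate=2.0, protein_half_life=20.0)
unstable = steady_state_protein(mrna_level=10.0, translation_rate=2.0, protein_half_life=0.5)
print(f"stable/unstable protein ratio: {stable / unstable:.0f}")
```

Under this model two transcripts that look identical on a microarray can differ in protein abundance by the ratio of their protein half-lives.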


Introduction to proteomics

Composition of the proteome depends on cell type, developmental phase and conditions.

Proteome analyses still struggle to resolve the "basic proteome" of different cells and tissues, or to detect limited changes under changing conditions or during processes.

Current methods can only "see" the most abundant proteins.


Types of proteomics

Protein expression
• Quantitative study of protein expression between samples that differ by some variable

Structural proteomics
• Goal is to map out the 3-D structure of proteins and protein complexes

Functional proteomics
• To study protein-protein interactions, 3-D structures, cellular localization and PTMs in order to understand the physiological function of the whole proteome.


Large-scale protein analysis

• 2D protein gels
• Yeast two-hybrid
• Rosetta Stone approach
• Pathways


2D protein electrophoresis and mass spectrometry


Two-dimensional protein gels

First dimension: isoelectric focusing

Electrophorese ampholytes to establish a pH gradient

Can use a pre-made strip

Proteins migrate to their isoelectric point (pI), then stop (net charge is zero)

Range of pI typically 4-9 (5-8 most common)
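The statement that a protein stops where its net charge is zero can be sketched numerically. The example below estimates a peptide's pI by summing Henderson-Hasselbalch charges of its ionizable groups and bisecting on pH. The pKa values are approximate assumptions (published tables differ), the function names are hypothetical, and effects of folding or modifications are ignored.

```python
# Approximate pKa values (assumed; published tables differ slightly).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect on pH: net charge falls monotonically from pH 0 to pH 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged, pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical peptide, used only to exercise the functions.
print(round(isoelectric_point("ACDEFGHIK"), 2))
```

A lysine-rich peptide comes out with a higher pI than an aspartate-rich one, which is why the two would focus at opposite ends of the pH strip.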


Two-dimensional protein gels

Second dimension: SDS-PAGE

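Electrophorese proteins through an acrylamide matrix

Proteins are charged and migrate through an electric field: $v = \dfrac{Eq}{d \cdot 6\pi r \eta}$, where $v$ is the velocity, $E$ the applied voltage, $q$ the net charge, $d$ the distance between the electrodes, $r$ the radius of the molecule, and $\eta$ the viscosity of the medium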

Conditions are denaturing

Can resolve hundreds to thousands of proteins
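In practice, the second dimension is usually read as molecular weight by comparing a spot's relative mobility (Rf) with a ladder of marker proteins, using the roughly linear relation between log10(MW) and Rf. The sketch below fits that standard curve by least squares. The ladder values and function names are hypothetical and for illustration only.

```python
import math


def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    markers: list of (relative_mobility, molecular_weight_in_Da) pairs.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(rf, slope, intercept):
    """Molecular weight (Da) predicted for a band at relative mobility rf."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker ladder: (Rf, MW in Da); values are illustrative only.
ladder = [(0.15, 97_000), (0.35, 66_000), (0.55, 45_000), (0.75, 31_000), (0.90, 21_500)]
slope, intercept = fit_standard_curve(ladder)
print(f"Estimated MW at Rf 0.45: {estimate_mw(0.45, slope, intercept):,.0f} Da")
```

The fitted slope is negative: smaller proteins travel further through the gel.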


Proteins identified on 2D gels (IEF/SDS-PAGE)

Protein mass analysis by MALDI-TOF

-- done at core facilities
-- often detects post-translational modifications
-- MALDI-TOF: matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
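The "time-of-flight" part can be sketched with the idealized relation for a linear instrument: an ion of charge z accelerated through a voltage U gains kinetic energy z*e*U, so its flight time over a drift length L is t = L * sqrt(m / (2*z*e*U)). The voltage and tube length below are hypothetical defaults, and real instruments need calibration beyond this formula.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27  # kilograms


def flight_time(mass_da, charge, accel_voltage=20_000.0, tube_length=1.0):
    """Time (s) for an ion to cross a field-free drift tube.

    Uses z*e*U = (1/2)*m*v**2, so t = L * sqrt(m / (2*z*e*U)).
    """
    mass_kg = mass_da * DALTON
    energy = charge * ELEMENTARY_CHARGE * accel_voltage
    return tube_length * math.sqrt(mass_kg / (2.0 * energy))


def mz_from_flight_time(time_s, accel_voltage=20_000.0, tube_length=1.0):
    """Invert the relation above to recover m/z (Da per unit charge)."""
    mz_kg = 2.0 * ELEMENTARY_CHARGE * accel_voltage * (time_s / tube_length) ** 2
    return mz_kg / DALTON


# A hypothetical singly charged 1000 Da peptide ion.
t = flight_time(1000.0, 1)
print(f"flight time: {t * 1e6:.2f} microseconds, m/z: {mz_from_flight_time(t):.1f}")
```

Heavier ions arrive later, so the detector's arrival-time spectrum maps directly onto a mass spectrum.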


Evaluation of 2D gels (IEF/SDS-PAGE)

Advantages:
• Visualize hundreds to thousands of proteins
• Improved identification of protein spots

Disadvantages:
• Limited number of samples can be processed
• Mostly abundant proteins visualized
• Technically difficult
• Labor-intensive, not really "high-throughput" methods


Yeast-Two-hybrid (Y2H)

Aim: Identify pairwise physical interactions among proteins.

Solution: Use the transcription mechanism of the cell.


Yeast-two-hybrid: Principles

Recap of biology: Protein vs. domain

A protein is composed of modules or domains

Domains are individually folded units within the same protein chain.

The presence of multiple domains in a protein allows the protein to perform different functions.

The central dogma of biology: DNA → (transcription) → RNA → (translation) → protein

[Figure: two proteins, p1 with domains d1, d2, d3 and p2 with domains d4, d5]


Yeast-two-hybrid: Principles

Normal transcription requires both the DNA-binding domain (BD) and the activation domain (AD) of a transcriptional activator (TA).

Transcriptional activator (TA): a protein that is required to activate transcription
• A DNA-binding domain (BD): binds to DNA
• An activation domain (AD): activates transcription of the DNA


Yeast-two-hybrid: Principles

The binding domain and the activation domain do not necessarily have to be on the same protein.

In fact, a protein with a DNA-binding domain can activate transcription when simply bound to another protein containing an activation domain.

This principle forms the basis for the yeast two-hybrid technique.


Yeast-two-hybrid: Principles

Major components of a yeast two-hybrid experiment:
• Bait protein: the protein of interest (X), with a DNA-binding domain attached to its N-terminus
• Prey protein: its potential binding partner (Y), fused to an activation domain
• A reporter gene (R): a gene whose protein product can be easily detected and measured

Protein X interacts with protein Y
→ X and Y form a functional transcriptional activator
→ the reporter gene is transcribed
→ use the reporter produced as a measure of interaction between X and Y


Yeast two-hybrid transcription

The yeast two-hybrid technique measures protein-protein interactions by measuring transcription of a reporter gene. If protein X and protein Y interact, then the DNA-binding domain fused to X and the activation domain fused to Y are brought together to form a functional transcriptional activator (TA). The TA then activates transcription of the reporter gene located downstream of the promoter it binds.
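The readout logic described above can be sketched as a small boolean model: the reporter is on when bait and prey interact. The optional set of self-activating baits is an added assumption that models baits able to switch on the reporter by themselves. All names and the toy interaction set are hypothetical.

```python
def reporter_active(bait, prey, interactions, autoactivating_baits=frozenset()):
    """Return True if the reporter gene would be switched on.

    interactions: set of frozensets, each holding two interacting proteins.
    autoactivating_baits: baits that activate transcription on their own.
    """
    if bait in autoactivating_baits:
        return True  # reporter fires without any prey (a false positive)
    return frozenset((bait, prey)) in interactions


def screen(bait, prey_library, interactions, autoactivating_baits=frozenset()):
    """Test one bait against every prey and list the reporter-positive preys."""
    return [prey for prey in prey_library
            if reporter_active(bait, prey, interactions, autoactivating_baits)]


# Hypothetical proteins X, Y, Z, A; only X and Y truly interact.
true_interactions = {frozenset(("X", "Y"))}
print(screen("X", ["Y", "Z", "A"], true_interactions))
```

Storing each interaction as an unordered pair means the result does not depend on which partner is used as bait, which is an idealization: real screens are often asymmetric.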


Yeast two-hybrid screens
• Screen a library of proteins for potential binding partners
• Identify interacting proteins in a pairwise fashion
• Feasible at a large scale (genome scale)

[Figure: bait-prey model showing the bait protein with its binding domain, the prey protein with its activation domain, and the reporter gene; proteins are labeled X, Y, Z and A]


http://depts.washington.edu/sfields/


red = cellular role and subcellular localization of interacting proteins are identical; blue = localizations are identical; green = cellular roles are identical


Y2H
• Identifies proteins that are physically associated in vivo.
• Uses the yeast S. cerevisiae as a host

Disadvantage
• The fused proteins must be able to fold correctly and exist as stable proteins inside the yeast cells

Advantage
• Yeast is closer to higher eukaryotes than in vitro experiments or systems based on bacterial hosts

Weak and transient interactions
• Often the most interesting in signaling cascades
• Are more readily detected in two-hybrid, since the reporter gene strategy results in significant amplification.
• There is always a trade-off between the identification of weak interactions and the number of false positives


Low overlap among independent experiments

High false positives and false negatives in yeast two-hybrid data

Two sets of independent experiments:
• Ito et al., PNAS 1999
• Uetz et al., Nature 2000

                Uetz et al. only   Shared   Ito et al. only   Uetz et al. total   Ito et al. total   Overlap
Proteins              482            855         2422               1337                3277           <23%
Interactions         1244            201         4274               1445                4475           <4%
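The overlap percentages follow from the Venn-diagram counts if overlap is taken as shared items divided by the union of both data sets. That definition is an assumption, but it reproduces the "<23%" and "<4%" labels. The counts below are read as Uetz-only, shared, and Ito-only, which is consistent with the stated totals (482 + 855 = 1337, 2422 + 855 = 3277, 1244 + 201 = 1445, 4274 + 201 = 4475).

```python
def overlap_fraction(only_a, shared, only_b):
    """Shared items as a fraction of the union of two data sets."""
    return shared / (only_a + shared + only_b)


# Counts as read from the Venn diagrams (Uetz-only, shared, Ito-only).
proteins = overlap_fraction(482, 855, 2422)
interactions = overlap_fraction(1244, 201, 4274)
print(f"proteins: {proteins:.1%}, interactions: {interactions:.1%}")
```

The two screens agree far better on which proteins take part in interactions than on which pairs actually interact.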


False positives

Proteins with transcription activation activity (bait works by itself)

Proteins that normally never see each other (e.g. due to the time/space constraints) are expressed together and may be sticky

Proteins are expressed at high levels and this promotes promiscuous interaction

Another protein bridges the two interacting partners


False negatives

• Proteins become toxic upon expression in yeast
  • Proteins are toxic when expressed and targeted into the yeast nucleus.
  • Proteins proteolyse essential yeast proteins or proteins essential for the system, like the DNA-binding domain or the activation domain.
• Proteins don’t get into the nucleus (membrane proteins especially)
• Proteins are not modified correctly in the heterologous environment


Final Remark on Y2H

Although the outcome of a screening often results in many new hypotheses, they still need to be validated by other techniques.

There is enough reason to remain sceptical about two-hybrid screens, but the most convincing argument in favor of the two-hybrid approach is the number of interactions it can test and the speed with which it does so.

Referred to as functional screens
• Interacting proteins might give a functional hint if at least one of the partners has a known functional commitment in a well-understood signaling pathway.


Analysis of protein complexes

Aim: Identification of complexes and their subunits.

Solution: a two-step method
• Isolation of only relevant complexes
• Identification of complex units


Affinity chromatography/mass spec

Major methods
• High-throughput mass spectrometric protein complex identification (HMSPCI)
• Tandem affinity purification (TAP)

Again, bait-prey model
• Very sensitive method
• Identifies multi-protein complexes
  • Not really possible in yeast two-hybrid


Methods

1. Attach tags to bait proteins
• Introduce DNA encoding these into cells
• Cells express modified proteins
• Proteins form complexes with other proteins in vivo
• Cells have to express modified protein properly
• Tag can interfere with protein folding and function
• Overexpressed protein may be toxic to cell

[Figure: workflow with steps 1 to 9 (Kumar and Snyder, 2002)]


Methods

2. Bait proteins and associated proteins are precipitated on an affinity column
• Tag sticks to column along with protein complex
• Elute other proteins
• Elute tagged protein

3. Resolve proteins on an SDS-PAGE gel
• Separate by molecular weight (SDS gives proteins a roughly uniform charge-to-mass ratio)

4. Cut out protein bands
• Proteins of the same size will be in the same band

5. Digest protein bands with trypsin
• Results in segments of proteins (peptides)


Methods

Mass spectrometry to analyze protein composition:

6. Samples are vaporized and ionized
7. Ions enter the mass analyzer and are separated by mass-to-charge ratio
8. Ions are detected and a signal is generated
9. Compare signal to database to identify proteins in complex
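The trypsin digestion and database comparison steps can be sketched as an in-silico peptide mass fingerprint: digest a candidate protein sequence with the usual trypsin rule (cleave after K or R, but not before P), compute monoisotopic peptide masses, and look for matches to observed masses. The function names, example sequence and tolerance are hypothetical. Real search engines also handle missed cleavages, modifications and scoring, which this sketch ignores.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def match_peptides(observed_masses, protein_sequence, tolerance=0.01):
    """Return theoretical peptides whose mass matches an observed mass."""
    matches = []
    for peptide in tryptic_digest(protein_sequence):
        mass = peptide_mass(peptide)
        if any(abs(mass - observed) <= tolerance for observed in observed_masses):
            matches.append((peptide, round(mass, 4)))
    return matches


# Hypothetical database sequence and one "observed" mass taken from it.
candidate = "AAKGGRPLLKC"
print(tryptic_digest(candidate))
print(match_peptides([peptide_mass("AAK")], candidate))
```

A protein is called present when enough of its predicted peptide masses appear in the spectrum. Ranking candidates by the number of matches is the simplest possible scoring rule.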


Affinity chromatography/mass spec

Data on complexes deposited in databases

http://yeast.cellzome.com
http://www.bind.ca


Affinity chromatography/mass spec

False positives:
• Sticky proteins

[Figure labels: bait protein, GST]


Affinity chromatography/mass spec

False negatives:
• Bait must be properly localized and in its native condition
• Affinity tag may interfere with function
• Transient protein interactions may be missed
• Highly specific physiological conditions may be required
• Bias against hydrophobic and small proteins

[Figure labels: bait protein, GST]


The Rosetta Stone approach

Marcotte et al. (1999) and other groups hypothesized that some pairs of interacting proteins are encoded by two separate genes in many genomes, but are occasionally fused into a single gene.

By scanning many genomes for examples of "fused genes," several thousand protein-protein interaction predictions have been made.
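The fused-gene scan can be sketched as follows, assuming sequence alignments have already been computed elsewhere. Two proteins are predicted to interact when both align to the same "fused" protein in another genome, in regions that do not overlap. The input format, function name and coordinates are hypothetical. The gyrase and topoisomerase names come from the lecture's example, but the residue ranges are made up.

```python
from itertools import combinations


def rosetta_stone_pairs(hits, max_overlap=0):
    """Predict interacting pairs from alignments to fused proteins.

    hits: list of (query_protein, fused_protein, start, end), where start
    and end delimit the aligned region on the fused protein.
    Two queries are paired when they align to the same fused protein in
    regions overlapping by at most max_overlap residues.
    """
    by_fused = {}
    for query, fused, start, end in hits:
        by_fused.setdefault(fused, []).append((query, start, end))

    predictions = set()
    for fused, regions in by_fused.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            overlap = min(e1, e2) - max(s1, s2) + 1  # <= 0 when disjoint
            if q1 != q2 and overlap <= max_overlap:
                predictions.add((frozenset((q1, q2)), fused))
    return predictions


# Illustrative hits; the coordinates are invented for the example.
example_hits = [
    ("E. coli gyrase B", "yeast topoisomerase II", 1, 650),
    ("E. coli gyrase A", "yeast topoisomerase II", 700, 1400),
]
for pair, fused in rosetta_stone_pairs(example_hits):
    print(sorted(pair), "linked by", fused)
```

Requiring non-overlapping regions is what separates a genuine fusion of two different proteins from two homologs that simply hit the same domain.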


The Rosetta Stone approach

[Figure (Fig. 8.23, page 256): yeast topoisomerase II shown with E. coli gyrase B and E. coli gyrase A]


Function Prediction from Interaction

It is possible to deduce the functions of a protein from the functions of its interaction partners.

A difficult task:
• Within-class and cross-class interactions

Available methods based on protein interaction
• Neighbor counting method
• Methods based on χ2-statistics
• Markov random fields
• Simulated annealing
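Of the methods listed, neighbor counting is the simplest to sketch: an unannotated protein is assigned the function(s) that occur most often among its direct interaction partners. The version below is a minimal illustration with hypothetical protein names and annotations. Published variants differ in details such as tie-breaking and how many functions are reported.

```python
from collections import Counter


def predict_function(protein, interactions, annotations, top_k=1):
    """Neighbor counting: rank functions by frequency among partners.

    interactions: iterable of (protein_a, protein_b) pairs.
    annotations: dict mapping protein -> set of known functions.
    """
    neighbors = set()
    for a, b in interactions:
        if a == protein and b != protein:
            neighbors.add(b)
        elif b == protein and a != protein:
            neighbors.add(a)

    counts = Counter()
    for neighbor in neighbors:
        counts.update(annotations.get(neighbor, ()))
    return [function for function, _ in counts.most_common(top_k)]


# Hypothetical network: P is unannotated and has three partners.
network = [("P", "A"), ("P", "B"), ("C", "P"), ("A", "B")]
known = {"A": {"kinase"}, "B": {"kinase"}, "C": {"transport"}}
print(predict_function("P", network, known))
```

Because the prediction depends only on raw counts, a false-positive interaction has the same weight as a true one, which is one reason the statistical methods on the list were developed.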