Introduction to Protein Translation, Databases and Structural Alignment

BMI 730

Victor Jin

Department of Biomedical Informatics, Ohio State University



Review of Protein Function and Translation

Database and Software

3-D Alignment

Review of Protein Function and Translation

Protein function

Proteins are the basic building blocks of every cellular structure, from the smallest membrane-bound receptor to the largest organelle.

Proteins are involved in all processes inside a cell:
a) Gene regulation
b) Metabolism
c) Signalling
d) Development
e) Structure

Proteins serve crucial roles in a cell

Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.

Transport: Some proteins transport substances such as oxygen and ions. Example: haemoglobin carries oxygen.

Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.

Eukaryotic Translation

Translation of mRNA is highly regulated in multicellular eukaryotic organisms, whereas in prokaryotes regulation occurs mainly at the level of transcription.

There is global regulation of protein synthesis. For example, protein synthesis may be regulated in relation to the cell cycle or in response to cellular stresses such as starvation or accumulation of unfolded proteins in the endoplasmic reticulum.

Mechanisms include regulation by signal-activated phosphorylation or dephosphorylation of initiation and elongation factors.

microRNA

Translation of particular mRNAs may be inhibited by small single-stranded microRNA molecules about 20-22 nucleotides long.

MicroRNAs bind via base-pairing to the 3' untranslated regions of mRNA, together with the protein complex RISC (RNA-induced silencing complex), inhibiting translation and in some cases promoting mRNA degradation.

Tissue-specific expression of particular genome-encoded microRNAs is an essential regulatory mechanism controlling embryonic development.

Some forms of cancer are associated with altered expression of microRNAs that regulate synthesis of proteins relevant to cell cycle progression or apoptosis.

Protein factors

Protein factors that mediate and control translation are more numerous in eukaryotes than in prokaryotes.

Eukaryotic factors are designated with the prefix "e".

Some factors are highly conserved across kingdoms. For example, the eukaryotic elongation factor eEF1A is structurally and functionally similar to the prokaryotic EF-Tu (EF1A).

In contrast, eEF1B, the eukaryotic equivalent of the guanine nucleotide exchange factor (GEF) EF-Ts, is relatively complex, having multiple subunits subject to regulatory phosphorylation.

Initiation

Initiation of protein synthesis is much more complex in eukaryotes and requires a large number of protein factors.

Some eukaryotic initiation factors (e.g., eIF3 and eIF4G) serve as scaffolds, with multiple domains that bind other proteins during assembly of large initiation complexes.

Pre-initiation complex

Usually a pre-initiation complex forms, including:
• several initiation factors
• the small ribosomal subunit
• the loaded initiator tRNA, Met-tRNAi^Met

This then binds to a separate complex that includes:
• mRNA
• initiation factors, including ones that interact with the 5' methylguanosine cap and the 3' poly-A tail, structures unique to eukaryotic mRNA

Within this complex, mRNA is thought to circularize via interactions between factors that associate with the 5' cap and with a poly-A binding protein.

Translocation

After the initiation complex assembles, it translocates along the mRNA in a process called scanning until the initiation codon is reached.

Scanning is facilitated by eukaryotic initiation factor eIF4A, which functions as an ATP-dependent helicase to unwind mRNA secondary structure while releasing bound proteins.

A short sequence of bases adjacent to the AUG initiation codon (the Kozak consensus sequence) may aid in recognition of the start site.

After the initiation codon is recognized, GTP is hydrolyzed and initiation factors are released as the large ribosomal subunit joins the complex and elongation commences.
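The scanning idea can be illustrated with a short Python sketch. It is a deliberate simplification, not a model of the real machinery: it walks the mRNA from the 5' end, stops at the first AUG, and translates in frame up to the first stop codon using the standard genetic code. It ignores the sequence context around the AUG, the initiation factors, and mRNA secondary structure. The function names and the example mRNA are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_start(mrna):
    """Return the index of the first AUG from the 5' end, or -1 if none."""
    return mrna.find("AUG")


def translate_from_first_aug(mrna):
    """Translate in frame from the first AUG until a stop codon or the end."""
    start = find_start(mrna)
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: UAA, UAG or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' leader, AUG, three sense codons, stop, 3' tail.
example_mrna = "GCCACCAUGGCUUGGAAAUAAGGC"
peptide = translate_from_first_aug(example_mrna)
print(find_start(example_mrna), peptide)
```

The input is assumed to be an uppercase RNA string containing only A, C, G and U.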


Protein Translation Demo

Database and Software

Protein Databases

• UniProt is the universal protein database, a central repository of protein data created by combining Swiss-Prot, TrEMBL and PIR. This makes it the world's most comprehensive resource on protein information.
• The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies.
• Swiss-Prot is a curated biological database of protein sequences from different species, created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute.
• Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families.
• PDB
• NCBI
• http://proteome.nih.gov/links.html


PubMed – Protein Databases

The Protein database contains sequence data from the translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences submitted to Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) (sequences from solved structures).

The Structure database or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). The NCBI has cross-linked structural data to bibliographic information, to the sequence databases, and to the NCBI taxonomy. Use Cn3D, the NCBI 3D structure viewer, for easy interactive visualization of molecular structures from Entrez.

Tutorial: http://www.pdb.org/pdbstatic/tutorials/tutorial.html

Example – UniProt and ExPASy

http://www.uniprot.org/
http://www.expasy.org/

Example – PDB

http://www.pdb.org

Only proteins with known structures are included.
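Because PDB entries are distributed as fixed-column text records, the alpha-carbon coordinates used later for structural alignment can be pulled out with a few lines of Python. The sketch below assumes the legacy PDB file format, in which ATOM records keep the atom name in columns 13-16, the residue name in 18-20, the chain in 22, the residue number in 23-26, and x, y, z in columns 31-54. The function name and the sample records are made up for illustration and do not come from a real entry.

```python
def parse_ca_atoms(pdb_text):
    """Return (chain, residue_number, residue_name, (x, y, z)) for each CA atom."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only ATOM records; HETATM calcium ions also use the name "CA".
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            residue_name = line[17:20]
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, residue_number, residue_name, xyz))
    return atoms


# Invented records laid out in legacy PDB columns (not a real structure).
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      10.000  20.000   3.000  1.00 10.00           N
ATOM      2  CA  MET A   1      11.200  21.100   3.500  1.00 10.00           C
ATOM      3  C   MET A   1      12.500  20.400   3.900  1.00 10.00           C
ATOM      9  N   GLN A   2      13.600  21.200   4.000  1.00 10.00           N
ATOM     10  CA  GLN A   2      14.800  22.300   4.100  1.00 10.00           C
"""

ca_atoms = parse_ca_atoms(SAMPLE_PDB)
for atom in ca_atoms:
    print(atom)
```

Slicing by column rather than splitting on whitespace matters here, since adjacent fields in real files can run together without a separating space.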


Protein Visualization Software

• Cn3D
• RasMol
• TOPS
• Chime
• DSSP
• Molscript
• Ribbons
• MSMS
• Surfnet
• …

Cn3D

3-D Alignment

Why Align Structures

1. For homologous proteins (similar ancestry), this provides the “gold standard” for sequence alignment – elucidates the common ancestry of the proteins.

2. For nonhomologous proteins, allows us to identify common substructures of interest.

3. Allows us to classify proteins into clusters, based on structural similarity.

Example of Structural Homologs

Sequence alignment

Seq 1: SLSAAEADLAGKSWAPVFANKNANGLDFLVALFEKFPDSANFFADFK-GKSVADIKA-S
Seq 2: VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG

Seq 1: PKLRDVSSRIFTRLNEFVNNAANAGKMSAMLSQFAKEHVGFGVGSAQFENVRSMFPGFVA
Seq 2: KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP

Structural alignment

Seq 1: XSLSAAEADLAGKSW-APVFANKN-ANGLDFLVALFEKFPDSANFF-ADFKGKSVA--DIK
Seq 2: V-LSPADKTNVKAAWGK-VGAHA-GEYGAEALERMFLSFPTTKTYFPHF-------DLS-H

Seq 1: ASPKLRDVSSRIFTRLNEFVNNAANAGKMSA-MLSQ-FAKEHV-GFGVGSAQFENVRSM-F
Seq 2: GSAQVKGHGKKVADALTNAVAHV-D--DMPNAL--SALSDLHAHKLRVDPVNFKLLS-HCL

Seq 1: PGFVA
Seq 2: LVTLAAHLPAEFTP

Sequence/Structure Homology

• The existence of large numbers of remote homologs shows us that true structural similarity is hard to see in the primary amino acid sequence.

• Structural conservation is stronger than sequence conservation.

Remote Homology

• Remote homologs sometimes conserve function (all SH3-like domains bind peptides) and often conserve active-site locations (the active sites of TIM barrels are at the ends of the barrels).

• Remote homologs probably are evolutionarily related and fold using the same folding pathway.


Example of Structural Homologs

4DFR: Dihydrofolate reductase

1YAC: Octameric Hydrolase of Unknown Specificity

5.9% sequence identity (best alignment)

1YAC structure solved without knowing function.

Alignment to 4DFR and others implies it is a hydrolase of some sort.

[Figure: structures of DHFR (yellow and orange) and 1YAC (green and purple), shown with sheets only and with helices only]

Sander-Schneider Relationship

- “Naturally occurring sequences with more than 25% sequence identity over 80 or more residues always adopt the same basic structure.”

- It holds for the naturally occurring proteins of known structure seen so far, with a few exceptions.

- It is the basis of comparative modeling: the structural similarity implied by the relationship provides a means to predict structure.
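The 25%-over-80-residues rule of thumb can be checked mechanically once two sequences are aligned. The Python sketch below makes one assumption that the slides do not spell out: identity is computed over aligned residue pairs only, so columns containing a gap are ignored. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned residue pairs)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0, 0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs), len(pairs)


def meets_sander_schneider(aligned_a, aligned_b,
                           min_identity=25.0, min_length=80):
    """True if identity exceeds min_identity over at least min_length pairs."""
    identity, n_pairs = percent_identity(aligned_a, aligned_b)
    return identity > min_identity and n_pairs >= min_length


identity, n_pairs = percent_identity("AC-GT", "ACAGA")
print(identity, n_pairs)
```

Other denominators (alignment length including gaps, or the length of the shorter sequence) are also in common use and give different percentages for the same alignment.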

How to Align Structures

1. Visual inspection (by eye)

2. Computational approach
• Point-based methods use point distances and other properties to establish correspondences.
• Secondary structure-based methods use vectors representing secondary structures to establish correspondences.


Global versus Local

Global alignment


Local Alignment

motif


Structural Alignment Algorithms

Alignment algorithms create a one-to-one mapping of subset(s) of one sequence to subset(s) of another sequence.

Structure-based alignment algorithms do this by minimizing the structure difference score or root-mean-square difference (rmsd) in alpha-carbon positions.

The problem is that we don't know the alignment in advance.

Structure-based alignment programs determine the alignment that minimizes the rmsd.
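For a fixed correspondence, the rmsd itself is simple to compute. The following Python sketch assumes that the residue pairing is already known and that the two coordinate sets are already in the same frame, so it performs no superposition and no search over alignments. The function name and coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3-D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Two made-up alpha-carbon traces; only the second point differs (by 2 in z).
trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(trace_a, trace_b))
```

A structural alignment program has the harder job of choosing which residues to pair so that this value is small while the number of paired residues stays large.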

Evaluating Structural Alignments

• Number of aligned residues
• Percent identity in aligned residues
• Number of gaps
• Size of the two proteins
• Conservation of known active site environments
• RMSD (root mean square deviation) of corresponding residues
• Dihedral angle difference
• …

• No universal criterion
• Application dependent

Least Squares Superposition

Problem: find the rotation matrix R and the translation vector v that minimize the following quantity:

E = \sum_{i=1}^{N} \lVert R x_i + v - y_i \rVert^2

where x_i are the coordinates from one molecule and y_i are the equivalent* coordinates from the other molecule.

*equivalent based on the alignment
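To make the least-squares problem concrete, here is a Python sketch restricted to two dimensions, where the best rotation has a closed form. It minimizes the sum of squared distances between R x_i + v and y_i. This is only an illustration: real protein superposition is done in 3-D, typically with an SVD-based or quaternion-based method, neither of which is shown here. The function names and test points are invented.

```python
import math


def centroid(points):
    """Mean position of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def superpose_2d(x, y):
    """Return (theta, v, rmsd) minimising sum |R(theta) x_i + v - y_i|^2."""
    cx, cy = centroid(x), centroid(y)
    xc = [(p[0] - cx[0], p[1] - cx[1]) for p in x]
    yc = [(p[0] - cy[0], p[1] - cy[1]) for p in y]
    # After centring, the optimal angle follows from summed dot and cross products.
    dot = sum(a[0] * b[0] + a[1] * b[1] for a, b in zip(xc, yc))
    cross = sum(a[0] * b[1] - a[1] * b[0] for a, b in zip(xc, yc))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Optimal translation maps the rotated centroid of x onto the centroid of y.
    v = (cy[0] - (c * cx[0] - s * cx[1]), cy[1] - (s * cx[0] + c * cx[1]))
    squared = 0.0
    for a, b in zip(x, y):
        rx = c * a[0] - s * a[1] + v[0]
        ry = s * a[0] + c * a[1] + v[1]
        squared += (rx - b[0]) ** 2 + (ry - b[1]) ** 2
    return theta, v, math.sqrt(squared / len(x))


# Build y by rotating x by 30 degrees and shifting it by (3, -1).
x_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
angle = math.radians(30.0)
y_points = [(math.cos(angle) * px - math.sin(angle) * py + 3.0,
             math.sin(angle) * px + math.cos(angle) * py - 1.0)
            for px, py in x_points]
theta, v, fit_rmsd = superpose_2d(x_points, y_points)
print(math.degrees(theta), v, fit_rmsd)
```

The same two-step structure carries over to 3-D: the translation is fixed by the centroids, and only the rotation needs a genuine optimization.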

Comparing dihedral angles

Torsion angles (φ, ψ) are:
- local by nature (error propagation)
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)

[Figure: effect of adding 1 degree to all torsion angles]
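The invariance property is easy to verify numerically. The Python sketch below computes the torsion angle defined by four consecutive points using the common atan2 formulation, then checks that translating or rotating all four points leaves the angle unchanged. The helper names and coordinates are made up, and the sign of the result depends on the convention chosen here.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
shifted = [(px + 5.0, py - 3.0, pz + 2.0) for px, py, pz in points]
rotated = [(-py, px, pz) for px, py, pz in points]  # 90 degrees about z

angle = dihedral(*points)
print(angle, dihedral(*shifted), dihedral(*rotated))
```

Cartesian coordinates, by contrast, change under every rigid motion, which is why comparing them requires a superposition step first.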

Structural Alignment Methods

• STRUCTAL [Levitt, Subbiah, Gerstein]
  • Dynamic programming with a distance metric
• DALI [Holm, Sander]
  • Analysis of distance maps
• LOCK [Singh, Brutlag]
  • Analysis of secondary structure vectors, followed by refinement with distances
• SSAP [Orengo and Taylor, 1989]
• VAST [Gibrat et al., 1996]
• CE [Shindyalov and Bourne, 1998]
• SSM [Krissinel and Henrick, 2004]
• …
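As a small illustration of the distance-map idea listed above, the Python sketch below builds the matrix of pairwise distances between alpha-carbon positions and compares two such matrices entry by entry. It is not the DALI algorithm, which searches for matching sub-matrices between two proteins; it only shows the data structure such methods start from. The function names and coordinates are invented.

```python
import math


def distance_map(coords):
    """Square matrix of pairwise Euclidean distances between 3-D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mean_abs_difference(map_a, map_b):
    """Mean absolute difference between two equal-sized distance maps."""
    n = len(map_a)
    total = sum(abs(map_a[i][j] - map_b[i][j])
                for i in range(n) for j in range(n))
    return total / (n * n)


# Made-up coordinates, plus a rigidly shifted copy of the same points.
trace = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
moved = [(px + 10.0, py - 2.0, pz + 7.0) for px, py, pz in trace]

dmap = distance_map(trace)
print(dmap[0][1], dmap[1][2], dmap[0][2])
print(mean_abs_difference(dmap, distance_map(moved)))
```

Because internal distances do not depend on where the molecule sits in space, two structures can be compared through their maps without superposing them first.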

Two Subproblems

• Find the correspondence set

• Find the alignment transform (protein superposition problem)

• Chicken-and-egg


DALI (Distance ALIgnment)

• DALI has been used to do an ALL vs. ALL comparison of proteins in the PDB, and to create a hierarchical clustering of families.

• http://www.ebi.ac.uk/dali/

• FSSP = fold classification based on structure-structure alignment of proteins

• http://ekhidna.biocenter.helsinki.fi/dali/start


VAST (Vector Alignment Search Tool)

• It places great emphasis on the definition of the threshold of significant structural similarity to avoid (many) similarities of small substructures that occur by chance in protein structure comparison.

• At the heart of VAST's significance calculation is definition of the "unit" of tertiary structure similarity as pairs of secondary structure elements (SSE's) that have similar type, relative orientation, and connectivity. In comparing two protein domains the most surprising substructure similarity is that where the sum of superposition scores across these "units" is greatest.

• http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/iucrabs.html#Ref_6

Exercises

• Look up human catalase at www.expasy.org. Find out:
  • How long is the protein chain? Where is its active site?
  • Is its 3D structure available? If so, how was it obtained?
  • How long is its longest helix, and where is it located?
• Look up PDB ID 1DGB in PDB. Find out:
  • What protein is it?
  • What is the resolution of its X-ray structure?
  • Visualize its structure using the tools provided on the PDB website (try them all).
• Look up PDB ID 1DGB in MMDB (PubMed Structure Database). Find out:
  • What is its MMDB ID?
  • Visualize its 3D structure using Cn3D. Export the images for different rendering effects (e.g., worm, spacefill).
  • Search its structure neighbors using VAST. How many neighbors are found for the entire chain?
• Perform a VAST search for 2CZU chain A.
  • View its alignment (in sequence) with 1X8P chain A, 1GKA chain B, and 1BJ7.
  • Compare the structure alignment results with sequence alignment results (using ClustalW).
  • View its alignment with 1X8P chain A in Cn3D.