© Paul Horton 2013
Recent advances in understanding of protein sub-cellular localisation signals
Paul Horton, Computational Biology Research Center, AIST, Japan
Talk Outline
• Brief Summary of Protein Sub-cellular Localization
• Brief Discussion of Prediction and Causality
• Nuclear Localization Signals
• Mitochondrial Localization Signals
– My group is working on:
• Matrix Targeting Signal Prediction
• Prediction of cleavage sites by mitochondrial peptidases (3 types)
• mRNA localization and co-translational translocation of mitochondrial proteins
– Saved for another 45 minute talk
• I'll be around this week. Ask me if you are interested!
Motivation
• Aberrant localization has been implicated in many diseases
– Will use Treacher Collins Syndrome gene as example
– Zellweger Syndrome
– ...
• Co-localization can help validate protein-protein interaction and other “omic” data
– co-localization is a necessary condition for biologically relevant interaction
Why Predict?
• Large scale data is available for some organisms or organelles
– yeast: Huh et al., Nature, 2003; Kumar et al., Genes & Dev., 2002
• but these data still appear to contain many artifacts (effects of tags, expression levels, etc.)
– Independent error estimates of around 20%: Nair & Rost, JMB, 2005; Heazlewood et al., Plant Cell, 2004.
• Most organisms have little or no direct experimental evidence for protein localization
Why not use sequence similarity alone?
• Orthologous proteins generally would be expected to have the same localization
• Degree of sequence similarity needed to give high confidence of co-localization is much higher than that needed for high confidence of similar 3D structure, Nair & Rost, Protein Science 2002.
• Situation is complicated because isoforms of the same protein may have different localization, Nakao et al. NAR, 2005.
Protein Subcellular Localization (aka Protein Sorting) in Eukaryotes
Nuclear encoded proteins are produced in the cytosol and generally require specific mechanisms to pass across membranes
The amino acid sequence of proteins contains much of the signal information which determines localization and also much non-causal information which correlates strongly with localization site in natural proteins
Translocation across membranes usually requires energy: GTP, ATP, proton gradient etc.
Organelles and Function
• cytosol: protein synthesis,...
• endoplasmic reticulum: membrane protein insertion, protein glycosylation, calcium sequestration
• Golgi body: modification (phosphorylation, removal/addition of sugar groups) of proteins and lipids
• mitochondria: aerobic respiration
• chloroplasts: photosynthesis,...
• lysosome: low pH hydrolysis
• peroxisome: fatty acid decomposition
• nucleus: transcription, handling of chromosomes...
Sorting Signals Often Compared to Postal Address
Interleukin 24 (IL-24)
E.R., Golgi, Vesicle, Extracellular Space
ER-Bound Ribosome
Do not return to sender!
Protein trafficking pathways
By Christine Suetterlin
darwin.bio.uci.edu/~bardwell/231B_2006_Suetterlin_Lec1.ppt
Representative N-terminal Sorting Signals
• Bacterial “signal peptide” exports proteins out of the cytoplasm.
• Eukaryotic “signal peptide”: N-terminal signal with a variable-length hydrophobic section; causes proteins to be co-translationally transported through (or into) the E.R. membrane
• Mitochondrial Targeting Sequence: roughly similar but often longer and somewhat less hydrophobic; can form a helix with positive charge on one side
• Chloroplast Targeting Sequence
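The idea of a helix with positive charge on one side can be made concrete with a helical hydrophobic moment. The sketch below is an illustration only and is not taken from the talk: it assumes an ideal α-helix (100° per residue) and uses the Kyte-Doolittle hydropathy scale. The function names and the example peptide are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumption: another scale could be substituted).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Mean helical hydrophobic moment of a peptide segment.

    Each residue contributes a vector whose length is its hydropathy and
    whose direction is its angular position around an ideal alpha-helix.
    A large value means hydrophobic and polar/charged residues sit on
    opposite faces of the helix.
    """
    step = math.radians(degrees_per_residue)
    x = sum(KYTE_DOOLITTLE[aa] * math.cos(i * step) for i, aa in enumerate(segment))
    y = sum(KYTE_DOOLITTLE[aa] * math.sin(i * step) for i, aa in enumerate(segment))
    return math.hypot(x, y) / len(segment)


def net_positive_charge(segment):
    """Count of Arg/Lys minus Asp/Glu, a crude proxy for net positive charge."""
    return sum(segment.count(aa) for aa in "RK") - sum(segment.count(aa) for aa in "DE")


if __name__ == "__main__":
    amphipathic = "RLLRRLLRRLLRRLLRRL"  # hypothetical peptide, not a real presequence
    uniform = "L" * 18                   # hydrophobic all the way around the helix
    print(hydrophobic_moment(amphipathic), net_positive_charge(amphipathic))
    print(hydrophobic_moment(uniform), net_positive_charge(uniform))
```

A uniformly hydrophobic segment gives a moment near zero, while a segment alternating Arg and Leu with roughly helical periodicity gives a large moment together with a high net positive charge.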
N-terminal signals cont.
• Signal peptide and related sorting signals all involve membrane translocation/insertion
– signals and receptors homologous to each other
• Cytosol --> E.R.
– co-translational
• Cytosol --> Mito or Chloro
– either co- or post-translational. Unfolded (by chaperones).
N-terminal signals largely independent of carrier protein
• Numerous experiments show signal peptides are generally interchangeable between different proteins
• Often cleaved
• Limited to first ~90 residues
• Cleavage, occurrence at the N-terminus, and co-translational recognition make signal peptides largely orthogonal to the rest of the protein
– but not the perfect separation of a postal address
Sequence Logo for Eukaryotic Signal Peptide
http://clc.bio.com, I think from SignalP paper, Nielsen et al.
C-terminal Sorting Signals
• KDEL (soluble) or KKXX (membrane protein) signal for E.R. retention
• SKL for peroxisomal targeting (soluble)
• NPIR vacuole
• LPXTG bacterial cell wall
• Y (or other aromatic residue) for β-barrels of Gram-negative bacterial outer membranes
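Three of the signals listed above (KDEL, KKXX, SKL) can be checked with a simple suffix test. The sketch below is a hedged illustration: it assumes each signal must sit at the extreme C-terminus, ignores sequence context, and uses hypothetical names (`C_TERMINAL_SIGNALS`, `c_terminal_signals`) and made-up example sequences.

```python
import re

# Illustrative patterns for three of the C-terminal signals listed above.
# Assumption: each signal is matched only at the very end of the sequence.
C_TERMINAL_SIGNALS = [
    ("E.R. retention (soluble), KDEL", re.compile(r"KDEL$")),
    ("E.R. retention (membrane protein), KKXX", re.compile(r"KK..$")),
    ("peroxisomal targeting (soluble), SKL", re.compile(r"SKL$")),
]


def c_terminal_signals(sequence):
    """Return the labels of all listed C-terminal signals the sequence ends with."""
    seq = sequence.upper()
    return [label for label, pattern in C_TERMINAL_SIGNALS if pattern.search(seq)]


if __name__ == "__main__":
    # Made-up sequences, used only to exercise the patterns.
    for example in ("MAAAKDEL", "MTTTKKSA", "MGGGSKL", "MAAA"):
        print(example, c_terminal_signals(example))
```

A match here is only a candidate. As the causality slides argue, a motif that correlates with a localization site is not by itself evidence that it is functional in a given protein.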
Internal Sorting Signals
• Nuclear Localization Signals occur on the surface of the folded protein (possibly after a conformational change) but can be anywhere in the 1D sequence
• There are others...
[Figure: fatty acid binding protein with NLS and NES, both closer in 3D than in 1D]
Two kinds of Correlation
• Non-causal
– Good for predictions of naturally occurring proteins
– Easy to obtain from localization site labeled sequences
– Includes much information from seq. similarity and amino acid content
• Causal
– Robust even when applied to artificial proteins
– Difficult for machine learning methods to distinguish from non-causal
– Mutational analysis results may be useful here, but not found in Uniprot...
(non)-Causal Correlations Example
[Diagram nodes: NLS, Zn Finger, Nuclear Localization, DNA Binding, Appropriate Transcription Regulation, High Evolutionary Fitness]
(non)-Causal Correlations Example Revisited
[Diagram nodes: NLS and DNA Binding Region (often overlap, e.g. Zn Finger), Nuclear Localization, Nuclear Retention?, DNA Binding, Transcription Regulation, Evolutionary Fitness]
Summary of Causality Discussion
• Surprisingly easy to overlook or confuse causality issues
• Whether heavy reliance on non-causal correlation is okay depends on the application
Talk Outline
• Brief Summary of Protein Sub-cellular Localization
• Brief Discussion of Prediction and Causality
• Nuclear Localization Signals
– Novel predictor for nuclear export signals
– Work on identifying cargo-carrier and carrier-signal relationships for nuclear import
NES predictor: NESsential
“Prediction of leucine-rich nuclear export signal containing proteins with NESsential”, Szu-Chin Fu, Kenichiro Imai, Paul Horton, Nucleic Acids Research, online, June 24, 2011.
http://seq.cbrc.jp/NESsential/
Note to US passport control: This has nothing to do with exporting nuclear weapons!!
Why does it take so long to get a US travel visa...
Rated as “must read” by F1000
"ValidNESs: a database of validated leucine-rich nuclear export signals", Szu-Chin Fu et al., Nucleic Acids Research Jan;41(Database issue):D338-43, 2013.
Background on NES's
● We focus on the classical or “leucine-rich Nuclear Export Signal”
● Export signal to move proteins out of the nucleus
● Recognized by a protein called CRM1 (exportin 1)
● NES found in many viral proteins (e.g. HIV Rev2, Influenza A NS2 protein) and oncogenes (e.g. P53, BRCA1, Survivin, nucleophosmin)
Leucine-rich Nuclear Export Signal (NES)
[Diagram: 2-way traffic through the Nuclear Pore Complex between nucleus and cytosol; the Exportin-1/CRM1 mediated export pathway, showing CRM1 (Exportin 1), Ran GTP, and an NES-containing protein with a 10~12-mer NES between its N and C termini]
e.g. NES of HIV-1 REV: LPPLERLTL; NES of MAPKK: LQKKLEELEL
First proposed consensus pattern: L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI]
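The consensus pattern above is written in PROSITE-like notation and translates directly into a regular expression, with `x(2,3)` becoming `.{2,3}`. The sketch below is a minimal illustration, not code from NESsential or NetNES. The names `NES_CONSENSUS` and `find_consensus_nes` are hypothetical.

```python
import re

# L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI] translated to a regular expression.
NES_CONSENSUS = re.compile(r"L.{2,3}[LIVFM].{2,3}L.[LI]")


def find_consensus_nes(sequence):
    """Return (start, matched_text) for non-overlapping consensus matches."""
    return [(m.start(), m.group()) for m in NES_CONSENSUS.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_consensus_nes("LPPLERLTL"))   # HIV-1 REV NES from the slide
    print(find_consensus_nes("LQKKLEELEL"))  # MAPKK NES from the slide
```

Both example NESs from the slide match the pattern over their full length.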
Nucleophosmin & Leukemia -- a motivating example --
Investigate how the acquisition of classical nuclear export signals in Nucleophosmin occurs in acute myeloid leukemia.
● Nucleophosmin is a multifunctional phosphoprotein which normally localizes mainly to the nucleolus
● ~30% of de novo acute myeloid leukemia (AML) cases carry NPM1 gene mutations that cause aberrant nucleophosmin accumulation in leukemic cell cytoplasm
Nucleophosmin localization features (record from ValidNESs)
Weak, wild-type NES's
...WQW... Both W's bind to nucleoli
Common Nucleophosmin mutation in AML patients creates NES
Falini et al., N Engl J Med, 352:254-66, 2005.
Duplication of 4 bases causes C-terminal frameshift
Deletes W's and creates new NES signal, e.g. LclaVeeVsL
Mechanism of altered localization of mutant nucleophosmin
The de novo NES plays a key role in the etiology of acute myeloid leukemia!
Bolli et al. Cancer Res. 67:6230-7, 2007.
The Exportin-1/CRM1 mediated export pathway: the major export pathway, with a broad range of substrates.
TRENDS in Cell Biology, 15:3 2005.
100+ proteins had been verified (now 200+)…
• Which contain leucine-rich NES’s
• and are exported by the Exportin-1/CRM1-mediated export pathway
NESbase (la Cour et al. 2003)
• Database of NES-containing proteins (Nucleic Acids Res 2003, 31:393-396.)
• Containing 64 NES-containing proteins with experimental data on CRM1 (Exportin 1) dependency
NESbase has not been updated since 2003!
NetNES web server (la Cour et al. 2004)
• Web prediction server of NES’s (Protein Engineering Design and Selection 2004, 17:527-536.)
• Trained on NES-containing proteins in NESbase
• Using a combination of neural networks and hidden Markov models
Tested on only 5 independent NES-containing proteins discovered in 2004!
NetNES server is the only predictor currently available, but a license is required for the standalone version.
NES had been neglected!
Project Goals
● Provide an updated NES dataset
● We collected 70 proteins, 85 sites
– Later expanded to 221 proteins, 262 sites (ValidNESs)
● Provide an open source predictor which can effectively be used to screen proteomes for promising new NES's
● seq.cbrc.jp/NESsential/
Sequence logos of NES's
[Sequence logos: 6-mer pattern, 154 seqs; 7-mer pattern, 114 seqs.]
Characteristic pattern of hydrophobic residues O..O.O or O...O.O, where 'O' is a hydrophobic residue [LIVMF] and '.' is any residue. This pattern is sometimes preceded by another upstream hydrophobic residue.
NES site prediction as binary classification problem
Although several exceptions exist, most confirmed NES's match either a 6-mer O..O.O or a 7-mer O...O.O consensus pattern, where O ∈ {L,I,V,F,M} is a hydrophobic residue.
Prediction of NES sites can be formulated as a binary classification problem: is a given position in a protein the start of an NES or not?
However, the ratio of false to true examples is extremely high, around 100:1 even for NES-containing examples, and the boundaries of NES sites are not always well defined.
We alleviate those problems by assuming that NES sites always match the consensus pattern (at the cost of having no hope of predicting exceptional NES's).
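Restricting the classification problem to consensus-matching positions can be sketched as follows. This is an illustration under stated assumptions, not the NESsential implementation: candidates are enumerated with regular expressions, and a candidate is labeled positive only if it starts exactly at an annotated NES start. The names `candidate_sites` and `label_candidates` are hypothetical.

```python
import re

HYDROPHOBIC = "LIVFM"  # 'O' in the slide's notation

# A lookahead with a capture group reports overlapping candidates too.
PATTERNS = {
    "6-mer": re.compile(rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}].[{HYDROPHOBIC}]))"),
    "7-mer": re.compile(rf"(?=([{HYDROPHOBIC}]...[{HYDROPHOBIC}].[{HYDROPHOBIC}]))"),
}


def candidate_sites(sequence):
    """List every (start, kind, text) where the sequence matches O..O.O or O...O.O."""
    seq = sequence.upper()
    hits = []
    for kind, pattern in PATTERNS.items():
        hits.extend((m.start(), kind, m.group(1)) for m in pattern.finditer(seq))
    return sorted(hits)


def label_candidates(sequence, true_starts):
    """Attach a binary label: 1 if the candidate starts at an annotated NES start, else 0.

    Assumption: annotations are given as 0-based start positions of the
    consensus match; real annotations may have fuzzier boundaries.
    """
    return [(start, kind, text, int(start in true_starts))
            for start, kind, text in candidate_sites(sequence)]


if __name__ == "__main__":
    print(candidate_sites("LPPLERLTL"))    # HIV-1 REV NES from an earlier slide
    print(candidate_sites("LCLAVEEVSL"))   # mutant nucleophosmin NES, upper-cased
```

Only the enumerated candidates are passed to a classifier, which shrinks the negative set at the cost of never recovering NES's that do not match either pattern.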
The regions surrounding NES's have a tendency to be disordered
[Plots: 6-mer and 7-mer panels. Error bars represent standard error, not standard deviation.]
Prediction by POODLE-L (Hirose et al. 2007) and DISOPRED (Ward et al. 2004)
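A profile of this kind, mean disorder score at each offset from the site start with standard-error bars, could be computed roughly as below. This is a sketch under assumptions: per-residue disorder scores are taken as already computed by a predictor such as POODLE-L or DISOPRED, and the function name, input layout, and default flank size are hypothetical.

```python
import math
import statistics


def disorder_profile(scores_by_protein, sites, flank=50):
    """Mean and standard error of the disorder score at each offset from a site start.

    scores_by_protein: dict mapping protein id -> list of per-residue disorder scores
    sites: list of (protein id, 0-based site start)
    Returns {offset: (mean, standard_error, n)}. Offsets that fall outside a
    protein are skipped, and offsets with fewer than 2 values are omitted.
    """
    columns = {offset: [] for offset in range(-flank, flank + 1)}
    for protein_id, start in sites:
        scores = scores_by_protein[protein_id]
        for offset in columns:
            pos = start + offset
            if 0 <= pos < len(scores):
                columns[offset].append(scores[pos])

    profile = {}
    for offset, values in columns.items():
        if len(values) >= 2:
            mean = statistics.fmean(values)
            # Standard error = sample standard deviation / sqrt(n),
            # so it shrinks as more sites are averaged.
            sem = statistics.stdev(values) / math.sqrt(len(values))
            profile[offset] = (mean, sem, len(values))
    return profile
```

The standard error describes the uncertainty of the mean, not the spread between individual sites, so the plotted bars are narrower than standard-deviation bars would be.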
NES's tend to be disordered over a long range (around 100 residues)
[Plots: 6-mer and 7-mer panels, POODLE-L prediction]
Some NES's appear disordered
● NES's more likely to be disordered
● The distribution for NES's may be bimodal
[Plots: 6-mer and 7-mer panels, POODLE-L prediction]
NESsential screening results
Top scoring NESsential proteins often are true NES's
Yeast proteins
Both methods trained on older database entries and tested on post-2003 entries
NESsential Conclusions
● True NES sites are significantly more likely to be disordered than spurious matches
● NESsential can be effective to screen proteomes for promising candidate novel NES's
● But even for NESsential, the coverage is quite low
● See NAR paper for the bad news...
Importins/Exportins (Karyopherins) cargo specificity
● In humans, the 21 importin-β family proteins transport proteins and RNA molecules across the nuclear pore complex
● Why so many kinds of carriers?
– Presumably for regulation
● The carrier(s) specific for most cargo proteins have not been clarified
● A small step in this direction
● Kimura et al., Molecular & Cellular Proteomics, 2012.
Experimental Screen for Transportin cargoes
Using mass spectrometry, light proteins imported from outside the nucleus can be distinguished from heavy proteins already there.
Schematic of NLS types and carriers
For clarity, I outline the results first.
PY-NLS is recognized by Trn (Transportin)
The “BIB-domain like” NLS is recognized by both Transportin and Importin-β. It has similar properties to the classical NLS recognized by Importin-α in an Importin-α:β heterodimer.
We confirmed the newly identified Trn cargoes include both PY-NLS's and BIB-domain like NLS's.