This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Exploring Internal Symmetry and Structural Repeats with CE-Symm
Spencer Bliven1,2,*, Aleix Lafita1,3, Peter W. Rose4, Guido Capitani1,3, Philip E. Bourne2, Andreas Prlić4
1 Paul Scherrer Institute; 2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; 3 ETH Zürich; 4 RCSB Protein Data Bank, San Diego Supercomputer Center, University of California San Diego.
*firstname.lastname@example.org
Poster first presented at 3DSIG 2016 in Orlando, Florida. This research was supported by the Intramural Research Program of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health.
The RCSB PDB is supported by the National Science Foundation [NSF DBI 0829586]; National Institute of General Medical Sciences; Office of Science, Department of Energy; National Library of Medicine; National Cancer Institute; National Institute of Neurological Disorders and Stroke; and the National Institute of Diabetes & Digestive & Kidney Diseases. The RCSB PDB is a member of the wwPDB.
Abstract
Understanding the role and evolution of internal symmetry in protein structure is a fundamental question in structural biology. We present here CE-Symm 2.0, a key tool to address that question, which is able to detect all types of protein internal symmetry and provides a robust and intuitive sequence-to-structure analysis of all repeats. Notable features compared to the previous version1 include an optimized multiple alignment between repeats, determination of the full point group, and identification of multiple symmetry axes. We expect CE-Symm to find ample use in evolutionary studies, functional annotation, and structural classification of proteins.
1. Myers-Turnbull D, Bliven SE, Rose PW, Aziz ZK, Youkharibache P, Bourne PE, & Prlić A. Systematic Detection of Internal Symmetry in Proteins Using CE-Symm. Journal of Molecular Biology, 426(11), 2255–2268 (2014).
2. Aravind, P. et al. Biochemistry, 48(51), 12180–12190 (2009).
3. Mishra, A. et al. Progress in Biophysics and Molecular Biology, 115(1), 42–51 (2014).
4. Juo, Z. S. et al. J Mol Biol 261, 239–254 (1996).
5. Monod, J. et al. J Mol Biol 12, 88–118 (1965).
6. Goodsell, D. S. & Olson, A. J. Annu Rev Biophys Biomol Struct 29, 105–153 (2000).
7. Gosavi, S. et al. J Mol Biol 357, 986–996 (2006).
8. Fortenberry, C. et al. J Am Chem Soc 133, 18026–18029 (2011).
9. Neuwald, A. F. Nucleic Acids Research, 33(11), 3614–3628 (2005).
10. Lee, J. & Blaber, M. PNAS 108, 126–130 (2011).
11. Zuccola, H. J., Filman, D. J., Coen, D. M., & Hogle, J. M. Molecular Cell, 5(2), 267–278 (2000).
12. Prlić, A. et al. Bioinformatics, 28(20), 2693–2695 (2012).
13. Shindyalov, I. N. & Bourne, P. E. Protein Eng 11, 739–747 (1998).
14. Bliven, S. E., Bourne, P. E., & Prlić, A. Bioinformatics, 31(8), 1316–1318 (2015).
15. Guda, C., Scheeff, E. D., Bourne, P. E., & Shindyalov, I. N. Pacific Symposium on Biocomputing, 275–286 (2001).
16. Kim, C. et al. BMC Bioinformatics 11, 303 (2010).
CE-Symm Availability
Download & Source code: github.com/rcsb/
Levels of Symmetry
Symmetry can be analyzed at numerous levels. The most familiar is quaternary symmetry consisting of multiple identical polypeptide chains arranged in a symmetric fashion. Such symmetry is extremely common in proteins, occurring in approximately 90% of unique oligomeric structures in the Protein Data Bank (PDB).
Proteins can also have internal symmetry, when a single chain contains two or more equivalent structural repeats. The repeats generally will differ in the exact sequence, but have substantially similar structures. Internal symmetry is sometimes clarified as pseudosymmetry to reflect that the equivalence between repeats is generally at the level of residues or secondary structure elements rather than precise coordinates, as with quaternary symmetry.
Types of Symmetry
Symmetry can be classified by the types of operators that align each repeat onto the next. Closed symmetry consists of one or more pure rotational operators that form a single axis of rotation (cyclic), multiple perpendicular axes (dihedral), or more complex point groups. Open symmetry includes proteins with translational components, such as screw axes (helical), pure translation, or even superhelical cases such as solenoid proteins.
CE-Symm is able to identify any type of symmetry with a consistent orientation between all repeats. This is based on the principle that not only the structure of the repeats but also the interfaces between repeats should be conserved.
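The distinction between closed and open symmetry can be illustrated with a minimal Python sketch that classifies a single rigid-body operator (a rotation matrix plus a translation vector) by its rotation angle and by the translation component along the rotation axis. This is only an illustration of the geometry, not CE-Symm's implementation: the function names, the tolerances, and the assumption that translations are in ångströms are all hypothetical.

```python
import math

def rotation_axis_angle(rot):
    """Return (unit axis, angle in degrees) for a 3x3 rotation matrix (nested lists)."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.degrees(math.acos(cos_angle))
    # The antisymmetric part of the matrix is proportional to the axis.
    axis = [rot[2][1] - rot[1][2], rot[0][2] - rot[2][0], rot[1][0] - rot[0][1]]
    norm = math.sqrt(sum(a * a for a in axis))
    if norm < 1e-6:
        # Angle near 180 degrees: (R + I) / 2 equals axis * axis^T, so any
        # nonzero column gives the axis. Near 0 degrees the axis is arbitrary.
        diag = [(rot[i][i] + 1.0) / 2.0 for i in range(3)]
        i = diag.index(max(diag))
        axis = [(rot[j][i] + (1.0 if j == i else 0.0)) / 2.0 for j in range(3)]
        norm = math.sqrt(sum(a * a for a in axis))
    return [a / norm for a in axis], angle

def classify_operator(rot, trans, angle_tol=5.0, shift_tol=1.0):
    """Label the operator x -> rot * x + trans.

    angle_tol (degrees) and shift_tol (assumed angstroms) are illustrative
    thresholds, not parameters taken from CE-Symm.
    """
    axis, angle = rotation_axis_angle(rot)
    if angle < angle_tol:
        length = math.sqrt(sum(t * t for t in trans))
        return "translational" if length > shift_tol else "identity"
    # Only the translation along the axis makes a screw; the perpendicular
    # part merely moves the axis away from the origin.
    shift = abs(sum(t * a for t, a in zip(trans, axis)))
    return "helical" if shift > shift_tol else "closed"

c, s = math.cos(math.radians(60)), math.sin(math.radians(60))
rot60 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
print(classify_operator(rot60, [2.0, 1.0, 0.0]))  # rotation about a shifted axis
print(classify_operator(rot60, [0.0, 0.0, 4.5]))  # rotation plus rise along the axis
```

Splitting the translation into components parallel and perpendicular to the axis is the key design choice: a perpendicular offset still describes a pure rotation, so it should not be mistaken for open symmetry.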
Methods
All algorithms are included in BioJava12 version 4.2 and as a Java executable.
1. Self-alignment. A high-scoring autorotation of the structure is identified using the Combinatorial Extension13 (CE) structural comparison method, with modifications similar to CE-CP14 to allow alignment of the first and last repeats while disallowing the trivial alignment.
2. Order Detection. The self-alignment is analyzed for patterns characteristic of open or closed symmetry to determine the number of repeats.
3. Refinement. The alignment is modified to create an initial multiple alignment between all repeats.
4. Optimization. The multiple alignment is extended and optimized based on a Monte Carlo algorithm similar to CE-MC.15
5. Iteration. If the optimized multiple alignment is determined to be significant, then the repeats are recursively analyzed for additional levels of symmetry.
6. Point Group Detection. If multiple axes were identified, these are combined into a global point group for the whole structure.
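To give a feel for the order detection step in the closed case, the sketch below guesses the number of repeats from the rotation angle of the self-alignment by picking the order n whose ideal angle 360/n is closest. This is a deliberately simplified heuristic written for illustration: CE-Symm analyzes the self-alignment itself and may behave differently, and the function name, the maximum order, and the tolerance are assumptions.

```python
def estimate_order(angle_deg, max_order=8, tol_deg=10.0):
    """Guess the closed-symmetry order from a self-alignment rotation angle.

    Assumes the self-alignment maps each repeat onto its neighbor, so the
    angle is close to 360/n. Returns 1 (asymmetric) when no order from 2 to
    max_order fits within tol_deg. Both defaults are illustrative only.
    """
    best_order, best_error = 1, tol_deg
    for n in range(2, max_order + 1):
        error = abs(angle_deg - 360.0 / n)
        if error < best_error:
            best_order, best_error = n, error
    return best_order

# A rotation of about 60 degrees suggests six repeats,
# as in a six-bladed beta propeller.
print(estimate_order(60.0))
```

An angle-only rule cannot handle open symmetry or self-alignments that skip a repeat, which is one reason to inspect the alignment pattern rather than the angle alone.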
Self alignment of Keap1 Kelch domain [1U6D]. (Left) Superposition upon ~60° rotation. (Right) Dot plot showing the identified alignment (red line) on the dynamic programming matrix (black indicates unfavorable scores).
Symmetry & Function
Both quaternary and internal symmetry are linked to a wide range of protein functions.
Ligand Binding
Ligands often bind near the axis of symmetry. Of symmetric domains with ligands, 63% have the ligand within 5 Å of the axis of symmetry; in 37% it is within 1 Å.1
Symmetric proteins often bind symmetric ligands, such as metal ions.
DNA binding proteins often utilize symmetry. Many transcription factors are symmetric dimers and recognize palindromic sequences. The TATA binding protein (right) is an internally symmetric monomer which has evolved to recognize a non-palindromic sequence.4
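The ligand-to-axis distances quoted under Ligand Binding are distances from a point to a line. A minimal sketch of that calculation follows, assuming the axis is given as a point on the axis plus a direction vector and that coordinates are in ångströms. The function names and the default cutoff argument are hypothetical and are not taken from CE-Symm.

```python
import math

def distance_to_axis(point, axis_point, axis_dir):
    """Perpendicular distance from a 3D point to the line through axis_point
    along axis_dir (axis_dir need not be normalized)."""
    norm = math.sqrt(sum(c * c for c in axis_dir))
    unit = [c / norm for c in axis_dir]
    offset = [p - a for p, a in zip(point, axis_point)]
    # Remove the component of the offset that runs along the axis.
    along = sum(o * u for o, u in zip(offset, unit))
    perpendicular = [o - along * u for o, u in zip(offset, unit)]
    return math.sqrt(sum(c * c for c in perpendicular))

def is_near_axis(point, axis_point, axis_dir, cutoff=5.0):
    """True if the point lies within cutoff (assumed angstroms) of the axis."""
    return distance_to_axis(point, axis_point, axis_dir) <= cutoff

# Hypothetical ligand atom and a symmetry axis along z through the origin.
ligand_atom = (3.0, 4.0, 10.0)
print(distance_to_axis(ligand_atom, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

In practice a ligand has many atoms, so one would still need to decide whether to measure from the centroid or from the closest atom; the sketch leaves that choice open.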
Allosteric Regulation
Cooperativity can arise from coordinated movements in symmetric subunits.5 This mechanism holds for both quaternary symmetry (e.g. in hemoglobin) and for internally symmetric proteins.6
Protein Folding
Internal symmetry can smooth the folding landscape and reduce folding time.7
Internal repeats can fold quasi-independently.
Misfolding of one repeat can trigger degradation of the whole protein, unlike in quaternary symmetric complexes.
Experimental Tools
Aid the computational design of large proteins8
Improve search for distant homologs9
TATA Binding Protein [1TGH]
Case Study: βγ-Crystallin Superfamily
The βγ-crystallin superfamily is primarily known for the important role of several members in the eye lens, but calcium-binding functions are also known to be widespread throughout the family.2,3 The core domain consists of two Greek key motifs arranged with C2 symmetry. CE-Symm is able to identify this symmetry, as well as align the conserved calcium-binding motif.
This family is also interesting due to the presence of varied domain architectures. Bovine γB-crystallin contains four repeats. Sequence conservation shows that the repeats follow an ABAB pattern, indicating two duplication events, consistent with the two levels of C2 symmetry identified by CE-Symm.
Figure panels, types of symmetry: Cyclic (C8), Triose Phosphate; Dihedral (D2), Glyoxalase; Translational (R), Ankyrin Repeat; Helical (H3), Antifreeze Protein.
Figure panels, levels of symmetry in the AmtB Ammonia Channel [1U7G]: Quaternary (3 chains), C3; Internal (2 repeats/chain), C2; Combined (6 repeats), D3.
Figure panels, methods: 1. Structural Self Alignment; 6. Point Group Detection. Plot labels: TM-Score, Asymmetry, Symmetry.
Census
All superfamilies from SCOPe 2.06 were analyzed by CE-Symm based on a random representative. Consistent with prior results,1,16 about a quarter of domains were found to have internal symmetry or repeats.
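Order            Number of Superfamilies   Percentage
Asymmetric       1051                      75.39%
Rotational       302                       21.66%
  C2             237                       78.48%
  C3             19                        6.29%
  C4             12                        3.97%
  C5             2                         0.66%
  C6             8                         2.65%
  C7             16                        5.30%
  C8             8                         2.65%
Dihedral         19                        1.36%
  D2             17                        89.47%
  D3             2                         10.53%
Helical          7                         0.50%
Translational    15                        1.08%
Percentages for top-level categories are relative to all 1394 superfamilies analyzed; percentages for indented rows are relative to their parent category.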
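The percentages in the census table can be reproduced from the counts alone. The short Python sketch below does that arithmetic; the dictionary layout and helper name are illustrative choices, and only the counts come from the table.

```python
# Counts copied from the census table.
top_level = {
    "Asymmetric": 1051,
    "Rotational": 302,
    "Dihedral": 19,
    "Helical": 7,
    "Translational": 15,
}
rotational = {"C2": 237, "C3": 19, "C4": 12, "C5": 2, "C6": 8, "C7": 16, "C8": 8}
dihedral = {"D2": 17, "D3": 2}

def percent(part, whole):
    """Percentage of whole, rounded to two decimals as in the table."""
    return round(100.0 * part / whole, 2)

total = sum(top_level.values())

# Top-level rows are fractions of all superfamilies analyzed.
for name, count in top_level.items():
    print(f"{name:14s} {count:5d} {percent(count, total):6.2f}%")

# Sub-rows are fractions of their parent category.
for name, count in rotational.items():
    print(f"  {name:12s} {count:5d} {percent(count, top_level['Rotational']):6.2f}%")
for name, count in dihedral.items():
    print(f"  {name:12s} {count:5d} {percent(count, top_level['Dihedral']):6.2f}%")

# Share of superfamilies with any internal symmetry or repeats.
symmetric_share = percent(total - top_level["Asymmetric"], total)
print(f"Symmetric or repeat-containing: {symmetric_share}%")
```

The last figure is the basis for the statement that about a quarter of domains have internal symmetry or repeats.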
M-crystallin from the archaeon M. acetivorans [3HZ2]. The conserved symmetric calcium-binding motif is highlighted in yellow.
Bovine γB-crystallin [4GCR]. A central C2 axis is identified relating the domains, as well as C2 axes within each domain. The calcium-binding motif (yellow) of some subunits may have been lost.
Evolution
Internal symmetry can arise from quaternary symmetry by gene duplication or fusion. Thus, in addition to the many functional implications of symmetry, identifying protein symmetry can provide information about the evolutionary history of a protein. Such fission and fusion events often preserve the overall structure and function of the active complex.10
Many proteins with higher order symmetry appear to have undergone several duplication events. For instance, DNA clamps are composed of 12 structural repeats arranged in a ring. Pairs of these repeats form domains with the processivity fold, which can also be found in non-ring conformations in some species.11 Six such domains form a complete ring, but they are fused together into either two (bacteria) or three (eukaryotes, archaea, and viruses) chains.
Dimeric bacterial clamp: DNA polymerase III beta subunit from E. coli [1MMI]
Trimeric eukaryotic clamp: proliferating cell nuclear antigen [1VYM]
Trimeric clamp, colored to show the 12 structural repeats
Final alignment of 1U6D showing the six blades of the beta propeller. One residue has been deleted from the first repeat, with four residues inserted into the second.
Download this poster!
http://www.slideshare.net/sbliven/3dsig-2016-poster-exploring-internal-symmetry-and-structural-repeats-with-cesymm