14
Research review paper Computational tools for exploring sequence databases as a resource for antimicrobial peptides W.F. Porto a,c , A.S. Pires a , O.L. Franco a,b, a Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia Universidade Católica de Brasília, Brasília, DF, Brazil b S-Inova Biotech, Pós-graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, MS, Brazil c Porto Reports, Brasília, DF, Brazil abstract article info Article history: Received 7 November 2016 Received in revised form 12 January 2017 Accepted 8 February 2017 Available online 12 February 2017 Data mining has been recognized by many researchers as a hot topic in different areas. In the post-genomic era, the growing number of sequences deposited in databases has been the reason why these databases have become a resource for novel biological information. In recent years, the identication of antimicrobial peptides (AMPs) in databases has gained attention. The identication of unannotated AMPs has shed some light on the distribution and evolution of AMPs and, in some cases, indicated suitable candidates for developing novel antimicrobial agents. The data mining process has been performed mainly by local alignments and/or regular expressions. Nev- ertheless, for the identication of distant homologous sequences, other techniques such as antimicrobial activity prediction and molecular modelling are required. In this context, this review addresses the tools and techniques, and also their limitations, for mining AMPs from databases. These methods could be helpful not only for the de- velopment of novel AMPs, but also for other kinds of proteins, at a higher level of structural genomics. Moreover, solving the problem of unannotated proteins could bring immeasurable benets to society, especially in the case of AMPs, which could be helpful for developing novel antimicrobial agents and combating resistant bacteria. © 2017 Elsevier Inc. All rights reserved. Keywords: Data mining Structural genomics Local alignments Prole-HMM Regular expression Antimicrobial activity prediction Molecular modelling Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 2. AMPs in databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 3. Sequence search: the simplest way . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 3.1. Local alignments and pattern matching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 3.2. All classes against one or more species . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341 3.3. One class against all species . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341 3.4. Additional applications for pattern matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341 4. Antimicrobial activity prediction: sequence order versus sequence size variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342 5. What can structure prediction reveal? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344 6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 1. Introduction The explosive growth in database data requires techniques and tools to transform the amount of data into useful information and knowledge. Data mining, which is also referred to as knowledge discovery in databases, has clearly increased in importance. Mining information from databases has been recognized by many researchers as a hot spot in different areas (Chen et al., 1996; Porto et al., 2014b). Biotechnology Advances 35 (2017) 337349 Corresponding author at: Centro de Análises Proteômicas e Bioquímicas, Pós- Graduação em Ciências Genômicas e Biotecnologia Universidade Católica de Brasília, Brasília, DF, Brazil. E-mail address: [email protected] (O.L. Franco). URL: http://www.capb.com.br (O.L. Franco). http://dx.doi.org/10.1016/j.biotechadv.2017.02.001 0734-9750/© 2017 Elsevier Inc. All rights reserved. Contents lists available at ScienceDirect Biotechnology Advances journal homepage: www.elsevier.com/locate/biotechadv

Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Biotechnology Advances 35 (2017) 337–349

Contents lists available at ScienceDirect

Biotechnology Advances

j ourna l homepage: www.e lsev ie r .com/ locate /b iotechadv

Research review paper

Computational tools for exploring sequence databases as a resource forantimicrobial peptides

W.F. Porto a,c, A.S. Pires a, O.L. Franco a,b,⁎a Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia Universidade Católica de Brasília, Brasília, DF, Brazilb S-Inova Biotech, Pós-graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, MS, Brazilc Porto Reports, Brasília, DF, Brazil

⁎ Corresponding author at: Centro de Análises ProGraduação em Ciências Genômicas e Biotecnologia UniBrasília, DF, Brazil.

E-mail address: [email protected] (O.L. Franco).URL: http://www.capb.com.br (O.L. Franco).

http://dx.doi.org/10.1016/j.biotechadv.2017.02.0010734-9750/© 2017 Elsevier Inc. All rights reserved.

a b s t r a c t

a r t i c l e i n f o

Article history:Received 7 November 2016Received in revised form 12 January 2017Accepted 8 February 2017Available online 12 February 2017

Data mining has been recognized by many researchers as a hot topic in different areas. In the post-genomic era,the growing number of sequences deposited in databases has been the reasonwhy these databases have becomea resource for novel biological information. In recent years, the identification of antimicrobial peptides (AMPs) indatabases has gained attention. The identification of unannotated AMPs has shed some light on the distributionand evolution of AMPs and, in some cases, indicated suitable candidates for developing novel antimicrobialagents. The datamining process has been performedmainly by local alignments and/or regular expressions. Nev-ertheless, for the identification of distant homologous sequences, other techniques such as antimicrobial activityprediction andmolecular modelling are required. In this context, this review addresses the tools and techniques,and also their limitations, for mining AMPs from databases. These methods could be helpful not only for the de-velopment of novel AMPs, but also for other kinds of proteins, at a higher level of structural genomics. Moreover,solving the problem of unannotated proteins could bring immeasurable benefits to society, especially in the caseof AMPs, which could be helpful for developing novel antimicrobial agents and combating resistant bacteria.

© 2017 Elsevier Inc. All rights reserved.

Keywords:Data miningStructural genomicsLocal alignmentsProfile-HMMRegular expressionAntimicrobial activity predictionMolecular modelling

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3372. AMPs in databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3383. Sequence search: the simplest way . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

3.1. Local alignments and pattern matching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3383.2. All classes against one or more species . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3413.3. One class against all species . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3413.4. Additional applications for pattern matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

4. Antimicrobial activity prediction: sequence order versus sequence size variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3425. What can structure prediction reveal? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3446. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

teômicas e Bioquímicas, Pós-versidade Católica de Brasília,

1. Introduction

The explosive growth in database data requires techniques and toolsto transform the amount of data into useful information and knowledge.Data mining, which is also referred to as knowledge discovery indatabases, has clearly increased in importance. Mining informationfrom databases has been recognized by many researchers as a hotspot in different areas (Chen et al., 1996; Porto et al., 2014b).

Page 2: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

338 W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

In the post-genomic era, the growing number of sequences deposit-ed in databases has been the key reason why databases are becominga resource of novel biological information. Nowadays, the UniversalProtein Resource (UniProt) (The UniProt Consortium, 2014) has71,555,392 sequences deposited, split into 553,231 (0.8%)manually an-notated from Swiss-Prot and 71,002,161 (99.2%) automatically annotat-ed and not reviewed from TrEMBL (accessed in January, 2017).Therefore, the number of proteins from Swiss-Prot does not track thenumber of TrEMBL's, and a number of sequences are still annotated ashypothetical, unnamed or unknown sequences. In this context, theavailability of complete genome sequences has given a newaim tomod-ern biology: cataloguing all proteins that are responsible for every es-sential cellular function (Galperin and Koonin, 2012). Among the mostdiverse functions, the defence function against infectious microorgan-isms has gained attention. With the availability of the first genomesand transcript data (e.g. expression sequence tags, EST), the first reportsof small cysteine-rich sequences resembling antimicrobial peptides(AMPs) were performed (Fedorova et al., 2002; Graham et al., 2004;Mergaert et al., 2003).

AMPs are evolutionarily ancientmolecules that have been identifiedfrom diverse sources, such asmicroorganisms, plants and animals. Theyplay an important role in the innate immune system and are the firstline of defence to protect internal and external surfaces of the host -reviewed in Silva et al. (Silva et al., 2011). Overall, AMPs have an amphi-pathic and cationic structure, with a positive net charge between +3and +9, and their size can range from 12 to 100 amino acid residues.Nowadays, there are two major classifications for AMPs, the first onebeing based on the structure (Silva et al., 2011) and the second one onthe presence or absence of disulphide bonds (Brogden, 2005) (Fig. 1).

In addition to these classical classifications, in this review, AMPs areclassified according to the origins of their amino acid sequence, beingsplit into natural, encrypted1 and designed AMPs. According to this,the natural AMPs are a product of specific genes (Dürr et al., 2006;Nguyen et al., 2013; Vlasak et al., 1983; Zasloff, 1987; Zhu et al.,2012); the encrypted AMPs are those that are encrypted in larger pro-teins, which are released after proteolytic cleavage (Brand et al., 2012;Okubo et al., 2012; Papareddy et al., 2010; Sigurdardottir et al., 2006;Wang et al., 2012); and the designed ones are artificial AMPs, developedthrough rational design techniques (Cardoso et al., 2016; Landon et al.,2008; Loose et al., 2006; Wieczorek et al., 2010) (Table 1).

In the last decade, the identification of AMPs from databases hasgained attention as a branch of structural genomics and bioinformatics.Several approaches have been applied for the identification of AMPsfrom databases, including local alignments, regular expressions(REGEX), activity prediction by machine learning methods and alsothree-dimensional structure predictions (Fig. 2). These approacheshave been useful to improve the annotation of such sequences andprovide feedback from the databases (Fig. 2). For biotechnology, theseapproaches are a valuable tool, since they have the advantages of fastsequence identification and low costs, if compared to the peptidepurification process, allowing the rapid discovery of promising novelantimicrobial agents. Therefore, this review is focused on approachesand new techniques for the identification of natural and encryptedAMPs, and also their advantages and limitations.

2. AMPs in databases

Before searching for a peptide in a database, it is important to knowwhat to seek. Although the characteristics listed above are useful, theyare extremely generic. Considering themore than 60million sequencesin TrEMBL, how many sequences will have a net charge between +3and +9, 12 to 100 amino acid residues, stabilized or not by disulphidebridges? Therefore, some additional refinements are needed. These

1 The term “encrypted”was used here instead of “cryptic” in order to keep the contextof the manuscript by Brand and co-workers (Brand et al., 2012).

refinements can come fromprevious knowledge on AMP classes, arisingfrom the literature itself, or from a set of sequences from an AMPdatabase.

There are two main classes of AMP database: (i) the generalist,which holds any kind of AMP and (ii) the specific, which could be divid-ed into two subclasses: (a) databases that hold a specific class of AMP(e.g. only defensins or cyclotides) and (b) databases that hold a super-group of AMPs (e.g. only plant peptides or only cyclic peptides).

Unfortunately, a universal database with all AMP data does notexist yet, and therefore the information is split into several databases(Table 2). Recently, it was demonstrated that there is an overlap be-tween AMP databases; however, each one has a degree of uniqueness,containing some exclusive sequences (Aguilera-Mendoza et al., 2015).

3. Sequence search: the simplest way

3.1. Local alignments and pattern matching

Sequence alignment is themainmethod for comparing biological se-quences (Polyanovsky et al., 2011). Since the proteins are deposited indatabases as their primary sequences, the most convenient way tosearch for similar sequences is to use local sequence alignments. Forthis purpose, the main tools are BLAST (Altschul et al., 1997) andFASTA (Pearson, 1990), with BLAST being themost popular for identify-ing AMPs.

The search for AMPs using local alignments has been performedthrough a number of iterations of local alignments: the first iterationuses a seed sequence for performing the initial search, and then in thesecond one, the sequences retrieved from the previous iteration areused as new seed sequences. This process is repeated until no new se-quences are found. Additional filters could be used, such as the presenceof signal peptide and/or some amino acid pattern.

Despite the efficacy of the local alignment strategy for finding novelAMPs in databases (Mulvenna et al., 2006; Zhu, 2008), this approachcould fail to identify some sequences, if compared to pattern-matchingapproaches (Mulvenna et al., 2006; Porto et al., 2012c). There are twomain strategies for searching for sequences by patterns: the use of pro-file Hidden Markov Models (profile-HMM) (Eddy, 1998; Silverstein etal., 2005, 2007) or regular expressions (REGEX) (Thompson, 1968;Mulvenna et al., 2006; Porto et al., 2012c). The two methods are quitesimilar. A REGEX is a precise and succinct description of a search patternin text string format, providing a method for locating specific characterstrings embedded in character text (Thompson, 1968), where eachREGEX position may be fixed, ambiguous or wild card; and when a pro-tein sequence is described by a REGEX, it means that the REGEXmatches the protein sequence (i.e. the REGEX “[WU]IL{1,2}IA[MN]” vir-tually matches any variation of the name “WILLIAM”, starting with “W”or “U”, indicated by “[WU]”, with one or two L's, “L{1,2}” and finishingwith “M” or “N”, indicated by “[MN]”, without variation in the othercharacters). A profile-HMM is a probabilistic model, which describes amultiple sequence alignment profile (Eddy, 1998), giving the probabil-ity distribution over a potentially infinite number of sequences. SinceREGEXhas noprobabilities associatedwith its positions, it is less restric-tive than profile-HMM.

The methodology of REGEX or profile-HMM for the identification ofnovel AMPs is similar. Overall, a set of homologous sequences must bealigned, and the alignment must be submitted to a specific programsuch as Pratt (Jonassen, 1997) or HMMER (Finn et al., 2011) for theidentification of REGEX or profile-HMM, respectively. In addition, in-stead of building the pattern search, it could be retrieved from a data-base, such as Prosite for REGEX (Sigrist et al., 2013) or Pfam forprofile-HMM (Finn et al., 2014). It is important to highlight that in thecase of REGEX, it could be constructed and/or edited by hand, by usingthe amino acid physico-chemical properties, for example. Then, afterbuilding, selecting and/or editing a pattern, the search against the data-base is performed, and aswell as through local alignments, the presence

Page 3: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Fig. 1. Junction of twomain classifications of antimicrobial peptides and some representative structures. Indolicidin (Rozek et al., 2000) representing the unstructured peptides; magaininII (Gesell et al., 1997) representing the disulphide-free α-helical peptides; EcAMP1 (Nolde et al., 2011) representing the cysteine-stabilized α-helical peptides; brazzein (Caldwell et al.,1998) representing themixedα- and β-structure peptides; and human defensin 5 (HD5) (Szyk et al., 2006) representing the peptides composed only of β-strands. Disulphide bridges arerepresented as ball and stick.

339W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

of a signal peptide could be used as an additional filter. Although patternmatching performs a deeper search than local alignments, it is less se-lective. Silverstein and co-workers (Silverstein et al., 2007) showedthat some sequences could scorewell against profile-HMMs fromdiffer-ent peptide families, such as thionins and snakins/GASA peptides, forexample. Therefore in the case of REGEX, each match is indicated to beanalysed by InterPro Scan (Jones et al., 2014), a resource that combinesdifferent protein signature recognition methods. This approach hasbeen appliedmainly for cysteine-stabilized peptides, due to the cysteineconservation that typifies the classes.

Table 1Sample of AMPs classified according to their origins.

Origin Peptide Source protein

Natural Magainin Precursor proteinMelittin Precursor proteinMicasin Precursor proteinLL-37 Precursor proteinPanitide L1 Precursor protein

Encrypted Q6TV81(25–52) ORF107 virion morphogenesisA5LDU0(184–211) Pseudouridine synthaseP61458(35–60) Pterin-4-alpha-carbinolamineBmLAO-f1 L-Amino acid oxidaseBmLAO-f2 L-Amino acid oxidaseBmLAO-f3 L-Amino acid oxidaseGKE LL-37LL-23 LL-37GKY25 ProthrombinVFR17 Prothrombin

Designed DEF-AcAA n/aD28 n/aPaMAP1.9 n/aIDR1018 n/a

In fact, taking into account the classification of AMPs (Fig. 1), localalignment and patternmatching could be applied for both cysteine-sta-bilized or disulphide-free peptides; however, pattern matching worksbest with cysteine-stabilized peptides (with at least six cysteines),while local alignmentworks bestwith disulphide-free peptides (naturalor encrypted). This tendency could be caused by the differences in theconservation degrees of primary and secondary structures. Due totheir ability to make disulphide bridges, cysteine residues are morestructurally conserved than others. In CSαβ-defensins, the distances be-tween cysteines in the canonical motifs CX3C and CXC are 6.1± 0.4 and

Source organism Reference

Xenopus laevis Zasloff (1987)Apis mellifera Vlasak et al. (1983)Microsporum canis Zhu et al. (2012)Homo sapiens Dürr et al. (2006)Panicum laxum Nguyen et al. (2013)Bovine papular stomatitis virus Brand et al. (2012)Streptococcus pneumoniae Brand et al. (2012)Mus musculus Brand et al. (2012)Bothropoides mattogrosensis Okubo et al. (2012)

Bothropoides mattogrosensis Okubo et al. (2012)

Bothropoides mattogrosensis Okubo et al. (2012)

Homo sapiens Sigurdardottir et al. (2006)Homo sapiens Wang et al. (2012)Homo sapiens Papareddy et al. (2010)Homo sapiens Papareddy et al. (2010)Synthetic construct Landon et al. (2008)Synthetic construct Loose et al. (2006)Synthetic construct Cardoso et al. (2016)Synthetic construct Wieczorek et al. (2010)

Page 4: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Fig. 2. Current methods for mining databases for generating knowledge and useful data for sequence annotation. Overall, the pipelines start at the sequence level, which is lesscomputationally costly and is suitable for filtering tons of sequences. The antimicrobial activity prediction may or may not be used; it is important for adding another layer ofconfidence, but there are few systems available. Structural methods have been used as a complement for sequence search; however, there are some reports that show that they canalso be used for the identification of remote homologous sequences.

340 W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

6.6 ± 0.4 Å, respectively (Wu et al., 2016), while the remaining se-quence is extremely variable (e.g. in plant defensins, the cysteine resi-dues account for ~16% of identity, which would not be detectable in alocal alignment search). Therefore, pattern matching is best for suchpeptides, since we can focus only on the cysteine residues, while thelocal alignment focuses on the whole sequence. In turn, disulphide-free peptides are less conserved and do not have specific residues that

Table 2The main databases of antimicrobial peptides. Only databases with active web sites are listed.

Database Description

APD This database stores AMPs from all biological sources. They were manPubMed, PDB, Google and Swiss-Prot). Approximately 2% of the sequhttp://aps.unmc.edu/AP/main.php

DAMPD It stores AMPs retrieved from UniProt with experimental validation. Tet al., 2004). Available at http://apps.sanbi.ac.za/dampd/

YADAMP AMP data were collected from other databases and also literature seaAvailable at http://www.yadamp.unisa.it/

LAMP This database has been partitioned into three databases: experimentathe literature, UniProt or other AMP-related databases, for which cropeptides. Available at http://biotechlab.fudan.edu.cn/database/lamp/

DBAASP It stores peptides data from PubMed, concerning AMPs of different roand level of complexity (monomers, dimers and covalent-linked pep

CAMP Sequences and structural information of AMPs were retrieved from psectioned into sequence, structure and patent databases. The sequencand predicted ones. Available at http://www.camp.bicnirrh.res.in/

SATPdb This database holds structurally annotated peptides with therapeuticThe structural information about these peptides was predicted or annBank (PDB). Available at http://crdd.osdd.net/raghava/satpdb

DRAMP This database is divided into three datasets: general, patent, and clinisequence, structure, physicochemical, patent, and clinical information

ADAM This database is focused on the establishment of associations betweensearch for structure-sequence clusters, which are annotated by CATHthe information about AMPs, it also provides external links to externahttp://bioinformatics.cs.ntou.edu.tw/adam/

Hemolytik It is a manually curated database and provides a comprehensive collewere selected from other databases and also literature. Available at h

AMPer This database was built for creating hidden Markov models for recogAMSDb (Tossi and Sandri, 2002) and UniProt. It contains all major eukAvailable at http://marray.cmdr.ubc.ca/cgi-bin/amp.pl

PhytAMP It stores only antimicrobial peptides from plants, collected from literahttp://phytamp.pfba-lab-tun.org/main.php

CyBase This database is dedicated to studying proteins and peptides with a cpeptide bond, including cyclotides and bacteriocins. Available at: http

DADP This database focuses on precursors and their specific regions (signalprecursor structure is not documented, only the mature bioactive seq

MilkAMP It is dedicated to AMPs derived from milk. Sequences were retrievedkeywords such as ‘milk peptides’, ‘antimicrobial peptides’ and commat http://milkampdb.org/home.php

Peptaibol It comprises the sequences and structure information of the peptaibothe presence of an unusual amino acid, a-aminoisobutyric acid and ahttp://peptaibol.cryst.bbk.ac.uk/home.shtml

DefensinsKnowledgebase

This is dedicated to the defensin family of AMPs, including sequences(CSαβ-defensins) and plants (CSαβ-defensins). Available at http://de

BACTIBASE Specifically dedicated to Bacteriocins, bacterial AMPs that display grobacteria. All information was collected via PubMed search. Available

typify them, hence it is better to focus on their whole sequences to iden-tify the homologues.

Overall, the applications for identifying peptides in databases searcheither by one class against all species or by all classes against one ormore species. In the case of one class against all species, three classesof cysteine-stabilized peptides have been highlighted: cyclotides,CSαβ-defensins and hevein-like peptides, giving a wider vision of

Reference

ually collected and curated from the literature (viaences correspond to synthetic peptides. Available at

Wang et al. (2009)

his database replaced the ANTIMIC database (Brahmachary Seshadri et al. (2012)

rch. It contains AMPs from all available biological sources. Piotto et al. (2012)

l, predicted and patent. AMPs are collected manually fromss-links have been established. It also stores synthetic

Zhao et al. (2013)

utes of synthesis (ribosomal, nonribosomal and synthetic)tides). Available at http://dbaasp.org/

Gogoladze et al. (2014)

rotein databases such as NCBI, UniProt and PDB. CAMP ise database is split into experimentally validated peptides

Waghu et al. (2014)

activities, including antimicrobial, anticancer and antiviral.otated using state-of-the-art techniques and Protein Data

Singh et al. (2016)

cal AMPs. It harbors diverse annotations, including. Available at http://dramp.cpu-bioinfor.org/

Fan et al. (2016)

AMP sequences and structures. It offers the opportunity to, SCOP and Pfam, if available. Since it does not contain alll databases. Available at

Lee et al. (2015)

ction of hemolytic and non-hemolytic peptides. Sequencesttp://crdd.osdd.net/raghava/hemolytik/

Gautam et al. (2014)

nition of individual AMP classes. It was compiled fromaryotic AMP classes, including defensins, and cathelicidins.

Fjell et al. (2007)

ture and also the UniProt database. Available at Hammami et al. (2009)

yclic backbone, in which N- and C-terminals are linked by a://cybase.org.au/

Wang et al. (2008)

, acidic and bioactive) anuran defence peptides. If theuence is stored. Available at http://split4.pmfst.hr/dadp/

Novković et al. (2012)

from PubMed and Google Scholar using a combination ofon milk protein names, aliases and abbreviations. Available

Théolier et al. (2014)

ls (Chugh and Wallace, 2001), peptides characterized byC-terminal hydroxylated amino acid. Available at

Whitmore and Wallace(2004)

from vertebrates (α-, β- and θ-defensins), invertebratesfensins.bii.a-star.edu.sg/

Seebah et al. (2007)

wth-inhibition activity against other closely relatedat http://bactibase.pfba-lab-tun.org/main.php

Hammami et al. (2010)

Page 5: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

341W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

distribution of such classes across the species. While in the case ofall classes against one or more species, it has been applied in modelorganisms or others with genome or transcriptome data, giving a betterannotation of such sequences.

3.2. All classes against one or more species

The lack of adequate annotation of genomes motivated the initialprojects of identification of AMPs into databases. In the early 2000s,thefirstworkswith ESTs frommodel and/or crop plants (e.g. Arabidopsisthaliana,Medicago truncatula, Lotus japonicus, Glycine max), identified anumber of cysteine-rich peptides with resemblance to CSαβ-defensins(Fedorova et al., 2002; Graham et al., 2004; Mergaert et al., 2003).Based on this, in 2005, Silverstein and co-workers identified morethan 300 defensin-like genes in A. thaliana by means of profile-HMMsearch, in addition, the expression in several tissues was verified forsome of the 300 sequences (Silverstein et al., 2005).

The same scenario of absence of annotation was observed for othercysteine-rich peptides from other classes in 33 plant species(Silverstein et al., 2007). Silverstein and co-workers developed a seriesof profile-HMMs for a number of cysteine-stabilized peptides, includingdefensins, thionins and snakins (Silverstein et al., 2007). The searchwasperformed against ESTs, retrieving 145,721 matches (representing12,824 distinct plant cysteine-rich peptides) from a total of 4,801,711sequences (Silverstein et al., 2007). It was estimated that 24% of cyste-ine-rich proteins from A. thalianawere not annotated in the Arabidopsisgenome releases TIGR5 and TAIR6.

Until now, plants have shown more AMP classes in comparison toother studied organisms. Nevertheless it is important to emphasizethat such approaches are not restricted to plants. In 2010, Tian and co-workers (2010) described the identification of four AMP classes in thegenome of the parasitic wasp Nasonia vitripennis. They applied a com-bined strategy of BLAST, pattern search and inspection of secreted pep-tides, identifying 44 AMP sequences, of whichwere 1α-helical peptide,3 tachystatin (ICK)-type AMPs, 18 Pro- Gly- and/or His-righ peptides;and 22 defensins. Besides, the inducible expression of nine peptideswas also verified (Tian et al., 2010). Among the defensins identified,the antimicrobial activity of the defensin nasonin-1was assessed, show-ing that this peptide has a narrow spectrum of activity, being restrictedto Gram-negative bacteria (Tian et al., 2010).

Two years later, the same approach was applied in the genome ofseven ant species, and a total of 69 AMPs were identified, of which 7were crustin-like peptides, 17 Pro- Gly- and/or His-righ peptides; 19defensins, and 26 tachystatin (ICK)-type AMPs. In addition, the induc-ible expression of three peptides was described in Solenopsis invicta(Zhang and Zhu, 2012).

Nowadays, with the next generation sequencing technology, patternmatching has been used for discovering AMPs in RNASeq experiments.Recently, Cândido and co-workers (Cândido et al., 2014), by using thepatterns from Silverstein and co-workers (Silverstein et al., 2007), iden-tified three lipid transfer proteins in the spathe transcriptome fromarum lily (Zantedeschia aethiopica). These three peptides were associat-ed with the antimicrobial activity of the protein-rich fraction of thearum lily's spathe (Cândido et al., 2014). Similarly, Zhang and co-workers used a Microsoft Excel® template, which works similarly toREGEX, to search for CRPs in the transcriptome of Viola baoshanensis, re-trieving 9687 novel CRPs, including lipid transfer proteins, cyclotideprecursors (CPs), thaumatins, and a variety of zinc-finger-containingproteins (Zhang et al., 2015). Also, Ke and co-workers identified 606peptides from different classes into EST data from Brassica napus,including endochitinases and defensins, and lipid transfer proteins(Ke et al., 2015).

Due to the volume of data, annotation is themain contributionmadeby such studies, at a global level. However, for a local level, it is neces-sary to reduce the amount of data, in order to perform more detailedanalysis for individual sequences.

3.3. One class against all species

The use of database search taking into account a single class of AMPallows such detailed analysis, since there are fewer retrieved sequences(unless in exceptional cases, such as defensin-like proteins in A.thaliana). It is therefore possible to perform structural analysis andeven to evaluate the activity in vitro of discovery peptides.

In 2006, Mulvenna and co-workers, by means of automated BLASTand REGEX searches, identified a total of 23 sequences of cyclotide-like peptides (Mulvenna et al., 2006). This was the first demonstrationthat REGEX is more sensitive than local alignments, since using localalignments, 22 novel cyclotide-like peptides were found, while usingREGEX, 1 additional sequence was retrieved, together with the other22 already identified by local alignments (Mulvenna et al., 2006). Suchsequences were identified from crop plants, where until then only onecyclotide-like peptide had beendescribed (Basse, 2005). Two sequences(Bcl1, frombarley; and Rcl1, from rice) had their expression analysed byquantitative PCR, showing that their genes are expressed in a tissue-specific manner (Mulvenna et al., 2006).

In 2008, Zhu reported the identification of six novel families of fun-gal defensin-like peptides (Zhu, 2008). The amino acid sequence ofplectasin, the first described fungal defensin (Mygind et al., 2005),was used as a seed sequence for a TBLASTN search against a databasecontaining 53 fungal species (Zhu, 2008). As a result, 18 genes encoding25 defensin-like peptides were identified, with 5 genes having beenpreviously annotated as hypothetical proteins. Using this information,the phylogenetic tree of CSαβ-defensins was drawn, indicating thatthe fungal defensins have a common ancestor with plant and insectdefensins (Zhu, 2008).

In 2012, Porto and co-workers applied REGEX to search for novelhevein-like peptides (Porto et al., 2012c). This strategy was used forthe identification of novel hevein-like peptides, since the local align-ment search did not retrieve novel sequences (Porto et al., 2012c). Theresults from the local alignment search were used to build a REGEX,and then it was combined with the chitin binding motif from Prosite.Four novel hevein-like peptide sequences were retrieved, showingonce more that REGEX is more sensitive than local alignments. It is im-portant to highlight that one of the novel heveins was retrieved fromthe phytopathogenic fungus Phaeosphaeria nodorum; until then, thehevein-like peptides had been restricted to plants.

These reports were the first in their respective classes; however, forcyclotides and CSαβ-defensins, other searches were done using up-dated databases (Table 3). These additional reports, in addition toreporting novel sequences, demonstrated tissue expression, the biolog-ical activities of such sequences, insights into the mechanism of action,and/or the phylogenetic relationships. Thus, in the case of one classagainst all species, it is important to perform the search periodically asthe database grows.

3.4. Additional applications for pattern matching

Besides the identification of AMPs, there are other uses for patternmatching related to AMPs. REGEX could be used to explore the varietyof functions (Silva et al., 2012) or even to design novel AMPs (Loose etal., 2006).

Since some peptides have limited identity with others, but theiramino acids present similar physicochemical properties, it is possibleto draw a REGEX expanding the residues by their properties and thenperform the identification of distantly related peptides in databases(Silva et al., 2012; Tomita et al., 2008). This approach was used to ana-lyse themultifunctional activity of Cn-AMP1, where initially a similaritywas identified to another multifunctional peptide, alloferon-1(Chernysh et al., 2002), and then, based on a pairwise alignment andphysicochemical properties, a REGEX was designed. Through thisREGEX, two other multifunctional peptides, in addition to alloferon-1,were retrieved from APD, and in all cases, the REGEX was matched

Page 6: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Table 3Updates for cyclotides and CSαβ-defensins identified from databases.

Peptide class Approach Outcomea Main findings Reference

CSαβ-Defensins Localalignments

25 fungal defensin-like sequences (i) The evolutionary inference of the origin of CSαβ Defensins and (ii) itsclassification in six defensin families.

Zhu (2008)

Localalignments

13 fungal defensin-like sequences (i) The demonstration of micasin antimicrobial activity by means of anon-disrupting membrane mechanism.

Zhu et al. (2012)

Localalignments

132 typical defensin-likesequences and 63 atypical ones(λ-CSαβ-Defensins)

(i) The identification of distant homologous sequences of CSαβ-Defensins withaltered fold and cysteine framework in Zymoseptoria genus.

Wu et al. (2016)

Cyclotides Localalignmentsand REGEX

23 cyclotide-like sequences (i) The dissemination of cyclotide-like genes in crop plants and (ii) theexpression of Bcl1 and Rcl1 in barley and rice, respectively.

Mulvenna et al. (2006)

Localalignments

11 cyclotide-like sequences (i) The first report of cyclotide-like peptides from Poaceae in protein level and(ii) their cytotoxicity and antimicrobial activities.

Nguyen et al. (2013)

CyPerl andCyExcelb

145 cyclotide-like sequences (i) The identification of cyclotide-like proteins in seven additional plantfamilies.

Zhang et al. (2015)a

REGEX 6 cyclotide-like sequences (i) The evolutionary inference of the origin of cyclotides by convergentevolution and/or lateral gene transfer; and (ii) the expression of thecyclotide-like peptides in maize roots.

Porto et al. (2016)

a The numbers are non-cumulative.b Such methods work similarly to REGEX.

342 W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

with theN-terminal portion, indicating that themultifunctionality is re-lated to this portion (Silva et al., 2012).

The versatility of pattern matching was also demonstrated by Looseand co-workers (Loose et al., 2006). They developed amethod for ratio-nal design of AMPs based on REGEX. Around 700 regular expressionsfrom the antimicrobial peptides database were extracted, composingthe “language of AMPs” (Loose et al., 2006). All the combinations ofamino acids covering the developed regular expressions were writtenout, computing 12 million sequences. From these, a set of 41 peptides,retrieved after redundancy removal of 70%,were tested against bacteria,18 being active AMPs (Loose et al., 2006). They clearly demonstratedthat the use of REGEX could extrapolate the content of sequencedatabases, generating novel sequences which will feed the databaseagain.

In fact, the above works showed that local alignments and patternmatching are useful techniques to discover and improve the annotationof AMPs in databases, although these techniques are not restricted tothis purpose. However, it is important to highlight that AMPs can per-form different functions under different environmental conditions(Franco, 2011; Phoenix et al., 2013). This multifunctional charactercould be a drawback in the identification of peptideswith an antimicro-bial function from their sequences. Despite the efficient identification ofnovel members of AMP classes, due to their multiple functions, bothmethods have the limitation of not ensuring that a matched sequencecorresponds to an AMP. Therefore, they should be accompanied atleast by an antimicrobial activity prediction (Porto et al., 2012c,2014a) or, preferably, an antimicrobial assay, as described by Zhu andco-workers in the case of micasin (Zhu et al., 2012).

4. Antimicrobial activity prediction: sequence order versus sequencesize variation

AMPs can perform different functions under different environmen-tal conditions, an ability also known as “peptide promiscuity”. Thereare two levels of multifunctionality, where one peptide could havemul-tiple functions (pure promiscuity) or the peptide family may havemembers with different functions (family promiscuity) (Franco,2011). This situation means that some members of a peptide familydo not have antimicrobial activity. Therefore, this putative peptidecould be identified using local alignments and/or pattern matching;however, it would not have antimicrobial activity.

The direct antimicrobial assay is the most suitable way to addressthis issue; however, it can be impracticable in some cases, due to thenumber of sequences retrieved from database search, as observed in

Silverstein and co-workers (Silverstein et al., 2007), where more than12,000 sequenceswere retrieved. Therefore, computational tools are re-quired for predicting their antimicrobial activity prior to synthesis.

In this context, supervised machine learning methods have beenused for developing models for predicting antimicrobial activity. Sincethis review is focused on application tomining databases, it does not ad-dress the construction of prediction models - reviewed previously(Porto et al., 2012b; Randou et al., 2013; Rondón-Villarreal et al.,2014). Currently, in practical terms, there are two distinct strategies ofprediction, which take into account either (i) the sequence order or(ii) the sequence size variation. If the sequence order is selected, thesize cannot vary, since the prediction is made based on the frequencyof amino acids in each position; on the other hand, if sequence size var-iation is selected, the sequences are converted into descriptors. Thesedescriptors might be physico-chemical properties, which in turn donot take into account the sequence order, since shuffled sequencescould have the same descriptor values as the ordered sequence (Looseet al., 2006).

Taking this into account, AMP classifications are crucial to select asuitable prediction strategy. Currently, there are three prediction sys-tems available as standalone programs or web servers, and these areAntiBP (Lata et al., 2007, 2010), CAMP (Thomas et al., 2010; Waghu etal., 2014), CS-AMPPred (Porto et al., 2010, 2012a), ClassAMP (Josephet al., 2012), iAMP-2L (Xiao et al., 2013) and ADAM (Lee et al., 2015)(Table 4).

Although these predictors have good accuracies, it is not possible toperform a direct database search with them, except in cases ofencrypted AMPs. Brand and co-workers identified the encrypted pep-tides by means of the in-house predictor Kamal (Table 1). Two se-quences, P61458(35-60) and A5LDU0 (184-211), showed potential toinhibit the fixation of the phytopathogen Phakopsora pachyrhizi intosoybean leaves (Brand et al., 2012).

However, the prediction systems were constructed taking into ac-count only the mature peptide (Table 4), but in databases the peptidesare found as their precursor sequences. The precursor sequences addanother layer of complexity to the prediction, since they have severalkinds of arrangements, from the simplest one, with a signal peptideand AMP, to very complex arrangements, with tandem repetitions ofAMPs (Pinto et al., 2011; Zhu, 2008). In addition, the heterogeneity ofAMPs seems to be another great challenge in this field, since AMPs com-pose a group with different sequences, structures and mechanisms ofaction. Despite this, such prediction methods have been applied as acomplement to sequence search (Cândido et al., 2014; Porto et al.,2012c, 2014a).

Page 7: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Table 4The main online available systems for antimicrobial peptide prediction. Only databases with active web sites are listed.

Predictor Methodology Advantage Disadvantage Reference

AntiBP This supports the algorithms support vector machine(SVM), artificial neural network (ANN) andquantitative matrices (QM). It uses binary patterns,where each amino acid is represented by one binarypattern and uses a sliding window of 15 residues.Combining the 15 residues from N- and C-terminals,the system achieved 92.11% of accuracy. In the secondversion, only SVM was used. The N + C-terminalapproach showed a drop in accuracy against a blinddata set (92.11 to 91.64%). However, the last versionwas more reliable, since it was trained with moresequences than the previous version. AntiBP isavailable as an online server atbhttp://www.imtech.res.in/raghava/antibp/N.

It takes into account the order ofamino acids in the sequence. As itworks in windows of 15 residues, itcould be useful for identification ofencrypted peptides.

It is not applicable for largersequences and Natural AMPs. Besides,this predictor is generalist and wouldnot be accurate in specific cases.

Lata et al. (2007, 2010)

CAMP This supports the algorithms random forest (RF),discriminant analysis (DA) and also SVM. This systemuses 275 features, including composition,physicochemical properties and structuralcharacteristics of each amino acid. RF showed the finestaccuracy (93.2%), followed by SVM (91.5%) and DA(87.5%), against a blind data set. In its new version, atotal of 64 features and a larger training data set wasused, resulting in a slight improvement in theaccuracies of RF (93.4%) and SVM (92.6%), while DAshowed a slight drop in accuracy (86.9%) against ablind data set. CAMP is available as an online server atbhttp://www.camp.bicnirrh.res.in/N.

Since it can handle sequences withdifferent sizes, it is appropriate forpredicting peptides that belong to aclass instead of encrypted AMPs.

It does not take into account the orderof the sequence; therefore, asequence with a similar compositionto an AMP would have the samevalues for each descriptor, generatingfalse positives.

Thomas et al. (2010),Waghu et al. (2014)

CS-AMPPred This supports only SVM. The first version ofCS-AMMPred was based on physico-chemicalproperties, which allowed the prediction of sequencesof different sizes. The hydrophobic moment wasincluded in the system, since the modification of asequence modifies the hydrophobic moment. Thisapproach resulted in an overall accuracy of 83% againsta blind data set. Then, the system was rebuilt,generating the CS-AMPPred itself. The updated versionwas implemented with five descriptors and thehydrophobic moment was removed. The updatedimplementation showed an accuracy of 90%, which incomparison to other methods, using the same blinddata set, was the highest accuracy forcysteine-stabilized peptides. CS-AMPPred is availableas standalone softwarebhttp://sourceforge.net/projects/csamppred/N.

Since it can handle sequences withdifferent sizes, it is appropriate forpredicting peptides that belong to aclass instead of encrypted AMPs.Besides, it is specific for predictingcysteine-stabilized peptides.

It does not take into account the orderof the sequence; therefore, asequence with a similar compositionto an AMP would have the samevalues for each descriptor, generatingfalse positives. Besides, it cannot beused for linear peptides.

Porto et al. (2010,2012a)

ClassAMP This supports the algorithms SVM and RF and predictsthe propensity for the peptide to be antibacterial,antifungal or antiviral. Three one-against-all predictionmodels were created, where sequences from one classwere taken as positive and the other two as negative(i.e. for antibacterial, the negative set is composed ofantifungal and antiviral peptides). The sequences arepredicted by the three models and the class is assignedaccording to the majority of votes. All models haveaccuracies higher than 90%. It is available as an onlineserver at bhttp://www.bicnirrh.res.in/classamp/N

It predicts AMP's propensity to haveantibacterial, antifungal, or antiviralactivity. There are no limitations inthe size of the sequence.

The main bias for this server is that anactivity among antibacterial,antifungal and antiviral is alwaysreturned, in other words it lacks a“non-antimicrobial” classification (i.e.an inactive peptide would bepredicted in some of these classes,even being inactive). Besides, it doesnot predict dual activity (e.g.antibacterial and antiviral)

Joseph et al. (2012)

iAMP-2L This supports the algorithm fuzzy K-nearest neighbour(FKNN) and it was trained using a pseudo amino acidcomposition strategy (Chou, 2001). The sequenceswere decomposed in 40 descriptors, the first being 20,the frequency of each amino acid and the other 20, thecombinations between five physicochemical properties(hydrophobicity, pK1, pK2, pI at 25 °C and molecularweight) and four tiers of correlation factors, (5 × 4 =20). The tiers of correlation factors are responsible foradding the contribution of the order of the sequence.The system has a two-level prediction (this is thereason for the “2L” in the name), in the first level, itpredicts the sequence as AMP or non-AMP, withaccuracy of 86.32%; in case of positive prediction,multilevel prediction is performed, predicting the kindof activity with 66% of accuracy. The system is availableas an online server at bhttp://jci-bioinfo.cn/iAMP-2LN

It takes into account the order of thesequence and, similar to ClassAMP, italso predicts which kind(s) ofactivity(ies) the peptide has (e.g.antibacterial, antifungal, anticancer,anti-HIV and antiviral). There are nolimitations in the size of thesequence.

One half of sequence descriptors arebased on amino acid frequencies,which somehow makes a biasedprediction, since peptides withsimilar frequencies could differ inactivity (Loose et al., 2006).

Xiao et al. (2013)

ADAM This supports the algorithm SVM. The system wastrained using the amino acid compositions as the

Since it can handle sequences withdifferent sizes, it is appropriate for

It does not take into account the orderof the sequence; therefore, a

Lee et al. (2015)

(continued on next page)

343W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

Page 8: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Table 4 (continued)

Predictor Methodology Advantage Disadvantage Reference

learning features. There is no information about theaccuracy. However, we calculated the accuracy ofADAM SVM using the blind data sets from iAMP-2L andCS-AMMPred. The system showed accuracies of 88.04%and 91.33%, respectively. It is available as an onlineserver atbhttp://bioinformatics.cs.ntou.edu.tw/adam/tool.htmlN

predicting peptides that belong to aclass instead of encrypted AMPs.

sequence with a similar compositionto an AMP would have the samevalues for each descriptor, generatingfalse positives.

344 W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

5. What can structure prediction reveal?

Molecular modelling can be used in the search for peptides that be-long to classes already described, and coupled with other methodolo-gies, such as the prediction of antimicrobial and/or ligand prediction,since some AMPs could bind specific cell targets (Cândido et al., 2014;Mandal et al., 2013; Porto et al., 2012c; Schmitt et al., 2010; Schneideret al., 2010; van der Weerden et al., 2010). However, instead of beingused for direct identification of AMPs in databases, the structure predic-tion methods have been used only for further validation in a patternand/or local alignment search (Cândido et al., 2014; Mulvenna et al.,2006; Porto et al., 2012c, 2014a; Zhu, 2008). This section presentssome applications of molecular modelling and molecular dynamics inthe search and identification of sequences in databases, as well astheir possible application in the identification of AMPs.

By means of molecular modelling, it is possible to identify homolo-gous sequences with little identity between them, since they havegreat structural conservation (Tomczak et al., 2012). Based on this,modelling enables the identification of those similarities and could bea useful tool in the identification of AMPs in databases (Porto et al.,2014b). For this purpose, two techniques are indicated: threading andab initiomodelling (Table 5).

On the one hand, threading techniques predict the 3D structure of aquery sequence using experimentally determined structures as tem-plates, without relying on sequence similarity, since the structural sim-ilarity of distant homologous proteins is often preserved even with lowsequence identity (Gille et al., 2000). On the other hand, ab initiomodel-ling, also known as de novo and free modelling, is capable of predicting

Table 5Summary of principal threading and ab initiomodelling methods and respective method descr

Modellingmethod

Modellingserver

Description

Threading FUGUE It uses structural profiles from HOMSTRAD (Mizuguchi et al.,template is selected by a dynamic programming algorithm.

PROSPECT2 It uses secondary structure propensity, solvent accessibility, ralignments.

SPARKS2 It uses single-body knowledge-based statistical potential linkSP3 It uses structural protein fragments to make sequence profiles

structure-derived sequence that improves alignments.SAM-T02 It performs a PSI-BLAST with a query sequence. From multiple

will be used by the Viterbi algorithm to find the best templateHHSEARCH It generates a HMM profile to query and templates. HHM profiLOMETS It uses all the threading methods described above. In addition

additional methods developed for its implementation: PPA-I,secondary structure match; PPA-II, which works similarly to PSAM-T99 sequence alignments; and PAINT, which uses energtemplates is done by the structural consensus of all threading2004).

I-TASSER Initially it uses LOMETS to find best templates in PDB. These slibrary generates full-length models.

3D–Jury It uses several servers to make initial structural databank. Thepair-to-pair to find best overlap. Best overlaps are then analysthe same fold.

Ab initio QUARK The query is broken in sliding windows of 20 residues that arReplica-exchange Monte Carlo simulations are then used to a

ROSETTA The query sequence and known structural sequences are brokevaluated for distance and similarity to make final structures.

PEP-FOLD It makes simulations to generate several models of the query sbest clusters are selected and returned.

the protein structure from scratch, by using a designed energy functionto guide the conformational search (Lee et al., 2009). However, it is im-portant to highlight that for ab initio modelling, an additional step ofstructural alignment is required, using other methods such as DALIServer (Holm and Rosenström, 2010) and/or COFACTOR (Roy et al.,2012), in order to make comparisons between predicted and solvedstructures, giving insights into the function of proteins, and to providethe identification of distant homologues by structural alignments. Forboth modelling methods, the subsequent use of molecular dynamicssimulations is crucial to evaluate themolecularmodels, addingmore re-liability to functional predictions (Porto et al., 2014b),mainly in the caseof ab initio models, which has a success rate between 20 and 25% forconstructing good quality models (Porto et al., 2014b; Rigden, 2011).

Molecular dynamics can be defined as computer simulation of mol-ecules using parameters based on basic physical laws. In practice, thisenables the evaluation of structural modifications of proteins overtime, as well as flexibility changes and movements of the differentatoms or molecules, making it possible also to access temporal statesof structure. Thus, this methodology can broadly be applied with theaim of validating structural predictions, being possible to addmore reli-ability to the generated data (Porto et al., 2012c, 2014a, 2014b; Tomczaket al., 2012). Nevertheless, this technique requires high computationalpower, once it demands a long computational time,mainly in larger sys-tems. This drawback can hinder the simulation, because it is only possi-ble to simulate a few nanoseconds or microseconds (Klepeis et al.,2009). Despite this limitation, some works have used ab initio and/orthreading molecular modelling, structural alignment and moleculardynamics with the aim of validating structures (Table 6).

iption.

Reference

1998) to find the best alignment with the query. The best Shi et al. (2001)

esidue mutation and pairwise contact potential to optimize Xu and Xu (2000)

ed with sequence alignments. Zhou and Zhou (2004). These sequence profiles are used to make a Zhou and Zhou (2005)

alignment of obtained sequences, it generates a HMM that.

Karplus et al. (2003)

les are aligned and the best alignment is selected. Söding (2005)to methods described above, LOMETS has also threewhich uses alignments of sequence linked with predictedPA-I, however it takes structural information fromy function to improve alignments. The selection of the bestmethods calculated by TM-Score (Zhang and Skolnick,

Wu and Zhang (2007)

tructures are fragmented and an assembly of the fragment Zhang (2008)

se first structures are compared by C-alpha atomed by MaxSub tool (Siew et al., 2000) to find the pair with

Ginalski et al. (2003)

e modelled separately from experimental structures.ssemble fragments and generate full-length structures.

Xu and Zhang (2012)

en into fragments that are aligned. These alignments are Simons et al. (1997)

equence. These models are grouped in clusters and the five Thevenet et al. (2012)

Page 9: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Table 6Summary of methodologies that use molecular modelling for protein identification.

Approach Methodology description Outcome Reference

Identification of DUFs with DNAbinding properties by ab initiomodelling.

Firstly, all DUFs from Pfam were collected; sequenceswere selected with absence of transmembrane region,length from 30 to 100 amino acids and DNA relateddomains. Structural prediction was carried in ROSETTA(Simons et al., 1997); and function was determined byDNA_BIND analyses (Szilágyi and Skolnick, 2006).

It identified 32 DNA binding domains among theDUF proteins.

(Rigden, 2011)

Identification of novel humanchemokines by means of threading.

It used UniProt Knowledgebase related to Homo sapienswithout functional annotation and sequences containingtwo or more cysteines as initial database. All sequenceswith (i) more than 30% similarity with PDB proteins, (ii)without signal peptide, (iii) with transmembrane regionsand (iv) less than 55 amino acids residues were removed.Threading alignments were carried with 270 structuraltemplates with IL8-like chemokine fold extracted by PDB.The remaining sequences were analysed by Interpro Scan;molecular modelling was done by modeler; and moleculardynamics was used for molecular modelling validation.

It identified two novel chemokines with noveldisulphide bridges arrangement.

Tomczak et al. (2012)

Functional identification ofhypothetical proteins fromEscherichia coli by means ofthreading and ab initio modelling.

The non-redundant NCBI database was used. Sequenceswith (i) 30 to 100 amino acids, (ii) no transmembraneportions, (iii) absence of similar PDB structures (less than30% of identity), (iv) similarity to Eukaryote proteins(with more than 40% of identity), (v) absence ofconserved domains, (vi) prediction to be expressed and(vii) without disordered regions were selected. Themolecular modelling (threading or ab initio) was done byLOMETS (Wu and Zhang, 2007) and QUARK (Xu andZhang, 2012), respectively; structural alignments wereperformed by DALI Server (Holm and Rosenström, 2010)and COFACTOR (Roy et al., 2012). Molecular dynamics wasused for structural validation.

It predicted the function of three sequences: (i)one with remote homology with cupredoxin; (ii)one β-barrel family; and (iii) one lipid-bindingprotein.

Porto et al. (2014b)

345W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

In addition, several reports showed that some AMP classes have re-semblances to other ones (Mandal et al., 2013; Porto and Franco, 2013;Singh et al., 2016; Yeung et al., 2016), comparing different solved struc-tures or predicted and solved structures (Fig. 3).

Furthermore, studies have shown that despite variation in the se-quences in many cases the structure is conserved. In CSαβ-defensins,for example, it was shown that it is possible for a fourth and a fifth di-sulphide bridge to be present. These additional disulphide bridges arevariable and could be between different cysteines along the structure,depending on the peptide (Fig. 4) (Janssen et al., 2003; Zhu, 2008).However, despite this variation in sequence level, there is no structuralmodification and the structural motifs of the family are maintained(Janssen et al., 2003; Zhu, 2008). Similarly, other families of antimicro-bial peptides show structural congruence despite sequence variation.Thionins (α andβ), for example, show similar folding despite the differ-ences at sequence level, and indeed this group of proteins can havethree or four disulphide bridges but with the same fold (Fig. 4)(Rao etal., 1995; Romagnoli et al., 2000; Stec et al., 1995). Cyclotides, however,present the maintenance of the same disulphide pattern as well asstructural similarity (Barry et al., 2004; Felizmenio-Quimio et al.,2001; Jennings et al., 2005; Mulvenna et al., 2005) (Fig. 4). Taking thisinformation into account, it could be possible to identify novel cyste-ine-stabilized peptides by means of the methodology described byTomczak and co-workers (Tomczak et al., 2012) (Table 6).

However, some features of AMPs may hamper the use of this meth-odology, as occurs with the antimicrobial activity prediction methods.Pre and pro regions, for example, may make it difficult to predict thecorrect structure of these peptides. This happens due to the absence ofreliable in silico identification of these regions. Other great limitationsare the posttranslational modifications such as cyclization, thioether(monosulphide) and disulphide bridges, and this occurs because the ac-curate prediction and modelling of peptides is difficult and sometimesunfeasible. In some cases, the establishment of a correct disulphidebridge pattern could be very difficult, since there are a huge numberof possible combinations between cysteines. The case of snakin-1 is,perhaps, the most emblematic example of such a limitation. The first

structural insights weremade by ab initiomolecular modelling togetherwith disulphide bridges prediction, where the structure was predictedto contain an N-terminal helix-turn-helix motif and a C-terminal ex-tended loop (Porto and Franco, 2013). The great problem with snakin-1 is the fact that it contains 12 cysteine residues, which could generate66 possibilities of disulphide connections; and, therefore, the possibilityof an erroneous prediction of all disulphide bridges is 98.5%. Meantime,in a laterwork, three disulphide bridgeswere solved bymass spectrom-etry analyses, with only one matching the prediction (Harris et al.,2014). And recently, the full structure of snakin-1was elucidated, show-ing that only two predicted bridges were correct (Yeung et al., 2016).However, it is important to highlight that the crystal and the predictedstructures share the helix-turn-helix motif, indicating that the predic-tion was not totally wrong. Therefore, ab initio molecular modellingcould also be used to verify structural similarity. Hence, the applicationof molecularmodelling and dynamics can be useful in the identificationof AMPs in databases.

Another drawback to applying molecular modelling in searching forAMPs is that different functions may be performed by orthologues withsimilar structures; and the same structure may perform multipleactivities (Franco, 2011; Phoenix et al., 2013). Therefore, due to a directcorrelation between structure and function, the identification of a se-quencewith similar AMP structure does not mean that such a sequenceis actually an AMP.

Thus, despite the current applications of modelling strategies andstructural validation only to support other techniques, why not usesuch approaches for AMPs? In fact, the methods described in Table 6could be applied to AMPs, due to the structural resemblance amongthe classes (Mandal et al., 2013; Porto and Franco, 2013; Singh et al.,2016; Yeung et al., 2016). However, it is important to emphasize thatin the search for AMPs the presence of precursor sequences can skewsuch techniques, hindering their wider application. The non-accurateprediction of presence and extension of precursors makes the correctmodelling and structural analysis of proteins difficult. Thus, despitetheir disadvantages, these methodologies could be used in anotherway to identify AMPs in databases.

Page 10: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Fig. 3. Structural resemblance of antimicrobial peptides. (Top) The predicted structure of Ps-AFP1 (αβ-trumpet) (Mandal et al., 2013) and the structures of plectasin (Mygind et al., 2005)and brazzein (Caldwell et al., 1998) (CSαβ-defensins), sharing the α-helix and the β-strands; (Middle) the structures of laterosporulin (bacteriocin) (Singh et al., 2016) and human α-defensin 1 (HD5) (Szyk et al., 2006), sharing the β-strands; and (Bottom) the structure of snakin-1 (snakin/GASA) (Yeung et al., 2016) and the structures of EcAMP1 (α-helicalhairpin) (Nolde et al., 2011) and Viscotoxin A3 (thionin) (Romagnoli et al., 2000), sharing the helix-turn-helix motif. Disulphide bridges are represented in ball and stick.

346 W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

6. Conclusions

There is still a great deal of research to be donewithunannotated an-timicrobial peptides in databases. This branch of structural genomics isjust beginning to be explored, and it could be seen as an opportunityto integratemultiple techniques from computational sciences, structur-al biology, biochemistry and molecular biology.

The future challenges in this field are the development ofmore accu-rate prediction systems, which could take into account the length of thesequence and its amino acid order; and also the direct application ofstructural prediction for identification of newmolecules. Such methodswill be helpful not only for novel AMP development, but also for otherkinds of proteins, at a higher level of structural genomics.

Such improvements could be ready to lend a hand in the identifica-tion of AMPs with complex precursor organizations (e.g. withpropeptides before and/or after the mature sequence, or even tandemrepeats or the mature peptide). On the other hand, the identification

of unannotated AMPs with a simple precursor organization is perfectlyapplicable by means of the approaches discussed here (i.e. as in thecase of MdesDEF-2 (Porto et al., 2014a) or hevein-like peptides (Portoet al., 2012c), where the removal of the signal peptide releases the ma-ture peptide).

Indeed, researchers must continue to try to find better and more ef-ficient strategies for helping to solve the problem of unannotated pro-teins, which could bring immeasurable benefits to society. This isespecially the case of AMPs, which could be helpful in developingnovel antimicrobial agents and combating resistant bacteria. It hasrecently been estimated that, annually, about 30 million sepsiscases occur worldwide, 20 million being severe cases, with, potentially,5 million deaths (Fleischmann et al., 2016). Although this kind of re-search is just beginning, it shows great potential for developing novelantimicrobial agents. The best examples so far are the works of Zhuand co-workers (Zhu et al., 2012) and Brand and co-workers (Brandet al., 2012), which started with a database search and finished by

Page 11: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

Fig. 4. Alignment between different members of three classes of AMPs with structural conservancy despite sequence variation. CSαβ-defensins are shown on top, with sequences fromplants (PhD1, PDB ID:1N4N) (Janssen et al., 2003), insects (Drosomycin, PDB ID:1MYN) (Landon et al., 1997) and fungi (Plectasin, PDB ID:1ZFU) (Mygind et al., 2005). Plant α- and β-thionins are shown in the middle, represented by α-1-purothionin (PDB ID:2PLH) (Rao et al., 1995), β-purothionin (PDB ID:1BHP) (Stec et al., 1995) and Viscotoxin A3 (PDB ID:1ED0)(Romagnoli et al., 2000). Plant cyclotides are shown at the bottom, aligned in the cyclic cysteine knot topology, represented by bracelet members tricyclon A (PDB ID: 1YP8)(Mulvenna et al., 2005) and palicourein (PDB ID:1R1F) (Barry et al., 2004), Möebius member kalata B2 (PDB ID:1PT4) (Jennings et al., 2005); and the Trypsin inhibitor MCoTI-II (PDBID:1IB9) (Felizmenio-Quimio et al., 2001). Cysteine residues are highlighted in grey. The conserved disulphide bridges are represented by solid lines and the additional bridges are bydotted lines.

347W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

demonstrating the potential application of micasin and P61458(35-60)and A5LDU0 (184-211), respectively. Therefore, in the near future, re-search in databases could be a key to developing new antimicrobialagents.

References

Aguilera-Mendoza, L., Marrero-Ponce, Y., Tellez-Ibarra, R., Llorente-Quesada, M.T.,Salgado, J., Barigye, S.J., Liu, J., 2015. Overlap and diversity in antimicrobial peptide da-tabases: compiling a non-redundant set of sequences. Bioinformatics 31:2553–2559.http://dx.doi.org/10.1093/bioinformatics/btv180.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller,W., Lipman, D.J., 1997.Gapped BLAST and PSI-BLAST: a new generation of protein database search pro-grams. Nucleic Acids Res. 25:3389–3402. http://dx.doi.org/10.1093/nar/25.17.3389.

Barry, D.G., Daly, N.L., Bokesch, H.R., Gustafson, K.R., Craik, D.J., 2004. Solution structure ofthe cyclotide palicourein: implications for the development of a pharmaceuticalframework. Structure 12, 85–94.

Basse, C.W., 2005. Dissecting defense-related and developmental transcriptional re-sponses of maize during Ustilago maydis infection and subsequent tumor formation.Plant Physiol. 138:1774–1784. http://dx.doi.org/10.1104/pp.105.061200.

Brand, G.D., Magalhães, M.T.Q., Tinoco, M.L.P., Aragão, F.J.L., Nicoli, J., Kelly, S.M., Cooper,A., Bloch, C., 2012. Probing protein sequences as sources for encrypted antimicrobialpeptides. PLoS One 7, e45848. http://dx.doi.org/10.1371/journal.pone.0045848.

Brogden, K.A., 2005. Antimicrobial peptides: pore formers or metabolic inhibitors in bac-teria? Nat. Rev. Microbiol. 3:238–250. http://dx.doi.org/10.1038/nrmicro1098.

Caldwell, J.E., Abildgaard, F., Dzakula, Z., Ming, D., Hellekant, G., Markley, J.L., 1998. Solu-tion structure of the thermostable sweet-tasting protein brazzein. Nat. Struct. Biol.5, 427–431.

Cândido, E. de S., Fernandes, G. da R., de Alencar, S.A., Cardoso, M.H. e S., Lima, S.M. de F.,Miranda, V. de J., Porto, W.F., Nolasco, D.O., de Oliveira-Júnior, N.G., Barbosa, A.E.A. deD., Pogue, R.E., Rezende, T.M.B., Dias, S.C., Franco, O.L., 2014. Shedding some light overthe floral metabolism by arum lily (Zantedeschia aethiopica) spathe de novo tran-scriptome assembly. PLoS One 9, e90487. http://dx.doi.org/10.1371/journal.pone.0090487.

Cardoso, M.H., Ribeiro, S.M., Nolasco, D.O., de la Fuente-Núñez, C., Felício, M.R., Gonçalves,S., Matos, C.O., Liao, L.M., Santos, N.C., Hancock, R.E.W., Franco, O.L., Migliolo, L., 2016.A polyalanine peptide derived from polar fish with anti-infectious activities. Sci. Rep.6:21385. http://dx.doi.org/10.1038/srep21385.

Chen, M.S., Han, J., Yu, P.S., 1996. Data mining: an overview from a database perspective.IEEE Trans. Knowl. Data Eng. 8:866–883. http://dx.doi.org/10.1109/69.553155.

Chernysh, S., Kim, S.I., Bekker, G., Pleskach, V.A., Filatova, N.A., Anikin, V.B., Platonov, V.G.,Bulet, P., 2002. Antiviral and antitumor peptides from insects. Proc. Natl. Acad. Sci. U.S. A. 99:12628–12632. http://dx.doi.org/10.1073/pnas.192301899.

Chugh, J.K., Wallace, B.A., 2001. Peptaibols: models for ion channels. Biochem. Soc. Trans.29, 565–570.

Dürr, U.H.N., Sudheendra, U.S., Ramamoorthy, A., 2006. LL-37, the only humanmember ofthe cathelicidin family of antimicrobial peptides. Biochim. Biophys. Acta 1758:1408–1425. http://dx.doi.org/10.1016/j.bbamem.2006.03.030.

Eddy, S.R., 1998. Profile hidden Markov models. Bioinformatics 14, 755–763.Fan, L., Sun, J., Zhou, M., Zhou, J., Lao, X., Zheng, H., Xu, H., 2016. DRAMP: a comprehensive

data repository of antimicrobial peptides. Sci. Rep. 6:24482. http://dx.doi.org/10.1038/srep24482.

Fedorova, M., van de Mortel, J., Matsumoto, P.A., Cho, J., Town, C.D., VandenBosch, K.A.,Gantt, J.S., Vance, C.P., 2002. Genome-wide identification of nodule-specific tran-scripts in the model legume Medicago truncatula. Plant Physiol. 130:519–537.http://dx.doi.org/10.1104/pp.006833.

Felizmenio-Quimio, M.E., Daly, N.L., Craik, D.J., 2001. Circular proteins in plants: solutionstructure of a novel macrocyclic trypsin inhibitor from Momordica cochinchinensis.J. Biol. Chem. 276:22875–22882. http://dx.doi.org/10.1074/jbc.M101666200.

Finn, R.D., Clements, J., Eddy, S.R., 2011. HMMERweb server: interactive sequence similar-ity searching. Nucleic Acids Res. 39:W29–W37. http://dx.doi.org/10.1093/nar/gkr367.

Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A.,Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E.L.L., Tate, J., Punta, M., 2014.Pfam: the protein families database. Nucleic Acids Res. 42:D222–D230. http://dx.doi.org/10.1093/nar/gkt1223.

Fjell, C.D., Hancock, R.E.W., Cherkasov, A., 2007. AMPer: a database and an automated dis-covery tool for antimicrobial peptides. Bioinformatics 23:1148–1155. http://dx.doi.org/10.1093/bioinformatics/btm068.

Fleischmann, C., Scherag, A., Adhikari, N.K.J., Hartog, C.S., Tsaganos, T., Schlattmann, P.,Angus, D.C., Reinhart, K., International Forum of Acute Care Trialists, 2016. Assess-ment of global incidence and mortality of hospital-treated sepsis. Current estimatesand limitations. Am. J. Respir. Crit. Care Med. 193:259–272. http://dx.doi.org/10.1164/rccm.201504-0781OC.

Franco, O.L., 2011. Peptide promiscuity: an evolutionary concept for plant defense. FEBSLett. 585:995–1000. http://dx.doi.org/10.1016/j.febslet.2011.03.008.

Galperin, M.Y., Koonin, E.V., 2012. Divergence and convergence in enzyme evolution.J. Biol. Chem. 287:21–28. http://dx.doi.org/10.1074/jbc.R111.241976.

Gautam, A., Chaudhary, K., Singh, S., Joshi, A., Anand, P., Tuknait, A., Mathur, D., Varshney,G.C., Raghava, G.P.S., 2014. Hemolytik: a database of experimentally determined he-molytic and non-hemolytic peptides. Nucleic Acids Res. 42:D444–D449. http://dx.doi.org/10.1093/nar/gkt1008.

Gesell, J., Zasloff, M., Opella, S.J., 1997. Two-dimensional 1H NMR experiments show thatthe 23-residue magainin antibiotic peptide is an alpha-helix indodecylphosphocholine micelles, sodium dodecylsulfate micelles, andtrifluoroethanol/water solution. J. Biomol. NMR 9, 127–135.

Gille, C., Goede, A., Preissner, R., Rother, K., Frömmel, C., 2000. Conservation of substruc-tures in proteins: interfaces of secondary structural elements in proteasomal sub-units. J. Mol. Biol. 299:1147–1154. http://dx.doi.org/10.1006/jmbi.2000.3763.

Ginalski, K., Elofsson, A., Fischer, D., Rychlewski, L., 2003. 3D-Jury: a simple approach toimprove protein structure predictions. Bioinformatics 19, 1015–1018.

Gogoladze, G., Grigolava, M., Vishnepolsky, B., Chubinidze, M., Duroux, P., Lefranc, M.P.,Pirtskhalava, M., 2014. DBAASP: database of antimicrobial activity and structure of

Page 12: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

348 W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

peptides. FEMS Microbiol. Lett. 357:63–68. http://dx.doi.org/10.1111/1574-6968.12489.

Graham, M.A., Silverstein, K.A.T., Cannon, S.B., VandenBosch, K.A., 2004. Computationalidentification and characterization of novel genes from legumes. Plant Physiol. 135:1179–1197. http://dx.doi.org/10.1104/pp.104.037531.

Hammami, R., Ben Hamida, J., Vergoten, G., Fliss, I., 2009. PhytAMP: a database dedicatedto antimicrobial plant peptides. Nucleic Acids Res. 37:D963–D968. http://dx.doi.org/10.1093/nar/gkn655.

Hammami, R., Zouhir, A., Le Lay, C., Ben Hamida, J., Fliss, I., 2010. BACTIBASE second re-lease: a database and tool platform for bacteriocin characterization. BMC Microbiol.10:22. http://dx.doi.org/10.1186/1471-2180-10-22.

Harris, P.W.R., Yang, S.-H., Molina, A., López, G., Middleditch, M., Brimble, M.A., 2014. Plantantimicrobial peptides snakin-1 and snakin-2: chemical synthesis and insights intothe disulfide connectivity. Chemistry 20:5102–5110. http://dx.doi.org/10.1002/chem.201303207.

Holm, L., Rosenström, P., 2010. Dali server: conservation mapping in 3D. Nucleic AcidsRes. 38:W545–W549. http://dx.doi.org/10.1093/nar/gkq366.

Janssen, B.J.C., Schirra, H.J., Lay, F.T., Anderson, M.A., Craik, D.J., 2003. Structure of Petuniahybrida defensin 1, a novel plant defensin with five disulfide bonds. Biochemistry 42:8214–8222. http://dx.doi.org/10.1021/bi034379o.

Jennings, C.V., Rosengren, K.J., Daly, N.L., Plan, M., Stevens, J., Scanlon, M.J., Waine, C.,Norman, D.G., Anderson, M.A., Craik, D.J., 2005. Isolation, solution structure, and in-secticidal activity of kalata B2, a circular protein with a twist: do Möbius stripsexist in nature? Biochemistry 44:851–860. http://dx.doi.org/10.1021/bi047837h.

Jonassen, I., 1997. Efficient discovery of conserved patterns using a pattern graph.Comput. Appl. Biosci. 13, 509–522.

Jones, P., Binns, D., Chang, H.-Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J.,Mitchell, A., Nuka, G., Pesseat, S., Quinn, A.F., Sangrador-Vegas, A., Scheremetjew, M.,Yong, S.-Y., Lopez, R., Hunter, S., 2014. InterProScan 5: genome-scale protein functionclassification. Bioinformatics 30:1236–1240. http://dx.doi.org/10.1093/bioinformatics/btu031.

Joseph, S., Karnik, S., Nilawe, P., Jayaraman, V.K., Idicula-Thomas, S., 2012. ClassAMP: aprediction tool for classification of antimicrobial peptides. IEEE/ACM Trans. Comput.Biol. Bioinform. 9:1535–1538. http://dx.doi.org/10.1109/TCBB.2012.89.

Karplus, K., Karchin, R., Draper, J., Casper, J., Mandel-Gutfreund, Y., Diekhans, M., Hughey,R., 2003. Combining local-structure, fold-recognition, and new fold methods for pro-tein structure prediction. Proteins 53 Suppl 6:491–496. http://dx.doi.org/10.1002/prot.10540.

Ke, T., Cao, H., Huang, J., Hu, F., Huang, J., Dong, C., Ma, X., Yu, J., Mao, H., Wang, X., Niu, Q.,Hui, F., Liu, S., 2015. EST-based in silico identification and in vitro test of antimicrobialpeptides in Brassica napus. BMC Genomics 16:653. http://dx.doi.org/10.1186/s12864-015-1849-x.

Klepeis, J.L., Lindorff-Larsen, K., Dror, R.O., Shaw, D.E., 2009. Long-timescale molecular dy-namics simulations of protein structure and function. Curr. Opin. Struct. Biol. 19:120–127. http://dx.doi.org/10.1016/j.sbi.2009.03.004.

Landon, C., Sodano, P., Hetru, C., Hoffmann, J., Ptak, M., 1997. Solution structure ofdrosomycin, the first inducible antifungal protein from insects. Protein Sci. 6:1878–1884. http://dx.doi.org/10.1002/pro.5560060908.

Landon, C., Barbault, F., Legrain, M., Guenneugues, M., Vovelle, F., 2008. Rational design ofpeptides active against the gram positive bacteria Staphylococcus aureus. Proteins 72:229–239. http://dx.doi.org/10.1002/prot.21912.

Lata, S., Sharma, B.K., Raghava, G.P.S., 2007. Analysis and prediction of antibacterial pep-tides. BMC Bioinform. 8:263. http://dx.doi.org/10.1186/1471-2105-8-263.

Lata, S., Mishra, N.K., Raghava, G.P.S., 2010. AntiBP2: improved version of antibacterialpeptide prediction. BMC Bioinform. 11 (Suppl. 1):S19. http://dx.doi.org/10.1186/1471-2105-11-S1-S19.

Lee, J., Wu, S., Zhang, Y., 2009. Ab initio protein structure prediction. In: Rigden, D.J. (Ed.),From Protein Structure to Function with Bioinformatics. Springer, pp. 3–25.

Lee, H.-T., Lee, C.-C., Yang, J.-R., Lai, J.Z.C., Chang, K.Y., 2015. A large-scale structural classi-fication of antimicrobial peptides. Biomed. Res. Int. 2015:475062. http://dx.doi.org/10.1155/2015/475062.

Loose, C., Jensen, K., Rigoutsos, I., Stephanopoulos, G., 2006. A linguistic model for the ra-tional design of antimicrobial peptides. Nature 443:867–869. http://dx.doi.org/10.1038/nature05233.

Mandal, S.M., Porto, W.F., Dey, P., Maiti, M.K., Ghosh, A.K., Franco, O.L., 2013. The attack ofthe phytopathogens and the trumpet solo: identification of a novel plant antifungalpeptide with distinct fold and disulfide bond pattern. Biochimie 95:1939–1948.http://dx.doi.org/10.1016/j.biochi.2013.06.027.

Mergaert, P., Nikovics, K., Kelemen, Z., Maunoury, N., Vaubert, D., Kondorosi, A.,Kondorosi, E., 2003. A novel family in Medicago truncatula consisting of more than300 nodule-specific genes coding for small, secreted polypeptides with conservedcysteine motifs. Plant Physiol. 132:161–173. http://dx.doi.org/10.1104/pp.102.018192.

Mizuguchi, K., Deane, C.M., Blundell, T.L., Overington, J.P., 1998. HOMSTRAD: a database ofprotein structure alignments for homologous families. Protein Sci. 7:2469–2471.http://dx.doi.org/10.1002/pro.5560071126.

Mulvenna, J.P., Sando, L., Craik, D.J., 2005. Processing of a 22 kDa precursor protein to pro-duce the circular protein tricyclon A. Structure 13:691–701. http://dx.doi.org/10.1016/j.str.2005.02.013.

Mulvenna, J.P., Mylne, J.S., Bharathi, R., Burton, R.A., Shirley, N.J., Fincher, G.B., Anderson,M.A., Craik, D.J., 2006. Discovery of cyclotide-like protein sequences in graminaceouscrop plants: ancestral precursors of circular proteins? Plant Cell 18:2134–2144.http://dx.doi.org/10.1105/tpc.106.042812.

Mygind, P.H., Fischer, R.L., Schnorr, K.M., Hansen, M.T., Sönksen, C.P., Ludvigsen, S.,Raventós, D., Buskov, S., Christensen, B., De Maria, L., Taboureau, O., Yaver, D., Elvig-Jørgensen, S.G., Sørensen, M.V., Christensen, B.E., Kjaerulff, S., Frimodt-Moller, N.,

Lehrer, R.I., Zasloff, M., Kristensen, H.-H., 2005. Plectasin is a peptide antibiotic withtherapeutic potential from a saprophytic fungus. Nature 437:975–980. http://dx.doi.org/10.1038/nature04051.

Nguyen, G.K.T., Lian, Y., Pang, E.W.H., Nguyen, P.Q.T., Tran, T.D., Tam, J.P., 2013. Discoveryof linear cyclotides in monocot plant Panicum laxum of Poaceae family provides newinsights into evolution and distribution of cyclotides in plants. J. Biol. Chem. 288:3370–3380. http://dx.doi.org/10.1074/jbc.M112.415356.

Nolde, S.B., Vassilevski, A.A., Rogozhin, E.A., Barinov, N.A., Balashova, T.A.,Samsonova, O.V., Baranov, Y.V., Feofanov, A.V., Egorov, T.A., Arseniev, A.S.,Grishin, E.V., 2011. Disulfide-stabilized helical hairpin structure and activity ofa novel antifungal peptide EcAMP1 from seeds of barnyard grass (Echinochloacrus-galli). J. Biol. Chem. 286:25145–25153. http://dx.doi.org/10.1074/jbc.M110.200378.

Novković, M., Simunic, J., Bojovic, V., Tossi, A., Juretic, D., 2012. DADP: the database of an-uran defense peptides. Bioinformatics 28:1406–1407. http://dx.doi.org/10.1093/bioinformatics/bts141.

Okubo, B.M., Silva, O.N., Migliolo, L., Gomes, D.G., Porto, W.F., Batista, C.L., Ramos, C.S.,Holanda, H.H.S., Dias, S.C., Franco, O.L., Moreno, S.E., 2012. Evaluation of an antimicro-bial L-amino acid oxidase and peptide derivatives from Bothropoides mattogrosensispitviper venom. PLoS One 7, e33639. http://dx.doi.org/10.1371/journal.pone.0033639.

Papareddy, P., Rydengård, V., Pasupuleti, M., Walse, B., Mörgelin, M., Chalupka, A.,Malmsten, M., Schmidtchen, A., 2010. Proteolysis of human thrombin generatesnovel host defense peptides. PLoS Pathog. 6, e1000857. http://dx.doi.org/10.1371/journal.ppat.1000857.

Pearson, W.R., 1990. Rapid and sensitive sequence comparison with FASTP and FASTA.Methods Enzymol. 183, 63–98.

Phoenix, D.A., Dennison, S.R., Harris, F., 2013. Antimicrobial peptides: their history, evolu-tion, and functional promiscuity. Antimicrobial Peptides. Wiley-VCH Verlag GmbH &Co. KGaA, Weinheim, Germany:pp. 1–37 http://dx.doi.org/10.1002/9783527652853.ch1.

Pinto, M.F.S., Almeida, R.G., Porto, W.F., Fensterseifer, I.C.M., Lima, L.a., Dias, S.C., Franco,O.L., 2011. Cyclotides: from gene structure to promiscuous multifunctionality.J. Evid. Based Complement. Alternat. Med. 17:40–53. http://dx.doi.org/10.1177/2156587211428077.

Piotto, S.P., Sessa, L., Concilio, S., Iannelli, P., 2012. YADAMP: yet another database of anti-microbial peptides. Int. J. Antimicrob. Agents 39:346–351. http://dx.doi.org/10.1016/j.ijantimicag.2011.12.003.

Polyanovsky, V.O., Roytberg, M.A., Tumanyan, V.G., 2011. Comparative analysis of thequality of a global algorithm and a local algorithm for alignment of two sequences.Algoritm. Mol. Biol. 6:25. http://dx.doi.org/10.1186/1748-7188-6-25.

Porto,W.F., Franco, O.L., 2013. Theoretical structural insights into the snakin/GASA family.Peptides 44:163–167. http://dx.doi.org/10.1016/j.peptides.2013.03.014.

Porto, W.F., Fernandes, F.C., Franco, O.L., 2010. An SVM model based on physicochemicalproperties to predict antimicrobial activity from protein sequences with cysteineknot motifs. In: Ferreira, C.E., Miyano, S., Stadler, P.F. (Eds.), Advances in Bioinformat-ics and Computational Biology. Lecture Notes in Computer Science. Springer, BerlinHeidelberg, Berlin, Heidelberg:pp. 59–62 http://dx.doi.org/10.1007/978-3-642-15060-9.

Porto, W.F., Pires, Á.S., Franco, O.L., 2012a. CS-AMPPred: an updated SVM model for anti-microbial activity prediction in cysteine-stabilized peptides. PLoS One 7, e51444.http://dx.doi.org/10.1371/journal.pone.0051444.

Porto, W.F., Silva, O.N., Franco, O.L., 2012b. Prediction and rational design of antimicrobialpeptides. In: Faraggi, E. (Ed.), Protein Structure:pp. 377–396 http://dx.doi.org/10.5772/2335 (InTech).

Porto, W.F., Souza, V.A., Nolasco, D.O., Franco, O.L., 2012c. In silico identification of novelhevein-like peptide precursors. Peptides 38:127–136. http://dx.doi.org/10.1016/j.peptides.2012.07.025.

Porto, W.F., Fensterseifer, G.M., Franco, O.L., 2014a. In silico identification, structural char-acterization, and phylogenetic analysis of MdesDEF-2: a novel defensin from the Hes-sian fly, Mayetiola destructor. J. Mol. Model. 20:2339. http://dx.doi.org/10.1007/s00894-014-2339-9.

Porto, W.F., Maria-Neto, S., Nolasco, D.O., Franco, O.L., 2014b. Screening and functionalprediction of conserved hypothetical proteins from Escherichia coli. J. ProteomicsBioinform. 7:203–213. http://dx.doi.org/10.4172/jpb.1000321.

Porto, W.F., Miranda, V.J., Pinto, M.F.S., Dohms, S.M., Franco, O.L., 2016. High-performancecomputational analysis and peptide screening from databases of cyclotides frompoaceae. Biopolymers 106, 109–118.

Randou, E.G., Veltri, D., Shehu, A., 2013. Systematic analysis of global features and modelbuilding for recognition of antimicrobial peptides. 2013 IEEE 3rd International Con-ference on Computational Advances in Bio and Medical Sciences (ICCABS). IEEE:pp. 1–6 http://dx.doi.org/10.1109/ICCABS.2013.6629215.

Rao, U., Stec, B., Teeter, M.M., 1995. Refinement of purothionins reveals solute particlesimportant for lattice formation and toxicity. Part 1: alpha1-purothionin revisited.Acta Crystallogr. D Biol. Crystallogr. 51:904–913. http://dx.doi.org/10.1107/S0907444995002964.

Rigden, D.J., 2011. Ab initio modeling led annotation suggests nucleic acid binding func-tion for many DUFs. OMICS 15:431–438. http://dx.doi.org/10.1089/omi.2010.0122.

Romagnoli, S., Ugolini, R., Fogolari, F., Schaller, G., Urech, K., Giannattasio, M., Ragona, L.,Molinari, H., 2000. NMR structural determination of viscotoxin A3 from Viscumalbum L. Biochem. J. 350 (Pt 2), 569–577.

Rondón-Villarreal, P., Sierra, D.A., Torres, R., 2014. Machine learning in the rational designof antimicrobial peptides. Curr. Comput. Aided Drug Des. 10, 183–190.

Roy, A., Yang, J., Zhang, Y., 2012. COFACTOR: an accurate comparative algorithm for struc-ture-based protein function annotation. Nucleic Acids Res. 40:W471–W477. http://dx.doi.org/10.1093/nar/gks372.

Page 13: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

349W.F. Porto et al. / Biotechnology Advances 35 (2017) 337–349

Rozek, A., Friedrich, C.L., Hancock, R.E., 2000. Structure of the bovine antimicrobial peptideindolicidin bound to dodecylphosphocholine and sodium dodecyl sulfate micelles.Biochemistry 39, 15765–15774.

Schmitt, P., Wilmes, M., Pugnière, M., Aumelas, A., Bachère, E., Sahl, H.-G., Schneider, T.,Destoumieux-Garzón, D., 2010. Insight into invertebrate defensin mechanism of ac-tion: oyster defensins inhibit peptidoglycan biosynthesis by binding to lipid II.J. Biol. Chem. 285:29208–29216. http://dx.doi.org/10.1074/jbc.M110.143388.

Schneider, T., Kruse, T., Wimmer, R., Wiedemann, I., Sass, V., Pag, U., Jansen, A., Nielsen,A.K., Mygind, P.H., Raventós, D.S., Neve, S., Ravn, B., Bonvin, A.M.J.J., De Maria, L.,Andersen, A.S., Gammelgaard, L.K., Sahl, H.-G., Kristensen, H.-H., 2010. Plectasin, afungal defensin, targets the bacterial cell wall precursor lipid II. Science 328:1168–1172. http://dx.doi.org/10.1126/science.1185723.

Seebah, S., Suresh, A., Zhuo, S., Choong, Y.H., Chua, H., Chuon, D., Beuerman, R., Verma, C.,2007. Defensins knowledgebase: a manually curated database and informationsource focused on the defensins family of antimicrobial peptides. Nucleic Acids Res.35:D265–D268. http://dx.doi.org/10.1093/nar/gkl866.

Seshadri, S.V., Gabere, M.N., Pretorius, A., Adam, S., Christoffels, A., Lehväslaiho, M., Archer,J.A.C., Bajic, V.B., 2012. DAMPD: a manually curated antimicrobial peptide database.Nucleic Acids Res. 40:D1108–D1112. http://dx.doi.org/10.1093/nar/gkr1063.

Shi, J., Blundell, T.L., Mizuguchi, K., 2001. FUGUE: sequence-structure homology recogni-tion using environment-specific substitution tables and structure-dependent gappenalties. J. Mol. Biol. 310:243–257. http://dx.doi.org/10.1006/jmbi.2001.4762.

Siew, N., Elofsson, A., Rychlewski, L., Fischer, D., 2000. MaxSub: an automatedmeasure forthe assessment of protein structure prediction quality. Bioinformatics 16, 776–785.

Sigrist, C.J.A., de Castro, E., Cerutti, L., Cuche, B.A., Hulo, N., Bridge, A., Bougueleret, L.,Xenarios, I., 2013. New and continuing developments at PROSITE. Nucleic Acids Res.41:D344–D347. http://dx.doi.org/10.1093/nar/gks1067.

Sigurdardottir, T., Andersson, P., Davoudi, M., Malmsten, M., Schmidtchen, A., Bodelsson,M., 2006. In silico identification and biological evaluation of antimicrobial peptidesbased on human cathelicidin LL-37. Antimicrob. Agents Chemother. 50:2983–2989.http://dx.doi.org/10.1128/AAC.01583-05.

Silva, O.N., Mulder, K.C.L., Barbosa, A.E.A.D., Otero-Gonzalez, A.J., Lopez-Abarrategui, C.,Rezende, T.M.B., Dias, S.C., Franco, O.L., 2011. Exploring the pharmacological potentialof promiscuous host-defense peptides: from natural screenings to biotechnologicalapplications. Front. Microbiol. 2:232. http://dx.doi.org/10.3389/fmicb.2011.00232.

Silva, O.N., Porto, W.F., Migliolo, L., Mandal, S.M., Gomes, D.G., Holanda, H.H.S., Silva, R.S.P.,Dias, S.C., Costa, M.P., Costa, C.R., Silva, M.R., Rezende, T.M.B., Franco, O.L., 2012. Cn-AMP1: a new promiscuous peptide with potential for microbial infections treatment.Biopolymers 98:322–331. http://dx.doi.org/10.1002/bip.22071.

Silverstein, K.A.T., Graham, M.A., Paape, T.D., VandenBosch, K.A., 2005. Genome organiza-tion of more than 300 defensin-like genes in Arabidopsis. Plant Physiol. 138:600–610.http://dx.doi.org/10.1104/pp.105.060079.

Silverstein, K.A.T., Moskal, W.A., Wu, H.C., Underwood, B.A., Graham, M.A., Town, C.D.,VandenBosch, K.A., 2007. Small cysteine-rich peptides resembling antimicrobial pep-tides have been under-predicted in plants. Plant J. 51:262–280. http://dx.doi.org/10.1111/j.1365-313X.2007.03136.x.

Simons, K.T., Kooperberg, C., Huang, E., Baker, D., 1997. Assembly of protein tertiary struc-tures from fragments with similar local sequences using simulated annealing andBayesian scoring functions. J. Mol. Biol. 268:209–225. http://dx.doi.org/10.1006/jmbi.1997.0959.

Singh, S., Chaudhary, K., Dhanda, S.K., Bhalla, S., Usmani, S.S., Gautam, A., Tuknait, A.,Agrawal, P., Mathur, D., Raghava, G.P.S., 2016. SATPdb: a database of structurally an-notated therapeutic peptides. Nucleic Acids Res. 44, D1119–D1126.

Söding, J., 2005. Protein homology detection by HMM-HMM comparison. Bioinformatics21:951–960. http://dx.doi.org/10.1093/bioinformatics/bti125.

Stec, B., Rao, U., Teeter, M.M., 1995. Refinement of purothionins reveals solute particlesimportant for lattice formation and toxicity. Part 2: structure of beta-purothionin at1.7 A resolution. Acta Crystallogr. D Biol. Crystallogr. 51:914–924. http://dx.doi.org/10.1107/S0907444995002976.

Szilágyi, A., Skolnick, J., 2006. Efficient Prediction of Nucleic Acid Binding Function fromLow-resolution Protein Structures. J. Mol. Biol. 358:922–933. http://dx.doi.org/10.1016/j.jmb.2006.02.053.

Szyk, A., Wu, Z., Tucker, K., Yang, D., Lu, W., Lubkowski, J., 2006. Crystal structures ofhuman alpha-defensins HNP4, HD5, and HD6. Protein Sci. 15:2749–2760. http://dx.doi.org/10.1110/ps.062336606.

The UniProt Consortium, 2014. UniProt: a hub for protein information. Nucleic Acids Res.http://dx.doi.org/10.1093/nar/gku989.

Théolier, J., Fliss, I., Jean, J., Hammami, R., 2014. MilkAMP: a comprehensive database ofantimicrobial peptides of dairy origin. Dairy Sci. Technol. 94:181–193. http://dx.doi.org/10.1007/s13594-013-0153-2.

Thevenet, P., Shen, Y., Maupetit, J., Guyon, F., Derreumaux, P., Tuffery, P., 2012. PEP-FOLD:an updated de novo structure prediction server for both linear and disulfide bondedcyclic peptides. Nucleic Acids Res. 40:W288–W293. http://dx.doi.org/10.1093/nar/gks419.

Thomas, S., Karnik, S., Barai, R.S., Jayaraman, V.K., Idicula-Thomas, S., 2010. CAMP: a usefulresource for research on antimicrobial peptides. Nucleic Acids Res. 38:D774–D780.http://dx.doi.org/10.1093/nar/gkp1021.

Thompson, K., 1968. Programming techniques: regular expression search algorithm.Commun. ACM 11:419–422. http://dx.doi.org/10.1145/363347.363387.

Tian, C., Gao, B., Fang, Q., Ye, G., Zhu, S., 2010. Antimicrobial peptide-like genes in Nasoniavitripennis: a genomic perspective. BMC Genomics 11:187. http://dx.doi.org/10.1186/1471-2164-11-187.

Tomczak, A., Sontheimer, J., Drechsel, D., Hausdorf, R., Gentzel, M., Shevchenko, A., Eichler,S., Fahmy, K., Buchholz, F., Pisabarro, M.T., 2012. 3D profile-based approach to prote-ome-wide discovery of novel human chemokines. PLoS One 7, e36151. http://dx.doi.org/10.1371/journal.pone.0036151.

Tomita, Y., Kato, R., Okochi, M., Honda, H., 2008. Amotif detection and classificationmeth-od for peptide sequences using genetic programming. J. Biosci. Bioeng. 106:154–161.http://dx.doi.org/10.1263/jbb.106.154.

Tossi, A., Sandri, L., 2002. Molecular diversity in gene-encoded, cationic antimicrobialpolypeptides. Curr. Pharm. Des. 8, 743–761.

van der Weerden, N.L., Hancock, R.E.W., Anderson, M.A., 2010. Permeabilization of fungalhyphae by the plant defensin NaD1 occurs through a cell wall-dependent process.J. Biol. Chem. 285:37513–37520. http://dx.doi.org/10.1074/jbc.M110.134882.

Vlasak, R., Unger-Ullmann, C., Kreil, G., Frischauf, A.M., 1983. Nucleotide sequence ofcloned cDNA coding for honeybee prepromelittin. Eur. J. Biochem. 135, 123–126.

Waghu, F.H., Gopi, L., Barai, R.S., Ramteke, P., Nizami, B., Idicula-Thomas, S., 2014. CAMP:collection of sequences and structures of antimicrobial peptides. Nucleic Acids Res.42:D1154–D1158. http://dx.doi.org/10.1093/nar/gkt1157.

Wang, G., Li, X., Wang, Z., 2009. APD2: the updated antimicrobial peptide database and itsapplication in peptide design. Nucleic Acids Res. 37:D933–D937. http://dx.doi.org/10.1093/nar/gkn823.

Wang, C.K.L., Kaas, Q., Chiche, L., Craik, D.J., 2008. CyBase: a database of cyclic protein se-quences and structures, with applications in protein discovery and engineering.Nucleic Acids Res. 36:D206–D210. http://dx.doi.org/10.1093/nar/gkm953.

Wang, G., Elliott, M., Cogen, A.L., Ezell, E.L., Gallo, R.L., Hancock, R.E.W., 2012. Structure,dynamics, and antimicrobial and immune modulatory activities of human LL-23and its single-residue variants mutated on the basis of homologous primatecathelicidins. Biochemistry 51:653–664. http://dx.doi.org/10.1021/bi2016266.

Wieczorek, M., Jenssen, H., Kindrachuk, J., Scott, W.R.P., Elliott, M., Hilpert, K., Cheng, J.T.J.,Hancock, R.E.W., Straus, S.K., 2010. Structural studies of a peptide with immunemod-ulating and direct antimicrobial activity. Chem. Biol. 17:970–980. http://dx.doi.org/10.1016/j.chembiol.2010.07.007.

Whitmore, L., Wallace, B.A., 2004. The Peptaibol Database: a database for sequences andstructures of naturally occurring peptaibols. Nucleic Acids Res. 32:D593–D594.http://dx.doi.org/10.1093/nar/gkh077.

Wu, S., Zhang, Y., 2007. LOMETS: a local meta-threading-server for protein structure pre-diction. Nucleic Acids Res. 35:3375–3382. http://dx.doi.org/10.1093/nar/gkm251.

Wu, Y., Gao, B., Zhu, S., 2016. New fungal defensin-like peptides provide evidence for foldchange of proteins in evolution. Biosci. Rep. http://dx.doi.org/10.1042/BSR20160438.

Xiao, X., Wang, P., Lin, W.-Z., Jia, J.-H., Chou, K.-C., 2013. iAMP-2L: a two-level multi-labelclassifier for identifying antimicrobial peptides and their functional types. Anal.Biochem. 436:168–177. http://dx.doi.org/10.1016/j.ab.2013.01.019.

Xu, Y., Xu, D., 2000. Protein threading using PROSPECT: design and evaluation. Proteins40, 343–354.

Xu, D., Zhang, Y., 2012. Ab initio protein structure assembly using continuous structurefragments and optimized knowledge-based force field. Proteins 80:1715–1735.http://dx.doi.org/10.1002/prot.24065.

Yeung, H., Squire, C.J., Yosaatmadja, Y., Panjikar, S., López, G., Molina, A., Baker, E.N., Harris,P.W.R., Brimble, M.A., 2016. Radiation damage and racemic protein crystallographyreveal the unique structure of the GASA/snakin protein superfamily. Angew. Chem.Int. Ed. Engl. http://dx.doi.org/10.1002/anie.201602719.

Zasloff, M., 1987. Magainins, a class of antimicrobial peptides from Xenopus skin: isolation,characterization of two active forms, and partial cDNA sequence of a precursor. Proc.Natl. Acad. Sci. U. S. A. 84, 5449–5453.

Zhang, Y., 2008. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics9:40. http://dx.doi.org/10.1186/1471-2105-9-40.

Zhang, Y., Skolnick, J., 2004. Scoring function for automated assessment of protein struc-ture template quality. Proteins 57:702–710. http://dx.doi.org/10.1002/prot.20264.

Zhang, Z., Zhu, S., 2012. Comparative genomics analysis of five families of antimicrobialpeptide-like genes in seven ant species. Dev. Comp. Immunol. 38:262–274. http://dx.doi.org/10.1016/j.dci.2012.05.003.

Zhang, J., Li, J., Huang, Z., Yang, B., Zhang, X., Li, D., Craik, D.J., Baker, A.J.M., Shu, W., Liao, B.,2015. Transcriptomic screening for cyclotides and other cysteine-rich proteins in themetallophyte Viola baoshanensis. J. Plant Physiol. 178:17–26. http://dx.doi.org/10.1016/j.jplph.2015.01.017.

Zhao, X., Wu, H., Lu, H., Li, G., Huang, Q., 2013. LAMP: A Database Linking AntimicrobialPeptides. PLoS One 8:e66557. http://dx.doi.org/10.1371/journal.pone.0066557.

Zhou, H., Zhou, Y., 2004. Single-body residue-level knowledge-based energy score com-bined with sequence-profile and secondary structure information for fold recogni-tion. Proteins 55:1005–1013. http://dx.doi.org/10.1002/prot.20007.

Zhou, H., Zhou, Y., 2005. Fold recognition by combining sequence profiles derived fromevolution and from depth-dependent structural alignment of fragments. Proteins58:321–328. http://dx.doi.org/10.1002/prot.20308.

Zhu, S., 2008. Discovery of six families of fungal defensin-like peptides provides insightsinto origin and evolution of the CSalphabeta defensins. Mol. Immunol. 45:828–838.http://dx.doi.org/10.1016/j.molimm.2007.06.354.

Zhu, S., Gao, B., Harvey, P.J., Craik, D.J., 2012. Dermatophytic defensin with antiinfectivepotential. Proc. Natl. Acad. Sci. U. S. A. 109:8495–8500. http://dx.doi.org/10.1073/pnas.1201263109.

Page 14: Computational tools for exploring sequence databases as a ...download.xuebalib.com/xuebalib.com.24177.pdf · Research review paper Computational tools for exploring sequence databases

本文献由“学霸图书馆-文献云下载”收集自网络,仅供学习交流使用。

学霸图书馆(www.xuebalib.com)是一个“整合众多图书馆数据库资源,

提供一站式文献检索和下载服务”的24 小时在线不限IP

图书馆。

图书馆致力于便利、促进学习与科研,提供最强文献下载服务。

图书馆导航:

图书馆首页 文献云下载 图书馆入口 外文数据库大全 疑难文献辅助工具